How do I unescape HTML character entities in Java?
Question:
Basically, I would like to decode a given HTML document and replace all special characters, such as "&nbsp;" -> " " and "&gt;" -> ">".
In .NET we can use HttpUtility.HtmlDecode.
What is the equivalent function in Java?
Answer:
I have used the Apache Commons StringEscapeUtils.unescapeHtml4() for this:
Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.
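With commons-lang3 on the classpath, the call itself is a one-liner: StringEscapeUtils.unescapeHtml4(input), from the org.apache.commons.lang3 package (newer releases deprecate that class in favour of the one in Apache Commons Text). To show roughly what such an unescape step does, here is a minimal sketch that uses only the JDK. The class name HtmlUnescapeSketch, its tiny entity table, and the regular expression are my own assumptions for illustration; this is not the Apache Commons implementation, and it covers only a handful of the HTML 4.0 entities.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the Apache Commons code.
final class HtmlUnescapeSketch {

    // Deliberately incomplete table of named entities (assumption: these
    // five are enough for the example; HTML 4.0 defines many more).
    private static final Map<String, String> NAMED = Map.of(
            "nbsp", "\u00A0",
            "lt", "<",
            "gt", ">",
            "amp", "&",
            "quot", "\"");

    // Matches &name;, decimal &#65; and hexadecimal &#x41; references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String unescape(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous match and this one.
            out.append(html, last, m.start());
            String body = m.group(1);
            if (body.startsWith("#x") || body.startsWith("#X")) {
                out.append(decode(body.substring(2), 16, m.group()));
            } else if (body.startsWith("#")) {
                out.append(decode(body.substring(1), 10, m.group()));
            } else {
                // Names missing from the table are left untouched.
                out.append(NAMED.getOrDefault(body, m.group()));
            }
            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    // Turns a numeric reference into its character, or returns the
    // original text when the number is not a valid code point.
    private static String decode(String digits, int radix, String original) {
        try {
            int codePoint = Integer.parseInt(digits, radix);
            if (Character.isValidCodePoint(codePoint)) {
                return new String(Character.toChars(codePoint));
            }
        } catch (NumberFormatException e) {
            // Too many digits to fit in an int: keep the original text.
        }
        return original;
    }

    public static void main(String[] args) {
        System.out.println(unescape("1 &gt; 0 &amp;&amp; x &lt; 2"));
    }
}
```

Note that "&nbsp;" decodes to the non-breaking space U+00A0 rather than an ordinary space, both in this sketch and in unescapeHtml4(). If you need a plain space, replace '\u00A0' with ' ' afterwards.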