How can I unescape HTML character entities in Java?

Question:

Basically, I would like to decode a given HTML document and replace all special characters, such as "&nbsp;" -> " " and "&gt;" -> ">".

In .NET we can make use of HttpUtility.HtmlDecode.

What's the equivalent function in Java?

I have used the Apache Commons StringEscapeUtils.unescapeHtml4() for this:

Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.
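
With Commons Lang 3 (or Commons Text) on the classpath, the call itself is just StringEscapeUtils.unescapeHtml4(input). To illustrate what that kind of unescaping does, here is a minimal standard-library sketch. It is not the Commons implementation: the class name HtmlUnescapeSketch, the unescape method, and the five-entry entity table are assumptions made for this example, and the real HTML 4.0 entity set is far larger. It handles a handful of named entities plus decimal and hexadecimal numeric references, and leaves anything it does not recognise untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not how Apache Commons implements unescapeHtml4().
class HtmlUnescapeSketch {

    // Tiny subset of named entities, chosen for the example.
    private static final Map<String, String> NAMED = Map.of(
        "nbsp", "\u00A0",
        "lt", "<",
        "gt", ">",
        "amp", "&",
        "quot", "\""
    );

    // Matches &name;, &#123; and &#x1F; style references.
    private static final Pattern ENTITY =
        Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);");

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        // Single left-to-right pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
        while (m.find()) {
            out.append(input, last, m.start());
            out.append(decode(m.group(1), m.group()));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    // Returns the decoded text, or the original reference if it is unknown or invalid.
    private static String decode(String body, String original) {
        if (body.charAt(0) != '#') {
            return NAMED.getOrDefault(body, original);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = hex
                ? Integer.parseInt(body.substring(2), 16)
                : Integer.parseInt(body.substring(1), 10);
            return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : original;
        } catch (NumberFormatException e) {
            return original; // number too large to be a code point
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("1 &lt; 2 &amp;&amp; 3 &gt; 2")); // prints: 1 < 2 && 3 > 2
    }
}
```

Note that "&nbsp;" decodes to the non-breaking space U+00A0, not to an ordinary space, so a further replace is needed if plain spaces are what you want. For real documents the library call is the safer choice, since it covers the full entity table.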