How to parse XHTML with a DOM parser while ignoring the DOCTYPE declaration

Question:

I am having trouble parsing XHTML that contains a DOCTYPE declaration with a DOM parser.

Error: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd%20

Declaration: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Is there a way to parse the XHTML into a Document object while ignoring the DOCTYPE declaration?

One solution that works for me is to give the DocumentBuilder a fake EntityResolver that returns an empty stream, so the parser never tries to download the DTD. There is a good explanation here (see the last message from kdgregory):

http://forums.sun.com/thread.jspa?threadID=5362097

Here is kdgregory's solution:

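// Needs: java.io.IOException, java.io.StringReader,
// org.xml.sax.EntityResolver, org.xml.sax.InputSource, org.xml.sax.SAXException
documentBuilder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // Answer every external entity request (including the DTD) with an
        // empty stream, so no network request is made.
        return new InputSource(new StringReader(""));
    }
});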
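
For context, here is a minimal self-contained sketch of how the snippet above might be wired into a complete parse. The class name IgnoreDoctypeExample, the helper parseIgnoringDtd, and the sample XHTML string are made up for illustration and are not part of kdgregory's post. The resolver is written as a lambda, which assumes Java 8 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; only the resolver idea comes from the answer above.
class IgnoreDoctypeExample {

    // Parses an XHTML string into a DOM Document without fetching the DTD.
    static Document parseIgnoringDtd(String xhtml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Same trick as above: every external entity resolves to an empty stream.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        // Illustrative input that carries the DOCTYPE from the question.
        String xhtml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>t</title></head>"
                + "<body><p>Hello</p></body></html>";
        Document doc = parseIgnoringDtd(xhtml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

One caveat with this approach: because the DTD is replaced by an empty stream, named entities that the XHTML DTD would normally define (such as &nbsp;) are no longer declared, so documents that use them will likely fail to parse unless those references are handled separately.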