SAX EntityResolver

I was trying to resolve entities (&weirdChar;) in an XML file. Easy enough: use a validating parser. But here’s the tricky bit: the entity definitions have to come from the classpath. This should still be easy, as SAX provides an EntityResolver.

Unfortunately, the interactions between JAXP and SAX make life complicated. I found that you have to ignore the SAXParser (from JAXP) and instead focus on the XMLReader interface (part of plain old SAX).

This is what I came up with. First, a small driver.

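  // Imports for the driver and the resolver below.
  import java.io.*;
  import javax.xml.parsers.*;
  import org.xml.sax.*;
  import org.xml.sax.ext.EntityResolver2;

  public void parseIt() throws ParserConfigurationException, SAXException, IOException {
    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setValidating(true);
    // Skip the JAXP SAXParser and work with the underlying SAX XMLReader.
    XMLReader reader = spf.newSAXParser().getXMLReader();
    reader.setEntityResolver(new MyResolver());
    // Look for test.xml on the classpath.
    InputStream testXmlStream = App.class.getClassLoader().getResourceAsStream("test.xml");
    reader.parse(new InputSource(testXmlStream));
  }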

That references the EntityResolver implementation I wrote:

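  class MyResolver implements EntityResolver2 {
    public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId)
      throws SAXException, IOException {
      InputStream stream = getClass().getClassLoader().getResourceAsStream(systemId);
      return stream == null ? null : new InputSource(stream); // null: let the parser resolve it
    }
    // Plain EntityResolver callback, also required by the interface.
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
      return resolveEntity(null, publicId, null, systemId);
    }
    public InputSource getExternalSubset(String name, String baseURI) {
      return null; // no external subset for documents without a DOCTYPE
    }
  }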

Actually, I had to use EntityResolver2 for reasons I don’t entirely understand.
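One likely reason EntityResolver2 helps: the SAX documentation says a plain EntityResolver receives a fully resolved system identifier, while the four-argument method of EntityResolver2 is passed the identifier as written in the document, plus the base URI. The following is a minimal, self-contained sketch of that behaviour, not the code from my project. The class name `ResolverDemo`, the `RESOURCES` map (a stand-in for classpath lookups), and the `chars.ent` file name are all made up for illustration, and validation is left off to keep it short.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.EntityResolver2;
import org.xml.sax.helpers.DefaultHandler;

public class ResolverDemo {
    // Hypothetical stand-in for classpath resources, keyed by system identifier.
    static final Map<String, String> RESOURCES =
        Map.of("chars.ent", "<!ENTITY weirdChar \"&#169;\">");

    // Hypothetical document that pulls its entity definitions from chars.ent.
    static final String XML =
        "<!DOCTYPE doc [<!ENTITY % chars SYSTEM \"chars.ent\"> %chars;]>"
        + "<doc>&weirdChar;</doc>";

    static class MapResolver implements EntityResolver2 {
        @Override
        public InputSource resolveEntity(String name, String publicId,
                                         String baseURI, String systemId) {
            // systemId is looked up exactly as declared in the document.
            String body = RESOURCES.get(systemId);
            // Returning null hands resolution back to the parser.
            return body == null ? null : new InputSource(new StringReader(body));
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return resolveEntity(null, publicId, null, systemId);
        }

        @Override
        public InputSource getExternalSubset(String name, String baseURI) {
            return null;
        }
    }

    // Parses XML and returns the character data of the document.
    static String parse() {
        StringBuilder text = new StringBuilder();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(new MapResolver());
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse());
    }
}
```

Swapping the map lookup for `getClass().getClassLoader().getResourceAsStream(systemId)` gives the classpath version; the map just keeps the sketch runnable without any files on disk.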

On top of this, I found that I had to include Xerces 2.8 explicitly as a dependency. The version bundled with Java 1.5 is Xerces 2.6.2, which has a bug: it passes the entity resolver an absolutized systemId, which makes it very difficult to resolve any further. What a pain in the arse.
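When juggling parser versions like this, it can help to check which implementation JAXP actually picked up at runtime. Here is a small sketch that prints the concrete XMLReader class; `WhichParser` and `readerClassName` are names invented for the example. As far as I know, a class name under `org.apache.xerces` points to a standalone Xerces jar and one under `com.sun.org.apache.xerces.internal` points to the copy bundled with the JDK, but treat that mapping as an assumption about common distributions rather than a guarantee.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class WhichParser {
    // Returns the name of the concrete XMLReader class that JAXP selected.
    static String readerClassName() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return reader.getClass().getName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create a SAX parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readerClassName());
    }
}
```

Running this before and after adding the explicit dependency should show whether the newer parser is really the one in use.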

But it does now work, and I can successfully resolve entities off the classpath.