Greetings. I am always very picky about adding new dependencies,
especially XML parsers. I have been in situations where two libraries I
needed required two different XML parser implementations.
As I understand it, the external Xerces parser is being used this time to
guarantee that certain features are supported and that the parser is
properly secured; correct me if I am mistaken. The Oracle and OpenJDK JVMs
embed a copy of Xerces, but I know this is not guaranteed to always hold:
another JVM may use another parser (IBM?), the default parser could change,
or a different default may be installed on the class path. JAXP already
provides a way to find out whether the parser supports a feature. Why not
try the Java default parser first and fall back to Xerces only if it does
not support those features? Something like:
> import javax.xml.XMLConstants
> import javax.xml.parsers.SAXParserFactory
> import org.xml.sax.SAXNotRecognizedException
> import scala.xml.{Elem, XML}
> import scala.xml.factory.XMLLoader
>
> trait SecurityHelpers {
>   import SecurityHelpers._
>
>   def secureXML: XMLLoader[Elem] = {
>     val parserFactory =
>       if (useDefaultParser) {
>         newDefaultFactory()
>       } else {
>         // fall back to the external Xerces implementation
>         val f = SAXParserFactory.newInstance(
>           "org.apache.xerces.jaxp.SAXParserFactoryImpl",
>           SecurityHelpers.getClass.getClassLoader)
>         setupFactory(f)
>       }
>
>     val saxParser = parserFactory.newSAXParser()
>     XML.withSAXParser(saxParser)
>   }
> }
>
> object SecurityHelpers extends SecurityHelpers {
>   private lazy val useDefaultParser: Boolean = {
>     try {
>       // used only for testing feature compatibility
>       newDefaultFactory()
>       true
>     } catch {
>       case e: SAXNotRecognizedException =>
>         // TODO: log that the default JVM parser lacks the required features
>         false
>     }
>   }
>
>   private def newDefaultFactory(): SAXParserFactory =
>     setupFactory(SAXParserFactory.newInstance())
>
>   private def setupFactory(parserFactory: SAXParserFactory): SAXParserFactory = {
>     parserFactory.setNamespaceAware(false)
>     parserFactory.setFeature("http://xml.org/sax/features/external-general-entities", false)
>     parserFactory.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
>     parserFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
>     parserFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
>     parserFactory
>   }
> }
This way, people like me who run on a JVM that supports those features
can avoid shipping another parser and can be sure the same parser is used
for everything. If someone runs Lift on a JVM that doesn't support those
features, they can ship Xerces too, and the problem is solved. It
will not be possible to run Lift with a parser that doesn't understand
those features, because a SAXNotRecognizedException will be thrown.
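
To check which of those features a given JVM's default parser rejects, something along these lines could be run on its own. It is only a rough sketch: `FeatureProbe`, `requiredFeatures`, `unsupported` and `accepts` are names I made up for illustration, and treating SAXNotSupportedException and ParserConfigurationException as "not supported" is my assumption (the snippet above only catches SAXNotRecognizedException). It needs nothing beyond the JDK.

```scala
import javax.xml.XMLConstants
import javax.xml.parsers.{ParserConfigurationException, SAXParserFactory}
import org.xml.sax.{SAXNotRecognizedException, SAXNotSupportedException}

// Hypothetical helper, not part of Lift: reports which of the security
// features used above are rejected by a given SAXParserFactory.
object FeatureProbe {
  // Feature names and the values the proposal sets on the factory.
  val requiredFeatures: Seq[(String, Boolean)] = Seq(
    "http://xml.org/sax/features/external-general-entities" -> false,
    "http://xml.org/sax/features/external-parameter-entities" -> false,
    "http://apache.org/xml/features/disallow-doctype-decl" -> true,
    XMLConstants.FEATURE_SECURE_PROCESSING -> true
  )

  // Returns the names of the features the factory rejects.
  // An empty result means every feature was accepted.
  // Note: probing mutates the factory, so pass a throwaway instance.
  def unsupported(factory: SAXParserFactory): Seq[String] =
    requiredFeatures.collect {
      case (name, value) if !accepts(factory, name, value) => name
    }

  // Assumption: any of the three exceptions declared by setFeature
  // is treated as "this parser cannot be configured as required".
  private def accepts(factory: SAXParserFactory, name: String, value: Boolean): Boolean =
    try {
      factory.setFeature(name, value)
      true
    } catch {
      case _: SAXNotRecognizedException |
           _: SAXNotSupportedException |
           _: ParserConfigurationException => false
    }

  def main(args: Array[String]): Unit = {
    val factory = SAXParserFactory.newInstance()
    val missing = unsupported(factory)
    if (missing.isEmpty)
      println(s"${factory.getClass.getName} accepts all required features")
    else
      missing.foreach(f => println(s"${factory.getClass.getName} does not support: $f"))
  }
}
```

Running `FeatureProbe.main` on each target JVM would show up front whether Xerces has to be shipped there, instead of finding out from the exception at runtime.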