XHTML Parsing & JavaScript

Stephen Turner

Jun 13, 2011, 2:09:49 PM

to Flying Saucer Users

Hello,

We are using FS to produce PDF versions of XHTML pages in a web app
with some success.

Where we are getting tripped up is with pages that have inline
JavaScript (in a <script> tag) that uses '&&'. This causes the XML
parser used in the PDF rendering to throw an exception:

org.xml.sax.SAXParseException: The entity name must immediately follow
the '&' in the entity reference.
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)

The FS user guide states that "we ignore script tags", but apparently
the XML parser does not. Is there a way to exclude the contents of
script tags from the XML parsing?

Thanks,
Steve

Patrick Wright

Jun 13, 2011, 2:29:40 PM

to flying-sa...@googlegroups.com

Hi Steve

Unfortunately, this is out of our hands. Flying Saucer basically
starts from a DOM Document instance. As a convenience to our users, we
provide basic integration of the XML reader APIs that ship with the
JDK, so that either the JDK XML parser, or another one implementing
those APIs, can be used. But if there are problems with the parsing
you need to figure this out outside of FS. My guess is you should use
a very forgiving parser, perhaps Neko or TagSoup, see if it skips over
your problem, and then use some XSL magic to wipe the script element
from the input.
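
As a rough illustration of the "XSL magic" step, here is a minimal sketch that uses only the JDK's javax.xml.transform API: an identity transform with one extra empty template that drops script elements. The class name StripScripts, the method stripScripts, and the SAMPLE markup are made up for this example. The sample is already well-formed because the JDK parser reads it here; with a forgiving parser such as TagSoup, that parser's output would be passed in as the Source instead, which is not shown.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class StripScripts {

    // Identity transform plus one empty template that swallows script elements.
    // local-name() is used so the match works with or without the XHTML namespace.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"*[local-name()='script']\"/>"
        + "</xsl:stylesheet>";

    // Hypothetical sample page; it contains no '&&', so the JDK parser accepts it.
    static final String SAMPLE =
          "<html xmlns='http://www.w3.org/1999/xhtml'><head><title>t</title>"
        + "<script type='text/javascript'>var x = 1;</script></head>"
        + "<body><p>Hello</p></body></html>";

    // Runs the stylesheet over the input and returns the result as a DOM Document.
    static Document stripScripts(Source input) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        DOMResult result = new DOMResult();
        transformer.transform(input, result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = stripScripts(new StreamSource(new StringReader(SAMPLE)));
        System.out.println("script elements left: "
                + doc.getElementsByTagName("script").getLength());
    }
}
```

The resulting Document can then be handed to the renderer like any other DOM Document.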

When we have a DOM Document we do indeed skip any script elements in
that Document.

Sorry I don't have a better answer for you.

Best,
Patrick

Askar Kalykov

Jun 13, 2011, 10:54:59 PM

to flying-sa...@googlegroups.com

A bare && isn't valid XML content.

If the XHTML content is under your control, you might want to wrap the script tag content in CDATA ( <![CDATA[ script_here ]]> ). This is a valid way to escape XML-unfriendly character sequences.
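
To make the difference concrete, here is a small sketch using the JDK's default DOM parser. It parses the same inline script twice, once with a bare '&&' and once wrapped in a CDATA section. The class name CdataScriptDemo, the helper methods, and the two sample pages are invented for this example. The leading // before the CDATA markers is a common convention so that browsers treating the page as plain HTML see the markers as JavaScript comments; it is not required by the XML parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class CdataScriptDemo {

    // Hypothetical page with a bare '&&' inside the script element.
    static final String RAW =
          "<html xmlns='http://www.w3.org/1999/xhtml'><head><title>t</title>"
        + "<script type='text/javascript'>if (a && b) { go(); }</script>"
        + "</head><body><p>Hello</p></body></html>";

    // The same page with the script body wrapped in a CDATA section.
    static final String WRAPPED =
          "<html xmlns='http://www.w3.org/1999/xhtml'><head><title>t</title>"
        + "<script type='text/javascript'>//<![CDATA[\n"
        + "if (a && b) { go(); }\n"
        + "//]]></script>"
        + "</head><body><p>Hello</p></body></html>";

    // Parses the markup with the JDK's default XML parser.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Returns false when the parser rejects the markup with a SAXParseException.
    static boolean isWellFormed(String xhtml) throws Exception {
        try {
            parse(xhtml);
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bare && parses:    " + isWellFormed(RAW));
        System.out.println("CDATA && parses:   " + isWellFormed(WRAPPED));
    }
}
```

Inside the CDATA section the parser treats '&' as ordinary character data, so the script text reaches the DOM unchanged.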

Stephen Turner

Jun 14, 2011, 3:22:07 PM

to Flying Saucer Users

Thanks for the fast response! I suspected as much.

The CDATA approach did work - thanks for that suggestion.

Thanks,
Steve
