I'm attempting to process an XML file that follows the ONIX standard[1]
using nokogiri 1.4.
Most files work fine, but I ran into one this morning that had a named
entity (&ndash;) in it, which triggered an exception. See [2] for a
sample XML file, test script and output.
The ONIX spec is defined via a DTD, and if you dig through it there's
~1500 named entities that are permitted. Is there currently any way for
me to stop nokogiri raising an exception on files with entities?
cheers
-- James Healy <ji...@deefa.com> Tue, 10 Nov 2009 14:51:43 +1100
[1] http://www.editeur.org/15/Previous-Releases/
[2] http://gist.github.com/230595
Thank you so much for the sample script! That makes my life much easier! :-D
> The ONIX spec is defined via a DTD, and if you dig through it there's
> ~1500 named entities that are permitted. Is there currently any way for
> me to stop nokogiri raising an exception on files with entities?
The only way to get it to stop complaining is by loading the DTD.
Once you load the DTD, then libxml2 will know how to properly deal
with the named entities.
Do you really need to use the Reader API? It's quite easy to get it
to load the DTD if you're parsing with the DOM API. I'm not so sure
that is the case with the Reader API.
--
Aaron Patterson
http://tenderlovemaking.com/
I need to deal with files that range from < 1 kB to > 300 MB, so the DOM
API isn't really the best option. I guess I could go SAX if that helps,
though the Reader API just makes things so easy.
Is there an example of how to load the DTD in the DOM and/or SAX APIs?
Maybe with those I can work out if the Reader API has similar support.
-- James Healy <ji...@deefa.com> Tue, 10 Nov 2009 15:48:41 +1100
Hrm... Not that I know of. I'm going to have to research this.
Would you mind filing a ticket to research this? I've been crazy busy
this week (because of RubyConf). I know I'll forget otherwise, and I
don't want to let my users down. :-)
Done, as ticket #165, although it looks like it may be a dup of #104?
Thanks for the support, I appreciate how responsive you are to queries.
-- James Healy <ji...@deefa.com> Sat, 14 Nov 2009 17:54:32 +1100