XML files with entities
From: James Healy <ji...@deefa.com>
Date: Tue, 10 Nov 2009 14:54:28 +1100
Local: Mon, Nov 9 2009 10:54 pm
Subject: XML files with entities
Hi folks,

I'm attempting to process an XML file that follows the ONIX standard[1]
using nokogiri 1.4.

Most files work fine, but I ran into one this morning that had a named
entity (&ndash;) in it, which triggered an exception. See [2] for a
sample XML file, test script and output.

The ONIX spec is defined via a DTD, and if you dig through it there's
~1500 named entities that are permitted. Is there currently any way for
me to stop nokogiri raising an exception on files with entities?

cheers

-- James Healy <ji...@deefa.com>  Tue, 10 Nov 2009 14:51:43 +1100

[1] http://www.editeur.org/15/Previous-Releases/
[2] http://gist.github.com/230595

From: Aaron Patterson <aaron.patter...@gmail.com>
Date: Mon, 9 Nov 2009 20:20:59 -0800
Local: Mon, Nov 9 2009 11:20 pm
Subject: Re: [nokogiri-talk] XML files with entities

On Mon, Nov 9, 2009 at 7:54 PM, James Healy <ji...@deefa.com> wrote:

> Hi folks,

> I'm attempting to process an XML file that follows the ONIX standard[1]
> using nokogiri 1.4.

> Most files work fine, but I ran into one this morning that had a named
> entity (&ndash;) in it, which triggered an exception. See [2] for a
> sample XML file, test script and output.

Thank you so much for the sample script!  That makes my life much easier!  :-D

> The ONIX spec is defined via a DTD, and if you dig through it there's
> ~1500 named entities that are permitted. Is there currently any way for
> me to stop nokogiri raising an exception on files with entities?

The only way to get it to stop complaining is by loading the DTD.
Once you load the DTD, then libxml2 will know how to properly deal
with the named entities.

Do you really need to use the Reader API?  It's quite easy to get it
to load the DTD if you're parsing with the DOM api.  I'm not so sure
that is the case with the Reader API.

--
Aaron Patterson
http://tenderlovemaking.com/

From: James Healy <ji...@deefa.com>
Date: Tue, 10 Nov 2009 15:51:02 +1100
Local: Mon, Nov 9 2009 11:51 pm
Subject: Re: [nokogiri-talk] Re: XML files with entities

Aaron Patterson wrote:
> > The ONIX spec is defined via a DTD, and if you dig through it there's
> > ~1500 named entities that are permitted. Is there currently any way for
> > me to stop nokogiri raising an exception on files with entities?

> The only way to get it to stop complaining is by loading the DTD.
> Once you load the DTD, then libxml2 will know how to properly deal
> with the named entities.

> Do you really need to use the Reader API?  It's quite easy to get it
> to load the DTD if you're parsing with the DOM api.  I'm not so sure
> that is the case with the Reader API.

I need to deal with files that range from under 1 kB to over 300 MB, so the
DOM API isn't really the best option. I guess I could go SAX if that helps,
but the Reader API just makes things so easy.

Is there an example of how to load the DTD in the DOM and/or SAX apis?
Maybe with those I can work out if the reader API has similar support.

-- James Healy <ji...@deefa.com>  Tue, 10 Nov 2009 15:48:41 +1100

From: Aaron Patterson <aaron.patter...@gmail.com>
Date: Fri, 13 Nov 2009 18:20:58 -0800
Local: Fri, Nov 13 2009 9:20 pm
Subject: Re: [nokogiri-talk] Re: XML files with entities

Hrm...  Not that I know of.  I'm going to have to research this.
Would you mind filing a ticket to research this?  I've been crazy busy
this week (because of RubyConf).  I know I'll forget otherwise, and I
don't want to let my users down.  :-)

--
Aaron Patterson
http://tenderlovemaking.com/

From: James Healy <ji...@deefa.com>
Date: Sat, 14 Nov 2009 17:55:49 +1100
Local: Sat, Nov 14 2009 1:55 am
Subject: Re: [nokogiri-talk] Re: XML files with entities

Aaron Patterson wrote:
> Hrm...  Not that I know of.  I'm going to have to research this.
> Would you mind filing a ticket to research this?  I've been crazy busy
> this week (because of RubyConf).  I know I'll forget otherwise, and I
> don't want to let my users down.  :-)

Done, as ticket #165, although it looks like it may be a dup of #104?

Thanks for the support, I appreciate how responsive you are to queries.

-- James Healy <ji...@deefa.com>  Sat, 14 Nov 2009 17:54:32 +1100

From: Aaron Patterson <aaron.patter...@gmail.com>
Date: Sun, 6 Dec 2009 14:47:17 -0800 (PST)
Local: Sun, Dec 6 2009 5:47 pm
Subject: Re: XML files with entities

On Nov 13, 10:55 pm, James Healy <ji...@deefa.com> wrote:

> Aaron Patterson wrote:
> > Hrm...  Not that I know of.  I'm going to have to research this.
> > Would you mind filing a ticket to research this?  I've been crazy busy
> > this week (because of RubyConf).  I know I'll forget otherwise, and I
> > don't want to let my users down.  :-)

> Done, as ticket #165, although it looks like it may be a dup of #104?

> Thanks for the support, I appreciate how responsive you are to queries.

Finally figured it out.  Looks like it's possible with the current
release of nokogiri:

  http://gist.github.com/250477

The only crappy part is that it takes time to load the DTD from the
internets.  If you're willing to perform some superhacks, you can
trick it into loading the DTD from your filesystem though.

--
Aaron Patterson
http://tenderlovemaking.com/
