Any plans for Reader#next and expand?


szimek

Jun 20, 2009, 3:42:28 PM
to nokogiri-talk
Hi,

are there any plans for implementing Reader#next and Reader#expand
methods? Right now I'm using libxml-ruby for parsing files that are
actually huge lists of similar XML nodes. I'm simply running:

begin
  node = reader.expand
  import_data_to_db(node)
end while reader.next != 0

so I don't have to load the whole XML file into memory, but I still get
all the benefits of the Node methods.

The problem with libxml-ruby is that Reader#expand seems to leak
memory (or maybe I'm just using it incorrectly), while Nokogiri seems
to have only the Reader#read method.

BTW, does Nokogiri work with Ruby 1.9.1? For simple test cases in
libxml-ruby I get a huge performance boost over 1.8.6, but unfortunately
for more complex ones I get lots of segfaults as well.

Aaron Patterson

Jun 20, 2009, 4:16:08 PM
to nokogi...@googlegroups.com
On Sat, Jun 20, 2009 at 12:42 PM, szimek <szi...@gmail.com> wrote:

> Hi,
>
> are there any plans for implementing Reader#next and Reader#expand
> methods? Right now I'm using libxml-ruby for parsing files that are
> actually huge lists of similar XML nodes. I'm simply running:
>
> begin
>   node = reader.expand
>   import_data_to_db(node)
> end while reader.next != 0
>
> so I don't have to load the whole XML file into memory, but I still get
> all the benefits of the Node methods.
>
> The problem with libxml-ruby is that Reader#expand seems to leak
> memory (or maybe I'm just using it incorrectly), while Nokogiri seems
> to have only the Reader#read method.

Nokogiri expands the node for you automatically, so you should be able to change your code to do this:

reader.each do |node|
  import_data_to_db node
end
 
> BTW, does Nokogiri work with Ruby 1.9.1? For simple test cases in
> libxml-ruby I get a huge performance boost over 1.8.6, but unfortunately
> for more complex ones I get lots of segfaults as well.

Yes, we've always targeted 1.9, though (for my sanity) we only support the most recent patch level.  If something doesn't work with 1.9, we consider that to be a bug.  :-)

--
Aaron Patterson
http://tenderlovemaking.com/

szimek

Jun 20, 2009, 4:54:12 PM
to nokogiri-talk
Thanks for the quick answer.

> Nokogiri expands the node for you automatically, so you should be able to
> change your code to do this:
>
> reader.each do |node|
>   import_data_to_db node
> end

But the docs say that node in the code above is an instance of
Nokogiri::XML::Reader, not Nokogiri::XML::Node, so I can't really call
node.children or node.xpath on this object, right? Besides, Nokogiri's
Reader#each uses Reader#read, which reads *every* node in the XML
file, while Reader#next (used in my example) goes to the next node on
the same level, so I can iterate over only the list elements, skipping
all the nodes inside them.

Eljay

Aug 9, 2017, 6:29:16 PM
to nokogiri-talk
+1 on the missing Reader#next. It allows much faster scanning of large XML files. Comparing similar pieces of code, one written with Ruby libxml (which supports the next method) and one with Nokogiri using the step-by-step Reader#each method, the libxml version shows a 4 to 5x performance improvement.

Mike Dalessio

Aug 11, 2017, 2:38:18 AM
to nokogiri-talk
Hi there! Thanks for providing your thoughts.

However, since you're replying to an email thread that's eight years old, I would really appreciate it if you could provide more context. Can you share your code? Can you help me understand your use case?

Thank you!

