App hangs while erroring out after nokogiri parses large xml file?


ke...@kevinkuchta.com

Feb 26, 2017, 8:09:32 PM
to nokogiri-talk
I noticed some weird behavior while using nokogiri to parse a ~300 MB XML file: when there's an error in the code, ruby hangs for about 3 minutes before exiting.  Here's a bare-bones example that reproduces this for me:

require 'nokogiri'

class Foo
  def open_it
    @doc = File.open('enwikivoyage-latest-pages-articles.xml') { |f| Nokogiri::XML(f) }
    puts "about to exit"
    whatever
  end
end

Foo.new.open_it


In this example I open the file, print something, then trigger a NameError (since `whatever` doesn't exist).  What happens is that "about to exit" prints after a few seconds, then the program hangs for 3 minutes before finally exiting:

$ time ruby test.rb
about to exit
test.rb:7:in `open_it': undefined local variable or method `foobar' for #<Foo:0x007fcbbca35a68> (NameError)
 from test.rb:11:in `<main>'

ruby test.rb  169.80s user 6.98s system 98% cpu 2:58.66 total

This example uses a largish xml dump from https://dumps.wikimedia.org/enwikivoyage/latest/.  The same issue showed up when I googled around for "large example xml file" and tried this one: http://www.ins.cwi.nl/projects/xmark/Assets/standard.gz.

So far this only seems to happen when I try to reference an undefined variable.  If I replace `whatever` in the example code with `exit`, `raise 'foo'`, or even `raise NameError.new` the program exits within a few seconds:

$ time ruby test.rb
about to exit
ruby test.rb  4.46s user 0.55s system 99% cpu 5.020 total

Any idea what's up with this?  Is it a (minor) bug, or am I doing something wrong?  I'm using ruby 2.4.0p0 with nokogiri 1.7.0.1, although I see the same issue with 2.3.1p112 and 1.7.0.

Mike Dalessio

Feb 27, 2017, 10:23:33 AM
to nokogiri-talk
Hi Kevin,

Thanks for asking this question.

The file you're parsing is, if I'm looking at the right one, 317MB of XML data. I was able to reproduce what you're seeing.

It's not clear what's going on here, but I don't think it has to do with Nokogiri. I noticed if `@doc` is changed to a local variable, e.g. `doc`, then this does not occur.

It may be worth asking on ruby-talk what's going on. It could be something like `did_you_mean` that ends up looking in the local object scope for potential call sites, but I don't claim to understand the implementation details of `did_you_mean`.
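One way to narrow this down without the 300 MB file is a small probe like the one below. It is only a sketch built on an assumption that nobody in this thread has confirmed: that building the NameError message may call `inspect` on the receiver, which would in turn inspect anything held in its instance variables. `FakeDoc`, `IvarHolder`, `LocalHolder`, and `inspect_calls_during_error` are hypothetical names made up for the illustration, and the counts may differ between Ruby versions.

```ruby
# Hypothetical stand-in for a huge parsed document; it only counts #inspect calls.
class FakeDoc
  attr_reader :inspect_calls

  def initialize
    @inspect_calls = 0
  end

  def inspect
    @inspect_calls += 1
    "#<FakeDoc>"
  end
end

# Keeps the document in an instance variable, like the original Foo.
class IvarHolder
  def run(doc)
    @doc = doc
    whatever # undefined on purpose, raises NameError
  end
end

# Keeps the document in a local variable only.
class LocalHolder
  def run(doc)
    local_doc = doc
    whatever # undefined on purpose, raises NameError
  end
end

# Triggers the NameError, builds its message, and reports how many times
# the document was inspected along the way.
def inspect_calls_during_error(holder_class)
  doc = FakeDoc.new
  begin
    holder_class.new.run(doc)
  rescue NameError => e
    e.message # force the message to be built, as the top-level handler would
  end
  doc.inspect_calls
end

puts "ivar:  #{inspect_calls_during_error(IvarHolder)} inspect call(s)"
puts "local: #{inspect_calls_during_error(LocalHolder)} inspect call(s)"
```

If the ivar count comes out non-zero on your Ruby, that would be consistent with the error message being built from an `inspect` of the whole object, document included, which could explain why switching to a local variable avoids the delay. To check the `did_you_mean` theory separately, running the original script with `ruby --disable-did_you_mean test.rb` should show whether the hang remains.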

Sorry I can't be of more help. Maybe someone else on this list has an idea?


