Failure in Node.xpath C extension

Uncle Jefferson

Apr 2, 2012, 10:37:53 AM
to nokogiri-talk

Quick question. I need to debug the evaluate method of Node (C
extension) in Nokogiri.

We have a 'randomly' occurring failure in that method when parsing
Primo documents: it returns an empty array after an Apache/Passenger
restart, but only sometimes.

Is the simplest way to put some debug statements in the C code and
rebuild the gem, or is there some switch that I can use to turn on
debugging inside the C code? I read over it and did not see
anything that jumped out at me, but I am not a C programmer.

Any help or shoves in the right direction would be most appreciated.

Thanks for your time this morning,

Tim

Mike Dalessio

Apr 2, 2012, 10:50:09 AM
to nokogi...@googlegroups.com
Howdy,

Sorry to hear you're having issues!

My advice would be, before you try to debug C extensions in a live production passenger instance, to log all the relevant inputs when the query "fails" for you. Try to determine whether all your inputs are what you expect -- the document, the xpath query, etc.

If you determine that your inputs are all correct, then you're probably most of the way to a reproducible test case that you can send to the list for investigation.
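
A minimal sketch of the input logging suggested above. The helper name xpath_with_logging and its parameters are hypothetical and not part of Nokogiri; the only assumption about `node` is that it responds to xpath(query, namespaces), as a Nokogiri::XML::Node does.

```ruby
require "logger"
require "digest/md5"

# Hypothetical helper (not part of Nokogiri): run an XPath query and, when it
# comes back empty, log every input that went into it.
# `node` is anything that responds to #xpath(query, namespaces), such as a
# Nokogiri::XML::Node; `raw_xml` is the string the document was parsed from.
def xpath_with_logging(node, query, namespaces, raw_xml, logger)
  result = node.xpath(query, namespaces)
  if result.empty?
    logger.warn(
      "empty XPath result: query=#{query.inspect} " \
      "namespaces=#{namespaces.inspect} " \
      "bytes=#{raw_xml.bytesize} md5=#{Digest::MD5.hexdigest(raw_xml)}"
    )
  end
  result
end
```

Logging the byte size and digest of the raw document next to the query makes it possible to compare a failing request against a working one without dumping the whole document into the log.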

Hope this helps! Thanks for using Nokogiri.

-m
---
mike dalessio / @flavorjones


Tim

Apr 2, 2012, 11:20:58 AM
to nokogi...@googlegroups.com
Morning Mike,

Before I post to the list and make a fool out of myself, here is a little more detail.

Fedora Core 12 (yes, it's old, but we see the same failures on FC 15 and FC 16)

Rails 3.0.6

Passenger 3.0.11

Apache 2.2.15

We use Nokogiri 1.5.0, although I have tried 1.5.1 and 1.5.2 with the same result.

We call Nokogiri through HappyMapper 0.4.1.

All inputs are correct; they have been verified by several sets of eyes. When a document fails, it fails in the initial parse of the root and its first level of children. We are parsing documents from Primo (academic articles) pulled using open-uri's parse/read methods. I did notice that open-uri's 'open' method returns an IO object, which could cause an issue like this since the cursor would still be pointing into that object unless it is closed, but that does not apply here.
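
For reference, the IO cursor concern mentioned above can be sketched as follows. A StringIO stands in here for whatever IO-like object the fetch returns; that substitution is an assumption made so the example runs without a network call.

```ruby
require "stringio"

xml = "<root><child/></root>"
io = StringIO.new(xml) # stand-in for an IO-like object returned by a fetch

first = io.read   # consumes the stream; the cursor is now at the end
second = io.read  # nothing left to read, so this is an empty string
io.rewind         # move the cursor back to the start
third = io.read   # the full content is available again

puts first == xml
puts second.empty?
puts third == xml
```

A parser handed an already-consumed IO would see an empty document, which is why reading the body into a string once and parsing the string avoids this class of problem.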

Here are the first few lines of the XML doc:
<sear:SEGMENTS xmlns:prim="http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib" xmlns:sear="http://www.exlibrisgroup.com/xsd/jaguar/search">
  <sear:JAGROOT>
    <sear:RESULT>
      <sear:QUERYTRANSFORMS/>
      <sear:FACETLIST ACCURATE_COUNTERS="true">
        <sear:FACET COUNT="20" NAME="creator">
          <sear:FACET_VALUES VALUE="2" KEY="Nederkoorn, Chantal"/>
          <sear:FACET_VALUES VALUE="1" KEY="Wilson, Dermot J."/>
          <sear:FACET_VALUES VALUE="18" KEY="Shulman, Martha Rose"/>
          <sear:FACET_VALUES VALUE="3" KEY="Gerber, Leah"/>
          <sear:FACET_VALUES VALUE="37" KEY="Bittman, Mark"/>
          <sear:FACET_VALUES VALUE="2" KEY="Trizac, E."/>
          <sear:FACET_VALUES VALUE="2" KEY="Carlson, Richard W."/>

The xpath being passed into Node.xpath is .//sear/JAGROOT.

The failure ONLY occurs after restarting Apache/touching Passenger. If it fails initially, it always fails; if it works, it always works until we restart. We have tried reloading instead of restarting Apache, not touching Passenger, etc. This issue only affects the article rendering of our site. The other 90% runs flawlessly, so if it's a config issue then it is very selective about what angers it. The exact same document will always work or always fail depending on the first try after restart. I have MD5 hashed the docs stored in /tmp from open-uri and they are exactly the same.
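
The MD5 comparison described above could look roughly like this. The helper name same_content? is hypothetical, and the two temporary files merely stand in for a document saved from a working run and one saved from a failing run.

```ruby
require "digest/md5"
require "tempfile"

# Hypothetical helper: true when both files have identical MD5 digests.
def same_content?(path_a, path_b)
  Digest::MD5.file(path_a).hexdigest == Digest::MD5.file(path_b).hexdigest
end

# Stand-ins for a document captured from a working run and a failing run.
working = Tempfile.new("working")
working.write("<sear:SEGMENTS/>")
working.flush

failing = Tempfile.new("failing")
failing.write("<sear:SEGMENTS/>")
failing.flush

puts same_content?(working.path, failing.path)
```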

The final kicker is that I have only one indication of error, and it is from the Ruby logs. There is nothing in /var/log/messages, the Apache error logs, or any other log you can think of for that matter. Just the Ruby NoMethodError for trying to call an object method (current_page=) on the empty array returned on failure.

Based on the ramblings above, what info would be useful to post back to the group for help?

Thanks again for any time you spend on this,

Tim

Tim

Apr 2, 2012, 1:29:36 PM
to nokogi...@googlegroups.com
The xpath I provided was incorrect, typo on my part.

We are passing .//sear:JAGROOT and NOT .//sear/JAGROOT. A thousand pardons for that one.

Also, we don't have the libxml gem installed.

I will remove HappyMapper from the equation and get back to the list with what I find. Thank you for your very rapid responses, much appreciated.

Tim
---------- Forwarded message ----------
From: Aaron Patterson <aaron.p...@gmail.com>
Date: Mon, Apr 2, 2012 at 1:14 PM
Subject: Re: [nokogiri-talk] Failure in Node.xpath C extension
To: nokogi...@googlegroups.com
Are you sure this XPath is correct?  Is happy mapper munging the XPath
before it goes to nokogiri?

The reason I ask is because this particular XPath should return 0
results because it's not adding a namespace to the JAGROOT tag.  Here
is an example:

 https://gist.github.com/2285238


> The failure ONLY occurs after restarting Apache/touching Passenger. If it
> fails initially, it always fails...if it works it always works until we
> restart. We have tried reloading instead of restarting Apache, not touching
> Passenger, etc, etc....This issue only effects the article rendering of our
> site...the other 90% runs flawlessly so if its a config issue then it is
> very selective about what angers it. The exact same document will always
> work or always fail depending on the first try after restart. I have MD5
> hashed the docs stored in /tmp from open-uri and they are exactly the same.
>
> The final kicker is I have only one indication of error and it is from the
> ruby logs. Nothing in /var/log/messages, apache error logs or any other log
> you can think of for that matter. Just the ruby NoMethodError for trying to
> call an object method (current_page=) on the empty array returned on
> failure.
>
> Based on the ramblings above, what info would be useful to post back to the
> group for help.

If you can remove happy mapper from the picture, and isolate this as a
nokogiri issue, I think it would help!  Just in case, you don't happen
to have the libxml gem installed and activated, do you?
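
One way to check this from inside the running process is sketched below. The helper name libxml_ruby_status is hypothetical; the sketch assumes the gem in question is libxml-ruby, whose top-level module is LibXML.

```ruby
# Hypothetical helper: report whether the libxml-ruby gem has been loaded
# into this process. Uses only core Ruby and RubyGems.
def libxml_ruby_status
  {
    # true once the gem's code has been required
    constant_defined: !defined?(::LibXML).nil?,
    # true once RubyGems has activated the gem
    gem_activated: Gem.loaded_specs.key?("libxml-ruby")
  }
end

p libxml_ruby_status

# When Nokogiri is loaded, also print its version details, which include the
# libxml2 version it was compiled against and the one loaded at runtime.
p Nokogiri::VERSION_INFO if defined?(::Nokogiri)
```

Running this from a Rails console and again from inside a Passenger-served request would show whether the two environments load the same libraries.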

Thanks.

--
Aaron Patterson
http://tenderlovemaking.com/
