I've got an XML file with nodes that each have a unique id attribute.
Something like:
<doc>
<node id="This id is Unique">
...
</node>
<node id="At John's House">
...
</node>
<node id="&quot;What Did You Do in the War, Daddy?&quot;">
...
</node>
</doc>
Say all the ids have been pulled out into an array, and I want to
find the node with a particular id.
I can use
xml.xpath("//node[@id='#{id_array[0]}']").first
to find the first node, but this will break with the second node
because of the apostrophe in the id attribute. (The error is
Nokogiri::XML::XPath::SyntaxError: Invalid predicate)
I can escape the query to
xml.xpath("//node[@id=\"#{id_array[1]}\"]").first
which finds the second node, but this breaks with the third node
because of the " in the id attribute. If I change the string
variable from "\"What Did You Do in the War, Daddy?\"" to
"&quot;What Did You Do in the War, Daddy?&quot;" then it doesn't
find the node.
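To see why both forms fail, it can help to print the XPath string that the interpolation actually produces. The following is only a sketch using plain Ruby strings (no Nokogiri involved); the names ids, single_quoted and double_quoted are made up for illustration, and the ids are the decoded attribute values from the sample document.

```ruby
# Illustrative only: build the two query strings from the question and
# look at the string literal the XPath parser would have to read.
ids = [
  "This id is Unique",
  "At John's House",
  "\"What Did You Do in the War, Daddy?\""
]

# Same interpolation as in the question, once per quoting style.
single_quoted = ids.map { |id| "//node[@id='#{id}']" }
double_quoted = ids.map { |id| "//node[@id=\"#{id}\"]" }

puts single_quoted[1] # the apostrophe in the id closes the '...' literal early
puts double_quoted[2] # the leading " in the id closes the "..." literal at once
```

In both printed queries the quote character inside the id matches the delimiter, so the literal ends before the id does and the rest of the predicate is no longer valid XPath.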
The workaround (which may be a clue as to why the xpath isn't working)
is to turn the nodeset into an array and use find:
xml.search("node").to_a.find { |n| n["id"]==id_array[2] }
This will work for all 3 id attributes, but is probably slower than
the straight xpath query would be.
Thoughts?
FWIW, you shouldn't have to call to_a on the NodeSet. Calling find
should work just fine.
> Thoughts?
Basically, what this problem boils down to is that XPath strings kind
of suck. XPath 1.0 has no escape character, so a string literal can't
contain the same kind of quote that delimits it, and a value with both
single and double quotes can't be written as one literal at all. The
only escaping scheme that works is to use the XPath concat() function.
Below is some code demonstrating how you can generate escaped
literals. Whether it's faster than just doing it in Ruby is
questionable though. :-/
require 'nokogiri'

doc = Nokogiri::XML(DATA)
doc.xpath('//node').map { |x| x['id'] }.each do |id|
  ###
  # Split on single quotes, then join the pieces with an escaped single quote.
  # Each piece (which may contain double quotes) is surrounded by single quotes.
  # The split string ("'") is surrounded by double quotes.
  #
  # The -1 limit keeps trailing empty pieces, so an id that ends in a
  # single quote is not silently truncated.
  #
  # This will be passed to concat(), which requires more than one argument, so
  # we add an extra empty string in case there was nothing to escape and only
  # one string was returned.
  escape = "'#{id.split("'", -1).join("', \"'\", '")}', ''"
  path = "//node[@id=concat(#{escape})]"
  puts "Using xpath: #{path.inspect}"
  p(id => doc.at(path)['id'])
end
__END__
<doc>
<node id="This id is Unique">
...
</node>
<node id="At John's House">
...
</node>
<node id="&quot;What Did You Do in the War, Daddy?&quot;">
...
</node>
</doc>
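If this comes up in more than one place, the quoting could be pulled into a small helper. The following is a sketch only: xpath_literal is a hypothetical name, not something Nokogiri provides, and it assumes XPath 1.0 quoting rules. It uses a plain literal when one kind of quote is enough and falls back to concat() only when the value contains both kinds.

```ruby
# Hypothetical helper (not part of Nokogiri): turn any Ruby string into an
# XPath 1.0 string expression that evaluates to exactly that string.
def xpath_literal(str)
  return "'#{str}'" unless str.include?("'")   # no apostrophes: single-quote it
  return "\"#{str}\"" unless str.include?('"') # no double quotes: double-quote it

  # Both quote types present: single-quote the pieces between apostrophes
  # and splice each apostrophe back in as the double-quoted literal "'".
  # The -1 limit keeps trailing empty pieces, so a trailing apostrophe survives.
  parts = str.split("'", -1).map { |piece| "'#{piece}'" }
  "concat(#{parts.join(%q(, "'", ))})"
end

# Example: build a predicate for an id that contains an apostrophe.
puts "//node[@id=#{xpath_literal("At John's House")}]"
```

Because the concat() branch is only reached when the string contains at least one apostrophe, the split always yields two or more pieces, so concat() always receives more than one argument and no padding empty string is needed.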
Hope that helps!
---
Aaron Patterson
http://tenderlovemaking.com