searchable.rb error

Luciano Soares

Oct 26, 2020, 6:09:29 AM
to nokogiri-talk

Hi folks!

I have a project with only five lines of code:

require 'nokogiri'
require 'open-uri'
cagometro = Nokogiri::HTML(URI.open("http://www.cagometro.com/"))
cagadas = cagometro.xpath("")
puts cagadas

And I'm getting this error:

Traceback (most recent call last):
        4: from parser.rb:4:in `<main>'
        3: from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/nokogiri-1.10.10-x64-mingw32/lib/nokogiri/xml/searchable.rb:154:in `xpath'
        2: from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/nokogiri-1.10.10-x64-mingw32/lib/nokogiri/xml/searchable.rb:179:in `xpath_internal'
        1: from C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/nokogiri-1.10.10-x64-mingw32/lib/nokogiri/xml/searchable.rb:198:in `xpath_impl'
C:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/nokogiri-1.10.10-x64-mingw32/lib/nokogiri/xml/searchable.rb:198:in `evaluate': ERROR: Invalid expression: (Nokogiri::XML::XPath::SyntaxError)

Since the error mentions searchable.rb in the Nokogiri project and doesn't mention any line of my own code, could this be a bug in Nokogiri?

Is there something I can do on my side to avoid this?

Thanks in advance!

Mike Dalessio

Oct 26, 2020, 6:16:16 AM
to nokogiri-talk
Hi! Thanks for asking this question.

If you read the error carefully, you'll see that the library is raising an exception:

> ERROR: Invalid expression: (Nokogiri::XML::XPath::SyntaxError)

Wayne Brissette

Oct 26, 2020, 7:30:27 AM
to nokogi...@googlegroups.com
To add a bit more context to what Mike said:

Right now you have an invalid XPath expression: xpath("") passes it an empty string. Here is how you would select all the divs on that page using both CSS and XPath:

require "nokogiri"
require "open-uri"

doc = Nokogiri::HTML(URI.open("http://www.cagometro.com/"))

# CSS selector: every div in the document
doc.css('div').each do |node|
  puts node.text
end

# Equivalent XPath: every div below the document node
doc.xpath('.//div').each do |node|
  puts node.text
end

This walks through each div on the page and prints its text content (there are 12 div nodes on that page). Nothing fancy. Now, say we want only the H2s that are direct children of a div; do this:

require "nokogiri"
require "open-uri"

doc = Nokogiri::HTML(URI.open("http://www.cagometro.com/"))
puts 'From css:'

doc.css('div > h2').each do |node|
  puts node.text
end

puts 'From xpath:'

doc.xpath('.//div/h2').each do |node|
  puts node.text
end


Results from both:

From css:
O contador de cagadas oficial do Governo Bolsonaro
Confira todas as cagadas mês a mês:
From xpath:
O contador de cagadas oficial do Governo Bolsonaro
Confira todas as cagadas mês a mês:

Each query matches 2 nodes.

Finally, if you want to filter on an attribute, use:

require "nokogiri"
require "open-uri"

doc = Nokogiri::HTML(URI.open("http://www.cagometro.com/"))

puts 'From css:'

# Standard CSS attribute selector: no @ prefix (the @ belongs to XPath syntax)
doc.css('div > h2[class="site-description"]').each do |node|
  puts node.text
end

puts 'From xpath:'

# XPath attribute test: @class must equal the string exactly
doc.xpath('.//div/h2[@class="site-description"]').each do |node|
  puts node.text
end

Again, nothing fancy here; the results are:

From css:
O contador de cagadas oficial do Governo Bolsonaro
From xpath:
O contador de cagadas oficial do Governo Bolsonaro


There are many sites, including the Nokogiri site, that walk you through examples similar to these, and I highly recommend spending some time working through them. For most things I find CSS easier, but there are times when XPath works better for me personally. I will warn you, though: XPath support in everything short of Saxon is limited to version 1.0 of the language. One of my colleagues thinks it's possible to hook in XPath 2.0 or 3.0 support, but it requires some major surgery, and we haven't needed it, since Ruby can already do many of the things that XSLT 2.0 and 3.0 added.

-Wayne
