Undefined method NoMethodError - Beginner question


Polly Gannaway

Oct 16, 2014, 6:03:09 PM
to nokogi...@googlegroups.com

Hi there

I'm a rookie and have no idea where to start on this. I'm using Nokogiri to work with some quite 'messy' HTML. I wrote a method to tidy up the messy strings, and it worked fine on its own. But then when I run the whole program, including Nokogiri parsing of the original HTML, I get the following error:

kamer.rb:9:in `normalise_instrumentation': undefined method `split' for #<Nokogiri::XML::NodeSet:0x007f92cb93bfb0> (NoMethodError)

Can anyone explain why this is happening? And suggest a way to stop it??

Thanks so much

Polly


Here is the code:


require 'nokogiri'
require 'open-uri'

def normalise_instrumentation(instrumentation)
  messy_array = instrumentation.split('.')
  normal_array = []
  messy_array.each do |section|
    if section =~ /\A\d+\z/
      normal_array << section
    end
  end
  return normal_array
end

doc = Nokogiri::HTML(open('http://www.cs.vu.nl/~rutger/vuko/nl/lijst_van_ooit/complete-solo.html'))
table = doc.css('table[summary=works] tr')

work_value = []
work_hash = {}

table.each do |row|
  piece = [row.css('td[1]'), row.css('td[2]'), row.css('td[3]')].map { |r|
    r.text.strip!
  }
  work_value = work_value.push(piece)
  work_key = normalise_instrumentation(row.css('td[3]'))
  work_hash[work_key] = work_value
end

puts work_hash


Jack Royal-Gordon

Oct 17, 2014, 2:29:30 PM
to nokogi...@googlegroups.com
Hi Polly,

I’m guessing that you are trying to use “split” on some text that you extract from an HTML element. If the TD represents that element, add “.content” to the end of the CSS call, as in the following:

piece = [row.css('td[1]').content, row.css('td[2]').content, row.css('td[3]').content].map { |r|

The important thing to remember is that css returns a Nokogiri object rather than a String (here a Nokogiri::XML::NodeSet holding the matching TD elements, as the error message shows), which is why split is undefined on it, while content() returns the text within an element. Also note that if the TD includes additional elements (e.g. div, span, p) the text from within those elements is also part of the content of the TD element.

Hope this helps.

Jack
