Parsing tables with unknown numbers of rows


David Tannenbaum

Jan 2, 2012, 2:26:05 AM
to nokogiri-talk
I am trying to extract information from a bunch of tables in an HTML
file. The tables are all structured the same way, but I don't know how
many rows each table has, so I don't know how many times to run my
loop, and that is leading to an error. Similarly, I don't know how many
tables a given file contains, which will cause the same kind of error.
Here is my current code:

y = 6

0.upto(1) do |x|
  agency_name = @doc.css("table")[y+4].css("tr")[0].css("td")[1].inner_text

  1.upto(3) do |inner_count|
    if @doc.css("table")[y+5].css("tr")[inner_count].css("td")[1].inner_text then
      person = {}
      [:title, :name, :phone, :email].each_with_index do |element, index|
        node = @doc.css("table")[y+5].css("tr")[inner_count].css("td")[index]
        if node
          person[element] = node.inner_text
        end
      end
      puts format_output(person, agency_name)
    end
  end

  y = y + 4
end

The 1.upto(3) leads to this error:

23:in `block (3 levels) in trip': undefined method `css' for nil:NilClass (NoMethodError)

The 0.upto(1) isn't leading to an error right now because I happen to
be working on a file that does have 2 tables.

I tried addressing this with
"if @doc.css("table")[y+5].css("tr")[inner_count].css("td")[1].inner_text",
but that's not helping.
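A plain-Ruby sketch of why this happens: indexing a Nokogiri `NodeSet` past its end returns `nil`, just as `Array#[]` does, so chaining `.css` onto the result raises `NoMethodError`. An `if` guard built from the same chain cannot help, because the exception is raised while the condition itself is being evaluated. To keep the example runnable without an HTML file, an ordinary Array stands in for the rows of a table; `rows`, `guard_error` and `processed` are illustrative names only.

```ruby
# Stand-in for a table's rows: only indices 0, 1 and 2 exist.
# Nokogiri's NodeSet#[] likewise returns nil for an out-of-range index.
rows = ["header", "row 1", "row 2"]

p rows[2]   # prints "row 2"
p rows[3]   # prints nil, because there is no row at index 3

guard_error = nil
begin
  # The NoMethodError is raised while this condition is evaluated,
  # so the `if` never gets a chance to skip the missing row.
  puts "guard passed" if rows[3].upcase
rescue NoMethodError => e
  guard_error = e
  puts "raised inside the condition: #{e.class}"
end

# Iterating over the rows that actually exist never indexes past the end.
processed = []
rows.drop(1).each do |row|   # skip row 0; the original loop also started at 1
  processed << row.upcase
end
p processed   # prints ["ROW 1", "ROW 2"]
```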

Walter Lee Davis

Jan 3, 2012, 12:43:52 PM
to nokogi...@googlegroups.com

You might be better served by letting the loop iterate over the elements themselves, so it runs exactly as many times as there are elements in the source:

<table>
<tr>
<td>foo</td>
<td>bar</td>
<td>baz</td>
</tr>
</table>

doc.css('td').each do |td|
  puts td.inner_text  # or whatever you need to do with each cell
end

That way it doesn't matter how many of these you have. You can nest these loops, too, so each table can have any number of trs and each tr any number of tds, and so forth.

Walter

