Missing Characters


Ryan

Nov 4, 2020, 6:38:59 PM
to nokogiri-talk
I have a <td> element that I'm trying to pull an address from. The address is formatted like "123 Fake St.<br>City, State 65432", but when I call .text on the result of Nokogiri's css() method I get back "123 Fake St.City, State 65432", without the <br>.

Why doesn't text return tags? Should I be using a different parsing strategy, or passing another argument, to get the full cell contents?

Thanks in advance for any insight you can provide.

Walter Lee Davis

Nov 4, 2020, 10:26:20 PM
to nokogiri-talk
If you want the entire HTML content of the element, you need to use the inner_html method. text returns just the textual content, as you've seen, by design.

Walter

Mike Dalessio

Nov 5, 2020, 6:40:56 AM
to nokogiri-talk
Here's a demonstration of the methods you can use to serialize a document node in a few different ways:

#! /usr/bin/env ruby

require "nokogiri"

html = <<~EOF
<html>
  <body>
    <div>123 Fake St.<br>City, State 65432</div>
  </body>
</html>
EOF

doc = Nokogiri::HTML(html)

address = doc.at_css("div")

address.inspect # => "#<Nokogiri::XML::Element:0x78 name=\"div\" children=[#<Nokogiri::XML::Text:0x3c \"123 Fake St.\">, #<Nokogiri::XML::Element:0x50 name=\"br\">, #<Nokogiri::XML::Text:0x64 \"City, State 65432\">]>"

address.content # => "123 Fake St.City, State 65432"
address.text # => "123 Fake St.City, State 65432"
address.inner_text # => "123 Fake St.City, State 65432"

address.to_s # => "<div>123 Fake St.<br>City, State 65432</div>"
address.to_html # => "<div>123 Fake St.<br>City, State 65432</div>"
address.to_xml # => "<div>123 Fake St.<br/>City, State 65432</div>"
address.to_xhtml # => "<div>123 Fake St.<br />City, State 65432</div>"

address.inner_html # => "123 Fake St.<br>City, State 65432"

Ryan Faegre

Nov 5, 2020, 1:12:43 PM
to nokogi...@googlegroups.com
Thank you so much! Just what I needed!
