Questions: Nokogiri::HTML(page) returned HTML/XML tree-like stuff rather than HTML

Sonrisa

Oct 29, 2017, 9:42:07 PM
to nokogiri-talk
Hi everyone!!

I'm completely new to Ruby and Nokogiri. I'm trying to scrape some data from a website, and I was following this tutorial: https://www.distilled.net/resources/web-scraping-with-ruby-and-nokogiri-for-beginners/

and here is my code: 

require 'httparty'
require 'nokogiri'
require 'json'
require 'pry'
require 'csv'


parse_page = Nokogiri::HTML(page)

review_titles = []

Pry.start(binding)
 
When I run this in the terminal, I get:

#(Document:0x3fe1e5c39c5c {
  name = "document",
  children = [
    #(DTD:0x3fe1e5cccebc { name = "html" }),
    #(Element:0x3fe1e5ced4b4 {
      name = "html",
      attributes = [ #(Attr:0x3fe1e5cec2f8 { name = "xmlns", value = "http://www.w3.org/1999/xhtml" }), #(Attr:0x3fe1e5cec2e4 { name = "lang", value = "en-US" })],
      children = [
        #(Text "\n"),
        #(Element:0x3fe1e5d1c55c {
          name = "head",
          attributes = [ #(Attr:0x3fe1e5d01860 { name = "profile", value = "http://gmpg.org/xfn/11" })],
          children = [
            #(Text "\n  "),
            #(Element:0x3fe1e54ad68c {
              name = "meta",
              attributes = [ #(Attr:0x3fe1e5cf9b60 { name = "http-equiv", value = "X-UA-Compatible" }), #(Attr:0x3fe1e5cf9b4c { name = "content", value = "IE=edge" })]
              }),
            #(Text "\n  "),


instead of something that looks like HTML.
I was wondering what I did wrong.
Please let me know what I should change! Thank you so much!!

By the way, I'm running on a Mac, and I have:

gem -v 2.6.14

xcode-select version 2347.


Walter Lee Davis

Oct 29, 2017, 10:57:54 PM
to nokogi...@googlegroups.com
This looks correct so far. You got a Nokogiri Document from the remote page. Now, what do you want to do with that document? Usually, the use-case for Nokogiri is to search the page for particular elements (by either CSS or XPath), and then do things with those elements.

What are you trying to do?

Walter
