Having trouble getting the right values out of nowplayingpodcast.com


Bradley Watson

Sep 2, 2016, 8:00:40 AM
to nokogiri-talk
Upfront disclaimer: I am as green as newbies come, so please take that into consideration.

I am trying to pull text and links from http://www.nowplayingpodcast.com/archives.htm. When I try to select the table elements with the class even or odd, I get nothing back. The selector I'm using is:
movie = page.css('.odd')

where page is loaded with:
page = Nokogiri::HTML(open("http://www.nowplayingpodcast.com/archives.htm"))

Here's where I think I'm going wrong: when a visitor first goes to that page, the podcasts are broken down into categories, and a pull-down menu lets you choose to see the whole list. So I'm guessing JavaScript is controlling what is shown. Does that mean the links are not on the page until that JS option is chosen, or are they there but hidden? Should I avoid scraping a site whose menu is driven by JavaScript?

Walter Lee Davis

Sep 2, 2016, 8:05:14 AM
to nokogi...@googlegroups.com
Nokogiri cannot run JS on the page -- it's only concerned with what is in the HTML as served, not the DOM as manipulated by JS. Other tools can do this: anything that drives a headless WebKit, such as the JavaScript-capable drivers used with Capybara. (Mechanize does not execute JavaScript, so it will not help here.) Those tools can manipulate the DOM and may get you closer to your desired result.

Walter
