javascript processing

bruce

Aug 17, 2013, 9:30:15 PM
to scrapy-users
Hi.

I'm trying to understand: can Scrapy process/render dynamic JavaScript pages so that you can get the resulting text/content of the page?

It appears that it can, but I don't have a working example that demonstrates, step by step, how this works.

thanks


Jan Wrobel

Aug 19, 2013, 5:42:39 AM
to scrapy...@googlegroups.com
Hey,

To process dynamically generated pages, you can pass a document from
Scrapy to an external JavaScript rendering engine. Some examples of
such approach are here:

http://snipplr.com/view/66998/rendered-javascript-crawler-with-scrapy-and-selenium-rc/
http://snipplr.com/view/66997/rendered-javascript-with-webdrivers/
http://snipplr.com/view/66996/renderedinteractive-javascript-with-gtkwebkitjswebkit/

I haven't done this myself, but I'm sure someone on this list will be
able to help if you run into problems.

Cheers,
Jan

bruce

Aug 19, 2013, 8:05:06 AM
to scrapy-users, j...@scrapinghub.com
Hey Jan.

Thanks, but I'm really looking to talk to someone who has actually done this.

I've created test apps that use CasperJS/PhantomJS to process the JavaScript, since PhantomJS is built on WebKit.

So I'm looking for someone I can talk to who has actually done this.

thanks


P.S. Jan, are you in the US? Do you talk with Pablo?

Paul Tremberth

Aug 19, 2013, 9:06:33 AM
to scrapy...@googlegroups.com, j...@scrapinghub.com
You can also have a look at scrapyjs (https://github.com/scrapinghub/scrapyjs) by Pablo Hoffman.

bruce

Aug 19, 2013, 9:36:00 AM
to scrapy-users
OK...

How can I state this plainly?

I'm looking to talk to anyone who has actually used Scrapy to process JavaScript websites.

There. I've read a bunch; I'm looking to talk to someone.

thanks

bugbag

Aug 19, 2013, 1:52:10 PM
to scrapy...@googlegroups.com
Well, I've tried it before, so I'll share. I tried collegeprowler.com, which uses JavaScript to retrieve its student reviews. As far as I know, the script sends an XMLHttpRequest, which you can see in Firebug under the Net tab (XHR). You can copy the request's location, which is a long, complicated URL that returns the actual content, and open it in the Scrapy shell to check whether you can scrape it.

If you're lucky, the response is JSON, so everything is already nicely structured, and you can parse it with Python's json library or however you like. If you're not lucky, as with Best Buy's customer reviews, it will be a messy embedded HTML page. Either way, the content you want to scrape will be in there.
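A minimal sketch of handling both kinds of XHR response, using only the Python standard library. It assumes a hypothetical JSON payload shaped like `{"reviews": [{"author": ..., "text": ...}]}` and, for the HTML case, that each review sits in an element with a hypothetical `review` class; the real key names and markup have to be read off the actual response in the shell. In a Scrapy callback you would pass the decoded body (e.g. `response.text` in recent Scrapy versions). Scrapy's own selectors would also work for the HTML case; the stdlib `HTMLParser` is used here only to keep the example self-contained.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def reviews_from_json(body):
    """Return review texts from a JSON XHR response.

    Assumes a hypothetical payload shaped like
    {"reviews": [{"author": "...", "text": "..."}, ...]}.
    """
    data = json.loads(body)
    return [item["text"] for item in data.get("reviews", [])]


class ReviewTextParser(HTMLParser):
    """Collect the whitespace-normalized text of every element with a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.chunks = []    # text pieces of the current match
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace/newlines into single spaces.
            self.results.append(" ".join("".join(self.chunks).split()))

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def reviews_from_html(body, css_class="review"):
    """Return review texts from an embedded-HTML XHR response."""
    parser = ReviewTextParser(css_class)
    parser.feed(body)
    parser.close()
    return parser.results
```

Which function applies depends on what the Scrapy shell shows for the copied URL.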

It really depends on the site. I also tried goodreads.com, which uses AJAX, but it turns out you can retrieve the reviews there by playing with the URLs.
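One way to "play with the URLs" is to take the request URL you found and vary a query parameter such as a page number. The sketch below assumes the reviews are paginated through a query parameter; the example.com URL and the `page` parameter name are hypothetical, not taken from goodreads.com, and the real ones have to be discovered per site by watching its requests.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def with_query_params(url, **params):
    """Return url with the given query parameters added or replaced."""
    parts = urlparse(url)
    query = parse_qs(parts.query)  # e.g. {"id": ["123"]}
    for key, value in params.items():
        query[key] = [str(value)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def review_page_urls(base_url, last_page, page_param="page"):
    """Build one URL per review page, 1..last_page inclusive.

    page_param is a hypothetical parameter name; the real one depends
    on the site.
    """
    return [with_query_params(base_url, **{page_param: n})
            for n in range(1, last_page + 1)]
```

Each generated URL could then be used as a start URL or yielded as a `scrapy.Request` from a spider.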

hope it helps.