i'm not using scrapy myself, but i'm not sure that scrapy can handle javascript.
can it, guys?
i would imagine that you'd have to use one of the headless browser
apps, like htmlunit.
basically, you want to emulate what the browser does, without the
browser overhead.
Pablo.
it's more than just firing up firebug/firefox, and inspecting the javascript.
for simple cases, that approach will work. but if you're doing serious
crawling where the javascript is dynamic and returns data from a
backend/database, you can't anticipate what it will produce, so you
can't "hardcode" into your app what you saw in firebug when you
examined the page. in these cases you really need a headless browser,
which more or less replicates/emulates some of the browser's
functionality. this gives you the ability to "run" the javascript
the same way a browser would.
you'd basically analyze the site/page you're looking to parse, set it
up in the headless (htmlunit) app, and then call the htmlunit app from
the python crawling app, returning the data back to it.
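a rough sketch of that last step, assuming the headless app is a separate command-line program that takes a url as its last argument and prints the rendered html to stdout. render_page and the renderer command are made-up names for illustration, not htmlunit's actual interface, and the demo uses a tiny python one-liner as a stand-in for the real renderer:

```python
import subprocess
import sys


def render_page(url, renderer_cmd, timeout=60):
    """Run an external headless-browser command and return what it prints.

    renderer_cmd is a hypothetical command (list of strings) that accepts
    the url as its final argument and writes the rendered html to stdout.
    """
    result = subprocess.run(
        list(renderer_cmd) + [url],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output to str
        timeout=timeout,      # don't hang forever on a stuck page
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # stand-in for the real headless app: just echoes the url inside <html> tags
    fake_renderer = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write('<html>' + sys.argv[1] + '</html>')",
    ]
    html = render_page("http://example.com/", fake_renderer)
    print(html)
```

the crawler would then parse the returned html as if it had downloaded it directly.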
hope this makes sense..
Very nice tips about the __VIEWSTATE parameter.
I've taken the opportunity to add it to the FAQ, with a link to your spider:
http://doc.scrapy.org/0.10/faq.html#what-s-this-huge-cryptic-viewstate-parameter-used-in-some-forms
Pablo.
Thank you very much for your post. I looked through your code at (http://github.com/AmbientLighter/rpn-fas), but am having trouble understanding where you get around the postback problem in file "rnp.py" (I also don't speak Russian but am trying to understand http://rnp.fas.gov.ru/Default.aspx). I was wondering if you might do me the favor of explaining to me how exactly you solved it.
Again, thank you.
Shuro
Thanks for your reply. I understand (I think) what you mean, but I have a slightly more complex problem. The webpage I am interested in has 2 "submit" fields:
1. javascript:return WebForm_OnSubmit() requires a numeric value (year)
and
2. javascript:__doPostBack('DG_Name$ctl03$ctl00',''). When you look at this in Firebug, it appears to do a POST operation with some arguments (in this case 'DG_Name$ctl03$ctl00' and '').
I need to do the 2nd option while not activating the first one. Is it your experience that both 1) and 2) are POST operations, just with different arguments?
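For reference, here is a minimal sketch of what such a POST usually looks like, assuming the page follows the standard ASP.NET WebForms pattern in which __doPostBack(target, argument) writes its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and then submits the form. The sample HTML, the field names Year and BtnSearch, and the helper build_postback are hypothetical; the real page may need extra fields.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, event_argument=""):
    """Build the form data that __doPostBack(event_target, event_argument)
    would submit, assuming the standard WebForms hidden fields."""
    parser = HiddenFieldParser()
    parser.feed(html)
    data = dict(parser.fields)  # carries __VIEWSTATE and friends along
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    # The submit button is deliberately left out, so the server should not
    # treat the request as a click on it.
    return data


# Hypothetical page fragment, for illustration only.
SAMPLE = """<form method="post" action="Default.aspx">
<input type="hidden" name="__VIEWSTATE" value="abc123" />
<input type="hidden" name="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" value="" />
<input type="text" name="Year" value="" />
<input type="submit" name="BtnSearch" value="Search" />
</form>"""

if __name__ == "__main__":
    data = build_postback(SAMPLE, "DG_Name$ctl03$ctl00")
    print(urlencode(data))  # body of the POST request
```

Under that assumption, both actions end up as a POST of the same form, and the server tells them apart by whether the button's name or __EVENTTARGET is filled in. In Scrapy, a dict like this could be passed as formdata to a FormRequest.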
Thanks very much for your help.