Getting past frustrating landing page.


mitch

Aug 28, 2016, 11:44:51 AM
to scrapy-users
I'm trying to crawl a site that hits you with a "disclaimer.php" page before you can get to any content.  You have to click an "I agree" button that requires a POST and I don't know how to build that into my spider.  Any help would be appreciated!
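In Scrapy, the usual tool for this is `scrapy.FormRequest.from_response`. It reads a form from the downloaded page and fills in that form's fields, which you can override with `formdata`. Scrapy handles cookies by default, so any session cookie the site sets is kept for later requests. The sketch below uses only the Python standard library to show what that step amounts to: find the form on the disclaimer page and build the POST body a browser would send when "I agree" is clicked. The sample markup, the field names `token`/`agree`, and the helpers `FormFinder`/`agree_form` are hypothetical, because the real disclaimer.php markup was not posted.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFinder(HTMLParser):
    """Records the action, method and submittable inputs of the first <form>."""

    # Hypothetical simplification: assumes the form has a single submit button
    # ("I agree"). Hidden/text fields and that button are sent; checkboxes and
    # radios are skipped because browsers only send them when checked.
    SENT_TYPES = {"hidden", "text", "submit"}

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "GET"
        self.fields = {}
        self._state = "before"  # "before" -> "inside" -> "after" the first form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self._state == "before":
            self._state = "inside"
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").upper()
        elif tag == "input" and self._state == "inside":
            name = attrs.get("name")
            kind = (attrs.get("type") or "text").lower()
            if name and kind in self.SENT_TYPES:
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._state == "inside":
            self._state = "after"


def agree_form(page_url, html):
    """Return (absolute action URL, HTTP method, urlencoded body) for the first form."""
    finder = FormFinder()
    finder.feed(html)
    finder.close()
    if finder.action is None:
        raise ValueError("no <form> found on the page")
    return urljoin(page_url, finder.action), finder.method, urlencode(finder.fields)


# Hypothetical disclaimer page, for illustration only.
page = """<form action="/disclaimer.php" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" name="agree" value="I agree">
</form>"""
url, method, body = agree_form("http://example.com/disclaimer.php", page)
# url    == "http://example.com/disclaimer.php"
# method == "POST"
# body   == "token=abc123&agree=I+agree"
```

Hidden inputs matter here: many disclaimer forms carry a token that must be echoed back, which is why the body is built from the page rather than hard-coded.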

bruce

Aug 28, 2016, 3:49:58 PM
to scrapy-users
Hey Mitch, what's the page?

I have no clue as to the workings of Scrapy.

But if you can craft the process with multiple curl calls and cookies, you can get it into Scrapy without much trouble.

The basic steps (without knowing the page):
Walk through the target in a browser, using some app to capture the network traffic
 -- e.g. Firefox with Live HTTP Headers, Firebug, etc.

Do the initial curl
 - get the results
 - craft XPath to extract the data
 - do a curl with the requisite POST data
  -- check the results, store all cookies
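The curl-and-cookies recipe above can be sketched with the Python standard library alone. One cookie jar is shared across all requests, then the script does a GET of the disclaimer, the agreement POST, and a GET of the real content. The URLs, the field dict, and the helper names (`make_session`, `agree_request`, `crawl_past_disclaimer`) are hypothetical; the actual field names should be read off the POST captured in the browser.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode


def make_session():
    """Opener that stores cookies from every response and sends them back,
    roughly what curl's -c (write jar) and -b (read jar) options do together."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def agree_request(agree_url, fields):
    """Build the POST the 'I agree' button sends (fields are hypothetical,
    copied from the captured browser traffic)."""
    body = urlencode(fields).encode("ascii")
    req = urllib.request.Request(agree_url, data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def crawl_past_disclaimer(opener, disclaimer_url, agree_url, fields, content_url):
    """Hits the network: disclaimer GET, agreement POST, then the real content,
    all through the same opener so the session cookie is reused."""
    with opener.open(disclaimer_url) as resp:  # server may set a session cookie here
        resp.read()
    with opener.open(agree_request(agree_url, fields)) as resp:  # record agreement
        resp.read()
    with opener.open(content_url) as resp:  # cookies from steps 1-2 are sent back
        return resp.read()
```

In a Scrapy spider the same three steps would typically be chained through request callbacks instead of sequential calls, with Scrapy's own cookie handling playing the role of the jar.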
