Hello,
I made my first post on ScraperWiki a couple of days ago and got great help from some people. I was trying to scrape a Serbian .aspx webpage that uses __doPostBack. It lists contracts that the government there has entered into. They won't release the raw data.
Pall Himarsson gave me some Python code that worked fine ... for the first 15 pages. Then I need to jump from the first set of pages to the second set. A web user would get there by clicking on the "..." link, which takes the user to page 16.
BUT the scraper won't get me past page 16. The site generates an error saying the page the scraper has requested does not exist. I can't figure out how to make the scraper jump to the next series of pages (16-30).
Any thoughts? Thanks in advance.
Here's Pall's code:
import lxml.html
import requests
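# assumption: the start page is the same URL that is sent as the referer below
starturl = 'http://portal.ujn.gov.rs/Izvestaji.aspx'
s = requests.session() # create a session object
r1 = s.get(starturl) # get page 1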
html = r1.text
#process page one
root = lxml.html.fromstring(html)
#pick up the javascript values
# find the __EVENTVALIDATION value
EVENTVALIDATION = root.xpath('//input[@name="__EVENTVALIDATION"]')[0].attrib['value']
# find the __VIEWSTATE value
VIEWSTATE = root.xpath('//input[@name="__VIEWSTATE"]')[0].attrib['value']
# build a dictionary to post to the site with the values we have collected.
# The __EVENTARGUMENT can be changed to fetch another result page (3,4,5 etc.)
payload = {
    '__EVENTTARGET': 'ctl00$ContentPlaceHolder3$grwIzvestaji',
    '__EVENTARGUMENT': 'Page$2',
    'referer': 'http://portal.ujn.gov.rs/Izvestaji.aspx',
    '__EVENTVALIDATION': EVENTVALIDATION,
    '__VIEWSTATE': VIEWSTATE,
    'ctl00$txtUser': '',
    'ctl00$txtPass': '',
    'ctl00$ContentPlaceHolder1$txtSearchIzvestaj': '',
    '__VIEWSTATEENCRYPTED': '',
}
# post it
r2 = s.post(starturl, data=payload)
# our response is now page 2
print(r2.text)
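
For reference, here is a minimal sketch of how the single post above might be turned into a loop over several result pages. It rests on an unverified assumption: that the site wants the hidden form fields (__VIEWSTATE, __EVENTVALIDATION and so on) taken from the most recent response rather than from page 1. I don't know whether that is what causes the error at page 16. The names HiddenFields, next_payload and fetch_pages are made up for this example, it uses Python 3 and the standard-library HTML parser in place of lxml, and it sends only the hidden fields, so the empty text fields from Pall's payload would have to be added back if the site needs them.

```python
from html.parser import HTMLParser

# Grid control name taken from the payload in the code above.
TARGET = 'ctl00$ContentPlaceHolder3$grwIzvestaji'


class HiddenFields(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if tag == 'input' and is_hidden and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def next_payload(html, page):
    """Build the form data asking for result page `page`, based on `html`."""
    parser = HiddenFields()
    parser.feed(html)
    payload = dict(parser.fields)  # hidden fields from this response
    payload['__EVENTTARGET'] = TARGET
    payload['__EVENTARGUMENT'] = 'Page$%d' % page
    return payload


def fetch_pages(session, url, last_page):
    """Yield (page_number, html), re-reading the hidden fields each time.

    `session` is expected to behave like a requests session (get/post).
    """
    html = session.get(url).text
    yield 1, html
    for page in range(2, last_page + 1):
        html = session.post(url, data=next_payload(html, page)).text
        yield page, html
```

It would be called as something like `for number, html in fetch_pages(requests.session(), starturl, 30): ...`, with each page's html processed inside the loop.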