Newbie: How to overcome Javascript "onclick" button to scrape web page?

Terence Ng

May 7, 2013, 2:28:13 AM

to scrapy...@googlegroups.com

This is the link I want to scrape:

http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U

To see the English version of the page, click the "English Version" tab in the upper right-hand corner.

Before I can read the fund information on the page, I have to press this button:

<div onclick="AgreeClick()" style="width:200px; padding:8px; border:1px black solid; background-color:#cccccc; cursor:pointer;">Confirmed</div>

And the function of AgreeClick is:

function AgreeClick() {
	var cookieKey = "ListFundShowDisclaimer";
	SetCookie(cookieKey, "true", null);
	Get("disclaimerDiv").style.display = "none";
	Get("blankDiv").style.display = "none";
	Get("screenDiv").style.display = "none";
	//Get("contentTable").style.display = "block";
	ShowDropDown();
}

How do I get past this onclick="AgreeClick()" handler so that I can scrape the page?

Anderson Caco

May 7, 2013, 10:39:25 AM

to scrapy...@googlegroups.com

Open the Google Chrome developer tools (Ctrl+Shift+J), switch to the Network tab, and inspect the requests the page makes.

http://i.imgur.com/EQ2ybwi.png
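Going by the AgreeClick source quoted in the question, the click only sets a cookie named "ListFundShowDisclaimer" and hides three divs. If the server did check that cookie (an assumption, nothing in this thread confirms it), a request carrying the same cookie could be sketched as below. The helper name build_request is made up for illustration, and the exact cookie value written by the site's SetCookie helper is assumed to be the plain string "true". The request is only built here, not sent.

```python
from urllib.request import Request

# URL taken from the original question.
URL = ("http://www.prudential.com.hk/PruServlet"
       "?module=fund&purpose=searchHistFund&fundCd=MMFU_U")


def build_request(url, cookies):
    """Build (but do not send) a GET request carrying the given cookies.

    Hypothetical helper for illustration only.
    """
    # A Cookie header is a "; "-separated list of name=value pairs.
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": cookie_header})


# Assumption: the disclaimer cookie is stored as the plain string "true".
request = build_request(URL, {"ListFundShowDisclaimer": "true"})
print(request.get_header("Cookie"))
```

In a Scrapy spider the equivalent would be to pass cookies={"ListFundShowDisclaimer": "true"} to scrapy.Request, since Request accepts a cookies argument.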

--

Anderson Ferraz
Intern, bassist, and sniper on the CT team.
T (71) 3494-3514

C (75) 9924-4330
ca...@jusbrasil.com.br
www.jusbrasil.com.br

Travis Briggs

May 7, 2013, 11:52:21 AM

to scrapy...@googlegroups.com

Actually, I think it's easier than that.

If you look at the page source, the element you want (id="contentTable") is already present when the page loads. The site simply uses JavaScript to lay a dismissible div over the content.

Your spider will have the full source of the page, so you don't need to worry about the pop-up div.

Try this in the scrapy shell:

In [1]: fetch('http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U')

In [2]: hxs.select('//table//table//table//td[@class="fundPriceCell1"]//text()').extract()[:5]

Out[2]:
[u'06/05/2013',
 u'0.1102%\n ',
 u'02/05/2013',
 u'0.1102%\n ',
 u'29/04/2013']
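The same idea can be checked offline with a small sketch. SAMPLE_PAGE below is a made-up, heavily simplified stand-in for the real page (only the div ids, the contentTable id, and the fundPriceCell1 class come from this thread), and extract_cells is a hypothetical helper. It uses the standard library's ElementTree, which only works because the sample is well-formed XML; for the real page, Scrapy's own selectors are the right tool.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified markup: an overlay div sits in front of the data table,
# but both are present in the source the spider downloads.
SAMPLE_PAGE = """
<html><body>
  <div id="disclaimerDiv">Please confirm before viewing.</div>
  <div id="screenDiv"></div>
  <table id="contentTable">
    <tr>
      <td class="fundPriceCell1">06/05/2013</td>
      <td class="fundPriceCell1">0.1102%
      </td>
    </tr>
    <tr>
      <td class="fundPriceCell1">02/05/2013</td>
      <td class="fundPriceCell1">0.1102%
      </td>
    </tr>
  </table>
</body></html>
"""


def extract_cells(markup, css_class):
    """Return the stripped text of every <td> with the given class attribute."""
    root = ET.fromstring(markup.strip())
    cells = root.findall(f".//td[@class='{css_class}']")
    return ["".join(cell.itertext()).strip() for cell in cells]


cells = extract_cells(SAMPLE_PAGE, "fundPriceCell1")
print(cells)
```

No JavaScript runs at any point, so the overlay never gets in the way of the parser. Note also that hxs.select(...).extract() is the older Scrapy selector API; in current Scrapy versions the same query is written as response.xpath(...).getall().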

Is that the information you want?

-Travis
