Scraper stops after 3 pages of a multipage Button pagination - urgent


AJE

Nov 1, 2015, 3:18:26 AM
to Web Scraper
Hi there,

I have the following issue:

general description:
I want to extract the data from a directory page. The data is structured as ul -> li -> elements. I was able to set up the scraper: with element preview I see the right elements, and with data preview I get the correct per-page data, both with respect to content and to the number of elements per page. The pagination is a button that triggers a JavaScript or AJAX load of the next page. In total there are 500 pages.

As it is a URL behind a login, it does not make sense to post it here in full, but here is a snippet of the sitemap I use:

{"startUrl":"https://my.THELINK/#/","selectors":[{"parentSelectors":["_root"],"type":"SelectorElementClick","multiple":true,"id":"speakers","selector":"li.item.ng-scope","delay":"3000","clickElementSelector":"button.btn:nth-of-type(2)","clickElementUniquenessType":"uniqueText","clickType":"clickOnce","discardInitialElements":false},{"parentSelectors":["speakers"],"type":"SelectorText","multiple":false,"id":"name","selector":"h4.title","delay":"","regex":""}, ----> Here comes the rest of the data per item that looks like the last element type.

the pagination problem:

I was also able to set up the pagination element, as you can see above. When I click on data preview, it nicely scrapes 2 pages. But when I start to scrape the whole website, the scraper stops after three pages. I tried various delays and checked the third page for unexpected changes in the HTML, but could not find any. I also could not find a solution by searching this group.

 

Therefore my questions:


a)      What could be the reason the scraper stops after three pages, and what do I have to change to get it to scrape the full range of pages?

b)      In case it will not work for about 500 pages, is there a solution to do it stepwise, e.g. 10-20 pages in one scrape, and then run the scraper again starting from page 20, 40, 60 etc.? All scraped data seems to be stored in the same CSV within one session.

 

I hope you can help. Thanks a lot in advance


best regards


Mārtiņš Balodis

Nov 2, 2015, 12:24:57 PM
to AJE, Web Scraper
Hi,
a) Maybe the page is being reloaded after the Element click selector is used. In that case the Element click selector won't work properly.
b) You could split the scraping job by using different start URLs. You could also check whether the scraping can be done with a range URL.
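
As a rough illustration of option b), here is a small Python sketch that splits 500 pages into batches of 20 and prints one range start URL per batch, each of which could be pasted into its own sitemap. It assumes the site exposes the page number somewhere in the URL, which may not be true for a button-driven page like the one described; the URL pattern and the helper name batch_start_urls are made up for the example. The [start-end] notation is the range URL form Web Scraper accepts as a start URL.

```python
def batch_start_urls(template, first_page, last_page, batch_size):
    """Return one range URL per batch of pages (hypothetical helper)."""
    urls = []
    for start in range(first_page, last_page + 1, batch_size):
        # Clamp the last batch so it never runs past the final page.
        end = min(start + batch_size - 1, last_page)
        urls.append(template.format(range=f"[{start}-{end}]"))
    return urls


# Assumed URL pattern, not the real site: replace with a URL that
# actually carries the page number, if the site offers one.
TEMPLATE = "https://example.com/directory?page={range}"

batches = batch_start_urls(TEMPLATE, 1, 500, 20)
for url in batches:
    print(url)
```

Each printed URL covers 20 pages, so a run that fails partway only has to be repeated for its own batch, and the exported CSV files can be concatenated afterwards.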
