Website with pagination...

Time Cop

Dec 14, 2014, 9:25:45 PM
to web-s...@googlegroups.com
Please tell me how I can start extracting data from a website that has many pages.

ex: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

tyvm

Scott

Dec 15, 2014, 12:03:16 AM
to web-s...@googlegroups.com
Please refer to the pagination tutorial. Alternatively, if the URL contains standard page numbers like 1, 2, 3, etc., you can put a number range into the start URL. For example, www.blahblahblah.com/page[1-100] would scrape pages 1 through 100 of blahblahblah.com.
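To make the range idea concrete, here is a minimal Python sketch of which URLs a `[1-100]` range stands for. It is not Web Scraper's own code; the helper name `expand_range_url` is hypothetical, and it assumes the range includes both ends, as "pages 1 - 100" suggests.

```python
import re

# Matches one "[start-end]" range such as "[1-100]" (assumed syntax, per the post above).
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range_url(url: str) -> list:
    """Hypothetical helper: list the URLs a "[start-end]" range stands for."""
    m = RANGE.search(url)
    if m is None:
        return [url]  # no range in the URL: it is used as-is
    start, end = int(m.group(1)), int(m.group(2))
    # range() excludes its stop value, so add 1 to include the last page
    return [url[:m.start()] + str(n) + url[m.end():] for n in range(start, end + 1)]


urls = expand_range_url("www.blahblahblah.com/page[1-100]")
print(len(urls))  # 100
print(urls[0])    # www.blahblahblah.com/page1
print(urls[-1])   # www.blahblahblah.com/page100
```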

Time Cop

Dec 15, 2014, 11:45:10 AM
to web-s...@googlegroups.com
Url looks like this: viewforum.php?f=3&topicdays=0&start=0

Note: only the last 0 (the `start` value) changes from page to page, NOT the first one (`topicdays`).
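Since only `start` changes, the page URLs can be produced by rewriting that one query parameter while keeping `f` and `topicdays` as they are. A minimal Python sketch; the `page_url` helper is hypothetical, and `PER_PAGE = 50` is a placeholder assumption, so read the real step from the `start` values in the forum's own page links.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(url: str, start: int) -> str:
    """Return `url` with its existing `start` query value replaced; other params are kept in order."""
    parts = urlsplit(url)
    query = [
        (key, str(start) if key == "start" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "viewforum.php?f=3&topicdays=0&start=0"
PER_PAGE = 50  # hypothetical: topics per page, check the forum's real "start" values

for page in range(3):
    print(page_url(base, page * PER_PAGE))
# viewforum.php?f=3&topicdays=0&start=0
# viewforum.php?f=3&topicdays=0&start=50
# viewforum.php?f=3&topicdays=0&start=100
```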

Note: the page numbers appear at both the top and the bottom of the page, and both change the page. I tried using both in the script. The page changes when I click the 'Data Preview' button, but after it moves one page, I get this error:

Uncaught ReferenceError: utilityFn is not defined    shared.js:5 (anonymous function)

How can I resolve this?

Thanks! 

Scott

Dec 16, 2014, 12:09:24 AM
to web-s...@googlegroups.com
Please post your sitemap.

Time Cop

Dec 16, 2014, 9:00:46 AM
to web-s...@googlegroups.com
I would like to apply pagination to my scraper sitemap, but I am having a lot of difficulty. I have done the following so far, but I believe I am not connecting the selectors together correctly. My goal: to scrape a website (noted below) for the titles (text) and links spread across multiple pages (pagination). So far, I have been able to use a Text selector to select all of the titles on the first page, but I cannot get it to continue to the other pages. Can you help me get the titles from the rest of the pages? Here's the sitemap that gets only the titles:

{"startUrl":"http://www.warez-bb.org/viewforum.php?f=3","selectors":[{"parentSelectors":["_root"],"type":"SelectorText","multiple":true,"id":"Application Title","selector":"div.list-rows:nth-of-type(n+14) a.topictitle","regex":"","delay":""}],"_id":"apps"}

Thanks!

Time Cop

Dec 16, 2014, 11:54:55 AM
to web-s...@googlegroups.com
As for the links: I will first get the titles working with your pagination instructions. If I still cannot figure out how to get the links with pagination after that, I will get back to you. I appreciate any further feedback!

Thanks!

Mārtiņš Balodis

Dec 23, 2014, 3:50:33 AM
to Time Cop, web-scraper
Hi,
Try watching the pagination tutorial. Basically, you just need to use a Link selector to select the pagination links. All selectors created as children of the Link selector will be executed on the page the Link selector leads to.
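Applied to the titles sitemap posted earlier, the pattern might look like the Python sketch below, which builds the sitemap and prints it as JSON for import. Assumptions: the pagination Link selector's CSS selector `span.pagination a` is hypothetical (inspect the forum's page-number links for the real one), and the `SelectorLink` type plus listing `pagination` as its own parent, so that page links found on each newly opened page are followed too, reflect the pagination tutorial as I understand it; check them against your extension version.

```python
import json

# Based on the "apps" sitemap posted earlier in the thread.
sitemap = {
    "_id": "apps",
    "startUrl": "http://www.warez-bb.org/viewforum.php?f=3",
    "selectors": [
        {
            # Pagination links. Listing this selector as its own parent lets the
            # scraper keep following page links found on every paginated page.
            "id": "pagination",
            "type": "SelectorLink",
            "parentSelectors": ["_root", "pagination"],
            "selector": "span.pagination a",  # hypothetical CSS selector
            "multiple": True,
            "delay": "",
        },
        {
            # Titles are extracted on the start page and on every page
            # reached through the "pagination" Link selector.
            "id": "Application Title",
            "type": "SelectorText",
            "parentSelectors": ["_root", "pagination"],
            "selector": "div.list-rows:nth-of-type(n+14) a.topictitle",
            "multiple": True,
            "regex": "",
            "delay": "",
        },
    ],
}

print(json.dumps(sitemap))
```

The title selector is unchanged from the original sitemap; only its parents change, so it runs on every page the Link selector opens.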

Mārtiņš Balodis

Mar 2, 2015, 3:30:21 AM
to Emrah Er, web-scraper
Hi,
The notation for that looks like this: [0-1000:20]. Here is the documentation:

On Mon, Mar 2, 2015 at 8:21 AM, Emrah Er <emra...@gmail.com> wrote:
I have pagination buttons like this:

1 2 3 4 5 6 7 8 9 10 20 30 40 50 NEXT 

Each page has 20 links, and the page URLs look like this (starting from page 1):


When I select the pagination links, the scraper jumps to page 50, then to 48, then to 49, and opens the links on page 48 to scrape their data. Is it possible to enter a URL with a range but also with a step, something like "by", as follows:

Thanks.
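The `[0-1000:20]` notation from the reply above adds a step after the colon. A minimal Python sketch of the values it is assumed to produce, with both 0 and 1000 included and every 20th value in between. The helper name `expand_range_url` and the example.com URL are hypothetical stand-ins, not Web Scraper code.

```python
import re

# "[start-end]" or "[start-end:step]", e.g. "[0-1000:20]" (assumed syntax).
RANGE = re.compile(r"\[(\d+)-(\d+)(?::(\d+))?\]")


def expand_range_url(url: str) -> list:
    """Hypothetical helper: expand one range in `url`, stepping by `step` (default 1)."""
    m = RANGE.search(url)
    if m is None:
        return [url]
    start, end = int(m.group(1)), int(m.group(2))
    step = int(m.group(3) or 1)  # no ":step" part means every number
    # end + 1 so the last value (1000 here) is included
    return [url[:m.start()] + str(n) + url[m.end():] for n in range(start, end + 1, step)]


urls = expand_range_url("http://example.com/list?start=[0-1000:20]")
print(len(urls))  # 51
print(urls[1])    # http://example.com/list?start=20
print(urls[-1])   # http://example.com/list?start=1000
```

With 20 links per page, a step of 20 visits each page's offset once and in order, instead of following the pagination buttons.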

Emrah Er

Mar 2, 2015, 3:34:07 AM
to web-s...@googlegroups.com, emra...@gmail.com
Thanks, I found the solution in the documentation after I posted the question. I should have read the f. manual better :)