Newbie: Trouble combining element scroll with link scraping


Conrad Peart

Sep 11, 2014, 6:33:04 PM
to web-s...@googlegroups.com
First off ... amazing plugin. Thx.

I'm a newbie and can't get the start URL page to scroll to the end and then send each of the linked pages off for scraping. Only the first 9 records are generated out of a possible 200. Any ideas? Thx.

Exported code below:

{"selectors":[{"parentSelectors":["listing"],"type":"SelectorText","multiple":false,"id":"price","selector":"span.stack.fnt-nrm-bld:nth-of-type(2)","regex":"","delay":""},{"parentSelectors":["_root"],"type":"SelectorLink","multiple":true,"id":"mls_url","selector":"td.left a","delay":"1000"},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_address","selector":"div.l-hdr-col2 p span","regex":"","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_price","selector":"span#lblPrice","regex":"","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_ref","selector":"span#lblReferenceNumber","regex":"","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_bedroom","selector":"tr:nth-of-type(1) td.one:nth-of-type(1)","regex":"\\d","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_bathfull","selector":"tr:nth-of-type(1) td.one:nth-of-type(2)","regex":"\\d","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_bathhalf","selector":"tr:nth-of-type(2) td.one:nth-of-type(1)","regex":"\\d","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_storeys","selector":"tr:contains('Storeys')","regex":"\\d+","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_age","selector":"tr:contains('Built in')","regex":"","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_desc","selector":"span#lblPublicRemarks","regex":"","delay":""}],"startUrl":"http://m.realtor.ca/PropertyResults.aspx?&Area=h3c&TransactionTypeId=2&PriceMin=500000&PriceMax=600000&BedRange=0-0&BathRange=0-0&StoreyRange=0-0&NumberofDays=0&OwnershipTypeGroupID=2&SearchFormType=CON&ApplicationId=47&CultureId=1&Longitude=-73.5400009155273&Latitude=45.500244140625#Page=1","_id":"mls_lite"}

Mārtiņš Balodis

Sep 13, 2014, 4:59:41 AM
to web-s...@googlegroups.com
Hi,
You need to use the Element scroll down selector. With this selector, select the elements that appear after scrolling. The selector scrolls down the page until no new elements appear. Once scrolling is done, all of the found elements are passed to the child selectors of the Element scroll down selector. In this case the child selector is a Link selector. I have also attached an example sitemap.

{"selectors":[{"parentSelectors":["listing"],"type":"SelectorText","multiple":false,"id":"price","selector":"span.stack.fnt-nrm-bld:nth-of-type(2)","regex":"","delay":""},{"parentSelectors":["items"],"type":"SelectorLink","multiple":false,"id":"mls_url","selector":"td.left a","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_address","selector":"div.l-hdr-col2 p span","regex":"","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_price","selector":"span#lblPrice","regex":"","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_ref","selector":"span#lblReferenceNumber","regex":"","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_bedroom","selector":"tr:nth-of-type(1) td.one:nth-of-type(1)","regex":"\\d","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_bathfull","selector":"tr:nth-of-type(1) td.one:nth-of-type(2)","regex":"\\d","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_bathhalf","selector":"tr:nth-of-type(2) td.one:nth-of-type(1)","regex":"\\d","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_storeys","selector":"tr:contains('Storeys')","regex":"\\d+","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_age","selector":"tr:contains('Built in')","regex":"","delay":""},{"parentSelectors":["mls_url"],"type":"SelectorText","multiple":false,"id":"mls_desc","selector":"span#lblPublicRemarks","regex":"","delay":""},{"parentSelectors":["_root"],"type":"SelectorElementScroll","multiple":true,"id":"items","selector":"tr","delay":"2000"}],"startUrl":"http://m.realtor.ca/PropertyResults.aspx?&Area=h3c&TransactionTypeId=2&PriceMin=500000&PriceMax=600000&BedRange=0-0&BathRange=0-0&StoreyRange=0-0&NumberofDays=0&OwnershipTypeGroupID=2&SearchFormType=CON&ApplicationId=47&CultureId=1&Longitude=-73.5400009155273&Latitude=45.500244140625#Page=1","_id":"mls_lite"}
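
For illustration only, the scrolling behaviour described in this reply (keep scrolling until no new elements appear, then hand everything found to the child selector) can be sketched in Python. This is not the extension's actual code: `scroll_until_stable`, `FakePage`, and the batch size of 9 are hypothetical stand-ins chosen to mirror the "9 out of 200" symptom, and the `delay` value from the sitemap is not modelled.

```python
from typing import Callable, List


def scroll_until_stable(
    scroll_once: Callable[[], None],
    find_elements: Callable[[], List[str]],
    max_rounds: int = 100,
) -> List[str]:
    """Scroll repeatedly until a round reveals no new elements."""
    found = find_elements()
    for _ in range(max_rounds):
        scroll_once()
        current = find_elements()
        if len(current) <= len(found):
            break  # nothing new appeared, so stop scrolling
        found = current
    return found


class FakePage:
    """Hypothetical stand-in for a lazy-loading results page."""

    def __init__(self, total: int, batch: int) -> None:
        self.total = total
        self.batch = batch
        self.loaded = min(batch, total)

    def scroll(self) -> None:
        # Each scroll reveals one more batch, up to the total.
        self.loaded = min(self.loaded + self.batch, self.total)

    def rows(self) -> List[str]:
        return [f"row-{i}" for i in range(self.loaded)]


# Assumed numbers: 200 results, 9 visible before any scrolling.
page = FakePage(total=200, batch=9)
rows = scroll_until_stable(page.scroll, page.rows)

# Only after scrolling finishes are the rows handed to the child
# (link) selector, one link per row.
links = [f"{row}/link" for row in rows]
print(len(links))
```

A Link selector placed directly under `_root` only sees what is loaded before any scrolling, which would match the original symptom; making it a child of the scroll selector means it runs against every row found once scrolling has stopped.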

Conrad P

Sep 15, 2014, 10:29:19 AM
to web-s...@googlegroups.com
Aha! I didn't think to add the Element scroll selector prior to the Link selector. That made all the difference. Thanks!

Dusan Pavic

Oct 8, 2017, 3:03:39 PM
to Web Scraper
Hi, I have a question about the Element scroll down selector. I'm not quite sure how to make it work: what should I select?
I took this website as an example: https://dribbble.com/designers
Here is my code, with the unknown selector marked "???" near the end:

{"startUrl":"https://dribbble.com/designers","selectors":[{"parentSelectors":["has_next"],"type":"SelectorLink","multiple":true,"id":"profile","selector":"a.url","delay":""},{"parentSelectors":["profile"],"type":"SelectorLink","multiple":false,"id":"website","selector":"a.elsewhere-website","delay":""},{"parentSelectors":["profile"],"type":"SelectorLink","multiple":false,"id":"twitter","selector":"a.elsewhere-twitter","delay":""},{"parentSelectors":["profile"],"type":"SelectorLink","multiple":false,"id":"instagram","selector":"a.elsewhere-instagram","delay":""},{"parentSelectors":["profile"],"type":"SelectorText","multiple":false,"id":"name","selector":"h2.vcard a.url","regex":"","delay":""},{"parentSelectors":["profile"],"type":"SelectorText","multiple":false,"id":"location","selector":"span.locality","regex":"","delay":""},{"parentSelectors":["profile"],"type":"SelectorText","multiple":false,"id":"bio","selector":"div.bio","regex":"","delay":""},{"parentSelectors":["profile"],"type":"SelectorText","multiple":false,"id":"followers","selector":"ul.full-tabs-links > li.followers span.count","regex":"","delay":""},{"parentSelectors":["_root"],"type":"SelectorElementScroll","multiple":true,"id":"has_next","selector":"???","delay":"2000"}],"_id":"dribbble-designers"}

_________________________

Dusan Pavic

Dec 1, 2017, 2:50:29 PM
to Web Scraper

So, I've been practicing, and I might have the right challenge for you! :)

A website with scroll-down-to-load-more shows a More button (which has to be clicked) on exactly the tenth load! :)

Here is the link: https://angel.co/freelancers/design

And here is my best try so far:

{"startUrl":"https://angel.co/freelancers/design","selectors":[{"parentSelectors":["_root"],"type":"SelectorLink","multiple":true,"id":"more-click","selector":"a.js-more-link","delay":"4000"},{"parentSelectors":["loader"],"type":"SelectorText","multiple":true,"id":"name","selector":"a.u-colorGray3","regex":"","delay":""},{"parentSelectors":["loader"],"type":"SelectorText","multiple":false,"id":"category","selector":"div.line","regex":"","delay":""},{"parentSelectors":["loader"],"type":"SelectorLink","multiple":true,"id":"linkedin","selector":"a.icon.fontello-linkedin","delay":""},{"parentSelectors":["loader"],"type":"SelectorLink","multiple":true,"id":"twitter","selector":"a.icon.fontello-twitter","delay":""},{"parentSelectors":["loader"],"type":"SelectorLink","multiple":true,"id":"rss","selector":"a.icon.fontello-rss","delay":""},{"parentSelectors":["loader"],"type":"SelectorLink","multiple":true,"id":"website","selector":"span.link:nth-of-type(4) a.link_el","delay":""},{"parentSelectors":["loader"],"type":"SelectorText","multiple":true,"id":"why-me","selector":"div.s-grid0-colMd24:nth-of-type(3) div.u-inlineBlock.content-container","regex":"","delay":""},{"parentSelectors":["loader"],"type":"SelectorText","multiple":true,"id":"location","selector":"div.dotted-border.location-border","regex":"","delay":""},{"parentSelectors":["_root","more-click"],"type":"SelectorElementScroll","multiple":true,"id":"loader","selector":"div.more_field","delay":""}],"_id":"angel-co-freelancers-design"}

Any suggestions?

_________________________________
