Tool is not working properly


Abhishek Singh

Nov 28, 2017, 6:56:40 AM
to Web Scraper
{"startUrl":"https://www.yellowpages.com.au/search/listings?clue=plumbers&locationClue=Waterways%2C+VIC+3195&mappable=true&selectedViewMode=list","selectors":[{"parentSelectors":["business_link","pagination"],"type":"SelectorLink","multiple":true,"id":"business_link","selector":"div.cell:nth-of-type(n+3) div.body a.listing-name","delay":""},{"parentSelectors":["business_link"],"type":"SelectorText","multiple":false,"id":"business_name","selector":"h1.listing-name","regex":"","delay":""},{"parentSelectors":["business_link"],"type":"SelectorText","multiple":false,"id":"address","selector":"p.listing-address","regex":"","delay":""},{"parentSelectors":["business_link"],"type":"SelectorText","multiple":false,"id":"phone_no","selector":"div.main a.click-to-call span.text div","regex":"","delay":""},{"parentSelectors":["business_link"],"type":"SelectorText","multiple":false,"id":"email_id","selector":"a.contact.contact-email span.glyph","regex":"","delay":""},{"parentSelectors":["business_link"],"type":"SelectorLink","multiple":false,"id":"website","selector":"a.contact.contact-url","delay":""},{"parentSelectors":["_root","pagination"],"type":"SelectorLink","multiple":true,"id":"pagination","selector":"a.pagination.navigation","delay":""}],"_id":"plumbers_waterways"}

Please help, I need this to be fixed.
The issue I am facing is that it extracts data that is not even present in the list. Please tell me if I have made a mistake.

Darren Reid

Nov 28, 2017, 5:54:37 PM
to Web Scraper
Hi Abhishek, as it turns out I have a lot of experience scraping Yellow Pages Australia; it's the sole reason I got into web scraping in the first place.
Firstly, the sitemap metadata needs to include one start URL for each of the search result pages (it's a pain, but I haven't found a way around that).
Then the first selector is the 'link' selector that points to the company details pages.
Then use an 'element' selector to highlight the smallest element that contains all of the data you want to scrape on the company listing.
The 'text' selectors are then made children of the 'element' selector. Note: I have not found a way to scrape the email address or website data.
As for the pagination selector, use the multiple-select option and select each subsequent page. The pagination selector should be a child of both '_root' and the 'link' selector (hold Ctrl and click both).

Your selector graph should look something like this....  
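
A selector graph can also be printed as text directly from a sitemap's JSON, which makes it easier to check which selector is a child of which. The following is only a minimal sketch in Python: the function names `children_by_parent` and `render_tree` are made up for illustration, and the small `sitemap` dict is a trimmed stand-in with the same parent/child layout as described above, not a full export.

```python
from collections import defaultdict

def children_by_parent(sitemap):
    """Map each parent selector id to the ids of its child selectors."""
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel["id"])
    return dict(children)

def render_tree(children, node="_root", depth=0, path=()):
    """Return indented lines for the tree under `node`.

    A selector that is its own ancestor is marked and not expanded again,
    so recursive sitemaps do not cause infinite recursion.
    """
    line = "  " * depth + node
    if node in path:
        return [line + " (recursive)"]
    lines = [line]
    for child in children.get(node, []):
        lines.extend(render_tree(children, child, depth + 1, path + (node,)))
    return lines

# Trimmed, hypothetical sitemap: only the fields this sketch reads.
sitemap = {
    "selectors": [
        {"id": "business_link", "parentSelectors": ["_root"]},
        {"id": "element", "parentSelectors": ["business_link"]},
        {"id": "business_name", "parentSelectors": ["element"]},
        {"id": "pagination", "parentSelectors": ["_root", "business_link"]},
    ]
}

tree = render_tree(children_by_parent(sitemap))
print("\n".join(tree))
```

To try it on a real sitemap, load the exported JSON with `json.loads` and pass the result in place of `sitemap`. A selector that never appears under `_root` in the printed tree is not reachable from the start URL.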

So here's a sitemap that does what you want, except scrape the email and website.

{"selectors":[{"parentSelectors":["_root"],"type":"SelectorLink","multiple":true,"id":"business_link","selector":"div.body a.listing-name","delay":""},{"parentSelectors":["element"],"type":"SelectorText","multiple":false,"id":"business_name","selector":"h1.listing-name","regex":"","delay":""},{"parentSelectors":["element"],"type":"SelectorText","multiple":false,"id":"address","selector":"p.listing-address","regex":"","delay":""},{"parentSelectors":["element"],"type":"SelectorText","multiple":false,"id":"phone_no","selector":"div.main a.click-to-call span.text div","regex":"","delay":""},{"parentSelectors":["element"],"type":"SelectorText","multiple":false,"id":"email_id","selector":"a.contact.contact-email span.glyph","regex":"","delay":""},{"parentSelectors":["element"],"type":"SelectorLink","multiple":false,"id":"website","selector":"a.contact.contact-url","delay":""},{"parentSelectors":["_root","business_link"],"type":"SelectorLink","multiple":true,"id":"pagination","selector":"a.pagination","delay":""},{"parentSelectors":["business_link"],"type":"SelectorElement","multiple":false,"id":"element","selector":"div.business-details","delay":""}],"startUrl":["https://www.yellowpages.com.au/search/listings?clue=plumbers&locationClue=Waterways%2C+VIC+3195&mappable=true&selectedViewMode=list","https://www.yellowpages.com.au/search/listings?clue=plumbers&eventType=pagination&locationClue=Waterways%2C+VIC+3195&mappable=true&pageNumber=2&referredBy=UNKNOWN","https://www.yellowpages.com.au/search/listings?clue=plumbers&eventType=pagination&locationClue=Waterways%2C+VIC+3195&mappable=true&pageNumber=3&referredBy=UNKNOWN","https://www.yellowpages.com.au/search/listings?clue=plumbers&eventType=pagination&locationClue=Waterways%2C+VIC+3195&mappable=true&pageNumber=4&referredBy=UNKNOWN"],"_id":"aaa"}

Hope this clarifies things :)

Darren.