Exporting multiple Site Maps


Novice Scraper

Jun 17, 2016, 9:05:13 AM
to Web Scraper
I have created over 100 sitemaps that I would like to export.

I understand the workflow for exporting sitemaps one by one, but I am finding the task very long-winded.

Can anyone please advise whether (and where) I can access the sitemaps and copy or export them in one or a few operations?

I am assuming they are accessible somewhere on my Mac but have no idea where to look or what to look for.

Thank you in advance

[OS X El Capitan]



[Attachment: Exporting_Site_Map.png]

Mārtiņš Balodis

Jun 20, 2016, 11:14:46 AM
to Novice Scraper, Web Scraper
Hi,
Web Scraper stores its sitemaps through a database engine that sits on top of Chrome's built-in database (IndexedDB). Here is how you can access it:
1. Click Chrome menu > More tools > Extensions
2. Find Web Scraper and click "background page"
3. Navigate to the Resources tab
4. Expand IndexedDB
5. There should be a database called "_pouch_scraper-sitemaps"
6. You can see all sitemaps there


Novice Scraper

Jun 20, 2016, 12:06:07 PM
to Web Scraper
Hi Mārtiņš,
Thank you so much for taking the time to write and explain how I get into the database; it is very much appreciated.

I have managed to follow your instructions (very clear and easy to follow), however I cannot see the JSON text that I need to copy, for example the text below:


{"selectors":[{"parentSelectors":["item_element"],"type":"SelectorLink","multiple":true,"id":"item_link","selector":"1","delay":""},{"parentSelectors":["_root","page_link"],"type":"SelectorLink","multiple":false,"id":"page_link","selector":"1","delay":""},{"parentSelectors":["item_element"],"type":"SelectorText","multiple":false,"id":"town","selector":"1","regex":"{town}","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":false,"id":"town-1","selector":"1","regex":"\\w-(\\w+)-\\d+   {town}","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":false,"id":"status","selector":"{{ for sale }}","regex":"","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":false,"id":"agency_name","selector":"{{  NAME  }}","regex":"","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":false,"id":"email","selector":"{{ EMAIL }}","regex":"","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":false,"id":"telephone","selector":"{{ +34  
}}","regex":"","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":false,"id":"reference","selector":"1","regex":"","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":false,"id":"property_type","selector":"1","regex":"{type}","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":false,"id":"price","selector":"1","regex":"(\\d[\\d,.]*)","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":false,"id":"title","selector":"1","regex":"","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"bedrooms","selector":"1","regex":"Bedrooms?:\\s*(\\d+)","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"bathrooms","selector":"1","regex":"Bathrooms?:\\s*(\\d+)","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"build","selector":"1","regex":"Built size:\\s*(\\d+) (\\d[\\d,.]*)","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"plot","selector":"1","regex":"Plot size:\\s*(\\d+) (\\d[\\d,.]*)","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":false,"id":"description","selector":"1","regex":"","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"views","selector":"1","regex":"Views:\\s*(\\w*\\s*\\w*)  {views}","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"pool","selector":"1","regex":"Pool:\\s*(\\w*\\s*\\w*)  {pool}","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"terrace","selector":"1","regex":"Terrace size:\\s*(\\d*)  
{terrace}","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"orientation","selector":"1","regex":"Orientation:\\s*(\\w*\\s*\\w*)","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"parking","selector":"1","regex":"Parking:\\s*(\\w*|\\d*)  {parking}","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"heating","selector":"1","regex":"Heating:\\s*(\\d+)  {heating}","delay":""},{"parentSelectors":["item_link"],"type":"SelectorText","multiple":true,"id":"garden","selector":"1","regex":"Garden:\\s*(\\w+)    {garden}","delay":""},{"parentSelectors":["item_element"],"type":"SelectorImage","multiple":false,"id":"featured_image","selector":"1","downloadImage":false,"delay":""},{"parentSelectors":["_root","page_link"],"type":"SelectorElement","multiple":true,"id":"item_element","selector":"1","delay":""}],"startUrl":"URL","_id":"a____master"}

If you could help me further that would be very good of you.

Thank you in advance

Mārtiņš Balodis

Jun 20, 2016, 1:27:33 PM
to Novice Scraper, Web Scraper
Yeah, it might not be that easy to copy the JSON from there. Another way is to access the data via a storage class that is made for Web Scraper. Here is how you can do that:

1. Click Chrome menu > More tools > Extensions
2. Find Web Scraper and click "background page"
3. Navigate to the Console tab
4. Paste this code and press Enter. It should print the JSON of every sitemap, one per line, prefixed with its ID.
store.getAllSitemaps(function (sitemaps) {
  sitemaps.forEach(function (sitemap) {
    // Print the sitemap ID followed by its full JSON
    console.log(sitemap._id, JSON.stringify(sitemap));
  });
});
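
If one combined document is easier to copy than 100 separate console lines, the same callback could be used to bundle everything into a single JSON string. This is only a sketch: `bundleSitemaps` is a hypothetical helper name (not part of Web Scraper), and it assumes every sitemap has a unique `_id`, as in the example JSON earlier in the thread.

```javascript
// Hypothetical helper: merge an array of sitemap objects into one
// pretty-printed JSON document, keyed by each sitemap's _id.
function bundleSitemaps(sitemaps) {
  var bundle = {};
  sitemaps.forEach(function (sitemap) {
    bundle[sitemap._id] = sitemap;
  });
  return JSON.stringify(bundle, null, 2);
}

// Only runs where `store` exists, i.e. in the Web Scraper
// background page console described in the steps above.
if (typeof store !== "undefined") {
  store.getAllSitemaps(function (sitemaps) {
    console.log(bundleSitemaps(sitemaps));
  });
}
```

The printed text can then be copied from the console in one go and saved to a file. Each sitemap can later be pulled back out of the bundle by its ID and pasted into the normal import screen.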