Hey all,
Had a pretty great discovery regarding Web Scraper, and thought I would share it with the group. Please note - I am not an affiliate of the following, nor can I guarantee its success for you.
By default, Web Scraper saves all sitemaps and scraped data to the instance of Chrome you run it on, meaning the data is not easily accessible if you use Web Scraper on multiple computers. The alternative originally supported by the developer was CouchDB, a native JSON document store, but it needs to be installed by the user locally or on a dedicated server, and takes some knowledge and configuration to be able to access the data remotely (i.e., from different Chrome instances of Web Scraper). I wasn't successful in making this work, so I looked for other options.
What I tried was IBM's Cloudant, a similar tool to CouchDB. In fact, it's essentially a SaaS database based on CouchDB. Cloudant is, as its name suggests, in the cloud. I had assumed Web Scraper would talk to Cloudant the same way it does to CouchDB, if I could only configure the right paths in Web Scraper's settings, and I was correct. I am using Web Scraper on multiple computers, so each instance of Chrome is different, and now all my Web Scraper data is instantly accessible and updated wherever I use it!
You'll need to sign up for a Cloudant account. It's free under a certain amount of data usage a month, so make sure it will work for your purposes. Your steps are listed below:
1. Create a Cloudant account here:
https://cloudant.com/
2. Create one database with the name: "scraper-sitemaps" (every sitemap will be saved as a document within this database).
3. In Web Scraper's settings, you will configure the two storage fields like so:
Sitemap db:
https://USERNAME:PASS...@USERNAME.cloudant.com/scraper-sitemaps
Data db:
https://USERNAME:PASS...@USERNAME.cloudant.com/
NOTE: USERNAME is your Cloudant username, and PASSWORD is your Cloudant password.
NOTE 2: If you are running Web Scraper on one Google Chrome account, but multiple computers, you only have to update the settings once. If you run Web Scraper on multiple Chrome accounts, and want all data synced to Cloudant, you would need to update the settings for each separate install.
Essentially, the paths above contain your unique user data that allows Web Scraper to authenticate itself through Cloudant's API.
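If you'd rather not hand-type those two paths, here is a minimal Python sketch that builds them from a username and password. The function name and the dictionary keys are my own invention for illustration, not anything from Web Scraper. It also percent-encodes the credentials, which is standard practice for special characters in a URL; I have not verified how Web Scraper itself handles an encoded password, so treat that part as an assumption.

```python
from urllib.parse import quote


def cloudant_storage_urls(username, password, sitemap_db="scraper-sitemaps"):
    """Build the two storage paths in the pattern shown in step 3.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    # Percent-encode credentials so characters such as "@" or "/" cannot
    # break the URL (assumption: the consumer decodes them as browsers do).
    user = quote(username, safe="")
    secret = quote(password, safe="")
    base = f"https://{user}:{secret}@{username}.cloudant.com/"
    return {
        "sitemap_db": base + sitemap_db,  # goes in the sitemap db field
        "data_db": base,                  # goes in the data db field
    }


# Example with placeholder credentials:
urls = cloudant_storage_urls("USERNAME", "PASSWORD")
print(urls["sitemap_db"])
print(urls["data_db"])
```

Paste the two printed values into the matching fields in Web Scraper's settings.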
That's it. You can look at your data in Cloudant, but you can delete and add all sitemaps through Web Scraper's interface. I have not done much testing with editing my data right in Cloudant, so I can't guarantee you won't have problems if you try.
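For anyone who wants to peek at the stored sitemaps from a script instead of the Cloudant dashboard, here is a rough read-only Python sketch. It leans on the standard CouchDB `_all_docs` endpoint, which Cloudant also exposes, and on HTTP Basic auth with the same username and password as above. Both helper names are made up for this example, and I'm assuming your account accepts Basic auth with those credentials, so adjust if yours is set up differently.

```python
import base64
import json
import urllib.request


def all_docs_request(username, password, db="scraper-sitemaps"):
    """Prepare (but do not send) a GET request that lists every document in db."""
    url = f"https://{username}.cloudant.com/{db}/_all_docs"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def document_ids(all_docs_body):
    """Pull document ids out of an _all_docs JSON response body."""
    rows = json.loads(all_docs_body).get("rows", [])
    # Skip design documents, which CouchDB keeps alongside regular ones.
    return [row["id"] for row in rows if not row["id"].startswith("_design/")]


# To actually fetch the list (needs network access and real credentials):
#   req = all_docs_request("USERNAME", "PASSWORD")
#   with urllib.request.urlopen(req) as resp:
#       print(document_ids(resp.read().decode("utf-8")))
```

This only reads data, so it should not interfere with what Web Scraper writes.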
Updating to Cloudant is very quick. I can create a sitemap on one computer, move immediately to the other, and the sitemap is reflected on the other Chrome instance. It's great! Now, I did set this up a while ago - I don't think anything has changed, but if it doesn't work for you, please let me know.
Hope this helps!
- Calvin