Help: how to use scraperwiki library to save data locally


Jonathan Cox

Jan 9, 2015, 9:33:04 PM
to scrap...@googlegroups.com
Hey guys,
I'm trying to run ScraperWiki locally on my computer (I'm running Ubuntu Linux) using the scraperwiki library, and I was wondering what the easiest way is to save the data returned by the scraper into a local SQLite database. I'm a newcomer to Python, scraperwiki, and databases in general, so the more detailed your explanation, the better!
Thanks!

Steven Maude

Jan 10, 2015, 7:47:05 AM
to scrap...@googlegroups.com
Hi Jonathan,

The easiest way to save data to SQLite with scraperwiki-python is to use
scraperwiki.sql.save()

If you look at the documentation here:
https://github.com/scraperwiki/scraperwiki-python - you can see that it
works on Python dictionaries.

So, you need to construct a dictionary for each data entry you wish to
create from your scraped data.

A single dictionary represents a row in the database, giving you data
for one particular thing. Each dictionary key represents a database
column name and the dictionary value corresponding to that key is the
actual data that will be stored in the column of that row.

Example:

import scraperwiki

example_data = {'product_url': 'http://some.product.url',
                'product_name': 'Reticulated Fizztron',
                'price': 199.99,
                'manufacturer': 'Fizztron Industries'}

scraperwiki.sql.save(['product_url'], example_data)

scraperwiki.sql.save() takes a list of unique keys: the column(s) whose
values uniquely identify each entry. In this example, my assumption is
that each product URL represents one *and only one* product. (We don't
have multiple products on one page, for instance.) Depending on your
data, you may need to specify a combination of unique keys.
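
For instance (just a sketch; the date_scraped column here is hypothetical,
not something scraperwiki adds for you), a price snapshot that should be
unique per product per day could be saved like this:

import scraperwiki

# One price snapshot per product per day.
snapshot = {'product_url': 'http://some.product.url',
            'date_scraped': '2015-01-10',
            'price': 199.99}

# Both columns together form the unique key, so re-running the scraper on
# the same day updates the existing row rather than inserting a duplicate.
scraperwiki.sql.save(['product_url', 'date_scraped'], snapshot)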

Here, example_data is just one dictionary, but you can also replace that
with a list of dictionaries to save them all in one
scraperwiki.sql.save() call.
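
Something along these lines (again just a sketch, reusing made-up product
data):

import scraperwiki

# Each dictionary in the list becomes one row in the table.
products = [{'product_url': 'http://some.product.url/1',
             'product_name': 'Reticulated Fizztron',
             'price': 199.99},
            {'product_url': 'http://some.product.url/2',
             'product_name': 'Fizztron Mini',
             'price': 99.99}]

# One call saves (or updates) all of the rows.
scraperwiki.sql.save(['product_url'], products)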

Another thing worth mentioning, as it's not immediately apparent: Python
dictionaries are ordered arbitrarily (albeit deterministically), which
means the database columns created by scraperwiki.sql.save() also end up
in an arbitrary order.

If you prefer to have a specific column order, you can use a Python
OrderedDict instead. This behaves a lot like a standard dictionary, but
the order in which you add keys is retained:

from collections import OrderedDict
import scraperwiki

example_data = OrderedDict()
example_data['product_url'] = 'http://some.product.url'
example_data['product_name'] = 'Reticulated Fizztron'
example_data['price'] = 199.99
example_data['manufacturer'] = 'Fizztron Industries'

scraperwiki.sql.save(['product_url'], example_data)

This doesn't matter too much, but can make the data easier to read if
you're looking at the tables by eye.

Hope that helps,

Steve

Jonathan Cox

Jan 11, 2015, 9:11:41 AM
to scrap...@googlegroups.com
Hi Steven,
Thanks for the prompt response. I see that using the
scraperwiki.sql.save() function created a scraperwiki.sqlite file in the
same directory. Now what is the best way to open and view that database?
I want to be able to run SQL queries on it and export it as csv.
Thanks again!
Jonathan

Páll Hilmarsson

Jan 11, 2015, 9:41:47 AM
to scrap...@googlegroups.com

Hi.

If you’re on OS X then Base is good:

http://menial.co.uk/base/

Never tried this on Windows:

http://sqlitebrowser.org/

Command line: https://www.sqlite.org/cli.html

I like the Python library dataset: http://dataset.readthedocs.org/en/latest/
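
As a rough sketch of how that looks with the file scraperwiki produces
(assuming the default "swdata" table name that scraperwiki.sql.save()
uses):

import dataset

# Open the SQLite file the scraper wrote.
db = dataset.connect('sqlite:///scraperwiki.sqlite')

# db.query() runs arbitrary SQL; each result row comes back as a dictionary.
for row in db.query('SELECT * FROM swdata WHERE price < 150'):
    print(row['product_name'], row['price'])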

All the best,

pallih


pal...@gogn.in | github.com/pallih | twitter.com/pallih

'Dragon' Dave McKee

Jan 12, 2015, 6:24:07 AM
to scrap...@googlegroups.com
If you're looking to run queries over it, you should just be able to install sqlite3 (sudo apt-get install sqlite3) and run it via `sqlite3 scraperwiki.sqlite`.

You can also export tables to CSV from the command line:
echo ".mode csv
select * from swdata;" | sqlite3 scraperwiki.sqlite > swdata.csv

(Note the newline: the commands inside the echo are exactly the same as you'd run interactively at the sqlite3 prompt.)

Ian uses http://sqliteman.yarpen.cz/ on Windows for viewing databases in general; it also has a Linux version, but I'm not sure how easy installation is.
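
If you'd rather stay in Python, scraperwiki.sql.select() can query the same file and the csv module can write the result out. A rough sketch (again assuming the default "swdata" table):

import csv
import scraperwiki

# select() returns a list of dictionaries, one per row.
rows = scraperwiki.sql.select("* FROM swdata")

if rows:
    with open('swdata.csv', 'w') as f:
        # Use the first row's column names as the CSV header.
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)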

