Possible to SQL join two table from different scrapers?

193 views
Skip to first unread message

Massimo Magris

unread,
Aug 16, 2012, 2:34:39 AM8/16/12
to scrap...@googlegroups.com
Hello,

is it possible to have a query like
SELECT A.URL, A.Nome
FROM A LEFT JOIN B ON A.[Nome] = B.[Nome]
WHERE (((B.Nome) Is Null));

Or to do something like that?

thanks, best regards
Massimo

Zarino Zappia

unread,
Aug 16, 2012, 3:10:33 AM8/16/12
to scrap...@googlegroups.com
Hi Massimo,

You want to use scraperwiki.sqlite.attach()

https://scraperwiki.com/docs/python/python_help_documentation/

Z

Massimo Magris

unread,
Aug 16, 2012, 3:46:29 AM8/16/12
to scrap...@googlegroups.com
Hello, Thanks fot the quick replay

I can attach to table A and table B from different scrapers,
but how can I select join the two tables A and B?

As workaround I've done in this way (slow and not elegant) :
- attach and copy scraper data A in a new table A (on a scraper C)
- attach and copy scraper data B in a new table B (on a scraper C)

then in the scraper C i'm able to run this select
sql "SELECT A.URL, A.Nome"
sql " A.URL, A.Nome"
sql sql " FROM A LEFT JOIN B ON A.[Nome] = B.[Nome]"
sql sql " WHERE (((B.Nome) Is Null)); "       

missing scraperwiki.sqlite.select(sql)  

There is a way to do this, without "copy/duplicate" the data
but get directly?

Thanks


2012/8/16 Zarino Zappia <zar...@scraperwiki.com>



--
Massimo Magris


Soluzioni Zucchetti - Gestionale AdHocRevolution - InfoBusiness Business Intelligence
PLM/MCAD/CAE/CAM/IT integration

 
+39 0427-733038 - T-Ufficio
+39 346-8294902 - M-Cellulare
+39 0427-733038 - F-Fax ufficio
skype: massimo.magris
Via De Amicis 12
33085 - Maniago, (PN - Pordenone)
Italy
PIVA/VAT IT 01640530935
 

Il contenuto del presente messaggio e-mail, inclusi eventuali allegati, è Riservato esclusivamente alle persone indicate fra i destinatari.

Qualora aveste ricevuto questo documento per errore, ci scusiamo per l'accaduto e Vi invitiamo ad Eliminarlo in modo definitivo dal Vostro Sistema.

Vi ricordiamo che la diffusione, l'utilizzo e/o la conservazione dei dati ricevuti per errore costituiscono violazioni alle disposizioni del D.lgs. n.196/2003 - "Codice in materia di protezione dei dati personali".  Grazie.

 

This message and its attachments (if any) may contain confidential, proprietary or legally privileged information and it is intended only for the use of the addressee named above.

No confidentiality or privilege is waived or lost by any mistransmission.

If you are not the intended recipient of this message you are hereby notified that you must not use, disseminate, copy it in any form or take any action in reliance on it.

If you have received this message in error, please, delete it (and any copies of it). Thank you.


Zarino Zappia

unread,
Aug 16, 2012, 5:31:47 AM8/16/12
to scrap...@googlegroups.com
Imagine you have two scrapers, one which contains a list of people (first_name, last_name, id), and another which contains a list of pets (pet_name, species, owner_id). People own pets. Not all people have pets, and not all pets have owners. We want to see who owns which pet. Let's use attach!!

Examples here on ScraperWiki:


When you run the first scraper, "attach-example-people", it attaches to the second scraper, and pulls out all the owner-pet pairs using a join.

Likewise, when you run the second scraper, "attach-example-pets", it pulls out the same data, by attaching to the first scraper.

Magic.

I'll add the scrapers to the Documentation sometime this week, for future reference.

Note that we refer to the attached table using the scraper name, followed by a dot, followed by the table name. And in this case, we have to wrap the scraper name with back-ticks ( ` ) because SQLite doesn't allow hyphens in database names. If you wanted to avoid all the back-ticks (and make the SQL statements shorter too), you could supply a second argument to the scraperwiki.sqlite.attach() call, which is an alias by which you can refer to the attached database, rather than having to use the scraper's name. eg:

scraperwiki.sqlite.attach('attach-example-people', 'people')
# I can now use something like "select * from people.swdata"

Zarino Zappia
VP of Product, ScraperWiki
@zarino

Massimo Magris

unread,
Aug 16, 2012, 5:36:54 AM8/16/12
to scrap...@googlegroups.com
Excellent! thanks.

ScraperWiki is really powerful.

I'm new to scraping and also to python, but in a one day
I was able to get a lot's of useful data from internet.

Congratulations for your good job.

Thanks again.

Regards
Massimo


2012/8/16 Zarino Zappia <zar...@scraperwiki.com>

Anton Vanhoucke

unread,
Oct 13, 2012, 4:48:24 PM10/13/12
to scrap...@googlegroups.com
Say I have these two tables. They have the datetime as unique keys, but one contains waves and the other contains wind

Do I build two scrapers and one view that joins the data? Or can I build one scraper for the two urls that stores the combined data in one datastore?

I started scraping waves here:
https://scraperwiki.com/scrapers/waves/

(these are the urls:
http://www.mumm.ac.be/NL/Models/Operational/Wind/table.php?station=hoekvanholland
and
http://www.mumm.ac.be/NL/Models/Operational/Waves/table.php?station=hoekvanholland)

Scraperwiki is a great site by the way! Thanks!

Anton.

Anton Vanhoucke

unread,
Oct 14, 2012, 9:43:13 AM10/14/12
to scrap...@googlegroups.com
I fixed it with one scraper. Thanks anyway.
Reply all
Reply to author
Forward
0 new messages