Best Storage for Relational Data In Pandas

Sayth Renshaw

Jun 1, 2016, 8:36:15 PM
to PyData
Morning

I currently have five CSV files, created from XML data that I parsed and saved with pandas DataFrame.to_csv. What should I use with pandas for my next steps? These are to create the relationships (the files already have ids), tidy the data (e.g. split text columns on '-' and spaces into multiple columns), and then store the changed data again for analysis.
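For concreteness, the tidy and relate steps described above might look roughly like the sketch below. The table names, column names, and values are hypothetical stand-ins (the real CSVs are not shown in this thread); the only assumptions are a shared id column and one text column delimited by '-' and spaces.

```python
import pandas as pd

# Hypothetical stand-ins for two of the CSV-derived tables.
groups = pd.DataFrame({"group_id": [1, 2], "group_name": ["north", "south"]})
items = pd.DataFrame({
    "group_id": [1, 1, 2],
    "label": ["red-apple large", "green-pear small", "blue-plum medium"],
})

# Split one text column on '-' or whitespace into separate columns.
parts = items["label"].str.split(r"[-\s]+", expand=True)
parts.columns = ["colour", "fruit", "size"]
tidy = pd.concat([items[["group_id"]], parts], axis=1)

# Rebuild the relationship through the shared id column.
merged = tidy.merge(groups, on="group_id", how="left")
```

The merged frame can then be written back out with whichever storage option ends up being chosen.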

Sorry if it's a simple question. I started using Postgres, but it seems overkill. The main issue is that I only have personal laptops, which aren't the most powerful machines, and it would not be very portable to keep installing Postgres on each one.

Thoughts?

Sayth

Hugo Shi

Jun 1, 2016, 8:41:07 PM
to PyData
SQLite is probably the easiest, and it's built in to Python.
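As a rough sketch of what that could look like with pandas: DataFrame.to_sql and pandas.read_sql both accept a standard-library sqlite3 connection. The table name "items" and the sample data here are made up for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for one parsed CSV.
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# ":memory:" keeps the example self-contained; a file path such as
# "data.db" would give a single portable file instead.
conn = sqlite3.connect(":memory:")
try:
    # Write the frame to a table, replacing it if it already exists.
    df.to_sql("items", conn, index=False, if_exists="replace")
    # Read it back with plain SQL.
    restored = pd.read_sql("SELECT id, value FROM items ORDER BY id", conn)
finally:
    conn.close()
```

With a file path instead of ":memory:", the whole database is one file that can be copied between laptops without installing a server.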


Miki Tebeka

Jun 2, 2016, 12:02:55 AM
to PyData
I'd look into HDF5 (PyTables) - http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables .
As Hugo said sqlite3 is also a good option.

Sayth Renshaw

Jun 3, 2016, 3:55:15 AM
to PyData


On Thursday, 2 June 2016 14:02:55 UTC+10, Miki Tebeka wrote:
I'd look into HDF5 (PyTables) - http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables .
As Hugo said sqlite3 is also a good option.


I had started with sqlite; however, it did not have context handlers, which is just a little annoying since "with open" doesn't work. I could use something like peewee (http://docs.peewee-orm.com/en/latest/), I guess, to make it easier to use.
I had seen PyTables and HDF5 on the pandas site and wanted further info before going down that route. It seems some people just keep using CSV; however, I don't think that is practical for data that is going to be updated weekly.
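On the context-handler point, one possible workaround with only the standard library is sketched below. A sqlite3 connection used in a "with" block manages the transaction (commit or rollback) but does not close the connection, so contextlib.closing is wrapped around it to get the close as well. The table and rows are hypothetical.

```python
import sqlite3
from contextlib import closing

# closing() guarantees conn.close() when the outer block exits.
with closing(sqlite3.connect(":memory:")) as conn:
    # The connection itself acts as a transaction context manager:
    # it commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",)])
    rows = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
# The connection is closed at this point.
```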

I saw a few discussions on pickle and msgpack. Are they appropriate?

Thanks for the thoughts. Are there any key considerations I might not be aware of when working with related CSV files?

Sayth