Yuri D'Elia
Dec 19, 2016, 12:58:00 PM
to pystat...@googlegroups.com
This might be naïve (and very late to the party, since I've been using
pandas for years now), but is there a recommended method for saving
DataFrames, considering:
- datatype preservation
- fastest I/O throughput
- smallest file size
I looked briefly and couldn't find anything in the documentation, only
some tests on SO.
I personally haven't had trouble deciding so far (I pick whatever
format is already available to read), but given some constraints the
choice is not always obvious.
If I want to preserve all objects currently stored in a frame without
loss, is there really an alternative to pickling?
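To make the pickling question concrete, here is a minimal sketch of the round trip I have in mind. The frame contents and the file name are made up for illustration; only `to_pickle` and `read_pickle` are actual pandas calls.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame mixing a categorical, datetimes, and arbitrary
# Python objects (illustrative data, not from a real dataset).
df = pd.DataFrame({
    "cat": pd.Categorical(["a", "b", "a"]),
    "when": pd.to_datetime(["2016-12-19", "2016-12-20", "2016-12-21"]),
    "obj": [{"k": 1}, (1, 2), None],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pkl")  # hypothetical file name
    df.to_pickle(path)                     # serialize the whole frame
    restored = pd.read_pickle(path)        # load it back

# Column dtypes and the raw Python objects should survive the round trip.
same_dtypes = restored.dtypes.equals(df.dtypes)
same_objects = restored["obj"].tolist() == df["obj"].tolist()
```

This only checks a single process and a single pandas version, so it says nothing about how portable the pickle file is.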
I assume HDF is a serious contender for both write and especially read
throughput, but when nullable types are involved the frame is not
necessarily restored _identically_.
For sparse data structures, is there anything recommended?
For highly redundant datasets, compressed text formats might still be
a good choice.
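As a rough sketch of the compressed-text option, assuming a made-up, highly repetitive frame and hypothetical file names, something like the following compares plain and gzip-compressed CSV sizes and shows what happens to the dtypes on reload:

```python
import os
import tempfile

import pandas as pd

# Hypothetical, highly redundant data: two rows repeated 5000 times.
df = pd.DataFrame({
    "label": pd.Categorical(["x", "y"] * 5000),
    "value": [1.5, 2.5] * 5000,
})

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "frame.csv")      # hypothetical file names
    packed = os.path.join(tmp, "frame.csv.gz")
    df.to_csv(plain, index=False)
    df.to_csv(packed, index=False, compression="gzip")
    plain_size = os.path.getsize(plain)
    packed_size = os.path.getsize(packed)
    # read_csv infers gzip compression from the .gz extension.
    restored = pd.read_csv(packed)

# The values come back, but the categorical dtype is not stored in the
# text file, so "label" is re-inferred as a plain string column.
label_dtype_kept = str(restored["label"].dtype) == "category"
```

The size comparison depends entirely on how repetitive the data is, and the dtype has to be reapplied by hand after reading (for example with `astype("category")`).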