Saving and restoring dataframes

Yuri D'Elia

Dec 19, 2016, 12:58:00 PM

to pystat...@googlegroups.com

This might be naïve (and very late to the party, since I've been using
pandas for years now), but is there a recommended method for saving
DataFrames, considering:

- datatype preservation
- fastest I/O throughput
- smallest file size

I looked briefly and couldn't find anything in the documentation, only
some tests on Stack Overflow.

I personally haven't had trouble deciding so far (as I pick whatever
format is already available to read), but given some constraints the
choice is not always obvious.

If I want to preserve all objects currently stored in a frame without
loss, is there really an alternative to pickling?
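To make the "without loss" requirement concrete, here is a minimal sketch comparing a pickle round trip with a plain CSV round trip. The frame contents, column names and file names are made up for illustration; only `to_pickle`, `read_pickle`, `to_csv` and `read_csv` are actual pandas calls.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame mixing a datetime column, a categorical column
# and arbitrary Python objects.
df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2016-12-19", "2016-12-20"]),
        "label": pd.Categorical(["a", "b"]),
        "payload": [{"k": 1}, (1, 2)],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "frame.pkl")
    csv_path = os.path.join(tmp, "frame.csv")

    # Pickle round trip: dtypes and Python objects come back as they were.
    df.to_pickle(pkl_path)
    from_pickle = pd.read_pickle(pkl_path)

    # CSV round trip with default options: everything passes through text.
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

print("original dtypes:   ", [str(t) for t in df.dtypes])
print("pickle round trip: ", [str(t) for t in from_pickle.dtypes])
print("CSV round trip:    ", [str(t) for t in from_csv.dtypes])
```

With default options the CSV round trip brings the datetime and categorical columns back as plain strings, and the objects in `payload` as their text representation, while the pickle round trip keeps all three. Keep in mind that pickles are tied to Python and should only be loaded from trusted sources.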

I assume HDF5 is a serious contender for both write and especially read
throughput, but when nullable types are involved the data is not
necessarily restored _identically_.

For sparse data structures, is there anything recommended?

For highly redundant datasets, compressed text formats might still be
a good choice.
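As a rough illustration of the compressed-text option, the sketch below writes the same redundant frame as plain CSV and as gzip-compressed CSV and compares the file sizes. The data and file names are invented for the example, and the actual ratio depends entirely on the dataset.

```python
import os
import tempfile

import pandas as pd

# Hypothetical, highly redundant data: a few values repeated many times.
df = pd.DataFrame(
    {
        "group": ["alpha", "beta", "gamma", "alpha"] * 5000,
        "status": ["ok", "ok", "ok", "fail"] * 5000,
        "count": [1, 2, 3, 4] * 5000,
    }
)

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "frame.csv")
    gz_path = os.path.join(tmp, "frame.csv.gz")

    df.to_csv(plain_path, index=False)
    df.to_csv(gz_path, index=False, compression="gzip")

    plain_size = os.path.getsize(plain_path)
    gz_size = os.path.getsize(gz_path)

    # read_csv infers gzip compression from the ".gz" extension.
    restored = pd.read_csv(gz_path)

print(f"plain CSV: {plain_size} bytes, gzip CSV: {gz_size} bytes")
```

The trade-off is that text formats do not store dtypes, so anything beyond plain numbers and strings has to be re-parsed or re-cast after reading.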

Yuri D'Elia

Dec 19, 2016, 1:05:16 PM

to pystat...@googlegroups.com

On Mon, Dec 19 2016, Yuri D'Elia wrote:
> This might be naïve (and very late to the party, since I've been using
> pandas for years now), but are there some recommendations on which
> method is recommended to save DataFrames, considering: