Saving and restoring dataframes

Yuri D'Elia

Dec 19, 2016, 1:00:14 PM

to pyd...@googlegroups.com

This might be naïve (and very late to the party, since I've been using
pandas for years now), but are there any recommendations on which
method to use for saving DataFrames, considering:

- datatype preservation
- fastest I/O throughput
- smallest file size

I looked briefly and couldn't find anything in the documentation, only
some tests on SO.

I personally didn't have trouble deciding so far (as I pick whatever
format is already available to read), but given some constraints the
choice is not always obvious.

If I want to preserve all objects currently stored in a frame without
loss, is there really an alternative to pickling?

I assume hdf is a serious contender for both write and especially read
throughput, but when nullable types are involved it's not necessarily
restored _identically_.

For sparse data structures, is there anything recommended?

For highly redundant datasets, compressed text formats might still be
a good choice.
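
To put rough numbers on the throughput and file-size criteria above, here is a minimal sketch that writes the same frame with a few pandas writers and records write time and bytes on disk. The helper `measure`, the `WRITERS` table, the file names and the synthetic "highly redundant" frame are all assumptions made up for illustration. Timings from a single run are noisy, so treat them as indicative at best.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def measure(df, writers, directory):
    """Return {name: (seconds_to_write, bytes_on_disk)} for each writer."""
    results = {}
    for name, write in writers.items():
        path = os.path.join(directory, name)
        start = time.perf_counter()
        write(df, path)
        elapsed = time.perf_counter() - start
        results[name] = (elapsed, os.path.getsize(path))
    return results


# Candidate writers; the (hypothetical) file names double as labels.
WRITERS = {
    "frame.pkl": lambda df, p: df.to_pickle(p),
    "frame.pkl.gz": lambda df, p: df.to_pickle(p, compression="gzip"),
    "frame.csv.gz": lambda df, p: df.to_csv(p, index=False, compression="gzip"),
}


def redundant_frame(n=20000):
    """Build a deliberately repetitive frame (illustrative data only)."""
    return pd.DataFrame({
        "group": np.repeat(["a", "b", "c", "d"], n // 4),
        "flag": np.tile([0, 1], n // 2),
        "value": np.zeros(n),
    })


with tempfile.TemporaryDirectory() as tmp:
    results = measure(redundant_frame(), WRITERS, tmp)

for name, (seconds, size) in results.items():
    print("%-14s %8.4f s %10d bytes" % (name, seconds, size))
```

Adding another format only needs one more entry in `WRITERS`, which keeps the comparison on equal footing for whatever data you actually have.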

Miki Tebeka

Dec 19, 2016, 11:17:32 PM

to PyData, wav...@thregr.org

I've been using pickle for most things. It's fast and flexible enough for my needs.
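
As a sketch of the pickle route, the snippet below round-trips a frame whose object columns hold arbitrary Python objects and then checks that values and dtypes come back unchanged. The column names and data are invented for illustration, and gzip compression is just one of the options that `to_pickle` accepts.

```python
import os
import tempfile
from decimal import Decimal

import pandas as pd

# Illustrative frame: two object columns holding non-trivial Python objects
# and one small integer column (all names and values are made up).
df = pd.DataFrame({
    "amount": [Decimal("1.10"), Decimal("2.25"), Decimal("0")],
    "tags": [{"a", "b"}, set(), {"c"}],
    "n": pd.Series([1, 2, 3], dtype="int8"),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pkl.gz")
    df.to_pickle(path, compression="gzip")
    restored = pd.read_pickle(path, compression="gzip")

# Compare element by element with plain Python equality.
same_values = all(list(restored[col]) == list(df[col]) for col in df.columns)
# Compare dtypes by name so the check does not depend on dtype object identity.
same_dtypes = [str(t) for t in restored.dtypes] == [str(t) for t in df.dtypes]
print(same_values, same_dtypes)
```

Keep in mind that loading a pickle can execute arbitrary code, so this only suits files from a trusted source.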

Pietro Battiston

Dec 20, 2016, 3:34:15 AM

to pyd...@googlegroups.com

On Mon, 19/12/2016 at 18.59 +0100, Yuri D'Elia wrote:
> [...]

> I assume hdf is a serious contender for both write and especially read
> throughput, but when nullable types are involved it's not necessarily
> restored _identically_.

Can you provide an example?

(By the way: I use hdf with compression a lot and I'm quite happy with
it)

Pietro
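
One way to produce the kind of example asked for here is to diff the dtypes before and after a round trip. The helper `dtype_changes` below is hypothetical, and the demonstration uses an in-memory CSV only because it needs no optional dependency. The same check could be pointed at a frame read back with `pd.read_hdf`, but this sketch makes no claim about what HDF would report.

```python
import io

import pandas as pd


def dtype_changes(original, restored):
    """Return {column: (dtype_before, dtype_after)} for columns whose dtype changed."""
    changes = {}
    for col in original.columns:
        before = str(original[col].dtype)
        # A column missing after the round trip is reported as "missing".
        after = str(restored[col].dtype) if col in restored.columns else "missing"
        if before != after:
            changes[col] = (before, after)
    return changes


# Illustrative frame with a narrow integer, a datetime with a missing
# value, and a float with a missing value (names and data are made up).
original = pd.DataFrame({
    "count": pd.Series([1, 2, 3], dtype="int32"),
    "when": pd.to_datetime(["2016-12-19", "2016-12-20", None]),
    "value": [0.5, None, 1.5],
})

buf = io.StringIO()
original.to_csv(buf, index=False)
buf.seek(0)
restored = pd.read_csv(buf)

changes = dtype_changes(original, restored)
for col, (before, after) in changes.items():
    print("%s: %s -> %s" % (col, before, after))
```

An empty result means the dtypes survived; anything listed is a column that was not restored identically.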
