Hash of a DataFrame

Maxim Kurnikov

Jul 27, 2015, 10:11:55 AM

to PyData

Hi.

I'm implementing a cache for pandas DataFrames, so I need a hash value for each one. A DataFrame is mutable, so calling hash() on it directly fails; that's why I hash the underlying NumPy array instead.

I have already described my problem on Stack Overflow (some code samples and my approach are outlined there):

http://stackoverflow.com/questions/31567401/get-the-same-hash-value-for-a-pandas-dataframe-each-time

In addition, I tried another approach for hash():

In [3]: dataset = pd.read_csv(settings.TRAIN_FILE)

In [4]: hash(dataset.values.tostring())
Out[4]: 4839855946750815940

In [5]: hash(dataset.values.tostring())
Out[5]: -9009064184528202427

In [6]: df = pd.DataFrame({'A': [1]})

In [7]: hash(df.values.tostring())
Out[7]: -3099646879006235965

In [8]: hash(df.values.tostring())
Out[8]: -3099646879006235965

Same problem, no luck: the DataFrame loaded from a file gives me a different hash on every call, while a DataFrame I have just created gives the same hash value each time.
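One possible explanation, stated as an assumption because the CSV file is not shown: if the file contains mixed or non-numeric columns, `.values` builds a new object-dtype array on each call, and the raw bytes of such an array are pointers to Python objects rather than the data. Those pointers can change between calls, which would change the hash. Separately, Python 3's built-in `hash()` for bytes is salted per process, so it is not suitable for a cache that must survive restarts. The sketch below sidesteps both issues by hashing the contents with `pd.util.hash_pandas_object` (available in pandas versions newer than this thread) and `hashlib`. The helper name `frame_digest` is hypothetical.

```python
import hashlib

import pandas as pd


def frame_digest(df: pd.DataFrame) -> str:
    """Return a hex digest that depends only on the frame's contents.

    Hypothetical helper, not part of pandas.
    """
    # One uint64 hash per row, computed from the values and the index.
    row_hashes = pd.util.hash_pandas_object(df, index=True)

    h = hashlib.sha256()
    # Row hashes do not cover column labels or dtypes, so add them explicitly.
    h.update(repr(list(df.columns)).encode("utf-8"))
    h.update(repr([str(t) for t in df.dtypes]).encode("utf-8"))
    h.update(row_hashes.values.tobytes())
    return h.hexdigest()


# A mixed-dtype frame, similar in spirit to one read from a CSV file.
df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"], "C": [0.5, 1.5]})
same_twice = frame_digest(df) == frame_digest(df)
```

Because the digest is a plain string, it can be used as a dictionary key or a file name for an on-disk cache.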

I've also tried pickle. It seems that a pickle.dump() + pickle.load() round trip turns a "just created" DataFrame, which hashes consistently, into one that behaves like the DataFrame loaded from a file.

I would appreciate any help, or perhaps another approach to the problem.

Thank you.

P.S. As always, sorry for the bad grammar; I hope you can tolerate it.

Jan Schulz

Jul 27, 2015, 11:44:33 AM

to pyd...@googlegroups.com

See here for a caching decorator:
https://github.com/TomAugspurger/engarde/issues/3
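For readers who only want the general shape of such a decorator, here is a minimal sketch. It is not the code from the linked issue; the names `cache_on_frame` and `_digest` are hypothetical, and it assumes a pandas version that provides `pd.util.hash_pandas_object`.

```python
import functools
import hashlib

import pandas as pd


def _digest(df):
    # Content-based key: column labels plus one uint64 hash per row.
    rows = pd.util.hash_pandas_object(df, index=True).values.tobytes()
    cols = repr(list(df.columns)).encode("utf-8")
    return hashlib.sha256(cols + rows).hexdigest()


def cache_on_frame(func):
    """Memoize a one-argument function of a DataFrame by content digest."""
    results = {}

    @functools.wraps(func)
    def wrapper(df):
        key = _digest(df)
        if key not in results:
            results[key] = func(df)
        return results[key]

    return wrapper


calls = []  # records how often the wrapped function really runs


@cache_on_frame
def column_sums(df):
    calls.append(1)
    return df.sum(numeric_only=True)
```

Two frames with equal contents share one cache entry, so the wrapped function runs once for both. The cache here is an unbounded in-memory dict; a real cache would likely need eviction or on-disk storage.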
Best regards from Dresden,

Jan
--
Jan Schulz
mail: ja...@gmx.net
web: http://www.katzien.de
