Hash of a DataFrame


Maxim Kurnikov

Jul 27, 2015, 10:11:55 AM
to PyData
Hi.

I'm implementing a cache for pandas DataFrames, so I need to obtain a hash() value for each one. A DataFrame is mutable, so I can't hash it directly; that's why I hash the underlying NumPy array instead.
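A minimal sketch of the idea above, assuming a purely numeric DataFrame: pandas refuses `hash(df)` with a `TypeError` because the object is mutable, so the digest is computed from the underlying NumPy array instead. It uses `hashlib.sha256` rather than the builtin `hash()`, because in Python 3 the builtin hash of `bytes` is salted per process (unless `PYTHONHASHSEED` is fixed) and so cannot key a cache that outlives the interpreter. `tobytes()` is the newer name for `ndarray.tostring()`, which NumPy has since deprecated. `array_digest` is a hypothetical helper; it also folds in dtype and shape so arrays with identical bytes but different layouts do not collide.

```python
import hashlib

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]})

# DataFrame deliberately refuses to be hashed because it is mutable.
try:
    hash(df)
except TypeError as exc:
    print(exc)


def array_digest(arr: np.ndarray) -> str:
    """Hypothetical helper: hex digest of a numeric array's dtype, shape and raw bytes."""
    if arr.dtype == object:
        # The raw bytes of an object array are pointers, not the data itself.
        raise ValueError("object arrays cannot be hashed by their raw bytes")
    h = hashlib.sha256()
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    h.update(arr.tobytes())  # C-order copy of the element bytes
    return h.hexdigest()


print(array_digest(df.values) == array_digest(df.values))  # True
```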

I have already described my problem here (with some code samples and an outline of my approach).

In addition, I tried another approach to hash():
In [3]: dataset = pd.read_csv(settings.TRAIN_FILE)

In [4]: hash(dataset.values.tostring())
Out[4]: 4839855946750815940

In [5]: hash(dataset.values.tostring())
Out[5]: -9009064184528202427

In [6]: df = pd.DataFrame({'A': [1]})

In [7]: hash(df.values.tostring())
Out[7]: -3099646879006235965

In [8]: hash(df.values.tostring())
Out[8]: -3099646879006235965
Same problem, no luck: the DataFrame read from the file gives a different hash every time, while the one I just created gives the same hash value each time.
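A likely explanation, assuming the CSV has at least one text column: a DataFrame with mixed dtypes returns an `object` array from `.values`, and the raw bytes of an object array are the memory addresses of the boxed Python objects, not their contents. Numbers boxed afresh on each `.values` call can land at different addresses, so the bytes (and the hash) can change; the all-integer frame yields a plain numeric array whose bytes are the data itself. The sketch swaps in `pd.util.hash_pandas_object`, which hashes values by content and is available in later pandas releases (it postdates this thread). It does not cover column labels, so those and the dtypes are mixed in separately. `frame_digest` is a hypothetical helper and `dataset` is a made-up stand-in for the file in the question.

```python
import hashlib

import pandas as pd

# Hypothetical stand-in for a frame read from a CSV with a text column.
dataset = pd.DataFrame({"name": ["a", "b"], "score": [0.1, 0.2]})
fresh = pd.DataFrame({"A": [1]})

# Mixed dtypes force .values to an object array (pointers to Python objects).
print(dataset.values.dtype)  # object
# An all-integer frame gives a numeric array whose raw bytes are the data.
print(fresh.values.dtype)


def frame_digest(df: pd.DataFrame) -> str:
    """Hypothetical helper: content digest from row hashes, column labels and dtypes."""
    row_hashes = pd.util.hash_pandas_object(df, index=True)  # uint64 per row
    h = hashlib.sha256()
    h.update(row_hashes.values.tobytes())
    h.update(repr(list(df.columns)).encode())
    h.update(repr([str(t) for t in df.dtypes]).encode())
    return h.hexdigest()


print(frame_digest(dataset) == frame_digest(dataset.copy()))  # True
```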

I've also tried pickle; it seems that a pickle.dump() + pickle.load() round trip "converts" a DataFrame from the good state of a "just created" DataFrame into the same state as the one loaded from the file.
I would appreciate any help, or perhaps a different approach to the problem.

Thank you. 

P.S. As always, sorry for my bad grammar; I hope you can tolerate it.
 



Jan Schulz

Jul 27, 2015, 11:44:33 AM
to pyd...@googlegroups.com
See here for a caching decorator:
https://github.com/TomAugspurger/engarde/issues/3
Best regards from Dresden,

Jan
--
Jan Schulz
mail: ja...@gmx.net
web: http://www.katzien.de