Hi.
I'm implementing a cache for pandas DataFrames, so I need a value for hash(). A DataFrame is mutable, so I can't hash it directly; that's why I use the underlying numpy array to compute the hash.
I already described my problem here (some code samples and an outline of my approach are there).
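For context, the cache I have in mind looks roughly like the sketch below. The names `_cache`, `frame_key`, `cached` and `column_sum` are made up for this illustration, and the key function is just an assumption about how the numpy array could be turned into a hashable value (`tobytes()` is the newer name for `tostring()`).

```python
import pandas as pd

_cache = {}  # hypothetical in-memory store: key -> computed result


def frame_key(df):
    # Assumption: hash the raw bytes of the underlying numpy array.
    return hash(df.values.tobytes())


def cached(df, compute):
    """Return compute(df), reusing a stored result when the key matches."""
    key = frame_key(df)
    if key not in _cache:
        _cache[key] = compute(df)
    return _cache[key]


calls = []


def column_sum(df):
    calls.append(1)  # record every real computation
    return int(df['A'].sum())


df = pd.DataFrame({'A': [1, 2, 3]})
first = cached(df, column_sum)
second = cached(df, column_sum)  # same numeric frame, so the stored result is reused
```

This only works if the key is stable for the same data, which is exactly the part that fails for me.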
In addition, I tried another approach for hash():
In [3]: dataset = pd.read_csv(settings.TRAIN_FILE)
In [4]: hash(dataset.values.tostring())
Out[4]: 4839855946750815940
In [5]: hash(dataset.values.tostring())
Out[5]: -9009064184528202427
In [6]: df = pd.DataFrame({'A': [1]})
In [7]: hash(df.values.tostring())
Out[7]: -3099646879006235965
In [8]: hash(df.values.tostring())
Out[8]: -3099646879006235965
Same problem, no luck. The DataFrame loaded from a file gives me a different hash on every call, while the DataFrame I just created gives the same hash value each time.
I've also tried pickle. It seems that a pickle.dump() + pickle.load() round trip "converts" a DataFrame from the good state of a "just created" DataFrame to the state of one loaded from an external source.
I would appreciate any help, or maybe another approach to the problem.
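One guess I have not confirmed on my own file: if the CSV has mixed column types, `.values` is an object-dtype array, and the raw bytes of such an array are pointers to Python objects rather than the data itself, so they can change between calls. The sketch below checks the dtype and tries a content-based hash using `pd.util.hash_pandas_object`. The helper name `content_hash` and the sample frame are made up for illustration, and note that this hashes the index and the values but not the column names.

```python
import hashlib

import pandas as pd


def content_hash(df):
    """Hypothetical helper: digest built from per-row hashes of the data."""
    # hash_pandas_object returns a Series of uint64, one hash per row.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.values.tobytes()).hexdigest()


# Made-up frame standing in for a CSV with text and numeric columns.
mixed = pd.DataFrame({'name': ['a', 'b'], 'score': [1.5, 2.5]})

values_dtype = mixed.values.dtype  # object dtype for mixed columns

first = content_hash(mixed)
second = content_hash(mixed.copy())  # equal data should give an equal digest
```

Since a hex digest is a plain string, it could be used as a dictionary key directly or passed to hash().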
Thank you.
P.S. As always, sorry for my bad grammar; I hope you can tolerate it.