Hi, I saw recommendations to use select_column() to select just one column from a table stored in an HDFStore.
So I tried to use it, but I am running into errors. I am on pandas 0.23.0.
hdf = pd.HDFStore(h5File)
print(hdf.info())
..gives output:
<class 'pandas.io.pytables.HDFStore'>
File path: db/trips.h5
/df frame_table (typ->appendable,nrows->216357,ncols->10,indexers->[index])
So I hope that clarifies that my object is stored in table format and not fixed.
print(hdf.select('df',columns=['trip_id']))
..works properly.
trip_id
0 88921931
1 88921934
2 88921937
...
[216357 rows x 1 columns]
But calling select_column directly:
print(hdf.select_column('df','trip_id') )
..causes an error:
KeyError Traceback (most recent call last)
<ipython-input-36-0c1bb812da21> in <module>()
18 '''
19 return returnList
---> 20 readColumnDB('trips','trip_id')
<ipython-input-36-0c1bb812da21> in readColumnDB(tablename, column)
8 # hdf.create_index('df')
9 # print(hdf.get('df'))
---> 10 print(hdf.select_column('df','trip_id') )
11 #print(hdf.select('df',columns=[column]))
12 returnList = []
~/.local/lib/python3.5/site-packages/pandas/io/pytables.py in select_column(self, key, column, **kwargs)
775
776 """
--> 777 return self.get_storer(key).read_column(column=column, **kwargs)
778
779 def select_as_multiple(self, keys, where=None, selector=None, columns=None,
~/.local/lib/python3.5/site-packages/pandas/io/pytables.py in read_column(self, column, where, start, stop, **kwargs)
3779 a.tz, True), name=column)
3780
-> 3781 raise KeyError("column [%s] not found in the table" % column)
3782
3783
KeyError: 'column [trip_id] not found in the table'
I read something about indexables, indexes, etc., but I am not able to figure out how to set that column as an index, or how to index the table by that column. I came across a
create_table_index(self, key, **kwargs)
function and ran it. It completed without errors (I don't know what it did, though), but even after that select_column throws the same error.
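For context on the KeyError: as far as I understand it, select_column only reads the index and columns that were declared as data_columns when the table was written. An ordinary column is packed into a shared values block, so it cannot be found by name. create_table_index builds an index on columns that are already indexable, so it would not turn an ordinary column into a data column. Below is a minimal sketch of that behaviour. The small frame, the distance_km column, the key names and the temporary file path are all made up for illustration and are not from my real trips.h5.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data; the real table has 216357 rows and 10 columns.
df = pd.DataFrame({
    "trip_id": [88921931, 88921934, 88921937],
    "distance_km": [1.2, 3.4, 5.6],  # invented column, for illustration only
})

# Hypothetical throwaway file, not the real db/trips.h5.
path = os.path.join(tempfile.mkdtemp(), "trips_demo.h5")

with pd.HDFStore(path, mode="w") as hdf:
    # Plain table: trip_id is stored inside a values block, not by name.
    hdf.append("plain", df)
    # Same data, but trip_id is declared as a data column at write time.
    hdf.append("with_dc", df, data_columns=["trip_id"])

    # On the plain table, select_column cannot find trip_id.
    try:
        hdf.select_column("plain", "trip_id")
        plain_ok = True
    except KeyError:
        plain_ok = False

    # On the table written with data_columns, the same call returns a Series.
    trip_ids = hdf.select_column("with_dc", "trip_id")

    # Fallback for an existing table: select one column, then take the Series.
    fallback = hdf.select("plain", columns=["trip_id"])["trip_id"]

print(plain_ok)
print(trip_ids)
print(fallback)
```

If this is right, an existing table would need to be rewritten with data_columns=['trip_id'] (or data_columns=True) before select_column can see that column. Until then, select with columns=[...] remains the workaround.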
If I do:
print(hdf.select_column('df','index') )
..Then that works fine.
0 0
1 1
2 2
3 3
4 4
...
Name: index, Length: 216357, dtype: int64
Is anybody else using select_column without a hitch?