pandas HDFStore.select_column() giving KeyError


Nikhil VJ

May 24, 2018, 7:41:02 AM
to PyData
Hi, I saw recommendations to use select_column() to select just one column from a table stored in an HDFStore.


So I tried it, but I'm running into errors. I'm on pandas 0.23.0.

import pandas as pd

hdf = pd.HDFStore(h5File)  # h5File is 'db/trips.h5'
print(hdf.info())

..gives output:
<class 'pandas.io.pytables.HDFStore'>
File path: db/trips.h5
/df            frame_table  (typ->appendable,nrows->216357,ncols->10,indexers->[index])

So I hope that makes it clear that my object is stored in table format, not fixed format.
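For anyone wanting to reproduce this without the original file, here is a minimal sketch of a store shaped like the one above. It assumes a small hypothetical DataFrame (the three trip_id values and the extra route_id column are made up) written under the key 'df' with format='table' and no data_columns argument, which matches the "frame_table ... indexers->[index]" line in the info() output. PyTables must be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the real trips data (values are made up).
trips = pd.DataFrame({
    "trip_id": [88921931, 88921934, 88921937],
    "route_id": [1, 1, 2],  # hypothetical extra column
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trips.h5")
    # format="table" writes an appendable frame_table; no data_columns given.
    trips.to_hdf(path, key="df", format="table")

    with pd.HDFStore(path, mode="r") as hdf:
        print(hdf.info())
        is_table = hdf.get_storer("df").is_table  # True for table, False for fixed
        # select() with columns= works for any stored column.
        picked = hdf.select("df", columns=["trip_id"])
```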

print(hdf.select('df',columns=['trip_id']))

..works properly.

         trip_id
0       88921931
1       88921934
2       88921937
...
[216357 rows x 1 columns]


But calling select_column directly:

print(hdf.select_column('df', 'trip_id'))


..causes an error:

KeyError                                  Traceback (most recent call last)
<ipython-input-36-0c1bb812da21> in <module>()
     18     '''
     19     return returnList
---> 20 readColumnDB('trips','trip_id')

<ipython-input-36-0c1bb812da21> in readColumnDB(tablename, column)
      8     # hdf.create_index('df')
      9     # print(hdf.get('df'))
---> 10     print(hdf.select_column('df','trip_id') )
     11     #print(hdf.select('df',columns=[column]))
     12     returnList = []

~/.local/lib/python3.5/site-packages/pandas/io/pytables.py in select_column(self, key, column, **kwargs)
    775
    776         """
--> 777         return self.get_storer(key).read_column(column=column, **kwargs)
    778
    779     def select_as_multiple(self, keys, where=None, selector=None, columns=None,

~/.local/lib/python3.5/site-packages/pandas/io/pytables.py in read_column(self, column, where, start, stop, **kwargs)
   3779                                       a.tz, True), name=column)
   3780
-> 3781         raise KeyError("column [%s] not found in the table" % column)
   3782
   3783

KeyError: 'column [trip_id] not found in the table'
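A likely explanation, hedged: the pandas IO docs describe select_column as retrieving a single indexable or data column. A column that was not declared in data_columns when the table was written is packed into a shared values block, so read_column never finds a column named trip_id and raises the KeyError above. The sketch below uses a small hypothetical DataFrame (made-up values) and writes it twice, once plainly and once with data_columns=['trip_id'], to compare.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the real trips table.
trips = pd.DataFrame({
    "trip_id": [88921931, 88921934, 88921937],
    "route_id": [1, 1, 2],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trips.h5")

    # 1) Table format, no data columns: trip_id sits inside a values block.
    trips.to_hdf(path, key="plain", format="table")
    # 2) Table format with trip_id declared as a data column (default mode="a" adds a key).
    trips.to_hdf(path, key="indexed", format="table", data_columns=["trip_id"])

    with pd.HDFStore(path, mode="r") as hdf:
        try:
            hdf.select_column("plain", "trip_id")
            plain_error = None
        except KeyError as exc:
            plain_error = exc  # column [trip_id] not found in the table

        # trip_id is now stored as its own column, so this succeeds.
        trip_ids = hdf.select_column("indexed", "trip_id")
```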


I read something about indexables, indexes, etc., but I can't figure out how to set that column as an index, or how to index the table by that column. I came across a

create_table_index(self, key, **kwargs)

function and ran it. It ran without complaint (I don't know what it actually did), but even after that, select_column throws the same error.
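A hedged reading of why create_table_index did not help: as far as I can tell, it builds a PyTables index on columns that are already stored individually (the indexables and any data columns). It does not turn a column packed into a values block into a data column. This minimal sketch, using a hypothetical DataFrame written without data_columns, checks that the KeyError survives a call to create_table_index.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data, written as a table without data_columns.
trips = pd.DataFrame({"trip_id": [88921931, 88921934, 88921937], "route_id": [1, 1, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trips.h5")
    trips.to_hdf(path, key="df", format="table")

    # mode="a" because creating an index writes to the file.
    with pd.HDFStore(path, mode="a") as hdf:
        # With no columns argument, indexes every indexable/data column (here only 'index').
        hdf.create_table_index("df")
        try:
            hdf.select_column("df", "trip_id")
            still_fails = False
        except KeyError:
            still_fails = True
```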

If I do:
print(hdf.select_column('df','index') )

..then that works fine.
0              0
1              1
2              2
3              3
4              4
...
Name: index, Length: 216357, dtype: int64
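The 'index' column works because it is the table's indexable. If several columns need to be read one at a time, pandas also accepts data_columns=True when writing a table, which makes every column a data column; the pandas IO docs note some performance cost to making many columns data columns, so a targeted list may be preferable. A sketch with hypothetical data:

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the trips table.
trips = pd.DataFrame({"trip_id": [88921931, 88921934, 88921937], "route_id": [1, 1, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trips.h5")
    # data_columns=True stores every column individually.
    trips.to_hdf(path, key="df", format="table", data_columns=True)

    with pd.HDFStore(path, mode="r") as hdf:
        index_col = hdf.select_column("df", "index")    # the indexable, always readable
        trip_ids = hdf.select_column("df", "trip_id")   # now a data column
        route_ids = hdf.select_column("df", "route_id")
```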

Is anybody else using select_column without a hitch?

Nikhil VJ

May 24, 2018, 10:54:54 AM
to pyd...@googlegroups.com

Going by that, it really should work, but it doesn't, so I've filed an issue: https://github.com/pandas-dev/pandas/issues/21188