pandas HDFStore.select_column() giving KeyError


Nikhil VJ

May 24, 2018, 7:41:02 AM

to PyData

Hi, I saw recommendations to use select_column() to select just one column from a table stored in an HDFStore:

- http://pandas.pydata.org/pandas-docs/version/0.15.1/io.html#advanced-queries

- https://github.com/pandas-dev/pandas/issues/6379#issuecomment-35306725

- https://groups.google.com/d/msg/pydata/StIpoTp09U0/0lElfzek9KwJ

So I tried it, but I am running into errors. I am on pandas 0.23.0.

hdf = pd.HDFStore(h5File)
print(hdf.info())

..gives output:

<class 'pandas.io.pytables.HDFStore'>
File path: db/trips.h5
/df            frame_table  (typ->appendable,nrows->216357,ncols->10,indexers->[index])

So I hope that clarifies that my object is stored in table format and not fixed.

print(hdf.select('df',columns=['trip_id']))

..works properly.

         trip_id
0       88921931
1       88921934
2       88921937
...
[216357 rows x 1 columns]
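
Since select() with columns=[...] works here, a single column can be taken from the one-column frame it returns. A minimal sketch, where read_one_column is a hypothetical helper name and not part of pandas:

```python
import pandas as pd


def read_one_column(store, key, column):
    """Return one column as a Series by going through select().

    `store` is an open pd.HDFStore and `key` names a table-format object.
    The helper name is hypothetical; only select() is a pandas call.
    """
    # select() returns a DataFrame restricted to the requested columns.
    frame = store.select(key, columns=[column])
    # Indexing the one-column frame gives the Series.
    return frame[column]
```

Called as read_one_column(hdf, 'df', 'trip_id'), this should return the same values that the select() call above prints, as a Series instead of a DataFrame.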

But doing select_column itself:

print(hdf.select_column('df','trip_id') )

..causes an error:

KeyError                                  Traceback (most recent call last)
<ipython-input-36-0c1bb812da21> in <module>()
     18     '''
     19     return returnList
---> 20 readColumnDB('trips','trip_id')


<ipython-input-36-0c1bb812da21> in readColumnDB(tablename, column)
      8     # hdf.create_index('df')
      9     # print(hdf.get('df'))
---> 10     print(hdf.select_column('df','trip_id') )
     11     #print(hdf.select('df',columns=[column]))
     12     returnList = []


~/.local/lib/python3.5/site-packages/pandas/io/pytables.py in select_column(self, key, column, **kwargs)
    775 
    776         """
--> 777         return self.get_storer(key).read_column(column=column, **kwargs)
    778 
    779     def select_as_multiple(self, keys, where=None, selector=None, columns=None,


~/.local/lib/python3.5/site-packages/pandas/io/pytables.py in read_column(self, column, where, start, stop, **kwargs)
   3779                                       a.tz, True), name=column)
   3780 
-> 3781         raise KeyError("column [%s] not found in the table" % column)
   3782 
   3783 


KeyError: 'column [trip_id] not found in the table'

I read something about indexables and indexes, but I cannot figure out how to set that column as an index, or how to index the table by that column. I came across the

create_table_index(self, key, **kwargs)

function and ran it. It completed without error (I don't know what it did, though), but select_column still throws the same error afterwards.
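
For reference, a likely explanation: select_column can only extract columns that exist individually on disk, meaning the index and any columns named in data_columns when the table was written. Other columns are packed into shared value blocks, so a lookup by name fails with the KeyError shown above. As far as I can tell, create_table_index only builds an index on columns that are already indexable and does not turn an ordinary column into a data column. The sketch below uses a small made-up frame (the direction column and the function name are assumptions for illustration) and needs the PyTables package installed:

```python
import os
import tempfile

import pandas as pd  # HDFStore also requires the PyTables package ("tables")


def write_and_read_column():
    """Write a tiny table with trip_id as a data column, then read it back."""
    # Made-up stand-in for the real trips table.
    df = pd.DataFrame({
        "trip_id": [88921931, 88921934, 88921937],
        "direction": [0, 1, 0],
    })
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.h5")
        with pd.HDFStore(path, mode="w") as store:
            # data_columns stores trip_id as its own on-disk column.
            store.append("df", df, data_columns=["trip_id"])
            trip_ids = store.select_column("df", "trip_id")
            try:
                # direction was not declared as a data column.
                store.select_column("df", "direction")
                other_found = True
            except KeyError:
                other_found = False
    return trip_ids, other_found
```

For an existing file, this would presumably mean reading the whole frame once and writing it again with data_columns=['trip_id'] (or data_columns=True for every column).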

If I do:

print(hdf.select_column('df','index') )

..Then that works fine.

0              0
1              1
2              2
3              3
4              4
...
Name: index, Length: 216357, dtype: int64

Is anybody else using select_column without a hitch?

Nikhil VJ

May 24, 2018, 10:54:54 AM

to pyd...@googlegroups.com

Hi, going by the code example at http://pandas.pydata.org/pandas-docs/version/0.15.1/io.html#advanced-queries, this really should work, but it doesn't, so I've filed an issue at https://github.com/pandas-dev/pandas/issues/21188
