using where or where_terms (bquery) in ctable

Sanchita Agarwal

Oct 21, 2019, 4:57:56 PM
to bcolz
I have recently started using bcolz and am learning to apply it to several use cases.
I created a ctable with nbytes ~ 155 GB and cbytes ~ 5 GB (it is awesome how well bcolz compresses such a large dataset).
Now that I have the data, my objective is to search through it; the table sits in thousands of .blp files.

These are the steps I tried:

>>> import bcolz
>>> import bquery
>>> a = bquery.open('<path to the table>', mode='r')
>>> print(a)
ctable((41603242,), [('lines', '<U1000')])
  nbytes: 154.98 GB; cbytes: 5.21 GB; ratio: 29.76
  cparams := cparams(clevel=5, shuffle=1, cname='lz4', quantize=0)
  rootdir := '<path_to_table>'

Note: The table is typically a log whose lines have the form <date> <timestamp> <levelname> <text>. I stored each entire line as a single string so that it is easy to search through.

>>> a.where_terms([('lines', 'in', ['INFO'])])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/venv/lib64/python3.6/site-packages/bquery/ctable.py", line 739, in where_terms
    ctable_ext.apply_where_terms(ctable_iter, op_list, value_list, boolarr)
  File "bquery/ctable_ext.pyx", line 1283, in bquery.ctable_ext.apply_where_terms (bquery/ctable_ext.c:43492)
  File "bquery/ctable_ext.pyx", line 1313, in bquery.ctable_ext.apply_where_terms (bquery/ctable_ext.c:42740)
TypeError: an integer is required

How should I approach this? Also, would bquery be a good way to search through such a large dataset quickly, or would list comprehensions be quicker?

I do not understand how bcolz/bquery handles search operations or what their time complexity is. Could someone shed some light on this and guide me?
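
For comparison, here is a minimal sketch of the list-comprehension style of search mentioned above. It does not use bcolz or bquery. The column is simulated with a plain Python list read in fixed-size blocks, and the helper names (`iter_blocks`, `find_rows`) and the sample log lines are hypothetical, made up purely for illustration. The point is only that a substring search of this kind visits every row once, so its cost grows linearly with the amount of text.

```python
from typing import Iterator, List, Tuple


def iter_blocks(lines: List[str], block_len: int) -> Iterator[Tuple[int, List[str]]]:
    """Yield (start_index, block) pairs.

    Stand-in for reading a large column block by block instead of
    loading all of it into memory at once (assumption: the real column
    can be read this way).
    """
    for start in range(0, len(lines), block_len):
        yield start, lines[start:start + block_len]


def find_rows(lines: List[str], needle: str, block_len: int = 2) -> List[int]:
    """Return the indices of rows whose text contains `needle`.

    This is a plain linear scan: every row is checked exactly once.
    """
    hits: List[int] = []
    for start, block in iter_blocks(lines, block_len):
        # Substring test per row; `start + i` maps back to the global row index.
        hits.extend(start + i for i, text in enumerate(block) if needle in text)
    return hits


# Hypothetical sample lines in the <date> <timestamp> <levelname> <text> layout.
sample = [
    "2019-10-21 16:57:56 INFO service started",
    "2019-10-21 16:57:57 ERROR connection lost",
    "2019-10-21 16:57:58 INFO retrying",
]

info_rows = find_rows(sample, "INFO")
print(info_rows)
```

Scanning in blocks keeps memory use bounded by the block size, which matters when the uncompressed column is far larger than RAM, but it does not reduce the number of rows that have to be examined.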