I have recently started using bcolz and am learning to apply it to several use cases.
I created a ctable with nbytes ~ 155 GB and cbytes ~ 5 GB (totally awesome, the way bcolz compresses such huge data). The table was built from thousands of .blp files.
Now that I have the data, my objective is to search through it efficiently.
These are the steps I tried:
>>>import bcolz
>>>import bquery
>>>a = bquery.open('<path to the table>', mode='r')
>>>print(a)
ctable((41603242,), [('lines', '<U1000')])
nbytes: 154.98 GB; cbytes: 5.21 GB; ratio: 29.76
cparams := cparams(clevel=5, shuffle=1, cname='lz4', quantize=0)
rootdir := '<path_to_table>'
Note: the table is essentially a log, where each record is `<date> <timestamp> <levelname> <text>`. I stored each entire line as a single string to make searching through it easier.
>>>a.where_terms([('lines','in',['INFO'])])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/venv/lib64/python3.6/site-packages/bquery/ctable.py", line 739, in where_terms
ctable_ext.apply_where_terms(ctable_iter, op_list, value_list, boolarr)
File "bquery/ctable_ext.pyx", line 1283, in bquery.ctable_ext.apply_where_terms (bquery/ctable_ext.c:43492)
File "bquery/ctable_ext.pyx", line 1313, in bquery.ctable_ext.apply_where_terms (bquery/ctable_ext.c:42740)
TypeError: an integer is required
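As a fallback I am considering a plain linear scan over the column. A minimal stand-in sketch of that idea in plain Python (using an ordinary list in place of the real ctable, so the names and sample lines here are hypothetical):

```python
# Stand-in for the 'lines' column of the ctable; each element mimics one log row.
lines = [
    "2020-01-01 12:00:00 INFO started",
    "2020-01-01 12:00:01 DEBUG tick",
    "2020-01-01 12:00:02 INFO done",
]

# Linear substring scan -- O(n) over all rows.
hits = [ln for ln in lines if "INFO" in ln]

# Against the real table I would expect something along the lines of
# [r.lines for r in a.iter(outcols=['lines']) if 'INFO' in r.lines],
# but I am unsure whether this is the idiomatic bcolz way.
```

Is this kind of full scan really the best bcolz can do for substring matches, or does it offer an indexed/compressed-block-aware path?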
How should I approach this? Would bquery be a good choice for searching through such huge data quickly, or would list comprehensions be faster?
I also do not understand how bcolz/bquery handle search operations internally, or what their time complexity is. Could someone shed some light on this and guide me?