Hey There -
I'm new to Cassandra, and really to databases in general. I've been able to get data on and off of my DB, but I'm running into an issue that someone might be able to shed some light on. When I query my column family (example below) I get a timeout failure; when I drop the query to a smaller "count", I don't get any failures. I'm sure this has something to do with the way I've got things set up, but below is an example of the issue I'm seeing. I believe that all of the columns I'm searching on are indexed keys, but frankly I'm not 100% sure.

Any information on how the count contributes to the connection timing out is greatly appreciated. I understand that count is the number of items to return, but how do I get all of the items, not just a couple? Would multiple queries be a possibility? If so, how would I make sure that I've got the "next set" of keys returned? Thanks in advance for the help and information. Cheers.
Column family name = FileStats, with 50 static columns (the three I'm filtering on are shown below):
{column_name: cliname, validation_class: UTF8Type, index_type: KEYS},
{column_name: proj, validation_class: UTF8Type, index_type: KEYS},
{column_name: user, validation_class: UTF8Type, index_type: KEYS},
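In case it matters, this is roughly how the indexes were created. I'm reconstructing it from memory using pycassa's SystemManager, so the exact calls may not match my schema script:

from pycassa.system_manager import SystemManager, UTF8_TYPE, KEYS_INDEX

sys_mgr = SystemManager('cassaws002:9160')
# Give each of the searched columns a KEYS (equality) secondary index.
for col in ('cliname', 'proj', 'user'):
    sys_mgr.create_index('file_stats', 'FileStats', col, UTF8_TYPE, index_type=KEYS_INDEX)
sys_mgr.close()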
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily
from pycassa.index import create_index_expression, create_index_clause

pool = ConnectionPool(keyspace='file_stats',
                      server_list=['cassaws002', 'cassaws003', 'cassaws004', 'cassaws005'],
                      pool_timeout=5.0, max_retries=3)
cf = ColumnFamily(pool, 'FileStats')

def SearchFrameData(column_family=None, count=10, **kwargs):
    if column_family:
        # Build one equality expression per keyword argument (column == value).
        expr_list = []
        for key, value in kwargs.items():
            exp = create_index_expression(key, value)
            expr_list.append(exp)
        clause = create_index_clause(expr_list, count=count)
        # Collect the matching rows into a dict keyed by row key.
        data = {}
        for key, col in column_family.get_indexed_slices(clause):
            data[key] = col
        return data
matched = SearchFrameData(column_family=cf, cliname='lon', proj='zz0325', user='sjones')
############ Returns correctly without timeout, count = 10 #####################
This returns 10 matched items. However, I have anywhere from 100 to 2k+ items that should match. When I set "count" to anything higher than about 40, I get a timeout. What is the problem here? How do I let Cassandra know that this is a large query and not to time out?
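If the answer is to page through in multiple smaller queries, this is roughly what I had in mind. The start_key handling is a guess based on the pycassa docs (create_index_clause takes a start_key), so I'm not sure it's the right way to pick up the "next set":

def SearchFrameDataPaged(column_family, batch_size=25, **kwargs):
    # Same expressions as before, but fetch in small batches,
    # restarting each batch from the last key seen.
    expr_list = [create_index_expression(k, v) for k, v in kwargs.items()]
    data = {}
    start_key = ''
    while True:
        clause = create_index_clause(expr_list, start_key=start_key, count=batch_size)
        batch = list(column_family.get_indexed_slices(clause))
        for key, cols in batch:
            data[key] = cols              # the overlapping first key just overwrites itself
        if len(batch) < batch_size:
            break                         # nothing more to fetch
        start_key = batch[-1][0]          # next batch starts at (and repeats) this key
    return data

matched = SearchFrameDataPaged(cf, cliname='lon', proj='zz0325', user='sjones')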
############ Error with count at 100 ###############
matched = SearchFrameData(column_family=cf, cliname='lon', proj='zz0325', user='sjones', count=100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/pycassa/columnfamily.py", line 712, in get_indexed_slices
    key_slices = self.pool.execute('get_indexed_slices', cp, clause, sp, cl)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 554, in execute
    return getattr(conn, f)(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 150, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 150, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 150, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 150, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 150, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 145, in new_f
    (self._retry_count, exc.__class__.__name__, exc))
pycassa.pool.MaximumRetryException: Retried 6 times. Last failure was timeout: timed out
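For what it's worth, the other thing I was planning to try is raising the connection timeout on the pool. If I'm reading the pycassa docs right, timeout is the per-connection socket timeout (separate from pool_timeout) and defaults to 0.5 seconds, but I'm not certain that's what is actually expiring here:

# Guessing the per-connection socket timeout is what's expiring, not pool_timeout.
pool = ConnectionPool(keyspace='file_stats',
                      server_list=['cassaws002', 'cassaws003', 'cassaws004', 'cassaws005'],
                      timeout=30.0,        # socket timeout per connection (default 0.5s, I think)
                      pool_timeout=5.0,
                      max_retries=3)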