How to get all the keys in a Cassandra database?

caribbean

Jun 8, 2010, 7:44:32 PM
to jassandra-user
Hi,

In jassandra (http://code.google.com/p/jassandra/), the Java client,
a record is deleted like this:

// 6. Delete like this
cf.delete(userName, colFirst);
map = criteria.select();
Assert.assertEquals(2, map.get(userName).size());

Suppose I have already created a database and stored it on my hard
drive. Next time, before I create a new database, I want to delete all
the records in the old one. In the example above, I need to know both
userName and colFirst. I know colFirst, but I don't know userName
(the keys). How can I get all the keys in the previous database so
that I can delete all the old records?

Thanks,

Dop Sun

Jun 8, 2010, 8:49:02 PM
to jassandra-user
There are two different scenarios here:
1. You want to delete the old data completely.
2. You want to delete only some keys (along with their columns).

For the first one, you can simply shut down Cassandra and remove the
file that stores the data for the specific column family. All data,
keys and columns defined in that column family will then be deleted.

For the second one, you first need to select those keys, using the
select API with a key range. ICriteria can use keyList and keyRange.
When using keyRange, you can specify start and finish.

Here is a quote from the Thrift API:
starting with start, ending with finish (both inclusive) and at most
count long. The empty string ("") can be used as a sentinel value to
get the first/last existing key (or first/last column in the column
predicate parameter).

So, if you know how the keys are distributed, you can specify both
start and finish and work in batches. If you don't have a pre-defined
key range, you can guess, or specify the empty string as both start
and finish. But if your database is huge, you may be forced to guess,
for example fetching keys starting with A first and then moving on to
B. Otherwise, Cassandra will try to return a huge dataset, and an
out-of-memory exception may result.
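
As a rough illustration of the batching idea above, here is a minimal
Java sketch. It does not use the jassandra API: a TreeMap stands in for
the column family, and keysBetween, batches and the bounds list are
hypothetical names made up for this example. It assumes keys come back
in sorted order (as with an order-preserving partitioner) and that, as
in the Thrift quote, both ends of a range are inclusive, so a key equal
to a boundary is dropped from the second of the two batches that would
otherwise contain it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class KeyRangeBatchSketch {
    // Hypothetical stand-in for a key-range query: start and finish are
    // both inclusive, and "" means "first key" (start) or "last key" (finish).
    static List<String> keysBetween(NavigableMap<String, String> store,
                                    String start, String finish) {
        NavigableMap<String, String> tail = store.tailMap(start, true);
        NavigableMap<String, String> range =
                finish.isEmpty() ? tail : tail.headMap(finish, true);
        return new ArrayList<>(range.keySet());
    }

    // Splits the key space at the given boundaries (assumed sorted ascending
    // and non-empty) and fetches one batch per range.
    static List<List<String>> batches(NavigableMap<String, String> store,
                                      List<String> bounds) {
        List<List<String>> out = new ArrayList<>();
        String start = "";
        for (int i = 0; i <= bounds.size(); i++) {
            String finish = i < bounds.size() ? bounds.get(i) : "";
            List<String> batch = keysBetween(store, start, finish);
            if (i > 0) {
                // The boundary key was already returned by the previous
                // batch, because the finish bound is inclusive.
                batch.remove(start);
            }
            out.add(batch);
            start = finish;
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        for (String k : new String[] {"apple", "h", "hat", "kiwi", "pear", "zebra"}) {
            store.put(k, "value");
        }
        // Three batches: up to "h", from "h" to "p", and from "p" onwards.
        System.out.println(batches(store, Arrays.asList("h", "p")));
    }
}
```

Each batch stays small as long as the boundaries match how the keys are
actually distributed; if one range turns out to be too large, it can be
split further by adding more boundaries.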

Dop Sun

Jun 9, 2010, 8:32:06 AM
to jassandra-user
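// Select up to 100 keys over the whole key range ("" = first/last key),
// with up to 100 columns per key.
criteria = cf.createCriteria();
criteria.keyRange("", "", 100);
criteria.columnRange(ByteArray.EMPTY, ByteArray.EMPTY, 100);
Map<String, List<IColumn>> listMap = criteria.select();
// Number of keys returned
System.out.println(listMap.size());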

The above code returns key -> column list mappings. With these
details, you can delete them later.

Cheers, Dop


caribbean

Jun 9, 2010, 12:08:26 PM
to jassandra-user
Thank you for the reply. Looks good.

One more question: what is the meaning of 100 here? Is it the number
of keys? If I don't know the number of keys in the previous database,
how do I specify this parameter? Thanks,

Dop Sun

Jun 9, 2010, 7:25:43 PM
to jassandra-user
Actually, if you don't know the number of keys, then, you may need a
loop.

criteria = cf.createCriteria();
criteria.keyRange("", "", 100);
criteria.columnRange(ByteArray.EMPTY, ByteArray.EMPTY, 100);
Map<String, List<IColumn>> listMap = criteria.select();

In the above example, you will get at most 100 keys, each with at most
100 columns.

First, you need to ensure that all columns are returned. You can give
the column count a very big number, unless you are not sure whether
your server's memory can hold all the column data. So, in most cases,
just use Integer.MAX_VALUE.

Secondly, look back at the number of keys. The query above returns
only 100 keys, so there is a trick here. Once you have the first 100
keys, use the biggest key returned as the start key and the empty
string as the finish key, and run a second query. But be aware that
there will be a duplicate in this case, since the start key and its
columns will be returned again by the next query.

While writing this, I realized that select returns a map, which has no
ordering, so it is inconvenient to find the biggest key. The original
API should be easier because it returns the values sequentially.

I will make an enhancement. :-) In the meantime, you may need to
iterate over the returned keys and find the biggest key.
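
The loop described above can be sketched as follows. This is only an
illustration under stated assumptions, not the jassandra API: a TreeMap
stands in for the column family, selectRange is a hypothetical helper
that mimics a key-range select returning an unordered map, and keys are
assumed to come back in sorted order. Each round takes the biggest key
of the page as the next start key and skips it when it shows up again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class KeyPagingSketch {
    // Hypothetical stand-in for a key-range select: returns at most `count`
    // entries whose key is >= start ("" means "from the first key"), as an
    // unordered map, similar in shape to the Map returned by select().
    static Map<String, List<String>> selectRange(
            NavigableMap<String, List<String>> store, String start, int count) {
        Map<String, List<String>> page = new HashMap<>();
        for (Map.Entry<String, List<String>> e : store.tailMap(start, true).entrySet()) {
            if (page.size() >= count) {
                break;
            }
            page.put(e.getKey(), e.getValue());
        }
        return page;
    }

    // Collects every key by repeated range queries of `pageSize` keys each.
    static List<String> allKeys(NavigableMap<String, List<String>> store, int pageSize) {
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        List<String> keys = new ArrayList<>();
        String start = "";
        boolean firstPage = true;
        while (true) {
            Map<String, List<String>> page = selectRange(store, start, pageSize);
            if (page.isEmpty()) {
                break;
            }
            // The map is unordered, so sort the keys of this page ourselves.
            List<String> sorted = new ArrayList<>(page.keySet());
            Collections.sort(sorted);
            for (String key : sorted) {
                if (!firstPage && key.equals(start)) {
                    continue; // duplicate: already seen at the end of the previous page
                }
                keys.add(key);
            }
            if (page.size() < pageSize) {
                break; // short page: nothing left after it
            }
            // Biggest key of this page becomes the start of the next query.
            start = Collections.max(page.keySet());
            firstPage = false;
        }
        return keys;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<String>> store = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e"}) {
            store.put(k, Arrays.asList("col1", "col2"));
        }
        System.out.println(allKeys(store, 2));
    }
}
```

The page size has to be at least 2 in this sketch: with a page size of
1, every query after the first would return only the duplicated start
key and the loop would never advance.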

Cheers,
Dop

Dop Sun

Jun 9, 2010, 7:32:04 PM
to jassandra-user
http://code.google.com/p/jassandra/issues/detail?id=26

Issue 26 created. I'm working on it; please follow the changes there.

caribbean

Jun 9, 2010, 7:49:39 PM
to jassandra-user
Thanks, looking forward to hearing news from you.

Dop Sun

Jun 10, 2010, 9:53:56 AM
to jassandra-user
r68 committed. New test cases added as SelectionTest. Please refer to
testSelectPart() as a sample for continuous querying.

Refer to Issue 26 for change details.

Please try it and let me know if you find any issues. I may create a
new build just for this soon (tomorrow? :))

Cheers,
Dop

Dop Sun

Jun 14, 2010, 7:21:11 PM
to jassandra-user
Released.