Trouble working with legacy data


Erik Forsberg

Nov 13, 2014, 3:06:16 AM
to python-dr...@lists.datastax.com
Hi!

I have some data in a table created using thrift. In cassandra-cli, the
'show schema' output for this table is:

create column family Users
with column_type = 'Standard'
and comparator = 'AsciiType'
and default_validation_class = 'UTF8Type'
and key_validation_class = 'LexicalUUIDType'
and column_metadata = [
{column_name : 'date_created',
validation_class : LongType},
{column_name : 'active',
validation_class : IntegerType,
index_name : 'Users_active_idx_1',
index_type : 0},
{column_name : 'email',
validation_class : UTF8Type,
index_name : 'Users_email_idx_1',
index_type : 0},
{column_name : 'username',
validation_class : UTF8Type,
index_name : 'Users_username_idx_1',
index_type : 0},
{column_name : 'default_account_id',
validation_class : LexicalUUIDType}];

From cqlsh, it looks like this:

[cqlsh 4.1.1 | Cassandra 2.0.11 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh:test> describe table Users;

CREATE TABLE "Users" (
key 'org.apache.cassandra.db.marshal.LexicalUUIDType',
column1 ascii,
active varint,
date_created bigint,
default_account_id 'org.apache.cassandra.db.marshal.LexicalUUIDType',
email text,
username text,
value text,
PRIMARY KEY ((key), column1)
) WITH COMPACT STORAGE;

CREATE INDEX Users_active_idx_12 ON "Users" (active);

CREATE INDEX Users_email_idx_12 ON "Users" (email);

CREATE INDEX Users_username_idx_12 ON "Users" (username);

Now, when I try to extract data from this table using cqlsh or the
python-driver, I have no problems getting data for the columns which are
actually UTF8, but for those where the column_metadata validation class has
been set to something else, there's trouble. Example using the python driver:

-- snip --

In [8]: u = uuid.UUID("a6b07340-047c-4d4c-9a02-1b59eabf611c")

In [9]: sess.execute('SELECT column1,value from "Users" where key = %s and column1 = %s', [u, 'username'])
Out[9]: [Row(column1='username', value=u'uc6vf')]

In [10]: sess.execute('SELECT column1,value from "Users" where key = %s and column1 = %s', [u, 'date_created'])
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-10-d06f98a160e1> in <module>()
----> 1 sess.execute('SELECT column1,value from "Users" where key = %s
and column1 = %s', [u, 'date_created'])

/home/forsberg/dev/virtualenvs/ospapi/local/lib/python2.7/site-packages/cassandra/cluster.pyc
in execute(self, query, parameters, timeout, trace)
1279 future = self.execute_async(query, parameters, trace)
1280 try:
-> 1281 result = future.result(timeout)
1282 finally:
1283 if trace:

/home/forsberg/dev/virtualenvs/ospapi/local/lib/python2.7/site-packages/cassandra/cluster.pyc
in result(self, timeout)
2742 return PagedResult(self, self._final_result)
2743 elif self._final_exception:
-> 2744 raise self._final_exception
2745 else:
2746 raise OperationTimedOut(errors=self._errors,
last_host=self._current_host)

UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 6:
unexpected end of data

-- snap --

cqlsh gives me similar errors.
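
The traceback looks consistent with the 8-byte value of a LongType column being decoded as UTF-8 text, since `value` is declared as text in the CQL view of the table. A minimal Python 3 sketch of that failure mode, independent of the driver; the timestamp below is a made-up illustrative number, not data from the table:

```python
import struct

# Hypothetical microsecond timestamp, chosen only for illustration.
date_created = 0x0005074C213AF380

# LongType stores its value as an 8-byte big-endian signed integer.
raw = struct.pack(">q", date_created)

error = None
try:
    # This mimics reading the bytes as if they were a text column.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print("decode failed at byte", exc.start, "-", exc.reason)

# Interpreting the same bytes as a long recovers the original number.
(decoded,) = struct.unpack(">q", raw)
```

The bytes themselves are fine; only the type used to interpret them is wrong.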

Can I tell the python driver to parse some column values as integers, or
is this an unsupported case?

For sure this is an ugly table, but I have data in it, and I would like
to avoid having to rewrite all my tools at once, so if I could support
it from CQL that would be great.

Thanks!
\EF

Adam Holmberg

Nov 24, 2014, 11:28:53 AM
to python-dr...@lists.datastax.com
When I create a CF using the cli schema specified, it results in different CQL than you're showing ('column1' and 'value' are not present). Has the CF meta changed? Is this the result of an earlier upgrade?

What you are trying to do should be possible. If the schema types have somehow diverged from what's serialized, you may be able to correct this by altering the table:
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/alter_table_r.html
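
If the raw bytes of `value` can be obtained (for example, if the column were exposed as a blob, which is an assumption and not something confirmed in this thread), they could be decoded client-side according to the validation classes in the 'show schema' output. A minimal Python 3 sketch; `DECODERS` and `decode_value` are hypothetical helper names, not part of the python-driver:

```python
import struct
import uuid


def decode_long(raw):
    """LongType: 8-byte big-endian signed integer."""
    (value,) = struct.unpack(">q", raw)
    return value


def decode_integer(raw):
    """IntegerType (CQL varint): variable-length big-endian two's complement."""
    return int.from_bytes(raw, "big", signed=True)


def decode_uuid(raw):
    """LexicalUUIDType: 16 raw UUID bytes."""
    return uuid.UUID(bytes=bytes(raw))


def decode_text(raw):
    """UTF8Type: UTF-8 encoded text."""
    return bytes(raw).decode("utf-8")


# Column name -> decoder, taken from column_metadata in the thrift schema.
DECODERS = {
    "date_created": decode_long,
    "active": decode_integer,
    "default_account_id": decode_uuid,
    "email": decode_text,
    "username": decode_text,
}


def decode_value(column_name, raw):
    """Decode raw bytes, falling back to the default_validation_class (UTF8Type)."""
    decoder = DECODERS.get(column_name, decode_text)
    return decoder(raw)
```

For instance, `decode_value("username", b"uc6vf")` gives back the text seen in the working query, while `decode_value("date_created", raw)` expects exactly 8 bytes and raises `struct.error` otherwise.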

Please let us know if you are able to reproduce this from scratch.

Adam
