Did token handling change?

Taylor Gronka

Feb 2, 2015, 6:30:04 AM

to python-dr...@lists.datastax.com

Hello,

Did the expected object type for tokens change from bytearray (as blobs) to something else?

I am running development code that was working fine earlier this month. I changed operating systems (Arch to Gentoo), reinstalled Cassandra and the cassandra-driver, and now I'm getting an odd error. I tried rolling back to cassandra 2.0.9 and cassandra-driver-2.0.10 and 2.0.2 (among others), although my code had been running fine on cassandra-2.1.2 with cassandra-driver-2.1.2.

Here is the error:

[pid: 16478|app: 0|req: 1/1] 127.0.0.1 () {50 vars in 886 bytes} [Mon Feb 2 05:41:09 2015] POST /db_show_bounded_items.json => generated 0 bytes in 6 msecs (HTTP/1.1 500) 0 headers in 0 bytes (0 switches on core 0)
lngdif = 1.816864013671875
SELECT title, ranking, itemid FROM events.items WHERE token(muid) >= ? AND token(muid) <= ? LIMIT 30;
['62c13d000000308c8450000000000000', '62c13d000000308c8850000000000000']
Traceback (most recent call last):
  File "/var/env/tacle/lib64/python3.4/site-packages/cassandra/query.py", line 460, in bind
    self.values.append(col_type.serialize(value))
struct.error: required argument is not an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/env/tacle/lib64/python3.4/site-packages/pyramid-1.5.2-py3.4.egg/pyramid/router.py", line 242, in __call__
    response = self.invoke_subrequest(request, use_tweens=True)
  File "/var/env/tacle/lib64/python3.4/site-packages/pyramid-1.5.2-py3.4.egg/pyramid/router.py", line 217, in invoke_subrequest
    response = handle_request(request)
  File "/var/env/tacle/lib64/python3.4/site-packages/pyramid-1.5.2-py3.4.egg/pyramid/tweens.py", line 21, in excview_tween
    response = handler(request)
  File "/var/env/tacle/lib64/python3.4/site-packages/pyramid-1.5.2-py3.4.egg/pyramid/router.py", line 163, in handle_request
    response = view_callable(context, request)
  File "/var/env/tacle/lib64/python3.4/site-packages/pyramid-1.5.2-py3.4.egg/pyramid/config/views.py", line 355, in rendered_view
    result = view(context, request)
  File "/var/env/tacle/lib64/python3.4/site-packages/pyramid-1.5.2-py3.4.egg/pyramid/config/views.py", line 491, in _class_view
    response = getattr(inst, attr)()
  File "/var/www/tacle/tacboard/db_views.py", line 102, in db_show_bounded_items
    querycols=querycols))
  File "/var/www/tacle/items/tac.py", line 352, in query_muid_list
    uuid.UUID(muid_pair[1]),
  File "/var/env/tacle/lib64/python3.4/site-packages/cassandra/query.py", line 361, in bind
    return BoundStatement(self).bind(values)
  File "/var/env/tacle/lib64/python3.4/site-packages/cassandra/query.py", line 468, in bind
    raise TypeError(message)
TypeError: Received an argument of invalid type for column "partition key token". Expected: <class 'cassandra.cqltypes.LongType'>, Got: <class 'bytearray'>
[pid: 16478|app: 0|req: 2/2] 127.0.0.1 () {50 vars in 886 bytes} [Mon Feb 2 05:52:20 2015] POST /db_show_bounded_items.json => generated 0 bytes in 4 msecs (HTTP/1.1 500) 0 headers in 0 bytes (0 switches on core 0)

Here is the relevant code (sorry if the formatting ends up messy):

        if type(muid_list) is list:
            st = ("SELECT {columns} FROM {keyspace}.{table} "
                  "WHERE token(muid) >= ? AND token(muid) <= ? "
                  "LIMIT {limit};").format(columns=", ".join(querycols),
                                           keyspace=self.keyspace,
                                           table=table,
                                           limit=limit)
            bs = CaSession.session.prepare(st)
            futures = []

            print(st)  # this is part of the printout above
            for muid_pair in muid_list:
                print(str(muid_pair))  # these are the two hex numbers
                futures.append(
                    CaSession.session.execute_async(
                        bs.bind((
                                bytearray.fromhex(muid_pair[0]),  # the error occurs here
                                bytearray.fromhex(muid_pair[1]),
                                # int(muid_pair[0], 16),
                                # int(muid_pair[1], 16),
                        ))))

However, casting to an integer doesn't work either. Looking at metadata.py, it looks like the driver is expecting a 64-bit int (judging by LongType using int64_pack and int64_unpack). This doesn't make sense, since Cassandra tokens are 128 bits.
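The `struct.error` at the top of the first traceback can be reproduced without a cluster. The sketch below assumes that the driver's 64-bit packer behaves like a signed big-endian 8-byte `struct` format (`'>q'`); that stand-in is my assumption, not code imported from the driver. It shows why neither a bytearray nor a 128-bit integer can be bound where a `LongType` token is expected:

```python
import struct

# Assumption: a stand-in for the driver's 64-bit packer, i.e. a signed
# big-endian 8-byte integer ('>q'). It is not imported from cassandra-driver.
int64_pack = struct.Struct(">q").pack


def try_pack(value):
    """Return the packed bytes, or the struct.error message if packing fails."""
    try:
        return int64_pack(value)
    except struct.error as exc:
        return "struct.error: " + str(exc)


hex_token = "62c13d000000308c8450000000000000"  # first value from the log above

# A bytearray is not an integer at all, which is the error in the traceback.
print(try_pack(bytearray.fromhex(hex_token)))

# Converting the 32 hex digits to an int gives a value larger than 2**126,
# far outside the signed 64-bit range, so this fails as well.
print(try_pack(int(hex_token, 16)))

# Only values in [-2**63, 2**63 - 1] fit into a LongType token.
print(try_pack(2**63 - 1))  # b'\x7f\xff\xff\xff\xff\xff\xff\xff'
```

In other words, both conversions in the posted code are rejected before anything reaches the server, because the prepared statement told the driver to expect a 64-bit token.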

What I really don't understand is why downgrading isn't working - perhaps bytearrays were only used for tokens for one version?

Thanks,

Adam Holmberg

Feb 2, 2015, 10:11:03 AM

to python-dr...@lists.datastax.com

Are you using a different partitioner in Cassandra? The current default (murmur3) only uses a 64-bit token. If you were using RandomPartitioner previously, this could explain the difference in tokens.
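Because each partitioner has its own token type, the Python value bound to `token(muid) >= ?` has to match it. As far as I know, Cassandra declares `Murmur3Partitioner` tokens as `bigint` (`LongType`), `RandomPartitioner` tokens as `varint`, and `ByteOrderedPartitioner` tokens as `blob`. The helper below is a hypothetical sketch (`bind_value_for_token` is not a driver function) that converts a hex string like the ones in the original post into the matching Python value, and refuses values that cannot be a Murmur3 token:

```python
# Hypothetical helper, not part of cassandra-driver: maps a partitioner class
# name to the kind of Python value a token(...) bind marker expects.
MURMUR3 = "org.apache.cassandra.dht.Murmur3Partitioner"
RANDOM = "org.apache.cassandra.dht.RandomPartitioner"
BYTE_ORDERED = "org.apache.cassandra.dht.ByteOrderedPartitioner"

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def bind_value_for_token(partitioner, hex_token):
    """Convert a hex-encoded token into the value type the partitioner uses."""
    if partitioner == BYTE_ORDERED:
        # blob token: raw key bytes, so bytearray.fromhex is the right choice
        return bytearray.fromhex(hex_token)
    if partitioner == RANDOM:
        # varint token: an arbitrary-precision, non-negative integer
        return int(hex_token, 16)
    if partitioner == MURMUR3:
        # bigint token: must fit in a signed 64-bit integer
        value = int(hex_token, 16)
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError("value does not fit in a 64-bit Murmur3 token")
        return value
    raise ValueError("unknown partitioner: " + partitioner)
```

With `ByteOrderedPartitioner` the original `bytearray.fromhex(...)` calls are the right conversion; the `LongType` error only makes sense if the node is actually running `Murmur3Partitioner`. The partitioner in use can be checked from cqlsh with `SELECT partitioner FROM system.local;`.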

Another idea: When switching systems, did you change Python versions as well? If you switched from Python 2 to Python 3 it's possible you've found something that's not handled correctly for 3. If this turns out to be it, we would welcome a bug report, with versions, example schema, and a script to reproduce.

https://datastax-oss.atlassian.net/browse/PYTHON/

Thanks,

Adam Holmberg

To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-u...@lists.datastax.com.

Taylor Gronka

Feb 2, 2015, 3:24:03 PM

to python-dr...@lists.datastax.com

No, sorry. I'm still using Python 3.4, and I have always used the ByteOrderedPartitioner. I know it's not recommended, but it works well for me. Thinking about it now, the concept I'm using should be stable enough to port over to column slicing. Cassandra also has the sub-objects now, which, if I remember correctly, it didn't have when I started out, and I didn't feel like interpreting JSON and then doing the next level of filtering.

I have made sure to remove the data folder between reinstalls of Cassandra (although my cassandra.yaml isn't being overwritten between installs, so every time it boots it should be byte-ordered). I have also used this same setup on Arch, Debian, and CentOS, although it has been roughly 3 months since I last did a fresh setup.

I can check whether the Gentoo Python USE flags make a difference, but I don't think they will; it looks like a straightforward problem of the driver expecting a 64-bit token. I have also tried different CQL and protocol versions to see if they made a difference.

Oh! Actually, I misread your response. I'm 95% sure the murmur3 hash also uses 128-bit tokens. I don't think 64-bit tokens are even an option for murmur3, unless the Cassandra team extended it. I think this is also illustrated by the Python driver using 128-bit Python UUIDs. I remember reading a bug report where they talked about upgrading from murmur2 to murmur3 around Cassandra 1.2, and murmur2 has a 64-bit option.

What gets me now is that this part of my code has worked as expected since around last July, yet now I get the same error with every previous version I try. Maybe I can dissect cassandra.marshal and add a 128-bit type.

Thanks,

Taylor Gronka

Feb 2, 2015, 3:35:05 PM

to python-dr...@lists.datastax.com

Oh, according to this page, murmur3 is 64 bit, and MD5 and ByteOrdered are 128 bit:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePartitionerAbout_c.htm

To be honest, though, I think that might be an error. A quick Google search doesn't turn up a 64-bit murmur3 hash (admittedly a lazy check on my part), but more importantly, the Python driver interfaces with row keys that can be fetched and stored as 128-bit UUIDs, as detailed in the python-driver documentation:
http://www.datastax.com/documentation/developer/python-driver/2.1/pdf/pythondriver21.pdf

Apologies if I'm seriously missing something here.

Tyler Hobbs

Feb 3, 2015, 12:37:06 PM

to python-dr...@lists.datastax.com

Cassandra uses 64 bits for murmur3. It does this by using the 128-bit murmur3 variant and taking the first 64 bits.
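The "take the first 64 bits" step can be sketched in a few lines. The sketch assumes the 128-bit murmur3 result is already available as 16 bytes whose first eight bytes hold the first 64-bit half in big-endian order; the hash itself is not computed here, and `murmur3_token_from_hash128` is a hypothetical name. To my knowledge Cassandra's Murmur3Partitioner also remaps Long.MIN_VALUE to Long.MAX_VALUE, because the minimum value is reserved as the ring's minimum token, so the sketch does the same:

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def murmur3_token_from_hash128(hash128: bytes) -> int:
    """Keep the first 64 bits of a 128-bit hash as a signed 64-bit token.

    Assumption: hash128 is 16 bytes and its first 8 bytes are the first
    64-bit half of the murmur3 x64_128 output, in big-endian order.
    """
    if len(hash128) != 16:
        raise ValueError("expected a 16-byte (128-bit) hash")
    # Interpret the first half as a signed integer, like a Java long.
    first_half = int.from_bytes(hash128[:8], "big", signed=True)
    # The minimum value is reserved, so it is remapped to the maximum.
    return INT64_MAX if first_half == INT64_MIN else first_half


# The hex value from the first post, used here only as 16 arbitrary bytes.
example = bytes.fromhex("62c13d000000308c8450000000000000")
print(murmur3_token_from_hash128(example))
```

The result always lies in the signed 64-bit range, which is why the driver describes the `token(...)` bind markers as `LongType` on a Murmur3 cluster.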

--

Tyler Hobbs
DataStax

Taylor Gronka

Feb 3, 2015, 2:08:02 PM

to python-dr...@lists.datastax.com

Great, thanks! That clears up a lot and is good to know.

I found my issue: the Gentoo overlay I was installing from places the configuration file at /opt/cassandra/conf/cassandra.yaml instead of in /etc/cassandra, so the cassandra.yaml I had been editing was never read. I suppose Cassandra was therefore running with murmur3, and the python-driver was in fact loading the murmur3 token.
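One way to catch this kind of mismatch is to check which cassandra.yaml actually sets the partitioner. The sketch below is a hypothetical helper that does a plain line scan of candidate files instead of using a YAML library, so it only recognizes an uncommented, top-level `partitioner:` line. The two paths are the ones mentioned above; which file a particular package really loads is an assumption to verify for each install.

```python
import os


def partitioner_settings(paths):
    """Return {path: partitioner value or None} for each existing config file.

    Simplified parser: it only looks for an uncommented, top-level
    'partitioner:' line and drops any trailing '# ...' comment.
    """
    found = {}
    for path in paths:
        if not os.path.isfile(path):
            continue  # skip locations that do not exist on this machine
        value = None
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if line.startswith("partitioner:"):
                    value = line.split(":", 1)[1].split("#", 1)[0].strip()
                    break
        found[path] = value
    return found


candidates = [
    "/etc/cassandra/cassandra.yaml",
    "/opt/cassandra/conf/cassandra.yaml",
]
for path, value in partitioner_settings(candidates).items():
    print(path, "->", value)
```

If the files disagree, only the one the server actually loaded matters; running `SELECT partitioner FROM system.local;` against the node confirms which partitioner is in effect.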

Thanks for the help.
