pycassa encoding issue + mutation_map issue

Kumar Ranjan

May 21, 2013, 12:26:34 PM
to pycassa...@googlegroups.com
I've been stuck on this issue for a week and can't find a clue. I am using the pycassa batch Mutator to insert/send, and it complains of 2 issues:

1. It looks like an encoding issue, but encoding the associated key's value doesn't help. I tried calling encode('UTF-8') on the text field, and it shows the value being converted to 'str', but it still fails. The exact error is: 'ascii' codec can't encode character u'\xbf' in position 0: ordinal not in range(128)

Traceback:

Traceback (most recent call last):
  File "/opt/socialflow/prod/api-reporting/api-reporting/CassFH/app/c.py", line 40, in send
    Mutator.send(self, *a, **kw)
  File "/usr/local/lib/python2.6/dist-packages/pycassa/batch.py", line 126, in send
    allow_retries=self.allow_retries)
  File "/usr/local/lib/python2.6/dist-packages/pycassa/pool.py", line 124, in new_f
    result = f(self, *args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/pycassa/cassandra/Cassandra.py", line 1005, in batch_mutate
    self.send_batch_mutate(mutation_map, consistency_level)
  File "/usr/local/lib/python2.6/dist-packages/pycassa/cassandra/Cassandra.py", line 1013, in send_batch_mutate
    args.write(self._oprot)
  File "/usr/local/lib/python2.6/dist-packages/pycassa/cassandra/Cassandra.py", line 5200, in write
    oprot.trans.write(fastbinary.encode_binary(self, (self.__class__, self.thrift_spec)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbf' in position 0: ordinal not in range(128)
[2013-05-20 21:31:14,450] root CRITICAL:

2. Required field 'mutation_map' was not present! Struct: batch_mutate_args(mutation_map:null, consistency_level:ONE) None /opt/socialflow/prod/api-reporting/api-reporting/CassFH/app/c.py line 44.

Would you enlighten me, so I can proceed with solving this?

Tyler Hobbs

May 21, 2013, 12:42:53 PM
to pycassa...@googlegroups.com
On Tue, May 21, 2013 at 11:26 AM, Kumar Ranjan <winne...@gmail.com> wrote:

1. It looks like an encoding issue, but encoding the associated key's value doesn't help. I tried calling encode('UTF-8') on the text field, and it shows the value being converted to 'str', but it still fails. The exact error is: 'ascii' codec can't encode character u'\xbf' in position 0: ordinal not in range(128)

Is this supposed to be a UTF8Type field?  If so, pycassa is just expecting a 'unicode' or string object and will call .encode('utf-8') on it itself (although it handles it already being utf8-encoded for all cases I've seen so far).
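
To illustrate the error message itself: it is what Python reports whenever a text string containing u'\xbf' goes through the ASCII codec, which Python 2 does implicitly when a unicode object is used where a byte string is required. The following is a minimal sketch, written for Python 3 (where the encode has to be spelled out), and it does not touch pycassa at all; the variable names are illustrative only.

```python
# The character from the traceback: U+00BF (inverted question mark).
text = u"\xbf"

# Encoding with the ASCII codec fails, as in the traceback.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding with UTF-8 succeeds and yields a two-byte sequence.
utf8_bytes = text.encode("utf-8")

print(type(ascii_error).__name__)
print(repr(utf8_bytes))
```

So a value that reaches the Thrift layer still as text, rather than as UTF-8 bytes, would be enough to produce this message.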
 

2. Required field 'mutation_map' was not present! Struct: batch_mutate_args(mutation_map:null, consistency_level:ONE) None /opt/socialflow/prod/api-reporting/api-reporting/CassFH/app/c.py line 44.

This is probably a consequence of 1 when it's sent after hitting the above error.  Are you using the batch as a context manager (i.e. using the "with" statement)?


--
Tyler Hobbs
DataStax

Kumar Ranjan

May 21, 2013, 1:47:33 PM
to pycassa...@googlegroups.com


On Tuesday, May 21, 2013 12:42:53 PM UTC-4, Tyler Hobbs wrote:

Is this supposed to be a UTF8Type field?  If so, pycassa is just expecting a 'unicode' or string object and will call .encode('utf-8') on it itself (although it handles it already being utf8-encoded for all cases I've seen so far).

No validator is defined for the text field, which should mean it can take a str produced by encode('UTF-8'). I also tried declaring the 'text' field validator as UTF8Type() and inserting the encoded string, but it still throws the same error. I'm not sure why.

Kumar Ranjan

May 21, 2013, 5:20:35 PM
to pycassa...@googlegroups.com
Just noticed something: the field where it's failing does not have a default type defined and, as per the docs, Cassandra will try to store it as hex byte arrays (BytesType), whereas I am trying to insert a UTF-8 encoded string. Could this be a problem?

Tyler Hobbs

May 21, 2013, 6:46:40 PM
to pycassa...@googlegroups.com

On Tue, May 21, 2013 at 4:20 PM, Kumar Ranjan <winne...@gmail.com> wrote:
Just noticed something: the field where it's failing does not have a default type defined and, as per the docs, Cassandra will try to store it as hex byte arrays (BytesType), whereas I am trying to insert a UTF-8 encoded string. Could this be a problem?

BytesType only expects hex in cassandra-cli, not from normal clients.  I suspect your problem is that you're passing a unicode string for that field.  Make sure you're not doing something like mystr.encode('utf-8') and expecting it to change mystr; strings are immutable in Python, so encode() returns a new object and you have to use its return value.
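
A small sketch of the immutability point, written for Python 3, where text is `str` and encoded data is `bytes` (on the Python 2.6 in this thread the corresponding types would be `unicode` and `str`). The sample value is made up for illustration.

```python
mystr = u"\xbfque?"

# Calling encode() without keeping the result leaves mystr untouched.
mystr.encode("utf-8")
still_text = mystr

# The encoded bytes only exist if the return value is bound to a name.
encoded = mystr.encode("utf-8")

print(type(still_text).__name__, type(encoded).__name__)
```

Checking the type of the value right before it is handed to the batch is a quick way to see which of the two objects is actually being inserted.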

If you can't figure out what's going on, try to make a quick script to reproduce and I'll take a look.

--
Tyler Hobbs
DataStax

Kumar Ranjan

May 21, 2013, 8:30:52 PM
to pycassa...@googlegroups.com
This is what I am doing; full_mention is a dictionary:

Line 1: full_mention['text'] is unicode
Line 2: full_mention['text'] = full_mention['text'].encode('UTF-8')  --> This is what I do, ????
Line 3: full_mention['text'] is unicode
Line 4: self.insert(full_mention) --> batches and calls send from Mutator 

Thanks Tyler


--
You received this message because you are subscribed to a topic in the Google Groups "pycassa-discuss" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/pycassa-discuss/ibC2CzLTFIY/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to pycassa-discu...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Kumar Ranjan

May 22, 2013, 1:12:13 PM
to pycassa...@googlegroups.com
This issue has been fixed. Thank you, Tyler, for spending time to help. Here was the issue:

- The encoding issue existed in a couple of column families for the same field, the tweet text, which can contain
  non-ASCII characters.
- I used the pycassa Mutator to batch requests across multiple column families.
- I fixed the encoding issue for 2 column families but failed to do so for the remaining 3 CFs.
- So the batch insertion failed for all of them, because it failed for one of them in the pycassa batch Mutator.

Hope this helps you all.
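
One way to catch this kind of miss before sending a multi-column-family batch is to scan everything queued for values that are still text. The sketch below is plain Python and does not use pycassa; the `rows_by_cf` layout, the function names, and the sample data are all hypothetical, and it assumes the columns have no validator and therefore need UTF-8 byte strings. It is written for Python 3, where text is `str` (on Python 2 the check would be against `unicode`).

```python
def find_unencoded(rows_by_cf):
    """Return (column family, row key, column) for every text value.

    rows_by_cf maps a column family name to {row_key: {column: value}}.
    This layout is a stand-in for illustration, not a pycassa structure.
    """
    problems = []
    for cf_name, rows in sorted(rows_by_cf.items()):
        for row_key, columns in sorted(rows.items()):
            for column, value in sorted(columns.items()):
                # On Python 3, text is `str`; on Python 2 it would be `unicode`.
                if isinstance(value, str):
                    problems.append((cf_name, row_key, column))
    return problems


def encode_all(rows_by_cf):
    """Return a copy with every text value encoded as UTF-8 bytes."""
    return {
        cf_name: {
            row_key: {
                column: value.encode("utf-8") if isinstance(value, str) else value
                for column, value in columns.items()
            }
            for row_key, columns in rows.items()
        }
        for cf_name, rows in rows_by_cf.items()
    }


# Hypothetical data: one column family was fixed, another was missed.
pending = {
    "mentions": {"row1": {"text": u"\xbfhola?".encode("utf-8")}},
    "timeline": {"row1": {"text": u"\xbfhola?"}},
}

print(find_unencoded(pending))
print(find_unencoded(encode_all(pending)))
```

Running the check over every column family in one place, rather than fixing each insert path separately, makes it harder for one forgotten column family to fail the whole batch.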

Tyler Hobbs

May 22, 2013, 8:10:04 PM
to pycassa...@googlegroups.com
Ah, interesting.  Thanks for sharing your fix.

--
Tyler Hobbs
DataStax