...
File "/pegasus/code/current/django/core/cache/backends/memcached.py" in set
48. self._cache.set(key, value, timeout or self.default_timeout)
File "/usr/lib/python2.5/site-packages/memcache.py" in set
305. return self._set("set", key, val, time)
File "/usr/lib/python2.5/site-packages/memcache.py" in _set
328. fullcmd = "%s %s %d %d %d\r\n%s" % (cmd, key, flags, time, len(val), val)
UnicodeDecodeError at /
'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
What's going on here is that the memcache.py library does this with
the passed parameters:
fullcmd = "%s %s %d %d %d\r\n%s" % (cmd, key, flags, time, len(val), val)
Since "key" is often a unicode string, it infects, as it were, the
rest of the line, forcing "val" to be encoded, then decoded.
It may be that only the memcache backend has this problem, but the
general solution I'd suggest is to use smart_str on the key given to
each low-level cache's backend set method. Works-for-me.
It may also make sense to run on the value, but I imagine that has a
significant overhead, and I haven't had a problem with it yet....
To be clear, if this is accepted as a solution, I'm happy to make a
ticket and patch.
I thought I understood the problem until I read this sentence. Now my
brain hurts. I fully understand that the whole string is treated as
Unicode as soon as one argument is Unicode. Why is "val" the problem
here then? What sort of object is "val" and why doesn't unicode(val)
work (aah ... is it going via str(val) and val is non-ASCII? That could
do it).
The error in the traceback suggests it is trying to treat something
*not* as Unicode. I'm a little fuzzy on what's going on.
> It may be that only the memcache backend has this problem, but the
> general solution I'd suggest is to use smart_str on the key given to
> each low-level cache's backend set method. Works-for-me.
Hasn't actually occurred to me to check previously: can memcache handle
non-ASCII data there, because even converting to UTF-8 is going to give
values that are not always understandable to the ascii codec.
> It may also make sense to run on the value, but I imagine that has a
> significant overhead, and I haven't had a problem with it yet....
Assuming the missing key part of this sentence is force_unicode(), it
should not really be worse than running smart_str() (about one extra
function call), at first glance. However, as indicated above, I'll
admit to being sketchy about the real problem still.
If you can guarantee that str(val) will always make sense and be encoded
as UTF-8, then your proposed solution sounds fine. The encoding of
str(val) is important, because we have to be able to understand it when we
pull it out from the cache again later.
Regards,
Malcolm
--
Works better when plugged in.
http://www.pointy-stick.com/blog/
Sorry for not giving more context.
In that quoted line, cmd is a str (created by the library itself), key
is whatever the low-level django API passes in (very likely a
Unicode), and val is a pickled object (that is, arbitrary binary).
When key is Unicode, it forces val to be decoded into Unicode, which
fails, since it's a binary.
At least, I'm pretty darn sure. I *think* I understand this bit-pushing. :)
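As a rough illustration of that failure: Python 3 no longer performs the implicit coercion, so the sketch below spells out the ASCII decode that Python 2 applied to val once key was unicode. The helper name py2_style_format is hypothetical and is not part of python-memcache.

```python
import pickle


def py2_style_format(key, val):
    """Emulate Python 2's "%s %s" % (key, val) when key is unicode.

    Python 2 promoted the whole result to unicode and implicitly decoded
    every byte-string argument with the default 'ascii' codec.
    """
    if isinstance(val, bytes):
        val = val.decode("ascii")  # the implicit step that fails
    return "%s %s" % (key, val)


# A pickled value, as the cache backend would hand it to memcache.py.
val = pickle.dumps({"a": 1}, 2)

try:
    py2_style_format("some-key", val)
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

Pickle protocol 2 output begins with the byte 0x80, which is why the decode fails at position 0, as in the traceback.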
> Hasn't actually occurred to me to check previously: can memcache handle
> non-ASCII data there, because even converting to UTF-8 is going to give
> values that are not always understandable to the ascii codec.
>
/me checks python-memcache code.
python-memcache assumes a str key with no control characters
(ord(c) >= 33) and len(key) < 250.
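Those two constraints can be sketched as a small check. This is a Python 3 illustration of the rule as stated here, not python-memcache's own validation code, and key_is_acceptable is a hypothetical name.

```python
def key_is_acceptable(key: bytes) -> bool:
    """Check a byte-string key against the two constraints quoted above."""
    # Iterating over bytes in Python 3 yields integers, so c plays the
    # role of ord(c) in the Python 2 description.
    return len(key) < 250 and all(c >= 33 for c in key)


plain = key_is_acceptable(b"views.index")
spaced = key_is_acceptable(b"has space")               # space is byte 32
too_long = key_is_acceptable(b"k" * 250)
utf8 = key_is_acceptable("caf\u00e9".encode("utf-8"))  # non-ASCII, UTF-8 encoded
```

Under this rule a UTF-8 encoded key passes, because every byte of a non-ASCII character is 0x80 or higher.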
The stored value can be any object, but there are a few optimizations.
This is how the marshalling is done:
    if isinstance(val, types.StringTypes):
        pass
    elif isinstance(val, int):
        flags |= Client._FLAG_INTEGER
        val = "%d" % val
    elif isinstance(val, long):
        flags |= Client._FLAG_LONG
        val = "%d" % val
    else:
        flags |= Client._FLAG_PICKLE
        val = pickle.dumps(val, 2)
    fullcmd = "%s %s %d %d %d\r\n%s" % (cmd, key, flags, time, len(val), val)
The result, fullcmd, is then sent over the wire.
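To make the role of the flags concrete, here is a simplified Python 3 sketch of that marshalling step together with a matching inverse. The flag values and the names marshal and unmarshal are assumptions for illustration only; Python 3 has no separate long type, so that branch is dropped.

```python
import pickle

# Assumed flag values, for illustration only.
FLAG_PICKLE = 1 << 0
FLAG_INTEGER = 1 << 1


def marshal(val):
    """Turn a value into (flags, bytes), mirroring the branches above."""
    flags = 0
    if isinstance(val, bytes):
        pass  # byte strings are stored untouched
    elif isinstance(val, int):
        flags |= FLAG_INTEGER
        val = b"%d" % val
    else:
        flags |= FLAG_PICKLE
        val = pickle.dumps(val, 2)
    return flags, val


def unmarshal(flags, data):
    """Inverse of marshal: the flags say how to read the bytes back."""
    if flags & FLAG_INTEGER:
        return int(data)
    if flags & FLAG_PICKLE:
        return pickle.loads(data)
    return data


flags, data = marshal({"a": [1, 2]})
restored = unmarshal(flags, data)
```

The stored payload is only ever bytes; anything that is not already a byte string or an integer goes through pickle and comes back through pickle.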
So, my assertion is that key is the only possible unicode value, and
that it better be coercible to str using sys.getdefaultencoding(),
because otherwise the string format will die.
cmd, flags, time, len(val), and val must all be str or unicode (it's
odd that they have StringTypes there, when they clearly don't handle a
Unicode value in the general sense).
My understanding is that smart_str forces a unicode value to str using
encoding='utf-8', and is a no-op when passed a str.
I want to make sure that all parameters there are str; I'm pretty
confident "key" is the only non-str object.
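A minimal Python 3 sketch of that idea, where encoding the key first keeps every piece of the command a byte string. smart_bytes_sketch only mimics the behaviour described above and is not Django's smart_str; build_set_command is likewise a hypothetical stand-in for the formatting line in memcache.py.

```python
def smart_bytes_sketch(s, encoding="utf-8"):
    """Encode text to bytes; pass byte strings through unchanged."""
    if isinstance(s, bytes):
        return s
    return str(s).encode(encoding)


def build_set_command(key, val, flags=0, time=0):
    """Build a "set" command from a key of either type and a bytes value."""
    key = smart_bytes_sketch(key)  # force the key to bytes up front
    header = b"set %s %d %d %d\r\n" % (key, flags, time, len(val))
    return header + val  # val is never decoded or re-encoded


cmd = build_set_command("caf\u00e9", b"\x80\x02binary")
same = build_set_command(b"caf\xc3\xa9", b"\x80\x02binary")
```

The value is passed through byte for byte; only the key is touched.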
> The encoding of
> str(val) is important, because we have to be able to understand it when we
> pull it out from the cache again later.
I agree, but I don't want to mess with val; I want to force encoding of "key".
Clearer?
Okay. That makes things clearer. Memcache is expecting to handle val as
an opaque sequence of bytes here (they are using the binary pickling
format), which is the key point. So your proposed fix looks right to me.
> When key is Unicode, it forces val to be decoded into Unicode, which
> fails, since it's a binary.
Yes, I agree.
Regards,
Malcolm
When running the unicode version of Django, the value returned from the
database was valid. However, when running the non-unicode version of
Django, it'd blow up in my face.
Trying to do a .set with bytestrings that contain non-ASCII char
values doesn't work. I had to do a .encode('UTF-8') on the string I
was attempting to push into memcached, and likewise on pulling it
back out I had to do a .decode('UTF-8').
When you do Unicode.encode('utf-8'), you are creating a bytestring
with non-ascii char values. I'm not sure how your statements can be
simultaneously true.
At any rate, I'll get a patch done soon.
On Jul 12, 11:24 am, "Jeremy Dunck" <jdu...@gmail.com> wrote:
PythonHandler django.core.handlers.modpython:
MemcachedStringEncodingError: Keys must be str()'s, not unicode.
Convert your unicode strings using mystring.encode(charset)!
There are a few patches there which force the keys to ASCII, but this
may not be the best solution.
--Simon
I've attached my patch and tests to that ticket.