Bug: Underscores in primary keys and quote/unquote...

jedie

Jun 26, 2007, 4:30:10 AM
to Django developers

I have a model class like this:
----------------------------------------------------------------
class PagesInternal(models.Model):
    name = models.CharField(primary_key=True, maxlength=150)
    ...
----------------------------------------------------------------

And my names (the primary keys) contain underscores, like this:
"page_admin.edit_page"

I don't use an ID for the primary key, because I address the entries
by their names.

I'm using the unicode branch and get an error when I edit an entry in
this model with the Django admin panel:
----------------------------------------------------------------
Traceback (most recent call last):
File "./django/core/handlers/base.py" in get_response
72. response = middleware_method(request, callback, callback_args, callback_kwargs)
File "./PyLucid/middlewares/pagestats.py" in process_view
45. response = view_func(request, *view_args, **view_kwargs)
File "./django/contrib/admin/views/decorators.py" in _checklogin
55. return view_func(request, *args, **kwargs)
File "./django/views/decorators/cache.py" in _wrapped_view_func
39. response = view_func(request, *args, **kwargs)
File "./django/contrib/admin/views/main.py" in change_stage
324. manipulator = model.ChangeManipulator(object_id)
File "./django/db/models/manipulators.py" in __init__
261. self.original_object = self.manager.get(pk=obj_key)
File "./django/db/models/manager.py" in get
73. return self.get_query_set().get(*args, **kwargs)
File "./django/db/models/query.py" in get
258. obj_list = list(clone)
File "./django/db/models/query.py" in __iter__
111. return iter(self._get_data())
File "./django/db/models/query.py" in _get_data
478. self._result_cache = list(self.iterator())
File "./django/db/models/query.py" in iterator
186. cursor.execute("SELECT " + (self._distinct and "DISTINCT " or "") + ",".join(select) + sql, params)
File "./django/db/backends/util.py" in execute
23. 'sql': smart_unicode(sql) % convert_args(params),
File "./django/db/backends/util.py" in convert_args
50. return tuple([to_unicode(val) for val in args])
File "./django/db/backends/util.py" in <lambda>
48. to_unicode = lambda s: force_unicode(s, strings_only=True)
File "./django/utils/encoding.py" in force_unicode
42. s = unicode(s, encoding, errors)

UnicodeDecodeError at /_admin/PyLucid/pagesinternal/page_admin.edit_page/
'utf8' codec can't decode byte 0xad in position 4: unexpected code byte
----------------------------------------------------------------

But I think this is not a real UnicodeDecodeError... It's a problem
with the quote()/unquote() routines in
django.contrib.admin.views.main

The string before unquote() is..: page_admin.edit_page
The string after unquote() is...: page\ufffdmin.edit_page


In a local test, it seems to work fine:

----------------------------------------------------------------
from django.contrib.admin.views.main import quote, unquote

TEST_STRING = "page_admin.edit_page"

q = quote(TEST_STRING)
print "quote():", q
print "unquote():", unquote(q)
print
print "unquote()2:", unquote(TEST_STRING)
----------------------------------------------------------------


output:
----------------------------------------------------------------
quote(): page_5Fadmin.edit_5Fpage
unquote(): page_admin.edit_page

unquote()2: page\ufffdmin.edit_page
----------------------------------------------------------------


But in my case, the real input for unquote() is not the quoted string
"page_5Fadmin.edit_5Fpage"!
It is the unquoted one: "page_admin.edit_page"
And then unquote() breaks the string into: "page\ufffdmin.edit_page"

Is this a Django bug?

--
Regards,

Jens Diemer


----
A django powered CMS: http://www.pylucid.org

Malcolm Tredinnick

Jun 26, 2007, 7:53:32 AM
to django-d...@googlegroups.com
On Tue, 2007-06-26 at 01:30 -0700, jedie wrote:
>
> I have a model class like this:
> ----------------------------------------------------------------
> class PagesInternal(models.Model):
>     name = models.CharField(primary_key=True, maxlength=150)
>     ...
> ----------------------------------------------------------------
>
> And my names (the primary keys) contain underscores, like this:
> "page_admin.edit_page"
>
> I don't use an ID for the primary key, because I address the entries
> by their names.
>

[...]

Except this isn't what your test produces. When I run that test program
against the Unicode branch, I get the same result as against trunk:
unquote(TEST_STRING) returns 'page\xadmin.edit_page'. And when Python
tries to interpret that as a UTF-8 string, it hits the illegal byte
\xad, giving the UnicodeDecodeError it reports. You are seeing \ufffd in
your output because of some terminal or Python shell settings you have
in effect (\ufffd is the Unicode "illegal character" replacement
codepoint). Have a look at the results of repr(unquote(TEST_STRING)) to
see the real data being passed around by Django.
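
The difference between the raw \xad byte and the \ufffd replacement
character can be reproduced in isolation. This is only a minimal sketch in
Python 3 syntax (the thread itself used Python 2); the byte string literal
is an assumption that stands in for what unquote() returned:

```python
# Assumed stand-in for the value unquote() returned for the unquoted key.
raw = b"page\xadmin.edit_page"

# Strict decoding fails, because 0xad cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Lenient decoding swaps the bad byte for U+FFFD, the replacement character.
replaced = raw.decode("utf-8", errors="replace")

print(repr(raw))
print(decode_error)
print(repr(replaced))
```

So a display layer that decodes leniently shows \ufffd, while the data
itself still contains the single byte \xad.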

> But, in my case, the real input for unquote() is not the quoted one
> like "page_5Fadmin.edit_5Fpage"!
> It is the non-quoted one like: "page_admin.edit_page"

So the real question is why unquote() is being called on that string,
since unquote() should only ever be called on strings that have been run
through quote() previously.

Since the crash is inside change_stage() in the same file, try to
work out why the wrong string is being passed in there. This should
be just pieces of input captured from the URL (via admin/urls.py), so
this suggests that something is creating the wrong URL (and I thought we
would have noticed that previously: it's why quote() and unquote() were
written in the first place).

It might also be possible that you have just lucked onto the right piece
of data that demonstrates the problem. The unquote() function is
resilient to bad input and it's only by accident that the first two
characters of the word "admin" happen to be valid hex digits, so "_ad" can
be treated as an escaped sequence.
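
To make the "_ad" accident concrete, here is a rough sketch of an
underscore-escaping scheme of this kind. It is modelled on the behaviour
shown in this thread and is not copied from
django.contrib.admin.views.main; it is written for Python 3, so the decoded
character is the code point U+00AD rather than a raw byte:

```python
def quote(s):
    """Escape ':', '/' and '_' as '_' plus two uppercase hex digits."""
    return "".join("_%02X" % ord(c) if c in ":/_" else c for c in s)


def unquote(s):
    """Reverse quote(); anything that is not a valid escape is kept as-is."""
    parts = s.split("_")
    result = [parts[0]]
    for item in parts[1:]:
        hex_pair = item[:2]
        try:
            if len(hex_pair) < 2:
                raise ValueError("too short to be an escape")
            # Raises ValueError when hex_pair is not a hex number.
            result.append(chr(int(hex_pair, 16)) + item[2:])
        except ValueError:
            # Not an escape after all: put the underscore back.
            result.append("_" + item)
    return "".join(result)


key = "page_admin.edit_page"
print(quote(key))                 # escaped form, safe to unquote
print(repr(unquote(quote(key))))  # round trip gives the key back
print(repr(unquote(key)))         # unquoted input: "_ad" is decoded by accident
```

In this sketch "_page" survives because "pa" is not hex, while "_admin"
loses its underscore and first two letters because "ad" is.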

So there might be a bug here, but a bit more poking around is required.
Try to figure out why an unquoted URL fragment is being passed to admin
in the first place (well, first check that the object_id being passed to
change_stage() really is unquoted already and work backwards from there
-- why is it?).

Regards,
Malcolm

--
How many of you believe in telekinesis? Raise my hand...
http://www.pointy-stick.com/blog/

jedie

Jun 26, 2007, 9:29:43 AM
to Django developers
On 26 Jun., 13:53, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

> Since the crash is inside change_stage() in the same file, try to
> work out why the wrong string is being passed in there. This should
> be just pieces of input captured from the URL (via admin/urls.py), so
> this suggests that something is creating the wrong URL (and I thought we
> would have noticed that previously: it's why quote() and unquote() were
> written in the first place).

No, I think the URL is OK.
It's the combination "_ad": the primary key "TEST_adTEST"
produces the same traceback.

In /django/views/decorators/cache.py in _wrapped_view_func the string
is still OK:
-----------------------------------------------------------------------
Line 39. | response = view_func(request, *args, **kwargs)
-----------------------------------------------------------------------
Local var "args" is -> ('PyLucid', 'pagesinternal', 'TEST_adTEST')

In the next frame, /django/contrib/admin/views/main.py in change_stage,
the string is broken:
-----------------------------------------------------------------------
Line 322. | manipulator = model.ChangeManipulator(object_id)
-----------------------------------------------------------------------
Local var "object_id" is -> 'TEST\xadTEST'

change_stage() always unquotes the object_id, in line 310!
And repr(unquote("TEST_adTEST")) is -> 'TEST\xadTEST'
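
A quick way to see which raw keys are at risk is to look for an underscore
followed by two hex digits. This is only a sketch: the helper name is
hypothetical (it is not part of Django), and it assumes unquote() decodes
exactly that pattern, which matches the results above:

```python
import re

# Assumption: unquote() treats "_" followed by two hex digits as an escape.
ESCAPE_LOOKALIKE = re.compile(r"_[0-9A-Fa-f]{2}")


def accidental_escapes(raw_key):
    """Return the parts of an unquoted key that unquote() would decode."""
    return ESCAPE_LOOKALIKE.findall(raw_key)


for key in ("TEST_adTEST", "page_admin.edit_page", "my_page"):
    print(key, accidental_escapes(key))
```

Keys with an empty result pass through unquote() unchanged even when they
were never quoted, which would explain why the problem shows up so rarely.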


Note: I'm using the unicode branch. Is the error related to this
branch?

I implemented a silly workaround (renaming the primary keys on the fly):
http://pylucid.net/trac/changeset/1110

Malcolm Tredinnick

Jun 26, 2007, 9:37:51 AM
to django-d...@googlegroups.com
On Tue, 2007-06-26 at 06:29 -0700, jedie wrote:
> On 26 Jun., 13:53, Malcolm Tredinnick <malc...@pointy-stick.com>
> wrote:
> > Since the crash is inside change_stage() in the same file, try to
> > work out why the wrong string is being passed in there. This should
> > be just pieces of input captured from the URL (via admin/urls.py), so
> > this suggests that something is creating the wrong URL (and I thought we
> > would have noticed that previously: it's why quote() and unquote() were
> > written in the first place).
>
> No, I think the URL is OK.
> It's the combination "_ad": the primary key "TEST_adTEST"
> produces the same traceback.

When I asked "is the URL okay", I meant, has it been passed through
quote() correctly. Because unquote() is only meant to be run on things
that have been passed through quote() and because any primary key value
is meant to be passed through quote() before it goes into an admin URL,
something is going wrong in the URL generation phase.

I realise it's the combination of "_ad" that is the problem. That was in
my original mail. The problem is to work out *why* the string "_ad" is
being permitted into the URL in the first place. Why isn't it going
through quote()? This might take some tracking down. It's probably a bug
in Django, but any investigation you can do before filing a ticket will
save somebody else some time later.

There should be no difference between the Unicode branch and trunk in
that behaviour, I would have thought, but file the ticket against the
Unicode branch anyway, in case it does make a difference.

Malcolm

--
The hardness of butter is directly proportional to the softness of the
bread.
http://www.pointy-stick.com/blog/

jedie

Jun 29, 2007, 2:28:05 AM
to Django developers
I don't know how to track this bug down, so I created a ticket for it:
http://code.djangoproject.com/ticket/4725
