Help with request encoding (again)


John M

May 29, 2008, 3:54:24 AM
to Django users
I've run into some more trouble with what I think is an encoding
issue.

I am writing a BitTorrent tracker, and one of the GET parameters that
is passed is called info_hash, which is a lengthy escaped hex
string, for example:

GET /announce?info_hash=%EByXm%C5%7EmfD%D8%9A%91%D4%F7%C7%86%C7%D18%A7

The Django / Python code I am using to decode this is:

request.encoding = "iso-8859-1"
hash = request.GET['info_hash'].encode('iso-8859-1').encode("hex")

But when I print the hash field, I get this:

efbfbd6defbfbd6d6644d89aefbfbdefbfbdc786efbfbd38efbfbd

where it should be

eb79586dc57e6d6644d89a91d4f7c786c7d138a7

I had it working for a while, but I'm not sure what changed to 'break'
this.

Any help would be greatly appreciated

Karen Tracey

May 29, 2008, 10:04:28 AM
to django...@googlegroups.com

First, the code you've included doesn't produce the answer you show, so I'm a little confused about what your code is really doing.  In a python shell:

>>> s = '%EByXm%C5%7EmfD%D8%9A%91%D4%F7%C7%86%C7%D18%A7'
>>> s.encode('iso8859-1').encode('hex')
'25454279586d2543352537456d664425443825394125393125443425463725433725383625433725443138254137'

Where the 'efbfbd...' is coming from I don't understand.  But doing anything with an encoding like iso8859-1 isn't what you want to do anyway.  You just want to undo the percent-encoding and convert (encode) into the hex representation, using just the plain ASCII values for anything that wasn't percent-encoded:

>>> urllib.unquote(s).encode('hex')
'eb79586dc57e6d6644d89a91d4f7c786c7d138a7'

so try:
 
import urllib
hash = urllib.unquote(request.GET['info_hash']).encode('hex')

Karen
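
On Python 3, str.encode('hex') and urllib.unquote no longer exist, but the same unquote-then-hex step can be sketched with urllib.parse.unquote_to_bytes and bytes.hex(). This is only an illustration using the sample value from this thread; the variable names are made up for the example.

```python
from urllib.parse import unquote_to_bytes

# Percent-encoded info_hash exactly as it appears in the announce URL.
quoted = "%EByXm%C5%7EmfD%D8%9A%91%D4%F7%C7%86%C7%D18%A7"

# Undo the percent-encoding straight to bytes; no text encoding is involved.
raw = unquote_to_bytes(quoted)

# Hex representation of the raw bytes.
info_hash_hex = raw.hex()
print(info_hash_hex)
```

Going straight to bytes sidesteps the question of which character encoding to use, since the info_hash is binary data rather than text.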

John M

May 29, 2008, 10:16:04 AM
to Django users
Karen,

Thanks for the response, but I'm unable to unquote the strings as
django does it automatically for me when I get the GET list entries.

I've run it from Django with an assert False inserted so I can show
you the debug output.

Here's the URL that was entered:
http://127.0.0.1:8000/announce/?info_hash=%EByXm%C5%7EmfD%D8%9A%91%D4%F7%C7%86%C7%D18%A7&peer_id=-TR1210-1q4q9aomcne9&port=51413&uploaded=0&downloaded=0&corrupt=0&left=0&compact=1&numwant=80&key=jda2jjrygr&event=started

Here's the URL handler
def view_announce(request):
    # request.encoding = "iso-8859-1"
    request.encoding = "latin-1"

    # client information
    c = {}

    try:
        # If the client explicitly specifies its preferred IP address, then
        # use that. Else just take the address.
        c['addr'] = request.GET.get('ip', None) or request.META['REMOTE_ADDR']
        c['port'] = int(request.GET['port'])
        c['peer_id'] = request.GET['peer_id']

        # JFM, hack around unicode issues with info_hash in django
        hash = request.GET['info_hash'].encode('iso8859-1')

    except KeyError:
        # If any of these is not defined, then it makes little sense to
        # proceed.
        return _fail("""One of the following is missing: IP, Port, Peer_ID or info_hash""")

    print "Got all keys..."
    print "hash " + hash

    assert False

Here's the dump from Assert false:
Request information

GET
Variable     Value
uploaded     u'0'
compact      u'1'
numwant      u'80'
info_hash    u'\xebyXm\xc5~mfD\xd8\x9a\x91\xd4\xf7\xc7\x86\xc7\xd18\xa7'
event        u'started'
downloaded   u'0'
key          u'jda2jjrygr'
corrupt      u'0'
peer_id      u'-TR1210-1q4q9aomcne9'
port         u'51413'
left         u'0'

POST
No POST data

COOKIES
Variable     Value
sessionid    'f356ef76d6c20c3be39aec3d29ff023a'

META
Variable                    Value
Apple_PubSub_Socket_Render  '/tmp/launch-1GTNEW/Render'
COMMAND_MODE                'unix2003'
CONTENT_LENGTH              ''
CONTENT_TYPE                'text/plain'
DISPLAY                     '/tmp/launch-eJizQ9/:0'
DJANGO_SETTINGS_MODULE      'tracker.settings'
GATEWAY_INTERFACE           'CGI/1.1'
HOME                        '/Users/John'
HTTP_ACCEPT                 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5'
HTTP_ACCEPT_ENCODING        'gzip, deflate'
HTTP_ACCEPT_LANGUAGE        'en-us'
HTTP_CACHE_CONTROL          'max-age=0'
HTTP_CONNECTION             'keep-alive'
HTTP_COOKIE                 'sessionid=f356ef76d6c20c3be39aec3d29ff023a'
HTTP_HOST                   '127.0.0.1:8000'
HTTP_USER_AGENT             'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_2; en-us) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.1 Safari/525.18'
LANG                        'en_US.UTF-8'
LOGNAME                     'John'
MANPATH                     '/usr/share/man:/usr/local/share/man:/usr/X11/man'
OLDPWD                      '/Users/John/development'
PATH                        '/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin'
PATH_INFO                   '/announce/'
PWD                         '/Users/John/development/tracker'
QUERY_STRING                'info_hash=%EByXm%C5%7EmfD%D8%9A%91%D4%F7%C7%86%C7%D18%A7&peer_id=-TR1210-1q4q9aomcne9&port=51413&uploaded=0&downloaded=0&corrupt=0&left=0&compact=1&numwant=80&key=jda2jjrygr&event=started'
REMOTE_ADDR                 '127.0.0.1'
REMOTE_HOST                 ''
REQUEST_METHOD              'GET'
RUN_MAIN                    'true'
SCRIPT_NAME                 ''
SECURITYSESSIONID           '6939d0'
SERVER_NAME                 'localhost'
SERVER_PORT                 '8000'
SERVER_PROTOCOL             'HTTP/1.1'
SERVER_SOFTWARE             'WSGIServer/0.1 Python/2.5.1'
SHELL                       '/bin/bash'
SHLVL                       '1'
SSH_AUTH_SOCK               '/tmp/launch-qjIQx5/Listeners'
TERM                        'xterm-color'
TERM_PROGRAM                'Apple_Terminal'
TERM_PROGRAM_VERSION        '240'
TMPDIR                      '/var/folders/v-/v-IkkU9PHbeIn98mRd6ZM++++TI/-Tmp-/'
TZ                          'America/Los_Angeles'
USER                        'John'
_                           './manage.py'
__CF_USER_TEXT_ENCODING     '0x1F5:0:0'
wsgi.errors                 <open file '<stderr>', mode 'w' at 0x170b0>
wsgi.file_wrapper           <class 'django.core.servers.basehttp.FileWrapper'>
wsgi.input                  <socket._fileobject object at 0x6d5bb0>
wsgi.multiprocess           False
wsgi.multithread            True
wsgi.run_once               False
wsgi.url_scheme             'http'
wsgi.version                (1, 0)



john
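
The info_hash value in the GET dump above contains only code points below U+0100, which is what decoding the raw bytes as latin-1 would produce. Assuming that is what happened here, encoding the string back to latin-1 should recover the original bytes, because latin-1 maps each byte 0-255 to the code point with the same number. A minimal Python 3 sketch (the thread itself uses Python 2, and the variable names are illustrative):

```python
# The info_hash value as it appears in the GET dump above.
decoded = "\xebyXm\xc5~mfD\xd8\x9a\x91\xd4\xf7\xc7\x86\xc7\xd18\xa7"

# latin-1 maps code points 0-255 one-to-one onto bytes, so this
# recovers the bytes that were percent-encoded in the URL.
raw = decoded.encode("latin-1")

info_hash_hex = raw.hex()
print(info_hash_hex)
```

For this particular request, the result matches the expected hash from the first message, which suggests the latin-1 approach can work as long as the bytes arrive intact.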

Karen Tracey

May 29, 2008, 12:00:38 PM
to django...@googlegroups.com
On Thu, May 29, 2008 at 10:16 AM, John M <retire...@gmail.com> wrote:

> Karen,
>
> Thanks for the response, but I'm unable to unquote the strings as
> django does it automatically for me when I get the GET list entries.

Oh right, sigh, Django has already helpfully done the unquoting and converted to unicode for the request.GET dictionary.  So, I'd go back to the raw query string (accessible as request.META['QUERY_STRING']) and use Python's cgi.parse_qs to do the parsing/unquoting but not the unicode step -- instead do the hex encoding on what parse_qs returns:

from cgi import parse_qs
hash = parse_qs(request.META['QUERY_STRING'])['info_hash'][0].encode('hex')

Karen
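
In Python 3, parse_qs lives in urllib.parse and returns text rather than byte strings, so the hex step above does not carry over directly. One possible equivalent is to split the raw query string by hand and unquote the wanted value to bytes. The helper name raw_query_param below is hypothetical, not part of Django or the standard library, and the sketch assumes simple parameter names that are not themselves percent-encoded.

```python
from urllib.parse import unquote_to_bytes


def raw_query_param(query_string, name):
    """Return the first value of `name` as raw bytes, or None if absent.

    Hypothetical helper: assumes '&'-separated pairs and plain ASCII keys.
    """
    for pair in query_string.split("&"):
        key, _, value = pair.partition("=")
        if key == name:
            # '+' stands for a space in query strings; convert before unquoting.
            return unquote_to_bytes(value.replace("+", " "))
    return None


# Query string taken from the debug dump earlier in the thread.
query_string = (
    "info_hash=%EByXm%C5%7EmfD%D8%9A%91%D4%F7%C7%86%C7%D18%A7"
    "&peer_id=-TR1210-1q4q9aomcne9&port=51413&uploaded=0&downloaded=0"
    "&corrupt=0&left=0&compact=1&numwant=80&key=jda2jjrygr&event=started"
)

info_hash = raw_query_param(query_string, "info_hash")
print(info_hash.hex())
```

In a Django view the input would be request.META['QUERY_STRING'], as suggested above.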

John M

May 29, 2008, 1:16:08 PM
to Django users
I'll give that a try, but it seems like I'm working against the
framework, doesn't it?

I could swear I was able to set the encoding, GET the fields I wanted
and then do it another way, but I need to move on in my coding, I hope
this works.

J

Karen Tracey

May 29, 2008, 2:15:09 PM
to django...@googlegroups.com
On Thu, May 29, 2008 at 1:16 PM, John M <retire...@gmail.com> wrote:
> I'll give that a try, but it seems like I'm working against the
> framework, doesn't it?

Nah, working against the framework would be rooting around in undocumented internal data structures to get the data you are looking for.  HttpRequest.META['QUERY_STRING'] is documented (http://www.djangoproject.com/documentation/request_response/#attributes) and the rest is just standard Python library code.
 
> I could swear I was able to set the encoding, GET the fields I wanted
> and then do it another way, but I need to move on in my coding, I hope
> this works.

There may be some way to undo the unicodification but it isn't obvious to me.  Fact is you don't want this query parameter turned into unicode, you want the hex representation of what's essentially a byte string, represented in the URL as a mix of ASCII and percent-encoded octets.  For that I think it is more straightforward to start with the raw query string vs. attempting to reverse what the framework did in constructing the request.GET dictionary.  But that's just my opinion, of course.

Karen
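
The "mix of ASCII and percent-encoded octets" can be illustrated from the client's side: a 20-byte hash is percent-encoded for the URL, and the tracker has to reverse exactly that step. A small Python 3 round-trip sketch follows; the variable names are illustrative, and the set of characters a real client leaves unescaped may differ from what quote_from_bytes produces.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# The expected info_hash from the first message, as raw bytes.
raw = bytes.fromhex("eb79586dc57e6d6644d89a91d4f7c786c7d138a7")

# Unreserved ASCII bytes (letters, digits, a few symbols) pass through
# unchanged; every other byte becomes %XX.
quoted = quote_from_bytes(raw, safe="")
print(quoted)

# Unquoting to bytes reverses the step without any text decoding.
restored = unquote_to_bytes(quoted)
print(restored == raw)
```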

John M

May 29, 2008, 2:36:47 PM
to Django users
Well, either way, thank you so much for helping me on this.

The more I dive into the framework and, more so, Python, the more I
know I've made the right choice.

John

John M

May 29, 2008, 11:52:45 PM
to Django users
Well, I thought it was working, but alas, now it's not, let me give
you the latest...

GET /announce/?uploaded=0&compact=1&numwant=80&info_hash=%EF%BF%BDm%EF%BF%BDmfD%D8%9A%EF%BF%BD%EF%BF%BD%C7%86%EF%BF%BD8%EF%BF%BD&event=started&downloaded=0&key=yoolrpcyku&corrupt=0&peer_id=-TR1210-exbira35le7h&port=51413&left=0

In the code:

print "Debug"
print "query string " + parse_qs(request.META['QUERY_STRING'])['info_hash'][0]
hash = parse_qs(request.META['QUERY_STRING'])['info_hash'][0].encode('hex')
assert False

The output from the print statements:
Debug
query string parseqs�m�mfDؚ��dž�8�

LOCAL Variables
c
{'addr': '127.0.0.1', 'peer_id': u'-TR1210-exbira35le7h', 'port': 51413}
hash
'efbfbd6defbfbd6d6644d89aefbfbdefbfbdc786efbfbd38efbfbd'


and the debug screen snippet

QUERY_STRING
'uploaded=0&compact=1&numwant=80&info_hash=%EF%BF%BDm%EF%BF%BDmfD%D8%9A%EF%BF%BD%EF%BF%BD%C7%86%EF%BF%BD8%EF%BF%BD&event=started&downloaded=0&key=yoolrpcyku&corrupt=0&peer_id=-TR1210-exbira35le7h&port=51413&left=0'

Any help would be great!

John

Karen Tracey

May 30, 2008, 12:13:26 AM
to django...@googlegroups.com
What's wrong with the hash it is calculating?  Lining up the %-encoded query string with the hash produced it looks like everything is correct:

http://dpaste.com/53697/

What is the 'correct' result supposed to be?  Perhaps I misunderstood exactly how you need to interpret info_hash?

Karen
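
One way to read the latest output: the hex string the view computes matches the query string byte for byte, so the parsing step appears to work, but the query string itself already contains %EF%BF%BD, which is the UTF-8 encoding of U+FFFD (the Unicode replacement character). That would suggest the original bytes were replaced before the request reached Django, though the thread does not establish where. A BitTorrent v1 info_hash is a 20-byte SHA-1 digest, so a length check is a cheap way to spot this kind of corruption. The following Python 3 sketch uses illustrative variable names:

```python
from urllib.parse import unquote_to_bytes

# info_hash exactly as it arrived in the latest QUERY_STRING.
mangled = (
    "%EF%BF%BDm%EF%BF%BDmfD%D8%9A%EF%BF%BD%EF%BF%BD"
    "%C7%86%EF%BF%BD8%EF%BF%BD"
)

raw = unquote_to_bytes(mangled)
print(raw.hex())

# The bytes are valid UTF-8, which a random 20-byte digest rarely is.
text = raw.decode("utf-8")

# Count replacement characters (U+FFFD), each stored as EF BF BD.
replaced = text.count("\ufffd")
print(replaced)

# A SHA-1 info_hash should be exactly 20 bytes long.
looks_valid = len(raw) == 20
print(len(raw), looks_valid)
```

Once a byte has been replaced by U+FFFD the original value is gone, so the fix has to happen wherever the replacement occurs rather than in the tracker's decoding code.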

John M

Jun 2, 2008, 4:01:33 PM
to Django users
Karen,

Thanks so much for keeping up on this. I'll run through the software
to find the correct answer. This is a side project, and my time comes
in bursts, sorry for the late reply.

John

John M

Jun 4, 2008, 12:55:33 PM
to Django users
Karen,

I think I found my problem, and added a new post, maybe you can help?

http://groups.google.com/group/django-users/browse_thread/thread/fc47edb1b9f8ec8f#

Thanks again, you've really helped me see this through

John