POST argument parsing problems


Brian Jones

Nov 19, 2010, 11:19:58 AM
to python-...@googlegroups.com
I'm trying to create a PyPI server using Tornado. I've done lots of other stuff with tornado, but this one is different in that I have to support easy_install, and it seems it's doing odd things with boundaries and header demarcation. That said, I don't know if what it's doing is invalid: maybe tornado needs to make fewer assumptions/be more forgiving? Hence this mail :) 

For reference, everything I'm going to mention is in the log output at the bottom. 

Problem 1 is that the POST from easy_install doesn't end up having anything in self.request.arguments, and I believe it definitely should. So I dumped the whole request and saw that it does specify a boundary, and generally the request looks like all of the requisite pieces are there. 

When I looked at how tornado parses boundaries, I found that it makes the assumption that the boundary is the last thing in the Content-Type header. Is this a valid assumption? It's not true in the case of easy_install. I changed the boundary definition code in httpserver.py from this: 

                    boundary = content_type.split('boundary=',1)[1]

To this: 

                    ctfields = content_type.split(';')
                    boundary = [x.split('=')[1] for x in ctfields if 'boundary' in x][0] 
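For comparison, here is a slightly more defensive sketch of the same idea. It is not Tornado's code: the helper name `extract_boundary` is made up for illustration. It assumes the Content-Type parameters are separated by ';' and that the boundary itself contains no ';' (a quoted boundary containing one would need a real header parser). Unlike the list comprehension above, it matches the parameter name exactly, tolerates a boundary value that contains '=', and strips optional double quotes.

```python
def extract_boundary(content_type):
    """Return the multipart boundary from a Content-Type value, or None.

    Hypothetical helper for illustration; not part of Tornado.
    """
    # Skip the media type itself; only the parameters after it matter.
    for param in content_type.split(";")[1:]:
        # partition() splits on the first '=' only, so a boundary value
        # that itself contains '=' is preserved intact.
        name, sep, value = param.strip().partition("=")
        if sep and name.strip().lower() == "boundary":
            value = value.strip()
            # The boundary parameter may be wrapped in double quotes.
            if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
                value = value[1:-1]
            return value
    return None


header = ("multipart/form-data; "
          "boundary=--------------GHSKFJDLGDS7543FJKLFHRE75642756743254; "
          "charset=utf-8")
boundary = extract_boundary(header)
```

With the Content-Type header from the log below, this returns the boundary without the trailing "; charset=utf-8".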

So after this change, I figured I'd be all set, but I still wind up with an empty self.request.arguments. So then I went to _parse_mime_body and found that it looks specifically for '\r\n' line endings to find where each part's headers end, and it looks to me like easy_install (which uses urllib) uses bare '\n', so the headers end with '\n\n'. Is this invalid HTTP? Should tornado check for both, or is easy_install doing things in an unacceptable way? I haven't gone back to the HTTP spec to look, to be honest.
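To make the question concrete, here is a minimal sketch of what "check for both" could look like for a single part. It is an illustration only, not how Tornado's _parse_mime_body works: `split_part` is a hypothetical name, and it assumes the part has already been cut out of the body and is held as a string.

```python
def split_part(part):
    """Split one multipart part into (header_text, body).

    Accepts either a CRLF blank line or a bare-LF blank line as the end
    of the headers, and uses whichever appears first in the part.
    Returns None when no blank line is found.
    Hypothetical helper for illustration; not part of Tornado.
    """
    candidates = []
    for sep in ("\r\n\r\n", "\n\n"):
        index = part.find(sep)
        if index != -1:
            candidates.append((index, sep))
    if not candidates:
        return None
    # Pick the earliest separator so a blank line inside the body
    # cannot be mistaken for the end of the headers.
    index, sep = min(candidates)
    return part[:index], part[index + len(sep):]


lf_part = 'Content-Disposition: form-data; name="license"\n\nUNKNOWN'
crlf_part = 'Content-Disposition: form-data; name="license"\r\n\r\nUNKNOWN'
lf_result = split_part(lf_part)
crlf_result = split_part(crlf_part)
```

Both the bare-LF form that easy_install sends and the CRLF form split into the same header text and the same body, "UNKNOWN".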

Advice here is welcome. I'd rather not do this parsing myself, and I'd rather not have to own my own version of either tornado or easy_install :( 



Here's the log output (which reflects my changes to httpserver.py noted above): 

[D 101119 11:06:30 httpserver:360] BOUNDARY: --------------GHSKFJDLGDS7543FJKLFHRE75642756743254
[D 101119 11:06:30 httpserver:380] data parts: ['\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="license"\n\nUNKNOWN\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="name"\n\nloghetti\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="metadata_version"\n\n1.0\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="author"\n\nBrian K. Jones\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="home_page"\n\nhttp://github.com/bkjones/loghetti\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name=":action"\n\nsubmit\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="download_url"\n\nUNKNOWN\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="summary"\n\nA log strainer for Apache combined format log files.\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="author_email"\n\nbkj...@gmail.com\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="version"\n\n0.90\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="platform"\n\nUNKNOWN\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="classifiers"\n\nDevelopment Status :: 4 - Beta\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="classifiers"\n\nIntended Audience :: Developers\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="classifiers"\n\nLicense :: OSI Approved :: MIT License\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="description"\n\nUNKNOWN\n-']
[W 101119 11:06:30 httpserver:385] multipart/form-data missing headers
[D 101119 11:06:30 mypi:42] FILES: {}
[D 101119 11:06:30 mypi:43] ARGUMENTS: {}
[D 101119 11:06:30 mypi:44] BODY: 
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="license"
    
    UNKNOWN
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="name"
    
    loghetti
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="metadata_version"
    
    1.0
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="author"
    
    Brian K. Jones
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="home_page"
    
    http://github.com/bkjones/loghetti
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name=":action"
    
    submit
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="download_url"
    
    UNKNOWN
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="summary"
    
    A log strainer for Apache combined format log files.
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="author_email"
    
    bkj...@gmail.com
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="version"
    
    0.90
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="platform"
    
    UNKNOWN
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="classifiers"
    
    Development Status :: 4 - Beta
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="classifiers"
    
    Intended Audience :: Developers
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="classifiers"
    
    License :: OSI Approved :: MIT License
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254
    Content-Disposition: form-data; name="description"
    
    UNKNOWN
    ----------------GHSKFJDLGDS7543FJKLFHRE75642756743254--
    
[D 101119 11:06:30 mypi:45] REQUEST: HTTPRequest(protocol='http', host='localhost:8888', method='POST', uri='/', version='HTTP/1.1', remote_ip='127.0.0.1', remote_ip='127.0.0.1', body='\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="license"\n\nUNKNOWN\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="name"\n\nloghetti\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="metadata_version"\n\n1.0\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="author"\n\nBrian K. Jones\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="home_page"\n\nhttp://github.com/bkjones/loghetti\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name=":action"\n\nsubmit\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="download_url"\n\nUNKNOWN\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="summary"\n\nA log strainer for Apache combined format log files.\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="author_email"\n\nbkj...@gmail.com\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="version"\n\n0.90\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="platform"\n\nUNKNOWN\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="classifiers"\n\nDevelopment Status :: 4 - Beta\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="classifiers"\n\nIntended Audience :: Developers\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="classifiers"\n\nLicense :: OSI Approved :: MIT License\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\nContent-Disposition: form-data; name="description"\n\nUNKNOWN\n----------------GHSKFJDLGDS7543FJKLFHRE75642756743254--\n', headers={'Content-Length': '1901', 'Accept-Encoding': 'identity', 'Content-Type': 'multipart/form-data; boundary=--------------GHSKFJDLGDS7543FJKLFHRE75642756743254; charset=utf-8', 'Connection': 'close', 'Host': 'localhost:8888', 'User-Agent': 'Python-urllib/2.6'})
[D 101119 11:06:30 mypi:46] HEADERS: {'Content-Length': '1901', 'Accept-Encoding': 'identity', 'Host': 'localhost:8888', 'User-Agent': 'Python-urllib/2.6', 'Connection': 'close', 'Content-Type': 'multipart/form-data; boundary=--------------GHSKFJDLGDS7543FJKLFHRE75642756743254; charset=utf-8'}
[W 101119 11:06:30 web:864] 404 POST / (127.0.0.1): Missing argument license
[W 101119 11:06:30 web:853] 404 POST / (127.0.0.1) 7.26ms


--
Brian K. Jones
My Blog          http://www.protocolostomy.com
Follow me      http://twitter.com/bkjones

Ben Darnell

Nov 19, 2010, 12:02:13 PM
to python-...@googlegroups.com
The boundary parsing is a bug in tornado that should be fixed. Using
bare linefeeds instead of \r\n is invalid HTTP. The spec "recommends"
that implementations accept bare \n
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.3), but
doing so would require a messy refactoring in tornado (since you'd
have to read a line at a time instead of reading the entire header
block at once). I think this is one of the things that running behind
nginx shields you from.

-Ben
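As a rough illustration of the tolerant parsing that RFC 2616 section 19.3 recommends (recognize a single LF as a line terminator and ignore a leading CR), here is a sketch of line-at-a-time header parsing. It is not Tornado's implementation: `parse_header_lines` is a hypothetical name, it assumes the header block is already in memory as a string, and it ignores folded continuation lines and repeated header names.

```python
def parse_header_lines(block):
    """Parse 'Name: value' lines, accepting CRLF or bare LF terminators.

    Hypothetical helper for illustration; not part of Tornado.
    """
    headers = {}
    for line in block.split("\n"):
        # Treat LF as the terminator and drop the CR before it, if any.
        line = line.rstrip("\r")
        if not line:
            continue
        name, sep, value = line.partition(":")
        if not sep:
            # Not a 'Name: value' line; skip it in this sketch.
            continue
        headers[name.strip()] = value.strip()
    return headers


strict = parse_header_lines("Host: localhost:8888\r\nConnection: close\r\n")
lenient = parse_header_lines("Host: localhost:8888\nConnection: close\n")
```

Both inputs yield the same dictionary, which is the behavior the spec's recommendation is after.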

Brian Jones

Nov 19, 2010, 1:21:37 PM
to python-...@googlegroups.com
On Fri, Nov 19, 2010 at 12:02 PM, Ben Darnell <b...@bendarnell.com> wrote:
> The boundary parsing is a bug in tornado that should be fixed.

Submitted pull request :) 

brian