#12578: multipartparser.Parser does not accept non-canonical bare CR and bare LF
------------------------------------+---------------------------------------
Reporter: jfenwick | Owner: nobody
Status: closed | Milestone:
Component: HTTP handling | Version: 1.1
Resolution: invalid | Keywords: jython
Stage: Unreviewed | Has_patch: 0
Needs_docs: 0 | Needs_tests: 0
Needs_better_patch: 0 |
------------------------------------+---------------------------------------
Comment (by jfenwick):
I'm not sure what the real problem is, so rather than open a new ticket I
will ask some questions here in the hope that someone can answer them.
The issue is that on Windows, with Django-Jython running on Tomcat,
multipart data is not parsed correctly by the LazyStream in
multipartparser.py. As far as I can tell, this happens because at
http://code.djangoproject.com/browser/django/trunk/django/http/multipartparser.py#L553
the character sequence CRLFCRLF is not found.
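The failing step can be sketched roughly as follows. This is a simplified, hypothetical illustration and not the actual code in multipartparser.py. It assumes the parser holds a chunk of raw bytes and looks for the blank line (CRLFCRLF) that separates a part's headers from its body using bytes.find, which returns -1 when the sequence is absent. split_part is a made-up name.

```python
# Hypothetical, simplified sketch of the header/body split; not Django's code.
HEADER_END = b"\r\n\r\n"  # CRLF CRLF: the blank line that ends a part's headers


def split_part(chunk: bytes):
    """Return (headers, body), or None if no CRLFCRLF occurs in chunk."""
    index = chunk.find(HEADER_END)  # -1 when the sequence does not occur
    if index == -1:
        return None
    return chunk[:index], chunk[index + len(HEADER_END):]


canonical = b'Content-Disposition: form-data; name="a"\r\n\r\nvalue'
bare_lf = b'Content-Disposition: form-data; name="a"\n\nvalue'

print(split_part(canonical))  # headers and body split at the blank line
print(split_part(bare_lf))    # None: bare LFs never match CRLFCRLF
```

With this kind of exact-match search, any line ending other than CRLF at the header boundary means the split is never found.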
I did some experiments. I POSTed some data using the same app (except on
OS X, see below) on four different Django platforms.
Here are the platforms I tested on, with the hex dump of the data captured
on each:
Django on Python on runserver on Windows - multipartpythonwindows.hex
Django on Jython on runserver on Windows - multipartdjangojythonwindows.hex
Django on Jython on Tomcat on Windows - multiparttomcatjythonwindows.hex
Django on Python on runserver on OS X - multipartpythonosx.hex (note: I
used a different Django app, but I believe the result would have been the
same)
The data was dumped using the code in multipartparser.diff.
Note: I produced the .hex files by running hexdump -C on each of the raw
data files I created.
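As a cross-check on reading hexdump -C output by eye, the line endings in each raw dump can be tallied directly. This is a minimal sketch; count_line_endings is a hypothetical helper, and it assumes the dumps are read back in binary mode so that nothing is translated on the way in.

```python
import re


def count_line_endings(data: bytes) -> dict:
    """Tally line-ending variants in raw bytes.

    Matches are non-overlapping and the longer alternatives are tried first,
    so a CR CR LF sequence counts once as "CRCRLF", not also as "CRLF".
    """
    counts = {"CRCRLF": 0, "CRLF": 0, "bare CR": 0, "bare LF": 0}
    for match in re.finditer(rb"\r\r\n|\r\n|\r|\n", data):
        token = match.group()
        if token == b"\r\r\n":
            counts["CRCRLF"] += 1
        elif token == b"\r\n":
            counts["CRLF"] += 1
        elif token == b"\r":
            counts["bare CR"] += 1
        else:
            counts["bare LF"] += 1
    return counts


# Usage ("raw_dump.bin" is a hypothetical file name for one of the raw dumps):
# with open("raw_dump.bin", "rb") as f:
#     print(count_line_endings(f.read()))

print(count_line_endings(b"a\r\r\nb\r\nc\nd\re"))
```

Running it over all four dumps gives one line of counts per platform, which makes the CR doubling easy to compare.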
According to section 3.7.2 of RFC 2616
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.2), the
message body of a multipart entity must use only CRLF to represent line
breaks between body-parts.
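For reference, a canonical multipart/form-data body in the sense of that section can be built by joining every line with CRLF. This is a minimal sketch for producing known-good test input; build_multipart and the boundary value are hypothetical and not part of Django.

```python
CRLF = b"\r\n"


def build_multipart(fields: dict, boundary: bytes) -> bytes:
    """Build a multipart/form-data body whose line breaks are all CRLF."""
    lines = []
    for name, value in fields.items():
        lines.append(b"--" + boundary)
        lines.append(b'Content-Disposition: form-data; name="' + name.encode("ascii") + b'"')
        lines.append(b"")  # empty line: produces the CRLF CRLF ending the headers
        lines.append(value)
    lines.append(b"--" + boundary + b"--")  # close delimiter
    lines.append(b"")  # trailing CRLF after the close delimiter
    return CRLF.join(lines)


body = build_multipart({"title": b"hello"}, b"XyZ")
print(body)
```

Every LF in the result is preceded by a CR, so a byte-level search for CRLFCRLF finds the end of each part's headers.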
If you look at multipartpythonwindows.hex, you will see that every CR has
been replaced with CRCR. This means that when the cited line in
multipartparser.py looks for CRLFCRLF, it would instead encounter
CRCRLFCRCRLF, which does not contain CRLFCRLF as a substring. I would have
expected this to fail, but it does not! It works correctly. As far as I can
tell, this is the normal behaviour of Django on Python.
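One mechanism that produces exactly this CRCRLF pattern, offered only as a possibility and not a diagnosis, is writing data that already uses CRLF through a text-mode stream that translates "\n" to "\r\n" on output, as text-mode files do on Windows. The sketch below simulates that translation portably with io.TextIOWrapper(newline="\r\n"). It also checks that the doubled sequence no longer contains CRLFCRLF. If this is what happened, the CRs would be doubled in the dump but not in the bytes the parser actually searched.

```python
import io

original = b'name="a"\r\n\r\nvalue'  # canonical CRLF data, as a parser might receive it

# Simulate a text-mode file that translates every "\n" to "\r\n" on write.
# latin-1 maps each byte to exactly one character and back, so the only
# change to the bytes comes from the newline translation.
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="latin-1", newline="\r\n")
text.write(original.decode("latin-1"))
text.flush()
dumped = raw.getvalue()

print(dumped)                   # b'name="a"\r\r\n\r\r\nvalue'
print(b"\r\n\r\n" in original)  # True: a CRLFCRLF search would succeed
print(b"\r\n\r\n" in dumped)    # False: CRCRLFCRCRLF has no CRLFCRLF
```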
In multipartdjangojythonwindows.hex the same doubling of CR to CRCR
occurs. So Django on Jython under runserver behaves the same as Django on
Python under runserver, and as a result it still works.
Now look at multiparttomcatjythonwindows.hex. In this file the line breaks
are plain CRLF, exactly as RFC 2616 requires. Yet in this case chunk.find
fails, which is the root of the problem.
Finally, compare the CR characters in multipartpythonosx.hex. You will see
that, like multiparttomcatjythonwindows.hex, it does not double the CRs.
And yet it works.
These are my questions:
1. Where is the extra CR coming from, and why is it required? Why does it
not cause a failure?
2. Could there be something wrong with my method of data collection as
specified in multipartparser.diff?
3. kmtracy previously said "All the code touching this data before feeding
it to the Django code needs to be treating it as binary, not text, and not
doing any type of line normalization." Is there a way to check in Python
whether the data is binary or text, so I can verify that this is the case?
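Regarding question 3, Python can report whether a value is raw bytes or decoded text, and whether a file object is a text stream (which decodes and may translate newlines) or a binary one. The following is a minimal Python 3 sketch. describe_stream is a hypothetical helper. On Python 2 and Jython of this era the distinction is str versus unicode instead, and a file's mode string ("w" versus "wb") shows whether it was opened in text or binary mode.

```python
import io


def describe_stream(obj) -> str:
    """Classify obj as raw bytes, decoded text, or a binary/text stream."""
    if isinstance(obj, (bytes, bytearray)):
        return "binary data"
    if isinstance(obj, str):
        return "text data (already decoded)"
    if isinstance(obj, io.TextIOBase):
        # Text streams decode bytes and may translate newlines.
        return "text stream"
    if isinstance(obj, (io.RawIOBase, io.BufferedIOBase)):
        return "binary stream"
    return "unknown"


print(describe_stream(b"\r\n"))              # binary data
print(describe_stream(io.BytesIO(b"\r\n")))  # binary stream
print(describe_stream(io.StringIO("\n")))    # text stream

# To dump request data without any newline translation, write it in binary
# mode ("dump.bin" is a placeholder name):
# with open("dump.bin", "wb") as f:
#     f.write(data)
```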
I'm sorry if this is not the correct avenue to be asking these questions.
If there is a better one, please point me in that direction.
--
Ticket URL: <http://code.djangoproject.com/ticket/12578#comment:8>