In subversion trunk of mod_wsgi for version 3.0, if you are using
Python 2.6 you can now set:
WSGIPy3kWarningFlag On
with the effect hopefully being the same as the -3 option to the 'python'
executable. Note that I haven't got Python 2.6 installed, so I haven't
actually tested it.
Graham
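One way to check from inside an application whether the flag took effect might be to look at sys.py3kwarning, which Python 2.6 and later 2.x releases expose for the -3 option. A minimal sketch, assuming mod_wsgi sets the flag before the interpreter is initialised; the helper name and the diagnostic application are made up for illustration:

```python
import sys


def py3k_warnings_enabled():
    # sys.py3kwarning only exists on Python 2.6+ in the 2.x series; on
    # any other version the attribute is missing and we report False.
    return bool(getattr(sys, "py3kwarning", False))


def application(environ, start_response):
    # Hypothetical diagnostic app: reports the flag state as plain text.
    body = ("py3kwarning: %s\n" % py3k_warnings_enabled()).encode("latin-1")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

On a Python 3 interpreter the attribute does not exist, so the response simply reports False.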
What specification are you using for encoding/decoding strings in HTTP
headers? I understand that there is no PEP for it right now; I just want to
know what the expected behavior is, so I can review the code to ensure that
it matches the expected behavior.
- Brian
Are you asking about how it is done in Python 3.0 or older Python versions?
In older Python versions they just get passed straight through as
traditional Python strings.
In Python 3.0 I do conversions to Unicode, ie., default string type in
that version, as per discussion summary for Python 3.0 in:
http://www.wsgi.org/wsgi/Amendments_1.0
You asked about that before though, so bit confused as to what you are
wanting to know.
As to Python 2.6 and the -3 flag. I haven't looked at exactly what it
does, but I thought it just complains about syntactical elements
rather than fiddle with current distinction between traditional
strings and Unicode strings. As such, nothing has been done in C code
level for Python 2.6 and -3 option except to set the
Py_Py3kWarningFlag.
Graham
Thanks. I had not realized that had been added to the Amendments page. The
page says that strings are to be converted to Latin 1 + RFC 2047. But,
although the HTTP spec. references RFC 2047, it doesn't actually explain how
to use RFC 2047 in HTTP. Consequently, the HTTP working group of the IETF is
probably going to remove all references to RFC 2047 from the HTTP 1.1
specification in its next revision. So, it actually makes more sense to
reject non-latin-1 characters outright than it does to escape them with RFC
2047 encoding.
I am also curious what happens when the HTTP request contains header fields
that cannot be decoded from latin 1. You cannot just silently strip out the
bad header fields. And, rejecting the request outright is problematic too if
the application has all of its logging, etc. done using WSGI middleware (it
won't even see the bad requests in its log).
I think Python 3.0 really needs a slightly different WSGI interface to
handle these issues--an interface where the application can access the
request headers as bytestrings for any request (including invalid ones) and
where the application can have them converted to unicode transparently when
they are valid.
- Brian
I use PyUnicode_AsLatin1String() followed by PyBytes_AsString() when
converting Python 3.0 unicode string objects to byte strings. I didn't
understand the RFC 2047 stuff either and, possibly based on comments
made at the time in discussions, ignored that part of it.
> I am also curious what happens when the HTTP request contains header fields
> that cannot be decoded from latin 1. You cannot just silently strip out the
> bad header fields. And, rejecting the request outright is problematic too if
> the application has all of its logging, etc. done using WSGI middleware (it
> won't even see the bad requests in its log).
>
> I think Python 3.0 really needs a slightly different WSGI interface to
> handle these issues--an interface where the application can access the
> request headers as bytestrings for any request (including invalid ones) and
> where the application can have them converted to unicode transparently when
> they are valid.
I am quite ignorant on the intricacies of unicode, but I thought the
whole thing with Latin 1 was that all 255 characters would convert and
so it couldn't fail in converting to Unicode. Presuming I haven't got
this wrong as I usually do with unicode stuff, but the following in
Python 3.0 doesn't yield an error. Don't get issues for similar thing
on Python 2.3 either.
b1=b'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
s1=str(b1, 'latin-1')
b2=bytes(s1,'latin-1')
b2
b'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
So, I get back to what I started with.
Graham
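The same round trip can be written more compactly, and it also shows the direction that can fail. A minimal sketch for a current Python 3; the helper name can_encode_latin1 is only illustrative:

```python
# Every byte value 0x00-0xFF maps to the code point of the same number,
# so decoding arbitrary bytes as Latin-1 cannot fail.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
assert [ord(ch) for ch in text] == list(range(256))
assert text.encode("latin-1") == all_bytes


def can_encode_latin1(s):
    # The reverse direction is not total: a str holding a character
    # above U+00FF has no Latin-1 byte and raises UnicodeEncodeError.
    try:
        s.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True
```

So bytes to str via Latin-1 is always safe, while str to bytes via Latin-1 only works when every character is at or below U+00FF.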
> I am quite ignorant on the intricacies of unicode, but I
> thought the whole thing with Latin 1 was that all 255
> characters would convert and so it couldn't fail in
> converting to Unicode. Presuming I haven't got this wrong as
> I usually do with unicode stuff, but the following in Python
> 3.0 doesn't yield an error. Don't get issues for similar
> thing on Python 2.3 either.
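Based on your test, it seems "Latin1" means ISO-8859-1 (the IANA charset,
which assigns all 256 byte values, including the C0 and C1 control
ranges) and not ISO/IEC 8859-1 (the standard itself, which leaves
0x00-0x1F and 0x7F-0x9F unassigned), so there is no problem.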
- Brian
Anything that is part of a URI (e.g. SCRIPT_NAME, REQUEST_URI) must be ASCII
by definition.
Anything that comes from an HTTP header must be Latin-1 according to the
specification. However, there are applications (especially in Asia) where
raw (unencoded) UTF-8/Shift-JIS/etc. octet sequences get put in HTTP
headers. For example, Internet Explorer expects this kind of encoding
sometimes for Content-Disposition. I encourage everybody to avoid non-ASCII
data in headers whenever possible.
Anything that comes from a file path will be a raw string of bytes (on
Linux), that you should interpret according to the file system's encoding
(usually UTF-8 on modern Linuxes, I believe). In particular, do not assume
that you can just include a file name or path in a URI without encoding it,
and don't assume that file paths can be encoded into ASCII or Latin-1.
Another thing to consider is the encoding of Apache's configuration files. I
don't know what encoding it uses.
- Brian
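As a rough illustration of the point about file names, here is a sketch of turning a file name into an ASCII-only URI path segment on a current Python 3. The function name is hypothetical; os.fsencode and urllib.parse.quote are standard library calls:

```python
import os
from urllib.parse import quote


def path_segment_for_uri(filename):
    # os.fsencode turns a str file name into the bytes the OS uses,
    # honouring the file system encoding; bytes pass through unchanged.
    raw = os.fsencode(filename)
    # quote() accepts bytes and percent-encodes every non-ASCII octet,
    # so the result is pure ASCII whatever the original encoding was.
    return quote(raw, safe="")
```

Passing bytes straight from the file system avoids having to guess the encoding at all; the URI then carries the octets exactly as they are stored on disk.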
This morning I tried this on my Fedora 8 desktop, which is UTF-8,
including all the files:
DocumentRoot /var/www/html/çãé
and it worked and sent index.html as expected.
Regards, Clodoaldo
No, it will leave it as http://localhost/%e2%82%ac.html. It does (or should do) the Latin-1-to-Unicode conversion before it decodes URL encoding.
> Unlike wsgi.input where the application *must* decide how to
> decode the data, you are trying to do automatic encoding of
> data in the wsgi server here. This will cause tracebacks on
> some unicode string input but not others (which is one of the
> reasons that people hate unicode handling in python-2). The
> tracebacks occur because latin-1 characters are a subset of
> Unicode characters (note that we're not dealing with
> code-point to byte mapping here, we're dealing with character
> mapping). So you can always convert latin-1 to unicode.
> But you can't always convert Unicode to latin-1 (which is
> what this automatic conversion would attempt). It's much
> better for the application layer to always hand mod_wsgi byte
> types, never unicode.
The HTTP standard mandates Latin-1. Python 3.0 says all strings are Unicode. The encoding/decoding is needed to bridge the gap. Treating the HTTP headers as raw sequences of bytes and requiring Python applications to do their own manual decoding/encoding would not be Pythonic and the Python community wouldn't accept it.
> This takes care of the problem but is somewhat silly. We're
> basically using latin-1 as a marshalling format for passing
> bytes over the wire. So we have to convert the unicode to
> bytes as the first step in changing unicode characters
> outside the latin-1 range into bytes that can go over the
> wire. At that point converting the bytes back to unicode
> pretending they're latin-1 instead of utf-8 is just an extra
> step for no reason.
Again, I think you are misunderstanding the interaction between URL encoding and character encoding conversion. Mod_wsgi will (should) never do or undo URL-encoding itself for non-ASCII (%80-%FF) sequences.
> I have two files there. Both are named ½ñ.html. (one-half
> tilde- lowercase-n .html). However one of the filenames is
> encoded with
> latin-1 and the other with utf-8. If you switch between
> character encodings for the web page (firefox3:
> View::Character Encoding::UTF-8 vs View::Character
> Encoding::Western (iso 8859-1) ) you'll see that you can make
> one or the other show its name correctly. Why isn't apache
> able to display both correctly at the same time? It's
> because apache doesn't know what the encoding of the
> filenames are. The filesystem is just handing it as a
> sequence of bytes.
That issue doesn't really affect mod_wsgi though. All mod_wsgi can do is try to decode filenames given the information the OS gives it. If the users are using multiple encodings for file names then they deserve whatever bad behavior they get. Actually, I would go further and say that if you are using any encoding for filenames other than NFC UTF-8 on Linux then you are asking for trouble.
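To make the quoted two-files example concrete: the same visible name yields two different byte sequences depending on the encoding. A small sketch for a current Python 3; the guess_decode helper is purely illustrative, and neither Apache nor mod_wsgi is claimed to work this way:

```python
name = "\u00bd\u00f1.html"  # one-half, n-tilde, ".html"

as_latin1 = name.encode("latin-1")  # two octets for the first two chars
as_utf8 = name.encode("utf-8")      # four octets for the first two chars


def guess_decode(raw):
    # Heuristic only: try strict UTF-8 first, then fall back to
    # Latin-1, which accepts any byte sequence.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"
```

The file system stores only the bytes, so a directory can hold both entries at once, and nothing in the bytes themselves says which encoding was meant.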
> mod_wsgi receives a sequence of bytes from apache.
> It transforms those into unicode by pretending that those bytes are
> latin-1 and sticks them into SCRIPT_NAME.
IMO, mod_wsgi should just drop SCRIPT_NAME and all other non-WSGI environ keys except REQUEST_URI (REQUEST_URI is needed to get the raw, un-decoded URI).
> Now let's look at the reverse case: Let's say that the
> application wants to redirect the user to €.html (Euro
> symbol.html). For that, they have to enter this into the
> location header::
> real_url = '€.html'
> byte_sequence = real_url.encode('utf-8')
> marshalled_form = str(byte_sequence, 'latin-1')
> headers = [('location', marshalled_form)]
No, they have to URL-encode marshalled_form into ASCII first, because the Location header holds a URI, and URIs are always ASCII-only.
Regards,
Brian
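A sketch of the ASCII-only form of that redirect on a current Python 3, assuming the target name should be percent-encoded as UTF-8 (the application is free to choose another encoding):

```python
from urllib.parse import quote

real_url = "\u20ac.html"  # Euro symbol followed by ".html"

# quote() encodes the str as UTF-8 by default and percent-encodes
# every non-ASCII octet, leaving an ASCII-only str.
location = quote(real_url, safe="/")

# An ASCII-only str always survives the Latin-1 step a server applies.
headers = [("Location", location)]
wire_value = location.encode("latin-1")
```

Because the header value is plain ASCII by the time the server sees it, the Latin-1 conversion becomes a no-op and cannot raise.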
The amendment page says:
When running under Python 3, applications SHOULD produce bytes
output and headers
So, the ideal situation is that the application would always produce
bytes and so it is the application which is supposed to deal with it.
That mod_wsgi falls back to converting any Unicode strings to bytes is
a fail-safe as dictated by:
When running under Python 3, servers and gateways MUST accept
strings as application output or headers, under the existing rules (i.e.,
s.encode('latin-1') must convert the string to bytes without an
exception)
and is more to protect lazy programmers, plus make it easier to port
WSGI applications for Python 2.X.
In other words, your application is the one who should be dealing with
it in the first place if you want to be sure about what is being
produced. It only becomes an issue where the WSGI application hasn't
done what it really should have done.
Graham
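The fallback rule quoted from the amendments can be sketched in Python as follows. This is not mod_wsgi's actual code (which is written in C); to_wire_bytes is a hypothetical helper that only mirrors the rule as stated:

```python
def to_wire_bytes(value):
    # Hypothetical gateway helper: bytes pass through untouched, str is
    # encoded as Latin-1 as the amendment text requires.
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        # Raises UnicodeEncodeError for characters above U+00FF; the
        # application is expected to have produced bytes in that case.
        return value.encode("latin-1")
    raise TypeError("expected bytes or str, got %r" % type(value).__name__)
```

An application that always hands over bytes never reaches the encoding branch, which is the situation described as ideal above.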
Did you perhaps mean SCRIPT_FILENAME? The WSGI specification requires
SCRIPT_NAME.
As to this whole discussion, as much as it is interesting there is
nothing I can do about it. It really needs to be brought up on the
Python WEB-SIG where I originally raised the issue of Python 3.0
support for WSGI. I can only implement whatever consensus comes out of
discussion on Python WEB-SIG, given that they don't want to come out
with an official revised specification for WSGI.
Graham
I thought I had made it clear enough and that the proposed amendments
were also clear on this.
The wsgi.input stream which contains the request content is 'bytes'.
Thus it is not touched by mod_wsgi. The amendments say:
When running under Python 3, servers MUST make wsgi.input a
binary (byte) stream
Though the amendments do also say:
When running under Python 3, servers MUST provide CGI HTTP variables
as strings, decoded from the headers using HTTP standard encodings
(i.e. latin-1 + RFC 2047) (Open question: are there any CGI or WSGI
variables that should NOT be strings?)
Thus, mod_wsgi does convert the CGI variables (i.e., translated
HTTP headers) in the WSGI environment dictionary into Unicode strings
using latin-1 encoding.
As I pointed out there were only a few variables in there which were
of concern. Brian has pointed out that request URI has to be ascii
characters but there possibly still is an open question there on how
encoding of non ascii characters works in practice. We just need to do
some actual tests to see what happens and whether there is a problem.
Thus we are possibly down to SCRIPT_FILENAME given that it is
reflecting a file system path. Again, we just need to do some actual
tests to see what happens. Remembering that Apache is going to dictate
in the main how things work.
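The split described here, CGI variables decoded as latin-1 while wsgi.input stays binary, could look roughly like this in Python. The function and its input format are assumptions made for the sketch and do not reflect mod_wsgi's C implementation:

```python
import io


def build_environ(cgi_vars, body):
    # cgi_vars: dict mapping bytes names to bytes values, standing in
    # for what the web server hands over (an assumption of this sketch).
    environ = {}
    for name, value in cgi_vars.items():
        # Latin-1 decoding never fails, so every variable becomes a str.
        environ[name.decode("latin-1")] = value.decode("latin-1")
    # The request body stays a binary stream and is not decoded at all.
    environ["wsgi.input"] = io.BytesIO(body)
    return environ
```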
> 2) pje said that accepting unicode str here would make it easier to
> port WSGI applications but that's actually not true. In python-2.x,
> you are only supposed to pass byte strings (py-2.x str) so there's no
> problems. When those str's are converted to unicode str in py3.x, you
> have to rewrite your code so you aren't passing non-latin-1
> characters. At that point, there's zero incentive to pass a sanitized
> unicode string to the wsgi server as you had to go through the byte
> type in order to get there (unless you misunderstand the WSGI spec and
> think it wants you to send py-3.x str type.)
>
> As for protecting lazy programmers... I'd argue that it's much better
> to throw an exception immediately upon receiving a unicode type rather
> than waiting until your app starts getting popular and you suddenly
> have transient errors due to people occassionally submitting data with
> non-latin-1 characters.
My feeling was that fallback to converting to bytes using latin-1 was
so that simple applications would still work. For example, the hello
world application:
def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
works in both Python 2.X and 3.0 without change.
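For comparison, a variant that hands bytes to the server itself, which is what the amendment text says applications SHOULD do under Python 3. This is only a sketch; the choice of UTF-8 for the body is an assumption:

```python
def application(environ, start_response):
    status = '200 OK'
    # Encode once, explicitly, so the server never has to guess.
    output = 'Hello World!'.encode('utf-8')
    response_headers = [('Content-type', 'text/plain; charset=utf-8'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
```

Content-Length is computed from the encoded bytes, which matters as soon as the body contains anything outside ASCII.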
Larger applications such as Django already internally deal with all
response content as Unicode and convert it to string objects at last
minute. The 2to3 converter would presumably pick that up automatically
and make it produce bytes instead.
Request headers in Django are a bit different and more interesting. At the
moment, it will do things like:
path_info = force_unicode(environ.get('PATH_INFO', u'/'))
where force_unicode is:
def force_unicode(s, encoding='utf-8', strings_only=False,
errors='strict'): ...
Thus, Django was converting Python 2.X string objects to Unicode but
as UTF-8, which technically may not be correct.
In Python 3.0, because this conversion will likely still be applied
after the 2to3 conversion is done, they may well end up re-decoding as
UTF-8 a Unicode string that was created as latin-1, albeit possibly by
going back through the bytes type to do it, if I read the code correctly.
So, the issue there is whether treating them as UTF-8 is right, given
that the amendment suggests CGI variables are supposed to be handled
as latin-1.
Anyway, that is getting a bit off topic.
>> In other words, your application is the one who should be dealing with
>> it in the first place if you want to be sure about what is being
>> produced.
>
> +100
>
>> It only becomes an issue where the WSGI application hasn't
>> done what it really should have done.
>>
> As long as mod_wsgi is only converting unicode to bytes and not
> converting bytes to unicode, this is true.
I have already explained that for CGI variables (translated HTTP
headers) in the WSGI environment dictionary, that mod_wsgi does
convert bytes to Unicode.
Graham
I tested that url with Firefox and Opera in Linux utf-8 and what
happens is that Firefox does what Brian says. But testing Firefox in
Windows XP it substitutes € for %80 and IE6 changes € to %e2%82%ac.
Original string in Latin-1: http://localhost/%e2%82%ac.html
Latin-1 to Unicode: http://localhost/%e2%82%ac.html
Since the original Latin-1 string did not contain any non-Latin characters, no codepoint conversions are performed.
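The two steps can be checked directly on a current Python 3. In this sketch the second step, undoing the percent-encoding as UTF-8, is assumed to be the application's job rather than the server's:

```python
from urllib.parse import unquote

raw = b"http://localhost/%e2%82%ac.html"  # bytes as received

# Step 1: Latin-1 decoding. Only ASCII is present, so nothing changes.
text = raw.decode("latin-1")

# Step 2 (left to the application): undo the percent-encoding and
# interpret the resulting octets as UTF-8.
decoded = unquote(text, encoding="utf-8", errors="strict")
```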
> Or I suppose you can, but it isn't by any means the opposite of
> what you did to get the url escaped bytes so it's pretty senseless.
I made a mistake about the *encoding* (not decoding) order in my previous email. I will correct it below.
> > Again, I think you are misunderstanding the interaction
> > between URL encoding and character encoding conversion.
> > Mod_wsgi will (should) never do or undo URL-encoding itself
> > for non-ASCII (%80-%FF) sequences.
> I think that you are misunderstanding the interaction. And I
> think that % sequences should definitely be done by mod_wsgi.
> Ending up with a unicode string containing %encoded
> sequences is even worse than the other scenarios I described
> as the application then has to convert from unicode to byte
> string, unquote the url quoting, and then convert back to
> unicode.
mod_wsgi cannot decode all the % sequences in headers because it doesn't know which headers contain URIs and which ones don't; many headers can contain % sequences that don't mean the same thing they mean in URIs. Plus, sometimes (many times) the application needs the encoded URI instead of the IRI form. If you are talking about things like PATH_INFO, SCRIPT_NAME, and REQUEST_URI, doing URI->IRI conversion on them will break applications like mine that already do their own URI->IRI conversion. I should test to see what WSGI gateways actually do there.
> It would be much better for mod_wsgi to do the url quoting
> for the user as converting between bytes and %escape
> sequences is 100% automatable. This is unlike converting
> between unicode and a sequence of bytes where something has
> to decide what the character encoding is. So -- WSGI should
> take care of %encoding because that's a job for a computer
> anyway. WSGI should not take care of the byte=> unicode
> conversion because it doesn't know what encoding the bytes are in.
mod_wsgi already mangles the URI components too much in SCRIPT_NAME and PATH_INFO (in its defense, it does so because CGI/WSGI require it to for the most part, except for "//" munging). That is why I fall back to parsing REQUEST_URI myself.
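A minimal sketch of falling back to REQUEST_URI, assuming the value is in the usual origin form (a path plus optional query, not an absolute URI). The function name is made up for illustration:

```python
def raw_path_and_query(environ):
    # REQUEST_URI is an Apache extension, not a WSGI-required key, so
    # this returns None when the gateway does not provide it.
    request_uri = environ.get("REQUEST_URI")
    if request_uri is None:
        return None
    # Split by hand: urlsplit() would read a leading "//" as the start
    # of a host name, which is wrong for a request path.
    path, _, query = request_uri.partition("?")
    # Both parts are still percent-encoded exactly as the client sent
    # them; no slash collapsing or unquoting has been applied.
    return path, query
```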
> > > Now let's look at the reverse case: Let's say that the application
> > > wants to redirect the user to €.html (Euro symbol.html). For that,
> > > they have to enter this into the location header::
> > > real_url = '€.html'
> > > byte_sequence = real_url.encode('utf-8')
> > > marshalled_form = str(byte_sequence, 'latin-1')
> > > headers = [('location', marshalled_form)]
> >
> > No, they have to URL-encode marshalled_form into ASCII first, because
> > the Location header holds a URI, and URIs are always ASCII-only.
> >
> Well... between marshalled_form and HTTP HEADER, there needs
> to be a url escaping sequence. but whether that needs to
> happen outside of mod_wsgi or inside is part of what you and
> I are debating. You do see from your example above why your
> initial sequence for decoding at the top of the post is
> wrong, though? Your decoding sequence at the top placed the
> ASCII escaping between byte_sequence and real_url instead of
> between marshalled_form and headers.
Right, I made two mistakes here. First, it doesn't make sense to URL-encode the string AFTER converting it to Latin-1. Instead, you need to URL-encode the string BEFORE converting it to Latin-1. Then, the string will only have ASCII characters. Secondly, you can encode/decode it using whatever encodings you please before you URL-encode it, because the URI and IRI specifications do not require every %XX sequence to decode to a valid UTF-8 sequence. mod_wsgi's own view of the filesystem encoding doesn't matter in this case.
Regards,
Brian
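Both points can be illustrated with urllib.parse.quote on a current Python 3: the percent-encoding happens first, and the character encoding used underneath it is the application's choice. The two encodings below are just examples:

```python
from urllib.parse import quote, unquote

name = "\u20ac.html"  # Euro symbol followed by ".html"

# Same name, two different underlying encodings, both ASCII-only.
utf8_form = quote(name, encoding="utf-8")
cp1252_form = quote(name, encoding="cp1252")

# Either form passes through a Latin-1 conversion unchanged.
assert utf8_form.encode("latin-1") == b"%E2%82%AC.html"
assert cp1252_form.encode("latin-1") == b"%80.html"

# The receiver has to know which encoding was used to reverse it.
assert unquote(cp1252_form, encoding="cp1252") == name
```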
In my defence I do the leading duplicate slash removal in SCRIPT_NAME
because otherwise different major versions of Apache would behave
differently. Any duplicate slashes otherwise within the path of
SCRIPT_NAME and PATH_INFO are from memory eliminated by Apache itself
and not by mod_wsgi.
Graham
I understand that you do that for compatibility reasons. I didn't realize
that Apache does it too. Is that done in the CGI environment building
functions or in the core of Apache?
Thanks,
Brian
Can someone just clearly tell me what you want me to test?
For Python 3.0, if I use a URL:
/wsgi/scripts/echo3000.py/%E2%82%AC.html
in Safari, where:
/wsgi/scripts/echo3000.py
just echoes back the WSGI environment, the following happens.
1. Once the request is submitted, Safari changes the escaped sequence in the URL bar to a Euro symbol.
2. In Apache access logs I get:
::1 - - [01/Oct/2008:16:34:57 +1000] "GET
/wsgi/scripts/echo3000.py/%E2%82%AC.html HTTP/1.1" 200 1858
3. In response to browser, relevant values from WSGI environment are:
PATH_INFO: '/â\x82¬.html'
PATH_TRANSLATED: '/usr/local/apache-2.2.4/htdocs/â\x82¬.html'
QUERY_STRING: ''
REQUEST_METHOD: 'GET'
REQUEST_URI: '/wsgi/scripts/echo3000.py/%E2%82%AC.html'
SCRIPT_FILENAME: '/usr/local/wsgi/scripts/echo3000.py'
SCRIPT_NAME: '/wsgi/scripts/echo3000.py'
Remember that this is what Apache passes and all mod_wsgi is doing is
converting them to Unicode string as latin-1.
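Given that, an application that knows the client sent UTF-8 can get the intended text back by reversing the latin-1 step. A minimal sketch for a current Python 3; the helper name is illustrative:

```python
def reinterpret_as_utf8(value):
    # Undo the server's Latin-1 decoding to get the original octets
    # back, then decode them as UTF-8. Raises UnicodeDecodeError if the
    # client did not actually send UTF-8.
    return value.encode("latin-1").decode("utf-8")


# The PATH_INFO value reported for the Python 3.0 run above.
path_info = "/\u00e2\u0082\u00ac.html"
recovered = reinterpret_as_utf8(path_info)
```

Nothing is lost by the latin-1 step, since it maps each octet to exactly one code point; it only means the application has to do this extra round trip.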
For Python 2.3 get:
1. Safari does same obviously.
2. In Apache access logs, also same:
::1 - - [01/Oct/2008:16:41:29 +1000] "GET
/wsgi/scripts/echo.py/%E2%82%AC.html HTTP/1.1" 200 7118
3. In echoed response get:
PATH_INFO: '/\xe2\x82\xac.html'
PATH_TRANSLATED: '/usr/local/apache-2.2.4/htdocs/\xe2\x82\xac.html'
QUERY_STRING: ''
REQUEST_METHOD: 'GET'
REQUEST_URI: '/wsgi/scripts/echo.py/%E2%82%AC.html'
SCRIPT_FILENAME: '/usr/local/wsgi/scripts/echo.py'
SCRIPT_NAME: '/wsgi/scripts/echo.py'
The difference here obviously being that in Python 2.3 they aren't
Unicode strings but Python 2.X byte strings (i.e. conventional
strings).
I'll try and update my script to put a link onto itself so I can check
what the referrer says on click-through.
Graham
Better still, someone construct a small WSGI application etc which
does what you want to test various cases and I'll run it under Python
2.3 and 3.0. If script is for Python 2.X, I can convert it to Python
3.0 if need be.
Graham
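A small echo application along those lines might look like the sketch below. It is written for a current Python 3 and is not the echo script used for the results above; the list of keys is an assumption about what is worth inspecting:

```python
KEYS = ("PATH_INFO", "PATH_TRANSLATED", "QUERY_STRING", "REQUEST_METHOD",
        "REQUEST_URI", "SCRIPT_FILENAME", "SCRIPT_NAME", "HTTP_REFERER")


def application(environ, start_response):
    # repr() makes the type and any escaped characters visible, so
    # byte strings and Unicode strings can be told apart in the output.
    lines = ["%s: %r" % (key, environ.get(key)) for key in KEYS]
    # backslashreplace keeps the response encodable even if a value
    # holds characters outside Latin-1.
    body = "\n".join(lines).encode("latin-1", "backslashreplace")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Keys the gateway does not supply show up as None, which makes it easy to see, for example, whether REQUEST_URI is present at all.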
Before going off to Python web-sig, I'd like to get the WSGI example
program that demonstrates the issues done first. I would also like to do
the equivalent for normal CGI and show how CGI would work for Python 2.X
and 3.0 on Apache as well. That way one is taking mod_wsgi out of the
picture and possibly showing that it is in part the way Apache sets up
the data that is the issue, although how os.environ works in Python 3.0
may also be an issue for CGI. The only other Python 3.0 WSGI server I know
of is wsgiref in Python 3.0, so I should see how that works as well. For
Python 2.X, I should also try the CherryPy WSGI server and Paste server.
Graham
> > > PATH_INFO: '/â\x82¬.html'
> >
> > This is what Brian and I need to see. I think that he and I both
> > think this is incorrect. However, after Brian's last message I'm
> > unsure of whether it should be '%E2%82%AC.html' or
> > b'\xe2\x82\xac.html'. I'm pretty sure if it's the latter I
> > need to go to the python-web-sig and try to get the wsgi spec
> > changed. If it's the former, I don't know if it's a problem with
> > the spec or just how it's being interpreted. Although, if
> > paste's httpserver also gives this value, then it's something
> > that should be clarified at the wsgi spec level as there's likely
> > a lot of wsgi servers doing it that way.
> >
> Brian, ping. Could you comment on this?
Originally, I was going to say that you can find out what should happen by
running the test program in the reference implementation given in PEP 333,
running under Apache as CGI. That, combined with the URL reconstruction
algorithm given in PEP 333 should be enough information to specify the
behavior.
Unfortunately, the URL reconstruction algorithm is based on the standard
quote() function, and that function's semantics have changed substantially
for Python 3.0. In particular, the quote() and unquote() functions in Python
3 assume that you want URI-IRI conversion semantics by default. Not all URIs
are derived from IRIs so that assumption does not work in general. It looks
like a big mess. I'm starting to agree with you that these path-related
variables need to be byte strings. Then at least everything is clear.
In my application I have avoided this issue, and other issues, by using
REQUEST_URI when it is available. In fact, my application really only works
100% correctly when REQUEST_URI is available. Probably the biggest
improvement to WSGI, for applications that are sensitive to these issues,
would be to require REQUEST_URI for all gateways.
Regards,
Brian
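The difference in semantics is easy to see with urllib.parse on a current Python 3 (the behaviour of the 3.0 release discussed here may not be identical). A sketch using the PATH_INFO value from the earlier test:

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# unquote() decodes as UTF-8 and replaces invalid sequences by default,
# so a lone %80 is silently turned into U+FFFD.
lossy = unquote("%80.html")

# unquote_to_bytes() keeps the octets exactly as they were sent.
exact = unquote_to_bytes("%80.html")

# Re-quoting a Latin-1-decoded PATH_INFO with the default UTF-8
# encoding does not reproduce the original URI; encoding="latin-1" does.
path_info = "/\u00e2\u0082\u00ac.html"
wrong = quote(path_info)
right = quote(path_info, encoding="latin-1")
```

So a URL reconstruction that calls quote() with its defaults on latin-1-decoded variables would produce a different URI from the one the client requested.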
In python-2.x:
PATH_INFO is a byte string
QUERY_STRING becomes a urlencoded byte string
REQUEST_URI is not present
This is the same behaviour as mod_wsgi on python-2.x.
We really need to see how other wsgi implementations treat PATH_INFO on
python-3.x.
-Toshio