Bad header with mod_proxy, mod_wsgi and paste. Possible bug

Jorge Vargas

Sep 16, 2008, 11:32:42 PM
to mod...@googlegroups.com, paste...@pythonpaste.org
Hello, I'm having an issue running the following setup.

My hosting provider has an Apache instance with mod_proxy which I
can't get rid of. This Apache (let's call it 1) is sending requests to
my mod_wsgi Apache (2), installed in $HOME, which is running a Pylons
app (3) that uses some parts of Paste, especially paste.auth, since the
error only occurs when using /login.

Everything was working like a charm except for /login, which was
returning a 502 Bad Gateway error, and Apache was writing the following
to the ~/log/apache/error_ae_dev.log file: proxy: bad HTTP/1.1 header
returned by /login (POST)

Also, if I enter the site directly through the port where #2 is
running, I don't get this error, which proved the error was on 1.
Another indicator is that paster serve running on my local machine
(without mod_proxy) wasn't giving this error.

After several days of exchanging ticket replies with my hosting
support team, they came to the conclusion that the problem was that
Paste or mod_wsgi was sending a bad header in the form "Connection:
close", which is invalid in a proxy environment (created by #1).

To prove that point, they created a second proxy server (4) which has
the "ProxyBadHeader Ignore" directive, and now everything seems to be
working.

But of course having 1 -> 4 -> 2 -> 3 is just a huge bottleneck that I
want to get rid of, or at least get rid of 4.

Now I have examined Paste's code and I'm not certain it's setting the
header, as I'm not really familiar with paste.deploy or the internals
of mod_wsgi. So could someone guide me so I can find out who the
offending party is and, if it's a bug, figure out a way to patch it?

A little off topic, since I'm forced to have this mod_proxy, wouldn't
a better setup be to eliminate mod_wsgi, and just have their main
mod_proxy send requests to a paster serve ?

I have several config files and Python scripts lying around, mostly based
on the instructions from the modwsgi wiki. I'll post them if they are
needed, but I don't want to clutter the mailing lists with unnecessary
files.

Graham Dumpleton

Sep 16, 2008, 11:50:40 PM
to mod...@googlegroups.com, paste...@pythonpaste.org
2008/9/17 Jorge Vargas <jorge....@gmail.com>:

Use the logging middleware described in:

http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Tracking_Request_and_Response

to capture response headers from Pylons, before they are injected back
into mod_wsgi.
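
A minimal sketch of what such a logging middleware could look like. This is not the script from the wiki page; the class name HeaderLogger and the optional stream argument are assumptions made for illustration, and the code is written for Python 3. It wraps start_response and writes each header with repr(), so control characters such as an embedded newline show up as visible escapes in the log.

```python
import sys


class HeaderLogger:
    """WSGI middleware that logs the status line and response headers.

    Hypothetical helper for illustration, not part of mod_wsgi or Paste.
    """

    def __init__(self, application, stream=None):
        self.application = application
        self.stream = stream  # if None, fall back to environ['wsgi.errors']

    def __call__(self, environ, start_response):
        if self.stream is not None:
            log = self.stream
        else:
            log = environ.get('wsgi.errors', sys.stderr)

        def logging_start_response(status, headers, exc_info=None):
            # repr() makes control characters such as '\n' visible.
            log.write('RESPONSE %r\n' % (status,))
            for name, value in headers:
                log.write('  %r: %r\n' % (name, value))
            return start_response(status, headers, exc_info)

        return self.application(environ, logging_start_response)
```

Under mod_wsgi this would be applied in the WSGI script file, for example `application = HeaderLogger(application)`, so the output lands in the Apache error log.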

Note that mod_wsgi doesn't generate a 'Connection' header, but Apache
underneath it can for case where error response is being generated. I
wouldn't expect therefore to see a 'Connection' header in debug output
from that script as I don't think a WSGI application is supposed to be
generating them, but check anyway.

It is a bit odd that they say this is bad as this would come up all
the time in a proxy environment as from memory it is normal behaviour
for Apache to generate a 'Connection: close' for an error response
when HTTP/1.1 is used. I'd be asking them to produce the set of
headers that your Apache is supposedly responding with.

Other option may perhaps be to force your Apache to return HTTP/1.0
responses. As I said though, proxying Apache behind another Apache is
nothing strange and can't see why 'Connection' header would be a
problem.

> A little off topic, since I'm forced to have this mod_proxy, wouldn't
> a better setup be to eliminate mod_wsgi, and just have their main
> mod_proxy send requests to a paster serve ?

That is certainly how many Pylons folks would prefer you to run it. :-)

This may not be realistic though if hosting provider is forcing you to
serve static files from your Apache instance. If you have no static
files to serve or you are happy with reduced performance in having
Pylons application serve them, then you could still do that and cut
out middle Apache.

Graham

Graham Dumpleton

Sep 17, 2008, 12:03:19 AM
to mod...@googlegroups.com
BTW, I hope you are reading response here as paste-users list rejects
outright my response so you will not see it there.

Graham

Graham Dumpleton

Sep 17, 2008, 12:16:03 AM
to mod...@googlegroups.com
2008/9/17 Graham Dumpleton <graham.d...@gmail.com>:

>> Note that mod_wsgi doesn't generate a 'Connection' header, but Apache
>> underneath it can for case where error response is being generated. I
>> wouldn't expect therefore to see a 'Connection' header in debug output
>> from that script as I don't think a WSGI application is supposed to be
>> generating them, but check anyway.

And below is the hideous bit of Apache code that determines whether it
forces a close. I always cringe when I look at this particular bit of
code.

Worth noting is that it will look for 'Connection: close' from an
application, so it is actually reasonable that a WSGI application could
return it.

As for capturing output from the WSGI application with that debug code I
pointed out, you may want to change the pprint calls so they use
repr() instead. This way it will show if any garbage characters from
Unicode or something are getting passed through in response headers when
they shouldn't.

AP_DECLARE(int) ap_set_keepalive(request_rec *r)
{
    int ka_sent = 0;
    int wimpy = ap_find_token(r->pool,
                              apr_table_get(r->headers_out, "Connection"),
                              "close");
    const char *conn = apr_table_get(r->headers_in, "Connection");

    /* The following convoluted conditional determines whether or not
     * the current connection should remain persistent after this response
     * (a.k.a. HTTP Keep-Alive) and whether or not the output message
     * body should use the HTTP/1.1 chunked transfer-coding. In English,
     *
     * IF  we have not marked this connection as errored;
     * and the response body has a defined length due to the status code
     *     being 304 or 204, the request method being HEAD, already
     *     having defined Content-Length or Transfer-Encoding: chunked, or
     *     the request version being HTTP/1.1 and thus capable of being set
     *     as chunked [we know the (r->chunked = 1) side-effect is ugly];
     * and the server configuration enables keep-alive;
     * and the server configuration has a reasonable inter-request timeout;
     * and there is no maximum # requests or the max hasn't been reached;
     * and the response status does not require a close;
     * and the response generator has not already indicated close;
     * and the client did not request non-persistence (Connection: close);
     * and    we haven't been configured to ignore the buggy twit
     *     or they're a buggy twit coming through a HTTP/1.1 proxy
     * and    the client is requesting an HTTP/1.0-style keep-alive
     *     or the client claims to be HTTP/1.1 compliant (perhaps a proxy);
     * THEN we can be persistent, which requires more headers be output.
     *
     * Note that the condition evaluation order is extremely important.
     */
    if ((r->connection->keepalive != AP_CONN_CLOSE)
        && ((r->status == HTTP_NOT_MODIFIED)
            || (r->status == HTTP_NO_CONTENT)
            || r->header_only
            || apr_table_get(r->headers_out, "Content-Length")
            || ap_find_last_token(r->pool,
                                  apr_table_get(r->headers_out,
                                                "Transfer-Encoding"),
                                  "chunked")
            || ((r->proto_num >= HTTP_VERSION(1,1))
                && (r->chunked = 1))) /* THIS CODE IS CORRECT, see above. */
        && r->server->keep_alive
        && (r->server->keep_alive_timeout > 0)
        && ((r->server->keep_alive_max == 0)
            || (r->server->keep_alive_max > r->connection->keepalives))
        && !ap_status_drops_connection(r->status)
        && !wimpy
        && !ap_find_token(r->pool, conn, "close")
        && (!apr_table_get(r->subprocess_env, "nokeepalive")
            || apr_table_get(r->headers_in, "Via"))
        && ((ka_sent = ap_find_token(r->pool, conn, "keep-alive"))
            || (r->proto_num >= HTTP_VERSION(1,1)))) {
        int left = r->server->keep_alive_max - r->connection->keepalives;

        r->connection->keepalive = AP_CONN_KEEPALIVE;
        r->connection->keepalives++;

        /* If they sent a Keep-Alive token, send one back */
        if (ka_sent) {
            if (r->server->keep_alive_max) {
                apr_table_setn(r->headers_out, "Keep-Alive",
                       apr_psprintf(r->pool, "timeout=%d, max=%d",
                            (int)apr_time_sec(r->server->keep_alive_timeout),
                            left));
            }
            else {
                apr_table_setn(r->headers_out, "Keep-Alive",
                       apr_psprintf(r->pool, "timeout=%d",
                            (int)apr_time_sec(r->server->keep_alive_timeout)));
            }
            apr_table_mergen(r->headers_out, "Connection", "Keep-Alive");
        }

        return 1;
    }

    /* Otherwise, we need to indicate that we will be closing this
     * connection immediately after the current response.
     *
     * We only really need to send "close" to HTTP/1.1 clients, but we
     * always send it anyway, because a broken proxy may identify itself
     * as HTTP/1.0, but pass our request along with our HTTP/1.1 tag
     * to a HTTP/1.1 client. Better safe than sorry.
     */
    if (!wimpy) {
        apr_table_mergen(r->headers_out, "Connection", "close");
    }

    r->connection->keepalive = AP_CONN_CLOSE;

    return 0;
}

Graham Dumpleton

Sep 17, 2008, 1:02:31 AM
to mod...@googlegroups.com
2008/9/17 Graham Dumpleton <graham.d...@gmail.com>:

> 2008/9/17 Graham Dumpleton <graham.d...@gmail.com>:
>>> Note that mod_wsgi doesn't generate a 'Connection' header, but Apache
>>> underneath it can for case where error response is being generated. I
>>> wouldn't expect therefore to see a 'Connection' header in debug output
>>> from that script as I don't think a WSGI application is supposed to be
>>> generating them, but check anyway.
>
> And below is the hideous bit of Apache code that determines whether it
> forces a close. I always cringe when I look at this particular bit of
> code.
>
> Worth noting is that it will look for 'Connection: close' from an
> application, so it is actually reasonable that a WSGI application could
> return it.
>
> As for capturing output from the WSGI application with that debug code I
> pointed out, you may want to change the pprint calls so they use
> repr() instead. This way it will show if any garbage characters from
> Unicode or something are getting passed through in response headers when
> they shouldn't.

Also perhaps look out for:

http://code.google.com/p/modwsgi/issues/detail?id=81

In embedded mode of mod_wsgi if a response header value has embedded
new lines in it, that isn't being fixed up and so can pass through
back to client. This could cause a strict checking proxy to fail.

In other words, may be nothing to do with 'Connection: close' but
simply part of a value being interpreted as being a header name but
failing as not a valid format for a header line.
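
To illustrate the failure mode described here, a small sketch (Python 3) of how an embedded newline in a header value turns into a line that a strict parser cannot read as "name: value". The helper names serialize_headers and find_bad_lines and the cookie value are hypothetical; the check is a simplification of what a real proxy does.

```python
def serialize_headers(headers):
    """Join (name, value) pairs the way a naive gateway might, unchecked."""
    return ''.join('%s: %s\r\n' % (name, value) for name, value in headers)


def find_bad_lines(raw):
    """Return the lines that do not have a plain 'name: value' shape."""
    bad = []
    for line in raw.splitlines():
        name, sep, _ = line.partition(':')
        # A header line needs a colon, and a name without blanks or ';'.
        if not sep or not name or any(c in name for c in ' ;\t'):
            bad.append(line)
    return bad


headers = [
    ('Content-Type', 'text/html; charset=UTF-8'),
    ('Set-Cookie', 'session=abc123\n; Path=/;'),  # hypothetical value
]
print(find_bad_lines(serialize_headers(headers)))  # prints ['; Path=/;']
```

The text after the newline ends up on a line of its own, and since it has no colon it is not a valid header line.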

Graham

Jorge Vargas

Sep 17, 2008, 3:09:01 AM
to mod...@googlegroups.com
First of all, thank you very much for all your help. See my response below.

I think this one nailed it, together with the debugging using repr.
Here is the extract from the logs.

[Wed Sep 17 01:53:26 2008] [error] [client 127.0.0.1] ('RESPONSE',,
referer: http://dev.activengine.com/
[Wed Sep 17 01:53:26 2008] [error] [client 127.0.0.1] '302 Found',,
referer: http://dev.activengine.com/
[Wed Sep 17 01:53:26 2008] [error] [client 127.0.0.1]
"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),
('Set-Cookie', 'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA2NTNSRU1PVEVfVVNFUj1tYWU~\\\\n;
Path=/;')]"), referer: http://dev.activengine.com/

See how the Set-Cookie header value contains "\\\\n", which is a newline, so
it seems paste.auth is responsible.

Bottom line: it seems that paste.auth.cookie.AuthCookieHandler, or the
middleware on top of it, is causing a bad header, and it has nothing
to do with this "Connection" header. I'll try to reproduce the issue without
Pylons in the middle and see if I can track it down in Paste.

Again thank you.

Graham Dumpleton

Sep 17, 2008, 3:25:26 AM
to mod...@googlegroups.com
Please let me know what the Pylons/Paste people have to say about it.
I raised this problem before on Python WEB-SIG, where Ian Bicking and
others listen and they never commented. That issue I referenced has a
reference to the Python WEB-SIG discussion.

Anyway, since this has now come up a second time, I better make a
concrete change in mod_wsgi to flag it as an error, or somehow cope
with it. Problem is I felt that no consensus came out of discussion on
Python WEB-SIG so don't know what WSGI folks believe should be done.

Graham

Jorge Vargas

Sep 17, 2008, 5:41:17 AM
to mod...@googlegroups.com
On Wed, Sep 17, 2008 at 1:25 AM, Graham Dumpleton
<graham.d...@gmail.com> wrote:
>
> Please let me know what the Pylons/Paste people have to say about it.
> I raised this problem before on Python WEB-SIG, where Ian Bicking and
> others listen and they never commented. That issue I referenced has a
> reference to the Python WEB-SIG discussion.
>
I haven't been able to reproduce the issue without Pylons, as the auth
middleware needs something to post to it. So I wrote a little script
that posts using urllib2, and I found something interesting.

Using this as my WSGI app:

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type','text/plain')]
    start_response(status, response_headers)
    return ['Hello world!\n']

application = simple_app

My post script returns the following:

Hello world!

{'h': {'content-length': '13', 'vary': 'User-Agent,Accept-Encoding',
'server': 'Apache/2.0.52 (Red Hat) mod_wsgi/2.0 Python/2.5',
'connection': 'close', 'date': 'Wed, 17 Sep 2008 09:29:04 GMT',
'content-type': 'text/plain'}, 'c': 200, 'r':
'http://dev.activengine.com/', 'u': 'http://dev.activengine.com/',
'd': {'url': 'http://dev.activengine.com/fooobar', 'username': 'mae',
'password': 'themis'}}

So after all it does seem the 'Connection: close' is being generated by
Apache/mod_wsgi.

On the other front, I haven't been able to reproduce the newline
behavior with standalone Paste. I have the following test, which I
added to http://trac.pythonpaste.org/pythonpaste/browser/Paste/trunk/tests/test_auth/test_auth_cookie.py

def test_complex(key='', val=''):
    cookie_name = 'auth_'
    secret = 'sadsadasdsafdsfdskjdfskjskdljsfkljsdkjsdlf'
    timeout = 30

    data = dict(
        cookie_name=cookie_name,
        scanlist=['REMOTE_USER'], secret=secret,
        timeout=timeout, maxlen=1024
    )
    app = build(dump_environ, {key: val}, **data)
    (status, headers, content, errors) = \
        raw_interactive(app)
    value = header_value(headers, 'Set-Cookie')
    assert "Path=/;" in value
    assert "expires=" not in value
    cookie = value.split(";")[0]
    (status, headers, content, errors) = \
        raw_interactive(app, {'HTTP_COOKIE': cookie})
    assert ("%s: %s" % (key, val.replace("\n", "\n "))) in content
    sys.stderr.write(repr(cookie))

and I'm getting "proper" cookies, but since this isn't a real environ
I'm not sure if it's valid. On the other hand, on the server, no matter
how I change the value of the "secret", I'm always getting the same
pattern, which leads me to believe that this may be a bug in my
middleware that wraps paste.auth, especially since the first value
(from Sep 13) doesn't have the extra \\\\n.

[Sat Sep 13 08:56:15 2008] [error] [client 196.40.10.250]
HTTP_COOKIE: 'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxNDEzNTZSRU1PVEVfVVNFUj1tYWU~',
referer: http://dev.activengine.com:3652/login/error/document
[Sat Sep 13 08:56:15 2008] [error] [client 196.40.10.250]
paste.cookies: (<SimpleCookie:
tw_auth='RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxNDEzNTZSRU1PVEVfVVNFUj1tYWU~'>,
'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxNDEzNTZSRU1PVEVfVVNFUj1tYWU~'),
referer: http://dev.activengine.com:3652/login/error/document
[Wed Sep 17 00:27:30 2008] [error] [client 127.0.0.1]
'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA1MjdSRU1PVEVfVVNFUj1tYWU~\\n;
Path=/;')]), referer: http://dev.activengine.com/login
[Wed Sep 17 01:38:53 2008] [error] [client 127.0.0.1]
'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA2MzhSRU1PVEVfVVNFUj1tYWU~\\n;


Path=/;')]), referer: http://dev.activengine.com/
[Wed Sep 17 01:53:26 2008] [error] [client 127.0.0.1]
"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),
('Set-Cookie', 'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA2NTNSRU1PVEVfVVNFUj1tYWU~\\\\n;
Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:08:20 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA4MDhSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:10:00 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA4MTBSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:25:19 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA4MjVSRU1PVEVfVVNFUj1tYWU~\\\\n;
Path=/;')]"), referer: http://dev.activengine.com/login
[Wed Sep 17 03:26:35 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA4MjZSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:27:31 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA4MjdSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:28:56 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA4MjhSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:33:26 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=RPhlpOBTs4Z9jGkro9iFjY0VKW4yMDA4MDkxODA4MzNSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:36:38 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=pC_HYx9QiF5+YKGtfBtJLx96kSIyMDA4MDkxODA4MzZSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:40:16 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=jA9qvGY3myDhIIZEjU7vVOG85OUyMDA4MDkxODA4NDBSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:40:53 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=jA9qvGY3myDhIIZEjU7vVOG85OUyMDA4MDkxODA4NDBSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:43:11 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=xUaINrm2QdgRmAC12OJlRKu_dgkyMDA4MDkxODA4NDNSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:43:44 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=Cv7hB4mlSsh+Wt5K7tWpoa1waIQyMDA4MDkxODA4NDNSRU1PVEVfVVNFUj1tYWU~\\\\n;


Path=/;')]"), referer: http://dev.activengine.com/

[Wed Sep 17 03:44:13 2008] [error] [client 127.0.0.1]


"[('content-type', 'text/html; charset=UTF-8'), ('Pragma',
'no-cache'), ('Cache-Control', 'no-cache'), ('location',
'http://dev.activengine.com/'), ('Content-Length', '237'),

('Set-Cookie', 'tw_auth=EJUBioCxaQKBcMUe7I0cPBlIXGAyMDA4MDkxODA4NDRSRU1PVEVfVVNFUj1tYWU~\\\\n;
Path=/;')]"), referer: http://dev.activengine.com/login
[mae@web1 logs]$

Graham Dumpleton

Sep 17, 2008, 6:27:04 AM
to mod...@googlegroups.com
Since WSGI application takes a dictionary as input, you should be able
to use a slightly modified version of the second code example in that
section of debugging document to capture both the request environ and
the request input. You should then be able to create a test harness
which would allow you to replay that information captured from real
application. That way you have a better guarantee of generating same
execution environment. You might even dump out os.environ so you know
what that is and possibly restore that as well.

Changes to the script would be to strip out from WSGI environment
stuff like wsgi.errors, wsgi.file_wrapper etc etc, ie., things that
are implementation specific and need to be changed in test harness
anyway, and then save headers as a pickle so easier to restore.
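
A rough sketch of that capture-and-replay idea, written for Python 3. The function names capture_request and replay_request are hypothetical, and the replacement wsgi.* values in the replay step are assumptions that a real harness would adjust to match the server being imitated.

```python
import io
import pickle


def capture_request(environ):
    """Return a pickled snapshot of the environ plus the request body."""
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = environ['wsgi.input'].read(length) if length else b''
    # Drop the wsgi.* keys and any other non-string values: they are
    # server specific and often hold objects that cannot be pickled.
    snapshot = {k: v for k, v in environ.items()
                if not k.startswith('wsgi.') and isinstance(v, str)}
    # Put the body back so the real application can still read it.
    environ['wsgi.input'] = io.BytesIO(body)
    return pickle.dumps({'environ': snapshot, 'body': body})


def replay_request(application, blob):
    """Rebuild an environ from a snapshot and call the application."""
    saved = pickle.loads(blob)
    environ = dict(saved['environ'])
    environ.update({
        'wsgi.version': (1, 0),
        'wsgi.url_scheme': 'http',
        'wsgi.input': io.BytesIO(saved['body']),
        'wsgi.errors': io.StringIO(),
        'wsgi.multithread': False,
        'wsgi.multiprocess': False,
        'wsgi.run_once': True,
    })
    result = {}

    def start_response(status, headers, exc_info=None):
        result['status'], result['headers'] = status, headers

    result['body'] = b''.join(application(environ, start_response))
    return result
```

On the live server, capture_request would be called from a small middleware and the blob written to a file; the test harness then loads the file and passes it to replay_request along with the application under test.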

Got the idea?

Jorge Vargas

Sep 18, 2008, 1:06:12 AM
to mod...@googlegroups.com
On Wed, Sep 17, 2008 at 4:27 AM, Graham Dumpleton
<graham.d...@gmail.com> wrote:
>
> Since WSGI application takes a dictionary as input, you should be able
> to use a slightly modified version of the second code example in that
> section of debugging document to capture both the request environ and
> the request input. You should then be able to create a test harness
> which would allow you to replay that information captured from real
> application. That way you have a better guarantee of generating same
> execution environment. You might even dump out os.environ so you know
> what that is and possibly restore that as well.
>
> Changes to the script would be to strip out from WSGI environment
> stuff like wsgi.errors, wsgi.file_wrapper etc etc, ie., things that
> are implementation specific and need to be changed in test harness
> anyway, and then save headers as a pickle so easier to restore.
>
> Got the idea?
>
Yes, thank you, that's a great idea. In fact, if I polish that a little
it could be a nice way to track down complex bugs (ones that require a
series of steps). Even better, if done right it could serve as the base
for a performance test based on user actions. The more I learn about
WSGI the more I like it, and again thank you for all your help. Right
now I'm busy working on something else, but I'm going to go ahead and
work on this as soon as I can and post back the results.

Brian Smith

Sep 18, 2008, 11:22:52 AM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> Please let me know what the Pylons/Paste people have to say about it.
> I raised this problem before on Python WEB-SIG, where Ian
> Bicking and others listen and they never commented. That
> issue I referenced has a reference to the Python WEB-SIG discussion.
>
> Anyway, since this has now come up a second time, I better
> make a concrete change in mod_wsgi to flag it as an error, or
> somehow cope with it. Problem is I felt that no consensus
> came out of discussion on Python WEB-SIG so don't know what
> WSGI folks believe should be done.

IMO, the current behavior is fine. The application is responsible for all
escaping and for ensuring that every header field is well-formed. The WSGI
gateway is just responsible for gluing them together. If the application
messes up then they will get undefined behavior.

This is demonstrated by the CGI gateway in PEP 333, even if the prose of the
PEP doesn't explicitly define it. More generally, if there is some part of
the specification that is ambiguous or under-specified, then WSGI gateways
should mimic the behavior of the example gateway unless the example gateway's
behavior is totally nonsensical.

You could parse every header field returned by the application and raise a
500 Internal Server Error when the application would cause you to create an
invalid header. However, that will complicate the code and slow down
well-written applications. Perhaps you could ship a "link" middleware that
would do this for the people that want it.

Regards,
Brian

Graham Dumpleton

Sep 18, 2008, 7:14:38 PM
to mod...@googlegroups.com
2008/9/19 Brian Smith <br...@briansmith.org>:

I wouldn't expect that a check of each header for embedded newline,
would cause that significant a slow down. :-)

Graham

Brian Smith

Sep 19, 2008, 9:35:34 AM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> I wouldn't expect that a check of each header for embedded
> newline, would cause that significant a slow down. :-)

It isn't that simple. In HTTP, quoted strings and comments can contain an
embedded newline if it is prefixed with a backslash, but otherwise the
backslash escape mechanism cannot be used. Plus, comments can be nested
recursively, so you cannot even parse them with regular expressions in
theory--though, in practice, you can create a regular expression that can
match up to 5 levels of nesting and that will be more than good enough.

As an optimization, you could do a simple search for a newline, and if you
find one, reparse the header field to take into consideration the escaping
rules mentioned above. That would be fast for the vast majority of cases
where no header fields contain a newline.
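
A sketch of that two-step check in Python 3, under simplifying assumptions: the function name has_unescaped_newline is hypothetical, only double-quoted strings are tracked, and comments in parentheses are ignored entirely, so this is not a full parser for the HTTP grammar.

```python
def has_unescaped_newline(value):
    """True if value has a CR or LF that is not a backslash-escaped
    character inside a double-quoted string."""
    # Fast path: most header values contain no newline at all.
    if '\n' not in value and '\r' not in value:
        return False
    # Slow path: walk the value, tracking quoted-string state.
    in_quotes = False
    i = 0
    while i < len(value):
        ch = value[i]
        if in_quotes and ch == '\\' and i + 1 < len(value):
            i += 2  # quoted-pair: skip the escaped character
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch in '\r\n':
            return True
        i += 1
    return False
```

The membership tests on the fast path are cheap substring searches, so a value without any newline never reaches the character-by-character loop.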

I think there is also the problem that you cannot know where to parse things
using the quoted-string production and where you can parse things using the
TEXT production (which allows unmatched quoted strings), unless you know
beforehand the BNF for the specific header field you are trying to parse.

- Brian

Graham Dumpleton

Sep 23, 2008, 3:28:59 AM
to mod...@googlegroups.com
2008/9/19 Brian Smith <br...@briansmith.org>:

But given WSGI closeness to CGI, wouldn't the statement in CGI
specification RFC3875:

Note that each header field in
a CGI-Response MUST be specified on a single line; CGI/1.1 does not
support continuation lines.

effectively take precedence over that.

I think this is the only way one could have it if one wants WSGI to be
portable to CGI as a hosting mechanism.

Graham

Brian Smith

Sep 24, 2008, 7:34:16 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> 2008/9/19 Brian Smith <br...@briansmith.org>:

> > I think there is also the problem that you cannot know
> > where to parse things using the quoted-string production
> > and where you can parse things using the TEXT production
> > (which allows unmatched quoted strings), unless you know
> > beforehand the BNF for the specific header field you are
> > trying to parse.
>
> But given WSGI closeness to CGI, wouldn't the statement in
> CGI specification RFC3875:
>
> Note that each header field in
> a CGI-Response MUST be specified on a single line; CGI/1.1 does not
> support continuation lines.
>
> effectively take precedence over that.
>
> I think this is the only way one could have it if one wants
> WSGI to be portable to CGI as a hosting mechanism.

I agree that continuations should not be allowed in WSGI. But, escaped
newlines are different from continuations:


Continuation: Blah<CR><LF> asdfasdfadsf
Malformed-Continuation: Blah<CR><LF>asdfasdfasdf
Quoted-Newline: "foo\<LF>"
Quoted-Newline-or-Error: (maybe this is a comment, \<LF>maybe not)

The last case is definitely ambiguous without knowing the grammar of the
header. However, comments are so rare (except for User-Agent) that you could
get by with assuming that there are no quoted newlines in comments. However,
you still have to watch out for quoted newlines in quoted-string. But, I
re-read RFC 2616 and it seems like it intends for a double quote to always
start a quoted-string, regardless of the grammar of the header field in
which it appears. So, things are not as tricky as I had suspected.

Regards,
Brian

Graham Dumpleton

Sep 24, 2008, 7:55:27 PM
to mod...@googlegroups.com
2008/9/25 Brian Smith <br...@briansmith.org>:

Ignore what it says about continuations, read it as two separate
statements. The first part says:

Note that each header field in a CGI-Response MUST be specified on a
single line;

Using a newline in a quote string would seem to violate that if one
interprets it in the simple sense of a single line is anything up to a
newline character, ignoring any escaping.

As reference I always look at what Apache does, and Apache for CGI
does not support a newline character anywhere in value, whether it be
escaped or not. Thus, you may want to try and support it, but it isn't
going to be portable and would break for CGI and SCGI just to start
with.

Graham

Brian Smith

Sep 25, 2008, 9:15:29 AM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> 2008/9/25 Brian Smith <br...@briansmith.org>:

> Ignore what it says about continuations, read it as two
> separate statements. The first part says:
>
> Note that each header field in a CGI-Response MUST be
> specified on a single line;

> Using a newline in a quote string would seem to violate that
> if one interprets it in the simple sense of a single line is
> anything up to a newline character, ignoring any escaping.

I think of a string with an escaped newline as being on the same line.

Actually, PEP 333 already explicitly disallows newlines anyway: "Each
header_value must not include any control characters, including carriage
returns or linefeeds, either embedded or at the end." I don't know how I
missed that earlier.
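
Given that rule, a check can be as simple as the following sketch (Python 3). The class name HeaderValueChecker is hypothetical, and raising ValueError is just one possible reaction; a gateway could instead log the problem or return a 500 response.

```python
import re

# Control characters, including CR and LF, which PEP 333 forbids in values.
_CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')


class HeaderValueChecker:
    """WSGI middleware that rejects header values with control characters."""

    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):
        def checking_start_response(status, headers, exc_info=None):
            for name, value in headers:
                if _CONTROL_CHARS.search(value):
                    raise ValueError(
                        'control character in %s header: %r' % (name, value))
            return start_response(status, headers, exc_info)

        return self.application(environ, checking_start_response)
```

Because the check runs before the real start_response is called, a bad value is caught before any header reaches the server or a proxy in front of it.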

- Brian

Jorge Vargas

Oct 1, 2008, 5:52:57 PM
to mod...@googlegroups.com, paste...@pythonpaste.org
Sorry for the late reply, I have been swamped with work lately. I
think I made some progress today; at least I think I found the culprit.

On Tue, Sep 16, 2008 at 9:32 PM, Jorge Vargas <jorge....@gmail.com> wrote:
>
> A little off topic, since I'm forced to have this mod_proxy, wouldn't
> a better setup be to eliminate mod_wsgi, and just have their main
> mod_proxy send requests to a paster serve ?

I went with this setup, so right now I have the same code running from
two deployments.

Having mod_proxy -> paster serve gives exactly the same error on the 302
response headers, which means the bad header is being generated by
paste.auth. With this I can confirm Graham's theory that mod_wsgi, and
Apache by extension, are just passing along what Paste generated.

For anyone reading from the paste-users mailing list, here is the full
thread: http://groups.google.com/group/modwsgi/browse_thread/thread/16043503d522d45d/e0904f844aa125f2?#e0904f844aa125f
