I have a client sending chunked requests and have configured mod_wsgi
to run in daemon mode.
The request looks like this:
POST adsf HTTP/1.1
Host: adsf:adsf
SOAPAction: ""
User-Agent: AAAAAAAAAAAAAAA/4.34
Accept: */*
Content-Type: text/xml
Transfer-Encoding: chunked
From an HTTP/1.1 perspective the request is fine. I checked that the
chunk sizes are submitted correctly and that all \r\n are sent
correctly.
Furthermore I set WSGIChunkedRequest On.
However, if I do an environ['wsgi.input'].read(), the following error
message is generated.
I searched through mod_wsgi.c and Apache's modules/http/http_filters.c,
and to me it looks like the HTTP body sent to the daemon is *free* of
the chunking information. That means no annoying chunk framing
(56d\r\n, 574\r\n and \r\n0\r\n\r\n at the end).
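For readers unfamiliar with that framing, the sketch below builds a chunked body by hand. It is illustrative Python only, not code from mod_wsgi or Apache, and the function name `encode_chunked` is made up for this example.

```python
def encode_chunked(pieces):
    # Frame each non-empty piece as: <size in hex> CRLF <data> CRLF.
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would terminate the body early
        out += b"%x\r\n" % len(piece)
        out += piece + b"\r\n"
    # The zero-size chunk marks the end of the body (no trailers here).
    out += b"0\r\n\r\n"
    return bytes(out)


wire = encode_chunked([b"hello", b"world!"])
print(wire)  # b'5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n'
```

A chunk of 0x56d (1389) bytes would get the size line 56d\r\n, which is the kind of framing that no longer reaches the daemon.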
Inside the daemon the 'ap_http_filter' is added to the input chain
again and is then called by ap_get_client_block when the application
needs data. However, in this scenario 'ap_http_filter' inside the
daemon process has *no* Content-Length header (obviously, because the
request is chunked) and *no* chunking information inside the body
itself, such as the trailing '0'.
'ap_http_filter' has no chance and returns APR_EOF to
ap_get_client_block. ap_get_client_block interprets APR_EOF as an
error and returns -1, which mod_wsgi.c converts to the exception
above.
Right now I see these solutions:
1) Don't solve this at the mod_wsgi level; write a separate Apache
module which does the dechunking on its own. This module reads the
*whole* request at once (similar to mod_request in Apache 2.4), so it
can calculate the body size and replace the 'Transfer-Encoding' header
with a Content-Length header.
2) Do the same as above, but integrate the code into mod_wsgi itself.
This code would then run before the request is sent to the daemon.
3) Don't run ap_http_filter at all inside the daemon. 'ap_http_filter'
already runs in the process which accepts the request, so why execute
it a second time in the daemon? To be more precise, this line from
2007 makes me wonder:
'ap_add_input_filter("HTTP_IN", NULL, r, r->connection);'
I disabled this line just for fun and then it works, because the only
filter called now is 'ap_core_input_filter', which returns APR_SUCCESS
if no more data is coming.
On the other hand, I can imagine that there is a use case where this
line makes sense; I simply don't see it.
If there is a good reason not to do option three, I have no problem
with the other ones, because the requests our application expects will
not blow up memory.
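The buffering idea behind options 1) and 2) can be sketched in Python. This only illustrates the logic, it is not Apache module code, and the names `dechunk` and `to_content_length` are hypothetical. It assumes the whole chunked body is already in memory and ignores trailers.

```python
def dechunk(body):
    # Return the payload of a complete chunked body.
    payload = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The size line may carry chunk extensions after ';'.
        size = int(body[pos:eol].split(b";", 1)[0], 16)
        if size == 0:
            return bytes(payload)  # zero-size chunk: end of body
        start = eol + 2
        payload += body[start:start + size]
        if body[start + size:start + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos = start + size + 2


def to_content_length(headers, chunked_body):
    # Drop Transfer-Encoding and add a Content-Length for the buffered body.
    payload = dechunk(chunked_body)
    rewritten = {k: v for k, v in headers.items()
                 if k.lower() != "transfer-encoding"}
    rewritten["Content-Length"] = str(len(payload))
    return rewritten, payload


headers = {"Content-Type": "text/xml", "Transfer-Encoding": "chunked"}
new_headers, payload = to_content_length(
    headers, b"5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n")
print(new_headers, payload)
```

The cost is the one already mentioned: the complete request body has to be held in memory before anything is forwarded.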
A colleague of mine tried option 1) but did not succeed. I guess the reason is that in daemon mode mod_wsgi sends the headers to the daemon *before* calling into the filter chain. So even if our filter adds Content-Length to the headers, it will never be sent to the daemon.
On 28 April 2012 01:34, dreagon...@gmx.de <dreagon...@gmx.de> wrote:
> I searched through mod_wsgi.c and Apache's modules/http/http_filters.c,
> and to me it looks like the HTTP body sent to the daemon is *free* of
> the chunking information. That means no annoying chunk framing
> (56d\r\n, 574\r\n and \r\n0\r\n\r\n at the end).
> Inside the daemon the 'ap_http_filter' is added to the input chain
> again and is then called by ap_get_client_block when the application
> needs data. However, in this scenario 'ap_http_filter' inside the
> daemon process has *no* Content-Length header (obviously, because the
> request is chunked) and *no* chunking information inside the body
> itself, such as the trailing '0'.
> 'ap_http_filter' has no chance and returns APR_EOF to
> ap_get_client_block. ap_get_client_block interprets APR_EOF as an
> error and returns -1, which mod_wsgi.c converts to the exception
> above.
Not quite. The filter doesn't get confused by the lack of chunking in
the body: the Transfer-Encoding: chunked header doesn't exist, so it
wasn't expecting chunking in the first place. It quite happily reads
all the content; it is what happens after it has read all the content
that is the problem. Specifically, ap_get_client_block() is only
designed to work with a content length or with chunked encoding. It
cannot handle unbounded normal content with no length. Thus, rather
than seeing APR_EOF as end of input, it generates an error.
In practice the code shouldn't be using ap_get_client_block(). The
comments in it even say as much:
/* We lose the failure code here. This is why ap_get_client_block should
* not be used.
*/
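To make the point about the lost failure code concrete, here is a toy Python model. It is not the Apache implementation; `EOF`, `SUCCESS` and all three function names are stand-ins invented for this sketch. It only shows why collapsing every non-success status into -1 makes a clean end of input indistinguishable from a real read error.

```python
import io

EOF = "eof"      # stand-in for APR_EOF
SUCCESS = "ok"   # stand-in for APR_SUCCESS


def filter_read(stream, size):
    # Toy input filter: reports end of input as a status, not as an error.
    data = stream.read(size)
    return (SUCCESS, data) if data else (EOF, b"")


def legacy_read_block(stream, size):
    # Collapses every non-success status to -1, so EOF looks like a failure.
    status, data = filter_read(stream, size)
    if status != SUCCESS:
        return -1, b""
    return len(data), data


def eof_aware_read_block(stream, size):
    # Keeps the status, so a bare EOF can be reported as "0 bytes, done".
    status, data = filter_read(stream, size)
    if status == EOF:
        return 0, b""
    return len(data), data


body = io.BytesIO(b"abc")
print(legacy_read_block(body, 8))     # (3, b'abc')
print(legacy_read_block(body, 8))     # (-1, b'')
print(eof_aware_read_block(body, 8))  # (0, b'')
```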
> Right now I see this solutions
A quick hack is to do:
if (n == -1) {
    if (wsgi_daemon_process && self->r->read_chunked &&
        self->r->connection->keepalive) {
        /* Have exhausted all the available input data. */
        self->done = 1;
    }
    else {
        PyErr_SetString(PyExc_IOError, "request data read error");
        Py_DECREF(result);
        return NULL;
    }
}
else if (n == 0) {
    /* Have exhausted all the available input data. */
    self->done = 1;
}
length += n;
Next would be not to use ap_get_client_block() at all, duplicating
what it does in a way that allows APR_EOF to be detected properly.
That also potentially means not using ap_setup_client_block() and
ap_should_client_block() either, at which point you are starting to
skip a lot of magic.
Whether HTTP_IN can be removed I am not sure. I don't remember if
there was a specific reason it was there for the daemon. I was trying
to cheat by simulating the same filter stack in the daemon so I didn't
have to have two WSGI input implementations.
The last solution is to have daemon-specific wrappers directly around
the socket connection and avoid having all the Apache muck in there.
> Regards,
> Stephan
> --
> You received this message because you are subscribed to the Google Groups "modwsgi" group.
> To post to this group, send email to modwsgi@googlegroups.com.
> To unsubscribe from this group, send email to modwsgi+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/modwsgi?hl=en.
Thinking about this some more, the most appropriate solution seems to
be that this remains broken in mod_wsgi 3.X. In mod_wsgi 4.0, where
support for Apache 1.3 has been thrown out, the native bucket brigade
API calls can be used instead.
In other words, the only reason the old-style, deprecated API was used
was that Apache 1.3 was still being supported. mod_wsgi 4.0 doesn't
have that limitation, so it can be done right without needing two
different implementations.
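From the application side, a body without a Content-Length could then be consumed as in the sketch below. It is a minimal example, assuming the input layer signals end of input with an empty read, which goes beyond what the WSGI specification itself guarantees. The names `read_body` and `BLOCK_SIZE` are invented for illustration.

```python
import io

BLOCK_SIZE = 8192  # arbitrary read size chosen for this sketch


def read_body(environ):
    # Read wsgi.input to end of input without relying on CONTENT_LENGTH.
    stream = environ["wsgi.input"]
    parts = []
    while True:
        block = stream.read(BLOCK_SIZE)
        if not block:  # an empty read is taken as end of input
            break
        parts.append(block)
    return b"".join(parts)


def application(environ, start_response):
    body = read_body(environ)
    reply = b"received %d bytes\n" % len(body)
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(reply)))])
    return [reply]


# Local check with an in-memory stream standing in for the server.
fake_environ = {"wsgi.input": io.BytesIO(b"x" * 20000)}
print(application(fake_environ, lambda status, headers: None))
```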
> In other words, the only reason that the old style and deprecated API was
> used was because was still supporting Apache 1.3. In mod_wsgi 4.0 don't
> have that limitation can do it right without needing two different
> implementations.
You mean that in mod_wsgi 4.0 (which is the current trunk, I guess) a rewrite of Input_read is possible, so that ap_*_client_block is no longer needed and the bucket brigade API is used directly?
Then in daemon mode the registration of the HTTP_IN input filter is obsolete too?
I like this approach, because it makes it possible to share the code for Input_read between daemon and embedded mode.
Do you think such an improvement will make it into mod_wsgi 4.0? What is the timeline for 4.0?
In the long run mod_wsgi 4.0 will definitely work for us. Right now we are working with mod_wsgi 3.3, for which I tried writing a mod_dechunk myself, and it seems to work:
https://github.com/stephan-hof/mod_dechunk
It is still a prototype regarding error checking, logging and parametrization, but it is already functional. The final version must come in the next two weeks. I know it's not a good approach if requests get big, but for our application this is not expected.
On 8 May 2012 00:13, Stephan Hofmockel <dreagon...@gmx.de> wrote:
> Hello Graham,
> thanks for the response.
>> In other words, the only reason that the old style and deprecated API was
>> used was because was still supporting Apache 1.3. In mod_wsgi 4.0 don't have
>> that limitation can do it right without needing two different
>> implementations.
> You mean in mod_wsgi 4.0 (which is the current trunk I guess) a rewrite of
> Input_read is possible. So no usage of ap_*_client_block is needed anymore,
> instead use the bucket brigade API directly ?
It could.
> Then in daemon mode the registration of the HTTP_IN input_filter is obsolete
> too ?
The bucket brigade still works through the filter chain. Whether
HTTP_IN could be removed is a separate issue.
> I like this approach, because it makes it possible to share the code for
> Input_read in daemon and embedded mode.
> Do you think such an improvement will make it to mod_wsgi-4.0 ?
I have been wanting to do it for a long time.
> What is the timeline for 4.0 ?
Don't know. Things have been dragging on so long that I have
backported a lot of stuff so I can bring out 3.4 instead. Usually I
wouldn't put new features in a minor release, but I will this time.
In continuation of this thread: I am working on getting OpenStack Swift to work with Apache and mod_wsgi in daemon mode. The lack of support for Transfer-Encoding: chunked seems to be one of the last pieces of the puzzle.
I gather that it did not make it into 4.0. Is there a 4.1 planned? Does this seem to be high enough in the priority list?
I don't know if keeping all data of a single request in memory is an option for you. For us it is acceptable and we are running this on many servers in production.
https://github.com/stephan-hof/mod_dechunk
Regards,
Stephan
On Monday, 4 March 2013 13:04:26 UTC+1, david...@gmail.com wrote:
> In continuation to this thread, I am working on getting Openstack Swift to
> work with Apache and mod_wsgi and using daemon mode. The lack of support
> for Transfer-Encoding chunked seems to be one of the last pieces of the
> puzzle.
> I get it that it did not make it into 4.0 - is there a 4.1 planned? Does
> it seem to be high enough in the priority?
> Thanks
> David