Transfer-Encoding chunked in daemon mode


dreag...@gmx.de

Apr 27, 2012, 11:34:19 AM
to modwsgi
Hello,

I have a client sending chunked requests and have configured mod_wsgi
to run in daemon mode.

The request looks like this:
POST adsf HTTP/1.1
Host: adsf:adsf
SOAPAction: ""
User-Agent: AAAAAAAAAAAAAAA/4.34
Accept: */*
Content-Type: text/xml
Transfer-Encoding: chunked

56d
<SOAP-ENV:Envelope
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:cwmp="urn:dslforum-org:cwmp-1-0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<SOAP-ENV:Header>
<cwmp:ID SOAP-ENV:mustUnderstand="1">100</cwmp:ID>
</SOAP-ENV:Header>
<SOAP-ENV:Body>
<cwmp:Inform>
<DeviceId>
<Manufacturer>AAAAA</Manufacturer>
<OUI>AAAAAA</OUI>
<ProductClass>AAAAAAAAAAAA</ProductClass>
<SerialNumber>AAAAAAAAAAAA</SerialNumber>
</DeviceId>
<Event SOAP-ENC:arrayType="cwmp:EventStruct[2]">
<EventStruct>
<EventCode>0 BOOTSTRAP</EventCode>
<CommandKey></CommandKey>
</EventStruct>
<EventStruct>
<EventCode>4 VALUE CHANGE</EventCode>
<CommandKey></CommandKey>
</EventStruct>
</Event>
<MaxEnvelopes>1</MaxEnvelopes>
<CurrentTime>2012-04-26T14:02:37</CurrentTime>
<RetryCount>1</RetryCount>
<ParameterList SOAP-ENC:arrayType="cwmp:ParameterValueStruct[8]">
<ParameterValueStruct>
<Name>InternetGatewayDevice.DeviceSummary</Name>
<Value xsi:type="xsd:string">InternetGatewayDevice:1.4[](Baseline:
1, EthernetLAN:1, WiFiLAN:1, EthernetWAN:1, ADSLWAN:1, IPPing:1,
DSLDiagnostics:1, Time:1), VoiceService:1.0[1](Endpoint:1, SIPEndpoin
573
t:1)</Value>
</ParameterValueStruct>
<ParameterValueStruct>
<Name>InternetGatewayDevice.DeviceInfo.SpecVersion</Name>
<Value xsi:type="xsd:string">1.0</Value>
</ParameterValueStruct>
<ParameterValueStruct>
<Name>InternetGatewayDevice.DeviceInfo.HardwareVersion</Name>
<Value xsi:type="xsd:string">AAAAAAAAAAAA</Value>
</ParameterValueStruct>
<ParameterValueStruct>
<Name>InternetGatewayDevice.DeviceInfo.SoftwareVersion</Name>
<Value xsi:type="xsd:string">AAAAAAAAAAAAAAAAAAAAAAA</Value>
</ParameterValueStruct>
<ParameterValueStruct>
<Name>InternetGatewayDevice.DeviceInfo.ProvisioningCode</Name>
<Value xsi:type="xsd:string"></Value>
</ParameterValueStruct>
<ParameterValueStruct>
<Name>InternetGatewayDevice.ManagementServer.ConnectionRequestURL</
Name>
<Value xsi:type="xsd:string">http://AAAAAAAAAAAAAAAAAA/CWMP/
ConnectionRequest</Value>
</ParameterValueStruct>
<ParameterValueStruct>
<Name>InternetGatewayDevice.ManagementServer.ParameterKey</Name>
<Value xsi:type="xsd:string"></Value>
</ParameterValueStruct>
<ParameterValueStruct>
<Name>InternetGatewayDevice.WANDevice.3.WANConnectionDevice.
1.WANPPPConnection.1.ExternalIPAddress</Name>
<Value xsi:type="xsd:string">AAAAAAAAAAAAA</Value>
</ParameterValueStruct>
</ParameterList>
</cwmp:Inform>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
0

From an HTTP/1.1 perspective the request is fine; I checked that the
chunk sizes are submitted correctly and that all \r\n are sent
correctly.
Furthermore, I set WSGIChunkedRequest On.
However, if I call environ['wsgi.input'].read(), the following error
message is generated:

[Fri Apr 27 09:09:42 2012] [error] [client 127.0.0.1] Traceback (most recent call last):
[Fri Apr 27 09:09:42 2012] [error] [client 127.0.0.1] File "/tmp/wsgi", line 3, in application
[Fri Apr 27 09:09:42 2012] [error] [client 127.0.0.1] print environ['wsgi.input'].read()
[Fri Apr 27 09:09:42 2012] [error] [client 127.0.0.1] IOError: request data read error
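
For readers who want to reproduce this, here is a minimal sketch of a WSGI application of the kind the traceback implies. The real /tmp/wsgi script is not shown in this thread, so this is a hypothetical Python 3 variant that echoes the body instead of printing it; `call_app` is an illustrative helper for running it without a server.

```python
import io


def application(environ, start_response):
    # With a chunked request there is no CONTENT_LENGTH, so read() is
    # called without a size and must rely on the server signalling EOF.
    body = environ["wsgi.input"].read()
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(body):
    # Hypothetical helper: drive the application with an in-memory body.
    environ = {"REQUEST_METHOD": "POST", "wsgi.input": io.BytesIO(body)}
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    chunks = application(environ, start_response)
    return captured["status"], b"".join(chunks)
```

Run against an in-memory stream this works fine; the error in the log above only appears when the same read() goes through mod_wsgi daemon mode.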

I searched through mod_wsgi.c and Apache's modules/http/http_filters.c,
and to me it looks like the HTTP body sent to the daemon is *free* of
the chunking information, which means none of the annoying 56d\r\n,
574\r\n and \r\n0\r\n\r\n at the end.
Inside the daemon, 'ap_http_filter' is added to the input chain again
and is then called by ap_get_client_block when the application needs
data. However, in this scenario 'ap_http_filter' inside the daemon
process has *no* Content-Length header (obviously, because the request
is chunked) and *no* chunking information inside the body itself, such
as the trailing '0'.
'ap_http_filter' has no chance and returns APR_EOF to
ap_get_client_block. ap_get_client_block interprets APR_EOF as an
error and returns -1, which mod_wsgi.c converts to the exception
above.
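
To make "free of the chunking information" concrete, the following is a rough Python sketch of what de-chunking a body means. It is not Apache's implementation; `dechunk` is a hypothetical helper that ignores trailers and drops chunk extensions. Each chunk starts with its size in hexadecimal (the leading 56d in the request above is 1389 bytes), and a size of 0 ends the body.

```python
def dechunk(raw):
    """Decode an HTTP/1.1 chunked body into plain bytes (trailers ignored)."""
    body = bytearray()
    pos = 0
    while True:
        # The chunk-size line ends with CRLF; anything after ';' is an
        # optional chunk extension, which this sketch discards.
        eol = raw.index(b"\r\n", pos)
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break  # last-chunk marker: the body is complete
        chunk = raw[pos:pos + size]
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        body += chunk
        pos += size
        if raw[pos:pos + 2] != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
        pos += 2
    return bytes(body)
```

After this step the receiver has the payload but neither a Content-Length nor a terminating 0 chunk, which is the situation described above for the daemon process.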

Right now I see these solutions:
1) Don't solve this at the mod_wsgi level; instead write a separate
Apache module which does the dechunking on its own. This module
reads the *whole* request at once (similar to mod_request in
Apache 2.4), so it can calculate the body size and replace the
'Transfer-Encoding' header with a Content-Length header.

2) Do it as above, but integrate the code into mod_wsgi itself. This
code then runs before the request is sent to the daemon.

3) Don't run ap_http_filter at all inside the daemon. I mean,
'ap_http_filter' already runs in the process which accepts the
request, so why execute it a second time in the daemon itself? To be
more precise, this line from 2007 makes me wonder:
'ap_add_input_filter("HTTP_IN", NULL, r, r->connection);'
I disabled this line just for fun and then it works, because the only
filter called now is 'ap_core_input_filter', which returns APR_SUCCESS
if no more data is coming.
On the other hand, I can imagine that there is a use case where this
line makes sense; I simply don't see it.

If there is a good reason not to do option three, I have no problem
with the other ones, because the requests our application expects will
not blow up memory.
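
The idea behind options 1 and 2 can be sketched in a few lines of Python. This only illustrates the header rewrite, it is not Apache module code; `buffer_and_set_length` and the plain dict of headers are assumptions made for the example.

```python
def buffer_and_set_length(headers, body_chunks):
    """Buffer a de-chunked body and describe it with Content-Length.

    headers:     dict of request headers (names matched case-insensitively)
    body_chunks: iterable of bytes objects that are already de-chunked
    """
    # The whole body is held in memory, so this is only reasonable when
    # requests are known to be small.
    body = b"".join(body_chunks)
    new_headers = {name: value for name, value in headers.items()
                   if name.lower() != "transfer-encoding"}
    new_headers["Content-Length"] = str(len(body))
    return new_headers, body
```

The memory cost is the trade-off mentioned above: the size is only known once the last chunk has arrived.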

Regards,
Stephan


Stephan Hofmockel

May 2, 2012, 4:04:56 AM
to mod...@googlegroups.com
A colleague of mine tried Option 1) but did not succeed. I guess the reason is that in daemon mode mod_wsgi sends the headers to the daemon *before* calling into the filter chain. So even if our filter adds Content-Length to the headers, it will never be sent to the daemon.

Graham Dumpleton

May 6, 2012, 10:36:40 PM
to mod...@googlegroups.com
On 28 April 2012 01:34, dreag...@gmx.de <dreag...@gmx.de> wrote:
> I search through mod_wsgi.c and apaches modules/http/http_filters.c
> and for me it looks like that the HTTP body send to the daemon is
> *free* from the chucking information. Which means no annoying (56d\r
> \n, 574\r\n and \r\n0\r\n\r\n the end).
> Inside the daemon the 'ap_http_filter' is added again to the input-
> chain and then called by ap_get_client_block when the application
> needs data. However in this scenario 'ap_http_filter' inside the
> daemon process has *no* Content-Length header (obvious because it is
> chunking) and *no* chunking information inside the body itself, like
> the trailing '0'
> 'ap_http_filter' has no chance and returns APR_EOF to
> ap_get_client_block. ap_get_client_block interprets APR_EOF as an
> error and returns -1 which is converted by mod_wsgi.c to the exception
> above.

Not quite. The filter doesn't get confused by the lack of chunking in
the body: the Transfer-Encoding: chunked header doesn't exist, so it
wasn't expecting chunking in the first place. It quite happily reads
all the content; it is what happens after it has read all the content
that is the problem. Specifically, ap_get_client_block() is only
designed to work with a content length or with chunked encoding. It
cannot handle unbounded normal content with no length. Thus, rather
than seeing APR_EOF as the end of input, it generates an error.

In practice the code shouldn't be using ap_get_client_block(). The
comments in it even say as much:

/* We lose the failure code here. This is why ap_get_client_block should
* not be used.
*/

> Right now I see this solutions

A quick hack is to do:

    if (n == -1) {
        if (wsgi_daemon_process && self->r->read_chunked &&
            self->r->connection->keepalive) {

            /* Have exhausted all the available input data. */

            self->done = 1;
        }
        else {
            PyErr_SetString(PyExc_IOError, "request data read error");
            Py_DECREF(result);
            return NULL;
        }
    }
    else if (n == 0) {
        /* Have exhausted all the available input data. */

        self->done = 1;
    }

    length += n;

Next would be not to use ap_get_client_block(), duplicating what it
did and allowing it to detect the APR_EOF properly. That also
potentially means not using ap_setup_client_block and
ap_should_client_block() as well, at which point you are starting to
skip a lot of magic.
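
As a loose analogy only, written in Python rather than against the Apache API, reading a body of unknown length means treating end-of-input as the normal termination condition and leaving failures to a separate path. `read_all` and its `block_size` default are hypothetical names for this sketch.

```python
import io


def read_all(stream, block_size=8192):
    """Read a stream of unknown length until end of input."""
    parts = []
    while True:
        block = stream.read(block_size)
        if not block:
            # An empty read means end of input, not a failure. Real I/O
            # errors surface as exceptions, so the two cases stay distinct.
            break
        parts.append(block)
    return b"".join(parts)
```

The point of the comparison is that ap_get_client_block() folds both outcomes into a single -1, whereas a reader built this way keeps them apart.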

I am not sure whether HTTP_IN can be removed. I don't remember whether
there was a specific reason it was there for daemon mode. I was trying
to cheat by simulating the same filter stack in the daemon so I didn't
have to have two WSGI input implementations.

The last solution is to have daemon specific wrappers directly around
the socket connection and avoid all the Apache muck being in there.

Graham

> 1) Don't solve this at mod_wsgi level and write a separate
> apache_module which does the 'dechunking of its own'. This module
> reads the *whole* request at once (similar to mod_request in
> apache2.4)  then it can calculate the body size and replaces the
> 'Transfer-Encoding' header with a Content-Length.
>
> 2) Do it like above but integrate the code it to mod_wsgi itself. This
> code runs then before the request is send to the daemon.
>
> 3) Don't run ap_http_filter at all inside the daemon. I mean
> 'ap_http_filter' runs already in the process which accepts the
> request. So why execute it a second time in the daemon itself? To be
> more precise this line from 2007 makes me wondering
> 'ap_add_input_filter("HTTP_IN", NULL, r, r->connection);'
> I disabled this line just for fun and then it works, because the only
> filter called now is 'ap_core_input_filter' which return APR_SUCCESS
> if no more data is coming.
> On the other side I can image that there is a use case why this line
> makes sense, I don't simply see it.
>
> If there is a good reason not do option three I have no problem with
> the other ones, because the requests our application expects will not
> blow up memory.
>
> Regards,
>  Stephan
>
>
> --
> You received this message because you are subscribed to the Google Groups "modwsgi" group.
> To post to this group, send email to mod...@googlegroups.com.
> To unsubscribe from this group, send email to modwsgi+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/modwsgi?hl=en.
>

Graham Dumpleton

May 7, 2012, 3:39:10 AM
to mod...@googlegroups.com
Thinking about this some more, the most appropriate solution seems to
be that this remains broken in mod_wsgi 3.X. In mod_wsgi 4.0, where
support for Apache 1.3 has been thrown out, I can then use the native
bucket brigade API calls.

In other words, the only reason that the old-style, deprecated API
was used was because I was still supporting Apache 1.3. In mod_wsgi 4.0
I don't have that limitation and can do it right without needing two
different implementations.

Graham

Stephan Hofmockel

May 7, 2012, 10:13:46 AM
to mod...@googlegroups.com
Hello Graham,
thanks for the response.

> In other words, the only reason that the old style and deprecated API was used was because was still supporting Apache 1.3. In mod_wsgi 4.0 don't have that limitation can do it right without needing two different implementations.

You mean that in mod_wsgi 4.0 (which is the current trunk, I guess) a rewrite of Input_read is possible, so that ap_*_client_block is no longer needed and the bucket brigade API is used directly?
Is the registration of the HTTP_IN input filter in daemon mode then obsolete too?
I like this approach, because it makes it possible to share the code for Input_read between daemon and embedded mode.
Do you think such an improvement will make it into mod_wsgi 4.0? What is the timeline for 4.0?

In the long run mod_wsgi 4.0 will definitely work for us. Right now we are working with mod_wsgi-3.3, so I tried writing a mod_dechunk myself and it seems to work:
https://github.com/stephan-hof/mod_dechunk. It is still a prototype regarding error checking, logging and parametrization, but it is already functional. The final version must come in the next two weeks.
I know it's not a good approach if requests get big, but that is not expected for our application.

Regards,
Stephan

Graham Dumpleton

May 7, 2012, 8:08:10 PM
to mod...@googlegroups.com
On 8 May 2012 00:13, Stephan Hofmockel <dreag...@gmx.de> wrote:
> Hello Graham,
> thanks for the response.
>
>> In other words, the only reason that the old style and deprecated API was
>> used was because was still supporting Apache 1.3. In mod_wsgi 4.0 don't have
>> that limitation can do it right without needing two different
>> implementations.
>
>
> You mean in mod_wsgi 4.0 (which is the current trunk I guess) a rewrite of
> Input_read is possible. So no usage of ap_*_client_block is needed anymore,
> instead use the bucket brigade API directly ?

It could.

> Then in daemon mode the registration of the HTTP_IN input_filter is obsolete
> too ?

The bucket brigade still works through the filter chain. Whether
HTTP_IN could be removed is a separate issue.

> I like this approach, because it makes it possible to share the code for
> Input_read in daemon and embedded mode.
> Do you think such an improvement will make it to mod_wsgi-4.0 ?

I have been wanting to do it for a long time.

> What is the timeline for 4.0 ?

Don't know. Things have been dragging on so long that I have
backported a lot of stuff so I can bring out 3.4 instead. Usually I
wouldn't put new features in a minor release, but I will this time.

Graham

> For the long run mod_wsgi 4.0 will definitely work for us. Right know we are
> working with mod_wsgi-3.3, here I tried to work on a mod_dechunk myself and
> it seems to work.
> https://github.com/stephan-hof/mod_dechunk, still a prototype regarding
> error checking, logging and parametrization but already functional. Final
> version must come in the next two weeks.
> I know its not a good approach if requests go big, but for our application
> this is not expected.
>
> Regards,
> Stephan

Stephan Hofmockel

Mar 6, 2013, 5:00:05 AM
to mod...@googlegroups.com
Hi David,

I don't know if keeping all data of a single request in memory is an option for you. For us it is acceptable and we are running this on many servers in production.
https://github.com/stephan-hof/mod_dechunk

Regards,
 Stephan

On Monday, March 4, 2013 at 13:04:26 UTC+1, david...@gmail.com wrote:
Hi Graham,

Continuing this thread: I am working on getting OpenStack Swift to work with Apache and mod_wsgi in daemon mode. The lack of support for Transfer-Encoding: chunked seems to be one of the last pieces of the puzzle.

I gather that it did not make it into 4.0. Is there a 4.1 planned? Is this high enough on the priority list?

Thanks
David

Eran Rom

Apr 30, 2014, 1:33:14 PM
to mod...@googlegroups.com
I was wondering if there is any news on the subject. Browsing the bug list, I could not identify any related bug (but that may be my Apache/WSGI ignorance).
Does it make sense to open a bug? Were there any additional requests for having mod_wsgi in daemon mode work with chunked transfer encoding?
Like a previous poster, my use case is also getting OpenStack Swift working with Apache and mod_wsgi.

Thanks very much!
Eran

Graham Dumpleton

Apr 30, 2014, 8:36:13 PM
to mod...@googlegroups.com
For various reasons, mod_wsgi development has been stuck in a deep dark hole for a number of years. In general that hasn't mattered as the last version was very stable and so bugs simply weren't arising.

All the same, there has been a very slow dribble of minor improvements such as this which have been requested but haven't been done.

The hopefully good news is that I am getting very close now to pulling mod_wsgi from that deep dark hole and giving it a rebirth.

I don't want to say too much about it right now, but it shouldn't be too long before I can.

I have added this issue into a fresh list of new work which would be targeted once development is resurrected. This is not on the existing Google Code site where mod_wsgi is currently held. The Google Code site will be deprecated with everything there being moved off as things start up again. Only important issues from the bug tracker there will be moved across so can start over and not instantly feel overwhelmed and fall back into the hole.

Graham

