Handling line-leading white space in an HTTP response

153 views
Skip to first unread message

Anh Dũng Phan

<pad6521@gmail.com>
unread,
Oct 31, 2021, 11:47:35 PM10/31/21
to seastar-dev
Hello guys,

We are now managing a reverse proxy using seastar as its network stack. We recently ran into this issue in which an http resonse (coming form the backend) which has the wrong format causes an sstring overflow exception to occur in the parsing stage. I have traced and found out that this happened somewhere in the file response_parser.hh. This eventually leads to that response (although not in the correct format, see below) never reaches its destination.
In particular, the response has a whitespace in front of one of its field names, like this:

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
 Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
Content-Length: 88
Content-Type: text/html

Notice the whitespace before the "Server" field name.
I'm aksing for anyway to handle this situation other than raising an sstring overflow, for example: ignoring the whole line and the response can still be futher processed. Any help would be greatly appriaciated!

Regards,
Dũng

Avi Kivity

<avi@scylladb.com>
unread,
Nov 1, 2021, 12:36:48 PM11/1/21
to Anh Dũng Phan, seastar-dev


On 01/11/2021 05.47, Anh Dũng Phan wrote:
Hello guys,

We are now managing a reverse proxy using seastar as its network stack. We recently ran into this issue in which an http resonse (coming form the backend) which has the wrong format causes an sstring overflow exception to occur in the parsing stage. I have traced and found out that this happened somewhere in the file response_parser.hh. This eventually leads to that response (although not in the correct format, see below) never reaches its destination.
In particular, the response has a whitespace in front of one of its field names, like this:

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT


That's quite an old page.


 Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
Content-Length: 88
Content-Type: text/html


Here are the relevant lines from response_parser.rl:


header_1st = (field sp_ht* ':' value :> crlf) %assign_field;
header_cont = (sp_ht+ value sp_ht* crlf) %extend_field;


So as far as I can tell, the parser should treat it as one long header:


    Date: Mon, 27 Jul 2009 12:28:53 GMT Server: Apache/2.2.14 (Win32)


Notice the whitespace before the "Server" field name.
I'm aksing for anyway to handle this situation other than raising an sstring overflow, for example: ignoring the whole line and the response can still be futher processed. Any help would be greatly appriaciated!



Per my reading of the code it should work. Can you catch the exception in gdb and see where it comes from?



Regards,
Dũng

--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seastar-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/seastar-dev/9cb1040d-5fb3-4267-a4c1-83be69732a96n%40googlegroups.com.

Anh Dũng Phan

<pad6521@gmail.com>
unread,
Nov 2, 2021, 3:32:04 AM11/2/21
to seastar-dev
From this:

tchar = alpha | digit | '-' | '!' | '#' | '$' | '%' | '&' | '\'' | '*'
        | '+' | '.' | '^' | '_' | '`' | '|' | '~';

and this:

field = tchar+ >mark %store_field_name;

I believe that "field" can not contain a whitespace in it. When parsing something like the example I just showed (a whitespace in front of a "field"), eventually an undefined behavior will occur, since this case is not covered at all in the state machine:

field = tchar+ >mark %store_field_name;
value = any* >mark %store_value;
start_line = http_version space status_code space status_string crlf;
header_1st = (field sp_ht* ':' value :> crlf) %assign_field;
header_cont = (sp_ht+ value sp_ht* crlf) %extend_field;
header = header_1st header_cont*;
main := start_line header* :> (crlf @done);

From the above piece of code, I can say with confident that after a crlf, there must be a "field" that can not contain any whitespace, and "  Server: Apache/2.2.14 (Win32)" breaks this rule.
The sstring overflow is the result of that undefined behavior. It was caused by passing NULL to an sstring constuctor. I can go further into this but I guess there is no point since it is undefined behavior after all.

My experience with the ragel language is extremely limited so for now I can not properly handle this case, do you have any advices? I have tried things like:

tchar = alpha | digit | '-' | '!' | '#' | '$' | '%' | '&' | '\'' | '*'
        | '+' | '.' | '^' | '_' | '`' | '|' | '~' | ' ';

or:

crlf = '\r\n ' | '\r\n' ;

but they did not work, I guess because there are various other places with whitespace in the response.

Nadav Har'El

<nyh@scylladb.com>
unread,
Nov 2, 2021, 4:40:10 AM11/2/21
to Anh Dũng Phan, seastar-dev
On Mon, Nov 1, 2021 at 5:47 AM Anh Dũng Phan <pad...@gmail.com> wrote:
Hello guys,

We are now managing a reverse proxy using seastar as its network stack. We recently ran into this issue in which an http resonse (coming form the backend) which has the wrong format causes an sstring overflow exception to occur in the parsing stage. I have traced and found out that this happened somewhere in the file response_parser.hh. This eventually leads to that response (although not in the correct format, see below) never reaches its destination.
In particular, the response has a whitespace in front of one of its field names, like this:

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
 Server: Apache/2.2.14 (Win32)

Starting a header line with a space is wrong:

As https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.4 explains,  this used to mean a "folded" header - a continuation of the header of the previous line, but this usage was deprecated long ago. Note that this server, Apache 2.2.14, is 12 years old by now, so it's still possible that it is using this deprecated protocol feature, but...
What's weird is that in your case, it seems that this is *not* a case of header folding (continuing a header on the next line), but rather a separate header that somehow got a space in the beginning.
I wonder what could possibly cause this - what this some strange/failed attempt to prevent Apache from advertising its version?

 
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
Content-Length: 88
Content-Type: text/html

Notice the whitespace before the "Server" field name.
I'm aksing for anyway to handle this situation other than raising an sstring overflow, for example: ignoring the whole line and the response can still be futher processed. Any help would be greatly appriaciated!

What is a "sstring overflow"? Do we have some sort of buffer overflow (a serious security risk!) or we throw an exception (which sounds fine)?

Anyway, since this is HTTP *client* code, not *server* code, I agree that it's reasonable to let protocol errors like this one be ignored, instead of failing the request. We could even make the client code's strict RFC compliance optional.

Anh Dũng Phan

<pad6521@gmail.com>
unread,
Nov 2, 2021, 5:03:11 AM11/2/21
to seastar-dev
Note that this server, Apache 2.2.14, is 12 years old by now, so it's still possible that it is using this deprecated protocol feature, but...

Actually that is just an example I made up, so no need for speculate about that old Apache server, the real response I ran into looks like this:

Untitled.png
Notice the space in front of enableVersionHeader.

What is a "sstring overflow"? Do we have some sort of buffer overflow (a serious security risk!) or we throw an exception (which sounds fine)?

Luckily it threw an exception :)

I agree that it's reasonable to let protocol errors like this one be ignored, instead of failing the request. We could even make the client code's strict RFC compliance optional.

I totally agree with this. This is our current situation at the momment and I guess the best way to handle this is not to fail the request.

Nadav Har'El

<nyh@scylladb.com>
unread,
Nov 2, 2021, 5:27:07 AM11/2/21
to Anh Dũng Phan, seastar-dev
On Tue, Nov 2, 2021 at 11:03 AM Anh Dũng Phan <pad...@gmail.com> wrote:
Note that this server, Apache 2.2.14, is 12 years old by now, so it's still possible that it is using this deprecated protocol feature, but...

Actually that is just an example I made up, so no need for speculate about that old Apache server, the real response I ran into looks like this:

Untitled.png
Notice the space in front of enableVersionHeader.

This makes more sense - it looks like a bona-fide (though silly) HTTP header continuation - the server wasn't trying to set the "enableVersionHeader" header, it was trying to tell you on the X-Powered-By header, after the word ASP.NET, that the version header was disabled so don't expect to see one.

The server still shouldn't have done this - this "header folding" feature hasn't been used in decades and formally deprecated for almost a decade.
I'm guessing this ASP.NET installation is even older than the Apache version you quoted ;-)
 

What is a "sstring overflow"? Do we have some sort of buffer overflow (a serious security risk!) or we throw an exception (which sounds fine)?

Luckily it threw an exception :)

I agree that it's reasonable to let protocol errors like this one be ignored, instead of failing the request. We could even make the client code's strict RFC compliance optional.

I totally agree with this. This is our current situation at the momment and I guess the best way to handle this is not to fail the request.

Although I'm in favor of adding a "lenient HTTP parsing" option to the HTTP client code, I want to point out that the RFC suggests that your proxy actually shouldn't sweep this situation under the rug, and the request *should* be failed. The RFC says: (https://datatracker.ietf.org/doc/html/rfc7230)

   A proxy or gateway that receives an obs-fold in a response message
   that is not within a message/http container MUST either discard the
   message and replace it with a 502 (Bad Gateway) response, preferably
   with a representation explaining that unacceptable line folding was
   received, or replace each received obs-fold with one or more SP
   octets prior to interpreting the field value or forwarding the
   message downstream.

Anh Dũng Phan

<pad6521@gmail.com>
unread,
Nov 2, 2021, 6:14:49 AM11/2/21
to seastar-dev
Thanks you so much Nadav for your information!

Regarding my second message, silly me, actually seastar seems to support this kind of decades-old deprecated "folded" header:

header_1st = (field sp_ht* ':' value :> crlf) %assign_field;
header_cont = (sp_ht+ value sp_ht* crlf) %extend_field;

Notice that in the second line, sp_ht means either sp (white space) or tab (\t). So basically the first line will handle the line before the white space, and the second line will handle the additional information.
Anyway, the fact that a whitespace raises an sstring exception although seastar clearly supports it still remains true (a possible seastar bug, maybe?). That means more debugging for me, yay :(

Once again, thanks a lot Nadav and Avi, and forgive me for my limited ragel knowledge.

Anh Dũng Phan

<pad6521@gmail.com>
unread,
Nov 4, 2021, 4:48:27 AM11/4/21
to seastar-dev
Hello guys,

Regarding handling obs-fold in http header, I have finally found the root cause to the problem. However, I want to hear you guys' opinion on this, since there is a high chance that this is a seastar bug.

Steps to reproduce:
Foward to the client a http response (from the backend) with a segment like this (obs-fold style):

Untitled.png

Expectation: The client will receive something like this:

Untitled2.png
Reality: an sstring overflow is thrown, parsing fails so the reponse never reaches the client.

What I found out is that the problem comes this:

header_cont = (sp_ht+ value sp_ht* crlf) %extend_field;

Notice that value and sp_ht* stay next to each other. I can see that it is intended to eliminate trailing  whitespaces (or tabs). However, I suspect that they conflict with each other, causing the parser to fail, since both can either contain spaces or not. Putting them together will cause a logical error (trailing whitespace(s) can belong to any of them, or both).
With that in mind, I applied a fix by removing the sp_ht*, so now the code looks like this:

header_cont = (sp_ht+ value crlf) %extend_field;

And the bug just disappears immidiately, parsing succeeded and the client gets this:

Untitled2.png

Can you guys give me your thoughts on this? Am I right? Why does the fix works? Maybe confirm that value and sp_ht* really do conflict each other, or otherwise, they dont (forgive me for my limited ragel knowledge).
Also, it would mean the world for me if some of you can do a quick test with conditions as above and share with me the result. Thank you so much.

Nadav Har'El

<nyh@scylladb.com>
unread,
Nov 4, 2021, 4:50:53 PM11/4/21
to Anh Dũng Phan, Asias He, Wojciech Mitros, seastar-dev
On Thu, Nov 4, 2021 at 10:48 AM Anh Dũng Phan <pad...@gmail.com> wrote:
Hello guys,

Regarding handling obs-fold in http header, I have finally found the root cause to the problem. However, I want to hear you guys' opinion on this, since there is a high chance that this is a seastar bug.

I wouldn't be surprised. As far as I can tell the src/http/response_parser.rl code was added six years ago, in commit 51adb20bd by Asias, initially as some sort of test / demo.
Seastar's HTTP server code has since been used in a number of projects, but I think that the HTTP client code has received much less attention, and I wouldn't be surprised if it has bugs.

As I'll show below, I'm certain that it does have bugs - even bugs that we fixed for the HTTP server code and not for the HTTP client code.


Steps to reproduce:
Foward to the client a http response (from the backend) with a segment like this (obs-fold style):

Untitled.png

Expectation: The client will receive something like this:

Untitled2.png
Reality: an sstring overflow is thrown, parsing fails so the reponse never reaches the client.

What I found out is that the problem comes this:

header_cont = (sp_ht+ value sp_ht* crlf) %extend_field;

Notice that value and sp_ht* stay next to each other. I can see that it is intended to eliminate trailing  whitespaces (or tabs). However, I suspect that they conflict with each other, causing the parser to fail, since both can either contain spaces or not. Putting them together will cause a logical error (trailing whitespace(s) can belong to any of them, or both).

The HTTP 1.1 protocol (see https://www.rfc-editor.org/rfc/rfc2616#section-4.2) indeed specifies that headers can end with white space that should be ignored:

   The field-content does not include any leading or trailing LWS:
   linear white space occurring before the first non-whitespace
   character of the field-value or after the last non-whitespace
   character of the field-value. Such leading or trailing LWS MAY be
   removed without changing the semantics of the field value.

This specification is not only awkward, it also means that these headers cannot parsed by a textbook finite state machine, because you may have N spaces at the end of the field and only when you finally reach the CR-LF after the N spaces, suddenly you realize that these N spaces shouldn't be part of the field content.

I think the only solution is for the parser to include these trailing spaces and not try to remove them - and we can remove them later, in C++ code that will remove the trailing spaces. The parser can also help that last removal step by remembering when it saw the last non-space character (a classic FSM can't do this, but Ragel might be able to do this).

The strange thing is that it seems that in response_parser.rl only header_cont has this broken attempt to recognize trailing whitespace - the "header_1st" rule doesn't! So why do we need it in header_cont? I don't see any reason. It's a pure and simple bug.

In fact, we had the same bug in the HTTP server code, and it was fixed in efd61d38401f2bbe87ade8cfe7502f53b5f35e25 by Wojciech Mitros. We need to fix the HTTP client code as well.

Come to think of it, I don't even know why (accept for historic accident) we have separate code for HTTP client and server header parsing. It should have been the same code :-(

The fix in efd61d38401f2bbe87ade8cfe7502f53b5f35e25 is a bit more elaborate than just removing the sp_ht*, but it indeed starts with removing it.


Anh Dũng Phan

<pad6521@gmail.com>
unread,
Nov 5, 2021, 4:08:03 AM11/5/21
to seastar-dev
The strange thing is that it seems that in response_parser.rl only header_cont has this broken attempt to recognize trailing whitespace - the "header_1st" rule doesn't! So why do we need it in header_cont? I don't see any reason. It's a pure and simple bug.

I totally agree. It makes no sense to check for only "header_cont" and then do nothing with "header_1st", my only guess  is that the author tried it for  "header_1st" too and and then parser failed (of course). I tried it myself and now any request (even without obs-fold) failed the exact same way. The fact that you said "initially as some sort of test / demo" makes me believe this even more.

Come to think of it, I don't even know why (accept for historic accident) we have separate code for HTTP client and server header parsing. It should have been the same code :-(

Agree! They are basically the same code doing the exact same thing.

Now I want to fix this bug, can you tell me where do I start? Do I have to register an issue on https://github.com/scylladb/seastar/issues, or I can just clone the code -> make a branch -> fix it -> then make a pull request, or something else?

Anh Dũng Phan

<pad6521@gmail.com>
unread,
Nov 5, 2021, 10:40:23 PM11/5/21
to seastar-dev

Once again, thank you so much for your prescious time!

Nadav Har'El

<nyh@scylladb.com>
unread,
Nov 7, 2021, 10:59:05 AM11/7/21
to Anh Dũng Phan, seastar-dev
On Fri, Nov 5, 2021 at 10:08 AM Anh Dũng Phan <pad...@gmail.com> wrote:
The strange thing is that it seems that in response_parser.rl only header_cont has this broken attempt to recognize trailing whitespace - the "header_1st" rule doesn't! So why do we need it in header_cont? I don't see any reason. It's a pure and simple bug.

I totally agree. It makes no sense to check for only "header_cont" and then do nothing with "header_1st", my only guess  is that the author tried it for  "header_1st" too and and then parser failed (of course). I tried it myself and now any request (even without obs-fold) failed the exact same way. The fact that you said "initially as some sort of test / demo" makes me believe this even more.

Come to think of it, I don't even know why (accept for historic accident) we have separate code for HTTP client and server header parsing. It should have been the same code :-(

Agree! They are basically the same code doing the exact same thing.

Now I want to fix this bug, can you tell me where do I start? Do I have to register an issue on https://github.com/scylladb/seastar/issues,

Indeed it's always good to begin by creating an issue describing the bug. This will be helpful even if you don't end up fixing the bug.
Later, the patch will refer to issue #1234 with the text "Fixes #1234", so github will automatically close this issue when the patch is committed.

It would also be a good idea to write a unit test that reproduces this bug in response parsing - so you can be sure you are fixing a real bug, and that your fix actually fixes it

Then, you can send the fix a in the form of either a pull request sending a patch to the mailing list, as you prefer.

In this case I guess you can opt for a "small" fix, fixing this specific bug in the response parser, or a bigger change of using the (better quality) request parser parts also for the response. But I don't know if this bigger fix is actually feasible - how much Ragel allows reusing pieces of parsers, maybe it's not possible?
 
Reply all
Reply to author
Forward
0 new messages