How to correctly & efficiently handle HTTP/1.1-style pipelined requests using libuv & http-parser?

Dhruv Matani

Nov 10, 2012, 9:44:20 AM
to li...@googlegroups.com
Hello,

I'm trying to implement a custom http/1.1 in-process web server using libuv & http-parser and wanted to know the best way to handle pipelined requests.

Currently, I am calling http_parser_execute() every time the on_read callback is fired, but I realize that the data in the on_read callback might include 2 or more HTTP requests. How do things work out in that case? Is the on_message_complete callback called synchronously before the on_read callback returns? If so, then does the http parser buffer any data itself in case of paused requests (see below)?

Subsequently, in the on_message_complete callback, I call uv_read_stop(), which supposedly stops invoking callbacks. However, the thing I am unable to understand is what happens when data for multiple HTTP requests is passed to the http parser since I reinitialize the parser using http_parser_init() in the after_write callback. I am pretty sure that there can be cases where the parser has parsed 2 requests, but the on_message_complete handler has been invoked for just 1 (since I return 1 from that callback), and once I re-initialize the parser, the 2nd request is lost.

I am thinking that http_parser_pause() is the right method to be calling in the on_message_complete callback (instead of calling uv_read_stop or maybe both depending on the use-case??), and subsequently resuming it in the after_write callback. However, in this case, I am having a problem understanding what happens to the rest of the requests in the pipeline, and how does the http parser ensure that the callbacks are correctly invoked for them all? Does the http parser return a short read count if you call http_parser_execute() on a paused parser?

Confused as hell,
-Dhruv.

Dhruv Matani

Nov 10, 2012, 10:21:03 AM
to li...@googlegroups.com
Okay, been trying to figure this out and it seems that http_parser_execute() will return a short read count, and I somehow need to buffer that data and replay it into the parser. Is that correct? Also, does the short read count mean both an error and that parsing was paused? If so, how to distinguish between the 2?
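
Because http_parser never buffers, the bytes after a short count have to be kept by the application until the parser is resumed. A minimal sketch of that bookkeeping follows. The names pending_buf_t, pending_save and pending_clear are hypothetical helpers, not part of libuv or http-parser, and the sketch assumes the read buffer cannot be relied on after the read callback returns, so the tail is copied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: holds the bytes the parser did not consume. */
typedef struct {
  char*  data;
  size_t len;
} pending_buf_t;

/* Copy the unparsed tail data[nparsed..len) into p, replacing anything
 * stored earlier. Returns 0 on success and -1 on bad input or when the
 * allocation fails. */
static int pending_save(pending_buf_t* p, const char* data,
                        size_t len, size_t nparsed) {
  size_t rest;
  char* copy = NULL;

  assert(p != NULL && data != NULL);
  if (nparsed > len)
    return -1;

  rest = len - nparsed;
  if (rest > 0) {
    copy = malloc(rest);
    if (copy == NULL)
      return -1;
    memcpy(copy, data + nparsed, rest);
  }

  free(p->data);
  p->data = copy;
  p->len = rest;
  return 0;
}

/* Release the stored tail once it has been replayed into the parser. */
static void pending_clear(pending_buf_t* p) {
  free(p->data);
  p->data = NULL;
  p->len = 0;
}
```

After the response has been written, the stored bytes would be passed to http_parser_execute() again before reading resumes, and pending_clear() called once they have been consumed.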

Ben Noordhuis

Nov 10, 2012, 12:07:44 PM
to li...@googlegroups.com
On Sat, Nov 10, 2012 at 3:44 PM, Dhruv Matani <dhru...@gmail.com> wrote:
> Hello,
>
> I'm trying to implement a custom http/1.1 in-process web server using libuv
> & http-parser and wanted to know the best way to handle pipelined requests.
>
> Currently, I am calling http_parser_execute() every time the on_read
> callback is fired, but I realize that the data in the on_read callback might
> include 2 or more HTTP requests. How do things work out in that case? Is the
> on_message_complete callback called synchronously before the on_read
> callback returns? If so, then does the http parser buffer any data itself in
> case of paused requests (see below)?

No, http_parser never buffers and yes, the on_message_complete
callback is called synchronously. In fact, all callbacks are.

It's something of an API design flaw. Callbacks only make sense when
invoked asynchronously. If http_parser used return values instead of
callbacks, we wouldn't have a need for http_parser_pause(). But I
digress...
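
To make "synchronous callbacks, no buffering" concrete, here is a toy stand-in for a parser. It is only a sketch and not http-parser's API: toy_parser, toy_execute and stop_after_message are hypothetical names, and a "message" is assumed to be any run of bytes ending in a blank line (a request without a body). The callback runs inside the execute call, and a non-zero return pauses the toy parser and produces a short count.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_parser {
  int paused;     /* non-zero: execute consumes nothing until cleared */
  int match;      /* how many bytes of "\r\n\r\n" have been matched */
  int completed;  /* number of complete messages seen */
  int (*on_message_complete)(struct toy_parser*);
} toy_parser;

/* Feed data[0..len) to the toy parser and return the bytes consumed.
 * Nothing is copied or kept: whatever is not consumed stays with the caller. */
static size_t toy_execute(toy_parser* p, const char* data, size_t len) {
  static const char term[] = "\r\n\r\n";
  size_t i;

  assert(p != NULL);
  if (p->paused)
    return 0;

  for (i = 0; i < len; i++) {
    if (data[i] == term[p->match])
      p->match++;
    else
      p->match = (data[i] == '\r') ? 1 : 0;

    if (p->match == 4) {
      p->match = 0;
      p->completed++;
      /* Called synchronously, before toy_execute() returns. */
      if (p->on_message_complete != NULL && p->on_message_complete(p) != 0) {
        p->paused = 1;
        return i + 1;  /* short count: the rest was not looked at */
      }
    }
  }
  return len;
}

/* Example callback: handle one message at a time. */
static int stop_after_message(toy_parser* p) {
  (void) p;
  return 1;
}
```

With two pipelined requests in one buffer, the first call returns the length of the first request only; clearing paused and calling toy_execute() on the remainder delivers the second.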

> Subsequently, in the on_message_complete callback, I call uv_read_stop(),
> which supposedly stops invoking callbacks. However, the thing I am unable to
> understand is what happens when data for multiple HTTP requests is passed to
> the http parser since I reinitialize the parser using http_parser_init() in
> the after_write callback. I am pretty sure that there can be cases where the
> parser has parsed 2 requests, but the on_message_complete handler has been
> invoked for just 1 (since I return 1 from that callback), and once I
> re-initialize the parser, the 2nd request is lost.

You check the 'bytes parsed' count that http_parser_execute() returns.
If it's less than the size of the input but there's no error, the
buffer contains a message boundary. The basic logic looks something
like this:

void read_cb(uv_stream_t* handle, ssize_t nread, uv_buf_t buf)
{
  const http_parser_settings* settings = ...;
  http_parser* parser = ...;
  const char* data = buf.data;
  size_t len;

  if (nread == -1) {
    // handle read error / EOF
    return;
  }

  // nread is the number of bytes actually read;
  // buf.len is only the capacity of the buffer
  len = (size_t) nread;

  for (;;) {
    size_t nparsed = http_parser_execute(parser, settings, data, len);

    if (nparsed == len)
      return; // ok

    if (HTTP_PARSER_ERRNO(parser) != HPE_OK) {
      // handle parse error
      return;
    }

    // buffer contains two messages, we've parsed
    // the first one, now start on the next one
    http_parser_init(parser, HTTP_BOTH);
    data += nparsed;
    len -= nparsed;
  }
}

Dhruv Matani

Nov 10, 2012, 12:59:58 PM
to li...@googlegroups.com


On Saturday, November 10, 2012 12:07:45 PM UTC-5, Ben Noordhuis wrote:
> On Sat, Nov 10, 2012 at 3:44 PM, Dhruv Matani <dhru...@gmail.com> wrote:
>> Hello,
>>
>> I'm trying to implement a custom http/1.1 in-process web server using libuv
>> & http-parser and wanted to know the best way to handle pipelined requests.
>>
>> Currently, I am calling http_parser_execute() every time the on_read
>> callback is fired, but I realize that the data in the on_read callback might
>> include 2 or more HTTP requests. How do things work out in that case? Is the
>> on_message_complete callback called synchronously before the on_read
>> callback returns? If so, then does the http parser buffer any data itself in
>> case of paused requests (see below)?
>
> No, http_parser never buffers and yes, the on_message_complete
> callback is called synchronously. In fact, all callbacks are.
>
> It's something of an API design flaw. Callbacks only make sense when
> invoked asynchronously. If http_parser used return values instead of
> callbacks, we wouldn't have a need for http_parser_pause(). But I
> digress...

Okay - I sort of see what you mean: http_parser_pause() isn't strictly needed because the callback handler can always return non-zero to indicate an error, and the caller can then check the http_errno variable to see whether the code is HPE_CB_message_complete, right?

On a short read, I currently check whether the error code is HPE_PAUSED, and I use the http_parser_pause() function, which, as you mention, isn't strictly needed. The on_message_complete() callback sets the paused state of the parser.

Is this understanding correct?

Also 2 more questions:

1. When I run the app (server) in valgrind, it occasionally fails:

==32240== 
==32240== Process terminating with default action of signal 13 (SIGPIPE)
==32240==    at 0x42556A1: writev (writev.c:51)
==32240==    by 0x806D8DD: uv__write (stream.c:534)
==32240==    by 0x806EA3E: uv_write2 (stream.c:960)
==32240==    by 0x806EACB: uv_write (stream.c:981)
==32240==    by 0x8063913: write_response(client_t*, int, char const*, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >&, std::string&) (httpserver.cpp:82)
==32240==    by 0x8053365: handle_suggest(client_t*, parsed_url_t&) (main.cpp:657)
==32240==    by 0x8053A7C: serve_request(client_t*) (main.cpp:697)
==32240==    by 0x8064559: on_message_complete(http_parser*) (httpserver.cpp:305)
==32240==    by 0x80868C4: http_parser_execute (http_parser.c:1640)
==32240==    by 0x8063BA3: on_resume_read(client_t*, partial_buf_t&) (httpserver.cpp:124)
==32240==    by 0x806407E: after_write(uv_write_s*, int) (httpserver.cpp:219)
==32240==    by 0x806DC42: uv__write_callbacks (stream.c:624)

but it never fails under normal operation. This happens when I kill a busy client.
Should I ignore that signal at the application level?

2. Every time I unpause the http-parser, the data field is set to garbage, and I need to reset it to the right value - is this expected?

Thanks for clearing up a lot of the confusion!

-Dhruv.

Ben Noordhuis

Nov 10, 2012, 2:00:35 PM
to li...@googlegroups.com
Sorry, my previous post wasn't quite correct. http_parser_execute()
will start parsing the next message automatically unless:

a) You pause the parser (like you do), or
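b) The message contains an Upgrade header. That's the case where
http_parser_execute(p, settings, buf, len) can return a value < len
while HTTP_PARSER_ERRNO(p) == HPE_OK.

> but it never fails under normal operation. This happens when I kill a busy
> client.
> Should I ignore that signal at the application level?

Yes.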

> 2. Every time I unpause the http-parser, the data field is set to garbage,
> and I need to reset it to the right value - is this expected?

http_parser_pause() doesn't touch the data field. It sounds like
there's some memory corruption in your application.
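
A minimal sketch of ignoring SIGPIPE at the application level, assuming a POSIX system. ignore_sigpipe and write_to_closed_pipe are hypothetical helper names; the second one only demonstrates the effect, using a pipe instead of a client socket. Once the signal is ignored, a write to a peer that has gone away fails with EPIPE instead of terminating the process, so the error can be handled where the write result is checked.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Call once at startup, before the event loop runs.
 * Returns 0 on success and -1 on failure. */
static int ignore_sigpipe(void) {
  return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Demonstration only: write into a pipe whose read end is already closed.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created. */
static int write_to_closed_pipe(void) {
  int fds[2];
  int err = 0;

  if (pipe(fds) != 0)
    return -1;

  close(fds[0]);                  /* simulate the killed client */
  if (write(fds[1], "x", 1) < 0)
    err = errno;                  /* EPIPE once SIGPIPE is ignored */
  close(fds[1]);
  return err;
}
```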

Dhruv Matani

Nov 10, 2012, 2:52:04 PM
to li...@googlegroups.com

>> but it never fails under normal operation. This happens when I kill a busy
>> client.
>> Should I ignore that signal at the application level?
>
> Yes.

okay.

>> 2. Every time I unpause the http-parser, the data field is set to garbage,
>> and I need to reset it to the right value - is this expected?
>
> http_parser_pause() doesn't touch the data field. It sounds like
> there's some memory corruption in your application.

I'm running it under gdb with a watch set on the memory location parser->data. This is what gdb says:

http message parsed
Hardware watchpoint 5: *0x80a0c10

Old value = 134876000
New value = 2173
http_parser_pause (parser=0x80a0bf8, paused=1) at http_parser.c:2177
2177 }

The old value actually was 134876000, and somehow it is being set to 2173.

Finally traced it down to the fact that the object (http_parser_g.o) was compiled with the debug flag ON and the rest of the app (struct http_parser) was compiled w/o it, which is probably why the SET_ERRNO() macro is setting error_lineno to the current line, which happens to be the data field in the non-debug version.

Crazy! Do you see any way to detect this? Probably a runtime magic value field that is set to different constants in production and debug modes.

Regards,
-Dhruv.
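
One way such a mismatch could be caught at startup is a size check, sketched below. This is a different technique from the magic-value field suggested above, and everything in it is hypothetical: parser_release and parser_debug are illustrative stand-ins rather than the real struct http_parser, and lib_parser_size() stands for an accessor that a library would have to export, which http-parser is not claimed to do.

```c
#include <stddef.h>

/* Stand-in for the struct as the application sees it (no debug flag). */
struct parser_release {
  int   state;
  void* data;
};

/* Stand-in for the struct as a debug build of the library sees it:
 * an extra field sits in front of data and shifts its offset. */
struct parser_debug {
  int   state;
  void* debug_info;
  void* data;
};

/* Hypothetical export, compiled into the library's object file. Here it
 * reports the debug layout to mimic a library built with the debug flag. */
static size_t lib_parser_size(void) {
  return sizeof(struct parser_debug);
}

/* The application passes sizeof its own view of the struct. Returns 1 when
 * both sides agree and 0 when they were compiled with different layouts. */
static int layout_matches(size_t app_size) {
  return app_size == lib_parser_size();
}
```

An application would call layout_matches(sizeof(struct parser_release)) once at startup and abort on 0, rather than discovering the mismatch through a corrupted data pointer.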

Dhruv Matani

Nov 10, 2012, 3:00:45 PM
to li...@googlegroups.com
Okay - got it - so it's the special case of the Upgrade header that you are accounting for here eh?

Alternatively, I can make the callback return an error and check if the error was raised by that callback (again checking HTTP_PARSER_ERRNO(p)) eh?

Mostly clear now! Thanks!

-Dhruv.
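
The cases discussed in this thread can be summarised as a small decision function. This is a sketch only: the sketch_errno values are local stand-ins for the error codes in http_parser.h (the real checks would use HTTP_PARSER_ERRNO(p) and the parser's upgrade flag), and classify and the outcome names are hypothetical. The error code is checked before the byte count, on the assumption that a pause can coincide with the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for HPE_OK, HPE_PAUSED and any other error code. */
enum sketch_errno { SKETCH_OK, SKETCH_PAUSED, SKETCH_OTHER_ERROR };

enum outcome { CONSUMED_ALL, PAUSED, UPGRADED, FAILED };

/* Decide what a return value from the execute call means.
 *   nparsed: bytes the parser reports as consumed
 *   len:     bytes that were passed in
 *   err:     the parser's error code after the call
 *   upgrade: non-zero when the message asked for a protocol upgrade */
static enum outcome classify(size_t nparsed, size_t len,
                             enum sketch_errno err, int upgrade) {
  assert(nparsed <= len);

  if (err == SKETCH_PAUSED)
    return PAUSED;        /* keep data + nparsed and replay it after resuming */
  if (err != SKETCH_OK)
    return FAILED;        /* genuine parse error or callback failure */
  if (upgrade)
    return UPGRADED;      /* data + nparsed belongs to the new protocol */
  if (nparsed == len)
    return CONSUMED_ALL;  /* nothing left over */
  return FAILED;          /* short count with no explanation */
}
```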

Ben Noordhuis

Nov 10, 2012, 5:46:06 PM
to li...@googlegroups.com
On Sat, Nov 10, 2012 at 8:52 PM, Dhruv Matani <dhru...@gmail.com> wrote:
>> http_parser_pause() doesn't touch the data field. It sounds like
>> there's some memory corruption in your application.
>
> I'm running it under gdb with a watch set on the memory location
> parser->data. This is what gdb says:
>
> http message parsed
> Hardware watchpoint 5: *0x80a0c10
>
> Old value = 134876000
> New value = 2173
> http_parser_pause (parser=0x80a0bf8, paused=1) at http_parser.c:2177
> 2177 }
>
> The old value actually was 134876000, and somehow it is being set to 2173.
>
> Finally traced it down to the fact that the object (http_parser_g.o) was
> compiled with the debug flag ON and the rest of the app (struct http_parser)
> was compiled w/o it, which is probably why the SET_ERRNO() macro is setting
> error_lineno to the current line, which happens to be the data field in the
> non-debug version.
>
> Crazy! Do you see any way to detect this? Probably a runtime magic value
> field that is set to different constants in production and debug modes.

Ah, right - HTTP_PARSER_DEBUG. That was a bad idea from the start,
conditionally changing sizeof(struct http_parser). I've removed it in
[1].

[1] https://github.com/joyent/http-parser/commit/245f6f0