upload handling

Adam

Mar 31, 2008, 10:03:07 PM
to ebb
Hi, I read on the status page that one of the current goals is to
improve upload handling. I am curious what exactly needs to be
improved, and what it will take to do that. I'm not sure if it's
something I can help fix, but I would like to be involved somehow, and
I really like how ebb_rails is working for a site I'm running right now.
So ... a) what's the problem? b) what's the solution, and its status?

Thanks,
Adam

ry

Apr 1, 2008, 6:34:46 AM
to ebb
Hi Adam,

Currently each socket connection has a statically allocated buffer of
about 100 KB (defined in ebb.h) to handle requests. If it turns out
that the request is larger than this, Ebb starts a tempfile and
redirects the output from the socket into the tempfile. After the
upload is complete Ebb passes control over to Ruby. Ruby can use
ebb_client_read to get the request body (be it stored in a tempfile or
in the memory buffer).

When Ebb reads data only into memory it works well, with the usual
performance increase over the other servers. However, when it writes to
a tempfile, it is no better than the other Ruby servers. You can see
that in this benchmark, where I've set the memory buffer to be only
40 KB:
http://s3.amazonaws.com/four.livejournal/20080227/post_size.png
The file upload stuff was written rather hastily and I suspect that
with some more thought this could be done better. Perhaps by
allocating blocks of memory to read into, instead of using tempfiles,
for, say, requests smaller than 100 MB.

Any help in this area would be welcome :)

ry

carmen

Apr 1, 2008, 12:22:48 PM
to ebb
> The file upload stuff was written rather hastily and I suspect that
> with some more thought this could be done better. Perhaps by
> allocating blocks of memory to read into instead of into tempfiles
> for, say, requests smaller than 100MB.

99% of the time my framework returns a file handle, which ends up
being written to the socket in Ruby if an ETag doesn't match.

would it be worth writing that part in C?

i know rtorrent makes a big deal about 'transferring directly from file
pages to the network stack'. ime it can handle 1200+ torrents and 512+
connections at about 1% CPU, never exceeding 100 MB of RAM.

going to do some benchmarks/testing eventually, it just hasn't been a
priority, and i'm curious if you've tested this already

ry

Apr 1, 2008, 12:54:41 PM
to ebb
You probably don't want Ebb serving static files - it's faster to have
the front-end web server do that.
However, I would accept a patch for streaming an open file
descriptor :)
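
For reference, streaming an open file descriptor to a socket can be sketched with a plain read/write loop. This is only an illustration and not Ebb code: the name stream_fd is hypothetical, both descriptors are assumed to be blocking, and a real patch would more likely hook into the event loop or use a zero-copy call instead.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (not part of Ebb): copy everything readable from
 * in_fd to out_fd. Returns the number of bytes copied, or -1 on error.
 * Assumes both descriptors are in blocking mode. */
static ssize_t stream_fd(int in_fd, int out_fd)
{
    char block[8192];
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        ssize_t n = read(in_fd, block, sizeof block);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted, retry the read */
            return -1;
        }
        if (n == 0) return total;           /* end of file */

        /* write() may accept fewer bytes than requested, so loop. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, block + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Because the loop blocks until the whole file is sent, it would stall a single-threaded event loop on a slow client; that trade-off is the main reason it is only a sketch.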

ry

Adam Elhardt

Apr 1, 2008, 1:12:04 PM
to ebb...@googlegroups.com
Thanks for the speedy reply. I downloaded the source and have been staring at it for an hour or so.

It seems that EBB_BUFFERSIZE controls the behavior of two unrelated things: 

a) the buffer size for reads.  This buffer must be set large enough that it can't be filled before the next time libev calls on_client_readable, correct?  If the buffer gets filled all the way (client->read == EBB_BUFFERSIZE) you throw an error, because we've probably lost some data that didn't fit in the buffer.

b) Cut-off for mem versus disk. Once the request is finished, EBB_BUFFERSIZE controls which requests get saved to disk.  In this case, anything larger than 40KB gets tossed in a thread which saves it to disk. 

Please correct me on any errors, but it seems like even large uploads aren't written to disk until they've been completely stored in memory (or swap?). I say this because read_body_into_file is only called after client_finished_parsing==true and after we have a total_request_size. Is that assessment correct, or am I misreading your code?

carmen

Apr 1, 2008, 1:37:58 PM
to ebb...@googlegroups.com
On Tue Apr 01, 2008 at 09:54:41AM -0700, ry wrote:
>
> You probably don't want Ebb serving static files - it's faster to have
> the front-end web server do that.

perhaps, if one doesn't factor in a few things:

- an extra webserver to compile/configure/break
- the mechanism is different on each (afaik):
  * lighty: it's X-Sendfile
  * nginx: it's X-Accel-Redirect
  * apache: requires some 3rd-party module
- the essence of what HTTP is is sending a file over a socket (see http://www.ics.uci.edu/~rohit/IEEE-L7-http-gopher.html )

with Mongrel or Ebb, loading a page with hundreds of image thumbnails on localhost into Dillo only takes like half a second.

it's faster than Nautilus or Konqueror. the FireFly directory browsing extension for Firefox is particularly horrid, it 'bypasses' http for something much slower

and i haven't even bothered to chunk the reads/writes. so i guess a patch maybe isn't even necessary.

sorry for this off-topic distraction (also testing a new msmtp install)

c

> However, I would accept a patch for streaming an open file
> descriptor :)

noted, thanks

ry

Apr 1, 2008, 1:45:50 PM
to ebb
Hi Adam,

> It seems that EBB_BUFFERSIZE controls the behavior of two unrelated things:

That's correct. It is the initial contiguous block of memory in which
the request header is stored. Ebb sends pointers to each header field
and value, which reference the buffer. Only the header needs to be in a
contiguous block of memory; the body could be broken up. However, Ebb
does not have support (yet) for growing this buffer - instead it
writes to file if the request is too big for it.
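
To illustrate why the header has to stay in one contiguous block, here is a rough sketch of header fields represented as pointer-plus-length references into the request buffer, with nothing copied. The struct span type and split_header_line function are made up for this example; Ebb's actual parser and callbacks look different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a reference into the request buffer, no copy made. */
struct span {
    const char *at;   /* points inside the request buffer */
    size_t len;       /* number of bytes, not NUL-terminated */
};

/* Hypothetical helper: split one "Name: value" line (without CRLF) into
 * two spans that point back into `line`. Returns 0 on success and -1 if
 * the line contains no colon. */
static int split_header_line(const char *line, size_t len,
                             struct span *name, struct span *value)
{
    assert(line != NULL && name != NULL && value != NULL);

    const char *colon = memchr(line, ':', len);
    if (colon == NULL) return -1;

    const char *end = line + len;
    const char *v = colon + 1;
    while (v < end && (*v == ' ' || *v == '\t')) v++;  /* skip leading blanks */

    name->at = line;
    name->len = (size_t)(colon - line);
    value->at = v;
    value->len = (size_t)(end - v);
    return 0;
}
```

Since the spans only point into the buffer, moving or reallocating that buffer would invalidate every field already handed out, which is the cost of growing it after parsing has started.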

> a) the buffer size for reads.  This buffer must be set large enough that it
> can't be filled before the next time libev calls on_client_readable,
> correct?  If the buffer gets filled all the way (client->read ==
> EBB_BUFFERSIZE) you throw an error, because we've probably lost some data
> that didn't fit in the buffer.

No. Ebb tries to get the header onto the buffer. Once the header is
on, it checks the content-length - if there is enough room it puts the
rest of the request onto the buffer, otherwise spawns a thread and
writes the whole request to file. It will error out if the header is
larger than EBB_BUFFERSIZE. However, the buffer is sufficiently large
that this shouldn't happen for any normal request.

> b) Cut-off for mem versus disk. Once the request is finished, EBB_BUFFERSIZE
> controls which requests get saved to disk.  In this case, anything larger
> than 40KB gets tossed in a thread which saves it to disk.

Once the header is read and Ebb determines that content-length +
header_size > EBB_BUFFERSIZE, it
1) spawns a thread
2) opens a tempfile
3) writes the part of the request body that was already in
client->request_buffer to the file
4) reads blocks from the socket and writes them to the tempfile.
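
The steps above can be sketched roughly as follows. This is not the code in Ebb: BUFFERSIZE stands in for EBB_BUFFERSIZE with an illustrative value, needs_tempfile and spool_body are hypothetical names, and the thread from step 1 is left out so the example stays short.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BUFFERSIZE (40 * 1024)  /* stand-in for EBB_BUFFERSIZE, value assumed */

/* Hypothetical check: does the whole request exceed the memory buffer? */
static int needs_tempfile(size_t header_size, size_t content_length)
{
    return header_size + content_length > BUFFERSIZE;
}

/* Hypothetical helper: write the already-buffered part of the body to a
 * temp file (step 3), then copy the rest from the socket in blocks
 * (step 4). Returns the rewound temp file, or NULL on error. */
static FILE *spool_body(int fd, const char *buffered, size_t buffered_len,
                        size_t content_length)
{
    assert(fd >= 0);
    FILE *tmp = tmpfile();                       /* step 2 */
    if (tmp == NULL) return NULL;

    if (buffered_len > content_length) buffered_len = content_length;
    if (buffered_len > 0 &&
        fwrite(buffered, 1, buffered_len, tmp) != buffered_len) {
        fclose(tmp);
        return NULL;
    }

    size_t written = buffered_len;
    char block[4096];
    while (written < content_length) {
        size_t want = content_length - written;
        if (want > sizeof block) want = sizeof block;
        ssize_t n = read(fd, block, want);
        if (n < 0 && errno == EINTR) continue;   /* interrupted, retry */
        if (n <= 0 || fwrite(block, 1, (size_t)n, tmp) != (size_t)n) {
            fclose(tmp);                         /* error or early EOF */
            return NULL;
        }
        written += (size_t)n;
    }
    rewind(tmp);
    return tmp;
}
```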


> Please correct me on any errors, but it seems like even large uploads aren't
> written to disk until they've been completely stored into memory(or swap?).

No, that wouldn't be good. Large requests are written as they come in.

> I say this because read_body_into_file is only called after
> client_finished_parsing==true and after we have a total_request_size.  Is
> that assessment correct, or am I mis-reading your code?

the parser only handles the header, not the body.

The obvious improvement that can be made is allowing larger in-memory
uploads. I haven't done this yet because of all the memory
management that will be required. Perhaps an easy little buffer
library could be made with the aid of glib to allow for this
improvement?
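
As one possible shape for such a buffer library, here is a minimal sketch of a body buffer that grows as a linked list of fixed-size chunks. It uses plain malloc rather than glib, and every name in it (CHUNK_SIZE, struct body_buffer, body_append and so on) is invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096  /* illustrative block size, not taken from Ebb */

struct chunk {
    struct chunk *next;
    size_t used;
    char data[CHUNK_SIZE];
};

struct body_buffer {
    struct chunk *head, *tail;
    size_t total;            /* bytes stored across all chunks */
};

static void body_init(struct body_buffer *b)
{
    b->head = b->tail = NULL;
    b->total = 0;
}

/* Append len bytes, allocating new chunks as needed. Returns 0 or -1. */
static int body_append(struct body_buffer *b, const char *src, size_t len)
{
    assert(b != NULL);
    while (len > 0) {
        if (b->tail == NULL || b->tail->used == CHUNK_SIZE) {
            struct chunk *c = malloc(sizeof *c);
            if (c == NULL) return -1;
            c->next = NULL;
            c->used = 0;
            if (b->tail != NULL) b->tail->next = c; else b->head = c;
            b->tail = c;
        }
        size_t room = CHUNK_SIZE - b->tail->used;
        size_t n = len < room ? len : room;
        memcpy(b->tail->data + b->tail->used, src, n);
        b->tail->used += n;
        b->total += n;
        src += n;
        len -= n;
    }
    return 0;
}

/* Copy up to cap bytes of the stored body into dst; returns bytes copied. */
static size_t body_copy_out(const struct body_buffer *b, char *dst, size_t cap)
{
    size_t copied = 0;
    for (const struct chunk *c = b->head; c != NULL && copied < cap; c = c->next) {
        size_t n = c->used < cap - copied ? c->used : cap - copied;
        memcpy(dst + copied, c->data, n);
        copied += n;
    }
    return copied;
}

static void body_free(struct body_buffer *b)
{
    struct chunk *c = b->head;
    while (c != NULL) {
        struct chunk *next = c->next;
        free(c);
        c = next;
    }
    body_init(b);
}
```

Only the body would live in such a list; the header would still sit in the contiguous request buffer, so the existing field pointers stay valid.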

ry

Adam Elhardt

Apr 1, 2008, 1:53:04 PM
to ebb...@googlegroups.com
Thanks for your quick reply .. that helped me make a lot more sense of code that I'm still very unfamiliar with.

ry

Apr 1, 2008, 1:54:40 PM
to ebb
> perhaps if one doesnt factor in a few things

Ebb's goal is not to be a full featured web server. It's only a small
bit of code to prop up an application server. Ebb would be used in
multiplicity behind some front-end which would handle things like
decoding SSL and serving static files.

> - diff on each (afaik:)
>  * lighty its X-Sendfile
>  * nginx its X-Accel-Redirect
>  * apache requires some 3rd party module

This should be handled by the framework. But 99% of the time you
wouldn't be using a sendfile anyway. Your front-end would be
configured to check a document root before passing off the request to
Ebb. That is, Ebb would never see a request for a thumbnail - only for
dynamic content.

Same as Mongrel - just written in C.

ry

ry

Apr 1, 2008, 4:49:44 PM
to ebb
Another note: I'd like to get rid of the upload pthread altogether and
use the libev loop instead. Or at the very least, dispatch() shouldn't
be called from a different thread than the event loop. The client code
should be able to expect that all callbacks are executed while in the
event loop.

ry