Hi, I read on the status page that one of the current goals is to
improve upload handling. I am curious what exactly needs to be
improved, and what it will take to do that. I'm not sure if it's
something I can help fix but would like to be involved some how and
really like how ebb_rails is working for a site I'm running right now.
So ... a) what's the problem? b) what's the solution, and status?
Currently each socket connection has a statically allocated buffer of
about 100 KB (defined in ebb.h) to handle requests. If it turns out
that the request is larger than this, Ebb starts a tempfile and
redirects the output from the socket into the tempfile. After the
upload is complete Ebb passes control over to Ruby. Ruby can use
ebb_client_read to get the request body (be it stored in a tempfile or
in the memory buffer).
When Ebb reads data only into memory it works well, with the usual
performance increase over the other clients. However when it writes to
a tempfile, it is not better than the other Ruby servers. You can see
that in this benchmark where I've set the memory buffer to be only
40 KB:
http://s3.amazonaws.com/four.livejournal/20080227/post_size.png
The file upload stuff was written rather hastily and I suspect that
with some more thought this could be done better. Perhaps by
allocating blocks of memory to read into instead of into tempfiles
for, say, requests smaller than 100MB.
Any help in this area would be welcome :)
ry
On Apr 1, 4:03 am, Adam <adam.elha...@gmail.com> wrote:
> Hi, I read on the status page that one of the current goals is to
> improve upload handling. I am curious what exactly needs to be
> improved, and what it will take to do that. I'm not sure if it's
> something I can help fix but would like to be involved some how and
> really like how ebb_rails is working for a site I'm running right now.
> So ... a) what's the problem? b) what's the solution, and status?
> The file upload stuff was written rather hastily and I suspect that
> with some more thought this could be done better. Perhaps by
> allocating blocks of memory to read into instead of into tempfiles
> for, say, requests smaller than 100MB.
99% of the time my framework returns a file handle. which ends up
writing to the socket in Ruby if an ETAG doesnt match.
would it be worth writing that part - in C?
i know rtorrent makes a big deal about 'transfering directly from file
pages to the network stack'. ime it can handle 1200+ torrents and 512+
connections at about 1% CPU and never breaking 100 MB of RAM
going to do some benchmarks/testing eventually, just hasnt been a
priority and curious if youve tested this already
You probably don't want Ebb serving static files - it's faster to have
the front-end web server do that.
However, I would accept a patch for streaming an open file
descriptor :)
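
To make the idea concrete, here is a minimal sketch of streaming an open file descriptor to a socket with a plain read/write loop. It assumes blocking descriptors, and `stream_fd` is a hypothetical name, not an existing Ebb function. On Linux, sendfile(2) could avoid the userspace copy, but it is not portable, so the sketch sticks to read() and write().

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (not part of Ebb): copy everything readable from
 * in_fd to out_fd. Assumes both descriptors are blocking.
 * Returns the number of bytes copied, or -1 on error. */
static long stream_fd(int in_fd, int out_fd)
{
    char block[16 * 1024];
    long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        ssize_t r = read(in_fd, block, sizeof block);
        if (r < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            return total;           /* end of file */

        /* write() may accept fewer bytes than requested, so loop. */
        ssize_t off = 0;
        while (off < r) {
            ssize_t w = write(out_fd, block + off, (size_t)(r - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += r;
    }
}
```

A real patch would probably have to hook into the event loop and handle non-blocking sockets (EAGAIN) rather than block like this.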
ry
On Apr 1, 6:22 pm, carmen <_...@whats-your.name> wrote:
> > The file upload stuff was written rather hastily and I suspect that
> > with some more thought this could be done better. Perhaps by
> > allocating blocks of memory to read into instead of into tempfiles
> > for, say, requests smaller than 100MB.
> 99% of the time my framework returns a file handle. which ends up
> writing to the socket in Ruby if an ETAG doesnt match.
> would it be worth writing that part - in C?
> i know rtorrent makes a big deal about 'transfering directly from file
> pages to the network stack'. ime it can handle 1200+ torrents and 512+
> connections at about 1% CPU and never breaking 100 MB of RAM
> going to do some benchmarks/testing eventually, just hasnt been a
> priority and curious if youve tested this already
Thanks for the speedy reply. I downloaded the source and have been staring at it for an hour or so.
It seems that EBB_BUFFERSIZE controls the behavior of two unrelated things:
a) the buffer size for reads. This buffer must be set large enough that it can't be filled before the next time libev calls on_client_readable, correct? If the buffer gets filled all the way (client->read == EBB_BUFFERSIZE) you throw an error, because we've probably lost some data that didn't fit in the buffer.
b) Cut-off for mem versus disk. Once the request is finished, EBB_BUFFERSIZE controls which requests get saved to disk. In this case, anything larger than 40KB gets tossed in a thread which saves it to disk.
Please correct me on any errors, but it seems like even large uploads aren't written to disk until they've been completely stored into memory(or swap?). I say this because read_body_into_file is only called after client_finished_parsing==true and after we have a total_request_size. Is that assessment correct, or am I mis-reading your code?
On Tue, Apr 1, 2008 at 6:34 AM, ry <ry.d...@googlemail.com> wrote:
> Hi Adam,
> Currently each socket connection has a statically allocated buffer of
> about 100 KB (defined in ebb.h) to handle requests. If it turns out
> that the request is larger than this, Ebb starts a tempfile and
> redirects the output from the socket into the tempfile. After the
> upload is complete Ebb passes control over to Ruby. Ruby can use
> ebb_client_read to get the request body (be it stored in a tempfile or
> in the memory buffer).
> When Ebb reads data only into memory it works well, with the usual
> performance increase over the other clients. However when it writes to
> a tempfile, it is not better than the other Ruby servers. You can see
> that in this benchmark where I've set the memory buffer to be only
> 40kb
> http://s3.amazonaws.com/four.livejournal/20080227/post_size.png
> The file upload stuff was written rather hastily and I suspect that
> with some more thought this could be done better. Perhaps by
> allocating blocks of memory to read into instead of into tempfiles
> for, say, requests smaller than 100MB.
> Any help in this area would be welcome :)
> ry
> On Apr 1, 4:03 am, Adam <adam.elha...@gmail.com> wrote:
> > Hi, I read on the status page that one of the current goals is to
> > improve upload handling. I am curious what exactly needs to be
> > improved, and what it will take to do that. I'm not sure if it's
> > something I can help fix but would like to be involved some how and
> > really like how ebb_rails is working for a site I'm running right now.
> > So ... a) what's the problem? b) what's the solution, and status?
On Tue Apr 01, 2008 at 09:54:41AM -0700, ry wrote:
> You probably don't want Ebb serving static files - it's faster to have
> the front-end web server do that.
perhaps if one doesnt factor in a few things
- extra webserver to compile/configure/break
- diff on each (afaik:)
 * lighty its X-Sendfile
 * nginx its X-Accel-Redirect
 * apache requires some 3rd party module
- the essence of what HTTP is is sending a file over a socket
 ( see http://www.ics.uci.edu/~rohit/IEEE-L7-http-gopher.html )
with Mongrel or Ebb loading a page with hundreds of image thumbnails on localhost into Dillo only takes like half a second.
its faster than Nautilus or Konqueror. the FireFly directory browsing extension for Firefox is particularly horrid, it 'bypasses' http for something much slower
and i havent even bothered to chunk the reads/writes. so i guess a patch maybe isnt even necessary.
sorry for this off-topic distraction (also testing new msmtp install)
c
> However, I would accept a patch for streaming an open file
> descriptor :)
> It seems that EBB_BUFFERSIZE controls the behavior of two unrelated things:
That's correct. EBB_BUFFERSIZE is the size of the initial contiguous
block of memory that the request header is stored on. Ebb sends pointers
to each header field and value which reference the buffer. Only the
header needs to be on a contiguous block of memory; the body could be
broken up. However, Ebb does not have support (yet) for growing this
buffer - instead it writes to file if the request is too big for it.
> a) the buffer size for reads. This buffer must be set large enough that it
> can't be filled before the next time libev calls on_client_readable,
> correct? If the buffer gets filled all the way (client->read ==
> EBB_BUFFERSIZE) you throw an error, because we've probably lost some data
> that didn't fit in the buffer.
No. Ebb tries to get the header onto the buffer. Once the header is
on, it checks the content-length - if there is enough room it puts the
rest of the request onto the buffer, otherwise spawns a thread and
writes the whole request to file. It will error out if the header is
larger than EBB_BUFFERSIZE. However, the buffer is sufficiently large
that this shouldn't happen for any normal request.
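
The decision described above can be sketched roughly as follows. This is not Ebb's actual code: `SKETCH_BUFFERSIZE`, `header_length` and `decide` are hypothetical names, and the content-length is assumed to have been parsed out of the header already.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for EBB_BUFFERSIZE. */
#define SKETCH_BUFFERSIZE (40 * 1024)

enum body_dest { NEED_MORE, HEADER_TOO_LARGE, BODY_IN_MEMORY, BODY_TO_FILE };

/* Offset just past the blank line that ends the header,
 * or 0 if the header is not complete yet. */
static size_t header_length(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    return 0;
}

/* Decide where the request body should go, given the bytes read so far
 * and the already-parsed content-length. */
static enum body_dest decide(const char *buf, size_t len, size_t content_length)
{
    assert(len <= SKETCH_BUFFERSIZE);
    size_t header_size = header_length(buf, len);

    if (header_size == 0)   /* no end-of-header marker seen yet */
        return len == SKETCH_BUFFERSIZE ? HEADER_TOO_LARGE : NEED_MORE;

    /* Same test as content_length + header_size > SKETCH_BUFFERSIZE,
     * written so the addition cannot overflow. */
    if (content_length > SKETCH_BUFFERSIZE - header_size)
        return BODY_TO_FILE;
    return BODY_IN_MEMORY;
}
```

In this sketch a full buffer with no end-of-header marker is the only error case; anything else either fits in memory or goes to a file.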
> b) Cut-off for mem versus disk. Once the request is finished, EBB_BUFFERSIZE
> controls which requests get saved to disk. In this case, anything larger
> than 40KB gets tossed in a thread which saves it to disk.
Once the header is read, and Ebb determines that content-length +
header_size > EBB_BUFFERSIZE, it
1) spawns a thread
2) opens a tempfile
3) writes the part of the request body that was already in
client->request_buffer to the file
4) reads blocks from the socket and writes them to the tempfile.
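
Steps 2 to 4 could look something like the sketch below. It is an illustration under assumptions, not the code in Ebb: `spool_body` is a hypothetical name, the socket is assumed to be blocking, and the thread from step 1 is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: spool a request body into a tempfile.
 *   sock           blocking descriptor to read the rest of the body from
 *   buffered       body bytes that arrived together with the header
 *   buffered_len   number of such bytes
 *   content_length total body size announced by the client
 * Returns a FILE* positioned at the start of the body, or NULL on error. */
static FILE *spool_body(int sock, const char *buffered, size_t buffered_len,
                        size_t content_length)
{
    char block[4096];
    size_t remaining;
    FILE *tmp;

    assert(buffered_len <= content_length);

    tmp = tmpfile();                               /* step 2 */
    if (tmp == NULL)
        return NULL;

    /* step 3: flush what is already sitting in the memory buffer */
    if (fwrite(buffered, 1, buffered_len, tmp) != buffered_len)
        goto fail;

    /* step 4: copy the rest from the socket, block by block */
    remaining = content_length - buffered_len;
    while (remaining > 0) {
        size_t want = remaining < sizeof block ? remaining : sizeof block;
        ssize_t r = read(sock, block, want);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            goto fail;                  /* error, or peer closed early */
        if (fwrite(block, 1, (size_t)r, tmp) != (size_t)r)
            goto fail;
        remaining -= (size_t)r;
    }

    rewind(tmp);
    return tmp;

fail:
    fclose(tmp);
    return NULL;
}
```

Because each block is written as soon as it is read, the body never has to sit in memory as a whole.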
> Please correct me on any errors, but it seems like even large uploads aren't
> written to disk until they've been completely stored into memory(or swap?).
No, that wouldn't be good. Large requests are written as they come in.
> I say this because read_body_into_file is only called after
> client_finished_parsing==true and after we have a total_request_size. Is
> that assessment correct, or am I mis-reading your code?
the parser only handles the header, not the body.
The obvious improvement that can be done is allowing larger memory
uploads. I haven't done this yet because of all the memory
management that will be required. Perhaps an easy little buffer
library could be made with the aid of glib to allow for this
improvement?
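
One possible shape for such a buffer library is sketched below: a linked list of fixed-size chunks, so the body can grow without needing one contiguous block. It uses plain malloc rather than glib to stay self-contained, and every name here (`chunk_buf`, `CHUNK_SIZE`, and so on) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096   /* hypothetical block size */

struct chunk {
    struct chunk *next;
    size_t used;
    char data[CHUNK_SIZE];
};

struct chunk_buf {
    struct chunk *head, *tail;
    size_t total;          /* bytes stored across all chunks */
};

static void chunk_buf_init(struct chunk_buf *b)
{
    b->head = b->tail = NULL;
    b->total = 0;
}

/* Append n bytes, allocating chunks as needed. 0 on success, -1 on OOM. */
static int chunk_buf_append(struct chunk_buf *b, const char *src, size_t n)
{
    assert(b != NULL);
    while (n > 0) {
        if (b->tail == NULL || b->tail->used == CHUNK_SIZE) {
            struct chunk *c = malloc(sizeof *c);
            if (c == NULL)
                return -1;
            c->next = NULL;
            c->used = 0;
            if (b->tail) b->tail->next = c; else b->head = c;
            b->tail = c;
        }
        size_t room = CHUNK_SIZE - b->tail->used;
        size_t take = n < room ? n : room;
        memcpy(b->tail->data + b->tail->used, src, take);
        b->tail->used += take;
        b->total += take;
        src += take;
        n -= take;
    }
    return 0;
}

/* Copy up to n bytes starting at offset into dst; returns bytes copied. */
static size_t chunk_buf_read(const struct chunk_buf *b, size_t offset,
                             char *dst, size_t n)
{
    size_t copied = 0;
    for (const struct chunk *c = b->head; c && copied < n; c = c->next) {
        if (offset >= c->used) {        /* requested range starts later */
            offset -= c->used;
            continue;
        }
        size_t avail = c->used - offset;
        size_t take = n - copied < avail ? n - copied : avail;
        memcpy(dst + copied, c->data + offset, take);
        copied += take;
        offset = 0;
    }
    return copied;
}

static void chunk_buf_free(struct chunk_buf *b)
{
    struct chunk *c = b->head;
    while (c) {
        struct chunk *next = c->next;
        free(c);
        c = next;
    }
    chunk_buf_init(b);
}
```

A reader in the style of ebb_client_read could then walk the chunks with `chunk_buf_read`, and a cap on `total` would decide when to fall back to a tempfile.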
On Tue, Apr 1, 2008 at 1:45 PM, ry <ry.d...@googlemail.com> wrote:
> Hi Adam,
> > It seems that EBB_BUFFERSIZE controls the behavior of two unrelated
> > things:
> That's correct. The initial continuous block of memory that the
> request header is stored on. ebb sends pointers to each header field
> and value which reference the buffer. Only the header needs to be on a
> continuous block of memory, the body could be broken up. However Ebb
> does not have support (yet) for growing this buffer - instead it
> writes to file if the request is too big for it.
> > a) the buffer size for reads. This buffer must be set large enough that it
> > can't be filled before the next time libev calls on_client_readable,
> > correct? If the buffer gets filled all the way (client->read ==
> > EBB_BUFFERSIZE) you throw an error, because we've probably lost some data
> > that didn't fit in the buffer.
> No. Ebb tries to get the header onto the buffer. Once the header is
> on, it checks the content-length - if there is enough room it puts the
> rest of the request onto the buffer, otherwise spawns a thread and
> writes the whole request to file. It will error out if the header is
> larger than EBB_BUFFERSIZE. However, the buffer is sufficiently large
> that this shouldn't happen for any normal request.
> > b) Cut-off for mem versus disk. Once the request is finished, EBB_BUFFERSIZE
> > controls which requests get saved to disk. In this case, anything larger
> > than 40KB gets tossed in a thread which saves it to disk.
> Once the header is read, and Ebb determines the content-length +
> header_size > EBB_BUFFERSIZE it
> 1) spawns a thread
> 2) opens a tempfile
> 3) writes the part of the request body that was already in
> client->request_buffer to the file
> 4) reads blocks from the socket and writes them to the tempfile.
> > Please correct me on any errors, but it seems like even large uploads aren't
> > written to disk until they've been completely stored into memory(or swap?).
> No, that wouldn't be good. Large requests are written as they come in.
> > I say this because read_body_into_file is only called after
> > client_finished_parsing==true and after we have a total_request_size. Is
> > that assessment correct, or am I mis-reading your code?
> the parser only handles the header, not the body.
> The obvious improvement that can be done is allowing larger memory
> uploads. I haven't done this yet because of the all the memory
> management that will be required. Perhaps an easy little buffer
> library could be made with the aid of glib to allow for this
> improvement?
Ebb's goal is not to be a full featured web server. It's only a small
bit of code to prop up an application server. Ebb would be used in
multiplicity behind some front-end which would handle things like
decoding SSL and serving static files.
> - diff on each (afaik:)
> * lighty its X-Sendfile
> * nginx its X-Accel-Redirect
> * apache requires some 3rd party module
This should be handled by the framework. But 99% of the time you
wouldn't be using a sendfile anyway. Your front-end would be
configured to check a document root before passing off the request to
Ebb. That is, Ebb would never see a request for a thumbnail - only for
dynamic content.
Another note: I'd like to get rid of the upload pthread altogether and
use the libev loop instead. Or at the very least dispatch() shouldn't
be called from a different thread than the event loop. The client code
should be able to expect that all callbacks are executed while in the
event loop.