upload handling
Adam  
From: Adam <adam.elha...@gmail.com>
Date: Mon, 31 Mar 2008 19:03:07 -0700 (PDT)
Subject: upload handling
Hi, I read on the status page that one of the current goals is to
improve upload handling.  I am curious what exactly needs to be
improved, and what it will take to do that.  I'm not sure if it's
something I can help fix, but I would like to be involved somehow, and I
really like how ebb_rails is working for a site I'm running right now.
So: a) what's the problem? b) what's the solution, and its status?

Thanks,
Adam


ry  
From: ry <ry.d...@googlemail.com>
Date: Tue, 1 Apr 2008 03:34:46 -0700 (PDT)
Subject: Re: upload handling
Hi Adam,

Currently each socket connection has a statically allocated buffer of
about 100 KB (defined in ebb.h) to handle requests. If it turns out
that the request is larger than this, Ebb starts a tempfile and
redirects the output from the socket into the tempfile. After the
upload is complete Ebb passes control over to Ruby. Ruby can use
ebb_client_read to get the request body (be it stored in a tempfile or
in the memory buffer).
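
A minimal sketch of what such a uniform read could look like, so the caller never needs to know where the body lives. The names `body_source` and `body_read` are hypothetical stand-ins for illustration; they are not Ebb's actual structures or the real `ebb_client_read` signature.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for per-connection body state; not Ebb's real struct. */
typedef struct {
    const char *mem;      /* body bytes when held in the memory buffer */
    size_t      mem_len;  /* number of body bytes in mem */
    size_t      mem_pos;  /* read cursor into mem */
    FILE       *file;     /* non-NULL when the body was written to a tempfile */
} body_source;

/* Copy up to len body bytes into dst. Returns bytes copied, 0 at end of body. */
size_t body_read(body_source *src, char *dst, size_t len)
{
    assert(src != NULL && dst != NULL);

    /* Tempfile case: let stdio track the position. */
    if (src->file != NULL)
        return fread(dst, 1, len, src->file);

    /* Memory case: copy from the cursor and advance it. */
    size_t left = src->mem_len - src->mem_pos;
    size_t n = len < left ? len : left;
    memcpy(dst, src->mem + src->mem_pos, n);
    src->mem_pos += n;
    return n;
}
```

The caller just loops on `body_read` until it returns 0, whichever storage was used.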

When Ebb reads data only into memory it works well, with the usual
performance increase over the other clients. However, when it writes to
a tempfile, it is no better than the other Ruby servers. You can see
that in this benchmark, where I've set the memory buffer to only
40 KB:
http://s3.amazonaws.com/four.livejournal/20080227/post_size.png
The file upload stuff was written rather hastily and I suspect that
with some more thought this could be done better. Perhaps by
allocating blocks of memory to read into instead of into tempfiles
for, say, requests smaller than 100MB.

Any help in this area would be welcome :)

ry

Discussion subject changed to "download handling" by carmen
carmen  
From: carmen <_...@whats-your.name>
Date: Tue, 1 Apr 2008 09:22:48 -0700 (PDT)
Subject: Re: download handling

> The file upload stuff was written rather hastily and I suspect that
> with some more thought this could be done better. Perhaps by
> allocating blocks of memory to read into instead of into tempfiles
> for, say, requests smaller than 100MB.

99% of the time my framework returns a file handle, which ends up
being written to the socket in Ruby if an ETag doesn't match.

would it be worth writing that part in C?

i know rtorrent makes a big deal about 'transferring directly from file
pages to the network stack'. ime it can handle 1200+ torrents and 512+
connections at about 1% CPU, never breaking 100 MB of RAM

going to do some benchmarks/testing eventually, it just hasn't been a
priority and i'm curious if you've tested this already


ry  
From: ry <ry.d...@googlemail.com>
Date: Tue, 1 Apr 2008 09:54:41 -0700 (PDT)
Subject: Re: download handling
You probably don't want Ebb serving static files - it's faster to have
the front-end web server do that.
However, I would accept a patch for streaming an open file
descriptor :)
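
For anyone picking this up, here is a rough sketch of the simplest form such a patch could take: a blocking copy loop from an open descriptor to the client socket. The function name `stream_fd` is hypothetical and nothing here is existing Ebb code. Inside an event loop the writes would presumably be driven by writable events instead of looping.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Copy everything readable from in_fd to out_fd.
 * Returns the total number of bytes copied, or -1 on error.
 */
long stream_fd(int in_fd, int out_fd)
{
    assert(in_fd >= 0 && out_fd >= 0);
    char buf[16384];
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted, retry the read */
            return -1;
        }

        /* write() may accept fewer bytes than asked; loop until all are sent. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

On Linux, `sendfile(2)` could replace the read/write pair and avoid the copy through user space, but that is platform-specific.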

ry

Discussion subject changed to "upload handling" by Adam Elhardt
Adam Elhardt  
From: "Adam Elhardt" <adam.elha...@gmail.com>
Date: Tue, 1 Apr 2008 13:12:04 -0400
Subject: Re: [ebb] Re: upload handling

Thanks for the speedy reply.  I downloaded the source and have been staring
at it for an hour or so.

It seems that EBB_BUFFERSIZE controls the behavior of two unrelated things:

a) the buffer size for reads.  This buffer must be set large enough that it
can't be filled before the next time libev calls on_client_readable,
correct?  If the buffer gets filled all the way (client->read ==
EBB_BUFFERSIZE) you throw an error, because we've probably lost some data
that didn't fit in the buffer.

b) Cut-off for mem versus disk. Once the request is finished, EBB_BUFFERSIZE
controls which requests get saved to disk.  In this case, anything larger
than 40KB gets tossed in a thread which saves it to disk.

Please correct me on any errors, but it seems like even large uploads aren't
written to disk until they've been completely stored in memory (or swap?).
I say this because read_body_into_file is only called after
client_finished_parsing==true and after we have a total_request_size.  Is
that assessment correct, or am I mis-reading your code?


Discussion subject changed to "download handling" by carmen
carmen  
From: carmen <_...@whats-your.name>
Date: Tue, 1 Apr 2008 13:37:58 -0400
Subject: Re: [ebb] Re: download handling
On Tue Apr 01, 2008 at 09:54:41AM -0700, ry wrote:

> You probably don't want Ebb serving static files - it's faster to have
> the front-end web server do that.

perhaps, if one doesn't factor in a few things:

- an extra webserver to compile/configure/break
- different on each (afaik):
 * lighty: it's X-Sendfile
 * nginx: it's X-Accel-Redirect
 * apache: requires some 3rd party module
- the essence of HTTP is sending a file over a socket (see http://www.ics.uci.edu/~rohit/IEEE-L7-http-gopher.html )

with Mongrel or Ebb, loading a page with hundreds of image thumbnails on localhost into Dillo only takes like half a second.

it's faster than Nautilus or Konqueror. the FireFly directory browsing extension for Firefox is particularly horrid; it 'bypasses' http for something much slower

and i haven't even bothered to chunk the reads/writes. so i guess a patch maybe isn't even necessary.

sorry for this off-topic distraction (also testing new msmtp install)

c

> However, I would accept a patch for streaming an open file
> descriptor :)

noted, thanks

Discussion subject changed to "upload handling" by ry
ry  
From: ry <ry.d...@googlemail.com>
Date: Tue, 1 Apr 2008 10:45:50 -0700 (PDT)
Subject: Re: upload handling
Hi Adam,

> It seems that EBB_BUFFERSIZE controls the behavior of two unrelated things:

That's correct. It is the initial contiguous block of memory that the
request header is stored in. Ebb passes pointers to each header field
and value, which reference the buffer. Only the header needs to be in a
contiguous block of memory; the body could be broken up. However, Ebb
does not (yet) support growing this buffer. Instead it writes to a file
if the request is too big for it.

> a) the buffer size for reads.  This buffer must be set large enough that it
> can't be filled before the next time libev calls on_client_readable,
> correct?  If the buffer gets filled all the way (client->read ==
> EBB_BUFFERSIZE) you throw an error, because we've probably lost some data
> that didn't fit in the buffer.

No. Ebb tries to get the header into the buffer. Once the header is
in, it checks the content-length. If there is enough room it puts the
rest of the request into the buffer; otherwise it spawns a thread and
writes the whole request to a file. It will error out if the header is
larger than EBB_BUFFERSIZE. However, the buffer is sufficiently large
that this shouldn't happen for any normal request.

> b) Cut-off for mem versus disk. Once the request is finished, EBB_BUFFERSIZE
> controls which requests get saved to disk.  In this case, anything larger
> than 40KB gets tossed in a thread which saves it to disk.

Once the header is read and Ebb determines that content-length +
header_size > EBB_BUFFERSIZE, it
1) spawns a thread
2) opens a tempfile
3) writes the part of the request body that was already in
client->request_buffer to the file
4) reads blocks from the socket and writes them to the tempfile.
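
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in Ebb: `SKETCH_BUFFERSIZE`, `needs_tempfile`, and `spill_body` are hypothetical names, the thread from step 1 is left out, and the socket is assumed to be blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

#define SKETCH_BUFFERSIZE (40 * 1024)   /* stand-in for EBB_BUFFERSIZE */

/* Returns 1 when header plus body cannot fit in the fixed buffer. */
int needs_tempfile(size_t header_size, size_t content_length)
{
    return header_size + content_length > SKETCH_BUFFERSIZE;
}

/*
 * Steps 2-4: open a tempfile, write the body bytes that were already
 * buffered, then drain the rest of the body from sock_fd in blocks.
 * Returns the tempfile rewound to the start of the body, or NULL on error.
 */
FILE *spill_body(int sock_fd, const char *buffered, size_t buffered_len,
                 size_t content_length)
{
    assert(buffered_len <= content_length);

    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return NULL;
    if (fwrite(buffered, 1, buffered_len, tmp) != buffered_len) {
        fclose(tmp);
        return NULL;
    }

    size_t remaining = content_length - buffered_len;
    char block[8192];
    while (remaining > 0) {
        size_t want = remaining < sizeof block ? remaining : sizeof block;
        ssize_t n = read(sock_fd, block, want);
        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted, retry */
        if (n <= 0 || fwrite(block, 1, (size_t)n, tmp) != (size_t)n) {
            fclose(tmp);                /* error or peer closed early */
            return NULL;
        }
        remaining -= (size_t)n;
    }
    rewind(tmp);
    return tmp;
}
```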

> Please correct me on any errors, but it seems like even large uploads aren't
> written to disk until they've been completely stored into memory(or swap?).

No, that wouldn't be good. Large requests are written as they come in.

> I say this because read_body_into_file is only called after
> client_finished_parsing==true and after we have a total_request_size.  Is
> that assessment correct, or am I mis-reading your code?

the parser only handles the header, not the body.

The obvious improvement that can be made is allowing larger in-memory
uploads. I haven't done this yet because of all the memory
management that will be required. Perhaps an easy little buffer
library could be made with the aid of glib to allow for this
improvement?
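
One possible shape for such a buffer library, sketched in plain C rather than glib so it stays self-contained: the body is kept as a linked list of fixed-size chunks, so it can grow without needing one contiguous block. All names here (`chunk_buf`, `CHUNK_SIZE`, and so on) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096

typedef struct chunk {
    struct chunk *next;
    size_t used;                /* bytes filled in data */
    char data[CHUNK_SIZE];
} chunk;

typedef struct {
    chunk *head, *tail;
    size_t total;               /* bytes stored across all chunks */
} chunk_buf;

/* Append len bytes, allocating chunks as needed. Returns 0, or -1 if out of memory. */
int chunk_buf_append(chunk_buf *b, const char *src, size_t len)
{
    assert(b != NULL);
    while (len > 0) {
        if (b->tail == NULL || b->tail->used == CHUNK_SIZE) {
            chunk *c = malloc(sizeof *c);
            if (c == NULL)
                return -1;
            c->next = NULL;
            c->used = 0;
            if (b->tail != NULL)
                b->tail->next = c;
            else
                b->head = c;
            b->tail = c;
        }
        size_t room = CHUNK_SIZE - b->tail->used;
        size_t n = len < room ? len : room;
        memcpy(b->tail->data + b->tail->used, src, n);
        b->tail->used += n;
        b->total += n;
        src += n;
        len -= n;
    }
    return 0;
}

/* Copy the whole body into dst; the caller provides at least b->total bytes. */
void chunk_buf_copy_out(const chunk_buf *b, char *dst)
{
    for (const chunk *c = b->head; c != NULL; c = c->next) {
        memcpy(dst, c->data, c->used);
        dst += c->used;
    }
}

/* Release every chunk and reset the buffer to empty. */
void chunk_buf_free(chunk_buf *b)
{
    chunk *c = b->head;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    b->head = b->tail = NULL;
    b->total = 0;
}
```

A cap on `total` (for example the 100MB mentioned earlier in the thread) would decide when to fall back to a tempfile.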

ry


Adam Elhardt  
From: "Adam Elhardt" <adam.elha...@gmail.com>
Date: Tue, 1 Apr 2008 13:53:04 -0400
Subject: Re: [ebb] Re: upload handling

Thanks for your quick reply .. that helped me make a lot more sense of code
that I'm still very unfamiliar with.


Discussion subject changed to "download handling" by ry
ry  
From: ry <ry.d...@googlemail.com>
Date: Tue, 1 Apr 2008 10:54:40 -0700 (PDT)
Subject: Re: download handling

> perhaps if one doesnt factor in a few things

Ebb's goal is not to be a full-featured web server. It's only a small
bit of code to prop up an application server. Several Ebb instances
would run behind some front-end, which would handle things like
decoding SSL and serving static files.

> - different on each (afaik):
>  * lighty: it's X-Sendfile
>  * nginx: it's X-Accel-Redirect
>  * apache: requires some 3rd party module

This should be handled by the framework. But 99% of the time you
wouldn't be using a sendfile anyway. Your front-end would be
configured to check a document root before passing off the request to
Ebb. That is, Ebb would never see a request for a thumbnail - only for
dynamic content.

Same as Mongrel - just written in C.

ry


Discussion subject changed to "upload handling" by ry
ry  
From: ry <ry.d...@googlemail.com>
Date: Tue, 1 Apr 2008 13:49:44 -0700 (PDT)
Subject: Re: upload handling
Another note: I'd like to get rid of the upload pthread altogether and
use the libev loop instead. Or at the very least dispatch() shouldn't
be called from a different thread than the event loop. The client code
should be able to expect that all callbacks are executed inside the
event loop.
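
To make the idea concrete, here is a sketch of an upload handled one step at a time from a readable callback, with no thread. It uses only POSIX calls so it stands alone; in Ebb the step function would presumably be invoked from a libev `ev_io` callback. The names `upload`, `upload_on_readable`, and `set_nonblocking` are hypothetical, and the tempfile write can still block on disk.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

typedef enum { UPLOAD_MORE, UPLOAD_DONE, UPLOAD_ERROR } upload_state;

typedef struct {
    int    fd;          /* non-blocking client socket */
    FILE  *tmp;         /* tempfile receiving the body */
    size_t remaining;   /* body bytes still expected */
} upload;

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Call once per "readable" event. Performs at most one read, so the
 * event loop gets control back right away.
 */
upload_state upload_on_readable(upload *u)
{
    assert(u != NULL && u->tmp != NULL);
    if (u->remaining == 0)
        return UPLOAD_DONE;

    char block[8192];
    size_t want = u->remaining < sizeof block ? u->remaining : sizeof block;
    ssize_t n = read(u->fd, block, want);
    if (n < 0) {
        /* No data yet (or interrupted): wait for the next event. */
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            return UPLOAD_MORE;
        return UPLOAD_ERROR;
    }
    if (n == 0)
        return UPLOAD_ERROR;            /* peer closed before the full body */
    if (fwrite(block, 1, (size_t)n, u->tmp) != (size_t)n)
        return UPLOAD_ERROR;

    u->remaining -= (size_t)n;
    return u->remaining == 0 ? UPLOAD_DONE : UPLOAD_MORE;
}
```

When the step returns `UPLOAD_DONE`, the loop itself can call the dispatch callback, so it runs in the same thread as every other callback.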

ry

