Implementing file upload progress bar with PSGI (related to "POST requests with a large body")

Zbigniew Lukasiak

Aug 23, 2010, 2:53:06 AM
to psgi-...@googlegroups.com
Hi there,

To implement a file upload progress bar in an environment where we
normally use Mason, I had to use a classic CGI script so that I could
read PATH_INFO and the Content-Length header before the whole request
had been received. I don't claim to have created an optimal solution,
but all the solutions I've seen work in a similar way: a bare CGI or
PHP script saves the file and the value of Content-Length, and Ajax
callbacks then compare the current size of the saved file with
Content-Length to compute the progress (this assumes that the file is
much bigger than the headers and that the percentage does not need to
be exact). The file name also has to be passed into that CGI script so
that the file is saved in a location known to the Ajax request; I did
that with PATH_INFO. In other words, the file-saving script needs to
know the values of some HTTP headers before it has received all of the
request parameters - this is similar to the streaming discussed in the
"POST requests with a large body" thread. I've described my solution at
http://perlalchemy.blogspot.com/2010/06/implementing-file-upload-progress-bar.html
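
The size-versus-Content-Length comparison described above could look
roughly like the sketch below. It is not the code from the blog post:
the helper name upload_percent and its arguments are made up for
illustration, and the result is only an estimate, because
Content-Length also counts the multipart overhead.

```perl
use strict;
use warnings;

# Hypothetical helper: estimate how much of an upload has arrived by
# comparing the size of the partially saved file with Content-Length.
sub upload_percent {
    my ($saved_file, $content_length) = @_;

    # Without a usable Content-Length there is nothing to compare against.
    return 0 unless $content_length && $content_length > 0;

    # -s yields undef for a missing file and a false value for an empty one.
    my $size = (-s $saved_file) || 0;

    my $pct = int(100 * $size / $content_length);
    return $pct > 100 ? 100 : $pct;
}
```

The Ajax callback would call something like this with the path derived
from PATH_INFO and the Content-Length value stored by the saving script.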

Has anyone devised a way to do that in a pure PSGI environment?

--
Zbigniew Lukasiak
http://brudnopis.blogspot.com/
http://perlalchemy.blogspot.com/

jbjbjb

Oct 13, 2010, 12:39:00 AM
to psgi-plack
Hi,
I did this today by hacking Starman's Server.pm to hook into the
sysread() loop, which typically goes around once per TCP data packet.
It was quite an easy change, and now the Ajax upload progress checker
can get the info it needs. I write the status to a file in /tmp named
after a query-string parameter value, so that the Ajax call (which gets
back JSON data) gets the right status even if two or more POSTs are
going on simultaneously.
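
The status side of this can be sketched as a plain PSGI app with no
Plack dependency. This is an illustration under assumptions, not the
handler actually used here: the factory name make_status_app, the
"id" parameter name, the upload-progress-<id>.json file naming, and the
"unknown" fallback body are all hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical factory: returns a PSGI app that serves the status file
# for the upload named by ?id=... from the given directory.
sub make_status_app {
    my ($status_dir) = @_;
    return sub {
        my ($env) = @_;
        my $query = defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';

        # Accept only word characters so the id is safe in a file name.
        my ($id) = $query =~ /(?:\A|[&;])id=(\w{1,64})(?:\z|[&;])/;

        my $body = '{"state":"unknown"}';
        if (defined $id) {
            my $path = File::Spec->catfile($status_dir,
                                           "upload-progress-$id.json");
            if (open my $fh, '<', $path) {
                local $/;                      # slurp the whole file
                my $json = <$fh>;
                $body = $json if defined $json && length $json;
            }
        }
        return [ 200,
                 [ 'Content-Type'  => 'application/json',
                   'Cache-Control' => 'no-store' ],
                 [ $body ] ];
    };
}
```

Under these assumptions, make_status_app('/tmp') would be mounted at
whatever URL the Ajax poller requests.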

(By the way, and off topic: Plack is awesome. In two days I tossed out
an entire mod_perl+Apache cruft mountain and re-homed a fairly complex
app under PSGI with Plack::Request and Starman. Performance is roughly
double in every dimension: CPU usage, memory footprint, etc.)

jbjbjb

Oct 12, 2010, 8:54:31 PM
to psgi-plack
I just hacked this into Starman because it was the only barrier to
deleting Apache and mod_perl on my backend server; I needed a quick
solution, and things were running so well on Starman.

Starman's Server.pm has a read loop that typically reads one TCP packet
of POST payload at a time. In that loop, every now and again, I write
the progress to a file in /tmp named by a unique code (given as a query
argument on the POST request). The upload-progress Ajax call then gets
the status from a handler that is given the same code, looks for the
matching file, and returns its contents as JSON. It is horrible, but it
works, and now I've been able to switch over from Apache+mod_perl to
Starman+PSGI: twice the speed, half the memory, so much less
configuration cruft, and the Debug panels are awesome!

But in general I wish any new web server implemented upload progress
hooks by default; they are vital.

jbjbjb

Oct 13, 2010, 12:40:01 AM
to psgi-plack
(Sorry for posting twice - now three times! Google Groups appears to
be hanging onto posts and not showing them anywhere, so I reposted.)

Zbigniew Lukasiak

Oct 13, 2010, 3:03:22 AM
to psgi-...@googlegroups.com
On Wed, Oct 13, 2010 at 6:39 AM, jbjbjb <justi...@gmail.com> wrote:
> Hi,
> I did this today via hacking Starman Server.pm to hook into the
> sysread() loop which goes around once per TCP data packet, typically.
> It was quite an easy change and then the ajax upload progress checker
> can get the info it needs. I wrote the status to a file in /tmp named
> according to a query_string parameter value, so that the ajax call
> (which gets back JSON data) can get the right status if there are two
> or more posts going on simultaneously.

How do you get the value of the query string in the sysread() loop?
This could be better than using PATH_INFO as I suggested in my post,
but it is complicated by the fact that you need to parse the query to
get it, and the 'normal' way to do that with Plack::Request would not
work here.

--
Zbigniew

jbjbjb

Oct 13, 2010, 5:37:45 PM
to psgi-plack
In Server.pm, $env is already populated at this point, so the query
string is available:
$env->{QUERY_STRING}

So my POSTs have ?id=(some random string) in the target URL,
and that id is the handle the Ajax call uses to get the interim status.

Don't write the status out every time around that loop (non-chunked);
that is a waste. Write it out at every 16k milestone or so.

It would be better to use Memcached or some IPC to share this info,
but making the server depend on Memcached seems like a waste, so
I just use a temporary file.
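
A rough sketch of the throttled write, assuming the id arrives as
?id=... in $env->{QUERY_STRING}. This is not the actual Starman patch:
upload_id, progress_path, make_progress_writer and the JSON layout are
made-up names for illustration, and the place where Server.pm would call
the writer is not shown.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json);
use File::Spec;

# Pull the upload id out of a raw query string such as "id=abc123".
# Only word characters are accepted, so the id is safe in a file name.
sub upload_id {
    my ($query_string) = @_;
    return unless defined $query_string;
    for my $pair (split /[&;]/, $query_string) {
        my ($k, $v) = split /=/, $pair, 2;
        return $v
            if defined $k && $k eq 'id'
            && defined $v && $v =~ /\A\w{1,64}\z/;
    }
    return;
}

# Hypothetical naming scheme for the per-upload status file.
sub progress_path {
    my ($dir, $id) = @_;
    return File::Spec->catfile($dir, "upload-progress-$id.json");
}

# Returns a closure to call with the running byte count after each read.
# It rewrites the status file only once another $step bytes have arrived
# (16 KB by default) or when the upload is complete, and returns 1 when
# it wrote and 0 when it skipped.
sub make_progress_writer {
    my (%args) = @_;
    my ($path, $total) = @args{qw(path total)};
    my $step = $args{step} || 16 * 1024;
    my $next = 0;
    return sub {
        my ($received) = @_;
        my $done = $total && $received >= $total;
        return 0 if $received < $next && !$done;
        $next = $received + $step;

        # Write to a temporary name, then rename, so a reader never
        # sees a half-written file.
        my $tmp = "$path.$$";
        open my $fh, '>', $tmp or die "open $tmp: $!";
        print {$fh} encode_json({ received => $received, total => $total });
        close $fh or die "close $tmp: $!";
        rename $tmp, $path or die "rename $tmp: $!";
        return 1;
    };
}
```

In the read loop, the writer would be created once per request from
upload_id($env->{QUERY_STRING}) and $env->{CONTENT_LENGTH}, then called
after each sysread() with the number of bytes read so far.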

By the way, does anyone know of a front-end proxy that handles
this stuff before passing it on, so that your Plack back-end memory
footprint is not tied up watching a long and super slow POST?

Tatsuhiko Miyagawa

Oct 13, 2010, 5:41:23 PM
to psgi-...@googlegroups.com
On Thu, Oct 14, 2010 at 6:37 AM, jbjbjb <justi...@gmail.com> wrote:

> By the way, does anyone know of a front end proxy that handles
> this stuff before passing it on? so that your Plack back-end memory
> footprint is not tied up watching a long & super slow POST?

nginx http://wiki.nginx.org/NginxHttpUploadProgressModule
perlbal http://brad.livejournal.com/2171184.html

There should be a way to do the same with lighttpd or mod_proxy.

--
Tatsuhiko Miyagawa

jbjbjb

Oct 13, 2010, 6:32:34 PM
to psgi-plack
Answering myself...

If you set up nginx as a front end with HttpUploadProgressModule, the
problem is solved there, without bothering the PSGI-compliant back-end
app server.

I just compiled nginx with its default setup plus the upload progress
module, added a few lines of extra config, and now it proxies for
Starman but handles the entire POST operation, including status
queries, before waking up the back end.
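
For reference, the extra config is along these lines. This is a sketch
modelled on the upload progress module's documented example rather than
the exact config used here; the zone name "uploads", the /progress
location, and the Starman address 127.0.0.1:5000 are assumptions.

```nginx
http {
    # Shared memory zone that tracks uploads in flight.
    upload_progress uploads 1m;

    server {
        listen 80;

        location / {
            # Assumed address of the Starman back end.
            proxy_pass http://127.0.0.1:5000;

            # Track POSTs passing through this location; keep finished
            # entries for 30s. Must be the last directive in the block.
            track_uploads uploads 30s;
        }

        # The Ajax poller asks this location for the status.
        location ^~ /progress {
            report_uploads uploads;
        }
    }
}
```

With this module the upload and the status query are matched through an
X-Progress-ID value sent with both requests.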