Quick CGI question (specific to the CGI package)

Ted Byers

Nov 25, 2009, 4:33:37 PM

I am using CGI, and have been able to do most of the things I need to
do, until now.

With package CGI (and my question is specific to what is in that
package and what I might have to do beyond what it intrinsically
supports), the documentation beginning with the title "CREATING A
STANDARD HTTP HEADER" gives, among other examples, the following
example:

print header('image/gif');

From this I believed that I could write something like:

print $query->header('video/$format');
open(FIN,"<","$fname");
binmode(FIN);
binmode STDOUT;
my $fcontent;
read FIN, $fcontent, $flength;
print $fcontent;

Is this appropriate, (I have seen equivalent code on examples on the
web), or is there a way to just write the header first and then send
whatever file has the content without opening it and writing it out in
binary mode within my own code?

This works adequately if you provide something like $format='avi',
$fname to the name of whatever avi file you have, and $flength to the
size of that file.

In fact, it works OK when my browser (firefox) asks what program to
use to view the content because it doesn't know what to do with a file
with an extension of cgi. With the above code, my browser invariably
asked me what to use to view the file, and the file name it gave was
the name of the cgi script. If I told it to use Windows Media Player,
it played the content as desired.

In actuality, my script makes a video file based on request
parameters, and puts the content into a file with a name like
result.avi (or asf, or mpg, depending on the format of the component
clips).

The only way I found to get this cgi script to work as I expected was
to use redirection instead of just writing the content of the file in
binary mode. In other words, the following two lines (with NOTHING
else written to standard out) work as I expected.

my $url = "http://localhost:9080/videos/$fname";
print $query->redirect("$url",-status=>303);

Now, is there a way to tell the client that although the URL requested
pointed at my cgi script, the name of the file containing the content
is result.avi? Or do I have to resort to redirection as I have done
now (pending further insight from CGI experts out there). Or is there
some other package, other than CGI, that I ought to be examining?

Thanks

Ted

Uri Guttman

Nov 25, 2009, 5:11:24 PM

>>>>> "TB" == Ted Byers <r.ted...@gmail.com> writes:

TB> From this I believed that I could write something like:

TB> print $query->header('video/$format');
TB> open(FIN,"<","$fname");

don't quote scalars like that. not needed and could cause a bug down the
line

TB> binmode(FIN);
TB> binmode STDOUT;

be consistent in your style. why parens on one and not the other? also
if on a unix platform, binmode won't matter but this makes it portable
to winblows.

TB> my $fcontent;
TB> read FIN, $fcontent, $flength;

where is $flength set? i assume you would do a -s to get the file size

TB> print $fcontent;

if you want more speed, use sysread and syswrite. if you want simpler
code, use File::Slurp and its read_file and write_file subs.

TB> Is this appropriate, (I have seen equivalent code on examples on the
TB> web), or is there a way to just write the header first and then send
TB> whatever file has the content without opening it and writing it out in
TB> binary mode within my own code?

perl has no builtin way to print a file to another handle. there may be
some OS specific ways to do it but i don't know them.
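
A minimal sketch of the sysread/syswrite approach mentioned above, assuming only core Perl. The helper name copy_handle and the 64 kB default chunk size are illustrative choices, not anything from CGI.pm or from the original script.

```perl
use strict;
use warnings;

# Copy everything from $in to $out in fixed-size chunks using the
# unbuffered sysread/syswrite calls. Returns the number of bytes copied.
# copy_handle is a hypothetical helper name.
sub copy_handle {
    my ($in, $out, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;    # arbitrary default (assumption)
    my $total = 0;
    while (1) {
        my $got = sysread($in, my $buf, $chunk_size);
        die "sysread failed: $!" unless defined $got;
        last if $got == 0;        # end of file
        my $off = 0;
        while ($off < $got) {     # syswrite may write only part of the buffer
            my $put = syswrite($out, $buf, $got - $off, $off);
            die "syswrite failed: $!" unless defined $put;
            $off += $put;
        }
        $total += $got;
    }
    return $total;
}
```

In a CGI script this would be called as copy_handle($fh, \*STDOUT) after binmode on both handles. Since syswrite bypasses Perl's output buffer, set $| = 1 before printing the header so that the header reaches the client ahead of the body.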

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

Ted Byers

Nov 25, 2009, 5:57:06 PM

On Nov 25, 5:11 pm, "Uri Guttman" <u...@StemSystems.com> wrote:

> >>>>> "TB" == Ted Byers <r.ted.by...@gmail.com> writes:
>
> TB> From this I believed that I could write something like:
>
> TB> print $query->header('video/$format');
> TB> open(FIN,"<","$fname");
>
> don't quote scalars like that. not needed and could cause a bug down the
> line
>

OK. NB, though, that code was copied from a very quick and dirty
script used to test ideas (with code copied from various sources
including examples on the web).

> TB> binmode(FIN);
> TB> binmode STDOUT;
>
> be consistent in your style. why parens on one and not the other? also
> if on a unix platform, binmode won't matter but this makes it portable
> to winblows.
>

I know. I have to make certain I can run it on any platform the boss
may bring in, which may well include a Windows server.

> TB> my $fcontent;
> TB> read FIN, $fcontent, $flength;
>
> where is $flength set? i assume you would do a -s to get the file size
>

You assume correctly. As I said in my remarks, the key variables need
to be set before the code shown.

> TB> print $fcontent;
>
> if you want more speed, use sysread and syswrite. if you want simpler
> code, use File::Slurp and its read_file and write_file subs.
>

Good to know.

> TB> Is this appropriate, (I have seen equivalent code on examples on the
> TB> web), or is there a way to just write the header first and then send
> TB> whatever file has the content without opening it and writing it out in
> TB> binary mode within my own code?
>
> perl has no builtin way to print a file to another handle. there may be
> some OS specific ways to do it but i don't know them.
>

I would have been cleaning up the various things you mentioned as I
refined the program to be ready to deploy.

But the key problem remains. In my testing, the client browser thinks
the video file content has the cgi script as the file name. Did I
misunderstand what the CGI package documentation showed, or did I miss
something in that package that would tell the browser that the content
sent after the header is a video file? Does the CGI package have a
function that is used after the header to tell the script to send a
given file? The CGI package is huge and I may well have missed
something in it that relates to this problem. Or is there another
package that can be used with the CGI package to facilitate sending a
video file (or any other MIME type)? Or do I have to rely entirely on
redirection? Without the redirection, the browser seemed to be
deciding what to do based on the CGI script name rather than the
content type header (unless the header function in the CGI package
doesn't do what the documentation implies).

Thanks

Ted

Tad McClellan

Nov 25, 2009, 6:11:29 PM

Ted Byers <r.ted...@gmail.com> wrote:

> print $query->header('video/$format');

                       ^             ^
                       ^             ^

Single quotes do not interpolate...

> This works adequately if you provide something like $format='avi',

That is simply not possible.

If it worked adequately, then it must certainly have NOT been
the code you've shown us...
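
The difference between the two quoting styles can be seen in a short sketch. The value 'mpeg' for $format is only an illustrative assumption.

```perl
use strict;
use warnings;

my $format = 'mpeg';            # illustrative value (assumption)

my $single = 'video/$format';   # single quotes: the text $format is kept literally
my $double = "video/$format";   # double quotes: $format is interpolated

print "single: $single\n";
print "double: $double\n";
```

Only the double-quoted string yields a usable content type; the single-quoted one sends the literal text video/$format to the client.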

--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Uri Guttman

Nov 25, 2009, 6:16:35 PM

>>>>> "TB" == Ted Byers <r.ted...@gmail.com> writes:

TB> On Nov 25, 5:11 pm, "Uri Guttman" <u...@StemSystems.com> wrote:
>> >>>>> "TB" == Ted Byers <r.ted.by...@gmail.com> writes:
>>
>> TB> From this I believed that I could write something like:
>>
>> TB> print $query->header('video/$format');

as tad pointed out, that will not generate the right header. use double
quotes and retest it.

TB> But the key problem remains. In my testing, the client browser
TB> thinks the video file content has the cgi script as the file name.
TB> Did I misunderstand what the CGI package documentation showed, or
TB> did I miss something in that package that would tell the browser
TB> that the content sent after the header is a video file? Does the
TB> CGI package have a function that is used after the header to tell
TB> the script to send a given file? The CGI package is huge and I
TB> may well have missed something in it that relates to this problem.
TB> Or is there another package that can be used with the CGI package
TB> to facilitate sending a video file (or any other MIME type)? Or
TB> do I have to rely entirely on redirection? Without the
TB> redirection, the browser seemed to be deciding what to do based on
TB> the CGI script name rather than the content type header (unless
TB> the header function in the CGI package doesn't do what the
TB> documentation implies).

that is probably because it doesn't recognize video/$format as a known
type. fix the quotes bug and see what happens.

Mumia W.

Nov 25, 2009, 6:46:54 PM

On 11/25/2009 03:33 PM, Ted Byers wrote:
> I am using CGI, and have been able to do most of the things I need to
> do, until now.
>
> With package CGI (and my question is specific to what is in that
> package and what I might have to do beyond what it intrinsically
> supports), the documentation beginning with the title "CREATING A
> STANDARD HTTP HEADER" gives, among other examples, the following
> example:
>
> print header('image/gif');
>
> From this I believed that I could write something like:
>
> print $query->header('video/$format');

Try this instead:

print $query->header(
    '-content-type' => 'text/html',
    '-content-disposition' => 'attachment; filename=result.avi',
)

See RFC 2616.

> open(FIN,"<","$fname");
> binmode(FIN);
> binmode STDOUT;
> my $fcontent;
> read FIN, $fcontent, $flength;
> print $fcontent;

> [...]

Ted Byers

Nov 25, 2009, 7:57:21 PM

On Nov 25, 6:11 pm, Tad McClellan <ta...@seesig.invalid> wrote:

> Ted Byers <r.ted.by...@gmail.com> wrote:
> > print $query->header('video/$format');
>
>                        ^             ^
>                        ^             ^
>
> Single quotes do not interpolate...
>
> > This works adequately if you provide something like $format='avi',
>
> That is simply not possible.
>
> If it worked adequately, then it must certainly have NOT been
> the code you've shown us...
>

I now know why it appeared to work adequately. The code I showed
wrote the contents of the video file to standard out. When it does
so, and you tell it to use the Windows Media Player to view it, the
player displayed the contents of the file anyway. Which led to the
question I asked. The client received the data I intended to send,
but didn't know what to do with it unless told to ignore the extension
on the name of the cgi script that sent it.

Now I changed the code to use the double quotes, and I then changed
the surrounding code to display what that line prints, and obtained
the following:

Content-Type: video/mpg

However, when I comment out the text output statements and instead
write the contents of the video file, the result is the same.

Any other ideas?

Thanks

Ted

Ted Byers

Nov 25, 2009, 7:59:12 PM

On Nov 25, 6:16 pm, "Uri Guttman" <u...@StemSystems.com> wrote:
> >>>>> "TB" == Ted Byers <r.ted.by...@gmail.com> writes:
>
> TB> On Nov 25, 5:11 pm, "Uri Guttman" <u...@StemSystems.com> wrote:
> >> >>>>> "TB" == Ted Byers <r.ted.by...@gmail.com> writes:
> >>
> >> TB> From this I believed that I could write something like:
> >>
> >> TB> print $query->header('video/$format');
>
> as tad pointed out, that will not generate the right header. use double
> quotes and retest it.
>

Done, with no change in behaviour.

> TB> But the key problem remains. In my testing, the client browser
> TB> thinks the video file content has the cgi script as the file name.
> TB> Did I misunderstand what the CGI package documentation showed, or
> TB> did I miss something in that package that would tell the browser
> TB> that the content sent after the header is a video file? Does the
> TB> CGI package have a function that is used after the header to tell
> TB> the script to send a given file? The CGI package is huge and I
> TB> may well have missed something in it that relates to this problem.
> TB> Or is there another package that can be used with the CGI package
> TB> to facilitate sending a video file (or any other MIME type)? Or
> TB> do I have to rely entirely on redirection? Without the
> TB> redirection, the browser seemed to be deciding what to do based on
> TB> the CGI script name rather than the content type header (unless
> TB> the header function in the CGI package doesn't do what the
> TB> documentation implies).
>
> that is probably because it doesn't recognize video/$format as a known
> type. fix the quotes bug and see what happens.
>

Yes, as reported, I did that. When I checked the string sent by the
call to the header function, it printed precisely
"Content-Type: video/mpg".

Why would it not recognize video/mpg?

Cheers,

Ted

Ted Byers

Nov 25, 2009, 8:27:45 PM

On Nov 25, 6:46 pm, "Mumia W." <paduille.4061.mumia.w+nos...@earthlink.net> wrote:
> On 11/25/2009 03:33 PM, Ted Byers wrote:
>
> > I am using CGI, and have been able to do most of the things I need to
> > do, until now.
>
> > With package CGI (and my question is specific to what is in that
> > package and what I might have to do beyond what it intrinsically
> > supports), the documentation beginning with the title "CREATING A
> > STANDARD HTTP HEADER" gives, among other examples, the following
> > example:
>
> > print header('image/gif');
>
> > From this I believed that I could write something like:
>
> > print $query->header('video/$format');
>
> Try this instead:
>
> print $query->header(
> '-content-type' => 'text/html',
> '-content-disposition' => 'attachment; filename=result.avi',
> )
>
> See RFC 2616.
>

That produces the following server error:
[Wed Nov 25 20:24:54 2009] [error] [client 127.0.0.1] Bad name after
disposition' at C:/ApacheAndPerl/Apache2/cgi-bin/video.server.cgi line
45.

Might there be a typo in the disposition line?

Ted

Sherm Pendley

Nov 25, 2009, 11:43:23 PM

Ted Byers <r.ted...@gmail.com> writes:

> Why would it not recognize video/mpg?

Perhaps because it's supposed to be video/mpeg. Similarly, the MIME type
for a .avi file is video/x-msvideo.

sherm--

Ted Byers

Nov 26, 2009, 10:23:20 AM

On Nov 25, 11:43 pm, Sherm Pendley <spamt...@shermpendley.com> wrote:

OK, so the root of my problem seems to be that the video subtype is
wrong.

Changing video/mpg to video/mpeg fixes the problem for mpeg, but the
problem remains for the asf and avi files. For asf files I tried
video/asf and video/x-ms-asf, and for avi files I tried video/avi,
video/msvideo and video/x-msvideo, all to no avail. I found each of
the variants I tried on the web (such as www.webmaster-toolkit.com/mime-types.shtml
and pcs.cruz-network.net/faq.php, to list only two of those pages I
found).

NB: My line that sets content type has been changed to:

print $query->header('-content-type' => "video/$format",
                     '-content-length' => $flength);

I figured I might as well set the content length header at the same
time.

I did notice that once I changed video/mpg to video/mpeg, the client
added the mpg extension to the script name and the media player opened
immediately. With the other formats, it left the script name
unchanged. Now, if I tell it to display the content using the media
player, the content is displayed.

Any ideas on what to use for the MIME subtype for AVI and ASF files
that would be recognized by clients like firefox and MS IE?

Thanks

Ted

Mumia W.

Nov 26, 2009, 10:18:54 AM

On 11/25/2009 07:27 PM, Ted Byers wrote:
> On Nov 25, 6:46 pm, "Mumia W." <paduille.4061.mumia.w
> +nos...@earthlink.net> wrote:
>> Try this instead:
>>
>> print $query->header(
>> '-content-type' => 'text/html',
>> '-content-disposition' => 'attachment; filename=result.avi',
>> )
>>
>> See RFC 2616.
>>
>
> That produces the following server error:
> [Wed Nov 25 20:24:54 2009] [error] [client 127.0.0.1] Bad name after
> disposition' at C:/ApacheAndPerl/Apache2/cgi-bin/video.server.cgi line
> 45.
>
> Might there be a typo in the disposition line?
>
> Ted
>

It should work. Try this test program:

#!/usr/bin/perl
use strict;
use warnings;
use CGI qw/-no_xhtml :standard/;

my $file = 'content.avi';

print header(
    '-content_type' => 'video/avi',
    '-content_disposition' => 'attachment; filename=result.avi',
);

open my $fh, '<', $file or die("Failure: $!");
fpassthrough($fh);
close $fh;

sub fpassthrough {
    my ($handle) = @_;
    local $/ = \1000;
    local $_;
    while (<$handle>) {
        print;
    }
}
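
For comparison, a hand-written sketch of a response header carrying the same two fields plus a Content-Length. This is not the output of CGI.pm's header(); download_header is a hypothetical helper, and the restriction on file name characters is an assumption added to keep the quoted filename parameter well formed.

```perl
use strict;
use warnings;

# Build a minimal CGI response header by hand.
# download_header is a hypothetical helper, not part of CGI.pm.
sub download_header {
    my ($type, $filename, $length) = @_;
    # Assumption: only allow simple names so the quoted parameter stays valid.
    die "unsafe file name: $filename" unless $filename =~ /\A[\w.-]+\z/;
    return join("\r\n",
        "Content-Type: $type",
        qq{Content-Disposition: attachment; filename="$filename"},
        "Content-Length: $length",
    ) . "\r\n\r\n";               # blank line ends the header block
}
```

A script would call it as print download_header('video/x-msvideo', 'result.avi', -s $file), with -s supplying the size of the file in bytes.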

Ted Byers

Nov 26, 2009, 11:19:05 AM

On Nov 26, 10:18 am, "Mumia W." <paduille.4061.mumia.w

Yup. I had to edit a bit so it would work on Windows (path to perl
and use binmode on the file handle), but that worked. So I have to
compare that with what I had yesterday to discover why mine didn't
work.

Do you know if that works for mpeg and asf files? What would you set
the content type to? And I notice you don't set content length with
this.

Thanks.

Ted

Sherm Pendley

Nov 26, 2009, 11:49:46 AM

Ted Byers <r.ted...@gmail.com> writes:

> problem remains for the asf and avi files. For asf files I tried
> video/asf and video/x-ms-asf, and for avi files I tried video/avi,
> video/msvideo and video/x-msvideo, all to no avail.

There's no need to guess - just look at Apache's mime.types file to see
what MIME type it maps to a given filename extension. The relevant lines
from my local copy of that file are:

video/x-msvideo avi
video/x-ms-asf asf asx
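
One way to apply such a mapping inside a script is a small lookup table keyed by extension. This is only a sketch: the table holds just the types named in this thread, mime_type_for is a hypothetical helper, and the application/octet-stream fallback is an assumption.

```perl
use strict;
use warnings;

# Extension-to-type table limited to the formats discussed in this thread.
my %mime_for = (
    mpg  => 'video/mpeg',
    mpeg => 'video/mpeg',
    avi  => 'video/x-msvideo',
    asf  => 'video/x-ms-asf',
    asx  => 'video/x-ms-asf',
);

# mime_type_for is a hypothetical helper name.
sub mime_type_for {
    my ($fname) = @_;
    my ($ext) = $fname =~ /\.([^.\/\\]+)\z/;    # text after the last dot
    return 'application/octet-stream' unless defined $ext;
    # Fallback type is an assumption for unknown extensions.
    return $mime_for{lc $ext} // 'application/octet-stream';
}
```

The result can then be passed to the header call in place of a string built from the file extension.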

sherm--

Ted Byers

Nov 26, 2009, 12:40:35 PM

On Nov 26, 11:49 am, Sherm Pendley <spamt...@shermpendley.com> wrote:

OK, on mine, there is a line like yours for avi files, but there is
nothing in the mime.types file on my system for asf files. This is
puzzling since everything works well if I just redirect to an asf file
in htdocs instead of setting the content type and then writing the
content of the file to standard out in binary mode.

But that doesn't cover what is happening on the client side. Even
though the server may not send video/avi as the MIME type for an avi
file, both Firefox and MS IE recognize video/avi. I know this because
Mumia's latest example worked fine even though he set the content type
to video/avi. That gives me an idea, from what you said and what
Mumia's example does, that I will have to test after lunch.

Cheers,

Ted

Jochen Lehmeier

Nov 26, 2009, 2:06:45 PM

On Wed, 25 Nov 2009 23:57:06 +0100, Ted Byers <r.ted...@gmail.com>
wrote:

> In my testing, the client browser thinks
> the video file content has the cgi script as the file name.

Of course, the URL is http://host/cgi-bin/script.pl or something like
that. The browser thinks the file is called "script.pl".

An easy way to change this is to use the URL
http://host/cgi-bin/script.pl/file.avi (or whatever file name you want to
have). Apache will know to actually call your script.pl, and not try to
access script.pl as a directory.
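
With a URL of that shape, the part after the script name reaches the script in the standard CGI variable PATH_INFO (here "/file.avi"). A minimal sketch of picking the name out of it follows; requested_name is a hypothetical helper, and it takes the environment as a hash reference so it can be tried outside a web server.

```perl
use strict;
use warnings;

# Return the last path segment of PATH_INFO, or nothing if there is none.
# requested_name is a hypothetical helper name.
sub requested_name {
    my ($env) = @_;
    my $path_info = $env->{PATH_INFO};
    return unless defined $path_info && length $path_info;
    my ($name) = $path_info =~ m{([^/]+)\z};
    return $name;
}
```

Inside a CGI script the call would be requested_name(\%ENV).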

Ted Byers

Nov 26, 2009, 2:13:12 PM

OK, final result: the idea that I got just before lunch, from
combining what Mumia provided and what sherm said paid off, and now
everything works as expected. And I even improved performance by
modifying Mumia's example to use sysread and syswrite.

There are still aspects of the behaviour I saw previously that I don't
understand. For example, once I used video/mpeg as the content type
(not using Mumia's example) the client believed the file name was
'my.cgi.script.cgi.mpg' and knew enough to try to open it using
Windows Media Player, but with all the other content types, the same
client believed the file name was 'my.cgi.script.cgi'. Why the
difference? Anyway, although I am not happy with this gap in my
understanding, I can proceed to the next step.

When I applied Mumia's example to my own code, in each case, whether I
was sending an asf file, an avi file or an mpg file, in every case the
client believed the file name was what was actually the correct file
name for the clip being sent, and as a result, in each case the file
was displayed correctly using Windows Media Player.

Thanks all.

Ted

Ted Byers

Nov 26, 2009, 4:04:10 PM

On Nov 26, 2:06 pm, "Jochen Lehmeier" <OJZGSRPBZ...@spammotel.com>
wrote:
> On Wed, 25 Nov 2009 23:57:06 +0100, Ted Byers <r.ted.by...@gmail.com>
> wrote:
>
> > In my testing, the client browser thinks
> > the video file content has the cgi script as the file name.
>

> Of course, the URL is http://host/cgi-bin/script.pl or something like
> that. The browser thinks the file is called "script.pl".
>

I can understand that. What isn't clear is why either the client or
the server is changing that to script.pl.mpg when an mpeg is requested
and not when the files with other video formats are requested (even
avi and there is a line in mime.types saying what the mime type is for
avi files).

> An easy way to change this is to use the URL
> http://host/cgi-bin/script.pl/file.avi (or whatever file name you want to
> have). Apache will know to actually call your script.pl, and not try to
> access script.pl as a directory.

Oh. OK.

The only issue with that solution is that I won't know until run time
what format the requested clip is actually in.

Thanks

Ted

Ben Morrow

Nov 26, 2009, 4:31:31 PM

Quoth Ted Byers <r.ted...@gmail.com>:

> On Nov 26, 2:06 pm, "Jochen Lehmeier" <OJZGSRPBZ...@spammotel.com>
> wrote:
> > On Wed, 25 Nov 2009 23:57:06 +0100, Ted Byers <r.ted.by...@gmail.com>
> > wrote:
> >
> > > In my testing, the client browser thinks
> > > the video file content has the cgi script as the file name.
> >
> > Of course, the URL is http://host/cgi-bin/script.pl or something like
> > that. The browser thinks the file is called "script.pl".
>
> I can understand that. What isn't clear is why either the client or
> the server is changing that to script.pl.mpg when an mpeg is requested
> and not when the files with other video formats are requested (even
> avi and there is a line in mime.types saying what the mime type is for
> avi files).

This has not been a Perl problem for some time now. The question of what
the browser does with the filename is entirely at the discretion of the
browser. You may find that the different types are configured
differently in some way in the browser's MIME settings, or are being
picked up by different plugins. The mime.types file on the server has no
effect here: all that does is map extensions to content types for files
served directly (as opposed to through a CGI script) by the server.

> > An easy way to change this is to use the URL
> > http://host/cgi-bin/script.pl/file.avi (or whatever file name you want
> > to have). Apache will know to actually call your script.pl, and not try to
> > access script.pl as a directory.
>
> Oh. OK.
>
> The only issue with that solution is that I won't know until run time
> what format the requested clip is actually in.

<shrug> Then simply send the appropriate content-type, and tell people
with broken browsers that think URLs have file extensions to get rid of
them :).

Ben

Peter J. Holzer

Nov 27, 2009, 12:03:36 PM

On 2009-11-26 19:13, Ted Byers <r.ted...@gmail.com> wrote:
[generating videos (or actually any content-type) from CGI]

> There are still aspects of the behaviour I saw previously that I don't
> understand. For example, once I used video/mpeg as the content type
> (not using Mumia's example) the client believed the file name was
> 'my.cgi.script.cgi.mpg' and knew enough to try to open it using
> Windows Media Player, but with all the other content types, the same
> client believed the file name was 'my.cgi.script.cgi'. Why the
> difference?

As Ben already noted, the "file name" in a URI is supposed to be
completely immaterial to the browser. Whether the URL ends in
"video.cgi" or "video.mpg" or "video.html" should not make any
difference. The only thing that is important for the browser is the
content-type. When the browser recognizes the content-type, it knows how
to handle the file, e.g., to call ms media player. It also knows (on
Windows) which extension a file of this type is supposed to have, so it
can add a proper extension.

(Unfortunately, Firefox subscribes to the "the truth is much too
complicated for the average user, so we lie to them and confuse the heck
out of them" school of thought - so you can't believe anything it
displays in dialog boxes. But at least it does the right thing
internally, unlike IE, which both ignores the content type whenever it
feels like it and lies to the user)

hp

Peter J. Holzer

Nov 27, 2009, 12:26:30 PM

On 2009-11-25 22:11, Uri Guttman <u...@StemSystems.com> wrote:
>>>>>> "TB" == Ted Byers <r.ted...@gmail.com> writes:
> TB> my $fcontent;
> TB> read FIN, $fcontent, $flength;
>
> where is $flength set? i assume you would do a -s to get the file size
>
> TB> print $fcontent;
>
> if you want more speed, use sysread and syswrite.

sysread/syswrite probably aren't much faster than read/print. The latter
have a bit more buffer handling overhead but that is almost certainly
negligible when you read data from a disk and send it over the network.

However, if the files are large (and videos can be quite large), you can
save quite a lot of time by reading the file in smallish chunks (a few
kB to a few MB) and send each chunk immediately. If you read the whole
file into memory first and then send it to the client the times for
reading from disk and sending over the net add up. Otherwise they
overlap resulting in a shorter total time.

hp

Uri Guttman

Nov 27, 2009, 1:49:45 PM

>>>>> "PJH" == Peter J Holzer <hjp-u...@hjp.at> writes:

PJH> On 2009-11-25 22:11, Uri Guttman <u...@StemSystems.com> wrote:
>>>>>>> "TB" == Ted Byers <r.ted...@gmail.com> writes:
TB> my $fcontent;
TB> read FIN, $fcontent, $flength;
>>
>> where is $flength set? i assume you would do a -s to get the file size
>>
TB> print $fcontent;
>>
>> if you want more speed, use sysread and syswrite.

PJH> sysread/syswrite probably aren't much faster than read/print. The latter
PJH> have a bit more buffer handling overhead but that is almost certainly
PJH> negligible when you read data from a disk and send it over the network.

they both avoid stdio (or perl's version) so they are faster. how much
depends on the amount of i/o and how many calls are made. this is why
file::slurp uses sysread/write. see its benchmarks to see the difference
from read/print.

PJH> However, if the files are large (and videos can be quite large),
PJH> you can save quite a lot of time by reading the file in smallish
PJH> chunks (a few kB to a few MB) and send each chunk immediately. If
PJH> you read the whole file into memory first and then send it to the
PJH> client the times for reading from disk and sending over the net
PJH> add up. Otherwise they overlap resulting in a shorter total time.

for some definition of large and small! :)

Peter J. Holzer

Nov 28, 2009, 10:26:24 AM

On 2009-11-27 18:49, Uri Guttman <u...@StemSystems.com> wrote:
>>>>>> "PJH" == Peter J Holzer <hjp-u...@hjp.at> writes:
>
> PJH> On 2009-11-25 22:11, Uri Guttman <u...@StemSystems.com> wrote:
> >> if you want more speed, use sysread and syswrite.
>
> PJH> sysread/syswrite probably aren't much faster than read/print. The latter
> PJH> have a bit more buffer handling overhead but that is almost certainly
> PJH> negligible when you read data from a disk and send it over the network.
>
> they both avoid stdio (or perl's version) so they are faster. how much
> depends on the amount of i/o and how many calls are made. this is why
> file::slurp uses sysread/write. see its benchmarks to see the difference
> from read/print.

Your benchmark was for a 300 MHz SPARC. CPU speed has improved more than
disk speed since then.

So I grabbed the server with the fastest disks I had access to (disk
array of SSDs), created a file with 400 million lines of 80 characters
(plus newline) each and ran some benchmarks:

method                  time      speed (MB/s)
----------------------------------------------
perlio $/ = "\n"        2:35.12   209
perlio $/ = \4096       1:35.36   340
perlio $/ = \1048576    1:35.25   340
sysread bs = 4096       1:35.28   340
sysread bs = 1048576    1:35.18   340
The times are the median of three runs. Times between the runs differed
by about 1 second, so the difference between reading line by line and
block by block is significant, but the difference between perlio and
sysread or between different blocksizes isn't.

I was a bit surprised that reading line by line was so much slower than
blockwise reading. Was it because of the higher loop overhead (81 bytes
read per loop instead of 4096 means 50 times more overhead) or because
splitting a block into lines is so expensive?

So I did another run of benchmarks with different block sizes:

method                     block  user     system  cpu  total
read_file_by_perlio_block  4096     0.64s  26.87s  31%  1:27.91
read_file_by_perlio_block  2048     1.48s  28.65s  34%  1:28.56
read_file_by_perlio_block  1024     5.14s  29.03s  37%  1:30.59
read_file_by_perlio_block   512    11.98s  31.33s  47%  1:31.22
read_file_by_perlio_block   256    26.84s  33.13s  61%  1:36.85
read_file_by_perlio_block   128    43.53s  29.05s  71%  1:41.66
read_file_by_perlio_block    64    77.26s  28.16s  88%  1:59.70
read_file_by_line                 104.68s  28.01s  93%  2:22.34

(the times are a bit lower now because here the system was idle while it
had a (relatively constant) load during the first batch)

As expected, elapsed time as well as CPU time increases with shrinking
block size. However, even at 64 bytes, reading in blocks is still 20%
faster than reading in lines, even though the loop is now executed 27%
more often.

Conclusions:

* The difference between sysread and blockwise <> isn't even measurable.

* Above 512 Bytes the block size matters very little (and above 4k, not
at all).

* Reading line by line is significantly slower than reading by blocks.
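
The record-size form of $/ used in these benchmarks can be tried on a small scale. The sketch below is not the benchmark code: it only counts how many reads each mode needs on a temporary file of 1000 lines of 80 characters plus newline, and count_records is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Count how many <$fh> reads are needed to consume a file, either line by
# line or in fixed-size records. count_records is a hypothetical helper.
sub count_records {
    my ($path, $record_size) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    binmode $fh;
    # A reference to an integer makes <$fh> return fixed-size records;
    # "\n" gives the usual line-by-line behaviour.
    local $/ = defined $record_size ? \$record_size : "\n";
    my $count = 0;
    while (defined(my $rec = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

# Small stand-in for the benchmark file: 1000 lines of 80 chars + newline.
my ($tmp_fh, $tmp) = tempfile(UNLINK => 1);
binmode $tmp_fh;
my $line = ('x' x 80) . "\n";
print {$tmp_fh} $line for 1 .. 1000;
close $tmp_fh;

printf "by line:       %d reads\n", count_records($tmp);
printf "64-byte block: %d reads\n", count_records($tmp, 64);
```

For this file the line-by-line mode needs 1000 reads and the 64-byte mode needs 1266, which is the roughly 27% higher loop count mentioned above.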

> PJH> However, if the files are large (and videos can be quite large),
> PJH> you can save quite a lot of time by reading the file in smallish
> PJH> chunks (a few kB to a few MB) and send each chunk immediately. If
> PJH> you read the whole file into memory first and then send it to the
> PJH> client the times for reading from disk and sending over the net
> PJH> add up. Otherwise they overlap resulting in a shorter total time.
>
> for some definition of large and small! :)

Let's use a specific example. I have several videos on my disk. The
largest of them is 542 MB.

Let's assume I have this file on the aforementioned SSD array and want to
send it over a gbit network connection. I can read the whole file in
542MB / 340MB/s == 1.6s. I can send it over the network in
542MB / 120MB/s == 4.5 seconds. If I first read it completely into memory
and then send it over the network, the total transfer time is
1.6s + 4.5s == 6.1s. If I read the file in 4kB blocks or even line by
line (not that reading a video line by line makes much sense) I can
still read it faster than it can be sent over the network, but since I
start sending only milliseconds after I start reading, the total
transfer time now is 4.5 seconds, or 35% faster.
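
The arithmetic can be checked with a few lines of Perl. The figures are the ones given above; treating the overlapped case as limited purely by the slower of the two stages is a simplifying assumption.

```perl
use strict;
use warnings;

my $size_mb  = 542;    # file size in MB
my $disk_mbs = 340;    # read rate from the SSD array, MB/s
my $net_mbs  = 120;    # send rate over the gbit link, MB/s

my $read_s = $size_mb / $disk_mbs;    # time to read the whole file
my $send_s = $size_mb / $net_mbs;     # time to send the whole file

my $sequential = $read_s + $send_s;   # read everything, then send
# Overlapped: the slower stage dominates (simplifying assumption).
my $overlapped = $read_s > $send_s ? $read_s : $send_s;

printf "read %.1fs, send %.1fs\n", $read_s, $send_s;
printf "sequential %.1fs, overlapped %.1fs\n", $sequential, $overlapped;
printf "speedup %.2fx, time saved %.0f%%\n",
    $sequential / $overlapped,
    100 * (1 - $overlapped / $sequential);
```

The speedup factor comes out at about 1.35, which is the "35% faster" figure; expressed as time saved, the same difference is about 26% of the sequential total.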

hp

Ilya Zakharevich

Nov 28, 2009, 8:13:25 PM

On 2009-11-28, Peter J. Holzer <hjp-u...@hjp.at> wrote:
> * Reading line by line is significantly slower than reading by blocks.

Remember that when reading line-by-line (with 80char line), you
actually read 80 times char-by-char.

Yours,
Ilya

Ben Morrow

Nov 28, 2009, 9:42:46 PM

Quoth Ilya Zakharevich <nospam...@ilyaz.org>:

Not under normal circumstances. When perl is using buffered IO, it reads
a bufferful and then goes grovelling through it for line endings.

This is true, however, when reading line-by-line with the :unix PerlIO
layer, which is why that should be avoided unless strictly necessary.

Ben

Ilya Zakharevich

Nov 29, 2009, 3:54:02 AM

On 2009-11-29, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth Ilya Zakharevich <nospam...@ilyaz.org>:
>> On 2009-11-28, Peter J. Holzer <hjp-u...@hjp.at> wrote:
>> > * Reading line by line is significantly slower than reading by blocks.
>>
>> Remember that when reading line-by-line (with 80char line), you
>> actually read 80 times char-by-char.
>
> Not under normal circumstances. When perl is using buffered IO, it reads
> a bufferful and then goes grovelling through it for line endings.

But "grovelling" happens char-by-char [*]; then one must re-seek() to the
position in question. Inspect how

perl -wle "$in = <STDIN>; print qq({$in}); system $^X, @ARGV" -- -wle "print qq([$_]) while <STDIN>"

behaves when reading from a file and from a pipe...

Yours,
Ilya

[*] Last time I checked, every PerlIO operation would go a dozen
levels deep in subroutine calls - even when a simple macro
count--, c = *buf++ if count > 0
would suffice. PerlIO was written without any regard to
maintainability and efficiency...

Ben Morrow

Nov 29, 2009, 4:06:29 PM

Quoth Ilya Zakharevich <nospam...@ilyaz.org>:
> On 2009-11-29, Ben Morrow <b...@morrow.me.uk> wrote:
> > Quoth Ilya Zakharevich <nospam...@ilyaz.org>:
> >> On 2009-11-28, Peter J. Holzer <hjp-u...@hjp.at> wrote:
> >> > * Reading line by line is significantly slower than reading by blocks.
> >>
> >> Remember that when reading line-by-line (with 80char line), you
> >> actually read 80 times char-by-char.
> >
> > Not under normal circumstances. When perl is using buffered IO, it reads
> > a bufferful and then goes grovelling through it for line endings.
>
> But "grovelling" happens char-by-char [*]; then one must re-seek() to the
> position in question.

If I run

~% perl -E'say for 1..1000' >foo
~% ktrace perl -pe1 foo >/dev/null

then the only syscalls I see for fd 3 are

67709 perl CALL open(0x81020d4,O_RDONLY,<unused>0x1b6)
67709 perl RET open 3
67709 perl CALL ioctl(0x3,TIOCGETA,0xbfbfe0e8)
67709 perl RET ioctl -1 errno 25 Inappropriate ioctl for device
67709 perl CALL lseek(0x3,0,SEEK_SET,0x1)
67709 perl RET lseek 0
67709 perl CALL fstat(0x3,0x281a7a20)
67709 perl RET fstat 0
67709 perl CALL fcntl(0x3,F_SETFD,FD_CLOEXEC)
67709 perl RET fcntl 0
67709 perl CALL read(0x3,0x811c804,0x1000)
67709 perl RET read 3893/0xf35
67709 perl CALL read(0x3,0x811c804,0x1000)
67709 perl RET read 0
67709 perl CALL close(0x3)
67709 perl RET close 0

so once the file has been opened and examined perl calls read(2) exactly
twice, and lseek(2) not at all.

> Inspect how
>
> perl -wle "$in = <STDIN>; print qq({$in}); system $^X, @ARGV" -- -wle
> "print qq([$_]) while <STDIN>"
>
> behaves when reading from a file and from a pipe...

ktrace says (AFAICT) that perl does a single lseek to where perl thinks
the file pointer should be just before calling fork(2).

Ben

Ilya Zakharevich

Nov 29, 2009, 11:31:45 PM

On 2009-11-29, Ben Morrow <b...@morrow.me.uk> wrote:
>> >> Remember that when reading line-by-line (with 80char line), you
>> >> actually read 80 times char-by-char.

>> > Not under normal circumstances. When perl is using buffered IO, it reads
>> > a bufferful and then goes grovelling through it for line endings.

>> But "grovelling" happens char-by-char [*]; then one must re-seek() to the
>> position in question.

> If I run
>
> ~% perl -E'say for 1..1000' >foo
> ~% ktrace perl -pe1 foo >/dev/null
>
> then the only syscalls I see for fd 3 are

First, I have no idea what `say' would do. But, judging by the name,
it probably would not do anything with line-oriented read?

>> Inspect how
>>
>> perl -wle "$in = <STDIN>; print qq({$in}); system $^X, @ARGV" -- -wle
>> "print qq([$_]) while <STDIN>"
>>
>> behaves when reading from a file and from a pipe...
>
> ktrace says (AFAICT) that perl does a single lseek to where perl thinks
> the file pointer should be just before calling fork(2).

This is even better than how it was before PerlIO was introduced!

Compare this with how it was quite recently: IIRC, about 5-7 years
after PerlIO was introduced, when I reported a spurious seek() per
character read (!), everybody behaved as if it was a surprise to them...

Thanks for clarifications,
Ilya

Ben Morrow

Nov 30, 2009, 12:15:19 AM

Quoth Ilya Zakharevich <nospam...@ilyaz.org>:

> On 2009-11-29, Ben Morrow <b...@morrow.me.uk> wrote:
> >> >> Remember that when reading line-by-line (with 80char line), you
> >> >> actually read 80 times char-by-char.
>
> >> > Not under normal circumstances. When perl is using buffered IO, it reads
> >> > a bufferful and then goes grovelling through it for line endings.
>
> >> But "grovelling" happens char-by-char [*]; then one must re-seek() to the
> >> position in question.
>
> > If I run
> >
> > ~% perl -E'say for 1..1000' >foo
> > ~% ktrace perl -pe1 foo >/dev/null
> >
> > then the only syscalls I see for fd 3 are
>
> First, I have no idea what `say' would do. But, judging by the name,
> it probably would not do anything with line-oriented read?

The first command is just to create a data file. It is equivalent to

perl -le "print for 1..1000" >foo

'say' was introduced with perl 5.10, and the -E option is equivalent to
-e but allows the new 5.10 features.
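
Inside a script the same equivalence can be sketched as follows, for illustration only. The two helper names are hypothetical, and each writes to an in-memory handle so that the results can be compared:

```perl
use strict;
use warnings;
use feature 'say';    # -E enables this implicitly on the command line

# Hypothetical helper: write 1..3 with say, which appends the newline.
sub lines_with_say {
    my $out = '';
    open(my $fh, '>', \$out) or die "Cannot open in-memory handle: $!";
    say {$fh} $_ for 1 .. 3;
    close($fh);
    return $out;
}

# Hypothetical helper: the same with print and an explicit newline.
sub lines_with_print {
    my $out = '';
    open(my $fh, '>', \$out) or die "Cannot open in-memory handle: $!";
    print {$fh} "$_\n" for 1 .. 3;
    close($fh);
    return $out;
}
```

Both helpers should return the same string, one number per line.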

Ben

Ben Morrow

Nov 30, 2009, 1:03:16 AM

Quoth Ben Morrow <b...@morrow.me.uk>:

>
> Quoth Ilya Zakharevich <nospam...@ilyaz.org>:
> > On 2009-11-29, Ben Morrow <b...@morrow.me.uk> wrote:
> >
> > > If I run
> > >
> > > ~% perl -E'say for 1..1000' >foo
> > > ~% ktrace perl -pe1 foo >/dev/null
> > >
> > > then the only syscalls I see for fd 3 are
> >
> > First, I have no idea what `say' would do. But, judging by the name,
> > it probably would not do anything with line-oriented read?
>
> The first command is just to create a data file. It is equivalent to

Sorry, I realise I may still have been unclear. The important command is
the second,

ktrace perl -pe1 foo >/dev/null

which runs

perl -pe1 foo >/dev/null

and records all the syscalls it makes. (Obviously, that is doing
line-oriented read; I also wanted the data file to be larger than one
bufferful.)

Ben