inefficient print <$fh> in a slurp mode

Stas Bekman

Mar 4, 2004, 9:36:35 PM
to The Perl5 Porters Mailing List
let's say we have a file with 10 lines of text in it and we have $fh opened to
read this file. This code:

local $/;
my $data = <$fh>;
print $data;

prints $data at once. (it calls PerlIO_write once)

This code, which I'd think should do exactly the same:

local $/;
print <$fh>;

calls PerlIO_write 10 times, on each line. The only way I found to make it
equivalent to the slurp-into-var-and-then-print is to undef $\ as well:

local $\;
local $/;
print <$fh>;

My guess is that it's non-specific to the io system, but has to do with how
perl handles <$fh> in a slurp mode. I just used PerlIO_ layers to trace it.

__________________________________________________________________
Stas Bekman JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide ---> http://perl.apache.org
mailto:st...@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org http://ticketmaster.com

Glenn Linderman

Mar 4, 2004, 10:36:20 PM
to Stas Bekman, The Perl5 Porters Mailing List
On approximately 3/4/2004 6:36 PM, came the following characters from
the keyboard of Stas Bekman:

> let's say we have a file with 10 lines of text in it and we have $fh
> opened to read this file. This code:
>
> local $/;
> my $data = <$fh>;
> print $data;
>
> prints $data at once. (it calls PerlIO_write once)
>
> This code, which I'd think should do exactly the same:
>
> local $/;
> print <$fh>;
>
> calls PerlIO_write 10 times, on each line. The only way I found to make
> it equivalent to the slurp-into-var-and-then-print is to undef $\ as well:
>
> local $\;
> local $/;
> print <$fh>;
>
> My guess is that it's non-specific to the io system, but has to do with
> how perl handles <$fh> in a slurp mode. I just used PerlIO_ layers to
> trace it.

It is context. <$fh> is in scalar context in the first case, and list
context in the second case.

--
Glenn -- http://nevcal.com/
===========================
The best part about procrastination is that you are never bored,
because you have all kinds of things that you should be doing.

Glenn Linderman

Mar 4, 2004, 11:44:51 PM
to Glenn Linderman, Stas Bekman, The Perl5 Porters Mailing List
On approximately 3/4/2004 7:36 PM, came the following characters from
the keyboard of Glenn Linderman:

> On approximately 3/4/2004 6:36 PM, came the following characters from
> the keyboard of Stas Bekman:
>
>> let's say we have a file with 10 lines of text in it and we have $fh
>> opened to read this file. This code:
>>
>> local $/;
>> my $data = <$fh>;
>> print $data;
>>
>> prints $data at once. (it calls PerlIO_write once)
>>
>> This code, which I'd think should do exactly the same:
>>
>> local $/;
>> print <$fh>;
>>
>> calls PerlIO_write 10 times, on each line. The only way I found to
>> make it equivalent to the slurp-into-var-and-then-print is to undef $\
>> as well:
>>
>> local $\;
>> local $/;
>> print <$fh>;
>>
>> My guess is that it's non-specific to the io system, but has to do
>> with how perl handles <$fh> in a slurp mode. I just used PerlIO_
>> layers to trace it.
>
>
> It is context. <$fh> is in scalar context in the first case, and list
> context in the second case.

Of course, it is more than context, too.

Nick Ing-Simmons

Mar 5, 2004, 4:33:38 PM
to st...@stason.org, The Perl5 Porters Mailing List
Stas Bekman <st...@stason.org> writes:
>let's say we have a file with 10 lines of text in it and we have $fh opened to
>read this file. This code:
>
> local $/;
> my $data = <$fh>;
> print $data;
>
>prints $data at once. (it calls PerlIO_write once)
>
>This code, which I'd think should do exactly the same:
>
> local $/;
> print <$fh>;

Not quite. That calls <$fh> in a list context.


Ton Hospel

Mar 5, 2004, 6:38:58 PM
to perl5-...@perl.org
In article <2004030521...@llama.ing-simmons.net>,

Nick Ing-Simmons <ni...@ing-simmons.net> writes:
> Stas Bekman <st...@stason.org> writes:
>>let's say we have a file with 10 lines of text in it and we have $fh opened to
>>read this file. This code:
>>
>> local $/;
>> my $data = <$fh>;
>> print $data;
>>
>>prints $data at once. (it calls PerlIO_write once)
>>
>>This code, which I'd think should do exactly the same:
>>
>> local $/;
>> print <$fh>;
>>
>>calls PerlIO_write 10 times, on each line.
>
> Not quite. That calls <$fh> in a list context.

Sure. But since $/ is undef, in both cases you'd expect
the whole filecontents to be passed in one go to print.
And print receives a one-element list in both cases,
so I'd still expect the number of PerlIO_write's to be
the same.

Stas Bekman

Mar 5, 2004, 9:52:04 PM
to Ton Hospel, perl5-...@perl.org

Yup, that has nothing to do with the context. At least it shouldn't have to.
this program should have everything in the first item of the array:

local $/;
my @data = <DATA>;
print join '%', @data;
__DATA__
1
2
3

and it does, as it prints:

1
2
3

whereas commenting out the first line, prints:

1
%2
%3

I'd expect perlio to behave the same. looks like it ignores the local value of $/?

Andreas J Koenig

Mar 6, 2004, 12:40:28 AM
to Stas Bekman, The Perl5 Porters Mailing List
>>>>> On Thu, 04 Mar 2004 18:36:35 -0800, Stas Bekman <st...@stason.org> said:

> My guess is that it's non-specific to the io system, but has to do
> with how perl handles <$fh> in a slurp mode. I just used PerlIO_
> layers to trace it.

can you post the code you used?

--
andreas

Rafael Garcia-Suarez

Mar 6, 2004, 5:20:36 PM
to Stas Bekman, Ton Hospel, perl5-...@perl.org
Stas Bekman wrote:
> this program should have everything in the first item of the array:
>
> local $/;
> my @data = <DATA>;
> print join '%', @data;
> __DATA__
> 1
> 2
> 3
>
> and it does, as it prints:
>
> 1
> 2
> 3
>
> whereas commenting out the first line, prints:
>
> 1
> %2
> %3
>
> I'd expect perlio to behave the same. looks like it ignores the local
> value of $/?

So the bug would be with print() rather than with <$fh> ?

Dave Mitchell

Mar 6, 2004, 7:01:16 PM
to Rafael Garcia-Suarez, Stas Bekman, Ton Hospel, perl5-...@perl.org

Having had a look at this, in all cases having $/ undef causes the whole
file to be slurped in as a single PV. In the case of

local $/;
print <DATA>;
__END__
foo1
bar2
baz3

it shows the following:

(/tmp/p1:4) gv(main::DATA)
=> * GV()
(/tmp/p1:4) readline
=> * PV("foo1\12bar2\12baz3\12"\0)
(/tmp/p1:4) print
foo1
bar2
baz3
=> SV_YES

So print just prints a single 3-line string; however, once it gets as far as
PerlIOBuf_write() this function indirectly calls PerlIO_write once for
each line in the string. As to whether this is a good thing for it to do, I
have no opinion.


--
You live and learn (although usually you just live).

Ton Hospel

Mar 6, 2004, 7:25:22 PM
to perl5-...@perl.org
In article <20040307000116.GA20858@_disolutions.com>,

Dave Mitchell <da...@fdisolutions.com> writes:
> So print just prints a single 3-line string; however, once it gets as far as
> PerlIOBuf_write() this function indirectly calls PerlIO_write once for
> each line in the string. As to whether this is a good thing for it to do, I
> have no opinion.
>
Whether this is seen as a problem or not, I'm curious how print knows
how to behave differently for a passed scalar or a one-element list. I
thought that by the time print gets its fingers on the argument, the
cases are exactly the same.

Dave Mitchell

Mar 6, 2004, 8:17:24 PM
to Ton Hospel, perl5-...@perl.org

Well, on my system the following script:

local $/;
open my $fh, "/etc/hosts" or die "$!\n";


my $data = <$fh>;
print $data;

Calls PerlIO_write() once for each line in the file. I guess what determines
whether PerlIO_write() is called once or multiple times is to do with
how STDOUT is buffered. In fact, looking at PerlIOBuf_write(), it has code
along the lines of:

if (PerlIOBase(f)->flags & PERLIO_F_LINEBUF) {
    ... print a line at a time
}
else {
    ... print as a single chunk.
}

Perhaps that's the phenomenon(*) that Stas was seeing???

Dave.

(*) Try typing that word correctly at 1:15am.
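
As a rough illustration of the line-at-a-time branch sketched above, the following standalone C function counts how many flushes such a loop would trigger for a given string. It is not the perlio.c code: the name `flushes_first_newline` is hypothetical, and the sketch ignores flushes caused by the buffer filling up.

```c
#include <stddef.h>

/* Hypothetical stand-in for a "flush after every newline" write loop.
 * Walks the data one byte at a time and counts a flush each time a
 * newline has been copied. Bytes after the last newline would simply
 * stay in the buffer until a later flush. */
size_t flushes_first_newline(const char *data, size_t count)
{
    size_t flushes = 0;
    for (size_t i = 0; i < count; i++) {
        if (data[i] == '\n')
            flushes++;          /* one flush, hence one write, per line */
    }
    return flushes;
}
```

For the three-line string from the earlier trace, `flushes_first_newline("foo1\nbar2\nbaz3\n", 15)` gives 3, which matches one write per line when STDOUT is line buffered.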

--
The Enterprise is captured by a vastly superior alien intelligence which
does not put them on trial.
-- Things That Never Happen in "Star Trek" #10

Stas Bekman

Mar 6, 2004, 9:19:25 PM
to Dave Mitchell, Ton Hospel, perl5-...@perl.org, Andreas J Koenig

Yes, I'm looking at that code too. I can't figure out who sets the
PERLIO_F_LINEBUF flag. It seems like a crlf layer would do that.

Andreas, I had a crash and I also did some updates while installing 5.8.4-tobe
before I came to work on this case again and something has changed. I can no
longer reproduce the case :( The original fh was coming from CGI.pm's file
upload method.

Most likely that was the case that I was seeing - some layer that called
Setlinebuf was pushed onto the STDOUT layers stack, causing the behaviour that
I saw.

perliol.pod has this entry:

---------------------
=item Setlinebuf

void (*Setlinebuf)(pTHX_ PerlIO *f);

Mark the stream as line buffered. C<PerlIOBase_setlinebuf()> sets the
PERLIO_F_LINEBUF flag and is normally sufficient.
----------------------

but it doesn't explain when this gets called by perl (I can't find any
occurrence of perl calling this callback). Or when a layer should call it. I
set this callback in :Apache layer as well (I don't call it) and now I think I
should replace it with PerlIOBase_noop_ok? Or will this break something?

Stas Bekman

Mar 7, 2004, 5:48:52 PM
to Nick Ing-Simmons, Dave Mitchell, Andreas J Koenig, perl5-...@perl.org, Ton Hospel
Nick Ing-Simmons wrote:

> Stas Bekman <st...@stason.org> writes:
>
>>>Calls PerlIO_write() once for each line in the file. I guess what determines
>>>whether PerlIO_write() is called once or multiple times is to do with
>>>how STDOUT is buffered. In fact, looking at PerlIOBuf_write(), it has code
>>>along the lines of:
>>>
>>> if (PerlIOBase(f)->flags & PERLIO_F_LINEBUF) {
>>>     ... print a line at a time
>>> }
>>> else {
>>>     ... print as a single chunk.
>>> }
>>>
>>>Perhaps that's the phenomenon(*) that Stas was seeing???
>>
>>Yes, I'm looking at that code too. I can't figure out who sets the
>>PERLIO_F_LINEBUF flag.
>
>
> PerlIOBuf_pushed() sets stream to line buffered if attached to a tty.
> This mimics stdio "spec".

OK, so it messes up with the flags directly. So, what is this PerlIO tab entry
for:
Setlinebuf PerlIOBase_setlinebuf

Who calls it? I can't figure out whether I need to set this entry, without
knowing when does it get called.

>>It seems like a crlf layer would do that.
>

> No, it diddles with the buffer. You still get a whole buffer's write()
> unless stream is line buffered.

Thank you.

Nick Ing-Simmons

Mar 7, 2004, 3:15:09 PM
to st...@stason.org, Dave Mitchell, Andreas J Koenig, perl5-...@perl.org, Ton Hospel
Stas Bekman <st...@stason.org> writes:
>> Calls PerlIO_write() once for each line in the file. I guess what determines
>> whether PerlIO_write() is called once or multiple times is to do with
>> how STDOUT is buffered. In fact, looking at PerlIOBuf_write(), it has code
>> along the lines of:
>>
>> if (PerlIOBase(f)->flags & PERLIO_F_LINEBUF) {
>>     ... print a line at a time
>> }
>> else {
>>     ... print as a single chunk.
>> }
>>
>> Perhaps that's the phenomenon(*) that Stas was seeing???
>
>Yes, I'm looking at that code too. I can't figure out who sets the
>PERLIO_F_LINEBUF flag.

PerlIOBuf_pushed() sets stream to line buffered if attached to a tty.

This mimics stdio "spec".
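
The stdio convention referred to here (a terminal gets line buffering, anything else gets full buffering) can be sketched with POSIX `isatty()`. The function name `buffering_mode_for_fd` is hypothetical; `_IOLBF` and `_IOFBF` are the standard `setvbuf()` mode constants. This only mirrors the rule itself, not what PerlIOBuf_pushed() does internally.

```c
#include <stdio.h>    /* _IOLBF, _IOFBF */
#include <unistd.h>   /* isatty(), POSIX */

/* Choose a buffering mode following the stdio convention:
 * line buffered when the descriptor is a terminal,
 * fully buffered otherwise (files, pipes, invalid descriptors). */
int buffering_mode_for_fd(int fd)
{
    return isatty(fd) ? _IOLBF : _IOFBF;
}
```

The result could be passed as the mode argument of `setvbuf()` before any other I/O is done on the stream.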

>It seems like a crlf layer would do that.

Nick Ing-Simmons

Mar 9, 2004, 10:59:44 AM
to st...@stason.org, Dave Mitchell, Andreas J Koenig, Nick Ing-Simmons, Ton Hospel, perl5-...@perl.org
Stas Bekman <st...@stason.org> writes:
>
>OK, so it messes up with the flags directly. So, what is this PerlIO tab entry
>for:
> Setlinebuf PerlIOBase_setlinebuf
>
>Who calls it?

As far as I know nothing. It is there from 5.003_02's PerlIO API
at the time a goal was to mimic stdio via #define layer and
some XS then called setlinebuf() passing in what it thought was
a FILE *. Without this entry encapsulation broke and you got segfaults.
I can't find anything in //depot/maint-5.8/perl/... which uses it.

>I can't figure out whether I need to set this entry, without
>knowing when does it get called.

I think you can just use the PerlIOBase_setlinebuf to set the flag.
I left it as a Vtable entry as :stdio wanted an active hook
to call setlinebuf().

Chip Salzenberg

Mar 9, 2004, 11:08:15 AM
to Ton Hospel, perl5-...@perl.org
According to Dave Mitchell:

> if (PerlIOBase(f)->flags & PERLIO_F_LINEBUF) {
>     ... print a line at a time
> }
> else {
> ... print as a single chunk.
> }

Oh, that's a bug -- it's a misunderstanding of what "line buffering"
means.

"Line buffered" doesn't mean "one write per line", it means "flush
everything before a newline along with the newline".

So, given "a\nb\nc", where there is no trailing newline, "a\nb\n"
should be written out as a single write, and "c" should be left in the
output buffer.
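
A minimal sketch of that rule in standalone C, assuming a hypothetical helper named `linebuf_flush_len`: it returns how many leading bytes should go out in one write, namely everything up to and including the last newline.

```c
#include <stddef.h>

/* Return the number of leading bytes of data[0..count) that a
 * line-buffered stream should flush together: everything through the
 * last newline. Returns 0 when the data contains no newline, in which
 * case all of it stays in the output buffer. */
size_t linebuf_flush_len(const char *data, size_t count)
{
    size_t n = count;
    /* scan backwards until the byte just before n is a newline */
    while (n > 0 && data[n - 1] != '\n')
        n--;
    return n;
}
```

For "a\nb\nc" (5 bytes) this returns 4, so "a\nb\n" is flushed as one chunk and "c" remains buffered.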
--
Chip Salzenberg - a.k.a. - <ch...@pobox.com>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K

Nick Ing-Simmons

Mar 9, 2004, 12:20:07 PM
to ch...@pobox.com, perl5-...@perl.org, Ton Hospel
Chip Salzenberg <ch...@pobox.com> writes:
>According to Dave Mitchell:
>> if (PerlIOBase(f)->flags & PERLIO_F_LINEBUF) {
>>     ... print a line at a time
>> }
>> else {
>> ... print as a single chunk.
>> }
>
>Oh, that's a bug -- it's a misunderstanding of what "line buffering"
>means.
>
>"Line buffered" doesn't mean "one write per line", it means "flush
>everything before a newline along with the newline".
>
>So, given "a\nb\nc", where there is no trailing newline, "a\nb\n"
>should be written out as a single write, and "c" should be left in the
>output buffer.

PerlIOBuf_write currently does two writes and leaves "c" in the buffer.
Question is whether the efficiency gain of doing one write is worth
the extra housekeeping entry to remember that the last \n is
at offset N, and then a memmove() to get the fragment to the start of the buffer.

Patch welcome ;-)
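
The housekeeping-plus-memmove() approach described above might look roughly like this. It is only a sketch: `struct outbuf` and `flush_through_last_newline` are hypothetical names, the struct is not the real PerlIOBuf layout, and the actual write is left as a comment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output buffer, not the real PerlIOBuf layout. */
struct outbuf {
    char   data[64];
    size_t len;        /* number of bytes currently buffered */
};

/* "Write" the buffered bytes through the last newline, then slide the
 * unwritten fragment down to the start of the buffer.
 * Returns the number of bytes written (0 if no newline is buffered). */
size_t flush_through_last_newline(struct outbuf *b)
{
    size_t n = b->len;
    while (n > 0 && b->data[n - 1] != '\n')
        n--;                       /* n ends up one past the last '\n' */
    if (n == 0)
        return 0;
    /* a real layer would pass data[0..n) to write() here */
    memmove(b->data, b->data + n, b->len - n);
    b->len -= n;
    return n;
}
```

With "a\nb\nc" in the buffer, one call writes four bytes and leaves "c" at offset 0.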

Chip Salzenberg

Mar 9, 2004, 12:31:38 PM
to Nick Ing-Simmons, perl5-...@perl.org, Ton Hospel
According to Nick Ing-Simmons:

> PerlIOBuf_write currently does two writes and leaves "c" in the buffer.
> Question is whether the efficiency gain of doing one write is worth
> the extra housekeeping entry to remember that the last \n is
> at offset N, and then a memmove() to get the fragment to the start of the buffer.

The only change required is that the linebuffer code should look for
the *last* newline instead of the *first*. Everything else is the
same. How hard can it be? <- famous last words

Stas Bekman

Mar 9, 2004, 1:09:50 PM
to Nick Ing-Simmons, Dave Mitchell, Andreas J Koenig, Nick Ing-Simmons, Ton Hospel, perl5-...@perl.org
Nick Ing-Simmons wrote:
> Stas Bekman <st...@stason.org> writes:
>
>>OK, so it messes up with the flags directly. So, what is this PerlIO tab entry
>>for:
>> Setlinebuf PerlIOBase_setlinebuf
>>
>>Who calls it?
>
>
> As far as I know nothing. It is there from 5.003_02's PerlIO API
> at the time a goal was to mimic stdio via #define layer and
> some XS then called setlinebuf() passing in what it thought was
> a FILE *. Without this entry encapsulation broke and you got segfaults.
> I can't find anything in //depot/maint-5.8/perl/... which uses it.

Neither did I. I thought maybe some 3rd party module uses it.

>>I can't figure out whether I need to set this entry, without
>>knowing when does it get called.
>
>
> I think you can just use the PerlIOBase_setlinebuf to set the flag.
> I left it as a Vtable entry as :stdio wanted an active hook
> to call setlinebuf().

So can the perliol.pod manpage be amended to explain what it is for, and what
are the implications of having this flag set? So we don't have to repeat this
thread in the future. You've just explained it in another branch of this
thread ("writing everything up to and including the last \n, leaving the rest
in the buffer").

Chip Salzenberg

Mar 9, 2004, 1:22:58 PM
to Nick Ing-Simmons, perl5-...@perl.org, Ton Hospel
It was a bit trickier than I anticipated, but I think this patch will
do. It makes Perl only call write() twice, instead of three times, on
C<print "A\nB\nC">. And it passes 'make test'. And it cuts the total
number of lines in perlio.c by merging common code. It's also a bit
more efficient.

This patch is appropriate for both blead and maint-5.8, IMO.

(A possible further optimization is to call memchr() once to determine
whether there even *are* newlines in the target string, before going
through the whole thing backwards by hand. But given that the target
is probably a tty, there's likely no point.)

==== //depot/perl/perlio.c#247 - /u/projects/perl/current/perlio.c ====
@@ -3692,4 +3692,5 @@
     PerlIOBuf *b = PerlIOSelf(f, PerlIOBuf);
     const STDCHAR *buf = (const STDCHAR *) vbuf;
+    const STDCHAR *flushptr = buf;
     Size_t written = 0;
     if (!b->buf)
@@ -3702,30 +3703,24 @@
         }
     }
+    if (PerlIOBase(f)->flags & PERLIO_F_LINEBUF) {
+        flushptr = buf + count;
+        while (flushptr > buf && *(flushptr - 1) != '\n')
+            --flushptr;
+    }
     while (count > 0) {
         SSize_t avail = b->bufsiz - (b->ptr - b->buf);
         if ((SSize_t) count < avail)
             avail = count;
+        if (flushptr > buf && flushptr <= buf + avail)
+            avail = flushptr - buf;
         PerlIOBase(f)->flags |= PERLIO_F_WRBUF;
-        if (PerlIOBase(f)->flags & PERLIO_F_LINEBUF) {
-            while (avail > 0) {
-                int ch = *buf++;
-                *(b->ptr)++ = ch;
-                count--;
-                avail--;
-                written++;
-                if (ch == '\n') {
-                    PerlIO_flush(f);
-                    break;
-                }
-            }
-        }
-        else {
-            if (avail) {
-                Copy(buf, b->ptr, avail, STDCHAR);
-                count -= avail;
-                buf += avail;
-                written += avail;
-                b->ptr += avail;
-            }
+        if (avail) {
+            Copy(buf, b->ptr, avail, STDCHAR);
+            count -= avail;
+            buf += avail;
+            written += avail;
+            b->ptr += avail;
+            if (buf == flushptr)
+                PerlIO_flush(f);
         }
         if (b->ptr >= (b->buf + b->bufsiz))

Tim Bunce

Mar 9, 2004, 5:16:40 PM
to Stas Bekman, Nick Ing-Simmons, Dave Mitchell, Andreas J Koenig, Nick Ing-Simmons, Ton Hospel, perl5-...@perl.org
On Tue, Mar 09, 2004 at 10:09:50AM -0800, Stas Bekman wrote:
> Nick Ing-Simmons wrote:
> >Stas Bekman <st...@stason.org> writes:
> >
> >>OK, so it messes up with the flags directly. So, what is this PerlIO tab
> >>entry for:
> >> Setlinebuf PerlIOBase_setlinebuf
> >>
> >>Who calls it?
> >
> >As far as I know nothing. It is there from 5.003_02's PerlIO API
> >at the time a goal was to mimic stdio via #define layer and
> >some XS then called setlinebuf() passing in what it thought was
> >a FILE *. Without this entry encapsulation broke and you got segfaults.
> >I can't find anything in //depot/maint-5.8/perl/... which uses it.
>
> Neither did I. I thought maybe some 3rd party module uses it.

I don't know if this is relevant to the thread (I've not followed it closely)
but FYI the DBI uses PerlIO_setlinebuf on trace files.

Tim.

Stas Bekman

Mar 9, 2004, 8:41:20 PM
to Tim Bunce, Nick Ing-Simmons, Dave Mitchell, Andreas J Koenig, Nick Ing-Simmons, Ton Hospel, perl5-...@perl.org

it's relevant to the point of where is it used ;)

> but FYI the DBI uses PerlIO_setlinebuf on trace files.

that's to avoid interleaving of log messages?

Tim Bunce

Mar 10, 2004, 12:45:15 PM
to Stas Bekman, Tim Bunce, Nick Ing-Simmons, Dave Mitchell, Andreas J Koenig, Nick Ing-Simmons, Ton Hospel, perl5-...@perl.org
On Tue, Mar 09, 2004 at 05:41:20PM -0800, Stas Bekman wrote:
> Tim Bunce wrote:
> >On Tue, Mar 09, 2004 at 10:09:50AM -0800, Stas Bekman wrote:
> >
> >>Nick Ing-Simmons wrote:
> >>
> >>>Stas Bekman <st...@stason.org> writes:
> >>>
> >>>
> >>>>OK, so it messes up with the flags directly. So, what is this PerlIO
> >>>>tab entry for:
> >>>> Setlinebuf PerlIOBase_setlinebuf
> >>>>
> >>>>Who calls it?
> >>>
> >>>As far as I know nothing. It is there from 5.003_02's PerlIO API
> >>>at the time a goal was to mimic stdio via #define layer and
> >>>some XS then called setlinebuf() passing in what it thought was
> >>>a FILE *. Without this entry encapsulation broke and you got segfaults.
> >>>I can't find anything in //depot/maint-5.8/perl/... which uses it.
> >>
> >>Neither did I. I thought maybe some 3rd party module uses it.
> >
> >
> >I don't know if this is relevant to the thread (I've not followed it
> >closely)
>
> it's relevant to the point of where is it used ;)
>
> >but FYI the DBI uses PerlIO_setlinebuf on trace files.
>
> that's to avoid interleaving of log messages?

Partly, more so that a hard crash (segfault) won't leave much unwritten.
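
The same idea can be shown with plain stdio. This is only an analogy, since DBI goes through PerlIO_setlinebuf rather than the C library. A minimal sketch, assuming a hypothetical helper `open_trace_file` and the standard `setvbuf()` call with `_IOLBF`:

```c
#include <stdio.h>

/* Open a trace file for appending and request line buffering, so that
 * completed lines are intended to be handed to the OS as they are
 * written rather than sitting in the stdio buffer until it fills.
 * Returns NULL on failure. */
FILE *open_trace_file(const char *path)
{
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return NULL;
    /* setvbuf() must be called before any other I/O on the stream */
    if (setvbuf(fp, NULL, _IOLBF, BUFSIZ) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

If the process then dies abruptly, at most the current unfinished line should be lost, although the exact flush timing is left to the C library.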

Tim.
