[perl #21600] [PATCH] Enable buffer io in PIO

Dan Sugalski

Mar 17, 2003, 5:02:15 PM

to perl6-i...@perl.org, bugs-bi...@netlabs.develooper.com

At 10:38 AM +0000 3/17/03, "Jürgen" "Bömmels" (via RT) wrote:
>Next baby-step in PIO: Enabling buffering.
>
>This patch enables the formerly stubbed-out buffering, and
>shakes out some bugs (Only the first part: Write buffering). Certainly
>all tests passed on my machine.

Applied, thanks.

Looks like it's getting close to time to finish the async IO docs...
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Benjamin Goldberg

Mar 17, 2003, 7:10:26 PM

to perl6-i...@perl.org

"JüRgen BöMmels" wrote:
[snip]
> void
> PIO_unix_flush(theINTERP, ParrotIOLayer *layer, ParrotIO *io)
> {
> -# if 0
> fsync(io->fd);
> -# endif
> }

AFAIK, for disk files, fsync has (should have) no visible effect from
the point of view of any user program -- all it does is tell the OS to
start writing the OS-level cache for that handle to disk, and it blocks
until all is copied. So... it is a slow system call, with no visible
effect -- why do we do it?

It's possible that fsync()ing will sometimes be desired, but, IMHO, I
don't think that it should be done by flush -- I'd rather it be done by
an explicit call to synchronize the handle with the disk.

--
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "$@[$a%6
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}

Juergen Boemmels

Mar 18, 2003, 9:00:23 AM

to Benjamin Goldberg, perl6-i...@perl.org

Benjamin Goldberg <gol...@earthlink.net> writes:

> "JüRgen BöMmels" wrote:
> [snip]
> > void
> > PIO_unix_flush(theINTERP, ParrotIOLayer *layer, ParrotIO *io)
> > {
> > -# if 0
> > fsync(io->fd);
> > -# endif
> > }
>
> AFAIK, for disk files, fsync has (should have) no visible effect from
> the point of view of any user program -- all it does is tell the OS to
> start writing the OS-level cache for that handle to disk, and it blocks
> until all is copied.

I think this is correct.

> So... it is a slow system call, with no visible
> effect -- why do we do it?

Depends on the semantics we want to have for flush. I thought of
flush to disk, but just flush to OS is also a valid assumption. This is
also what fflush does.

> It's possible that fsync()ing will sometimes be desired, but, IMHO, I
> don't think that it should be done by flush -- I'd rather it be done by
> an explicit call to synchronize the handle with the disk.

Maybe another API-Function "Sync" should be added. Not sure.
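
A minimal sketch of what such a split could look like, assuming a small buffered handle: flush only hands the buffered bytes to the OS with write(), while a separate sync also calls fsync(). The names demo_io, demo_flush, demo_write and demo_sync are hypothetical and are not part of the PIO patch.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical buffered handle; NOT Parrot's actual ParrotIO struct. */
typedef struct {
    int    fd;
    char   buf[4096];
    size_t used;
} demo_io;

/* flush: hand buffered bytes to the OS with write(); no fsync().
 * (No EINTR retry here, to keep the sketch short.) */
int demo_flush(demo_io *io)
{
    size_t off = 0;
    while (off < io->used) {
        ssize_t n = write(io->fd, io->buf + off, io->used - off);
        if (n <= 0) {
            /* keep the unwritten tail at the front of the buffer */
            memmove(io->buf, io->buf + off, io->used - off);
            io->used -= off;
            return -1;
        }
        off += (size_t)n;
    }
    io->used = 0;
    return 0;
}

/* write: copy into the buffer, draining it only when it fills up. */
int demo_write(demo_io *io, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        size_t room = sizeof io->buf - io->used;
        size_t n = len < room ? len : room;
        memcpy(io->buf + io->used, p, n);
        io->used += n;
        p += n;
        len -= n;
        if (io->used == sizeof io->buf && demo_flush(io) < 0)
            return -1;
    }
    return 0;
}

/* sync: flush, then ask the OS to push its cache to the device.
 * fsync() can fail for descriptors that are not regular files. */
int demo_sync(demo_io *io)
{
    if (demo_flush(io) < 0)
        return -1;
    return fsync(io->fd);
}
```

With this shape, ordinary callers only pay for write(), and the slow fsync() happens only when somebody explicitly asks for demo_sync().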

bye
boe

Dan Sugalski

Mar 18, 2003, 9:28:36 AM

to perl6-i...@perl.org

At 3:00 PM +0100 3/18/03, Juergen Boemmels wrote:
>Benjamin Goldberg <gol...@earthlink.net> writes:
>
>> "JüRgen BöMmels" wrote:
>> [snip]
>> > void
>> > PIO_unix_flush(theINTERP, ParrotIOLayer *layer, ParrotIO *io)
>> > {
>> > -# if 0
>> > fsync(io->fd);
>> > -# endif
>> > }
>>
>> AFAIK, for disk files, fsync has (should have) no visible effect from
>> the point of view of any user program -- all it does is tell the OS to
>> start writing the OS-level cache for that handle to disk, and it blocks
>> until all is copied.
>
>I think this is correct.
>
>> So... it is a slow system call, with no visible
>> effect -- why do we do it?
>
>Depends on the semantics we want to have for flush. I thought of
>flush to disk, but just flush to OS is also a valid assumption. This is
>what fflush also does.

Given that we're not going to be using the C std library except in
the most extraordinary of circumstances, I don't know that there's
much point in a flush that doesn't actually force the write to the
disk.

Nicholas Clark

Mar 18, 2003, 4:57:59 PM

to Dan Sugalski, perl6-i...@perl.org, Nick Ing-Simmons

On Tue, Mar 18, 2003 at 09:28:36AM -0500, Dan Sugalski wrote:
> At 3:00 PM +0100 3/18/03, Juergen Boemmels wrote:
> >Benjamin Goldberg <gol...@earthlink.net> writes:

> >> So... it is a slow system call, with no visible
> >> effect -- why do we do it?
> >
> >Depends on the semantics we want to have for flush. I thought of
> >flush to disk, but just flush to OS is also a valid assumption. This is
> >what fflush also does.
>
> Given that we're not going to be using the C std library except in
> the most extraordinary of circumstances, I don't know that there's
> much point in a flush that doesn't actually force the write to the
> disk.

perl5.8 turned PerlIO from a stdio veneer into a full IO system, designed
to provide perl with Unicode-capable IO, and also to look enough like stdio
to keep embedded programs happy at source level.

The "layers" it uses eventually pump data to the OS via calls such as write()
on Unix, and layers have a C struct that acts as a vtable full of methods.

One problem that I felt was present during development, and which persists into
release, is that there is only one sync/flush method. The buffered layers'
write methods tend to be implemented as a write into the layer's buffer,
followed by a call to that layer's flush method to empty it. This allows new
layers to be derived, with write being overridden in the child, but
flush left unchanged as the parent's implementation.

The problem was that the upper levels were using flush as described: a call to
make on themselves when they wanted their buffer emptied into the layer below.
The layers at the base of the heap (such as "Unix") were treating flush as
fsync(): get the data out as far as possible.

I'm biased - I know that Nick Ing-Simmons (the implementor) spotted a
problem with this recently, but I can't remember what it is. However, I know
what my problem was with all this - I was writing a gzip compression
layer. There's a big distinction between "bulk pump more data" and "this
data is hot, get it out fast". The former is normal operation, the latter
means call zlib with the flag to flush all data straight through, even if
the compression ratio drops. (It's what you need to compress interactive
streams.)

In PerlIO as currently implemented, the upper layer calls my flush method
(from its flush method) every time it empties its buffer. But it also might
call flush when it really wants interactive data passed on in a timely
fashion. So should I drop the compression off every time I get flushed? Or
was it crying wolf?

Oops. This turned into a ramble. "flush" ne "sync". If both are needed in
different places, provide both as distinct methods.
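
As a rough illustration of "distinct methods", here is a sketch of a layer vtable that carries both a flush slot and a sync slot. The type demo_layer and the functions demo_layer_flush and demo_layer_sync are hypothetical, loosely modelled on the description above; they are not the real PerlIO or ParrotIOLayer definitions, and the counters stand in for real buffer handling.

```c
#include <stddef.h>

/* Hypothetical layer type; NOT the actual PerlIO or ParrotIOLayer struct. */
typedef struct demo_layer demo_layer;
struct demo_layer {
    demo_layer *down;                 /* next layer towards the OS, or NULL */
    int (*flush)(demo_layer *self);   /* routine: drain my own buffer downwards */
    int (*sync)(demo_layer *self);    /* urgent: push data as far as it will go */
    int flushed;                      /* how often flush ran (demo only) */
    int synced;                       /* how often sync ran (demo only) */
};

/* flush: "bulk pump more data". A real layer would write its buffer into
 * self->down here. It deliberately does NOT tell the lower layer to flush. */
int demo_layer_flush(demo_layer *self)
{
    self->flushed++;
    return 0;
}

/* sync: "this data is hot". Drain my own buffer, do any layer-specific
 * forcing (a compression layer might force out pending output, a bottom
 * layer might call fsync()), then propagate the request downwards. */
int demo_layer_sync(demo_layer *self)
{
    if (self->flush(self) < 0)
        return -1;
    self->synced++;
    return self->down ? self->down->sync(self->down) : 0;
}
```

With two slots, a compression layer never has to guess whether an incoming flush is routine buffer emptying or a request to give up compression ratio for latency; only sync carries the second meaning.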

Nicholas Clark

Dan Sugalski

Mar 18, 2003, 5:36:26 PM

to Nicholas Clark, perl6-i...@perl.org, Nick Ing-Simmons

At 9:57 PM +0000 3/18/03, Nicholas Clark wrote:
>Oops. This turned into a ramble. "flush" ne "sync". If both are needed in
>different places, provide both as distinct methods.

Valid point though. We need both a flush and a sync, since we can't
be sure that what's living underneath an io handle is a real file.
'Kay, sync and flush are both on the list.

Steve Fink

Mar 18, 2003, 10:10:04 PM

to Dan Sugalski, Nicholas Clark, perl6-i...@perl.org, Nick Ing-Simmons

On Mar-18, Dan Sugalski wrote:
> At 9:57 PM +0000 3/18/03, Nicholas Clark wrote:
> >Oops. This turned into a ramble. "flush" ne "sync". If both are needed in
> >different places, provide both as distinct methods.
>
> Valid point though. We need both a flush and a sync, since we can't
> be sure that what's living underneath an io handle is a real file.
> 'Kay, sync and flush are both on the list.
> --

I agree with the need for both, or at least the need for flush to not
call fsync. The zlib example is a good one, but I wanted to mention
that the issue exists and is significant even when using vanilla file
I/O. fsync is slow and rarely necessary or useful. If I write to a
temp file and then run a subprocess that uses that temp file, then I
just want the subprocess to see what I wrote. I don't care if it ever
makes it to disk -- and would usually prefer that it didn't, because
I'm going to delete the file soon after.
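
The temp-file case can be sketched as follows, assuming a local POSIX filesystem: once write() has returned, a second, independently opened descriptor already sees the data, with no fsync() involved. The helper name write_then_read is hypothetical and exists only for this illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg to path using write() only (no fsync), then read it back
 * through a second, independently opened descriptor.
 * Returns the number of bytes read, or -1 on error. */
ssize_t write_then_read(const char *path, const char *msg, size_t len,
                        char *out, size_t outlen)
{
    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (wfd < 0)
        return -1;

    if (write(wfd, msg, len) != (ssize_t)len) {
        close(wfd);
        return -1;
    }

    /* The data now lives in the OS cache; another reader can see it
     * whether or not it has reached the disk. */
    int rfd = open(path, O_RDONLY);
    if (rfd < 0) {
        close(wfd);
        return -1;
    }

    ssize_t n = read(rfd, out, outlen);
    close(rfd);
    close(wfd);
    return n;
}
```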

Anyway, there are many ways that fsync can return before the data
actually makes it to permanent storage (disk drive write caching, for
example). So you should only use it when you know what you're doing
and can safely make certain assumptions about the system you're
running on.