IO opinions -- lines vs records vs streams

Dan Sugalski

May 8, 2004, 1:30:41 PM

to perl6-i...@perl.org

Okay, as I work on this, time to weigh in with opinions.

Do we want to make a distinction between record reads and just plain
"read me X (bytes|codepoints|graphemes)" requests on filehandles and,
if so, do we think it's worth distinguishing between fake records
(line-oriented things) and real records (where there's a fixed record
size or absolute record marker)? Having worked for years on systems
that did real filesystem-level records I can make a case either
way--they both have their own unique problems, especially when you're
working with cross-platform apps.
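
To make the "read me X (bytes|codepoints)" distinction concrete, here
is a minimal Python sketch (not Parrot code). The helper name
`read_codepoints` is hypothetical, and the stream is assumed to hold
UTF-8. Graphemes are left out because they need Unicode segmentation
rules beyond what a plain decoder provides.

```python
import codecs
import io


def read_codepoints(stream, count, encoding="utf-8"):
    """Read `count` codepoints from a binary stream (hypothetical helper).

    Bytes are pulled one at a time so the stream is never advanced past
    the last character returned.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    chars = []
    while len(chars) < count:
        byte = stream.read(1)
        if not byte:
            # End of input: raises if we stopped in the middle of a character.
            decoder.decode(b"", final=True)
            break
        # Yields "" until a complete character has been assembled.
        chars.extend(decoder.decode(byte))
    return "".join(chars)
```

Asking for two bytes of "héllo" splits the "é" in half, while asking
for two codepoints consumes three bytes, which is why the unit of the
request has to be part of the interface.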

(Note that, regardless of anything else, we do need to separate out
stream IO and record IO, both for layer filtering reasons and for
pure practicality as there still are some pure-record filehandles
(UDP sockets and such) even on a Unix system)
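
A small Python sketch of the difference, assuming a Unix system.
Unix-domain socket pairs stand in for UDP and TCP here so that no
network setup is needed; the function names are made up for the
example.

```python
import socket


def datagram_reads(messages):
    """Send each message on a datagram socket and read them back."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        for message in messages:
            a.send(message)
        # One recv() per send(): each datagram comes back as its own record.
        return [b.recv(65536) for _ in messages]


def stream_reads(messages):
    """Send each message on a stream socket and read until EOF."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        for message in messages:
            a.sendall(message)
        a.shutdown(socket.SHUT_WR)  # signal EOF to the reader
        chunks = []
        while True:
            chunk = b.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        # The chunk boundaries need not match the original messages.
        return chunks
```

The datagram side hands back the records exactly as they were sent;
the stream side only promises the same bytes in the same order.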
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                        have teddy bears and even
                                      teddy bears get drunk

Jeff Clites

May 8, 2004, 7:59:52 PM

to Dan Sugalski, perl6-i...@perl.org

On May 8, 2004, at 10:30 AM, Dan Sugalski wrote:

> Do we want to make a distinction between record reads and just plain
> "read me X (bytes|codepoints|graphemes)" requests on filehandles and,
> if so, do we think it's worth distinguishing between fake records
> (line-oriented things) and real records (where there's a fixed record
> size or absolute record marker)?

I'd say that there's no need to distinguish. C's stdlib tries to be
record-oriented, and I've never found it to be useful. Trying to be
record-oriented (for what people today want from records) at the IO
level seems awkward--and it's easy to write (at the user level) a "give
me the next token" interface on top of a byte-source or a
character-source, and there's not a lot of benefit to modeling this as
IO.
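
As a rough illustration of the user-level approach, here is a Python
sketch of both record styles layered on a plain byte source. The names
`delimited_records` and `fixed_records` are hypothetical, and
`fixed_records` assumes a buffered stream whose read(n) only returns
fewer than n bytes at end of input.

```python
import io


def delimited_records(stream, delimiter=b"\n", chunk_size=4096):
    """Yield delimiter-separated records ("fake" records) from a byte stream."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last delimiter is complete; keep the rest.
        *records, buffer = buffer.split(delimiter)
        yield from records
    if buffer:
        yield buffer  # final record with no trailing delimiter


def fixed_records(stream, size):
    """Yield fixed-size records ("real" records) from a byte stream."""
    while True:
        record = stream.read(size)
        if not record:
            return
        if len(record) < size:
            raise ValueError("input ended in the middle of a record")
        yield record
```

Both are ordinary generators over whatever the IO layer returns, so
neither needs support from the IO system itself.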

> (Note that, regardless of anything else, we do need to separate out
> stream IO and record IO, both for layer filtering reasons and for pure
> practicality as there still are some pure-record filehandles (UDP
> sockets and such) even on a Unix system)

On Unix, record-oriented IO is specific to sockets only (not
filehandles in general). Not sure what you mean by "layer filtering".

On a possibly related note: I've seen mention of "IO filters" for
Parrot (in addition to "IO layers"). What are these filters supposed to
be/do/look like?

JEff

Tim Bunce

May 10, 2004, 5:34:59 AM

to Jeff Clites, Dan Sugalski, perl6-i...@perl.org

On Sat, May 08, 2004 at 04:59:52PM -0700, Jeff Clites wrote:
> On May 8, 2004, at 10:30 AM, Dan Sugalski wrote:
>
> >Do we want to make a distinction between record reads and just plain
> >"read me X (bytes|codepoints|graphemes)" requests on filehandles and,
> >if so, do we think it's worth distinguishing between fake records
> >(line-oriented things) and real records (where there's a fixed record
> >size or absolute record marker)?
>
> I'd say that there's no need to distinguish. C's stdlib tries to be
> record-oriented, and I've never found it to be useful. Trying to be
> record-oriented (for what people today want from records) at the IO
> level seems awkward--and it's easy to write (at the user level) a "give
> me the next token" interface on top of a byte-source or a
> character-source, and there's not a lot of benefit to modeling this as
> IO.
>
> >(Note that, regardless of anything else, we do need to separate out
> >stream IO and record IO, both for layer filtering reasons and for pure
> >practicality as there still are some pure-record filehandles (UDP
> >sockets and such) even on a Unix system)
>
> On Unix, record-oriented IO is specific to sockets only (not
> filehandles in general). Not sure what you mean by "layer filtering".

If I write to a filehandle for a file opened in append mode I want
(to be able to make) that write still be atomic when it gets to the
operating system (ie not broken up into multiple writes, or merged
with previous data).
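
A minimal sketch of that requirement at the system-call level, in
Python and assuming a POSIX system. `append_record` is a hypothetical
name. The point is that the record reaches the OS as one write() call
on a descriptor opened with O_APPEND, with no buffering layer in
between to split or merge it.

```python
import os


def append_record(path, record):
    """Append `record` to `path` with a single write() call."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # With O_APPEND the OS moves to end-of-file as part of each write.
        written = os.write(fd, record)
        if written != len(record):
            raise OSError(f"short write: {written} of {len(record)} bytes")
    finally:
        os.close(fd)
```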

Tim.

Dan Sugalski

May 10, 2004, 10:31:46 AM

to Tim Bunce, Jeff Clites, perl6-i...@perl.org

Read and write requests will be defined as atomic unless explicitly
overridden. (Though some filehandles will allow partial reads at
least by default. I don't think any will allow fragmenting reads or
writes, however)

Jeff Clites

May 10, 2004, 11:31:42 AM

to Dan Sugalski, perl6-i...@perl.org, Tim Bunce

On May 10, 2004, at 7:31 AM, Dan Sugalski wrote:

> At 10:34 AM +0100 5/10/04, Tim Bunce wrote:
>> On Sat, May 08, 2004 at 04:59:52PM -0700, Jeff Clites wrote:
>> > On May 8, 2004, at 10:30 AM, Dan Sugalski wrote:
>> > >(Note that, regardless of anything else, we do need to separate out
>> > >stream IO and record IO, both for layer filtering reasons and for pure
>> > >practicality as there still are some pure-record filehandles (UDP
>> > >sockets and such) even on a Unix system)
>> >
>> > On Unix, record-oriented IO is specific to sockets only (not
>> > filehandles in general). Not sure what you mean by "layer filtering".
>>
>> If I write to a filehandle for a file opened in append mode I want
>> (to be able to make) that write still be atomic when it gets to the
>> operating system (ie not broken up into multiple writes, or merged
>> with previous data).

That's what "layer filtering" means?

> Read and write requests will be defined as atomic unless explicitly
> overridden. (Though some filehandles will allow partial reads at least
> by default. I don't think any will allow fragmenting reads or writes,
> however)

I assume that's not the case if there is a buffering layer in place for
the filehandle? (Since the point of write-buffering is to aggregate
writes.)

Also, of course, you can't guarantee that things are atomic at the OS
level. That is, if I try to write a large amount of data to a socket,
the OS will only take part of that data at a time. Parrot can keep
trying, and not return until all the data is written, but it's not
"atomic" at the OS level. (That is, it can complete partially and then
fail, or other data can get sent in between the "chunks".)

JEff

Dan Sugalski

May 10, 2004, 12:05:11 PM

to Jeff Clites, perl6-i...@perl.org, Tim Bunce

At 8:31 AM -0700 5/10/04, Jeff Clites wrote:
>On May 10, 2004, at 7:31 AM, Dan Sugalski wrote:
>
>>At 10:34 AM +0100 5/10/04, Tim Bunce wrote:
>>>On Sat, May 08, 2004 at 04:59:52PM -0700, Jeff Clites wrote:
>>> > On May 8, 2004, at 10:30 AM, Dan Sugalski wrote:
>>> > >(Note that, regardless of anything else, we do need to separate out
>>> > >stream IO and record IO, both for layer filtering reasons and for pure
>>> > >practicality as there still are some pure-record filehandles (UDP
>>> > >sockets and such) even on a Unix system)
>>> >
>>> > On Unix, record-oriented IO is specific to sockets only (not
>>> > filehandles in general). Not sure what you mean by "layer filtering".
>>>
>>>If I write to a filehandle for a file opened in append mode I want
>>>(to be able to make) that write still be atomic when it gets to the
>>>operating system (ie not broken up into multiple writes, or merged
>>>with previous data).
>
>That's what "layer filtering" means?

Nope, that's something else instead.

>>Read and write requests will be defined as atomic unless explicitly
>>overridden. (Though some filehandles will allow partial reads at
>>least by default. I don't think any will allow fragmenting reads or
>>writes, however)
>
>I assume that's not the case if there is a buffering layer in place
>for the filehandle? (Since the point of write-buffering is to
>aggregate writes.)

Yes. If you put a buffer layer in place then you've explicitly
overridden the atomic guarantees of the stream. :)
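
The effect of a buffer layer can be seen with Python's own io stack,
used here purely as an analogy for Parrot's layers. `RecordingRaw` and
`raw_calls` are names invented for the sketch; the raw object simply
records every write that reaches the bottom of the stack.

```python
import io


class RecordingRaw(io.RawIOBase):
    """Raw sink that records each write call it receives."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def writable(self):
        return True

    def write(self, data):
        self.calls.append(bytes(data))  # copy: `data` may be a memoryview
        return len(data)


def raw_calls(pieces, buffered):
    """Write `pieces` one by one and return what the raw layer saw."""
    raw = RecordingRaw()
    sink = io.BufferedWriter(raw, buffer_size=4096) if buffered else raw
    for piece in pieces:
        sink.write(piece)
    sink.flush()
    return raw.calls
```

Without the buffer, two writes arrive as two calls; with it, they are
merged into one, so per-request atomicity no longer describes what the
OS sees.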

>Also, of course, you can't guarantee that things are atomic at the
>OS level. That is, if I try to write a large amount of data to a
>socket, the OS will only take part of that data at a time. Parrot
>can keep trying, and not return until all the data is written, but
>it's not "atomic" at the OS level. (That is, it can complete
>partially and then fail, or other data can get sent in between the
>"chunks".)

There are two things here which need to be separated out.

Parrot will make atomic writes to the OS. If we guarantee that we
will and we can't, well... the write throws an exception. (If a
filehandle has lifted that guarantee then we won't, of course) So
trying to write 10K to a UDP socket's going to fail and throw an
exception. That's fine.
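
One way to read "atomic or it throws", sketched in Python on a POSIX
file descriptor. `write_whole` and `FragmentedWrite` are hypothetical
names, not Parrot API. The request is handed to the OS in exactly one
write() call, and a short count is reported as an error instead of
being retried.

```python
import os


class FragmentedWrite(OSError):
    """Raised when the OS accepted only part of a write request."""


def write_whole(fd, data):
    """Issue one write() call and fail if it was not taken whole."""
    written = os.write(fd, data)
    if written != len(data):
        raise FragmentedWrite(
            f"short write: {written} of {len(data)} bytes accepted"
        )
    return written
```

Note that when the exception is raised the OS has already accepted the
first part of the data, so the caller learns about the fragmentation
but cannot undo it.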

If the OS then fragments the write, that's not our problem, and
there's not a whole lot we can do about that. It's likely OK, though,
since it's what generally happens now, with most IO living behind
buffers. (And apps that bypass the buffers for reliability will
presumably know what to do to make that happen)

Jeff Clites

May 10, 2004, 1:00:20 PM

to Dan Sugalski, perl6-i...@perl.org, Tim Bunce

On May 10, 2004, at 9:05 AM, Dan Sugalski wrote:

> At 8:31 AM -0700 5/10/04, Jeff Clites wrote:
>> Also, of course, you can't guarantee that things are atomic at the OS
>> level. That is, if I try to write a large amount of data to a socket,
>> the OS will only take part of that data at a time. Parrot can keep
>> trying, and not return until all the data is written, but it's not
>> "atomic" at the OS level. (That is, it can complete partially and
>> then fail, or other data can get sent in between the "chunks".)
>
> There are two things here which need to be separated out.
>
> Parrot will make atomic writes to the OS. If we guarantee that we will
> and we can't, well... the write throws an exception. (If a filehandle
> has lifted that guarantee then we won't, of course) So trying to write
> 10K to a UDP socket's going to fail and throw an exception. That's
> fine.

Reads and writes from sockets are almost always going to be partial (at
the OS level)--a write will only fill up the kernel write buffer and
return (only blocking if it can't write even one byte), and a read will
only return what's in the read buffer in the kernel (only blocking if
there's not even one byte there to read). It's completely
commonplace--we don't want to throw an exception in these cases. We can
certainly loop until we've written or read as much as requested, and
throw an exception if something goes wrong there, but we shouldn't give
people the impression that it's expected to be able to write several KB
to a pipe or stream socket in a manner which is "atomic" at the OS
level--that's just not how these entities work.
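
The looping approach looks roughly like this in Python, assuming POSIX
file descriptors (pipes or stream sockets). `write_all` and
`read_exact` are illustrative names only.

```python
import os


def write_all(fd, data):
    """Keep calling write() until every byte has been accepted."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)  # may accept only part of `view`
        view = view[written:]


def read_exact(fd, count):
    """Keep calling read() until `count` bytes have arrived."""
    chunks = []
    remaining = count
    while remaining:
        chunk = os.read(fd, remaining)  # may return fewer bytes than asked
        if not chunk:
            raise EOFError(f"stream ended with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The caller gets all-or-exception behavior, but each underlying call is
free to be partial, and a failure partway through leaves some of the
data already transferred.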

JEff