I/O PDD - ready for implementation

Allison Randal

Jan 3, 2007, 3:47:03 AM

to Perl 6 Internals

I've just moved pdd22 out of the clip directory, marking it as ready for
beginning implementation efforts. Comments on the pdd are welcomed. A
few things worth highlighting:

- I/O layers will be replaced with role composition.

- Parrot's concurrency model for async I/O is a modified form of the
callback function model, combined with a central concurrency scheduler
(see recent comments added to PDD 25).

- One question still under discussion is error handling. Should all
errors be exceptions? Integer status codes? More details in the PDD.

Allison

Jonathan Worthington

Jan 4, 2007, 6:00:23 PM

to Allison Randal, Perl 6 Internals

Allison,

Allison Randal wrote:
> I've just moved pdd22 out of the clip directory, marking it as ready
> for beginning implementation efforts.

Excellent! I've just read through it and like what I'm seeing.

> Comments on the pdd are welcomed. A few things worth highlighting:
>
> - I/O layers will be replaced with role composition.

I like the look of this, but if I wanted to go about implementing it I
feel I'm kinda short of what that means implementation wise. I'm happy
enough with flattening composition and all that jazz, but a little extra
guidance on what you're expecting to compose into the object and where
the roles come from would be very helpful to me, and maybe to others who
are considering trying to implement some of this. (I hope the question
makes sense. If not, I'll try and ask it a better way. Well, a different
way at least. :-))

> - One question still under discussion is error handling. Should all
> errors be exceptions? Integer status codes? More details in the PDD.

I suspect it's cheaper/easier for a compiler to generate code to check a
return value and throw an exception of its own choosing, than it is to
emit code to catch the exception and return the error value.
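
A minimal sketch of the two directions being compared, written in Python
rather than PIR. Every name here (op_returning_status, LangError, the
status value 2) is hypothetical and only illustrates the shape of the
generated code; nothing is taken from the PDD.

```python
# Hypothetical status codes for the illustration.
OK = 0
NOT_FOUND = 2


class LangError(Exception):
    """Exception type that a compiler's runtime might choose to raise."""


def op_returning_status(name, table):
    """Primitive that never raises: returns (status, value)."""
    if name in table:
        return OK, table[name]
    return NOT_FOUND, None


def throwing_wrapper(name, table):
    """Build exceptions on top of status codes: one test and one raise."""
    status, value = op_returning_status(name, table)
    if status != OK:
        raise LangError(f"lookup of {name!r} failed with status {status}")
    return value


def op_raising(name, table):
    """Primitive that raises on failure."""
    if name not in table:
        raise KeyError(name)
    return table[name]


def status_wrapper(name, table):
    """Build status codes on top of exceptions: needs a handler per call."""
    try:
        return OK, op_raising(name, table)
    except KeyError:
        return NOT_FOUND, None
```

The first wrapper needs only a comparison and a raise, while the second
has to install a handler around every call, which is the asymmetry the
paragraph above points at.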

Thanks,

Jonathan

Allison Randal

Jan 6, 2007, 1:59:59 AM

to Jonathan Worthington, Perl 6 Internals

Jonathan Worthington wrote:
> I like the look of this, but if I wanted to go about implementing it I
> feel I'm kinda short of what that means implementation wise. I'm happy
> enough with flattening composition and all that jazz, but a little extra
> guidance on what you're expecting to compose into the object and where
> the roles come from would be very helpful to me, and maybe to others who
> are considering trying to implement some of this. (I hope the question
> makes sense. If not, I'll try and ask it a better way. Well, a different
> way at least. :-))

I'm working on defining roles in the objects PDD. So, the short answer
is "not defined yet".

The longer answer is that we're probably going to need more than one
stage of composition:

- The roles that implement OS-specific behavior can be composed at the
time Parrot itself is compiled. There's no need to duplicate the effort
of determining which operating system you're running on every time you
load a ParrotI/O object.

- Some roles can be composed at the time the object is constructed, such
as the network or socket I/O functionality.

- Some roles can be composed into an already instantiated object, such
as utf8 or binary file-handling functionality. These might be
implemented by changing the object to belong to a lightweight singleton
class created on the fly as a subclass of the object's original class.
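
The third stage can be sketched roughly as follows, in Python rather
than PIR or C. Handle, Utf8Role and compose_into are hypothetical names,
and Python inheritance only approximates flattening role composition, so
this shows the "subclass created on the fly" trick and nothing more.

```python
class Handle:
    """Hypothetical stand-in for an already instantiated I/O object."""

    def __init__(self, data: bytes):
        self._data = data

    def read(self):
        return self._data


class Utf8Role:
    """Hypothetical role: decodes whatever the underlying read() returns."""

    def read(self):
        return super().read().decode("utf-8")


def compose_into(obj, role):
    """Move obj into a class created on the fly.

    The new class is a subclass of the object's original class with the
    role placed in front of it, so the role's methods are found first.
    """
    original = type(obj)
    derived = type(f"{original.__name__}+{role.__name__}", (role, original), {})
    obj.__class__ = derived
    return obj
```

Usage: after compose_into(handle, Utf8Role), handle.read() returns a
decoded string while isinstance(handle, Handle) still holds, so code that
only knows about the original class keeps working.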

For a first implementation, before the new objects PDD is finished, we
have basically two options. Implement the functionality as a simple
low-level PMC, with a few subclasses (much like the current Array and
Hash PMC sets). Or implement it as a high-level (PIR) class with a set
of partial classes, simulating composition with '.include' statements.
The former gives us better immediate access to low-level C
functionality. The latter gives us a better way to play with composition
behavior and lightweight embedding mechanisms.

One thing that would be useful is a list of all the currently existing
I/O layers, what they do, and where they're used. I'll get to it at some
point, if no one else gets there first.

> I suspect it's cheaper/easier for a compiler to generate code to check a
> return value and throw an exception of its own choosing, than it is to
> emit code to catch the exception and return the error value.

Yeah, I like that perspective. Implement the alternative that's easiest
to build on top of.

Allison

Larry Wall

Jan 9, 2007, 12:33:52 PM

to Perl 6 Internals

On Fri, Jan 05, 2007 at 10:59:59PM -0800, Allison Randal wrote:
: Jonathan Worthington wrote:
: >I suspect it's cheaper/easier for a compiler to generate code to check a
: >return value and throw an exception of its own choosing, than it is to
: >emit code to catch the exception and return the error value.

Depends on whether by "error value" you mean "scalar error value"...

: Yeah, I like that perspective. Implement the alternative that's easiest
: to build on top of.

However, I think you can make it even easier to build on top of if
you take the third of those two alternatives...

The Perl 6 perspective on this is that error values should be allowed to
be as "interesting" as you like. The lower level routine goes ahead and
pregenerates the exception object but returns it as an interesting
error value instead of throwing it. Then the calling code can just
decide to throw the unthrown exception, or it can generate its own
exception (that perhaps includes the unthrown exception). In any case,
you get a better error message if you include all the relevant facts
from the lower-level routine, and those tend to get lost with scalar
error values. By returning an object you still have a simple test to see
whether there's an exception, but you're not limiting the information
flow by assuming all the information passes through whatever scalar
is functioning as the boolean value of "oops".
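
A minimal sketch of the "unthrown exception" idea, in Python for
illustration. IOFailure, read_config and load are hypothetical names, and
the dictionary stands in for a file system.

```python
class IOFailure(Exception):
    """Hypothetical rich error value carrying facts from the low level."""

    def __init__(self, message, *, path, errno_code):
        super().__init__(message)
        self.path = path
        self.errno_code = errno_code


def read_config(path, files):
    """Low-level routine: returns the contents, or an unthrown IOFailure."""
    if path not in files:
        # Pregenerate the exception object but return it instead of raising.
        return IOFailure("no such file", path=path, errno_code=2)
    return files[path]


def load(path, files):
    """Caller: one simple test, then decide what to do with the failure."""
    result = read_config(path, files)
    if isinstance(result, Exception):
        # Generate our own exception that includes the unthrown one.
        raise RuntimeError(f"cannot load {path}") from result
    return result
```

The caller could equally write "raise result" to throw the original
object unchanged; either way the path and error code survive, which a
bare integer return would have lost.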

In any case, this would certainly make it easier to put Perl 6 on top. :)

Larry

Allison Randal

Jan 10, 2007, 1:28:22 AM

to Perl 6 Internals

Larry Wall wrote:
>
> The Perl 6 perspective on this is that error values should be allowed to
> be as "interesting" as you like. The lower level routine goes ahead and
> pregenerates the exception object but returns it as an interesting
> error value instead of throwing it. Then the calling code can just
> decide to throw the unthrown exception, or it can generate its own
> exception (that perhaps includes the unthrown exception). In any case,
> you get a better error message if you include all the relevant facts
> from the lower-level routine, and those tend to get lost with scalar
> error values. By returning an object you still have a simple test to see
> whether there's an exception, but you're not limiting the information
> flow by assuming all the information passes through whatever scalar
> is functioning as the boolean value of "oops".
>
> In any case, this would certainly make it easier to put Perl 6 on top. :)

Aye, async operations will return a status object and only a status
object (a multi-layered object that contains integer status, an unthrown
exception/error object, the return value after the async operation is
complete, etc).

So the question is mainly about sync ops. The PDD currently says that
sync ops return integer status codes and async ops return status
objects, but that's a potentially confusing difference in behavior
between the two and potentially added and unnecessary complexity for
compiler writers trying to generate code for both sync and async ops.

If both sync and async ops returned status objects, it would be
conveniently consistent. It would also be annoyingly heavyweight for a
simple synchronous operation, but then the biggest cost for I/O ops is
generally the I/O.

The remaining question is: many sync I/O ops return a value in addition
to the status. Some return values are integers, many are strings, some
are PMCs. Should we return just the status object from synchronous ops
like we do for asynchronous ops (retrieving the return value from the
status object), or return a status object followed by the return value?
What's easiest for the compiler writers? What's easiest for people
writing low-level libraries in PIR?
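
The two calling conventions in question can be compared side by side
with a small Python sketch. The Status class and both read_line_*
functions are hypothetical; the fields mirror the status object described
above (integer status, unthrown error, return value) but their names are
assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Status:
    """Hypothetical status object: integer code, unthrown error, value."""

    code: int = 0
    error: Optional[Exception] = None
    value: Any = None

    def __bool__(self):
        # True means the operation succeeded.
        return self.error is None


def read_line_status_only(lines):
    """Convention 1: return only the status; the value lives inside it."""
    if not lines:
        return Status(code=1, error=EOFError("no more lines"))
    return Status(value=lines.pop(0))


def read_line_status_and_value(lines):
    """Convention 2: return the status followed by the value."""
    if not lines:
        return Status(code=1, error=EOFError("no more lines")), None
    return Status(), lines.pop(0)
```

With convention 1 the caller writes "s = read(...)" and then reads
s.value, exactly as it would for an async op. With convention 2 the
caller unpacks two results and can use the value directly, at the cost
of a second calling shape to generate code for.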

Allison

Larry Wall

Jan 10, 2007, 12:26:22 PM

to Perl 6 Internals

On Tue, Jan 09, 2007 at 10:28:22PM -0800, Allison Randal wrote:
: If both sync and async ops returned status objects, it would be
: conveniently consistent. It would also be annoyingly heavyweight for a
: simple synchronous operation, but then the biggest cost for I/O ops is
: generally the I/O.

Possible optimization: for those success values that are sufficiently
"uninteresting" maybe they could just be refs to constant shared
objects so you avoid allocating them every time. Even if you have to
return some integer like a number of characters read, this is usually
the same number till the last block of the file, so that could be
factored out.
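
A rough Python sketch of that optimization, assuming success values are
immutable so they can be shared safely. ReadOk, read_ok and read_blocks
are hypothetical names, and the cache size of 64 is arbitrary.

```python
from functools import lru_cache
from typing import NamedTuple


class ReadOk(NamedTuple):
    """Hypothetical immutable success value: number of characters read."""

    count: int


@lru_cache(maxsize=64)
def read_ok(count):
    """Return a shared ReadOk for this count instead of allocating anew."""
    return ReadOk(count)


def read_blocks(total, block=4096):
    """Simulate reading `total` characters block by block."""
    statuses = []
    while total > 0:
        n = min(block, total)
        statuses.append(read_ok(n))
        total -= n
    return statuses
```

Reading 10000 characters in 4096-character blocks yields three statuses,
and the first two are the same shared object; only the short final block
needs a different one.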

Or it could return a scalar plus an optional object that's only there
if the scalar indicates an unthrown exception or other objectified return.

Or it could be an out-of-band thing like errno, but it would just
happen to be an out-of-band object instead of an integer. I can imagine
various states in between where it looks out-of-band but really comes
through the return interface for cleanness.

And, actually, now that I think of it, the way Perl 6 handles $! is
that it's always a lexical, and the fail in the called routine looks
it up in the caller's lexical pad and sets it there, which foists
the work off on the fail code rather than cluttering the return code.
It's also cleaner than P5's global $! approach. In fact, in P6 $! is
just a lexically scoped alias to the failure object that's coming back
in-band anyway as an interesting undef object. We're only keeping
$! for the convenience of the caller, so the caller can compose error
messages as in P5.

Anyway, just thought I'd toss out a few ideas. How it appears to
do it, and how it actually does it may or may not be two different
things depending on how much abstraction you want in your OO assembly
language. And thankfully that is not my decision. :)

Larry

Nicholas Clark

Jan 17, 2007, 2:47:30 PM

to Perl 6 Internals

On Tue, Jan 09, 2007 at 09:33:52AM -0800, Larry Wall wrote:

> The Perl 6 perspective on this is that error values should be allowed to
> be as "interesting" as you like. The lower level routine goes ahead and
> pregenerates the exception object but returns it as an interesting
> error value instead of throwing it. Then the calling code can just
> decide to throw the unthrown exception, or it can generate its own
> exception (that perhaps includes the unthrown exception). In any case,
> you get a better error message if you include all the relevant facts
> from the lower-level routine, and those tend to get lost with scalar
> error values. By returning an object you still have a simple test to see
> whether there's an exception, but you're not limiting the information
> flow by assuming all the information passes through whatever scalar
> is functioning as the boolean value of "oops".
>
> In any case, this would certainly make it easier to put Perl 6 on top. :)

It would actually also make it easier to put Perl 5 done-with-hindsight on
top :-)

One of the issues with writing IO layers in Perl 5 is that the existing
interface is defined in terms of Perl builtins that return undef on failure,
and set C<$!>, and in turn C<$!> is only allowed to hold a small vocabulary
of integer codes (typically around 100) which are defined by the operating
system, for use in reporting operating system level errors.

This works well on regular IO, talking direct to the operating system, but
goes pear shaped when you write an IO layer, and want to report an error
condition. You're forced to make a lossy mapping of your true error
condition (such as detecting an invalid character encoding or corrupt
compressed data) into the least inappropriate errno value. It would be much
nicer to have the option of returning true objects.

On Wed, Jan 10, 2007 at 09:26:22AM -0800, Larry Wall wrote:

> Possible optimization: for those success values that are sufficiently
> "uninteresting" maybe they could just be refs to constant shared
> objects so you avoid allocating them every time. Even if you have to
> return some integer like a number of characters read, this is usually
> the same number till the last block of the file, so that could be
> factored out.

If most IO operations are actually returning the OS error code, then having
around 128 cached shared objects for boxing up each errno value seems feasible
to me.

> Or it could be an out-of-band thing like errno, but it would just
> happen to be an out-of-band object instead of an integer. I can imagine
> various states in between where it looks out-of-band but really comes
> through the return interface for cleanness.

Out of band things feel bad. I'm not sure how parrot will implement
concurrency, but C's return -1 with out-of-band errno feels like a mistake
to avoid. It ends up with C implementations having to use icky hacks to make
something that feels like it's

    extern int errno;

but is actually thread local (whilst still being read/write).

The Linux Kernel made a nicer decision to change from -1 to a negative value
(just as efficient to check) where the negative value happens to be the
errno value. The POSIX threads API avoids conflating value returns with
error returns by specifying that the return value is
success-or-positive-errno, but again it's avoiding anything out of band,
or seemingly-out-of-band.
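
Both in-band conventions can be mimicked in a few lines of Python. The
two functions are hypothetical stand-ins (a dictionary for storage, a set
for held locks); only the return conventions are the point.

```python
import errno


def read_kernel_style(store, key):
    """Kernel-style: non-negative length on success, -errno on failure."""
    if key not in store:
        return -errno.ENOENT
    return len(store[key])


def lock_pthread_style(locks, name):
    """pthreads-style: 0 on success, positive errno on failure.

    The return channel carries only the error indication; any result
    would have to come back some other way, such as an output argument.
    """
    if name in locks:
        return errno.EBUSY
    locks.add(name)
    return 0
```

In both cases the caller does a single integer test on the return value
and never consults shared state, so nothing has to be made thread-local.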

Nicholas Clark