early draft of I/O PDD

Allison Randal

unread,

Mar 3, 2006, 2:27:05 PM3/3/06

to Internals List

We're going to try something a little different. With Chip's blessing
I've written a very early draft of the PDD for I/O (not numbered
yet). The attached PDD isn't a completed document with Chip's seal of
approval, it's a seed for discussion.

What I need from you all is comments. What's missing? What's
inaccurate? What's accurate for the current state of Parrot, but is
something you always intended to write out later? What thoughts have
you had on how the I/O subsystem should work? All contributions
cheerfully welcomed, from a single sentence to several pages of text.

Chip won't be making design decisions during the conversation. He may
not even participate much, since part of the point is to split the
work into parallel tasks so we can get more done than is humanly
possible for one person. I'll use the discussion to write up a more
complete PDD (probably with a few alternatives written in), and then
work with Chip to review/revise it.

Thanks,
Allison

pddXX_io.pod

Joshua Isom

unread,

Mar 3, 2006, 6:54:43 PM3/3/06

to Allison Randal, Internals List

How do you verify that a print succeeded? Currently there's no way to
know. Throwing an exception if a global flag is set would suffice and
wouldn't require constantly pushing exception handlers in case the
program doesn't care enough (e.g. the run it and delete it variety).
Plus using exceptions would probably make platform independence easier
from PIR's standpoint as it won't need to know what the error code
actually means, and the exception object can include a message saying
so. As well, the lines of code can go straight from open to print to
close without throwing in error checking after each opcode. Since that
runs contrary to how most HLL's operate, I don't know how good of an
idea it really is for parrot.

The 'write' opcode does seem redundant, but what if it's changed to
print to a user selected file stream? Just select the file stream with
one opcode, and each write call after that will print to that instead
of stdout. That's the best idea I can come up with to remove
redundancy(without removing the opcode of course), even though I don't
know if it'd be worth all the side effects.

Concerning all the byte/character issues, all the string opcodes except
bytelength work with characters. But the io subsystem currently only
deals with bytes. I know there is the speed issue for things like
reading with dealing with utf8, but something like 'peek' should
probably be able to get the full character. Also, what's supposed to
be the default encoding for all data read in from a stream where a
layer's not added that explicitly states so? Ascii, binary?

I don't think it's really been addressed, at least not recently, but
what about IPv6? By the time perl6 becomes commonplace and used
often(and thus, parrot), IPv6 will be common enough that problems could
occur. Currently it's not speced or stated, aside from a comment in
PIO_sockaddr_in.

One more thing, what about specing directory handling? Nothing is
speced yet for it.

> <pddXX_io.pod>

Yuval Kogman

unread,

Mar 3, 2006, 7:20:47 PM3/3/06

to Allison Randal, Internals List

With respect to async IO (regretfully I get to see a lot of this at
$job):

Each operation can be async, or sync, with a similar API. There
should be enough hooks to be able to wait on a specific operation
happenning on a stream, any operation on a stream, any operation on
a group of streams, and any operation on any stream.

The resulting set of operations should be optionally available, as
this is one of the biggest sources of boilerplate in posix aio in C.

aio_suspend (or a wrapper on top of it) should probably be the
event loops default idle loop, and it should keep a flag raised such
that if the aio event loop wrapper knows that an event that was not
waited on finished, the user code can check on that easily.

A possiblity for a unified AIO/SIO interface could be that each IO
op returns an operation handle, and you can ask that handle about
it's status (running, finished, error), get it's results from the
handle, and also ask it to block. This could have a high overhead
due to storage creation, but it's generally a pretty flexible and
portable abstraction.

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me sushi-spin-kicks : neeyah!!!!!!!!!!!!!!!!!!!!

Nicholas Clark

unread,

Mar 5, 2006, 1:34:26 PM3/5/06

to Joshua Isom, Allison Randal, Internals List

On Fri, Mar 03, 2006 at 05:54:43PM -0600, Joshua Isom wrote:

> I don't think it's really been addressed, at least not recently, but
> what about IPv6? By the time perl6 becomes commonplace and used
> often(and thus, parrot), IPv6 will be common enough that problems could
> occur. Currently it's not speced or stated, aside from a comment in
> PIO_sockaddr_in.

The draft has:

=item *

C<sockaddr> returns a string representing a socket address, generated
from a port number (integer) and an address (string).

I don't think that this is appropriate. It's IPv4 specific. It doesn't cover
AF_UNIX (er, AF_LOCAL), IPv6, or any of the other address formats (of which
I'm not sure if any other than IPX are still commonly used.

I'm not sure how to make an appropriate interface, partly as I have little
idea if it's possible to make a sufficiently flexible abstraction for
address names. IPv4 and IPv6 both use addresses and port numbers. AF_LOCAL
just uses a string, which is a file system path. But I think that specifying
an op for just one address format is too narrow.

Nicholas Clark

unread,

Mar 5, 2006, 1:24:31 PM3/5/06

to Joshua Isom, Allison Randal, Internals List

On Fri, Mar 03, 2006 at 05:54:43PM -0600, Joshua Isom wrote:

> How do you verify that a print succeeded? Currently there's no way to
> know. Throwing an exception if a global flag is set would suffice and

I assumed that the lack of documentation of any return code meant that it
would return as Perl 5 does. Bad me.

> wouldn't require constantly pushing exception handlers in case the
> program doesn't care enough (e.g. the run it and delete it variety).
> Plus using exceptions would probably make platform independence easier
> from PIR's standpoint as it won't need to know what the error code
> actually means, and the exception object can include a message saying
> so. As well, the lines of code can go straight from open to print to
> close without throwing in error checking after each opcode. Since that
> runs contrary to how most HLL's operate, I don't know how good of an
> idea it really is for parrot.

I get the impression that a lot of people would like to be able to switch
Perl 5 to a mode where IO operations that return errors (but usually don't)
throw exceptions instead. So rather than needing

open $fh, ... or die ...;
print $fh, ... or die ...;
close $fh or die ...;
# Some file systems may not notice that you are over quota until close time

you do

use fatal 'io'; # making this up

open $fh, ...;
print $fh, ...;
close $fh;

and errors are still reported correctly. This approach works even better if
the void context open/print/close are in someone else's code that you're
using, as you can impose decent error handling on it without needing to
rewrite it.

(An alternative setting is to run in exception throwing mode when builtins are
called in void context, but return error codes when someone is listening for
them)

This is a HLL way of working, but not one currently easily supported. However,
my understanding was that PIR doesn't have Perl 5's concept of context, so
some of this isn't going to translate directly. But I think it will be
possible to generate much tighter PIR if IO opcodes can be told to throw
exceptions.

> The 'write' opcode does seem redundant, but what if it's changed to
> print to a user selected file stream? Just select the file stream with
> one opcode, and each write call after that will print to that instead
> of stdout. That's the best idea I can come up with to remove
> redundancy(without removing the opcode of course), even though I don't
> know if it'd be worth all the side effects.

write in Perl 5 is for formats. I don't really see a need for it and print.

> Concerning all the byte/character issues, all the string opcodes except
> bytelength work with characters. But the io subsystem currently only
> deals with bytes. I know there is the speed issue for things like
> reading with dealing with utf8, but something like 'peek' should
> probably be able to get the full character. Also, what's supposed to

peek can't really guarantee to get more than 1 byte. This restriction sort of
comes from C's stdio, which will allow you to ungetc() 1 byte, so you can
(sort of) implement peek, in that you fake peek by pushing it back. However,
the socket API allows a 1 byte peek from a socket without actually consuming
it, so there is precedent at lower level APIs for 1 byte peek rather than a 1
character peek. The perl 5 PerlIO system allows unlimited unread() by pushing
back a temporary buffer - this is more "peek" by "ungetc" than peek by really
peeking.

> be the default encoding for all data read in from a stream where a
> layer's not added that explicitly states so? Ascii, binary?

EBCDIC?

Logically the default default is it's either compile time chosen
"ASCII or EBCDIC" or "binary".

Nicholas Clark

unread,

Mar 5, 2006, 1:40:06 PM3/5/06

to Allison Randal, Internals List

On Fri, Mar 03, 2006 at 11:27:05AM -0800, Allison Randal wrote:

> What I need from you all is comments. What's missing? What's
> inaccurate? What's accurate for the current state of Parrot, but is
> something you always intended to write out later? What thoughts have
> you had on how the I/O subsystem should work? All contributions
> cheerfully welcomed, from a single sentence to several pages of text.

=item *

C<seek> sets the current file position of a stream object to an integer
byte offset from an integer starting position (0 for the start of the
file, 1 for the current position, and 2 for the end of the file).

=item *

C<tell> retrieves the current file position of a stream object. It also
has a 64-bit variant that returns the byte offset as two integers (one
for the first 32 bits of the 64-bit offset, and one for the second 32
bits).

Shouldn't there be a 64-bit variant of seek?
In fact, one doesn't need a tell opcode at all, given that seeking 0 from the
current position is tell. (Although in turn that could function as a test for
"is this file is seekable?"

What does seek return on an unseekable file? What does tell return?
IIRC sfio returned the number of bytes read (or written) for an unseekable
file. Is that useful?

Presumably seek() on a buffered stream discards any written but not flushed
data.

Mmm. Flush opcode needed for buffered streams?

Nicholas Clark

Yuval Kogman

unread,

Mar 5, 2006, 1:44:10 PM3/5/06

to Joshua Isom, Allison Randal, Internals List

Objects that stringify! Objects that stringify!

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &

/\ kung foo master: /me whallops greyface with a fnord: neeyah!!!!!!!

Nicholas Clark

unread,

Mar 5, 2006, 2:11:59 PM3/5/06

to Allison Randal, Internals List

On Fri, Mar 03, 2006 at 11:27:05AM -0800, Allison Randal wrote:

=item *

C<readline> retrieves a single line from a stream into a string. Calling
C<readline> flags the stream as operating in line-buffer mode (see
C<pioctl> below). Lines are truncated at 64K.

Is there a fundamental need for a hard hard limit? I can see that it would
be flexible to allow the stream to set this value, but it doesn't seem very
perlish to deny programmer sufficient rope to hang themselves if they so
wish.

=item *

C<getfd> retrieves the UNIX integer file descriptor of a stream object,
or 0 if it doesn't have an integer file descriptor. [Maybe -1 would be a
better code for "undefined", since standard input is 0.]

I think that this has to be -1, as in an embedded system (such as under
Apache) standard input is often closed, and so descriptor 0 can be re-used.

=item *

C<stat> retrieves information about a file on the filesystem. It takes a
string filename or an integer argument of a UNIX file descriptor, and an
integer flag for the type of information requested. It returns an
integer containing the requested information. The following constants
are defined for the type of information requested (see
F<runtime/parrot/include/stat.pasm>):

To me it seems that stat should also be able to take a PMC representing an
open parrot file handle. I assume that systems exist where we'll be
layering Parrot IO onto underlying OS IO, where the OS uses tokens other
than integers for its files. For example pointers, if miniparrot is built
by layering Parrot IO onto C's stdio.

3 STAT_ISDEV
Whether the file is a device such as a terminal or a disk.

Don't we need to get more flexible than this?
At least file/terminal/socket/character special/block special/fifo/symlink/door

Nicholas Clark

unread,

Mar 5, 2006, 2:46:42 PM3/5/06

to Allison Randal, Internals List

On Fri, Mar 03, 2006 at 11:27:05AM -0800, Allison Randal wrote:

[It's worth considering making all the network I/O opcodes use a
consistent way of marking errors. At the moment, all return an integer
status code except for C<socket>, C<sockaddr>, and C<accept>.]

IIRC the Linux kernel uses negative values as return codes, where these values
are the negation of the value errno would hold. If we can use simple numeric
values to cover all possibilities, this seems like a good approach, as it
avoids thread local issues with errno, and issues such as whether return
values are in errno, h_errno, or returned by the call itself
(C API for system calls, hostname lookups, threading calls respectively)

So I'd suggest that the -1 in these two would be replaced:

=item *

C<recv> receives a message from a connected socket object into a string.
It returns an integer indicating the status of the call, -1 if
unsuccessful.

=item *

C<send> sends a message string to a connected socket object. It returns
an integer indicating the status of the call, -1 if unsuccessful.

As is, these are no different from print and read.
The C socket API provides 3 functions, C<send>, C<sendto> and C<sendmsg>.

C<send> is C<write> with flags.
C<sendto> is C<send> plus an address to use.

[Neither Stevens nor the FreeBSD man page say what happens if the address is
given as NULL]

C<sendmsg> is C<sendto> with the kitchen sink.

We need a C<sendto> or better to do unconnected UDP sockets. If we make the
address optional on our sendto-or-better then we don't need send. And given
the frequency with which people write UDP socket code, I suspect that it would
be better to provide ops only for C<sendmsg> and C<resvmsg>.

=item *

C<poll> polls a socket object for particular types of events (an integer
flag) at a frequency set by seconds and microseconds (the final two
integer arguments). It returns an integer indicating the status of the
call, -1 if unsuccessful. [See the system documentation for C<poll> to
see the constants for event types and return status.]

poll should be available for all file descriptors, shouldn't it, not just
sockets?

Should the network opcodes even be loaded as standard? C<socket> et al aren't
actually that useful on Perl 5 without all the constants in the Socket module,
so in practical terms a redesigned Perl 5 would do better to remove all the
socket specific keywords and make them functions exported by the Socket
module. Should parrot go the same way?

[And likewise System 5 IPC and shared memory builtins really should be
functions, as should all the hosts/protocols/services/password lookups]

Should the IO system provide symbolic lookup on AF_* and PF_* constants.
IIRC at least one of these groups is OS dependant, as demonstrated by Solaris
and SunOS differed on the values they used. It would be "useful" if this
lookup could be flagged to happen in some way at bytecode load time.

(Or maybe I want the moon on a stick alone with a "JIT this lazy constant"
routine. Which probably does generalise, if functions can be marked as
"cacheable" and the JIT can see a cacheable function taking only constants,
and hence constant fold it)

Nicholas Clark

Leopold Toetsch

unread,

Mar 5, 2006, 3:06:33 PM3/5/06

to Nicholas Clark, Internals List, Allison Randal

On Mar 5, 2006, at 20:11, Nicholas Clark wrote:

> C<readline> flags the stream as operating in line-buffer mode (see
> C<pioctl> below). Lines are truncated at 64K.
>
> Is there a fundamental need for a hard hard limit?

There used to be a hard limit until about a year ago. This is of course
gone now.

leo

Joshua Isom

unread,

Mar 5, 2006, 3:53:29 PM3/5/06

to Internals List

On Mar 5, 2006, at 1:46 PM, Nicholas Clark wrote:

> On Fri, Mar 03, 2006 at 11:27:05AM -0800, Allison Randal wrote:
>
> Should the network opcodes even be loaded as standard? C<socket> et al
> aren't
> actually that useful on Perl 5 without all the constants in the Socket
> module,
> so in practical terms a redesigned Perl 5 would do better to remove
> all the
> socket specific keywords and make them functions exported by the Socket
> module. Should parrot go the same way?

A pasm include, such as the signal.pasm(even though signals don't work
yet), would suffice and is generated at compile time. Parsing .h files
with anything but a c preprocessor would cause problems. Writing PMC's
is almost a given, and the PMC's could even provide convenience methods
that would do common things(like open a socket to foo.com:80). If
it/they extend ParrotIO, the possibility of buffered networking would
be possible(although I doubt many would want it on by default). But
for all the defines, a pasm include would be needed.

> Should the IO system provide symbolic lookup on AF_* and PF_*
> constants.
> IIRC at least one of these groups is OS dependant, as demonstrated by
> Solaris
> and SunOS differed on the values they used. It would be "useful" if
> this
> lookup could be flagged to happen in some way at bytecode load time.

With the pasm include method, the problem is they're mainly just
preprocessing. There's currently no way for the include to put a flag
in the resulting bytecode, unless platform specific pasm includes
include some code that'll do something like that(most likely requiring
a new opcode, 'pbcplatform "darwin"' for instance), and whenever the
pasm file is included, it gets ran, and when the pbc is saved that
state can be stored in a pbc header. Sounds really complicated and
troublesome.

Nicholas Clark

unread,

Mar 5, 2006, 4:46:03 PM3/5/06

to Joshua Isom, Internals List

On Sun, Mar 05, 2006 at 02:53:29PM -0600, Joshua Isom wrote:
>
> On Mar 5, 2006, at 1:46 PM, Nicholas Clark wrote:
>
> >On Fri, Mar 03, 2006 at 11:27:05AM -0800, Allison Randal wrote:
> >
> >Should the network opcodes even be loaded as standard? C<socket> et al
> >aren't
> >actually that useful on Perl 5 without all the constants in the Socket
> >module,
> >so in practical terms a redesigned Perl 5 would do better to remove
> >all the
> >socket specific keywords and make them functions exported by the Socket
> >module. Should parrot go the same way?
>
> A pasm include, such as the signal.pasm(even though signals don't work
> yet), would suffice and is generated at compile time. Parsing .h files

This way does the numeric values of the constants (at compile time) get
frozen into bytecode written out to disk? Or does the generated bytecode do
the lookup at run time on the run time platform?

Nicholas Clark

Joshua Isom

unread,

Mar 5, 2006, 4:52:41 PM3/5/06

to Nicholas Clark, Internals List

Compile time. Using load_bytecode would be runtime, but a pbc of one
of those includes doesn't include any of the defines so those would
have to be rewritten to do something else, such as subroutines.

> Nicholas Clark
>

Chromatic

unread,

Mar 5, 2006, 6:35:52 PM3/5/06

to perl6-i...@perl.org, Nicholas Clark, Allison Randal

On Sunday 05 March 2006 11:46, Nicholas Clark wrote:

> On Fri, Mar 03, 2006 at 11:27:05AM -0800, Allison Randal wrote:

> [It's worth considering making all the network I/O opcodes use a
> consistent way of marking errors. At the moment, all return an integer
> status code except for C<socket>, C<sockaddr>, and C<accept>.]
>
> IIRC the Linux kernel uses negative values as return codes, where these
> values are the negation of the value errno would hold. If we can use simple
> numeric values to cover all possibilities, this seems like a good approach,
> as it avoids thread local issues with errno, and issues such as whether
> return values are in errno, h_errno, or returned by the call itself
> (C API for system calls, hostname lookups, threading calls respectively)

Parrot, not being C, does have exceptions....

-- c

Leopold Toetsch

unread,

Mar 6, 2006, 7:08:55 AM3/6/06

to Allison Randal, Internals List

On Mar 3, 2006, at 20:27, Allison Randal wrote:

> We're going to try something a little different. With Chip's blessing
> I've written a very early draft of the PDD for I/O (not numbered yet).
> The attached PDD isn't a completed document with Chip's seal of
> approval, it's a seed for discussion.

Some remarks re the pdd and discussion so far.

o "write" ... [Is this redundant?]

"write" isn't needed. It is there, as some time ago, "print" was't able
to write strings with "\0"s inside.

o "readline" ... Lines are truncated at 64K.

This limitation is history already.

* opcode vs function / method

open P0, "data.txt", ">"
print P0, "sample data\n"

Using opcodes for all the IO has some disadvantages:
a) namespace pollution: all opcodes are reserved words in Parrot
b) opcodes aren't overridable, that is you can't provide your own
'print' opcode for e.g. debugging
c) all such IO opcodes have to verify that the given PMC is actually a
ParrotIO PMC.

E.g.

new P0, .Undef # or .Integer, ...
print P0, "foo"

I'm in favor of using methods for almost all IO functionality:

P0.'print'("sample data\n")

Combined with ...

* [ return code vs exception ]

... we can also check, if this was a void call or not (Parrot *does
have* the concept of a result context):

$I0 = pio.'print'("sample data\n") # return sucess (>=0) or
failure (<0)
pio.'print'("sample data\n") # throw exception on failure

* C<sockaddr> returns a string representing a socket address

[Nicholas] "I don't think that this is appropriate. It's IPv4
specific."

A more general SocketAddr PMC seems to be needed here.

* [ seek, tell ] and 64bit values

We want sooner or later extended Parrot register types. One of these
would be a C<Int64>) Parrot register. We currently have:

op tell(out INT, in PMC)
op tell(out INT, out INT, in PMC)

Depending on the arch (32 vs 64 bits) one of these opcodes is
suboptimal. With a new "L" (Long) register type the functionality could
be handled transparently:

$L0 = pio.'tell'()

The register allocator would map 'L0' either to a pair (I0, I1) on 32
bit arch or just to 'I0' on 64 bit arch.
Actually the type mapping bits in pdd03 got extended to cope with such
register types.

* [Nicholas] "Should the IO system provide symbolic lookup on AF_*

and PF_* constants.
IIRC at least one of these groups is OS dependant"

Any such constants that aren't the same on all architectures have to be
delt with at runtime, i.e. these constants can't be integers, because
integer constants are compiled into the bytecode in the flavor of the
compiling machine. That is: instead of

.include "xF_foo.pasm" # constants are resolved at compile time

we'd need something like:

load_bytecode "xF_foo.pasm" # postpone to runtime

We don't have a proper syntax for such (not so-) constants yet, but it
could just be:

pio = socket(".AF_UNIX", ...)

leo

Nicholas Clark

unread,

Mar 6, 2006, 7:06:33 AM3/6/06

to Allison Randal, Internals List

On Fri, Mar 03, 2006 at 11:27:05AM -0800, Allison Randal wrote:

=head2 Network I/O Opcodes

Functionality wise, the following are missing:

shutdown

getpeername/getsockname

getsockopt/setsockopt

I'd view shutdown as most important, as I believe that there are some
protocols you can't implement without it.

C<socketpair> isn't listed, but I'd assume that that is more a class method
called on the class representing Unix Domain sockets.

Would it work to have classes representing each address format, each
providing a packsockaddr method?

There's no direct access to fnctl or ioctl given. Specifically it would be
useful to have a way to set handles non-blocking (and have the entire IO
system cope with synchronous-but-non-blocking IO, even if async IO is more
powerful still)

Nicholas Clark

Chris Dolan

unread,

Mar 6, 2006, 9:38:36 AM3/6/06

to Leopold Toetsch, Allison Randal, Internals List

On Mar 6, 2006, at 6:08 AM, Leopold Toetsch wrote:

> Using opcodes for all the IO has some disadvantages:
> a) namespace pollution: all opcodes are reserved words in Parrot
> b) opcodes aren't overridable, that is you can't provide your own
> 'print' opcode for e.g. debugging
> c) all such IO opcodes have to verify that the given PMC is
> actually a ParrotIO PMC.
>
> E.g.
>
> new P0, .Undef # or .Integer, ...
> print P0, "foo"
>
> I'm in favor of using methods for almost all IO functionality:
>
> P0.'print'("sample data\n")

I agree. Additionally, I speculate that the latter would make it
easier to write the Parrot equivalent of Safe.pm's sandbox later. It
would likely be easier to disable a PMC class than a diverse
collection of opcodes.

Chris
--
Chris Dolan, Software Developer, http://www.chrisdolan.net/
Public key: http://www.chrisdolan.net/public.key
vCard: http://www.chrisdolan.net/ChrisDolan.vcf

Allison Randal

unread,

Mar 6, 2006, 4:13:34 PM3/6/06

to chromatic, perl6-i...@perl.org, Nicholas Clark

That was my thought. It partly depends on how people will be using
it. For asynchronous I/O, exceptions actually make more sense than
integer error codes.

It would be a pain to maintain two separate sets of opcodes, one with
integer error codes and one with exceptions. So, to figure out if
it's feasible to switch entirely over to exceptions, we need a
"story" for building the system based on exceptions and emulating
integer error codes for the HLLs that use them. Perhaps the HLL
implementation can trap certain exceptions, retrieve the error code
from the exception itself and return that. Or, if that's too much to
put on the shoulders of compiler implementers, perhaps we provide a
layer of opcodes that trap the I/O exceptions and return integer
error codes instead.

Allison

Jonathan Worthington

unread,

Mar 6, 2006, 5:16:23 PM3/6/06

to Leopold Toetsch, Allison Randal, Internals List

"Leopold Toetsch" <l...@toetsch.at> wrote:
> * opcode vs function / method
>
> open P0, "data.txt", ">"
> print P0, "sample data\n"
>
> Using opcodes for all the IO has some disadvantages:
> a) namespace pollution: all opcodes are reserved words in Parrot
> b) opcodes aren't overridable, that is you can't provide your own 'print'
> opcode for e.g. debugging
> c) all such IO opcodes have to verify that the given PMC is actually a
> ParrotIO PMC.
>

All good points; just to chalk up another opinion, I'd very much rather see
I/O stuff working as PMCs rather than as opcodes.

> Combined with ...
>
> * [ return code vs exception ]
>
> ... we can also check, if this was a void call or not (Parrot *does have*
> the concept of a result context):
>
> $I0 = pio.'print'("sample data\n") # return sucess (>=0) or failure
> (<0)
> pio.'print'("sample data\n") # throw exception on failure
>

Could perhaps get fun for compilers though - what if the program just throws
away the return value? So some optimizer that doesn't know this subtlety
sees this code and throws away the unused return value or just never emits
the assignment, and the behaviour changes. I'm not sure I like this idea.

>
> * [ seek, tell ] and 64bit values
>
> We want sooner or later extended Parrot register types. One of these would
> be a C<Int64>) Parrot register. We currently have:
>
> op tell(out INT, in PMC)
> op tell(out INT, out INT, in PMC)
>
> Depending on the arch (32 vs 64 bits) one of these opcodes is suboptimal.
> With a new "L" (Long) register type the functionality could be handled
> transparently:
>
> $L0 = pio.'tell'()
>

Yes, but as you add more register types you get a combinatorial blow-up on
various opcodes. My understanding was that "I" registers were native
integers so you could get good performance, and you used a PMC if you wanted
some guarantees about what size you were talking about.

> The register allocator would map 'L0' either to a pair (I0, I1) on 32 bit
> arch or just to 'I0' on 64 bit arch.

Yes, but surely it becomes somewhat more than just a mapping problem? For
example, what do we do about:

add L0, L1, L2
mul L0, L1, L2
...

Jonathan

Leopold Toetsch

unread,

Mar 6, 2006, 6:11:13 PM3/6/06

to Jonathan Worthington, Internals List, Allison Randal

On Mar 6, 2006, at 23:16, Jonathan Worthington wrote:

> "Leopold Toetsch" <l...@toetsch.at> wrote:

>> $I0 = pio.'print'("sample data\n") # return sucess (>=0) or
>> failure (<0)
>> pio.'print'("sample data\n") # throw exception on failure
>>
> Could perhaps get fun for compilers though - what if the program just
> throws away the return value? So some optimizer that doesn't know this
> subtlety sees this code and throws away the unused return value or
> just never emits the assignment, and the behaviour changes. I'm not
> sure I like this idea.

Good point. Then I prefer exceptions + return value (which may be
ignored eventually - depending on strictness of retval/result count
error checking).

>> Depending on the arch (32 vs 64 bits) one of these opcodes is
>> suboptimal. With a new "L" (Long) register type the functionality
>> could be handled transparently:
>>
>> $L0 = pio.'tell'()
>>

> Yes, but as you add more register types you get a combinatorial
> blow-up on various opcodes.

This depends on the implementation of 'opcodes'. With the current
scheme any such extension isn't really implementable because of the
combinatorial opcode explosion. I've written a (still internal)
proposal that would prevent the combinatorial issue. It (or some
similar thing) would be indeed necessary to even think about more
register types like int8, int16, int32/int64 or float32.

> My understanding was that "I" registers were native integers so you
> could get good performance, and you used a PMC if you wanted some
> guarantees about what size you were talking about.

In the long run, we certainly don't want to use PMCs for e.g.
immplementing bytes (int8) or some such for performance reasons. 'long
long' aka int64 is usually supported by compilers and has for sure by
far better performanche than a BigInt PMC.
Actually using 'int8' or 'float32' is usually only important, if you
have huge arrays of these. That means that there's a very limited need
for opcodes using these types, just some basic math mainly and array
fetch/store. Or in other words: what is supported by the hardware CPU.

>
>> The register allocator would map 'L0' either to a pair (I0, I1) on 32
>> bit arch or just to 'I0' on 64 bit arch.
> Yes, but surely it becomes somewhat more than just a mapping problem?
> For example, what do we do about:
>
> add L0, L1, L2
> mul L0, L1, L2

I don't see any problem with above code.

The register mapping rules would be something like:
- Lx occupies registers I(2x, 2x+1) - this is compile time,
that is 'L1' prevents 'I2' and 'I3' from being assigned by the
register allocator
- the runtime mapping isn't portable due to endianess and sizeof types
'L1' might be 'I1' on 64-bit arch or (I2,I3) or (I3,I2) on 32-bit arch
- if you write PASM, overlapping Ix/Ly may cause warnings or errors,
but could be used in a non-portable way, if you know what you are doing
on a specific platform.

> Jonathan

leo

Allison Randal

unread,

Mar 6, 2006, 6:31:17 PM3/6/06

to Leopold Toetsch, Internals List

On Mar 6, 2006, at 4:08, Leopold Toetsch wrote:
>
> Some remarks re the pdd and discussion so far.
>
> o "write" ... [Is this redundant?]
>
> "write" isn't needed. It is there, as some time ago, "print" was't
> able to write strings with "\0"s inside.

Now marked in the PDD as deprecated.

(To make it easier to track changes, I added the PDD to the parrot
repository under docs/pdds/clip/, where the other non-finished PDDs
live.)

> o "readline" ... Lines are truncated at 64K.
>
> This limitation is history already.

I've updated src/ops/io.ops with this information.

> * opcode vs function / method
>
> open P0, "data.txt", ">"
> print P0, "sample data\n"
>
> Using opcodes for all the IO has some disadvantages:
> a) namespace pollution: all opcodes are reserved words in Parrot
> b) opcodes aren't overridable, that is you can't provide your own
> 'print' opcode for e.g. debugging
> c) all such IO opcodes have to verify that the given PMC is
> actually a ParrotIO PMC.
>
> E.g.
>
> new P0, .Undef # or .Integer, ...
> print P0, "foo"
>
> I'm in favor of using methods for almost all IO functionality:
>
> P0.'print'("sample data\n")

Generally, I'm in favor of opcodes for simple, common operations, and
falling back to methods for more complex capabilities. It's overkill
to require people to write:

P0 = getstdout
P0.'print'(S1)

everywhere they would currently write:

print S1

One place I'm in favor of eliminating the opcode in favor of a method
call is the C<pioctl> opcode. I would also seriously consider it for
the socket opcodes, but we need to kick that one around a bit to see
if it really would be an improvement.

> Combined with ...
>
> * [ return code vs exception ]
>
> ... we can also check, if this was a void call or not (Parrot *does
> have* the concept of a result context):
>
> $I0 = pio.'print'("sample data\n") # return sucess (>=0) or
> failure (<0)
> pio.'print'("sample data\n") # throw exception on failure

That solution is problematic in the "unintended consequences"
department. Just because someone doesn't capture the return value
doesn't necessarily mean they want exceptions turned on for failures.
We would at least need to provide a flag for selecting exceptions vs
integer return codes. But, even that is probably too minimal.

A side note: many of these opcodes also have another return value in
addition to the integer error code, which makes the integer error
code interface clumsy.

> * C<sockaddr> returns a string representing a socket address
> [Nicholas] "I don't think that this is appropriate. It's IPv4
> specific."
>
> A more general SocketAddr PMC seems to be needed here.

Possibly. A smarter Parrot equivalent of a standard sockaddr structure:

http://www.awprofessional.com/articles/article.asp?
p=101172&seqNum=5&rl=1

Will it be used anywhere other than a call to C<bind>? If not,
there's probably a simpler way to handle it.

> * [ seek, tell ] and 64bit values
>
> We want sooner or later extended Parrot register types. One of
> these would be a C<Int64>) Parrot register. We currently have:
>
> op tell(out INT, in PMC)
> op tell(out INT, out INT, in PMC)
>
> Depending on the arch (32 vs 64 bits) one of these opcodes is
> suboptimal. With a new "L" (Long) register type the functionality
> could be handled transparently:
>
> $L0 = pio.'tell'()
>
> The register allocator would map 'L0' either to a pair (I0, I1) on
> 32 bit arch or just to 'I0' on 64 bit arch.
> Actually the type mapping bits in pdd03 got extended to cope with
> such register types.

For now I'll just note that the 2 integer forms for 64-bit offsets
may be deprecated in the future.

> * [Nicholas] "Should the IO system provide symbolic lookup on
> AF_* and PF_* constants.
> IIRC at least one of these groups is OS dependant"
>
> Any such constants that aren't the same on all architectures have
> to be delt with at runtime, i.e. these constants can't be integers,
> because integer constants are compiled into the bytecode in the
> flavor of the compiling machine. That is: instead of
>
> .include "xF_foo.pasm" # constants are resolved at compile time
>
> we'd need something like:
>
> load_bytecode "xF_foo.pasm" # postpone to runtime
>
> We don't have a proper syntax for such (not so-) constants yet, but
> it could just be:
>
> pio = socket(".AF_UNIX", ...)

It seems like a more general problem than that. Like, you want a way
of flagging a constant when you define it as to whether it should be
substituted when compiling to bytecode or substituted when
interpreting the bytecode. (And a way of storing delayed constant
substitutions in the bytecode.)

Allison

Joshua Isom

unread,

Mar 6, 2006, 7:03:27 PM3/6/06

to Internals List

Personally I'm a little big indifferent towards the difference with
methods and opcodes for most IO. But since each opcode added is really
added at least four times(slow core, switch, cg, and cgp, plus jit if
it's written), too many opcodes would increase size quickly. But, the
basic print opcodes, easy capacity to work with stdin, stdout, and
stderr, should remain with opcodes. Maybe have the io opcodes deal
only with the std's, and pmc's and methods for the rest. If you take
the "opcodes for easy things, methods for complex", stdout is easy,
files are complex.

>> Combined with ...
>>
>> * [ return code vs exception ]
>>
>> ... we can also check, if this was a void call or not (Parrot *does
>> have* the concept of a result context):
>>
>> $I0 = pio.'print'("sample data\n") # return sucess (>=0) or
>> failure (<0)
>> pio.'print'("sample data\n") # throw exception on failure
>
> That solution is problematic in the "unintended consequences"
> department. Just because someone doesn't capture the return value
> doesn't necessarily mean they want exceptions turned on for failures.
> We would at least need to provide a flag for selecting exceptions vs
> integer return codes. But, even that is probably too minimal.
>
> A side note: many of these opcodes also have another return value in
> addition to the integer error code, which makes the integer error code
> interface clumsy.

Any compiler targetting parrot should know parrot's behavior, so if the
HLL program doesn't want the return value but the way the compiler
handles it wants it, the compiler would add it in. Every pmc has eight
private flags it can use, and currently ParrotIO uses none of them. So
the "throw exception or ignore" can be set per pmc if the method
approach is taken.

> [...]

>> * [Nicholas] "Should the IO system provide symbolic lookup on
>> AF_* and PF_* constants.
>> IIRC at least one of these groups is OS dependant"
>>
>> Any such constants that aren't the same on all architectures have to
>> be delt with at runtime, i.e. these constants can't be integers,
>> because integer constants are compiled into the bytecode in the
>> flavor of the compiling machine. That is: instead of
>>
>> .include "xF_foo.pasm" # constants are resolved at compile time
>>
>> we'd need something like:
>>
>> load_bytecode "xF_foo.pasm" # postpone to runtime
>>
>> We don't have a proper syntax for such (not so-) constants yet, but
>> it could just be:
>>
>> pio = socket(".AF_UNIX", ...)
>
> It seems like a more general problem than that. Like, you want a way
> of flagging a constant when you define it as to whether it should be
> substituted when compiling to bytecode or substituted when
> interpreting the bytecode. (And a way of storing delayed constant
> substitutions in the bytecode.)

An alternative, although cumbersome, would be to add a new opcode
similar to sysinfo or interpinfo, where the IO constants would keep the
same '.include "foo.pasm"' method, but would be used like '$I0 =
ioconst .AF_WHATEVER' to get the native constant. Then it would mainly
be a matter of having compiler writers knowing this, and still
reminding people not to use hard coded constants.

> Allison
>

Leopold Toetsch

unread,

Mar 6, 2006, 7:42:14 PM3/6/06

to Allison Randal, Internals List

On Mar 7, 2006, at 0:31, Allison Randal wrote:

> It's overkill to require people to write:
>
> P0 = getstdout
> P0.'print'(S1)

Yep. The more that printing to stdout is heavily used in the test
suite. OTOH opcode vs method is merely a matter of what the assembler
is creating. That is: there are 2 notions of an opcode: a user visiable
and usable opcode and an internal opcode as actually implemented in
F<src/ops/core_ops.c>.

E.g.

add P0, P1, P2

is a totally valid user opcode. But there isn't such a thing inisde
Parrot runcores, it translates to the C<infix> opcode. Or 'add I0, 2,
3' is a user-visible/usable opcode only.

Or IO-related:

$I0 = say "ok 1"
say "ok 3"

.local pmc io
io = getstdout
"say"(io, "ok 5")
$I0 = io.'say'('ok 8')

and more variants are valid (albeit experimental) PIR syntax (s.
t/pmc/builtin.t). With the help of a list of 'builtins' the assembler
is creating equivalent (maybe class) method calls from the bare
opcodes.

The more interesting thing is, as already mentioned in a previous f'up:
opcode namespace pollution and overridability of opcodes.

leo

Allison Randal

unread,

Mar 6, 2006, 7:41:35 PM3/6/06

to Joshua Isom, Nicholas Clark, Internals List

Yet another option: make a generic opcode (or internal Parrot routine
that opcode implementations can call) to retrieve the value of a
constant from a string constant name at runtime.

Though, I'm not sure making our whole I/O subsystem depend on integer
constants that vary from one OS to the next is a good idea anyway.
It's probably more sensible to have a standard representation
internal to Parrot (whether integers or strings or attributes of the
ParrotIO object), and compile a translation layer to retrieve the OS-
specific integer value where it's needed.

Allison

Leopold Toetsch

unread,

Mar 6, 2006, 7:58:09 PM3/6/06

to Allison Randal, Internals List

On Mar 7, 2006, at 0:31, Allison Randal wrote:

>> pio = socket(".AF_UNIX", ...)
>
> It seems like a more general problem than that. Like, you want a way
> of flagging a constant when you define it as to whether it should be
> substituted when compiling to bytecode or substituted when
> interpreting the bytecode. (And a way of storing delayed constant
> substitutions in the bytecode.)

Exactly. The same problem is arising with all constants created at
compile time in some certain order, which should be valid and the same
at runtime too. The most prominent example is PMC type numbers:

new P0, .TclInt

or such depends on the loading order of dynamic PMCs (if there are more
than one) [1]. The general defered lookup is just using a string:

new P0, "TclInt"

Runcores that are recompiling opcodes are doing the lookup just once
per PBC location and replace it with the integer constant variant
on-the-fly. Therefore it's not a real speed penalty.

[1] actually PMC types are handled within the HLL_info structure now,
but the general problem is still existing.

> Allison

leo

Bob Rogers

unread,

Mar 6, 2006, 8:39:28 PM3/6/06

to Leopold Toetsch, Jonathan Worthington, Internals List, Allison Randal

From: Leopold Toetsch <l...@toetsch.at>
Date: Tue, 7 Mar 2006 00:11:13 +0100

On Mar 6, 2006, at 23:16, Jonathan Worthington wrote:

> "Leopold Toetsch" <l...@toetsch.at> wrote:

>> $I0 = pio.'print'("sample data\n") # return sucess (>=0) or
>> failure (<0)
>> pio.'print'("sample data\n") # throw exception on failure
>>
> Could perhaps get fun for compilers though - what if the program just
> throws away the return value? So some optimizer that doesn't know this
> subtlety sees this code and throws away the unused return value or
> just never emits the assignment, and the behaviour changes. I'm not
> sure I like this idea.

Good point. Then I prefer exceptions + return value (which may be
ignored eventually - depending on strictness of retval/result count
error checking).

Seems to me this won't be a problem if the ideas discussed in the "Param
count checks" thread (5-7 Feb, <http://xrl.us/jwt2>) are implemented.
There we discussed having one syntax that means "I expect exactly zero
returns" and another that means "I expect any number of returns but
don't plan to use it/them." This context information should also be
testable by the I/O implementation, and no sane Parrot optimizer would
dare throw it away. True?

-- Bob Rogers
http://rgrjr.dyndns.org/

Jonathan Worthington

unread,

Mar 7, 2006, 5:44:47 PM3/7/06

to Leopold Toetsch, Internals List

"Leopold Toetsch" <l...@toetsch.at> wrote:
>>> Depending on the arch (32 vs 64 bits) one of these opcodes is
>>> suboptimal. With a new "L" (Long) register type the functionality could
>>> be handled transparently:
>>>
>>> $L0 = pio.'tell'()
>>>
>
>> Yes, but as you add more register types you get a combinatorial blow-up
>> on various opcodes.
>
> This depends on the implementation of 'opcodes'. With the current scheme
> any such extension isn't really implementable because of the combinatorial
> opcode explosion. I've written a (still internal) proposal that would
> prevent the combinatorial issue. It (or some similar thing) would be
> indeed necessary to even think about more register types like int8, int16,
> int32/int64 or float32.
>

Iff we want those register types. .NET is interesting in that it recognises
int8, int16 etc as fundemental types, but on the stack only recognizes
int32, int64 and native int (out of the integer types anyway). If you want
to have an int8 then you just do the ops in an int32 and then use a conv.i1
instruction. I think the wisdom here is, where do we actually really need
to support int8. And it's in arrays that it matters most for example, for
size reasons.

>> My understanding was that "I" registers were native integers so you
>> could get good performance, and you used a PMC if you wanted some
>> guarantees about what size you were talking about.
>
> In the long run, we certainly don't want to use PMCs for e.g.
> immplementing bytes (int8) or some such for performance reasons. 'long
> long' aka int64 is usually supported by compilers and has for sure by far
> better performanche than a BigInt PMC.
> Actually using 'int8' or 'float32' is usually only important, if you have
> huge arrays of these. That means that there's a very limited need for
> opcodes using these types, just some basic math mainly and array
> fetch/store. Or in other words: what is supported by the hardware CPU.
>

If it's just arrays, then we can provide a (Fixed|Resizable)ByteArray PMC,
etc. I don't think we need any instructions to specially handle doing 8-bit
arithmetic. Maybe we want something to truncate a 32-bit to an 8-bit etc,
maybe throwing an exception on overflow.

>>> The register allocator would map 'L0' either to a pair (I0, I1) on 32
>>> bit arch or just to 'I0' on 64 bit arch.
>> Yes, but surely it becomes somewhat more than just a mapping problem?
>> For example, what do we do about:
>>
>> add L0, L1, L2
>> mul L0, L1, L2
>
> I don't see any problem with above code.
>
> The register mapping rules would be something like:
> - Lx occupies registers I(2x, 2x+1) - this is compile time,
> that is 'L1' prevents 'I2' and 'I3' from being assigned by the register
> allocator
> - the runtime mapping isn't portable due to endianess and sizeof types
> 'L1' might be 'I1' on 64-bit arch or (I2,I3) or (I3,I2) on 32-bit arch

Yup, and I really, really don't like the idea of making our bytecode format
non-portable. Part of the point of having a VM is portability, right?

> - if you write PASM, overlapping Ix/Ly may cause warnings or errors, but
> could be used in a non-portable way, if you know what you are doing on a
> specific platform.
>

You still didn't address my question with these points, though.

mul L0, L1, L2

Isn't just a case of churning out something like:-

mul I0, I2, I4
mul I1, I3, I5

So it's not just so simple as a "map 1 L to 2 Is" problem.

Jonathan

Leopold Toetsch

unread,

Mar 7, 2006, 6:54:00 PM3/7/06

to Jonathan Worthington, Internals List

On Mar 7, 2006, at 23:44, Jonathan Worthington wrote:

>
>> - if you write PASM, overlapping Ix/Ly may cause warnings or errors,
>> but could be used in a non-portable way, if you know what you are
>> doing on a specific platform.
>>
> You still didn't address my question with these points, though.
>
> mul L0, L1, L2
>
> Isn't just a case of churning out something like:-
>
> mul I0, I2, I4
> mul I1, I3, I5
>
> So it's not just so simple as a "map 1 L to 2 Is" problem.

Well, there wasn't a question to me ;)

mul L0, L1, L2

is of course a distinct opcode that does int64 arithmetic.

leo

Leopold Toetsch

unread,

Mar 7, 2006, 6:57:40 PM3/7/06

to Jonathan Worthington, Internals List

On Mar 7, 2006, at 23:44, Jonathan Worthington wrote:

>>
>> The register mapping rules would be something like:
>> - Lx occupies registers I(2x, 2x+1) - this is compile time,
>> that is 'L1' prevents 'I2' and 'I3' from being assigned by the
>> register allocator
>> - the runtime mapping isn't portable due to endianess and sizeof types
>> 'L1' might be 'I1' on 64-bit arch or (I2,I3) or (I3,I2) on 32-bit
>> arch
> Yup, and I really, really don't like the idea of making our bytecode
> format non-portable. Part of the point of having a VM is portability,
> right?

The described mapping doesn't have any PBC portability issues AFAIK. If
'L' is mapping to 'I' or not is chosen at runtime.

leo

Jonathan Worthington

unread,

Mar 8, 2006, 4:55:37 PM3/8/06

to Leopold Toetsch, Internals List

"Leopold Toetsch" <l...@toetsch.at> wrote:
>> Yup, and I really, really don't like the idea of making our bytecode
>> format non-portable. Part of the point of having a VM is portability,
>> right?
>
> The described mapping doesn't have any PBC portability issues AFAIK. If
> 'L' is mapping to 'I' or not is chosen at runtime.
>

Wouldn't the required re-writing blow away the wins we get through mmap'ing
in bytecode files? Not that JIT doesn't hurt a little there anyway, but at
least you can choose not to do that if you care about memory usage more than
speed...

Jonathan

Leopold Toetsch

unread,

Mar 8, 2006, 5:19:43 PM3/8/06

to Jonathan Worthington, Internals List

On Mar 8, 2006, at 22:55, Jonathan Worthington wrote:

>> The described mapping doesn't have any PBC portability issues AFAIK.
>> If 'L' is mapping to 'I' or not is chosen at runtime.
>>
> Wouldn't the required re-writing blow away the wins we get through
> mmap'ing in bytecode files?

There isn't any rewriting required. E.g. "I0" is represented as a plain
int '0' in the PBC, that's absolutely the same as, how 'S0', 'N0', or
'P0' are represented. A proper location in the register storage is
selected at runtime according to the register type and the register
number. The same would be true for a new register type 'L'.

> Jonathan

leo

Nicholas Clark

unread,

Mar 11, 2006, 5:07:10 AM3/11/06

to Leopold Toetsch, Allison Randal, Internals List

On Mon, Mar 06, 2006 at 01:08:55PM +0100, Leopold Toetsch wrote:

> * opcode vs function / method
>
> open P0, "data.txt", ">"
> print P0, "sample data\n"
>
> Using opcodes for all the IO has some disadvantages:
> a) namespace pollution: all opcodes are reserved words in Parrot
> b) opcodes aren't overridable, that is you can't provide your own
> 'print' opcode for e.g. debugging
> c) all such IO opcodes have to verify that the given PMC is actually a
> ParrotIO PMC.
>
> E.g.
>
> new P0, .Undef # or .Integer, ...
> print P0, "foo"
>
> I'm in favor of using methods for almost all IO functionality:
>
> P0.'print'("sample data\n")

I feel more comfortable with the idea of IO being methods on PMCs than raw
OPs. Not totally sure why. It's at least partly the ease of having "safe"
compartments that ban IO simply by preventing the creation of those PMCs.

However, I think that the biggest thing is seeing how the Perl 5 IO works,
where all the IO ops have to redispatch if the "file handle" turns out to be
tied, and in turn the PerlIO system has a whole second level of redispatch
at the C level of "overloading the file handle's behaviour". Oh, and then
there's PerlIO::via which hooks that C level back up to Perl methods.

So a single level of method calls feels much cleaner, as that means only 1
level of dispatch, however the program is re-implementing IO, be it writes
to tied scalars, character set conversion, or merely a direct IO with the
most native operating system interfaces.

> Combined with ...
>
> * [ return code vs exception ]
>
> ... we can also check, if this was a void call or not (Parrot *does
> have* the concept of a result context):
>
> $I0 = pio.'print'("sample data\n") # return sucess (>=0) or
> failure (<0)
> pio.'print'("sample data\n") # throw exception on failure

Oooh. Useful.

> * C<sockaddr> returns a string representing a socket address
> [Nicholas] "I don't think that this is appropriate. It's IPv4
> specific."
>
> A more general SocketAddr PMC seems to be needed here.

I think someone else said it too, but more a SocketAddr PMC hierarchy?
At least, a PMC class for each distinct way of describing addresses, that all
fulfil a SocketAddr role.

Nicholas Clark

unread,

Mar 14, 2006, 5:13:04 PM3/14/06

to Allison Randal, Leopold Toetsch, Internals List

On Mon, Mar 06, 2006 at 03:31:17PM -0800, Allison Randal wrote:
> On Mar 6, 2006, at 4:08, Leopold Toetsch wrote:

> > * C<sockaddr> returns a string representing a socket address
> > [Nicholas] "I don't think that this is appropriate. It's IPv4
> >specific."
> >
> >A more general SocketAddr PMC seems to be needed here.
>
> Possibly. A smarter Parrot equivalent of a standard sockaddr structure:
>
> http://www.awprofessional.com/articles/article.asp?
> p=101172&seqNum=5&rl=1
>
> Will it be used anywhere other than a call to C<bind>? If not,
> there's probably a simpler way to handle it.

From memory, socket addresses show up as results from C<accept>,
C<getsockname>, C<getpeername> and C<recvfrom>/C<recvmsg>, and are also
used as arguments to C<sendto>/C<sendmsg>

Nicholas Clark

unread,

Mar 14, 2006, 5:16:08 PM3/14/06

to Allison Randal, Internals List

On Sun, Mar 05, 2006 at 07:11:59PM +0000, Nicholas Clark wrote:

> =item *
>
> C<stat> retrieves information about a file on the filesystem. It takes a
> string filename or an integer argument of a UNIX file descriptor, and an
> integer flag for the type of information requested. It returns an
> integer containing the requested information. The following constants
> are defined for the type of information requested (see
> F<runtime/parrot/include/stat.pasm>):
>
> To me it seems that stat should also be able to take a PMC representing an
> open parrot file handle. I assume that systems exist where we'll be
> layering Parrot IO onto underlying OS IO, where the OS uses tokens other
> than integers for its files. For example pointers, if miniparrot is built
> by layering Parrot IO onto C's stdio.

Actually, I was wondering if that three way distinction is appropriate to
quite a few of the IO operations - string is filename, PMC is opaque
filehandle object, integer is UNIX file descriptor (and therefore not
portable to all platforms)

Nicholas Clark

Allison Randal

unread,

Mar 17, 2006, 6:18:32 PM3/17/06

to Nicholas Clark, Internals List

On Mar 11, 2006, at 2:07, Nicholas Clark wrote:
>
> I feel more comfortable with the idea of IO being methods on PMCs
> than raw
> OPs.

They are methods on I/O objects internally. (Just as most opcodes on
PMCs actually call vtable methods.) So, the question isn't as
significant as it appears. It's really just "Do we provide a simple
opcode interface for the most common operations on I/O objects?"
Which is an easy "yes".

> However, I think that the biggest thing is seeing how the Perl 5 IO
> works,
> where all the IO ops have to redispatch if the "file handle" turns
> out to be
> tied, and in turn the PerlIO system has a whole second level of
> redispatch
> at the C level of "overloading the file handle's behaviour". Oh,
> and then
> there's PerlIO::via which hooks that C level back up to Perl methods.

That's a Perl 5 problem that Parrot handles with vtables. It's not
specific to I/O.

> So a single level of method calls feels much cleaner, as that means
> only 1
> level of dispatch, however the program is re-implementing IO, be it
> writes
> to tied scalars, character set conversion, or merely a direct IO
> with the
> most native operating system interfaces.

It's not quite that simple, as you still have the I/O layers.

Allison

Allison Randal

unread,

Mar 17, 2006, 8:40:29 PM3/17/06

to Joshua Isom, Internals List

On Mar 3, 2006, at 15:54, Joshua Isom wrote:
>
> Concerning all the byte/character issues, all the string opcodes
> except bytelength work with characters. But the io subsystem
> currently only deals with bytes. I know there is the speed issue
> for things like reading with dealing with utf8, but something like
> 'peek' should probably be able to get the full character. Also,
> what's supposed to be the default encoding for all data read in
> from a stream where a layer's not added that explicitly states so?
> Ascii, binary?

These we'll put off until we get to the Character Set PDD (as yet
unnumbered).

> I don't think it's really been addressed, at least not recently,
> but what about IPv6? By the time perl6 becomes commonplace and
> used often(and thus, parrot), IPv6 will be common enough that
> problems could occur. Currently it's not speced or stated, aside
> from a comment in PIO_sockaddr_in.

Added.

> One more thing, what about specing directory handling? Nothing is
> speced yet for it.

Added.

Thanks!
Allison

Allison Randal

unread,

Mar 17, 2006, 9:40:32 PM3/17/06

to Nicholas Clark, Internals List

On Mar 6, 2006, at 4:06, Nicholas Clark wrote:

> On Fri, Mar 03, 2006 at 11:27:05AM -0800, Allison Randal wrote:
>
> =head2 Network I/O Opcodes
>
>
> Functionality wise, the following are missing:
>
> shutdown

Added.

> getpeername/getsockname
>
> getsockopt/setsockopt

These seem rare, and intimately associated with the I/O object
(they're just retrieving and setting attributes), enough to be left
as methods.

> C<socketpair> isn't listed, but I'd assume that that is more a
> class method
> called on the class representing Unix Domain sockets.

Makes sense as a method.

> Would it work to have classes representing each address format, each
> providing a packsockaddr method?

I think it makes more sense to have a single opcode that returns the
appropriate kind of address object depending on the format of the
arguments you pass it. (Though internally it will rely on whatever
the chosen class uses to set its address information.) Though, a
single class that's smart enough to store different formats of
addresses may be better.

> There's no direct access to fnctl or ioctl given. Specifically it
> would be
> useful to have a way to set handles non-blocking (and have the
> entire IO
> system cope with synchronous-but-non-blocking IO, even if async IO
> is more
> powerful still)

This is currently pioctl, but may be replaced by methods on the
ParrotIO PMCs.

Allison

Allison Randal

unread,

Mar 17, 2006, 11:05:16 PM3/17/06

to Internals List

I just committed a more complete draft of the I/O PDD to docs/pdds/
clip/pddXX_io.pod. I've integrated or responded to the mailing list
comments. I also added a good bit of additional discussion of
asynchronous operations and error handling. The original draft was
"how it works now", while this draft is "how it should work", so a
diff between the two provides an interesting comparison.

Comments and questions welcome as usual.

Allison

Leopold Toetsch

unread,

Mar 18, 2006, 7:18:27 AM3/18/06

to Allison Randal, Internals List

On Mar 18, 2006, at 5:05, Allison Randal wrote:

>
> Comments and questions welcome as usual.

+=head3 Hybrid solution
+
+Another option is to return a status object from each I/O operation.

I'm in favour of such a solution. There are several reasons:
- int status codes can't provide all the variety of information, we
might want to return
- having variants that return exceptions too, leads to code duplication
- read operations can directly return the result string PMC

+The disadvantage is that a status object involves more overhead than
a
+simple integer status code.

Well, IO goes usually through several layers aka function calls, and
isn't one of the fast operations a computer performs. And:
- int status codes (if checked) are usually just converted to PMCs by
HLL compilers
- if PMC creation becomes a bottleneck, then we should fix it and not
avoid or work around it

That said I can imagine to unify sync/async at the surface. E.g.

sync operation:

PResult = Pio."read"(n) # PResult = String/IOError object

async operation:

PAsync = new .IOAsync, Pcallback
PAsync."set_callback"(Pcb) # or method
...
Pio."async"(PAsync)
PStatus = Pio."read"(n)

Both the sync and async read bubble down the layers until one is found
that supports the operation either directly or by e.g. emulating async.
But due to the same function call signature, we can avoid duplication
of code paths.

> Allison

leo

Nicholas Clark

unread,

Mar 18, 2006, 11:04:03 AM3/18/06

to Allison Randal, Internals List

On Fri, Mar 17, 2006 at 06:40:32PM -0800, Allison Randal wrote:
> On Mar 6, 2006, at 4:06, Nicholas Clark wrote:
>
> >On Fri, Mar 03, 2006 at 11:27:05AM -0800, Allison Randal wrote:
> >
> > =head2 Network I/O Opcodes
> >
> >
> >Functionality wise, the following are missing:
> >
> > shutdown
>
> Added.
>
> > getpeername/getsockname
> >
> > getsockopt/setsockopt
>
> These seem rare, and intimately associated with the I/O object
> (they're just retrieving and setting attributes), enough to be left
> as methods.

I'd not realised how opcodes and methods were interchangeable. In which case,
I think that shutdown should also be only a method, as it is rarely needed.

> This is currently pioctl, but may be replaced by methods on the
> ParrotIO PMCs.

Is there a draft list of the IO PMC interface hierarchy yet? With the methods
provided at each level?

Nicholas Clark

Allison Randal

unread,

Mar 19, 2006, 2:01:34 AM3/19/06

to Leopold Toetsch, Internals List

On Mar 18, 2006, at 4:18, Leopold Toetsch wrote:
> On Mar 18, 2006, at 5:05, Allison Randal wrote:
>
> +=head3 Hybrid solution
> +
> +Another option is to return a status object from each I/O
> operation.
>
> I'm in favour of such a solution.

Also my favorite.

> +The disadvantage is that a status object involves more overhead
> than a
> +simple integer status code.
>
> Well, IO goes usually through several layers aka function calls,
> and isn't one of the fast operations a computer performs. And:
> - int status codes (if checked) are usually just converted to PMCs
> by HLL compilers
> - if PMC creation becomes a bottleneck, then we should fix it and
> not avoid or work around it

I had to list *some* kind of disadvantage, or it would sound too good
to be true. ;)

Seriously, though, it's important to consider all the advantages and
disadvantages, even if we decide the advantages outweigh the
disadvantages (as is the case here).

> That said I can imagine to unify sync/async at the surface. E.g.
>
> sync operation:
>
> PResult = Pio."read"(n) # PResult = String/IOError object
>
> async operation:
>
> PAsync = new .IOAsync, Pcallback
> PAsync."set_callback"(Pcb) # or method
> ...
> Pio."async"(PAsync)
> PStatus = Pio."read"(n)
>
> Both the sync and async read bubble down the layers until one is
> found that supports the operation either directly or by e.g.
> emulating async. But due to the same function call signature, we
> can avoid duplication of code paths.

That interface is unnecessarily complex. But more importantly, the
choice between async and sync is not set per filehandle, it's per
operation. It could be very common to combine the two, such as using
synchronous operations for opening and closing a filehandle, and only
using asynchronous operations for the heavy lifting of slurping in an
entire file.

But, yes, I agree with the principle of not maintaining two
completely separate implementations for synchronous and asynchronous
ops. The earlier design approached that by having the synchronous ops
be asynchronous internally, but my draft suggests this be handled by
having the asynchronous ops use the synchronous ops internally.

Allison

Uri Guttman

unread,

Mar 19, 2006, 3:23:09 AM3/19/06

to Allison Randal, Leopold Toetsch, Internals List

>>>>> "AR" == Allison Randal <all...@perl.org> writes:

AR> That interface is unnecessarily complex. But more importantly, the
AR> choice between async and sync is not set per filehandle, it's per
AR> operation. It could be very common to combine the two, such as
AR> using synchronous operations for opening and closing a filehandle,
AR> and only using asynchronous operations for the heavy lifting of
AR> slurping in an entire file.

if you look at the rfc's (remember those? :) i wrote on this topic, you
will see that i proposed just that form of api (not at OO as these). the
only difference between a sync and async i/o op was the addition of a
callback argument (and an optional timeout arg). in all cases you got a
return value with either status (sync) or event handle (async). you
could combine the returns into one object as mentioned in this thread.

AR> But, yes, I agree with the principle of not maintaining two
AR> completely separate implementations for synchronous and
AR> asynchronous ops. The earlier design approached that by having the
AR> synchronous ops be asynchronous internally, but my draft suggests
AR> this be handled by having the asynchronous ops use the synchronous
AR> ops internally.

what you do is make the internal sync op be implemented by the internal
async op. all that is needed is a wrapper around the async op with a
callback that saves the results where the sync call can get them. the
sync wrapper must somehow yield (block in a thread?) until the actual
async op is done. then it can continue with the sync op and return to
the caller. this is very easy with the ideas dan was doing a while
back.

the p6 level sync call is broken up into a couple (or more) parrot op
codes. the first sets up the i/o op and it is always async. it the p6
call was async then it returns to p6 and the usual callbacks will
work. if the p6 was sync then the next parrot op would be a wait on i/o
thing. it would block the thread until the i/o was completed. this could
be a wait for event or from the underlying async i/o system.

yes it is a little handwaving but the idea is that you can have a single
p6 i/o api which supports sync and async behavior and a single
implementation which only really does async and it does sync with
special i/o wait ops.

and this goes way back to RT-11 on the pdp-11. it had three forms of i/o
calls, READ (async), READC (read with completion routine - what they
named callbacks) and READW (sync read). they also had a WAIT op that was
handle specific. READW was probably implemented as a READ followed by a
WAIT.

so there is nothing new under the sun or under the hood of p6. just go
with the tried and true way to handle sync and async i/o.

thanx,

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Nicholas Clark

unread,

Mar 19, 2006, 2:05:18 PM3/19/06

to Allison Randal, Leopold Toetsch, Internals List

On Sat, Mar 18, 2006 at 11:01:34PM -0800, Allison Randal wrote:

> But, yes, I agree with the principle of not maintaining two
> completely separate implementations for synchronous and asynchronous
> ops. The earlier design approached that by having the synchronous ops
> be asynchronous internally, but my draft suggests this be handled by
> having the asynchronous ops use the synchronous ops internally.

Is the choice of implementation actually visible to a user of the API?
If "yes", where, and can we avoid it? At which point we have flexibility in
how things are actually implemented.

I can see that emulating asynchronous ops with synchronous ops and POSIX
threads is portable, whereas a platform specific AIO implementation with
synchronous ops implemented atop it is likely to offer higher performance.
If the API is strong enough, would it let us just write the first
implementation, and allow people to write platform specific optimisations
later?

Nicholas Clark

Larry Wall

unread,

Mar 20, 2006, 12:09:07 PM3/20/06

to Internals List

On Sat, Mar 18, 2006 at 11:01:34PM -0800, Allison Randal wrote:

: But, yes, I agree with the principle of not maintaining two

: completely separate implementations for synchronous and asynchronous
: ops. The earlier design approached that by having the synchronous ops
: be asynchronous internally, but my draft suggests this be handled by
: having the asynchronous ops use the synchronous ops internally.

Having written one of the latter during my tenure at JPL, I can
personally attest that writing async on top of sync is the pits when
you're also trying to emulate multithreading, since a "yield" means
you end up with arbitrarily deep recursion. Maybe the situation won't
arise here with proper stack-switching semantics, but I'm not sure
all the architectures you might want to port Parrot to will support
threads natively.

That being said, if you do port to such an limited architecture,
you might end up emulating sync on top of async on top of sync, so
the argument runs both ways. I used to think that progress would
eventually guarantee that all machines support "modern" semantics,
but PDAs kind of disproved that notion, and I think we will always
be inventing limited machines at the low end of the size scale.
It is unlikely that the first nano-bots will run Linux...

I don't really have much wisdom on all of this. But hey, that's
why I'm letting other people design the threading, so they can take
the blame. :-)

Larry

Allison Randal

unread,

Mar 20, 2006, 1:12:56 PM3/20/06

to Nicholas Clark, Internals List

On Mar 19, 2006, at 11:05, Nicholas Clark wrote:
>
> Is the choice of implementation actually visible to a user of the API?
> If "yes", where, and can we avoid it?

No.

> At which point we have flexibility in
> how things are actually implemented.

Yes.

> I can see that emulating asynchronous ops with synchronous ops and
> POSIX
> threads is portable, whereas a platform specific AIO implementation
> with
> synchronous ops implemented atop it is likely to offer higher
> performance.

The I/O layers implementation allows us to have multiple different
combinations. We could have sync ops + threads for one version, sync
ops on top of system AIO ops for another, sync ops as system sync ops
+ async ops as system AIO ops for another. (My guess is that using
the system sync ops for the synchronous versions will actually get
better performance than wrapping synchronous versions around system
AIO ops. Though, of course, the biggest hit in any I/O implementation
is the I/O itself.)

> If the API is strong enough, would it let us just write the first
> implementation, and allow people to write platform specific
> optimisations
> later?

Yes.

I would probably start with a sync + threads implementation, and a
layer for Linux that uses the system AIO ops for the async ops
(leaving the sync ops unchanged). That would give us portability and
a proof-of-concept on customization.

Allison

Allison Randal

unread,

Mar 20, 2006, 5:56:46 PM3/20/06

to Uri Guttman, Internals List

On Mar 19, 2006, at 0:23, Uri Guttman wrote:
>
> if you look at the rfc's (remember those? :) i wrote on this topic,
> you
> will see that i proposed just that form of api (not at OO as
> these). the
> only difference between a sync and async i/o op was the addition of a
> callback argument (and an optional timeout arg). in all cases you
> got a
> return value with either status (sync) or event handle (async). you
> could combine the returns into one object as mentioned in this thread.

Yup, it's an important contribution. And now we're entering into the
final section of that RFC, titled "Unknowns":

"Designing the internals for this will be tricky. The best way to
support all these needs is not clear."

I'll expand the section discussing Asynchronous and Synchrounous ops
with some of the alternatives. Though, I may do the first drafts of
events, threads, and exceptions PDDs first, as they're all closely
related.

Allison