This synopsis summarizes the non-existent Apocalypse 9, which
discussed in detail the design of Perl 6 data structures. It was
primarily a discussion of how the existing features of Perl 6 combine
to make it easier for the PDL folks to write numeric Perl.
=head1 Lazy lists
All list contexts are lazy by default. They still flatten eventually,
but only when forced to. You have to use unary C<**> to get a non-lazy
flattening list context (that is, to flatten immediately like Perl 5).
=head1 Sized types
Sized low-level types are named most generally by appending the number
of bits to a generic low-level type name:
int1
int2
int4
int8
int16
int32 (aka int on 32-bit machines)
int64 (aka int on 64-bit machines)
uint1 (aka bit)
uint2
uint4
uint8 (aka byte)
uint16
uint32
uint64
num32
num64 (aka num on most architectures)
num128
complex32
complex64 (aka complex on most architectures)
complex128
Complex sizes indicate the size of each C<num> component rather than
the total. This would extend to tensor typenames as well if they're
built-in types. Of course, the typical tensor structure is just
reflected in the dimensions of the array--but the principle still holds
that the name is based on the number of bits of the simple base type.
The unsized types C<int> and C<num> are based on the architecture's
normal size for C<int> and C<double> in whatever version of C the
run-time system (presumably Parrot) is compiled in. So C<int>
typically means C<int32> or C<int64>, while C<num> usually means
C<num64>, and C<complex> means two of whatever C<num> turns out to be.
You are, of course, free to use macros or type declarations to
associate additional names, such as "short" or "single". These are
not provided by default. An implementation of Perl is not required
to support 64-bit integer types or 128-bit floating-point types unless
the underlying architecture supports them.
And yes, an C<int1> can store only -1 or 0. I'm sure someone'll think of
a use for it...
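The C<int1> remark follows from two's-complement arithmetic: with one bit, the only bit is the sign bit. A minimal sketch in Python (used here only as an illustration language; C<signed_range> and C<unsigned_range> are hypothetical helper names, and two's-complement representation is an assumption) computes the range for any bit width:

```python
def signed_range(bits):
    """Range of a two's-complement signed integer with the given bit width."""
    return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)


def unsigned_range(bits):
    """Range of an unsigned integer with the given bit width."""
    return (0, (1 << bits) - 1)


int1_range = signed_range(1)     # the int1 case discussed above
uint1_range = unsigned_range(1)  # the uint1 (bit) case
```

Under that assumption C<signed_range(1)> gives the pair C<-1, 0>, while the unsigned C<uint1> holds C<0> or C<1>.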
XXX Alternately we could go with a byte count rather than a bit count.
But people seem to know whether they have a 32-bit or 64-bit processor.
If you ask whether they have a 4-byte or 8-byte processor, they have
to think about it a long time...
XXX Plus this opens the door for types like uint5. Arguably, these should
be declared as int[:range(0..31)] or some such, à la Ada.
=head1 Compact structs
A class whose attributes are all low-level types can behave as
a struct. (Access from outside the class is still only through
accessors, though.) Whether such a class is actually stored compactly
is up to the implementation, but it ought to behave that way,
at least to the extent that it's trivially easy (from the user's
perspective) to read and write to the equivalent C structure.
That is, when byte-stringified, it should look like the C struct,
even if that's not how it's actually represented inside the class.
(This is to be construed as a substitute for at least some of the
current uses of C<pack>/C<unpack>.)
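As a rough analogy for "byte-stringifies like the C struct", here is a sketch using Python's C<struct> module. The class C<Point3>, its fields, and the little-endian unpadded layout are all assumptions made for illustration; nothing here is Perl 6 API.

```python
import struct


class Point3:
    """Hypothetical struct-like class: int16 x, int16 y, num64 weight."""

    # "<" selects little-endian standard sizes with no padding (an assumption).
    _layout = struct.Struct("<hhd")

    def __init__(self, x, y, weight):
        self.x, self.y, self.weight = x, y, weight

    def to_bytes(self):
        """Byte-stringify the attributes in declaration order."""
        return self._layout.pack(self.x, self.y, self.weight)

    @classmethod
    def from_bytes(cls, raw):
        """Rebuild an object from the same byte layout."""
        return cls(*cls._layout.unpack(raw))


p = Point3(1, -2, 0.5)
raw = p.to_bytes()
q = Point3.from_bytes(raw)
```

The object itself stores ordinary attributes; only the byte-string form has to match the external layout, which is the distinction the paragraph above draws.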
=head1 Compact arrays
In declarations of the form:
my bit @bits;
my int @ints;
my num @nums;
my int4 @nybbles;
my str @buffers;
my ref[Array] @ragged2d;
my complex128 @longdoublecomplex;
the presence of a low-level type tells Perl that it is free to
implement the array with "compact storage", that is, with a chunk
of memory containing contiguous (or as contiguous as practical)
elements of the specified type without any fancy object boxing that
typically applies to undifferentiated scalars. (Perl tries really
hard to make these elements look like objects when you treat them
like objects--this is called autoboxing.)
The declarations above declare one-dimensional arrays of indeterminate
length. Such arrays are autoextending just like ordinary Perl
arrays (at the price of occasionally copying the block of data to
another memory location). For many purposes, though, it's useful to
define array types of a particular size and shape that, instead of
autoextending, throw an exception if you try to access outside their
declared dimensionality. Such arrays tend to be faster to allocate and
access as well.
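For comparison, Python's standard C<array> module gives a similar trade-off: one declared element type, unboxed contiguous storage, and autoextension. This is only an analogy, not a statement about how Perl 6 would implement compact arrays.

```python
from array import array

ints = array("i", [1, 2, 3])   # compact storage of C ints, no per-element objects
ints.append(4)                 # autoextends; the block may be copied elsewhere
raw = ints.tobytes()           # the underlying contiguous bytes

boxed = ints[0]                # reading an element hands back an ordinary int object
```

Reading an element produces a normal integer object on demand, which loosely mirrors the autoboxing described earlier.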
A multidimensional array is indexed by a semicolon list, which is really
a list of lists in disguise. Each sublist is a slice of one particular
dimension. So
@array[0..10; 42; @x]
is really short for
@array.postcircumfix:[]( <== [0..10], [42], [@x] );
though in the list of lists form, a bare number is interpreted as if
it were a list of one element, so you can also say:
@array.postcircumfix:[]( <== [0..10], 42, [@x] );
Note that at the comma level, a list such as:
@array[@x,@y]
is always interpreted as a one-dimensional slice in the outermost
dimension, which is the same as:
@array[@x,@y;]
or more verbosely:
@array.postcircumfix:[]( <== [@x,@y] );
To interpolate an array at the semicolon level rather than the comma level,
use the C<semi> list operator:
@array[semi @x; @y]
which is equivalent to
@array.postcircumfix:[]( <== @x, [@y] );
Note the difference between that and
@array[semi @x, @y]
which is the same as
@array.postcircumfix:[]( <== @x, @y );
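The three cases above can be modelled with plain lists. In this Python sketch (illustration only; C<comma_level> and C<semi_level> are hypothetical names), a subscript is a list with one entry per dimension:

```python
def comma_level(*lists):
    """Model of @array[@x,@y]: everything lands in one slice of the outermost dimension."""
    return [[item for lst in lists for item in lst]]


def semi_level(*lists):
    """Model of the semi operator: each element of each list becomes its own dimension."""
    return [item for lst in lists for item in lst]


x = [0, 1]
y = [5, 6]

one_dim = comma_level(x, y)     # [[0, 1, 5, 6]]   like @array[@x,@y]
mixed = semi_level(x) + [y]     # [0, 1, [5, 6]]   like @array[semi @x; @y]
all_dims = semi_level(x, y)     # [0, 1, 5, 6]     like @array[semi @x, @y]
```

The first form has one dimension, the second has three (the last being a slice), and the third has four.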
To declare a multidimensional array, you add a shape parameter:
my num @nums is shape(3); # one dimension, @nums[0..2]
my int @ints is shape(4;2); # two dimensions, @ints[0..3; 0..1]
The argument to a shape specification is a semicolon list, just like
the inside of a multidimensional subscript. Ranges are also allowed,
so you can pretend you're programming in Fortran, or awk:
my int @ints is shape(1..4;1..2); # two dimensions, @ints[1..4; 1..2]
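To make the bounds behaviour concrete, here is a small Python sketch of a fixed-shape array whose dimensions are index ranges. C<ShapedArray> is a hypothetical class, row-major flat storage is an assumption, and only step-1 ranges are handled.

```python
class ShapedArray:
    """Hypothetical fixed-shape array; each dimension is a range of valid indices."""

    def __init__(self, *dims, fill=0):
        self.dims = dims
        size = 1
        for d in dims:
            size *= len(d)
        self.data = [fill] * size          # flat, row-major storage

    def _offset(self, index):
        if len(index) != len(self.dims):
            raise IndexError("wrong number of subscripts")
        offset = 0
        for i, d in zip(index, self.dims):
            if i not in d:                 # outside the declared range: refuse
                raise IndexError(f"subscript {i} outside {d}")
            offset = offset * len(d) + (i - d.start)
        return offset

    def __getitem__(self, index):
        return self.data[self._offset(index)]

    def __setitem__(self, index, value):
        self.data[self._offset(index)] = value


ints = ShapedArray(range(1, 5), range(1, 3))   # like shape(1..4;1..2)
ints[4, 2] = 7
```

Accessing C<ints[0, 1]> raises an exception instead of autoextending, which is the behaviour described for shaped arrays.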
You can pass a list for the shape as well:
my int @ints is shape(semi @foo.shape);
Again, the C<semi> list operator interpolates a list into a semicolon
list, which we do for consistency with subscript notation, not because
it makes a great deal of sense to allow slices for dimensional specs
(apart from ranges). So while the following is okay:
my int @ints is shape(0,1,2,3,4); # same as 0..4
the following is a semantic error that the compiler should catch:
my int @ints is shape(3,3,3); # oops, comma instead of semicolon
The shape may be supplied entirely by the object at run-time:
my num @nums = Array of num.new(:shape(3;3;3));
my num @nums .=new():shape(3;3;3); # same thing
Any dimension of the array may be specified as C<*>, in which case
that dimension will autoextend. Typically this would be used in the
final dimension to make a ragged array functionally equivalent to an
array of arrays:
my int @ints is shape(42; *);
push(@ints[41], getsomeints());
The shape may also be specified by types rather than sizes:
my int @ints is shape(Even; Odd);
or by both:
my int @ints is shape(0..100 where Even; 1..99 where Odd);
(presuming C<Even> and C<Odd> are types already constrained to be even or odd).
=head1 PDL support
An array C<@array> can be tied to a piddle at declaration time:
my num @array is Piddle is shape(semi @mytensorshape);
my @array is Piddle(:shape(2;2;2;2)) of int8;
Piddles are allowed to assume a type of C<num> by default rather than
the usual simple scalar. (And in general, the type info is merely
made available to the "tie" implementation to do with what it will.
Some data structures may ignore the "of" type and just store everything
as general scalars. Too bad...)
Arrays by default are one dimensional, but may be declared to have any
dimensionality supported by the implementation. You may use arrays
just like scalar references--the main caveat is that you have to use
binding rather than assignment to set one without copying:
@b := @a[0...:by(2)]
With piddles in particular, this might alias each of the individual
elements rather than the array as a whole. So modifications to @b
are likely to be reflected back into @a. (But maybe the PDLers will
prefer a different notation for that.)
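The difference between copying and aliasing a strided slice can be seen in Python, where slicing an C<array> copies but slicing a C<memoryview> shares storage. This is an analogy for the binding semantics sketched above, not a description of PDL itself.

```python
from array import array

a = array("i", [0, 1, 2, 3, 4, 5])

copy = a[::2]                 # slicing the array copies the selected elements
alias = memoryview(a)[::2]    # a strided view shares storage with a

copy[1] = -1                  # does not touch a
alias[1] = 99                 # writes through to a[2]
```

After the two assignments only the write through C<alias> is visible in C<a>.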
The dimensionality of an array may be declared on the variable, but
the actual dimensionality of the array depends on how it was created.
Reconciling these views is a job for the particular array implementation.
It's not necessarily the case that the declared dimensionality must match
the actual dimensionality. It's quite possible that the array variable
is deliberately declared with a different dimensionality to provide a
different "view" on the actual value:
my int @array is Puddle is shape(2;2) .= new(:shape(4) <== 0,1,2,3);
Again, reconciling those ideas is up to the implementation, C<Puddle>
in this case. The traits system is flexible enough to pass any
metadata required, including ideas about sparseness, raggedness,
and various forms of non-rectangleness such as triangleness.
The implementation should probably carp about any metadata it doesn't
recognize though. The implementation is certainly free to reject
any object that doesn't conform to the variable's shape requirements.
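A rough Python analogue of viewing a four-element value through a 2x2 shape uses C<memoryview.cast>. The names C<flat> and C<grid> are illustrative, and the intermediate cast to bytes is needed only because that method requires one side of a cast to be a byte format.

```python
from array import array

flat = array("i", [0, 1, 2, 3])            # the actual value: shape(4)

# Reinterpret the same memory as 2x2 without copying.
grid = memoryview(flat).cast("B").cast("i", shape=[2, 2])

flat[2] = 42                               # change the underlying value
rows = grid.tolist()                       # the 2x2 view sees the change
```

Both names refer to the same storage; only the declared shape differs.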
=head1 Subscript and slice notation
A subscript indicates a "slice" of an array. Each dimension
of an array is sliced separately, so we say a subscript is a
semicolon-separated list of slice specifiers. A three-dimensional
slice might look like this:
@x[0..10; 1,0; 1...:by(2)]
It is up to the implementation of C<@x> to decide how aggressively
or lazily this subscript is evaluated, and whether the slice entails
copying. (The PDL folks will generally want it to merely produce a
virtual piddle where the new array aliases its values back into the
old one.)
Of course, a single element can be selected merely by providing a single
index value to each slice list:
@x[0;1;42]
=head1 The semicolon operator
At the statement level, a semicolon terminates the current expression.
Within any kind of bracketing construct, semicolon notionally
produces a list of lists, the interpretation of which depends on
the context. Such a semicolon list always provides list context to
each of its sublists. The following two constructs are structurally
indistinguishable:
(0..10; 1,2,4; 3)
([0..10], [1,2,4], [3])
Of course, in known contexts such as array subscripts, the compiler
is free to optimize away the actual construction of sublists where
that's unnecessary.
Single dimensional arrays expect simple slice subscripts, meaning
they will treat a list subscript as a slice in the single dimension of
the array. Multi-dimensional arrays, on the other hand, always expect
a list of slice lists, one for each dimension. You need not specify
all the dimensions; if you don't, the unspecified dimensions are
"wildcarded". Supposing you have:
my num @nums is shape(3;3;3);
Then
@nums[0..2]
is the same as
@nums[0..2;]
which is the same as
@nums[0,1,2;*;*]
But you should maybe write the last form anyway just for good
documentation, unless you don't actually know how many more dimensions
there are.
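The wildcarding rule can be sketched in Python over nested lists. C<slice_nd> is a hypothetical helper; C<None> stands in for C<*>, and a missing trailing dimension is treated the same way.

```python
def slice_nd(data, subscript):
    """Slice nested lists; `subscript` holds one index list per dimension.

    None plays the role of "*", and missing trailing dimensions are wildcarded.
    """
    if not isinstance(data, list):
        return data                          # reached a leaf element
    indices = subscript[0] if subscript else None
    rest = subscript[1:]
    if indices is None:                      # "*" or an unspecified dimension
        indices = range(len(data))
    return [slice_nd(data[i], rest) for i in indices]


# A 3x3x3 cube whose element at (i, j, k) is 9*i + 3*j + k.
nums = [[[9 * i + 3 * j + k for k in range(3)] for j in range(3)] for i in range(3)]

short_form = slice_nd(nums, [[0, 1, 2]])             # like @nums[0..2]
long_form = slice_nd(nums, [[0, 1, 2], None, None])  # like @nums[0,1,2;*;*]
single = slice_nd(nums, [[0], [1], [2]])             # like @nums[0;1;2]
```

Note that this sketch always returns nested lists, so a single element comes back wrapped once per dimension.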
If you wanted that C<0..2> range to mean
@nums[0;1;2]
instead, then you need to use that C<semi> we keep mentioning:
@nums[semi 0..2]
The zero-dimensional slice:
@x[]
is assumed to want everything, not nothing. It's particularly handy
because Perl 6 (unlike Perl 5) won't interpolate a bare array without brackets:
@x = (1,2,3);
say "@x = @x[]"; # prints @x = 1 2 3
Lists are lazy in Perl 6, and the slice lists are no exception.
In particular, things like range objects are not flattened until they
need to be, if ever. So a PDL implementation is free to steal the
values from these ranges and "piddle" around with them:
@nums[$min..$max:by(3)]
@nums[$min..$max]
@nums[$min...:by(3)]
@nums[1...:by(2)] # the odds
@nums[0...:by(2)] # the evens
That's all just the standard Perl 6 notation for ranges. Additional
syntactic relief is always available as long as it's predeclared
somehow. It's possible the range operator could be taught that C<:2>
means C<:by(2)>, for instance. (But I rather dislike the RFC-proposed
C<0:10:2> notation that makes colon mean two different things so close
together, plus it conflicts with Perl 6's general adverb notation if
the next thing is alphabetic.)
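Lazy ranges of this kind have a close analogue in Python, shown here only for comparison. The bounds 2 and 11 are arbitrary example values standing in for C<$min> and C<$max>.

```python
from itertools import count, islice

odds = count(1, 2)                    # like 1...:by(2); never materialized as a list
first_odds = list(islice(odds, 5))    # take only as many values as are needed

bounded = range(2, 12, 3)             # like $min..$max:by(3) with $min=2, $max=11
size = len(bounded)                   # known without generating the elements
```

An implementation that receives such an object can inspect its start, stop, and step directly instead of walking the values.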
XXX Another ugly possibility is to overload something looser than C<..>
(like C<+=>) to modify a range object:
0..10+=2
(always assuming that shouldn't just turn the range into C<2..12>.)
Another thing that's not going to fly easily is simply dropping out
terms. Perl depends rather heavily on knowing when it's expecting
a term or an operator, and simply leaving out terms before or after
a binary operator really screws that up. For instance,
0..:by(2)
parses as
0 .. (by => 2)
rather than
0 .. Inf :by(2)
That's why we have postfix C<...> to mean C<..Inf>. But then if you
leave out the first argument:
...:by(2)
you've written the yada-yada-yada operator, which is actually a term
that will not produce an infinite range for you. Don't do that.
Maybe you should just find some nice Unicode characters for your operators...
=head1 PDL signatures
To rewrite a Perl 5 PDL definition like this:
pp_def(
'inner',
Pars => 'a(n); b(n); [o]c(); ', # the signature, see above
Code => 'double tmp = 0;
loop(n) %{ tmp += $a() * $b(); %}
$c() = tmp;' );
you might want to write a macro that parses something vaguely
resembling this:
role PDL_stuff[type $TYPE] {
PDLsub inner (@a[$n], @b[$n]) returns(@c[]) {
my ::{$TYPE} $tmp = 0;
for 0..^$n {
$tmp += @a[$_] * @b[$_];
}
@c[] = $tmp;
}
}
where that turns into something like this:
role PDL_stuff[type $TYPE] {
multi sub inner (::{$TYPE} @a, ::{$TYPE} @b) returns(::{$TYPE}) {
my $n = +@a[*]; # or maybe $n is just a parameter
assert($n == +@b[*]); # and this is already checked by PDL
my ::{$TYPE} $tmp = 0;
for 0..^$n {
$tmp += @a[$_] * @b[$_];
}
return $tmp;
}
}
Then any class that C<does PDL_stuff[num]> has an C<inner()> function that
can (hopefully) be compiled down to a form useful to the PDL threading
engine. Presumably the macro also stores away the PDL signature
somewhere safe, since the translated code hides that information
down in procedural code. Possibly some of the C<[n]> information can
come back into the signature via C<where> constraints on the types.
This would presumably make multimethod dispatch possible on similarly
typed arrays with differing constraints.
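For reference, the computation that both the PDL definition and the expanded role describe is an ordinary inner product with a shared dimension C<n>. A plain Python rendering (no threading engine, an illustrative sketch only) looks like this:

```python
def inner(a, b):
    """Inner product of two equal-length one-dimensional sequences."""
    if len(a) != len(b):
        # Corresponds to the requirement that both arguments share dimension n.
        raise ValueError("dimension n must match for both arguments")
    tmp = 0.0
    for x, y in zip(a, b):
        tmp += x * y
    return tmp


result = inner([1, 2, 3], [4, 5, 6])
```

The length check is the part that the signature C<a(n); b(n)> expresses declaratively.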
XXX It's not clear whether C<+@a> should return the size of the entire
array or the size of the first dimension (or the scalar value of the
entire array if it's really a zero-dimensional array!). I've put
C<+@a[*]> above just in case.
(The special destruction problems of Perl 5's PDL should go away with
Perl 6's GC approach, as long as PDL's objects are registered with Parrot
correctly.)
=head1 Junctions
A junction is a superposition of data values pretending to be a single
data value. Junctions come in four varieties:
    list op     infix op
    =======     ========
    any()       |
    all()       &
    one()       ^
    none()      (no "nor" op defined)
Note that the infix ops are "list-associative", insofar as
$a | $b | $c
$a & $b & $c
$a ^ $b ^ $c
mean
any($a,$b,$c)
all($a,$b,$c)
one($a,$b,$c)
rather than
any(any($a,$b),$c)
all(all($a,$b),$c)
one(one($a,$b),$c)
Some contexts, such as boolean contexts, have special rules for dealing
with junctions. In any scalar context not expecting a junction of
values, a junction produces automatic parallelization of the algorithm.
In particular, if a junction is used as an argument to any routine
(operator, closure, method, etc.), and the scalar parameter you
are attempting to bind the argument to is inconsistent with the
Junction type, that routine is "autothreaded", meaning the routine
will be called automatically as many times as necessary to process
the individual scalar elements of the junction in parallel.
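A toy model may help separate the two behaviours: collapsing a junction in boolean context and autothreading a routine over its values. The Python class C<Junction> and the function C<autothread> below are hypothetical stand-ins that run sequentially, not a real library or the Perl 6 implementation.

```python
class Junction:
    """Hypothetical stand-in for a junction of values."""

    def __init__(self, kind, values):
        self.kind = kind                  # "any", "all", "one" or "none"
        self.values = list(values)

    def collapse(self, predicate):
        """Boolean context: test every value and combine per the junction kind."""
        hits = sum(1 for v in self.values if predicate(v))
        if self.kind == "any":
            return hits >= 1
        if self.kind == "all":
            return hits == len(self.values)
        if self.kind == "one":
            return hits == 1
        if self.kind == "none":
            return hits == 0
        raise ValueError(f"unknown junction kind {self.kind!r}")


def autothread(func, junction):
    """Call a scalar routine once per value and rejoin the results."""
    return Junction(junction.kind, [func(v) for v in junction.values])


doubled = autothread(lambda v: v * 2, Junction("any", [1, 2, 3]))
has_four = doubled.collapse(lambda v: v == 4)
```

The routine passed to C<autothread> knows nothing about junctions, which is the point of autothreading.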
Junctions passed as part of a container do not cause autothreading
unless individually pulled out and used as a scalar. It follows that
junctions passed as members of a "slurpy" array or hash do not cause
autothreading on that parameter. Only individually declared parameters
may autothread. (Note that positional array and hash parameters are
in fact scalar parameters, though, so you could pass a junction of
array or hash references.)
=head1 Parallelized parameters and autothreading
Within the scope of a C<use autoindex> pragma (or equivalent, such as
C<use PDL> (maybe)), any closure that uses parameters as subscripts is also
a candidate for autothreading. For each such parameter, the compiler
supplies a default value that is a junction of all possible values that
subscript can take on. That is, if you have a closure of the form:
-> $x, $y { @foo[$x;$y] }
then the compiler adds defaults for you, something like:
-> $x = @foo.shape[0].all,
$y = @foo.shape[1].all { @foo[$x;$y] }
In the abstract (and often in the concrete), this puts an implicit
loop around the block of the closure that visits all the possible
subscript values for that dimension (unless the parameter is actually
supplied to the closure, in which case that is what is used as the
slice subscript).
So to write a typical tensor multiplication:
Cijkl = Aij * Bkl
you can just write this:
do { @c[$^i, $^j, $^k, $^l] = @a[$^i, $^j] * @b[$^k, $^l] };
or equivalently:
-> $i, $j, $k, $l { @c[$i, $j, $k, $l] = @a[$i, $j] * @b[$k, $l] }();
or even:
do -> $i, $j, $k, $l {
@c[$i, $j, $k, $l] = @a[$i, $j] * @b[$k, $l]
}
That's almost pretty.
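Written out with the implicit loops made explicit, the same tensor product looks roughly like this in Python. The function name C<outer_product> and the nested-list representation are assumptions for illustration.

```python
from itertools import product


def outer_product(a, b):
    """C[i][j][k][l] = A[i][j] * B[k][l] for rectangular nested lists."""
    ni, nj = len(a), len(a[0])
    nk, nl = len(b), len(b[0])
    c = [[[[0] * nl for _ in range(nk)] for _ in range(nj)] for _ in range(ni)]
    # One loop per subscript parameter, visiting every value it can take.
    for i, j, k, l in product(range(ni), range(nj), range(nk), range(nl)):
        c[i][j][k][l] = a[i][j] * b[k][l]
    return c


c = outer_product([[1, 2], [3, 4]], [[10, 20, 30]])
```

Each of the four loop variables corresponds to one unbound parameter of the closure.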
It is erroneous for an unbound parameter to match multiple existing array
subscripts differently. (Arrays being created don't count.)
Note that you could pass any of $i, $j, $k or $l explicitly, or prebind
them with a C<.assuming> method, in which case only the unbound parameters
autothread.
If you use an unbound array parameter as a semicolon-list interpolator
(via the C<semi> list operator), it functions as a wildcard list of
subscripts that must match the same everywhere that parameter is used.
For example,
do -> @wild { @b[semi reverse @wild] = @a[semi @wild]; };
produces an array with the dimensions reversed regardless of the
dimensionality of C<@a>.
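The wildcard example amounts to a loop over every complete index tuple. A Python sketch, assuming arrays are represented as dicts keyed by index tuples and that the shape is passed in explicitly (C<reverse_dims> is a hypothetical name):

```python
from itertools import product


def reverse_dims(a, shape):
    """Copy a into a new dict with every index tuple reversed."""
    b = {}
    for wild in product(*(range(n) for n in shape)):
        b[tuple(reversed(wild))] = a[wild]
    return b


a = {(i, j): 10 * i + j for i in range(2) for j in range(3)}
b = reverse_dims(a, (2, 3))
```

Nothing in the function depends on the number of dimensions, which matches the claim made for the C<@wild> form.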
The optimizer is, of course, free to optimize away any implicit loops
that it can figure out how to do more efficiently without changing
the semantics.
See RFC 207 for more ideas on how to use autothreading (though the syntax
proposed there is rather different).
=head1 Hashes
Everything we've said for arrays applies to hashes as well, except that
if you're going to limit the keys of one dimension of a hash, you have
to provide an explicit list of keys to that dimension of the shape,
or an equivalent range:
my num %hash is shape(«a b c d e f»; *);
my num %hash is shape('a'..'f'; *); # same thing
To declare a hash that can take any object as a key rather than
just a string, say something like:
my %hash is shape(Any);
Likewise, you can limit the keys to objects of particular types:
my Fight %hash is shape(Dog;Cat)
=head1 Autosorted hashes
The default hash iterator is a property called C<.iterator> that can be
user replaced. When the hash itself needs an iterator for C<.pairs>,
C<.keys>, C<.values>, or C<.kv>, it calls C<%hash.iterator()> to
start one. In scalar context, C<.iterator> returns an iterator object.
In list context, it returns a lazy list fed by the iterator. It must
be possible for a hash to be in more than one iterator at a time,
as long as the iterator state is stored in a lazy list.
However, there is only one implicit iterator that works in scalar context
to return the next pair.
The downside to making a hash autosort via the iterator is that you'd
have to store all the keys in sorted order, and re-sort them when the
hash changes. Alternately, the entire hash could be tied to an ISAM
implementation (not included (XXX or should it be?)).
For multidimensional hashes, the key returned by any hash iterator is
a list of keys, the size of which is the number of declared dimensions
of the hash.
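A small Python sketch shows the shape of the idea: a replaceable iterator that yields keys in sorted order, several independent iterators at once, and multidimensional keys returned as tuples. C<SortedHash> is hypothetical, and it simply sorts a snapshot on every iteration, which is exactly the cost noted above.

```python
class SortedHash(dict):
    """Hypothetical dict whose default iterator yields keys in sorted order."""

    def __iter__(self):
        # Each call builds a fresh, independent iterator over a sorted snapshot.
        return iter(sorted(super().keys()))

    def keys(self):
        return list(iter(self))

    def items(self):
        return [(k, self[k]) for k in self]


h = SortedHash()
h[("b", 1)] = 1.0      # two-dimensional keys are tuples of length two
h[("a", 2)] = 2.0
h[("a", 1)] = 3.0

it1 = iter(h)
it2 = iter(h)          # a second iterator with its own state
first = next(it1)
second = next(it1)
other_first = next(it2)
```

A tied or tree-based implementation would avoid the repeated sort; this sketch does not attempt that.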
LW> =head1 Compact structs
LW> A class whose attributes are all low-level types can behave as
LW> a struct. (Access from outside the class is still only through
LW> accessors, though.) Whether such a class is actually stored compactly
LW> is up to the implementation, but it ought to behave that way,
LW> at least to the extent that it's trivially easy (from the user's
LW> perspective) to read and write to the equivalent C structure.
LW> That is, when byte-stringified, it should look like the C struct,
LW> even if that's not how it's actually represented inside the class.
LW> (This is to be construed as a substitute for at least some of the
LW> current uses of C<pack>/C<unpack>.)
and vec. hard to get pack/unpack to access single bits with offsets or
classic c bit fields.
from inside the class (no accessor needed) will bit sized attributes be
direct access to the field (ala c)? that would mean you could use a
class to map to complex structures and in particular to i/o registers
and not have to jump through shift/mask hoops.
<the rest of S9 hurt my brane>
uri
--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
"all low-level types" or "all low-level *sized* types"?
(I'm wondering about char arrays, string and pointers.)
I presume a char[n] array within a structure could be represented
as an array of int8. That might be uncomfortable to work with.
And that a pointer would be... what? Some platforms have odd
sizing issues for pointers. Perhaps a "voidp" type is needed?
(Which would just be an intN where N is sizeof(void*)*8.)
If a "class whose attributes are all low-level types can behave as
a struct", is that class then also considered a "low-level type"?
(So other structs can be composed from it.)
I think that's important because it allows you to avoid having
to build too much into the base perl6 language. Semantics like
turning an array of int8 into a string with or without paying
attention to an embedded null, can be provided by classes that
implement specific behaviours for low level types but can still
be considered low level types themselves.
> (Access from outside the class is still only through
> accessors, though.) Whether such a class is actually stored compactly
> is up to the implementation, but it ought to behave that way,
> at least to the extent that it's trivially easy (from the user's
> perspective) to read and write to the equivalent C structure.
Is there some syntax to express if the struct is packed or
needs alignment? (Perhaps that would be needed per element.)
Certainly there's a need to be able to match whatever packing
and alignment the compiler would use.
(I'm not (yet) familiar with Parrot's ManagedStruct and UnManagedStruct
types but there's probably valuable experience there.)
> That is, when byte-stringified, it should look like the C struct,
> even if that's not how it's actually represented inside the class.
> (This is to be construed as a substitute for at least some of the
> current uses of C<pack>/C<unpack>.)
Whether elements are packed or not for byte stringification is,
perhaps, orthogonal to whether they're packed internally.
Network i/o would prefer packed elements and structure interfacing
would need aligned elements.
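The packed versus aligned distinction is easy to see with Python's C<struct> module, used here purely as an illustration. The field choice (one signed byte followed by one unsigned int) is an arbitrary example.

```python
import struct

# "<" means standard sizes with no padding: 1 byte + 4 bytes.
packed_size = struct.calcsize("<bI")

# "@" means native sizes and alignment; any padding is platform-dependent.
native_size = struct.calcsize("@bI")

packed = struct.pack("<bI", 1, 2)   # the form a network protocol would want
```

On many platforms the native form is larger because the compiler would pad after the byte, which is why both layouts need to be expressible.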
Tim.
Why am I suddenly thinking about unions ?
What is a "ref" type exactly? Is it like a pointer in C? If so, and
based on the parameterization above, I assume that there will also be
the appropriate pointer arithmetic such that if $fido is declared as a
ref[Dog] and pointed at an array of Dogs, then $fido++ will move to the
next Dog in the array. Something like this:
my Dog @pound;
my ref[Dog] $fido;
# after we've populated the @pound with Dogs ...
loop ($fido = @pound[0]; ?$fido; $fido++) {
$fido.bark();
}
Is there some other syntax to get a compact array of things? Do I
need an attribute on the array?
> the presence of a low-level type tells Perl that it is free to
> implement the array with "compact storage", that is, with a chunk
> of memory containing contiguous (or as contiguous as practical)
> elements of the specified type without any fancy object boxing that
> typically applies to undifferentiated scalars. (Perl tries really
> hard to make these elements look like objects when you treat them
> like objects--this is called autoboxing.)
Will it also try really hard to use compact storage or are the
low-level types just compiler hints? (As a first approximation, could
all of the int types be implemented as Ints with appropriate
serialization?)
> To declare a multidimensional array, you add a shape parameter:
>
> my num @nums is shape(3); # one dimension, @nums[0..2]
> my int @ints is shape(4;2); # two dimensions, @ints[0..3; 0..1]
Maybe it's just my BASIC upbringing, but "shape" doesn't seem like the
right word. Words like "dimension" and "cardinal" fit better in my
head, but I'd want them shorter and "dim" and "card" don't quite work
either ;-)
But "shape" makes me want to do something like this:
my num @a is shape('triangle');
my num @b is shape('octagon');
my num @c is shape('square');
That might make sense for triangles, but not the others (unless
I'm just suffering a failure of imagination)
"size" could even work though it's vague. Maybe even "basis" though
that's not quite right either. Or perhaps "extent"?
Anyway ...my two cents. If "shape" is carved in stone, I'll live with
it :)
> If you wanted that C<0..2> range to mean
>
> @nums[0;1;2]
>
> instead, then you need to use that C<semi> we keep mentioning:
>
> @nums[semi 0..2]
If I had
@a = (0,undef,2);
would
@nums[semi @a]
be the same as
@nums[0;*;2]
?
> XXX It's not clear whether C<+@a> should return the size of the entire
> array or the size of the first dimension (or the scalar value of the
> entire array if it's really a zero-dimensional array!). I've put
> C<+@a[*]> above just in case.
hmm
my int @a is shape(3;4;5);
+@a[*;;] == 3 (same as +@a[*], right?)
+@a[;*;] == 4 (same as +@a[;*])
+@a[;;*] == 5
BTW, could these also be made to work (or something similar)?
my int @b;
my int @a is shape(10;5;7);
@b = @a[*:by(2);;] # @b is now shape(5;5;7)
@b = @a[;1,4;] # @b is now shape(10;2;7)
@b = @a[;(*);] # @b is now shape(10;7)
@b = @a[;;;*] # @b is now shape(10;5;7;1)
@b = @a[;;;*5] # @b is now shape(10;5;7;5)
-Scott
--
Jonathan Scott Duff
du...@pobox.com
> Maybe it's just my BASIC upbringing, but "shape" doesn't seem like the
>
> right word. Words like "dimension" and "cardinal" fit better in my
> head, but I'd want them shorter and "dim" and "card" don't quite work
> either ;-)
>
> But "shape" makes me want to do something like this:
>
> my num @a is shape('triangle');
> my num @b is shape('octagon');
> my num @c is shape('square');
>
> That might make sense for triangles, but not the others (unless
> I'm just suffering a failure of imagination)
I think 'shape' fits better than 'cardinal' or 'dimension'; these things
have a 'dimension' that is distinct from their shape (e.g., C<my int @a
is shape(3;4;5);> has three dimensions). 'shape' implies the topology of
the array; though I can see how it would be easy to assume 'shape' means
'polygon'.
Think of it more like "the shape of ships, the shape of ships, the shape
of water when it drips" (_The Shape of Me and Other Things_, Dr.
Seuss). A teapot and a Möbius strip both have very different shapes with
very different features, as do a sparse array and a quaternion.
Forgive me if I come across pedantic; I'm just trying to provide some
examples of thinking in terms of 'shape' as Perl 6 defines it.
Gregory Keeney
> Anyway ...my two cents. If "shape" is carved in stone, I'll live with
> it :)
See, a stone has another shape! <grin>
It's exactly like a reference in Perl 5. Declaring a compact array of
"ref" is merely declaring that the array will only hold references
to Perl 6 data structures, and doesn't have to worry about holding
value types like int or num. It may come out to the same thing as an
ordinary array, depending on how Parrot ends up defining references
internally.
: If so, and
: based on the parameterization above, I assume that there will also be
: the appropriate pointer arithmetic such that if $fido is declared as a
: ref[Dog] and pointed at an array of Dogs, then $fido++ will move to the
: next Dog in the array. Something like this:
:
: my Dog @pound;
: my ref[Dog] $fido;
: # after we've populated the @pound with Dogs ...
: loop ($fido = @pound[0]; ?$fido; $fido++) {
: $fido.bark();
: }
I don't see any more reason to allow that in Perl 6 than in Perl 5.
: Is there some other syntax to get a compact array of things? Do I
: need an attribute on the array?
I don't know what you mean by "of things".
: > the presence of a low-level type tells Perl that it is free to
: > implement the array with "compact storage", that is, with a chunk
: > of memory containing contiguous (or as contiguous as practical)
: > elements of the specified type without any fancy object boxing that
: > typically applies to undifferentiated scalars. (Perl tries really
: > hard to make these elements look like objects when you treat them
: > like objects--this is called autoboxing.)
:
: Will it also try really hard to use compact storage or are the
: low-level types just compiler hints? (As a first approximation, could
: all of the int types be implemented as Ints with appropriate
: serialization?)
I suspect the compiler will try harder for composite types than for
scalars, but certain algorithms will run much faster if they can use
Parrot integer registers rather than PMCs, so int and num scalars
are also likely to be implemented by the Perl 6 compiler directly.
: > To declare a multidimensional array, you add a shape parameter:
: >
: > my num @nums is shape(3); # one dimension, @nums[0..2]
: > my int @ints is shape(4;2); # two dimensions, @ints[0..3; 0..1]
:
: Maybe it's just my BASIC upbringing, but "shape" doesn't seem like the
: right word. Words like "dimension" and "cardinal" fit better in my
: head, but I'd want them shorter and "dim" and "card" don't quite work
: either ;-)
:
: But "shape" makes me want to do something like this:
:
: my num @a is shape('triangle');
: my num @b is shape('octagon');
: my num @c is shape('square');
:
: That might make sense for triangles, but not the others (unless
: I'm just suffering a failure of imagination)
:
: "size" could even work though it's vague. Maybe even "basis" though
: that's not quite right either. Or perhaps "extent"?
:
: Anyway ...my two cents. If "shape" is carved in stone, I'll live with
: it :)
I picked it only because that's what the PDL folks came up with
in their series of RFCs after their own round of discussions.
That doesn't mean we can't change it if we do come up with something
better. But I rather like shape. It's short, and not easily confused
with other Perl 6 concepts.
: > If you wanted that C<0..2> range to mean
: >
: > @nums[0;1;2]
: >
: > instead, then you need to use that C<semi> we keep mentioning:
: >
: > @nums[semi 0..2]
:
: If I had
:
: @a = (0,undef,2);
:
: would
:
: @nums[semi @a]
:
: be the same as
:
: @nums[0;*;2]
:
: ?
Dunno. I suspect we can allow
@a = (0,*,2);
in the indirect form in any event. But the '*' is also still negotiable.
As is the "semi", for that matter.
: > XXX It's not clear whether C<+@a> should return the size of the entire
: > array or the size of the first dimension (or the scalar value of the
: > entire array if it's really a zero-dimensional array!). I've put
: > C<+@a[*]> above just in case.
:
: hmm
:
: my int @a is shape(3;4;5);
:
: +@a[*;;] == 3 (same as +@a[*], right?)
: +@a[;*;] == 4 (same as +@a[;*])
: +@a[;;*] == 5
:
: BTW, could these also be made to work (or something similar)?
That seems rather opaque to me. Better is
@a.shape[0] == 3
@a.shape[1] == 4
@a.shape[2] == 5
: my int @b;
: my int @a is shape(10;5;7);
: @b = @a[*:by(2);;] # @b is now shape(5;5;7)
: @b = @a[;1,4;] # @b is now shape(10;2;7)
: @b = @a[;(*);] # @b is now shape(10;7)
: @b = @a[;;;*] # @b is now shape(10;5;7;1)
: @b = @a[;;;*5] # @b is now shape(10;5;7;5)
I don't like notation that uses null slices to mean everything,
because, as you can see, you end up with long sequences of delimiters,
and I think it's psychologically more valuable to make people count the
somethings than the nothings. So I'm thinking to allow null slices
to mean "everything" only on the trailing slices out of convenience
so you can drop the trailing semicolons, especially when you don't
actually know the dimensionality. But "everything" slices in front
need to use '*'. So I'd write the above as:
my int @b;
my int @a is shape(10;5;7);
@b = @a[*:by(2)] # @b is now shape(5;5;7)
@b = @a[*;1,4] # @b is now shape(10;2;7)
@b = @a[*;(*)] # @b is now shape(10;7)
@b = @a[*;*;*;*] # @b is now shape(10;5;7;1)
@b = @a[*;*;*;*5] # @b is now shape(10;5;7;5)
I'm not sure I buy the last three of those, however. My inclination
is to make it illegal to use too many dimensions. And the
middle one doesn't make any sense to me. You can't just refuse to
slice the middle dimension, because you have multiple values, unless
you're thinking it turns into shape(10;7;5) instead. But using parens
for something like that is a no-go, since a slice expression needs
parens for whatever grouping it's going to do. * should mean the
same as (*), ((*)), (((*))), etc. for any definition of *.
Larry
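The trailing-slice rule above can be modelled outside Perl. The following is only a Python sketch of the proposal, not anything from the Synopsis: `STAR`, `result_shape`, and the `(STAR, step)` pair standing in for `*:by(step)` are all made-up names, and the sketch assumes that too many dimensions is an error, as Larry says he is inclined to make it.

```python
import math

STAR = "*"  # hypothetical stand-in for the '*' ("everything") slice


def result_shape(shape, subscript):
    """Return the shape produced by slicing an array of the given shape.

    Each subscript entry is STAR, a (STAR, step) pair modelling *:by(step),
    or an explicit list of indices. Omitted trailing entries mean "everything".
    """
    if len(subscript) > len(shape):
        raise IndexError("too many dimensions in subscript")
    out = []
    for dim, extent in enumerate(shape):
        # Trailing dimensions with no subscript are treated as '*'.
        sel = subscript[dim] if dim < len(subscript) else STAR
        if sel == STAR:
            out.append(extent)
        elif isinstance(sel, tuple) and sel[0] == STAR:
            out.append(math.ceil(extent / sel[1]))  # every step-th element
        else:
            out.append(len(sel))  # explicit index list
    return tuple(out)


print(result_shape((10, 5, 7), [(STAR, 2)]))     # models @a[*:by(2)]
print(result_shape((10, 5, 7), [STAR, [1, 4]]))  # models @a[*;1,4]
```

Leading dimensions always have to be spelled out, so the reader counts the somethings rather than the nothings; only the tail is filled in automatically.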
heh, that answers that then. I was trying to put on a "I'm a C coder and
I want to write perl as C" hat, but I guess the answer is "learn perl!"
> : Is there some other syntax to get a compact array of things? Do I
> : need an attribute on the array?
>
> I don't know what you mean by "of things".
If "my int @foo" makes a compact array of ints, is there a way to make a
compact array of Dog? (Does it even make sense?) And if so, does it look
like "my Dog @foo" or must there be some other syntax to declare it?
> I picked it only because that's what the PDL folks came up with
> in their series of RFCs after their own round of discussions.
> That doesn't mean we can't change it if we do come up with something
> better. But I rather like shape. It's short, and not easily confused
> with other Perl 6 concepts.
Works for me.
It's just "my Dog @foo", which can be stored exactly like "my
ref @foo". Maybe "my ref @foo" is redundant with "my Any @foo",
if Any is always a reference type. (Which would imply that a mere
value passed to an Any parameter would be autoboxed, which sounds
about right. When viewed as an Any, int turns into Int, num into Num,
str into Str, ref into Ref, dog into Dog, etc. :-)
Larry
I'm not intending to run this all the way into C madness. :-)
The problem with arrays, even arrays that are of fixed shape, is
that someone has to remember that shape, and in the case of a dynamic
language, that's not always going to be the compiler. So either we
have to find someplace related to the object to tuck an invisible array
header, or we force the programmer to keep track of lengths as C does.
I am presuming the former is the better approach. I really only care
if it's efficient to serialize to and from C structs, not whether the
in-memory representation is identical.
: I presume a char[n] array within a structure could be represented
: as an array of int8. That might be uncomfortable to work with.
Not a lot more uncomfortable to work with than in C. :-)
But I'm guessing the low-level C<str> type is a buffer of bytes. Probably
means we say
has str $s is bytes($n)
or
has str[n] $s
or some such.
: And that a pointer would be... what? Some platforms have odd
: sizing issues for pointers. Perhaps a "voidp" type is needed?
: (Which would just be an intN where N is sizeof(void*)*8.)
Eh, pointer? I don't see any pointers around here...but then I
haven't looked very hard on purpose...
: If a "class whose attributes are all low-level types can behave as
: a struct", is that class then also considered a "low-level type"?
: (So other structs can be composed from it.)
It should be easy to serialize.
: I think that's important because it allows you to avoid having
: to build too much into the base perl6 language. Semantics like
: turning an array of int8 into a string with or without paying
: attention to an embedded null, can be provided by classes that
: implement specific behaviours for low level types but can still
: be considered low level types themselves.
That's fine, but you have to expect the objects to keep extra info
around to manage some of the low-level types according to the desired
semantics. What this probably means is that objects might have extra
parameters that aren't expected to officially serialize/deserialize.
I don't profess to be an expert on that subject. I just want it to
be fairly efficient to interface to C and C++ libraries that return
structures. But only to the extent that it doesn't warp Perl into C.
I'm just trying to tell the implementors that they are allowed to
go as far as they reasonably can in that direction, as long as they
don't feel like they have to go unreasonably far. :-)
: > (Access from outside the class is still only through
: > accessors, though.) Whether such a class is actually stored compactly
: > is up to the implementation, but it ought to behave that way,
: > at least to the extent that it's trivially easy (from the user's
: > perspective) to read and write to the equivalent C structure.
:
: Is there some syntax to express if the struct is packed or
: needs alignment? (Perhaps that would be needed per element.)
Would be easy to specify with traits.
: Certainly there's a need to be able match whatever packing
: and alignment the compiler would use.
It only has to look like that. :-)
: (I'm not (yet) familiar with Parrot's ManagedStruct and UnManagedStruct
: types but there's probably valuable experience there.)
Quite likely.
: > That is, when byte-stringified, it should look like the C struct,
: > even if that's not how it's actually represented inside the class.
: > (This is to be construed as a substitute for at least some of the
: > current uses of C<pack>/C<unpack>.)
:
: Whether elements are packed or not for byte stringification is,
: perhaps, orthogonal to whether they're packed internally.
Well, I don't know if it's 90° in practice because of efficiency concerns,
but in the abstract that's about right.
: Network i/o would prefer packed elements and structure interfacing
: would need aligned elements.
Presumably. I expect the default would be aligned.
Larry
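For readers who want to see the packed-versus-aligned distinction concretely, here is a small Python sketch using the standard `struct` module. It is only an analogy for the byte-stringification discussed above, not Perl 6 or Parrot behaviour, and the two-field record layout is a hypothetical example.

```python
import struct

# Hypothetical C struct used only for illustration:
#   struct rec { int8_t tag; int32_t value; };
ALIGNED = "@bi"  # native byte order and native alignment, as a C compiler lays it out
PACKED = "!bi"   # network byte order, no padding between fields

aligned_bytes = struct.pack(ALIGNED, 1, 42)
packed_bytes = struct.pack(PACKED, 1, 42)

# The packed form is exactly 1 + 4 bytes; the aligned form is usually
# larger because padding is inserted before the 32-bit field.
print(struct.calcsize(ALIGNED), struct.calcsize(PACKED))

# Both byte strings round-trip to the same field values.
print(struct.unpack(ALIGNED, aligned_bytes), struct.unpack(PACKED, packed_bytes))
```

The same values come back from either image, which is the sense in which the external layout can be chosen independently of how the object is stored internally.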
> The argument to a shape specification is a semicolon list, just like
> the inside of a multidimensional subscript. Ranges are also allowed,
> so you can pretend you're programming in Fortran, or awk:
>
> my int @ints is shape(1..4;1..2); # two dimensions, @ints[1..4; 1..2]
What happens when the Pascal programmer declares
my int @ints is shape(-10..10);
Does it blow up? If not, does @ints[-1] mean the element with index -1
or the last element?
~ John Williams
No.
: If not, does @ints[-1] mean the element with index -1 or the last element?
The element with index -1. Arrays with explicit ranges don't use the
minus notation to count from the end. We probably need to come up
with some other notation for the beginning and end indexes. But it'd
be nice if that were a little shorter than:
@ints.shape[0].beg
@ints.shape[0].end
Suggestions? Maybe we just need integers with "whence" properties... :-)
Larry
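The answer above (an explicit range turns off the count-from-the-end meaning of negative subscripts) can be sketched in Python. This is an illustration of the stated semantics only; the `RangedArray` class and its members are hypothetical names, not part of any proposal in the thread.

```python
class RangedArray:
    """One-dimensional array whose valid indices are lo..hi inclusive."""

    def __init__(self, lo, hi, fill=0):
        self.lo, self.hi = lo, hi
        self._data = [fill] * (hi - lo + 1)

    def _offset(self, index):
        # No wrap-around: an index is either inside the declared range or an error.
        if not self.lo <= index <= self.hi:
            raise IndexError(f"index {index} outside {self.lo}..{self.hi}")
        return index - self.lo

    def __getitem__(self, index):
        return self._data[self._offset(index)]

    def __setitem__(self, index, value):
        self._data[self._offset(index)] = value


ints = RangedArray(-10, 10)  # models: my int @ints is shape(-10..10)
ints[-1] = 7                 # the element whose index is -1, not the last element
ints[ints.hi] = 99           # the last element has to be named via the upper bound
```

Because -1 is an ordinary index here, reaching the ends needs some other spelling, which is what the rest of the thread goes on to argue about.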
I'm a bit surprised.
If I declare
method postcircumfix:[] ($self: *@whatever);
Is $object[$yada] the same as
$object.postcircumfix:[]( $yada ); # which I would expect
or
$object.postcircumfix:[]( <== [ $yada ] ); # which surprises me
If the latter, Why?
If the former, where did the extra magic for arrays come from?
Tangential trivial thoughts:
Can I declare an alphabetic postcircumfix operator?
sub postcircumfix:ipso...facto ( $left, $inside ) {...}
Is that even usable, given that no space is allowed before the
postcircumfix operator? Or does this work:
$foo.ipso 'bar' facto;
Or maybe it has to be declared multi instead of sub for that?
~ John Williams
What jumps to my mind is that inside an array subscript could be
(sub)?context of its own. Then one could do:
@ints[.beg .. .end ; .beg + 3 .. .end];
Where the .beg and .end would relate to @ints.shape[0] or @ints.shape[1]
depending on which position it's in.
My only issue with this, and why I referred to it as a possible
subcontext, is that it's easy to conceive of someone wanting to use the
prior context to generate the subscripts for an array.
The other idea which jumps to mind was to create an operator which tells
the compiler to treat the following number as if it referred to a
0-based (sub)array, and negatives count off the end. What springs to my
mind is that we are saying this is the "relative to start / end" of the
range, so I will call it Δ (capital delta).
@ints[ Δ0 .. Δ-1 ; Δ3 .. Δ-1 ];
It at least looks nice.
-- Rod
Well, that's just how postcircumfix operators are defined. We have to pick
some way for the arguments to come in for any operator type we define,
and this is just how those work.
Defining postcircumfix operators this way allows us to use semicolon
slices in hashes and even sub-like calls:
method postcircumfix:() ($self: *@args) {...}
$object($foo ; $bar);
For multiple argument lists. Coderefs don't do this by default (they
don't accept that notation), but other forms of $object() might.
BTW, the latter is the same as:
$object.postcircumfix:[]( [$yada] );
Since it has no positional section.
> Tangential trivial thoughts:
>
> Can I declare an alphabetic postcircumfix operator?
>
> sub postcircumfix:ipso...facto ( $left, $inside ) {...}
>
> Is that even usable, given that no space is allowed before the
> postcircumfix operator? Or does this work:
>
> $foo.ipso 'bar' facto;
Yep, that's fine, if very weird looking.
Luke
Actually, I'd go for something a little bit longer in the first case:
@ints.shape[0].begin
'beg' is what you do when you're down on your luck; it oughtn't be how you
start out.
As for shortening things, a bit of dwimmery might be in order: if the
array in question only has one dimension, the [0] might be made optional;
and when you're using the postcircumfix:[] index operator, you might have
each expression evaluated in the context of the appropriate dimension:
@ints[.begin + 2, .end - 3]
would be equivalent to
@ints[@ints.shape[0].begin + 2, @ints.shape[1].end - 3]
I wonder if this notion of contextualizing a method's signature could be
generalized... I could see a case for treating most methods as if the
expressions in each parameter were being evaluated within the caller's
class:
scratch $jack: .back
would be equivalent to
$jack.scratch($jack.back)
This isn't quite the same thing as the index operator given above:
@ints[.begin + 2, .end - 3]
would be equivalent to
@ints[@ints.begin + 2, @ints.end - 3]
by this logic, which isn't what we want. We'd either have to make this a
special case or say:
@ints[.shape[0].begin + 2, .shape[1].end - 3]
...which would be unnecessarily bulky. OTOH, it would be annoying if we
had to say
@ints[@ints.shape[1].begin, @ints.shape[0].begin]
if we wanted to access the index whose first dimension corresponds to the
lowest bound of the second dimension and vice versa - though why you'd
ever want to do something like that is beyond me. Perhaps this is
appropriate Huffman coding after all...
How to declare this? Perhaps you could do something like:
method postcircumfix:[]
($object: *@indexes is context($object.shape[$_])) {...}
where the "context" trait normally has a value of $object in a method.
The above is an example of how the thing would be applied to the list
parameter, with the context being applied to each element in turn and $_
being set to the list's index for that element; similarly, an anonymous
named parameter might work the same way, with $_ being set to the
parameter's key. Any function that has exactly one invocant would have a
default context of that invocant for every parameter, while any function
that has anything _but_ one invocant would have _no_ default parameter
context - which wouldn't stop programmers from explicitly adding contexts
where appropriate.
=====
Jonathan "Dataweaver" Lang
Awful dotty...
: Where the .beg and .end would relate to @ints.shape[0] or @ints.shape[1]
: depending on which position it's in.
:
: My only issue with this, and why I referred to it as a possible
: subcontext, is that it's easy to conceive of someone wanting to use the
: prior context to generate the subscripts for an array.
That too...
: The other idea which jumps to mind was to create an operator which tells
: the compiler to treat the following number as if it referred to a
: 0-based (sub)array, and negatives count off the end. What springs to my
: mind is that we are saying this is the "relative to start / end" of the
: range, so I will call it Δ (capital delta).
:
: @ints[ Δ0 .. Δ-1 ; Δ3 .. Δ-1 ];
:
: It at least looks nice.
I'd prefer to keep things in the Latin-1 range though.
Here's some other screwy ideas:
@ints[A..Z]
@ints[=+0..=-1]
@ints[|+0..|-1]
@ints[\+0..\-1]
@ints[{0}..{-1}]
There's something to be said for the last one. Those are actually
closures, so it'd be perfectly natural for the subscript to delay
evaluation of those numbers until it knows the current bounds of the
current dimension. The other syntaxes would have to be recognized
and translated to @ints.shape[$dim].beg and .end. Which would
preclude them from being used outside a subscript:
@allbutone = (A..Z-1);
return @ints[@allbutone];
I suppose we could always get around that with special endpoint objects.
But closures are already special. On the other hand, that's a real
minus in the closure, so it's sort of making the same mistake as Perl 5.
Though if we went with A and Z, I suppose for consistency the
corresponding methods would want to be:
@ints.shape[$dim].A
@ints.shape[$dim].Z
And that A..Z notation has the benefit of *not* relying on the -1 hack.
(Though presumably the \+ and \- style thingies are two-character
unary operators, not real numeric operators. In which case \-0 means
the first unused entry after the array. Poor-man's push:
@ints[\-0] = $x;
Could even do the same chicanery with {+0} and {-1}, I suppose, but that
feels tacky.
I guess I still have a soft spot for the A to Z approach:
@ints[Z+1] = $x;
I wonder if that means that @ints.shape[n].AZ returns a range.
The fun thing would be getting A..Z to work for enumerated ranges of
hash keys. I thought it was kind of fun that Perl 5's restricted
hashes naturally fell out of shape("Foo", "Bar", "Baz"). I guess
that means the .. operator has to be sensitive to which subscript
it's being expanded in, though. And A+1..Z-1 would presumably be
smart enough to give you only a "Bar"...which some folks might be
happy enough with.
Larry
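The delayed-evaluation idea behind the closure form `@ints[{0}..{-1}]`, and the "special endpoint objects" that would let `A..Z-1` be built outside a subscript, can be sketched in Python. Everything named here (`resolve`, `offset`, `slice_indices`, and the `A` and `Z` callables) is a hypothetical stand-in chosen for illustration, not proposed syntax.

```python
def resolve(endpoint, lo, hi):
    """Turn a subscript endpoint into a real index for bounds lo..hi.

    Plain integers are taken literally; callables are invoked only now,
    once the bounds of the current dimension are known.
    """
    return endpoint(lo, hi) if callable(endpoint) else endpoint


# Deferred endpoints standing in for A (first index) and Z (last index).
A = lambda lo, hi: lo
Z = lambda lo, hi: hi


def offset(endpoint, delta):
    """Build a new deferred endpoint, e.g. offset(Z, -1) for Z-1."""
    return lambda lo, hi: resolve(endpoint, lo, hi) + delta


def slice_indices(start, stop, lo, hi):
    """Expand start..stop (inclusive) against the bounds of one dimension."""
    return list(range(resolve(start, lo, hi), resolve(stop, lo, hi) + 1))


# The endpoints are created with no array in sight...
allbutone = (A, offset(Z, -1))
# ...and only resolved when applied to a dimension with bounds -10..10.
print(slice_indices(*allbutone, -10, 10))
```

Because the endpoints are ordinary values until resolved, the same pair can be reused against arrays with different bounds.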
But wouldn't that mean Jack scratching my back instead of his?
I think we'd have people losing track of the current topic all the
time if we did something like that.
: This isn't quite the same thing as the index operator given above:
I think we just need something really short and unconfusing for the
commonest cases, and let people write it out longhand for the others.
Somebody needs to talk me out of using A..Z for the simple cases.
Larry
@a[ 42 ; -1 but last ]
That reads pretty well, no?
Maybe the other end isn't quite as good:
@a[ 1 but first .. -2 but last ]
Hmm. Should "-1 but last" or "0 but last" be the last element?
~ John Williams
Well, hey, it almost makes sense to go with:
@a[0 but true .. -1 but false]
:-)
I'm still thinking A is the first one and Z is the last one. Someone
talk me out of it quick.
Larry
Would we? It seems obvious to me that once you specify a subject, all
possessive objects that follow are considered to be possessed by that
subject; read the dot in the shorthand as "his", "her", "its", "their", or
"your", as appropriate:
$spot.chase .tail;
would be
"Spot: chase your tail."
This is the same way that dots operate within a method's body.
> I think we just need something really short and unconfusing for the
> commonest cases, and let people write it out longhand for the others.
> Somebody needs to talk me out of using A..Z for the simple cases.
How is A..Z different from
(A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z)?
>I'm still thinking A is the first one and Z is the last one. Someone
>talk me out of it quick.
>
I had thought about A and Z before my previous post. I dismissed it for
two reasons:
1) Using Alphas as an index for something that should be numeric can be
very confusing. Especially when one sees:
@Int[4 .. Z];
2) If A = first and Z = last, DWIM (but maybe not DWYM) would dictate
that I should be able to say:
@Int[C .. Y];
instead of:
@Int[A+2 .. Z-1];
That's why I liked the concept of an operator to modify the number
afterwards. Okay, Δ might not have been the best choice, but there are
other options. Looking at your preferred Latin-1 table, we have:
§ (§) "Use this Section of the array"
® (®) The R means "Relative to base"
Or if we want to get silly, use ¨ instead of .., and just use the
numbers as is.
If you insist on using A and Z, at least make them \A and \Z, to give a
stronger visual cue that something different is happening.
-- Rod
I think I'd prefer alpha and omega.
Or maybe turn my previous suggestion around and make first and last
special constants. Then say:
@a[ first .. last but 1 ]
Or does it have to be last but int(1) ?
~ John Williams
Some other ideas ...
^A..^Z Too confusing with $^A and $^Z ?
^A..^? Well, if control-A is the alpha, then control-?
is the natural omega :-)
^..$ Borrowing from REs
^..^-1 ^ becomes something like $[. Indices "before" it
come from the end of the array.
<..> start from the little end and move to the other
er ... little end
After typing these I can see why Larry likes A and Z so much.
@array.abs[0];
@array.abs[-1];
.abs would be an alternative interface for the array, where 0 is always
the first element.
The real index is then
( $abs_index >= 0 ?? @array.first :: @array.last + 1 ) + $abs_index
Juerd
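The index formula above translates directly into Python. This is just a transcription for checking the arithmetic; `real_index` and its parameter names are made up, with `first` and `last` playing the roles of `@array.first` and `@array.last`.

```python
def real_index(abs_index, first, last):
    """Map a zero-based (or negative, from-the-end) index to a real index."""
    base = first if abs_index >= 0 else last + 1
    return base + abs_index


# For an array declared with bounds -10..10:
print(real_index(0, -10, 10))   # the first element
print(real_index(-1, -10, 10))  # the last element
```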
Why not use Cyrillic or Korean or the secret code alphabet we used in
school?
I don't like using letters for array indexes, but if they're used,
please keep it ascii :)
Juerd
> What happens when the Pascal programmer declares
> my int @ints is shape(-10..10);
Should that really all be in core? Why not let the user create his own
derived array that does what she wants?
Honestly I don't see the point why all "normal" array usage should be
slowed down just for the sake of some rare usage patterns.
> ~ John Williams
leo
>: (I'm not (yet) familiar with Parrot's ManagedStruct and UnManagedStruct
>: types but there's probably valuable experience there.)
> Quite likely.
Well, *ManagedStruct is already working pretty well. It is used to
interface with C code that returns or takes structures of some natural
types. It supports nested structures and pointers to these as well as
arrays of items and a limited set of callback functions.
See e.g. runtime/parrot/library/SDL* for a lot of such usage.
Constructing such a struct takes a list of triples:
- data type
- optional array count of item
- optional offset
Alignment is calculated automatically, if offset is zero. Access to
structure items is by index or by name, if the initializer list
provides named items.
Under the hood an exact image of the C structure is created.
> Larry
leo
> On Fri, Sep 03, 2004 at 05:45:12PM -0600, John Williams wrote:
> : What happens when the Pascal programmer declares
> :
> : my int @ints is shape(-10..10);
> :
> : Does it blow up?
>
> No.
>
> : If not, does @ints[-1] mean the element with index -1 or the last element?
>
> The element with index -1. Arrays with explicit ranges don't use the
> minus notation to count from the end.
If '*' is already going to be a special term when expecting an array
index, I could imagine using it also as a term for the end of the
array -- which end depending on whether it is used in addition or
subtraction.
my int @ints is shape(-10..10;0..3);
@ints[*+0;*-0] =:= @ints[-10;3];
(Though I cannot see any reason why I would want to use something
like that, I am sure someone somewhere would find it useful ...)
> We probably need to come up with some other notation for the
> beginning and end indexes. But it'd be nice if that were a little
> shorter than:
>
> @ints.shape[0].beg
> @ints.shape[0].end
>
> Suggestions? Maybe we just need integers with "whence" properties... :-)
A _little_ shorter, okay? If @ints.shape[0] is a range, then:
my int @ints is shape(-10..10;0..3);
@ints.shape[0;0] == -10;
@ints.shape[0;-1] == 10;
Assuming shape has the proper shape, that is. :-)
And, in scalar context, it would still return the size.
my int @ints is shape(-10..10;0..3);
@ints.shape[0] == 21;
... now, I don't know if I mean that seriously. You be the judge. :-)
Eirik
--
-------- Rule of Feline Frustration: -------* I bet you never noticed that
When your cat has fallen asleep on your lap | "homeowner" has the word
and looks utterly content and adorable, you | "meow" in the middle of it.
will suddenly have to go to the bathroom. *------------- P. McGraw ----
I think "@ints[-11]" is the obvious choice!
Also, it might be a decent default to have array parameters (or any
bindings) automatically readjust their indices to [0..$#array] unless
explicitly declared otherwise:
sub f1(@a) { @a[0] }
sub f2(@a is shape(:natural)) { @a[0] }
sub f3(@a is shape(-2..(-2 + @a.len))) { @a[0] }
my @array is shape(-1..1) = 1..3;
f1(@array); # ==> 1
f2(@array); # ==> 2
f3(@array); # ==> 3
That way, any code using non-zero-based indices would be clearly
marked as such, which seems prudent -- I know I don't use "$[" where
appropriate, and usually assume that "$#x + 1 == @x".
/s
Just think of all the trouble it would cause in the summaries:
'Meanwhile, in perl6-language, there was much discussion about Z.
Actually, most of the discussion was about Z, but there's no chance I'm
using the American pronunciation. I was born pronouncing it "Z", and
I'll die pronouncing it "Z", "Z" be damned.'
Besides, this is Perl, and there should be a classical solution, a
literary solution, a solution that makes other languages look on
uncomfortably because they can't quite decide whether it's too crazy for
words or whether they've been left in the dust once again.
The actual issue is how to distinguish cardinal numbers from ordinals,
right? So if we want ordinal numbers, why not use ordinals?
say "From the home office in Moose Jaw, Saskatchewan:";
say @top_AZ_alternatives[1st .. 10th];
So the first element is 1st (or 1th), and the last is -1st. Or maybe 0th
is the first? No, that's silly, 1st should be first. 0th could be the
element before the first, and I suppose -0th means after the last. (If
you read from the 0th/-0th element of an array you presumably get undef,
and you could write to it to unshift/push.)
We already have ordinals for grammars, so I'm sure we could make 'em work
here. (Maybe "nth()" is an operator that constructs ordinal-objects?
(I kind of want a "th" suffix operator so I can do ($n)th. Although that
doesn't really lend itself to counting from the end, like the supposed
"-nth" operator, unless you can do something like "($n)th but backwards"
... eh, which may not be worth it.))
I actually found things I liked in pretty much all the suggested
alternatives, but none of them reached out and grabbed me by the throat
the way "nth" did. It just seems more Perlish.
- David "just reali[sz]ed that in England @floor[1st] is
the second, but is hoping nobody else notices" Green
That's sort of what we're doing here. Normal arrays have no shape
and are 0-based.
: Honestly I don't see the point why all "normal" array usage should be
: slowed down just for the sake of some rare usage patterns.
Does it have to? Couldn't it have a different vtable? (Which argues
that such a shape ought to be considered a parameterization of the
implementation type. (Which argues that it should be "of shape"
rather than "is shape"...).)
Larry
Yow. Presumably "nth" without an argument would mean the last. So
@ints[1st..nth]
means
@ints[*]
And if it's a unary with an optional arg, we can write
@ints[nth+1..nth-1]
Hmm, that's not quite right. My wife looks over my shoulder and
says "What about zth?" Not knowing the history of A..Z already...
But yeah, nth is pretty grabby.
Larry
How about keywords C<lo> and C<hi>?
>
> Larry
> In article <2004090402...@wall.org>,
> la...@wall.org (Larry Wall) wrote:
> >I'm still thinking A is the first one and Z is the last one. Someone
> >talk me out of it quick.
>
> The actual issue is how to distinguish cardinal numbers from ordinals,
> right? So if we want ordinal numbers, why not use ordinals?
>
> say "From the home office in Moose Jaw, Saskatchewan:";
> say @top_AZ_alternatives[1st .. 10th];
>
> So the first element is 1st (or 1th), and the last is -1st. Or maybe 0th
> is the first? No, that's silly, 1st should be first. 0th could be the
> element before the first, and I suppose -0th means after the last. (If
> you read from the 0th/-0th element of an array you presumably get undef,
> and you could write to it to unshift/push.)
Very close. I really like counting from the front using a postfix operator
"th" (you'd also want "st" and "nd" to be synonymous, and maybe a special
"first") but it seems to me trying to count backwards in that scheme just
doesn't quite work. That last element isn't "-1st", that's (or at least
"(-1)st", depending on precedence) the one before the 0th, which is the one
before the 1st. If you want to count from the end, why not go all the way and
use "last", "last-1", "last-2", etc. "last+1" would be the first element past
the current bounds, so "push @foo, $bar" would be the same as
"@foo[last+1]=$bar". Of course, getting any of this to actually work seems
tricky, but doesn't seem any harder than A or Z, and would read extremely well.
> We already have ordinals for grammars, so I'm sure we could make 'em work
> here. (Maybe "nth()" is an operator that constructs ordinal-objects?
> (I kind of want a "th" suffix operator so I can do ($n)th. Although that
> doesn't really lend itself to counting from the end, like the supposed
> "-nth" operator, unless you can do something like "($n)th but backwards"
> ... eh, which may not be worth it.))
I don't see any reason why "st", "nd", and "th" couldn't be postfix operators
instead of prefix.
--
Adam Lopresto
http://cec.wustl.edu/~adam/
"The prince wants your daughter for his wife."
"Well, tell him his wife can't have her."
- Blackadder III
On Sat, Sep 04, 2004 at 08:48:54AM -0700, Larry Wall wrote:
: Does it have to? Couldn't it have a different vtable?
Though I can see where it might slow down some optimizations.
Larry
I meant the actual words "alpha" and "omega", because they're like A and Z
but with some extra religious connotation of first and last.
st (nd, rd) as a cardinal postfix operator sounds good.
To keep 1st+$n a Cardinal, we could define
multi operator:+ (Cardinal $a, Int $b) is commutative {...}
Is nth a magic function that returns the cardinal size of the
dimension it is in?
~ John Williams
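The commutative `+` suggested above has a close analogue in Python's reflected operators. The sketch below is an assumption-laden illustration, not Perl 6: the class is called `Cardinal` only because that is the type name used in the message, and its fields and methods are invented for the example.

```python
class Cardinal:
    """A position such as 1st or 2nd, kept distinct from a plain integer."""

    def __init__(self, n):
        self.n = n

    def __add__(self, other):
        # Adding an integer offset keeps the result a position.
        if isinstance(other, int):
            return Cardinal(self.n + other)
        return NotImplemented

    # Plays the role of "is commutative": Int + Cardinal works as well.
    __radd__ = __add__

    def __eq__(self, other):
        return isinstance(other, Cardinal) and self.n == other.n

    def __hash__(self):
        return hash(("Cardinal", self.n))

    def __repr__(self):
        return f"Cardinal({self.n})"


first = Cardinal(1)
print(first + 2, 2 + first)  # both stay positions rather than becoming ints
```

Keeping the result typed is what lets a subscript tell `1st+$n` apart from a plain integer index.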
While we're here, I think perl should understand ordinals
(http://mathworld.wolfram.com/OrdinalNumber.html), too. The syntax is
quite ready for it:
$w = +0...;
$w1 = $w + 1;
assert($w1 > $w);
sub wn($n) { $n ?? wn($n-1)+1 :: $w }
$w2 = 0... + wn«0...;
assert($w2 == $w*2);
Just think of the possibilities! :-)
Seriously though, putting 1st, 2nd, nth, etc. in the language is somehow
very appealing. It makes my heartburn about m:1st// settle down quite a
bit, too.
Luke
If it means the last, why not just use C<last>? It's only one character
longer, and its meaning can't be mistaken or confused. So
@ints[1st..last]
means
@ints[*]
There is the question about what C<last+1> would mean; intuitively, moving
forward from the end takes you into limbo; but I can see an advantage in
wrapping things around such that last+1 == 1st, and 1st-1 == last. You
could even go so far as to allow something like postfix:'th for whole
number scalars, so that $n'th == last + $n. This would imply that 0th ==
last, that -1st == last - 1 (or "next to last"), and that -3rd == last - 3
(or "third from last"). While the idea that 0th == last is
counterintuitive to me, the idea that negative ordinals correspond to
offsets from the last element seems rather nice. All in all, this
salvages the perl5 $ints[-1] notion, albeit requiring a +1 adjustment - or
this could be translated as @ints[1st-1] (a bit bulkier, but more
legible).
Would we be able to extend this notion? That is, would the following be
viable:
@ints.shape[0].1st
# returns the index that corresponds to 1st for dimension zero
Regardless, the following _would_ be viable:
@ints.shape[1st]
# returns information that corresponds to the first dimension
And that's worth a lot.
--
Where else could ordinals come in handy? They're already being used in
REs, but I'd recommend expanding the usage there to include C<last> and
possibly C<$n'th> as well as C<1st>, C<2nd>, C<3rd>, etc. Where else?
Would there be any use for C[<STDIN>.3rd], frex?
Yeah, I was thinking something like that. And if the arg is an actual
array, maybe it returns the max dimension(s)? I think you'd get that
anyway for one dimension... in scalar context it would just be the same
as using the array itself, since it returns the number of elements:
@degree=('a'..'f');
nth @degree == nth(@degree) == nth(+@degree) == nth(6) == 6th
nth @degree == @degree.nth == 6
Hm, some ambiguity there, since 6 isn't quite the same as 6th (although
of course it only makes sense for 6th to evaluate to 6 in numeric
context). So the ".nth" method needs to return an ordinal?
But with multiple dimensions, nth would return the max for each one:
@degree is shape (5;6;7);
nth @degree != nth (+@degree)
nth @degree == @degree.nth == (5, 6, 7) # or is that (5;6;7)?
# or (5th;6th;7th)?
So @degree[nth @degree] == @degree[nth; nth; nth];
Something like that....
>And if it's a unary with an optional arg, we can write
>
> @ints[nth+1..nth-1]
>
>Hmm, that's not quite right. My wife looks over my shoulder and
>says "What about zth?" Not knowing the history of A..Z already...
=) Was that supposed to be first to last again, or second to second
last? 1st, 2nd, nth could all take an arg; but I think no arg should
mean the same as an argument of zero. Then nth+1 = 1st+0 = 2nd-1, etc.
@int[1st+0 .. nth-0] # first to last
@int[1st+1 .. nth-1] # second to second last
@int[1st+2 .. nth-2] # third to third last
@int[1st+$offset .. nth-$offset] #whateverth to whateverth last
@int[nth(1+$offset)..nth(-$offset)] #same thing
Using nth(1) to mean "first" looks better with the brackets, I guess.
(Psychologically it reads more like "the nth position is 1" than "the
nth position plus 1 more".) There's still that problem of whether 0th
should be at the beginning or the end (or both (or neither)). Having
both +0th and -0th is a nice symmetry, but it's more suited to humans
than programmers; one reason for wanting to make -1 the last position is
so that you can wrap around simply by decrementing your counter.
Whichever way we try to go, Murphy's Law predicts you'll always wish the
zero were at the other end. I think it makes more sense to start
counting with the first==nth(1) and go up to the last=nth(0)=nth:
because counting from the front is likely to happen more often than
counting from the end; and because 1st(0) gives us a symmetrical
counterpart to nth(0).
-David "nth degrees of exasperation" Green
>Larry Wall wrote:
> > Yow. Presumably "nth" without an argument would mean the last.
>
>If it means the last, why not just use C<last>?
Conflict with "last LOOP"? Hm, the context should be enough to
distinguish them, no? (Hey, maybe they can be unified somehow --
"last -1" to skip to the penultimate pass through the loop? =P)
Anyway, if we can have "last", we should also have "first" (just for
people who don't mind all the extra typing).
>There is the question about what C<last+1> would mean; intuitively, moving
>forward from the end takes you into limbo; but I can see an advantage in
>wrapping things around such that last+1 == 1st, and 1st-1 == last.
And Adam D. Lopresto wrote:
>That last element isn't "-1st", that's (or at least "(-1)st",
>depending on precedence) the one before the 0th, which is the one
>before the 1st. If you want to count from the end, why not go all
>the way and use "last", "last-1", "last-2", etc. "last+1" would be
>the first element past the current bounds, so "push @foo, $bar"
>would be the same as "@foo[last+1]=$bar".
Yes, I think counting from the "zeroth last" element is the way to go.
So the question then is, should we wrap around or not? Maybe "first" and
"last" can keep going off either end of the array, and "nth" can wrap? Hm.
>Where else could ordinals come in handy? They're already being used in
>REs, but I'd recommend expanding the usage there to include C<last> and
>possibly C<$n'th> as well as C<1st>, C<2nd>, C<3rd>, etc. Where else?
>Would there be any use for C[<STDIN>.3rd], frex?
I guess there isn't so much need for a relative index into a file...
unless maybe you've got some bad sectors and the data doesn't start
until the fifth line. =) Of course, there's a resonance with the
whole iterator thing...
-David "but that's still making my head go 'round and around [ha]" Green
No problem here, especially if C<0th> and C<last> are synonyms - that is,
make "..., -4th, -3rd, -2nd, -1st, 0th, 1st, 2nd, 3rd, 4th, ..." be the
underlying mechanism, and define C<last> and C<first> as synonyms for
C<0th> and C<1st>. I'd also leave out C<nth> as unnecessary, although an
ordinalizer function that takes an integer scalar and produces an ordinal
would be useful. I'm partial to said function being C<postfix:'th> in
order to maintain symmetry with the ordinal notation (and to let
mathematicians talk about the $i'th and $j'th elements of a vector), but
as long as there's _some_ sort of ordinalizer, I'll be happy.
> Adam D. Lopresto wrote:
> >That last element isn't "-1st", that's (or at least "(-1)st",
> >depending on precedence) the one before the 0th, which is the one
> >before the 1st. If you want to count from the end, why not go all
> >the way and use "last", "last-1", "last-2", etc. "last+1" would be
> >the first element past the current bounds, so "push @foo, $bar"
> >would be the same as "@foo[last+1]=$bar".
If C<@foo[last+1]=$bar> is equivalent to C<push @foo, $bar>, what happens
if you say C<@foo[last+2]=$bar>? While I like the notion that subtracting
from first or adding to last takes you beyond the bounds of the list, you
generally can't go more than one beyond either end, and then only to add
to it. No, I think we'd be better off having it wrap all the time, and
leaving push and unshift as is.
> sub wn($n) { $n ?? wn($n-1)+1 :: $w }
> $w2 = 0... + wn«0...;
> assert($w2 == $w*2);
>Just think of the possibilities! :-)
Hm. Needs more Unicode. =)
>Seriously though, putting 1st, 2nd, nth, etc. in the language is somehow
>very appealing. It makes my heartburn about m:1st// settle down quite a
>bit, too.
It is kind of comfortable. Which is why I think I'd like to keep the
redundant nth (if we have "first" and "last"), aka 'th (where nth($i)
and $i'th are just pre- and postfixed versions of each other).
When you're referring to an element in the middle of a list,
$four'th
just feels cleaner than "first+$four" or something.
Especially important since there's a potential ambiguity problem between
the C<$n'th> notation and the C<''> notation, whereas no such ambiguity
exists for C<nth($n)>.
> When you're referring to an element in the middle of a list,
> $four'th
> just feels cleaner than "first+$four" or something.
or something (namely, "last+$four", or "0th+$four"). :) In this case,
C<0th> would probably be clearer than C<last>.
And whenever the ambiguity isn't an issue, I'd prefer $four'th to
nth($four) as well.
=====
Jonathan "Dataweaver" Lang
Yeah, I like that.
>I'd also leave out C<nth> as unnecessary, although an
>ordinalizer function that takes an integer scalar and produces an ordinal
>would be useful. I'm partial to said function being C<postfix:'th> in
>order to maintain symmetry with the ordinal notation (and to let
>mathematicians talk about the $i'th and $j'th elements of a vector), but
>as long as there's _some_ sort of ordinalizer, I'll be happy.
I'm sold on 'th. nth might also be handy if you have some long hairy
expression to wrap it around. (I'm not too worried about the exact
syntax; I reckon I'll be happy with whatever we end up with.)
>> Adam D. Lopresto wrote:
>> >the first element past the current bounds, so "push @foo, $bar"
>> >would be the same as "@foo[last+1]=$bar".
>
>If C<@foo[last+1]=$bar> is equivalent to C<push @foo, $bar>, what happens
>if you say C<@foo[last+2]=$bar>? While I like the notion that subtracting
>from first or adding to last takes you beyond the bounds of the list, you
>generally can't go more than one beyond either end, and then only to add
>to it.
I would expect that to work like @foo[$last+2]=$bar does in Perl 5 --
adds an undef value for @foo[$last+1] and $bar after that.
I was going to suggest that ordinals wrap around and cardinals "stick
out", but that's probably just begging for subtle confusing errors.
- David "eh, just use Inf-based arrays, then wrapping
and sticking out amount to the same thing" Green
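For reference, the Perl 5 behaviour David describes (assigning past the end grows the array and fills the gap with undef) can be sketched in Python as follows. The helper name assign_at is hypothetical, and None stands in for undef; negative indices are deliberately left out of this sketch.

```python
def assign_at(items, index, value):
    """Assign to items[index], growing the list with None as Perl 5 does with undef."""
    if index < 0:
        raise IndexError("negative indices are not handled in this sketch")
    if index >= len(items):
        # Pad so that `index` becomes a valid position.
        items.extend([None] * (index - len(items) + 1))
    items[index] = value
    return items
```

Under this model "last+1" behaves like push, and "last+2" leaves one undefined element in between.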
Good point there: if perl 5 does it, we ought to think twice about
removing the capability from perl 6. With this in mind, _should_ indices
wrap; and if so, when? I think you're right about cardinals not wrapping;
if someone assigns to @ints[6] and the existing indices range 0..4, it
shouldn't end up assigning to @ints[1]. Likewise, if someone wants to
assign to two past the last element, assigning to C<last+2> seems to be
common sense; similarly with C<first-3>. The only place where it makes
sense to wrap is when you define 0th as the final element, making it
logical that 0th+1 == 1st and 1st-1 == 0th. (But what happens if you try
to access the 4th or -3rd element of a three-element list? Perhaps Perl
should complain?)
Maybe the behavior should depend on the list. For normal lists, nothing
wraps; if you define the list with C<is ring> or some such, everything
wraps.
=====
Jonathan "Dataweaver" Lang
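A rough illustration of the "is ring" idea, assuming it means that every integer index is reduced modulo the length. The class name Ring is made up for this sketch; only the trait name was floated in the thread, and nothing here reflects an actual Perl 6 implementation.

```python
class Ring(list):
    """A list whose integer indices wrap around modulo its length.

    Only plain integer indices are handled; slices and empty rings
    are outside the scope of this sketch.
    """

    def __getitem__(self, index):
        return super().__getitem__(index % len(self))

    def __setitem__(self, index, value):
        super().__setitem__(index % len(self), value)
```

With five elements, index 6 lands on position 1, which is exactly the surprise Jonathan wants ordinary arrays to avoid; a plain Python list raises IndexError for the same access.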
A different vtable implies some kind of a derived class. The question
is, if an "of shape" or "is shape" already causes a new class of
arrayish objects.
The question probably is: how much of this class code is done directly
by Parrot. But given that shapes can be fully dynamic too my gut feeling
is that some glue code will be in the Perl6 library.
> Larry
leo
Okay, I'll try to make it clear in S9 that non-0-based arrays are not
to be implemented in such a way that the base array class is slowed.
If that means they're not in 6.0.0 at all, that's okay. I'm mostly
just trying to keep the syntactic space open for them by allowing ranges
in shape parameters, which implies that the parameters are lists rather
than scalars, which implies they have to be separated by semicolons
rather than commas.
Larry
Another possibility is that .[] always forces the "normal" view of an
array as 0-based, and if you want non-0-based arrays you have to use
the .{} interface instead, on the assumption that strange subscripts
are more like hash keys than ranges of integers. Certainly if
you have a sparse array with keys like 1,2,4,8,16,32... you have a
situation that's more like a hash than an array. A sparse array
might not even give you the .[] interface. The presence of a .[]
interface might signal the ability to process contiguous ranges,
even if those ranges are offset from the .{} range.
Such an approach would have the benefit that a module could simply
treat all its arrays as 0-based .[] arrays, and if you happened to
pass a non-0-based .{} array in, it would still work, albeit with
0-based indexes instead of whatever shape the .{} interface uses.
Anyway, it's an idea that might or might not make sense. It seems
to me, though, that either you're interested in treating all the
dimensions of a subscript as non-0-based, or none of them. I think
people will rarely want to mix those in the same subscript.
On the other hand, it opens up the possibility of mixing up .[] with .{}
and getting off-by-n errors, unless a declared shape of non-0-based
turns off the .[] interface entirely.
Larry
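One way to picture the two interfaces Larry describes: the same storage is reachable through a 0-based positional view (standing in for .[]) and through the declared key range (standing in for .{}). This is only a sketch under that reading; the class and method names are invented for illustration.

```python
class OffsetArray:
    """Contiguous storage declared over first_key..last_key."""

    def __init__(self, first_key, last_key, fill=0):
        self.first_key = first_key
        self._data = [fill] * (last_key - first_key + 1)

    # Positional view: always 0-based, like the proposed .[] interface.
    def get_pos(self, i):
        return self._data[i]

    def set_pos(self, i, value):
        self._data[i] = value

    # Keyed view: uses the declared range, like the proposed .{} interface.
    def _offset(self, key):
        i = key - self.first_key
        if not 0 <= i < len(self._data):
            raise KeyError(key)
        return i

    def get_key(self, key):
        return self._data[self._offset(key)]

    def set_key(self, key, value):
        self._data[self._offset(key)] = value
```

A module that only ever uses the positional view works unchanged on an array declared over -10..10, which is the benefit claimed above; mixing the two views carelessly is where the off-by-n errors would come from.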
> Arrays with explicit ranges don't use the
> minus notation to count from the end. We probably need to come up
> with some other notation for the beginning and end indexes. But it'd
> be nice if that were a little shorter than:
>
> @ints.shape[0].beg
> @ints.shape[0].end
>
> Suggestions? Maybe we just need integers with "whence" properties... :-)
Actually, what you had in Perl 5, was essentially:
$x[-1] == reverse(@x)[0]
In Perl 6, this is actually workable because of the lazy evaluation of
reverse, so if you simply re-name reverse to "rev" as a list method for
brevity:
my int @ints is shape(-10..10);
@ints.rev[-10]; # assuming that @x.rev retains shape
And as someone pointed out:
@ints.abs.rev[0]
if you had an abs that strips away shape.
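The identity $x[-1] == reverse(@x)[0] depends on the reversal being lazy, so that no copy is made just to read one element. A small sketch of such a view, with a hypothetical class name and ignoring shapes entirely:

```python
class ReversedView:
    """Index into a sequence from the far end without copying it."""

    def __init__(self, items):
        self._items = items

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        if not 0 <= i < len(self._items):
            raise IndexError(i)
        # Position i of the view is position (length - 1 - i) of the original.
        return self._items[len(self._items) - 1 - i]
```

Because the view holds a reference rather than a copy, it keeps tracking the underlying list as elements are appended.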
All that being said, I think that this (shape) is a dangerous idea at
best. If used, it should probably specify a length ONLY:
my int @ints is shape(10)
and in that light, I think it would best be renamed to "length" or
"extent".
Specifying the origin should be left to $[... that is, left out.
--
☎ 781-324-3772
✉ a...@ajs.com
☷ http://www.ajs.com/~ajs
The Honeywell 6000 (which is still around as a machine from Bull
with a 6 in its name, I believe) was a word addressed machine.
(Words were 36 bits long and could hold 4 9-bit characters
packed into each one.) Pointers used a few high-order bits to
encode the "byte-number" within the word for languages like C
which thought that character addresses were a meaningful part of
the language. (So did old Control Data machines, come to think
of it - 60-bit words containing 6-bit and/or 12-bit characters.)
If a int1 (or int2 or nybble or other sub-addressable sized
value) is being referred to, a similar issue arises since most
machines these days have byte addressing, but do not have bit
addressing. If you can't refer directly to it, the value will
have to be extracted and re-inserted to provide "is rw" access.
--
Well, sure. Is this any more of a problem than vec() in Perl 5?
Larry
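The extract-and-reinsert step mentioned above looks roughly like this for sub-byte unsigned fields packed into a byte buffer. The class is a sketch with invented names, and the choice to store low-numbered elements in the low-order bits is arbitrary here; it is not meant to match the layout of Perl 5's vec() or of any planned Perl 6 type.

```python
class PackedUInts:
    """Fixed-size array of 1-, 2-, or 4-bit unsigned integers."""

    def __init__(self, count, bits):
        if bits not in (1, 2, 4):
            raise ValueError("bits must be 1, 2, or 4")
        self.count = count
        self.bits = bits
        self._per_byte = 8 // bits
        self._mask = (1 << bits) - 1
        self._buf = bytearray((count + self._per_byte - 1) // self._per_byte)

    def _locate(self, i):
        if not 0 <= i < self.count:
            raise IndexError(i)
        byte, slot = divmod(i, self._per_byte)
        return byte, slot * self.bits

    def get(self, i):
        byte, shift = self._locate(i)
        # Extract: shift the field down and mask off its neighbours.
        return (self._buf[byte] >> shift) & self._mask

    def set(self, i, value):
        if not 0 <= value <= self._mask:
            raise ValueError("value does not fit in the field")
        byte, shift = self._locate(i)
        # Re-insert: clear the old field, then OR in the new one.
        cleared = self._buf[byte] & ~(self._mask << shift) & 0xFF
        self._buf[byte] = cleared | (value << shift)
```

Five 2-bit values fit in two bytes, and writing one field leaves the others in the same byte untouched.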
> To declare a multidimensional array, you add a shape parameter:
>
> my num @nums is shape(3); # one dimension, @nums[0..2]
> my int @ints is shape(4;2); # two dimensions, @ints[0..3; 0..1]
Just a random thought, and probably a minor point: I know that there
are already so many keywords, but if not already taken, could C<has> be
introduced as a synonym for C<is> to be freely used where it would "fit"
better? Especially in constructs like
my num @data is Stuff has shape(4;2);
Michele
--
> I was wondering if there is someone who knows a good perl spam filter.
That's funny. Most of my spam is for v1agr4 and p3nis 3nl@rg3ment, not Perl.
- John J. Trammell in clpmisc, "Re: Perl Spam Filter" (slightly edited)
> And yes, an C<int1> can store only -1 or 0. I'm sure someone'll think of
> a use for it...
Probably OT, but I've needed something like that badly today: "working" on
a japh that turned out to require mostly golfing skills (and not that I
have many, I must admit)... well, it would have been useful to have, say,
a pack template for 2-bits unsigned integers...
Michele
--
QueryPerformanceCounter returns a 64-bit integer... but the frequency
and base change from machine to machine, and boot to boot.
- Ben Liddicott on CLPM, "Re: pack, Win32 registry & binary data"
C<has> is already used to give classes attributes...
class Thing {
has @.fingers;
has $.mysteriousness;
}
I wouldn't be surprised if it does run-time attribute addition to
objects as well, but I can't recall seeing that anywhere and haven't got
time to look right now. Even if it doesn't, it would, I think, be a
mistake to overload it in this way.
As an array index -1 and 0 give you the 2 ends. The perl5
code to alternately extract elements from the two ends of an
array can be something like:
my $end = 0; # -1 to start with right end
while( @array ) {
my $next = splice( @array, $end, 1 );
# use the $next element
$end = -1 - $end;
}
Using int1 for $end, that last line can be changed in a variety
of ways, such as:
$end ^= 1;
(except that the p5 ^= operator is written differently in p6)
This is not a *good* use of int1 though. :-)
--
Well, if you're golfing:
$end++;
That is, if the low-level types wrap around rather than promote.
Luke
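To make the golfing trick concrete, here is a sketch that simulates a 1-bit signed integer (values 0 and -1 only) and uses it to alternate between the two ends of a list. The helper names are invented, and the wrap-around behaviour is the assumption Luke states, not settled Perl 6 semantics.

```python
def wrap_int1(value):
    """Reduce an integer to a 1-bit two's-complement value: 0 or -1."""
    return -(value & 1)


def alternate_ends(items, start_end=0):
    """Take elements alternately from the left (0) and right (-1) ends."""
    remaining = list(items)
    end = start_end
    taken = []
    while remaining:
        taken.append(remaining.pop(end))
        # Equivalent of `$end++` on a wrapping int1: 0 -> -1 -> 0 -> ...
        end = wrap_int1(end + 1)
    return taken
```

Both `$end ^= 1` and `$end++` reduce to the same toggle once the result is truncated to one bit.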
> On Thu, 2 Sep 2004, Larry Wall wrote:
>
> > To declare a multidimensional array, you add a shape parameter:
> >
> > my num @nums is shape(3); # one dimension, @nums[0..2]
> > my int @ints is shape(4;2); # two dimensions, @ints[0..3; 0..1]
>
> Just a random thought, and probably a minor point: I know that there
> are already so many keywords, but if not already taken, could C<has> be
> introduced as a synonym for C<is> to be freely used where it would "fit"
> better? Especially in constructs like
>
> my num @data is Stuff has shape(4;2);
I also think C<is shape> reads awkwardly. Would making it C<is shaped>
be any better?
my num @data is Stuff is shaped(4;2);
Smylers
>Another possibility is that .[] always forces the "normal" view of an
>array as 0-based, and if you want non-0-based arrays you have to use
>the .{} interface instead, on the assumption that strange subscripts
>are more like hash keys than ranges of integers.
That's true. But it's got me thinking about the connection between
arrays and "associative" arrays. In fact, the user doesn't need to know
that a "hash" is implemented with a hash table, and an "array" isn't;
and nothing stops you from using numbers as hash keys. And if you
restrict your "hash" to numeric keys, Perl could notice and optimise it
into an array. (Or integer keys, or positive integers, or a consecutive
range of positive ints....)
If we consider a generic "data structure" type (which may or may not be
optimised under the hood for integral indices), then why shouldn't {} be
the "index-by-name" interface, and [] the "index-by-ordinal" interface?
(Does that mean [$x] is just a shorthand for {nth($x)}?)
(Will P6 arrays/hashes have an ".index" method to return what index they
are? Or .index for the ordinal and .key for the name. (Although .key
is perhaps a bit too close to .keys when the member is itself a hash.))
Of course, there is one important difference between arrays and hashes:
arrays are ordered. People do keep asking about ordered hashes, though.
There's no reason you couldn't use ordinals with a hash anyway (the
"order" may not be particularly meaningful, but sometimes you don't care
-- of course, in those cases you'd usually iterate through it, but it
could be handy to be able to say %h[rand(last)] (except that isn't
really quite how you'd say it, but you get the idea)).
Data structures might have default sorting by key (since that's what
arrays have), but you could sort other ways... maybe nth takes a "by"
adverb: "first"="first :by key", vs. "first :by value". Hm, it probably
should take a closure (along the lines of "sort").
Anyway, I'm rambling, but there's something to the idea of Perl offering
some sort of generic "data structure" type...
"my $hash is Struct;" would be the most general, no restrictions on
keys, and no ordering, so Perl is free to use a hash table without
worrying about the order.
"my $array is Struct(keytype=>PosInt, ordering=>keywise)" has keys
restricted to ints, and iterates in order by index, i.e. it's
implemented as an ordinary array (where [$n] and {$n} refer to the same
element).
"Hash" and "Array" types would be shorthand for those kinds of Structs,
but you could define your own by providing suitable shapes and
orderings. (Hm, since Hashes are the most general, they're probably
actually the base type, rather than "Struct", which does sound kinda
silly, and probably sounds sillier if you're not used to C.)
I was also going to suggest an in-between type of structure, like a
Collection in VB, that accepts anything for the key (or some useful
restricted type?) but is ordered (in order of when elements were added).
But I can't think of any character available for the sigil. =)
- David "making a hash of things" Green
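The in-between structure described above (arbitrary keys, but ordered by insertion, with both by-name and by-ordinal access) might look something like the following. This is only a sketch: the class and method names are invented, it relies on Python dicts preserving insertion order, and ordinal lookup here is a linear walk rather than anything optimised.

```python
from itertools import islice


class OrderedStruct:
    """Mapping that can be indexed by name or by 0-based insertion position."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        self._items[key] = value

    def by_name(self, key):
        """Stand-in for the proposed {} interface."""
        return self._items[key]

    def by_ordinal(self, n):
        """Stand-in for the proposed [] interface."""
        if n < 0:
            raise IndexError(n)
        try:
            return next(islice(self._items.values(), n, None))
        except StopIteration:
            raise IndexError(n) from None
```

Note that an integer key and an ordinal are different things here: a key of 42 can sit at position 1.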
>If we consider a generic "data structure" type (which may or may not be
>optimised under the hood for integral indices), then why shouldn't {} be
>the "index-by-name" interface, and [] the "index-by-ordinal" interface?
>(Does that mean [$x] is just a shorthand for {nth($x)}?)
Maybe not. You can use an object for a hash key -- it would be
confusing if certain objects (ordinals) got treated specially. (Don't
ask me why you'd want to use an ordinal as a hash name anyway....)
Alice could only look puzzled: she was thinking of bidirectional
iterators.
"You are ambivalent," the Knight said in an anxious tone: "let me recite
a poem to comfort you."
"Does it take many keystrokes?" Alice asked, for she had heard a good
deal of Perl poetry.
"It's long," said the Knight, "but above par. Everybody that hears it --
either it brings the tears into their eyes, xor--"
"Xor what?" said Alice, as the Knight had paused for a bit.
"Xor it doesn't, you know. I call it my %hashdock{5th}."
"Oh, it's the fifth element, is it?" Alice said, trying to keep on-topic.
"No, you don't understand," the Knight said, looking a little vexed.
"That's just what it's *called*. It really is the third one."
"Well, what *is* the poem, then?" said Alice, who by this time was
completely bewildered.
"I was coming to that," replied the Knight. "The poem really is
SCALAR(0x809e94), and the implementation's my own class."
Alice felt her migraine coming back and wished she had paid more
attention when her sister tried to teach her Python.
-David "yes, that's MUCH less confusing!!" Green
> That's true. But it's got me thinking about the connection between
> arrays and "associative" arrays. In fact, the user doesn't need to know
> that a "hash" is implemented with a hash table, and an "array" isn't;
> and nothing stops you from using numbers as hash keys.
I believe Lua treats arrays just like hashes indexed by integers. I
don't know if they optimize anything under the hood.
Yes, but...
: And if you
: restrict your "hash" to numeric keys, Perl could notice and optimise it
: into an array. (Or integer keys, or positive integers, or a consecutive
: range of positive ints....)
What exactly do you mean by "could notice"? The point about the distinction
between .[] and .{} is that it makes it very easy for the compiler to
notice (at compile time) that .[] is going to be indexed in a standard
fashion, so it can optimize the heck out of it.
: If we consider a generic "data structure" type (which may or may not be
: optimised under the hood for integral indices), then why shouldn't {} be
: the "index-by-name" interface, and [] the "index-by-ordinal" interface?
: (Does that mean [$x] is just a shorthand for {nth($x)}?)
I think you just said what I said, sort of.
: (Will P6 arrays/hashes have an ".index" method to return what index they
: are? Or .index for the ordinal and .key for the name. (Although .key
: is perhaps a bit too close to .keys when the member is itself a hash.))
You mean individual hash/array elements knowing their key/index? I think
that's a feature looking for a way to slow down the common operations.
If you want an array or hash element to remember its index/key, you should
ask for .pairs or .kv.
: Of course, there is one important difference between arrays and hashes:
: arrays are ordered. People do keep asking about ordered hashes, though.
: There's no reason you couldn't use ordinals with a hash anyway (the
: "order" may not be particularly meaningful, but sometimes you don't care
: -- of course, in those cases you'd usually iterate through it, but it
: could be handy to be able to say %h[rand(last)] (except that isn't
: really quite how you'd say it, but you get the idea)).
I expect people would mostly want to treat the hash as an ISAM, and
ask for the successor of the current entry without necessarily caring
about its index. But a hash with a real ordering could certainly
provide a .[] interface if it conforms to the usual semantics.
: Data structures might have default sorting by key (since that's what
: arrays have), but you could sort other ways... maybe nth takes a "by"
: adverb: "first"="first :by key", vs. "first :by value". Hm, it probably
: should take a closure (along the lines of "sort").
:
:
: Anyway, I'm rambling, but there's something to the idea of Perl offering
: some sort of generic "data structure" type...
That's what the hash interface is intended to provide.
: "my $hash is Struct;" would be the most general, no restrictions on
: keys, and no ordering, so Perl is free to use a hash table without
: worrying about the order.
:
: "my $array is Struct(keytype=>PosInt, ordering=>keywise)" has keys
: restricted to ints, and iterates in order by index, i.e. it's
: implemented as an ordinary array (where [$n] and {$n} refer to the same
: element).
That's why I extended the shape trait to specify hash keys.
: "Hash" and "Array" types would be shorthand for those kinds of Structs,
: but you could define your own by providing suitable shapes and
: orderings. (Hm, since Hashes are the most general, they're probably
: actually the base type, rather than "Struct", which does sound kinda
: silly, and probably sounds sillier if you're not used to C.)
Yup.
: I was also going to suggest an in-between type of structure, like a
: Collection in VB, that accepts anything for the key (or some useful
: restricted type?) but is ordered (in order of when elements were added).
: But I can't think of any character available for the sigil. =)
The sigil for that is %, with appropriate declaration.
Larry
I think that David Green was referring to the possibility that arrays that
are sparse and/or not 0-based cannot be optimized to the same extent as a
contiguous 0-based array can be, but they can _still_ be optimized to a
greater extent than a hash can be. Am I reading you right, David?
> : If we consider a generic "data structure" type (which may or may not
> : be optimised under the hood for integral indices), then why shouldn't
> : {} be the "index-by-name" interface, and [] the "index-by-ordinal"
> : interface? (Does that mean [$x] is just a shorthand for {nth($x)}?)
>
> I think you just said what I said, sort of.
In terms of the parenthetical at the end of David's statement, that would
only be true if nth(0) is the same as 1st and nth(-1) is the same as last.
But the advantage of restricting .[] to dealing with a zero-based
contiguous index is that it renders the need for C<1st, 2nd, 3rd, ...> and
C<nth()> ordinal notations moot: [0] always refers to the 1st element, [1]
always refers to the second element, [-1] always refers to the last
element, and so on. The ordinal notations could theoretically still be
kept for use in .{}, but given a choice between saying [0] and {1st} I
would pretty much always go with the former.
> : Of course, there is one important difference between arrays and
> : hashes: arrays are ordered. People do keep asking about ordered
> : hashes, though. There's no reason you couldn't use ordinals with a
> : hash anyway (the "order" may not be particularly meaningful, but
> : sometimes you don't care -- of course, in those cases you'd usually
> : iterate through it, but it could be handy to be able to say
> : %h[rand(last)] (except that isn't really quite how you'd say it, but
> : you get the idea)).
>
> I expect people would mostly want to treat the hash as an ISAM, and
> ask for the successor of the current entry without necessarily caring
> about its index.
ISAM?
> : Data structures might have default sorting by key (since that's what
> : arrays have), but you could sort other ways... maybe nth takes a "by"
> : adverb: "first"="first :by key", vs. "first :by value". Hm, it
> : probably should take a closure (along the lines of "sort").
Funny you should mention that, especially considering the (relatively)
recent discussion of revamping "sort", and noting that providing an
"ordering" for a hash would essentially be the same as providing the hash
with a "default" sorting mechanism.
=====
Jonathan "Dataweaver" Lang
Er, that's what the last section of S9 was attempting to talk about...
Larry
>Somebody needs to talk me out of using A..Z for the simple cases.
>
>Larry
>
>
The Turing programming language uses splat to stand in for the length of
the array, so in Turing *a[*-1]* means what Perl 5 programmers mean when
they say *$a[-1]*.
However, splat is already quite heavily loaded in Perl 6. So I got to
thinking of Ada's "empty box" operator, *<>*. Maybe it would be a good
stand-in for the temporary "it" that represents a dimension's length.
So *@a[<>-3..<>-1]* could be the syntax to grab the last three
elements of *@a*.
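The "empty box" idea amounts to a placeholder for the dimension's length that is only resolved when the subscript is applied. A sketch of that mechanism follows; the names FromEnd, END, and take_range are invented for illustration and say nothing about how Perl 6 would spell or implement it.

```python
class FromEnd:
    """Placeholder for 'length of this dimension', plus or minus an offset."""

    def __init__(self, offset=0):
        self.offset = offset

    def __sub__(self, n):
        return FromEnd(self.offset - n)

    def __add__(self, n):
        return FromEnd(self.offset + n)

    def resolve(self, length):
        return length + self.offset


END = FromEnd()


def take_range(items, lo, hi):
    """Inclusive range whose bounds may be plain ints or FromEnd placeholders."""
    n = len(items)
    lo_i = lo.resolve(n) if isinstance(lo, FromEnd) else lo
    hi_i = hi.resolve(n) if isinstance(hi, FromEnd) else hi
    return items[lo_i:hi_i + 1]
```

Here take_range(a, END - 3, END - 1) plays the role of @a[<>-3..<>-1] and returns the last three elements.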
That might confuse users of languages that were not
C-syntax-influenced, who think that '*<>*' means "not equal". But
surely old Modula hacks like me are in a minority in the Perl world (and
Pascal programmers would never do Perl, would they? Algol, anybody?) So
maybe I'm the only one who runs the risk of that particular confusion. :-)
'Course, I don't pretend to understand all the possible existing
meanings that '*<*' and '*>*' already have in Perl 6, either.
=thom
Q. How many Malkieri does it take to screw in a light bulb?
A. Well, it better not be more than one.
> If a int1 (or int2 or nybble or other sub-addressable sized value)
> is being referred to, a similar issue arises since most machines
> these days have byte addressing, but do not have bit addressing. If
> you can't refer directly to it, the value will have to be extracted
> and re-inserted to provide "is rw" access.
I surely must be misunderstanding what you're saying... the way I
read that, you're suggesting that it will matter to Perl -- not only
to the compiler but even to user code -- how the underlying hardware
addresses its memory. I really hope that's not the case.
I thought Parrot would take care of all that fiddly platform-dependent
stuff so that Perl doesn't have to know or care about it. Perl6 code
shouldn't have to even *know* whether it's running on a big or little
(or middle) endian system, how many bits wide the BUS is, how many
bits are in an integer, whether there's a math coprocessor, whether
the instruction set is RISC or CISC, how many CPUs there are, what
kind of filesystem the underlying OS has, or whether the underlying
GUI is Win32 or Aqua or GTK or Qt. Perl6 code shouldn't have to know
that stuff *even* to call libraries written in another language; even
the compiler shouldn't have to know about it. Parrot should have a
wrapper API thingydo that makes it Just Work.
That's the point of having a VM, or such was my understanding.
I don't think I'm dreaming the impossible here, because Inform seems
to manage this stuff just fine, with either of the VMs it compiles to
(except for the parts about calling libraries written in other
languages, and having a GUI; Inform doesn't support those things).
You can write the code and compile it on a DOS system, stick the
binary on an ftp server, and J. Random Nerd can download it, and
assuming he has the appropriate version of the VM for his system, it
will run your code -- whether his system is SPARC/Solaris or Nintendo
Gameboy, your code will never know the difference; as far as your code
is concerned, it's running on the z-machine. Parrot should be like
that (except that Parrot's minimum requirements for the underlying
system will have to be a little higher, because we want to support
things like disk I/O and allocating more RAM after the program starts
running).
So if the underlying hardware doesn't know how to write a single byte,
then Parrot should have workaround code for that. Perl shouldn't even
need to know about it.
--
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,"ten.thgirb\@badanoj$/ --";$\=$ ;-> ();print$/
Perhaps I'm misunderstanding things, but I thought that Perl6 should be able to *optionally* allow
such things if platform specific fine-grained code tuning is necessary. Most of the time it
shouldn't be, but if it's required that .01% of the time why force someone to reach for another
language, regardless of how easy the languages are to integrate?
Cheers,
Ovid
=====
Silence is Evil http://users.easystreet.com/ovid/philosophy/indexdecency.htm
Ovid http://www.perlmonks.org/index.pl?node_id=17000
Web Programming with Perl http://users.easystreet.com/ovid/cgi_course/
> It took us some time discussing this... we weren't sure what tense
> you were using. At first we thought it might be the past subjective,
> but after a while, we decided to coin a new tense: the vapor tense. ;-)
Actually, it's not new at all; there's already a quite established
terminology for that tense. It's called the prophetic past.
HTH.HAND.
> int1, int2, int4, int8, int16, int32, int64, uint1, uint2, uint4,
> uint8, uint16, uint32, uint64, num32, num64, num128, complex32,
> complex64, complex128, ...
Well, all that is harmless enough, as long as I don't ever have the
misfortune to inherit maintenance of any code that *uses* those
lowlevel types.
We are also getting a "holds whatever size number you put in it, up to
the limits of available system resources" type, right? Good.
> say "@x = @x[]"; # prints @x = 1 2 3
Nice. Until now I wasn't sure I liked the new interpolation rules,
but this looks good.
What about BASIC? Aren't all the little kids today raised on BASIC? :)
--Andrew
Only if their parents are evil...
I was raised on BASIC and look what happened - now I'm writing Perl Quiz
of the Week solutions in Haskell!
> What about BASIC? Aren't all the little kids today raised on BASIC? :)
I don't know about the kids _today_, but for about twenty years
starting circa 1980 most home computers came with exactly one
programming language tool, and it was BASIC -- line-number BASIC
initially and QBasic later. A lot of the programmers who cut their
teeth on BASIC never made the transition to C, because C as a language
is so primitive compared to BASIC (not in terms of absolute
capabilities or performance but in terms of the amount of abstraction
provided) that it felt like stone knives and bearskins. Perl came
along and is actually even more high-level than BASIC, and a number of
us picked it up and never looked back.
aside-->
(As for me, in between BASIC and Perl I also picked up Inform and
Emacs Lisp, which are also much higher-level languages than C. I
tried on two separate occasions to make myself learn C (plus two
_additional_ attempts at C++) before I finally realized I don't
actually *want* to maintain legacy code written in a low-level
language, anyway. I also tried Python and PHP, but they didn't
take because I kept thinking how much easier things are in Perl.)
<--backtotopic
So yeah, there are a lot of BASIC-influenced people writing Perl code.
However, I don't think using <> for something other than not-equal is
going to be a big deal. Perl5 doesn't use <> for not-equal either,
and picking up a differently-named operator or two is *NOT* the hard
part of learning a different programming language. It's the paradigm
differences that will get you, and Perl6 is going to stand in good
stead there because it supports most of the paradigms out there to one
degree or another.
> ISAM?
From the RDBMS world, a kind of index I think, or something along
those lines. MySQL for example has a type of table called MyISAM.
> Jonathan Lang <datawe...@yahoo.com> writes:
> > ISAM?
> From the RDBMS world, a kind of index I think, or something along
> those lines. MySQL for example has a type of table called MyISAM.
it predates dbms stuff. it stands for indexed sequential access method
and it means you can do random or serial access to a set of
records. cobol and pl1 support isam internally and many systems used it.
uri
--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
Indexed Sequential Access Method
Invented by IBM in the '60s, provides fast random access to single
records and then allows for sequential access to the following records
in sorted order. It is very similar to the perl 5 sorted hashes.
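The access pattern described here (jump straight to one record, then read on in key order) can be sketched with a sorted key list and binary search. This is an in-memory illustration with invented names, not a description of how IBM's ISAM or any Perl module is implemented.

```python
import bisect


class SortedIndex:
    """Keyed records with random access plus in-order scanning."""

    def __init__(self):
        self._keys = []      # kept sorted at all times
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Random access to a single record."""
        return self._values[key]

    def successor(self, key):
        """Next key in sorted order after `key`, or None at the end."""
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None

    def scan_from(self, key):
        """Yield (key, value) pairs in sorted order, starting at `key`."""
        start = bisect.bisect_left(self._keys, key)
        for k in self._keys[start:]:
            yield k, self._values[k]
```

The successor method corresponds to Larry's remark about asking for the entry after the current one without caring about its numeric index.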