
[S09] "Whatever" indices and shaped arrays


Jonathan Lang

Feb 23, 2007, 1:49:34 PM
to p6l
From S09:
"When you use * with + and -, it creates a value of Whatever but Num,
which the array subscript interpreter will interpret as the subscript
one off the end of that dimension of the array."

"Alternately, *+0 is the first element, and the subscript dwims from
the front or back depending on the sign. That would be more
symmetrical, but makes the idea of * in a subscript a little more
distant from the notion of 'all the keys', which would be a loss, and
potentially makes +* not mean the number of keys."

If '*+0' isn't the first element, then '*+$x' is only meaningful if $x < 0.

That said, I think I can do one better:

Ditch all of the above. Instead, '*' always acts like a list of all
valid indices when used in the context of postcircumfix:<[ ]>.

If you want the last index, say '*[-1]' instead of '* - 1'.
If you want the first index, say '*[0]' instead of '* + 0'.

So the four corners of a two-dimensional array would be:

@array[ *[0]; *[0] ]; @array[ *[-1]; *[0] ];
@array[ *[0]; *[-1] ]; @array[ *[-1]; *[-1] ];

The only thing lost here is that '@array[+*]' is unlikely to point
just past the end of a shaped array. But then, one of the points of
shaped arrays is that if you point at an invalid index, you get a
complaint; so I don't see why one would want to knowingly point to
one.
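
A rough sketch of the "'*' is a list of all valid indices" idea, modelled in Python because the subscript syntax above is only a proposal. The function name `corners` and the sample index lists are assumptions made for illustration; nothing here comes from S09.

```python
# Hypothetical model: each dimension of a shaped array is described by the
# list of its valid indices, which is what '*' would stand for in that slot.
def corners(valid_indices):
    """Return the index pairs of the four corners of a 2-D shaped array.

    valid_indices[d][0] plays the role of '*[0]' in dimension d and
    valid_indices[d][-1] plays the role of '*[-1]'.
    """
    rows, cols = valid_indices
    return [
        (rows[0], cols[0]), (rows[-1], cols[0]),
        (rows[0], cols[-1]), (rows[-1], cols[-1]),
    ]

# Rows are indexed 1..3 and columns -1..+1 in this made-up example.
shape = [[1, 2, 3], [-1, 0, 1]]
print(corners(shape))
```

The corners come out in the same order as the four subscripts listed above, and the code never needs to know where each dimension starts.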

--

Also, has the syntax for accessing an array's shape been determined
yet? If not, I'd like to propose the following:

@array.shape returns a list of lists, with the top-level list's
indices corresponding to the dimensions of the shape and each nested
list containing every valid index in that dimension. In boolean
context, the shape method returns true if the array is shaped and
false if not - though an unshaped array will otherwise pretend to be a
one-dimensional, zero-based, non-sparse, shaped array.

So:

@array.shape[0][2] # the third valid index of the first dimension of the shape
@array.shape[-1][0] # the first valid index of the last dimension of the shape
@array.shape[1] # every valid index of the second dimension of the shape
@array.shape[1][*] # same as @array.shape[1]

?@array.shape # is this a shaped array?

exists @array.shape[2] # does the array have a third dimension?
exists @array.shape[3][4] # does the fourth dimension have a fifth element?

+...@array.shape # how many dimensions does the shape have?
+...@array.shape[0] # how many indices does the first dimension have?

If we use this notation, then

@array[ *; * ]

is shorthand for

@array[ @array.shape[0]; @array.shape[1] ]
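
To make the proposed behaviour concrete, here is a small Python model of such a shape object. The class `Shape`, its `declared` flag and the helper `all_indices` are hypothetical names chosen for this sketch, not an existing API.

```python
from itertools import product

class Shape:
    """Hypothetical stand-in for the proposed @array.shape."""

    def __init__(self, dims, declared=True):
        # One list of valid indices per dimension.
        self.dims = [list(d) for d in dims]
        # False models an unshaped array that merely pretends to have a shape.
        self.declared = declared

    def __getitem__(self, i):
        return self.dims[i]        # shape[1] -> every valid index of dimension 1

    def __len__(self):
        return len(self.dims)      # number of dimensions

    def __bool__(self):
        return self.declared       # boolean context: is the array shaped?

def all_indices(shape):
    """Model of @array[ *; * ]: the cross product of every dimension's indices."""
    return list(product(*shape.dims))

shape = Shape([range(0, 2), [10, 20, 30]])
print(shape[1][2], shape[-1][0], len(shape), bool(shape))
print(all_indices(shape))
```

Defining `__bool__` separately from `__len__` is what lets an unshaped array report false while still answering index queries.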

--
Jonathan "Dataweaver" Lang

Larry Wall

Feb 23, 2007, 3:36:18 PM
to p6l
On Fri, Feb 23, 2007 at 10:49:34AM -0800, Jonathan Lang wrote:
: That said, I think I can do one better:
:
: Ditch all of the above. Instead, '*' always acts like a list of all
: valid indices when used in the context of postcircumfix:<[ ]>.

Ooh, shiny! Or at least, shiny on the shiny side...

: If you want the last index, say '*[-1]' instead of '* - 1'.
: If you want the first index, say '*[0]' instead of '* + 0'.

So the generic version of leaving off both ends would be *[1]..*[-2]
(ignoring that we'd probably write *[0]^..^*[-1] for that instead).

: So the four corners of a two-dimensional array would be:
:
: @array[ *[0]; *[0] ]; @array[ *[-1]; *[0] ];
: @array[ *[0]; *[-1] ]; @array[ *[-1]; *[-1] ];

A point against it visually is the nested use of [].

: The only thing lost here is that '@array[+*]' is unlikely to point
: just past the end of a shaped array. But then, one of the points of
: shaped arrays is that if you point at an invalid index, you get a
: complaint; so I don't see why one would want to knowingly point to
: one.

I would expect that to point to one off the end in the first dimension only,
which might make sense if that dimension is extensible:

my @array[*;2;2];

Adding something at +* would then add another 2x2 under it.

: Also, has the syntax for accessing an array's shape been determined
: yet? If not, I'd like to propose the following:
:
: @array.shape returns a list of lists, with the top-level list's
: indices corresponding to the dimensions of the shape and each nested
: list containing every valid index in that dimension. In boolean
: context, the shape method returns true if the array is shaped and
: false if not - though an unshaped array will otherwise pretend to be a
: one-dimensional, zero-based, non-sparse, shaped array.

That's more or less how I was thinking of it, though I hadn't got as
far as boolean context.

: So:
:
: @array.shape[0][2] # the third valid index of the first dimension of the shape
: @array.shape[-1][0] # the first valid index of the last dimension of the shape
: @array.shape[1] # every valid index of the second dimension of the shape
: @array.shape[1][*] # same as @array.shape[1]
:
: ?@array.shape # is this a shaped array?
:
: exists @array.shape[2] # does the array have a third dimension?
: exists @array.shape[3][4] # does the fourth dimension have a fifth element?
:
: +...@array.shape # how many dimensions does the shape have?
: +...@array.shape[0] # how many indices does the first dimension have?
:
: If we use this notation, then
:
: @array[ *; * ]
:
: is shorthand for
:
: @array[ @array.shape[0]; @array.shape[1] ]

Note also that multidimensional whatever gives us

@array[ ** ]

to mean

@array[ @@( @array.shape[*] ) ]

or some such. Though ** might want to be even smarter than that if
we want

@array[ 0; **; 42]

to dwim. That'd have to turn into something like:

@array[ 0; @@( @array.shape[*[1]..*[-2]] ); 42 ]

Also +** might return a shape vector, or maybe +«**.
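
One possible reading of the "smarter" `**` can be sketched in Python: `**` soaks up every dimension that is not claimed by an explicit subscript on either side. The marker `DOUBLE_STAR` and the function `expand` are hypothetical names, and this is only one interpretation of the dwimmery described above.

```python
DOUBLE_STAR = object()  # hypothetical marker standing in for '**'

def expand(subscript, shape):
    """Replace a single '**' by the index lists of the dimensions it covers.

    subscript is a list with one entry per semicolon-separated slot;
    shape is a list holding the valid indices of each dimension.
    """
    if DOUBLE_STAR not in subscript:
        return list(subscript)
    pos = subscript.index(DOUBLE_STAR)
    before = list(subscript[:pos])
    after = list(subscript[pos + 1:])
    # '**' covers everything between the explicit leading and trailing slots.
    end = len(shape) - len(after)
    covered = shape[len(before):end]
    return before + covered + after

shape = [[0, 1], [0, 1, 2], [5, 6], [40, 41, 42]]
print(expand([0, DOUBLE_STAR, 42], shape))
print(expand([DOUBLE_STAR], shape) == shape)
```

With a four-dimensional shape, `0; **; 42` expands to the explicit 0, then the full index lists of the two middle dimensions, then the explicit 42.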

Larry

Luke Palmer

Feb 23, 2007, 10:20:06 PM
to Jonathan Lang, p6l
On 2/23/07, Jonathan Lang <dataw...@gmail.com> wrote:
> I'm still debating the boolean context myself. I _think_ it will
> work; but I have a tendency to miss intricacies. You might instead
> want to require someone to explicitly check for definedness or
> existence instead of merely truth; or you might not.

I should chime in something here. It may not be practical for Perl,
given how much we have already relied on its opposite, but it is still
worth considering:

I have been extremely satisfied with Ruby's boolean truth model: nil
and false are false, everything else is true. So the empty string is
true, 0 is true, "0" is certainly true. I think it's the same reason
that I like Haskell's function call model: that is, function
application binds most tightly, everything else has various looser
precedence.

I think the nice thing about these two is their extreme simplicity.
In Haskell, when I read:

foo x ! bar

I don't need to think for a fraction of a second to associate that
correctly in my mind. Likewise, in Ruby, when I write:

while line = gets

I don't need to think for a fraction of a second about edge cases of
gets. Gets returns a string when it reads something, and nil on EOF,
that's all I need to know. And the simplicity has a way of
propagating to other areas of the language, as in that example, where
gets is able to return the most obvious thing for EOF and have it work
correctly.

So, yeah, simple rules can be a blessing if you find the right ones.
In particular, since I use boolean context a lot (i.e. without
explicit compare operators), I'm a fan of as much boolean
predictability as I can get. Even if we don't get the same simple
model of booleans as ruby, I'd like to keep the number of
boolean-context overloaded objects reasonably small. This gives
functions the freedom to return false or undef as a failure mode, when
it is convenient for it to function that way.

Luke

Jonathan Lang

Feb 24, 2007, 12:32:52 PM
to p6l
Jonathan Lang wrote:

> Larry Wall wrote:
> > : If you want the last index, say '*[-1]' instead of '* - 1'.
> > : If you want the first index, say '*[0]' instead of '* + 0'.
> >
> > So the generic version of leaving off both ends would be *[1]..*[-2]
> > (ignoring that we'd probably write *[0]^..^*[-1] for that instead).
>
> Correct - although that assumes that the indices are consecutive (as
> opposed to, say, 1, 2, 4, 8, 16...); this version of * makes no such
> assumption.

Another thought: '*[1..-2]' or '*[0^..^-1]' would do the trick here -
except for the fact that the Range 1..-2 doesn't normally make sense.
Suggestion: when dealing with Ranges in unshaped arrays, negative
endpoints are treated like negative indices (i.e., '$_ += +@array').

In effect, using * as an array of indices gives us the ordinals
notation that has been requested on occasion: '*[0]' means 'first
element', '*[1]' means 'second element', '*[-1]' means 'last element',
'*[0..2]' means 'first three elements', and so on - and this works
regardless of what the actual indices are.
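
A small Python sketch of that ordinals behaviour, using the non-consecutive indices 1, 2, 4, 8, 16 mentioned earlier. The helper name `ordinal_range` is an assumption; it applies the suggested rule that a negative endpoint has the element count added to it.

```python
def ordinal_range(keys, first, last):
    """Select keys by ordinal position, inclusive, like the proposed '*[first..last]'.

    Negative endpoints count from the end, following the '$_ += +@array' rule.
    """
    n = len(keys)
    if first < 0:
        first += n
    if last < 0:
        last += n
    return keys[first:last + 1]

keys = [1, 2, 4, 8, 16]              # the actual (sparse) indices
print(keys[0], keys[-1])             # '*[0]' and '*[-1]'
print(ordinal_range(keys, 0, 2))     # '*[0..2]': first three
print(ordinal_range(keys, 1, -2))    # '*[1..-2]': everything but the two ends
```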

> Like I said, I tend to miss intricacies. For instance, I never
> considered what would be involved in applying a subscriptor to a
> multidimensional Whatever (e.g., what can you do with '**[...]'?).
> Part of that is that I'm not yet comfortable with multidimensional
> slices (or arrays, for that matter); when reading about them, I keep
> on getting the feeling that there's something going on here that the
> big boys know about that I don't - implicit assumptions, et al.

I think I've got a better grip on it now. Here's how I understand it to work:

A multidimensional array is defined by providing a list of lists, each
giving all of the valid indices along one axis (i.e., in one
dimension). The overall shape of the array will be rectangular, or a
higher-dimensional analog of rectangular. There may be gaps in the
indices (in which case the array is a sparse array as well as a
multidimensional array); but if there are, the gaps also conform to
the rectangular structure: it's as if you carved a solid rectangle
into two or more rectangular pieces and pulled them apart a bit. That
is, @array[-1, +1; -1, +1] is effectively a 2x2 square array with valid
x-indices of -1 and +1 and valid y-indices of -1 and +1.

To access an element in a multidimensional array, use a
semicolon-delimited list of indices in the square braces:
'@cube[1;1;1]' will access the center element of a [^3;^3;^3] shaped
array, while '@cube[*;*;1]' will access a 3x3 horizontal slice of it.

When putting together a list literal, things work a bit differently.
Create a one-dimensional literal by means of a comma-delimited list of
values; create a two-dimensional literal by means of a
semicolon-delimited list of comma-delimited lists of values:

1, 2, 3 # one-dimensional list literal with a length of 3
(1, 2, 3; 4, 5, 6) # two-dimensional list literal with a length of 2 and a width of 3.
(1; 2; 3) # two-dimensional list literal with a length of 3 and a width of 1.

I would guess that you would build higher-dimensional "literals" by
nesting parentheses-enclosed semicolon-delimited lists:

(( 0, 1; 2, 3; 4, 5; 6, 7; 8, 9);
(10, 11; 12, 13; 14, 15; 16, 17; 18, 19);
(20, 21; 22, 23; 24, 25; 26, 27; 28, 29))
# three-dimensional list literal with a length of 3, a width of 5, and a height of 2.

The outermost set of semicolons delimits the first dimension, and the
commas delimit the last dimension. That is, semicolon-delimited lists
nest, and comma-delimited lists flatten.

Furthermore, the "list literal" gets assigned to the array by means of
ordinal coordinates:

my @cube[-1..+1; -1..+1; -1..+1] =
((1, 2, 3; 4, 5, 6; 7, 8, 9);
(10, 11, 12; 13, 14, 15; 16, 17, 18);
(19, 20, 21; 22, 23, 24; 25, 26, 27));

would be equivalent to

my @cube[-1..+1; -1..+1; -1..+1];
@cube[**[0; **]] = (1, 2, 3; 4, 5, 6; 7, 8, 9);
@cube[**[1; **]] = (10, 11, 12; 13, 14, 15; 16, 17, 18);
@cube[**[2; **]] = (19, 20, 21; 22, 23, 24; 25, 26, 27);

or

my @cube[-1..+1; -1..+1; -1..+1];
@cube[**[0; 0; *]] = 1, 2, 3;
@cube[**[0; 1; *]] = 4, 5, 6;
@cube[**[0; 2; *]] = 7, 8, 9;
@cube[**[1; 0; *]] = 10, 11, 12;
@cube[**[1; 1; *]] = 13, 14, 15;
@cube[**[1; 2; *]] = 16, 17, 18;
@cube[**[2; 0; *]] = 19, 20, 21;
@cube[**[2; 1; *]] = 22, 23, 24;
@cube[**[2; 2; *]] = 25, 26, 27;

or

my @cube[-1..+1; -1..+1; -1..+1];
@cube[**[0; 0; 0]] = 1;
@cube[**[0; 0; 1]] = 2;
@cube[**[0; 0; 2]] = 3;
@cube[**[0; 1; 0]] = 4;
@cube[**[0; 1; 1]] = 5;
@cube[**[0; 1; 2]] = 6;
...

where

say @cube[**[1; 1; 1]];

would be equivalent to

say @cube[0; 0; 0];
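
The ordinal-to-index mapping described here can be checked with a short Python model. It assumes the -1..+1 shape from the first declaration and stores the cube as a dictionary keyed by real index tuples; the variable names are illustrative only.

```python
from itertools import product

axes = [[-1, 0, 1]] * 3      # valid indices of each of the three dimensions
values = range(1, 28)        # 1..27, in the order the literal lists them

# Pair each ordinal coordinate (0..2 on every axis) with the next value and
# store it under the real index that the ordinal stands for.
cube = {}
for ordinals, value in zip(product(range(3), repeat=3), values):
    key = tuple(axis[o] for axis, o in zip(axes, ordinals))
    cube[key] = value

print(cube[(-1, -1, -1)])    # ordinal (0, 0, 0)
print(cube[(0, 0, 0)])       # ordinal (1, 1, 1), the centre element
print(cube[(1, 1, 1)])       # ordinal (2, 2, 2)
```

Under these assumptions the centre element, ordinal (1, 1, 1), lands at real index (0, 0, 0) and holds 14, which agrees with the middle of the second row group in the literal.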

Do I have the general idea?

--
Jonathan "Dataweaver" Lang

David Green

Feb 27, 2007, 3:39:04 AM
to p6l
On 2/24/07, Jonathan Lang wrote:
>In effect, using * as an array of indices gives us the ordinals
>notation that has been requested on occasion: '*[0]' means 'first
>element', '*[1]' means 'second element', '*[-1]' means 'last
>element',
>'*[0..2]' means 'first three elements', and so on - and this works
>regardless of what the actual indices are.

Using * that way works, but it still is awkward, which makes me think
there's something not quite dropping into place yet. We have the
notion of "keyed" indexing via [] and "counting"/ordinal indexing via
[*[]], which is rather a mouthful. So I end up back at one of
Larry's older ideas, which basically is: [] for counting, {} for keys.

To put a slight twist on it: instead of adding {}-indexing to arrays,
consider that what makes something an "array" is that it doesn't have
keys -- it's a collection of things that you can count through, as
opposed to a collection that you search through by meaningful
keys/names/tags/references/etc. (E.g., consider positional vs. named
params, and how they naturally map onto an array and a hash
respectively.)

Now something that is countable doesn't have to have meaningful keys,
but any keyed collection can be counted through; hence it makes sense
to give hashes an array-like [] accessor for getting the
first/last/nth item in the hash. In fact, this is basically what
%h.values gives you -- turning the hash values into an array (well, a
list). Saying %h[n] would amount to a direct way of saying
@(%h.values)[n].
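
As an analogy only, here is how the same "count through a keyed collection" access looks in Python, whose dictionaries keep insertion order. The helper `nth_value` is a hypothetical name for what %h[n] would do under this proposal.

```python
def nth_value(mapping, n):
    """Model of the proposed %h[n]: the n-th value in the mapping's own order."""
    return list(mapping.values())[n]

h = {"apple": 3, "banana": 5, "cherry": 7}
print(nth_value(h, 0))     # first item, whatever its key is
print(nth_value(h, -1))    # last item
print(h["banana"])         # keyed lookup stays available alongside
```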

This becomes much more handy in P6, because hashes can be ordered.
(Not that there's anything stopping you from counting through an
unordered hash; %h[0] is always the first element of %h, you just
might not know what that is, the same as with %h.values.) If Perl
knows how to generate new keys on the fly (say, because your possible
hash keys were declared as something inc-/dec-rementable), then you
can even access elements off the ends of your hash (push/unshift).

What about shaped arrays? A "shape" means the indices *signify*
something (if they didn't, you wouldn't care, you'd just start at
0!). So they really are *keys*, and thus should use a hash (which
may not use any hash tables at all, but it's still an associative
array because it associates meaningful keys with elements). I'm not
put off by calling it a hash -- I trust P6 to recognise when I
declare a "hash" that is restricted to consecutive int keys, is
ordered, etc. and to optimise accordingly.

If there are no meaningful lookup keys, if all I can do to get
through my list is count the items, then an array is called for, and
it can work in the usual way: start at 0, end at -1. It is useful to
be able to count past the ends of an array, and * can do this by
going beyond the end: *+1, *+2, etc., or before the beginning: *-1,
*-2, etc. (This neatly preserves the notion of * as "all the
elements" -- *-1 is the position before everything, and *+1 is the
position after everything else.)


Well, at least this keeps the easy stuff (counting) easy, and the
barely-harder stuff (keying) possible. In fact, since hashes would
always have both views available, nothing is lost; we get ordinals
for hashes, shaped collections, and ones that you can pass to a sub
without losing their shape, it solves the problem of distinguishing
between ordinal vs. "funny" indices (and the related issues of
wrap-around), you can count past the edges, and all while preserving
familiar array behaviour (especially for P5 veterans), the meaning of
* as "everything", and uncluttered syntax.


-David

Jonathan Lang

Feb 27, 2007, 6:08:58 PM
to David Green, p6l
David Green wrote:
> On 2/24/07, Jonathan Lang wrote:
> >In effect, using * as an array of indices gives us the ordinals
> >notation that has been requested on occasion: '*[0]' means 'first
> >element', '*[1]' means 'second element', '*[-1]' means 'last
> >element',
> >'*[0..2]' means 'first three elements', and so on - and this works
> >regardless of what the actual indices are.
>
> Using * that way works, but it still is awkward, which makes me think
> there's something not quite dropping into place yet. We have the
> notion of "keyed" indexing via [] and "counting"/ordinal indexing via
> [*[]], which is rather a mouthful. So I end up back at one of
> Larry's older ideas, which basically is: [] for counting, {} for keys.

What if you want to mix the two? "I want the third element of row 5".
In my proposal, that would be "@array[5, *[2]]"; in your proposal,
there does not appear to be a way to do it.

Unless the two approaches aren't mutually exclusive: "@array{5,
*[2]}". That is, allow subscripted Whatevers within curly braces
to enable the mixing of ordinals and keys. Since this is an unlikely
situation, the fact that nesting square braces inside curly braces is
a bit uncomfortable isn't a problem: this is a case of making hard
things possible, not making easy things easy.

> What about shaped arrays? A "shape" means the indices *signify*
> something (if they didn't, you wouldn't care, you'd just start at
> 0!). So they really are *keys*, and thus should use a hash (which
> may not use any hash tables at all, but it's still an associative
> array because it associates meaningful keys with elements). I'm not
> put off by calling it a hash -- I trust P6 to recognise when I
> declare a "hash" that is restricted to consecutive int keys, is
> ordered, etc. and to optimise accordingly.

The one gotcha that I see here is with the possibility of
multi-dimensional arrays. In particular, should multi-dimensional
indices be allowed inside square braces? My gut instinct is yes;
conceptually, "the third row of the fourth column" is perfectly
reasonable terminology to use. The thing that would distinguish []
from {} would be a promise to always use zero-based, consecutive
integers as your indices, however many dimensions you specify. With
that promise, you can always guarantee that the wrap-around semantics
will work inside [], while nobody will expect them to work inside {}.

In short, the distinction being made here isn't "unshaped" vs.
"shaped"; it's "ordinal indices" vs. "named indices", or "ordinals"
vs. "keys".

That said, note that - in the current conception, at least - one of
the defining features of a shaped array is that trying to access
anything outside of the shape will cause an exception. How would
shapes work with the ordinals-and-keys paradigm?

First: Ordinals have some severe restrictions on how they can be
shaped, as specified above. The only degrees of freedom you have are
how many dimensions are allowed and, for each dimension, how many
ordinals are permitted. Well, also the value type (although the key
type is fixed as "Int where 0..*"). So you could say something like:

my @array[2, 3, *]

...which would mean that the array must be three-dimensional; that the
first dimension is allowed two ordinals, the second is allowed three,
and the third is allowed any number of them - i.e., 'my @array[^2; ^3;
0..*]' in the current syntax. Or you could say:

my @array[2, **, 2]

...meaning that you can have any number of dimensions, but the first
and the last would be constrained to two ordinals each: 'my @array[^2;
**; ^2]'.

Note the use of commas above. Since each dimension can only take a
single value (a non-negative integer), there's no reason to use a
multidimensional list to define the shape. Personally, I like this
approach: it strikes me as being refreshingly uncluttered.

Furthermore, you could do away with the notion of "shaped vs.
unshaped": just give everything a default shape. The default shape
for arrays would be '[*]' - that is, one dimension with an
indeterminate number of ordinals.

Meanwhile, shapes for {} would continue to use the current syntax.
'[$x, $y, $z]' would be nearly equivalent to '{0..^$x; 0..^$y;
0..^$z}'.
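
A minimal Python sketch of what an ordinal shape such as [2, 3, *] would permit. The marker `STAR` and the function `in_shape` are hypothetical; the check simply encodes "zero-based, consecutive, at most N ordinals per dimension".

```python
STAR = None  # hypothetical marker for '*': any number of ordinals allowed

def in_shape(shape, position):
    """Return True if an ordinal position is valid for an ordinal shape.

    shape holds one ordinal count per dimension, or STAR for an open-ended
    dimension; position is a tuple of zero-based ordinals.
    """
    if len(position) != len(shape):
        return False                      # wrong number of dimensions
    for limit, p in zip(shape, position):
        if p < 0:
            return False                  # ordinals are non-negative
        if limit is not STAR and p >= limit:
            return False                  # past the end of a fixed dimension
    return True

shape = [2, 3, STAR]                      # models 'my @array[2, 3, *]'
print(in_shape(shape, (1, 2, 1000)))      # fits: third dimension is open-ended
print(in_shape(shape, (2, 0, 0)))         # first dimension allows only 0 and 1
print(in_shape(shape, (0, 0)))            # must be three-dimensional
```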

> If there are no meaningful lookup keys, if all I can do to get
> through my list is count the items, then an array is called for, and
> it can work in the usual way: start at 0, end at -1. It is useful to
> be able to count past the ends of an array, and * can do this by
> going beyond the end: *+1, *+2, etc., or before the beginning: *-1,
> *-2, etc. (This neatly preserves the notion of * as "all the
> elements" -- *-1 is the position before everything, and *+1 is the
> position after everything else.)

Regardless, I would prefer this notion to the "offset from the
endpoint" notion currently in use. Note, however, that [*-1] wouldn't
work in the ordinals paradigm; there simply is nothing before the
first element. About the only use I could see for it would be to
provide an assignment equivalent of "unshift": '@array[*-1] = $x'
could be equivalent to 'unshift @array, $x'. But note that, unlike
the 'push'-type assignments, this would change what existing ordinals
point to.

Meanwhile, {*-1} would only make sense in cases where keys are ordered
and new keys can be auto-generated. Note also that {*+$x} is
compatible with {*[$x]}: the former would reference outside of the
known set of keys, while {*[$x]} would reference within them.

--
Jonathan "Dataweaver" Lang

David Green

Mar 5, 2007, 11:36:36 PM
to p6l
On 2/27/07, Jonathan Lang wrote:

>David Green wrote:
>>So I end up back at one of Larry's older ideas, which basically is:
>>[] for counting, {} for keys.
>
>What if you want to mix the two? "I want the third element of row
>5". In my proposal, that would be "@array[5, *[2]]"; in your
>proposal, there does not appear to be a way to do it.
>
>Unless the two approaches aren't mutually exclusive: "@array{5,
>*[2]}". [...] Since this is an unlikely situation, the fact that
>nesting square braces inside curly braces is a bit uncomfortable
>isn't a problem: this is a case of making hard things possible, not
>making easy things easy.

Oh, good point. Yes, I think that mixing them together that way makes sense.
It also suggests that you could get at the named keys by applying {} to *:
%foo[0, 1, *{'bar'}]; #first column, second row, "bar" layer

>The one gotcha that I see here is with the possibility of
>multi-dimensional arrays. In particular, should multi-dimensional
>indices be allowed inside square braces? [...] With that promise,
>you can always guarantee that the wrap-around semantics will work
>inside [], while nobody will expect them to work inside {}.

Right, I don't see a problem with handling any number of dimensions that way.

>Furthermore, you could do away with the notion of "shaped vs.
>unshaped": just give everything a default shape. The default shape
>for arrays would be '[*]' - that is, one dimension with an
>indeterminate number of ordinals.
>
>Meanwhile, shapes for {} would continue to use the current syntax.
>'[$x, $y, $z]' would be nearly equivalent to '{0..^$x; 0..^$y; 0..^$z}'.

Agreed.

>>it can work in the usual way: start at 0, end at -1. It is useful
>>to be able to count past the ends of an array, and * can do this by
>>going beyond the end: *+1, *+2, etc., or before the beginning: *-1,
>>*-2, etc. (This neatly preserves the notion of * as "all the
>>elements" -- *-1 is the position before everything, and *+1 is the
>>position after everything else.)
>
>Regardless, I would prefer this notion to the "offset from the
>endpoint" notion currently in use. Note, however, that [*-1]
>wouldn't work in the ordinals paradigm; there simply is nothing
>before the first element. About the only use I could see for it
>would be to provide an assignment equivalent of "unshift":
>'@array[*-1] = $x' could be equivalent to 'unshift @array, $x'. But
>note that, unlike the 'push'-type assignments, this would change
>what existing ordinals point to.

I figured that *-1 or *+1 would work like unshift/push, which
effectively does change what the ordinals point to (e.g. unshifting
a P5 array). If the array is not extensible, then it should fail in
the same way as unshift/push would.

>Meanwhile, {*-1} would only make sense in cases where keys are
>ordered and new keys can be auto-generated. Note also that {*+$x}
>is compatible with {*[$x]}: the former would reference outside of
>the known set of keys, while {*[$x]} would reference within them.

Exactly.


-David

Larry Wall

Mar 6, 2007, 12:11:01 PM
to p6l
I like it. I'm a bit strapped for time at the moment, but if you send
me a patch for S09 I can probably dig up a program to apply it with. :)

Larry

Jonathan Lang

Mar 6, 2007, 4:35:40 PM
to p6l
Larry Wall wrote:
> I like it. I'm a bit strapped for time at the moment, but if you send
> me a patch for S09 I can probably dig up a program to apply it with. :)

Could someone advise me on how to create patches?

--
Jonathan "Dataweaver" Lang

Juerd Waalboer

Mar 6, 2007, 7:00:55 PM
to perl6-l...@perl.org
Jonathan Lang wrote 2007-03-06 13:35 (-0800):

> Could someone advise me on how to create patches?

Single file:

diff -u oldfile newfile

Entire tree:

diff -Nur oldtree/ newtree/

See also diff(1), and note that when diffing trees, you want to
distclean them first :)
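
For readers without diff(1) at hand, the same unified format can be produced with Python's standard `difflib` module. The file contents and the names "oldfile" and "newfile" below are made up for illustration.

```python
import difflib

# Two versions of a tiny file, as lists of lines with their newlines kept.
old = ["first line\n", "second line\n"]
new = ["first line\n", "second line, revised\n"]

# unified_diff yields the header lines, a hunk marker, and then the lines
# prefixed with ' ', '-' or '+', much like 'diff -u oldfile newfile'.
patch = list(difflib.unified_diff(old, new, fromfile="oldfile", tofile="newfile"))
print("".join(patch), end="")
```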
--
Kind regards,

juerd waalboer: perl hacker <ju...@juerd.nl> <http://juerd.nl/sig>
convolution: ict solutions and consultancy <sa...@convolution.nl>

I do not trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.

Jonathan Lang

Mar 7, 2007, 9:57:16 AM
to p6l
OK: before I submit a patch, let me make sure that I've got the
concepts straight:

"@x[0]" always means "the first element of the array"; "@x[-1]" always
means "the last element of the array"; "@x[*+0]" always means "the
first element after the end of the array"; "@x[*-1]" always means "the
first element before the beginning of the array". That is, the
indices go:

..., *-3, *-2, *-1, 0, 1, 2, ..., -3, -2, -1, *+0, *+1, *+2, ...
                    ^                      ^
                    |                      |
                  first                  last

As well, a Whatever in square braces is treated as an array of the
valid positions; so @x[*] is equivalent to @x[0..-1].

If you want to use sparse indices and/or indices that begin somewhere
other than zero, access them using curly braces. Consider an array
with valid indices ranging from -2 to +2: @x{-2} means "element -2",
which would be equivalent to @x[0]; @x{+2} means "element 2", which
would be equivalent to @x[-1]. Likewise, @x{0} is the same as @x[2],
@x{-3} is the same as @x[*-1], @x{+3} is the same as @x[*+0], and so
on. If @y has a series of five indices that start at 1 and double
with each step, then @y{1} will be the same as @y[0]; @y{4} will be
the same as @y[2], and so on.

A Whatever in curly braces is treated as an array of the valid index
names; so @x{*} means @x{-2..+2}, and @y{*} means @y{1, 2, 4, 8, 16}.
Because it is treated as an array, individual index names can be
accessed by position: @x{*[0]} is a rather verbose way of saying
@x[0]. This lets you embed ordinal indices into slices involving
named indices. Conversely, using *{...} inside square braces lets you
embed named indices into slices involving ordinal indices: @x[*{-2}]
is the same as @x{-2}.
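
The equivalences listed for @x and @y can be verified with a small Python model that keeps the index names and the elements in parallel lists. The class `KeyedArray` and its method names are hypothetical; `by_ordinal` stands for square brackets and `by_key` for curly braces.

```python
class KeyedArray:
    """Hypothetical model of an array with both ordinal and named indices."""

    def __init__(self, keys, values):
        self.keys = list(keys)        # what '*' means inside curly braces
        self.values = list(values)

    def by_ordinal(self, n):
        # @x[n]: position 0 is the first element, -1 is the last.
        return self.values[n]

    def by_key(self, k):
        # @x{k}: look the index name up, then fetch by position.
        return self.values[self.keys.index(k)]

x = KeyedArray(range(-2, 3), "abcde")          # valid indices -2..+2
y = KeyedArray([1, 2, 4, 8, 16], "vwxyz")      # indices that double each step

print(x.by_key(-2) == x.by_ordinal(0))         # @x{-2} vs @x[0]
print(x.by_key(2) == x.by_ordinal(-1))         # @x{+2} vs @x[-1]
print(y.by_key(4) == y.by_ordinal(2))          # @y{4}  vs @y[2]
print(x.by_key(x.keys[0]) == x.by_ordinal(0))  # @x{*[0]} vs @x[0]
```

The positions just outside the array, such as @x[*-1] and @x[*+0], are left out of this sketch because they refer to elements that do not exist yet.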

Multidimensional arrays follow the above conventions for each of their
dimensions; so a single-splat provides a list of every index in a given
dimension, a 0 refers to the first index in that dimension, and so on.
A double-splat extends the concept to a multidimensional list that
handles an arbitrary number of dimensions at once.

--

Commentary: I find the sequence of ordinals outlined above to be a bit
messy, especially when you start using ranges of indices: you need to
make sure that @x[0..-1] dwims, that @x[-1..(*+0)] dwims, that
@x[(*-2)..(*+3)] dwims, and so on. This is a potentially very ugly
process. As well, the fact that @x[-1] doesn't refer to the element
immediately before @x[0] is awkward, as is the fact that @x[*-1]
doesn't refer to the element immediately before @x[*+0]. IMHO, it
would be cleaner to have @x[n] count forward and backward from the
front of the array, while @x[*+n] counts forward and backward from
just past the end of the array:

..., -3, -2, -1, 0, 1, 2, ..., *-3, *-2, *-1, *+0, *+1, *+2, ...
                 ^                        ^
                 |                        |
               first                    last

So perl 5's "$x[-1]" would always translate to "@x[*-1]" in perl 6.
Always. Likewise, "@x[+*]" would be the same as "@x[*+0]". (In fact,
the semantics for "@x[*+n]" follows directly from the fact that an
array returns the count of its elements in scalar context.) And
"@x[*]" would be the same as "@x[0..^*]" or "@x[0..(*-1)]".

You would lose one thing: the ability to select an open-ended Range of
elements. For a five-element list, "@x[1..^*]" means "@x[1, 2, 3,
4]", not "@x[1, 2, 3, 4, 5, 6, 7, 8, ...]".
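
A Python sketch of this alternative scheme, in which '*' inside a subscript denotes the position just past the end and arithmetic on it produces offsets from there. The class `End` and the helpers `position` and `half_open` are hypothetical names used only for this model.

```python
class End:
    """Hypothetical model of '*' in a subscript: the position just past the end."""

    def __init__(self, offset=0):
        self.offset = offset

    def __add__(self, n):
        return End(self.offset + n)       # '*+n'

    def __sub__(self, n):
        return End(self.offset - n)       # '*-n'

    def resolve(self, length):
        # '*' itself resolves to the element count, so '*-1' is the last element.
        return length + self.offset

def position(index, length):
    """Turn a plain integer or an End-relative index into a concrete position."""
    return index.resolve(length) if isinstance(index, End) else index

def half_open(first, last, length):
    """Positions selected by 'first ..^ last' for an array of the given length."""
    return list(range(position(first, length), position(last, length)))

star = End()
print(position(star - 1, 5))      # '*-1' on a five-element array: the last element
print(position(star + 0, 5))      # '*+0': just past the end
print(half_open(1, star, 5))      # '1..^*'
```

Because the range endpoint resolves to the element count, `1..^*` stops at the last existing element, which is exactly the loss of open-ended ranges noted above.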

Technically, one could say "@x{+*}" to reference the index that
coincides with the number of indices; but it would only be useful in
specific cases, such as referencing the last element of a one-based
contiguous array.

--
Jonathan "Dataweaver" Lang

David Green

Mar 7, 2007, 8:43:17 PM
to p6l
On 3/7/07, Jonathan Lang wrote:
><summary snipped>

Looks good to me.

>As well, the fact that @x[-1] doesn't refer to the element
>immediately before @x[0] is awkward, as is the fact that @x[*-1]
>doesn't refer to the element immediately before @x[*+0]. IMHO, it
>would be cleaner to have @x[n] count forward and backward from the
>front of the array, while @x[*+n] counts forward and backward from
>just past the end of the array:

I suggested that at one point, so I'd agree that makes sense too. It
avoids the discontinuity at either end of the array -- although
arguably, points off the end of a list aren't in the same boat as
elements that actually exist, so the discontinuity might be
conceptually justified. (Make the weird things look weird?)

>(In fact, the semantics for "@x[*+n]" follows directly from the fact
>that an array returns the count of its elements in scalar context.)
>And "@x[*]" would be the same as "@x[0..^*]" or "@x[0..(*-1)]".

That's an elegance in its favour.

One possible downside is that it wouldn't work for cyclic/wrap-around
arrays (where the indices are always interpreted mod n) -- since any
number would always refer to an existing element. Oh -- but if an
index isn't a plain counter, then it should be a named key, so scrap
that.
(The question then is: how to have "reducible" hash keys? By which I
mean different keys that get "reduced" to the same thing, e.g. %x{1}
=== %x{5} === %x{9} === %x{13}, etc. Presumably you can just
override the .{} method on your hash, right?)

>You would lose one thing: the ability to select an open-ended Range
>of elements. For a five-element list, "@x[1..^*]" means "@x[1, 2,
>3, 4]", not "@x[1, 2, 3, 4, 5, 6, 7, 8, ...]".

Except wouldn't the .. interpret the * before the [] did? So 1..*
would yield a range-object from 1 to Inf, and then the array-deref
would interpret 1..Inf accordingly.

Actually, it seems more useful if the * could mean the count; you can
always say 1..Inf if that's what you want, but otherwise how would
you get [1..^*] meaning [1,2,3,4]? Perhaps the range could note when
it's occurring in []-context, and interpret the * as count rather
than as Inf?


-David

Dr.Ruud

Mar 8, 2007, 4:44:49 AM
to perl6-l...@perl.org
David Green wrote:
> Jonathan Lang:
>> (In fact, the semantics for "@x[*+n]" follows directly from the fact
>> that an array returns the count of its elements in scalar context.)
>> And "@x[*]" would be the same as "@x[0..^*]" or "@x[0..(*-1)]".
>
> That's an elegance in its favour.

In Perl5 a "+" can creep in, for example:

$ perl -wle '$s = "-123"; $n = -123; print -$s; print -$n'
+123
123

so maybe it is not a bad idea to keep treating a "unary +" as (almost) a
no-op.

--
Affijn, Ruud

"Gewoon is een tijger."
