Dimension of slices; scalars versus 1-element arrays?

Craig DeForest

Jan 8, 2005, 1:37:06 PM

to perl6-l...@perl.org

I just re-read Synopsis 9, which covers PDL-related actions and array slicing,
and came to the conclusion that either (A) there's a hole in the syntax as it
is laid out, (B) I lack sufficient understanding of what has been thought
out so far, or (C) that part of the language definition isn't finished yet.

Is the perl6 expression
@a[4; 0..5];
a 1x6 array (probably correct)? Or a 6 array (probably not correct)? If the
former, how do you specify that you want the latter? There's a significant
difference between the two -- for example, if '4' is some list expression
that happens to have just one element, you don't want the whole shape of the
output array to change.

The problem is the near-universal wart that scalars are not merely lists with
but a single element, so you need to be able to tell the difference between a
dimension that exists but has size 1, and a dimension that doesn't exist.
(In perl5, all arrays are one-dimensional, so the issue is sidestepped by the
presence of scalar/list context.)

Perl5/PDL solves the problem with 'extra-parenthesis' syntax in slicing, and
with zero-size dimensions in operators like range() [no relation to perl6
range operators] that take a dimension size list. Examples:
    $a( 4, 0:5 );          # This perl5/PDL slice is a 1x6-array
    $a( (4), 0:5 );        # This perl5/PDL slice is a 6-array
    $a->range($xy,[1,3]);  # This perl5/PDL range is a 1x3-array
    $a->range($xy,[0,3]);  # This perl5/PDL range is a 3-array
The extra parens only remove the dimension if they would otherwise be no-ops.

It is not (yet) clear to me whether the extra parens syntax is a good solution
for perl 6 slices.
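
For readers unfamiliar with PDL, the difference between a dimension of size 1 and a missing dimension can be sketched in plain Python with nested lists. This is only an illustration, not PDL or Perl 6 code: the helper `shape` and the names `kept` and `dropped` are made up for the example, and the axis ordering is simplified relative to PDL.

```python
# Model a 2-D array as a list of rows (nested Python lists).
a = [[10 * r + c for c in range(8)] for r in range(6)]

def shape(x):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)

# Slice on the first axis: the axis survives with size 1.
kept = [row[0:6] for row in a[4:5]]
# Index on the first axis: the axis disappears.
dropped = a[4][0:6]

print(shape(kept))     # (1, 6)
print(shape(dropped))  # (6,)
```

Both results hold the same six values; only the number of dimensions differs, which is exactly the distinction the slicing syntax has to be able to express.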

David Storrs

Jan 10, 2005, 10:33:11 AM

to perl6-l...@perl.org

On Sat, Jan 08, 2005 at 11:37:06AM -0700, Craig DeForest wrote:

> @a[4; 0..5];
> a 1x6 array (probably correct)? Or a 6 array (probably not
> correct)?

For the ignorant among us (such as myself), what is a 6 array? Google
and pdl.perl.org did not yield any immediate answers.

--Dks

--
dst...@dstorrs.com

Michael Walter

Jan 10, 2005, 12:02:12 PM

to perl6-l...@perl.org

6 elements..?

Larry Wall

Jan 10, 2005, 1:04:55 PM

to perl6-l...@perl.org

On Sat, Jan 08, 2005 at 11:37:06AM -0700, Craig DeForest wrote:

: I just re-read Synopsis 9, which covers PDL-related actions and array slicing,

: and came to the conclusion that either (A) there's a hole in the syntax as it
: is laid out, (B) I lack sufficient understanding of what has been thought
: out so far, or (C) that part of the language definition isn't finished yet.

I expect C is closest to the mark. :-)

: Is the perl6 expression

: @a[4; 0..5];
: a 1x6 array (probably correct)? Or a 6 array (probably not correct)?

Certainly the former. I don't think dimensions should ever disappear
accidentally.

: If the former, how do you specify that you want the latter?

I don't know offhand. I see both the lure and the danger of the
extra parens, so we'll probably not go that route. We could find
some keyword that destroys the current dimension while supplying a
scalar argument:

@a[gone(4); 0..5];

Or maybe we want some notation that is explicitly a null slice with
a scalar value, maybe something like:

@a[() but 4; 0..5];

or

@a[():pick(4); 0..5];

or

@a[4 but dim(); 0..5];

or maybe even something strange like:

@a[()=4; 0..5];

But I'm certainly open to other suggestions.

Larry

Craig DeForest

Jan 10, 2005, 1:12:16 PM

to perl6-l...@perl.org, David Storrs

Sorry, too terse :-)

I meant "...a two dimensional array with 1x6 elements (probably correct)? Or
a one dimensional array with 6 elements (probably not correct)?"

Cheers,
Craig

Quoth David Storrs on Monday 10 January 2005 08:33 am,

Craig DeForest

Jan 10, 2005, 5:56:34 PM

to perl6-l...@perl.org, Larry Wall

Hmmmm... It would be easy to distinguish the slicing cases if it were easier
to distinguish between a number and a list containing just one number. [In
fact, that is more or less how perl5/PDL's arg-list-based slicer ('mslice')
does things.]

At the top of Synopsis 9, there's a discussion about exactly that:

>     @array[0..10; 42; @x]
> is really short for
>     @array.postcircumfix:<[ ]>( <== [0..10], [42], [@x] );
> though in the list of lists form, a bare number is interpreted as if it were
> a list of one element, so you can also say:
>     @array.postcircumfix:<[ ]>( <== [0..10], 42, [@x] );

I believe that this is Wrong, because the distinction that is being blurred
turns out to be important. Ideally, those two postcircumfix cases should do
different things (slice versus index).

But you can have your cake and eat it too. If those postcircumfix cases were
really different, one could take advantage of the still-unused postfix unary
'!' (for example) to distinguish:
@array[0..10; 42!; @x] [[ maps to ]] (<== [0..10], 42 , [@x] );
@array[0..10; 42; @x] [[ maps to ]] (<== [0..10], [42], [@x] );

Then the default behavior is consistent (semicolons denote lists of lists)
but there is an "escape hatch" that lets you make a list of scalars-and-lists.
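
The slice-versus-index distinction argued for here can be sketched in Python. This is a toy model under stated assumptions, not Perl 6 semantics: the function name `subscript` is hypothetical, arrays are modelled as nested lists, and a bare `int` stands in for a scalar entry in the list of lists.

```python
def subscript(data, *axes):
    """Subscript a nested list with one entry per axis.

    An int indexes (the axis is dropped); a list slices (the axis is kept,
    even when the list has exactly one element).
    """
    if not axes:
        return data
    first, rest = axes[0], axes[1:]
    if isinstance(first, int):
        return subscript(data[first], *rest)
    return [subscript(data[i], *rest) for i in first]

grid = [[10 * r + c for c in range(4)] for r in range(3)]

print(subscript(grid, [1], [0, 1, 2]))  # [[10, 11, 12]]
print(subscript(grid, 1, [0, 1, 2]))    # [10, 11, 12]
```

If the two forms were blurred together, as the quoted Synopsis text suggests, the second call could not be told apart from the first.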

Quoth Larry Wall on Monday 10 January 2005 11:04 am,

Craig DeForest

Jan 10, 2005, 6:06:28 PM

to perl6-l...@perl.org, Larry Wall

Double hmmm.... That would also supplant the lone '*' wart in indexing
syntax: instead of saying
@array[0..10;*;@x]
you could say
@array[0..10; !; @x]

Presumably, the '!;' would expand to the scalar undef value, which could be
interpreted as "do nothing on this axis", while in the related construct
@a=();
@array[0..10; @a; @x]
the '@a;' would instead expand to a list ref pointing to the empty list, which
could be interpreted instead as "return the null set" -- minimizing surprise
if @a is only occasionally empty.

(Of course, substitute your favorite postfix unary character instead of '!';
'*' would work just as well...)

Quoth Craig DeForest on Monday 10 January 2005 03:56 pm,

Juerd

Jan 10, 2005, 7:02:10 PM

to Craig DeForest, perl6-l...@perl.org

Craig DeForest skribis 2005-01-10 15:56 (-0700):

> @array[0..10; 42!; @x] [[ maps to ]] (<== [0..10], 42 , [@x] );

@array[0..10; 1.40500611775288e+51; @x]? ;-)

Juerd

Dave Whipp

Jan 12, 2005, 12:06:35 AM

to perl6-l...@perl.org

Larry Wall wrote:
> On Sat, Jan 08, 2005 at 11:37:06AM -0700, Craig DeForest wrote:

> : Is the perl6 expression
> : @a[4; 0..5];
> : a 1x6 array (probably correct)? Or a 6 array (probably not correct)?
>
> Certainly the former. I don't think dimensions should ever disappear
> accidentally.

Except when you assign in scalar context -- which I guess isn't accidental.

We know that

@a = (1, 2, 3);
$b = @a[1];

Loses the dimension as a DWIM.

So perhaps we could say that assigning to a lower dimension always gets
rid of a dimension of size 1 -- or error if it can't:

@c = ( 1,2,3 ; 4,5,6 );
@d[*;*] = @c[ 1 ; * ]; # @d is 2d
@e[*] = @c[ 1 ; * ]; # @e is 1d
@f = @c[1;1]; # @f is 2d: no dimensions lost

@g[*] = @c[ 0,1 ; 0,1 ]; # error: cannot collapse

as our disambiguator.

{ @c[;*] but shape(*) } wouldn't work, because the :shape option leaves
additional dimensions open.
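
One way to read the collapse rule suggested above is as an operation on shapes alone. The following Python sketch works under that assumption; the name `collapse_shape` is hypothetical, and the choice to drop the leftmost size-1 axis first is an arbitrary one that the proposal does not specify.

```python
def collapse_shape(shape, target_ndim):
    """Drop size-1 axes (leftmost first) until target_ndim axes remain.

    Raises ValueError when an axis of size greater than 1 would have to go.
    """
    dims = list(shape)
    while len(dims) > target_ndim:
        if 1 not in dims:
            raise ValueError(
                f"cannot collapse {shape} to {target_ndim} dimension(s)"
            )
        dims.remove(1)  # removes the first axis of size 1
    return tuple(dims)

print(collapse_shape((1, 3), 2))  # (1, 3)  like @d[*;*] = @c[1;*]
print(collapse_shape((1, 3), 1))  # (3,)    like @e[*]   = @c[1;*]
```

Calling `collapse_shape((2, 2), 1)` raises `ValueError`, mirroring the "cannot collapse" case for `@g`.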

Dave.

Craig DeForest

Jan 12, 2005, 12:58:05 AM

to perl6-l...@perl.org, Dave Whipp

On Tuesday 11 January 2005 10:06 pm, Dave Whipp wrote:
> We know that
>
> @a = (1, 2, 3);
> $b = @a[1];
>
> Loses the dimension as a DWIM.
>
> So perhaps we could say that assigning to a lower dimension always gets
> rid of a dimension of size 1 -- or error if it can't:
>
> @c = ( 1,2,3 ; 4,5,6 );
> @d[*;*] = @c[ 1 ; * ]; # @d is 2d
> @e[*] = @c[ 1 ; * ]; # @e is 1d
> @f = @c[1;1]; # @f is 2d: no dimensions lost

Hmmm.... I'm sorry, but I'm not quite following what you mean by
the '=' in this case. I've been thinking of '=' as a full-on,
dimensional-context-free assignment, and of ':=' as a shaped, elementwise
assignment, by analogy to the current perl5/PDL setup. In that paradigm,
"@e[*] = @c[1;*]" doesn't mean anything, since the LHS isn't really a
full-on lvalue -- it's an array slice with (already) a definite shape (an
"l2value"?).

But I could just be confused about intent here. I'm used to thinking of
piddles, which are pretty regimented -- they have a definite size along each
axis. Do perl6 multi-dimensional arrays have definite sizes, or are they
"just" lists of lists of whatevers, under the hood? Each of those structures
would be very useful for an almost completely different set of things.

I guess Larry was right when he hinted that there're still dragons here... so,
er, sorry to have opened a can of wyrms...

David Green

Jan 12, 2005, 7:40:46 AM

to perl6-l...@perl.org

OK, so at issue is the difference between an element of an array ($p5[1])
and a slice (that might contain only one element, @p5[1]), only
generalised to n dimensions. (A problem which didn't exist in P5 because
there were no higher dimensions!)

And we don't want @B[4; 0..6] to reduce to a 1-D 6-array because then
dimensions would just be disappearing into some other...dimension.
(To lose one dimension is a misfortune, to lose two is plane careless....)
On the other hand, nobody wants to have to write @B[gone(4)] every time
you need an array element.

Given $a=42 and @a=(42), what if @X[$a] returned a scalar and @X[@a]
returned a slice? Similarly, @X[$a; 0..6] would return a 6-array, and
@X[@a; 0..6] a 1x6-array -- scalar subscripts drop a given dimension and
array subscripts keep it. (I think this is almost what Craig was
suggesting, only without giving up ! for factorial. =))

The list-of-lists semicolon can still turn the 42 into [42], so that if
you have a list-of-listy function that doesn't care whether you started
with scalars or not, it doesn't have to know. But really the LoL
semicolon would turn 42 into C<[42] but used_to_be_scalar>, so that
something that *does* care (e.g. subscripting an array) can simply check
for that property.
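
The idea of a list that remembers it used to be a scalar can be sketched in Python as a list subclass carrying a flag. Everything here is hypothetical naming for illustration (`AxisSpec`, `was_scalar`, `lol`); it is not how Perl 6 properties work, only a model of the behaviour described.

```python
class AxisSpec(list):
    """A list of indices that remembers whether it started life as a scalar."""

    def __init__(self, items, was_scalar=False):
        super().__init__(items)
        self.was_scalar = was_scalar


def lol(*entries):
    """Normalise every entry to a list, tagging the ones that were scalars."""
    return [
        AxisSpec([e], was_scalar=True) if isinstance(e, int) else AxisSpec(e)
        for e in entries
    ]


specs = lol(42, [0, 1, 2])
print(specs)                          # [[42], [0, 1, 2]]
print([s.was_scalar for s in specs])  # [True, False]
```

Code that does not care sees an ordinary list of lists; a subscripting routine can check `was_scalar` to decide whether to drop the axis.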

Using a scalar to get a scalar feels rather appropriate, and not too
surprising, I think. Most people would probably expect @X[$a] to return
a scalar, and use @X[@a] to mean a slice (if @a happened to have only a
single element, you're still probably using an array subscript to get
possibly many elements -- otherwise why would you be using an array @a
instead of just a plain scalar in the first place?)

Plus if you do want to force an array subscript instead of a scalar, or
vice versa, you don't need any new keywords to do it: @X[~@a; 0..6] or
@X[[42]; 0..6] (which is the same as @X[list 42; 0..6], right? Which
could also be written @X[*42; 0..6], which is kind of nice, because [*42]
means "give me the 42nd slice" while [*] means "give me an unspecified
slice, a slice of everything".)

Anyway, delving right into the can of wyrms, in P5 there were list,
scalar, and void contexts (1-D, 0-D, and... uh... -1-D?), but now that we
have real multidimensional arrays, we could have infinite contexts (ouch).
Well, there must be some nice way to generalise that, but it raises a
bunch of questions. (I can imagine "table context" being reasonably
popular.)

Various functions in P5 left the dimension of their arg untouched (take a
list, return a list), or dropped it down one (take a list, return a
scalar). (Taking a scalar and returning a list is less common, but I can
imagine a 2-D version of 'split' that turns a string into a table....)

So in p6, should 'shift'ing an n-D array return a scalar or an array of
n-1 dimensions? It depends on whether you see it as a way to criss-cross
through an array one element at a time, or as a way to take one 'layer'
off something. Both would be useful.

'grep' could return a list (1-D) of all matching individual elements, but
perhaps more usefully it could preserve dimensionality:
my @board is shape(8;8);
#match alternating squares:
@checkerboard=grep { ($_.index[0] + $_.index[1])%2 } @board;

...to end up with a ragged 2-D array containing only the usable half of
our checkerboard. (I'm assuming we have something like .index to get the
x, y co-ords of an element?)
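
A dimension-preserving grep of this kind can be sketched in Python. The helper name `grep_2d` and the `keep(row, col, value)` callback signature are assumptions made for the example; passing the coordinates explicitly stands in for the hypothetical `.index` above.

```python
def grep_2d(table, keep):
    """Filter a 2-D nested list, calling keep(row, col, value) per element.

    Rows are preserved, so the result may be ragged in general.
    """
    return [
        [v for c, v in enumerate(row) if keep(r, c, v)]
        for r, row in enumerate(table)
    ]

# Each cell holds its own coordinates so the result is easy to inspect.
board = [[(r, c) for c in range(8)] for r in range(8)]
checkerboard = grep_2d(board, lambda r, c, v: (r + c) % 2)

print(len(checkerboard))                   # 8
print([len(row) for row in checkerboard])  # [4, 4, 4, 4, 4, 4, 4, 4]
print(checkerboard[0])                     # [(0, 1), (0, 3), (0, 5), (0, 7)]
```

The result keeps one list per row, so the surviving squares still know which row they came from, although their column positions are no longer aligned between rows.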

'reverse' would presumably flip around the indices in all dimensions. Ah,
the fun of coming up with new multidimensional variations on all the old
(or new) favourites!

- David "a head-scratcher no matter how you slice it" Green

David Green

Jan 12, 2005, 8:19:58 AM

to perl6-l...@perl.org

In article <plato-DE101F....@x6.develooper.com>,
pl...@edmc.net (David Green) wrote:

>I can imagine "table context" being reasonably popular. [...]

>(Taking a scalar and returning a list is less common, but I can
>imagine a 2-D version of 'split' that turns a string into a table....)

One way to generalise it might be to allow an array (ref) for the thing
to split on. Each element of the array could specify the splitter for
the corresponding dimension:

@table_2D = split [/<\/TR><TR>/i, /<\/TD><TD>/i], $html_table;
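
The same idea can be sketched in Python as a recursive split that takes one separator per dimension. The name `split_nd` is hypothetical, and plain string separators are used instead of regexes to keep the example short.

```python
def split_nd(text, separators):
    """Split text along one separator per dimension, outermost first."""
    if not separators:
        return text
    first, rest = separators[0], separators[1:]
    return [split_nd(piece, rest) for piece in text.split(first)]

table = split_nd("a,b,c;d,e,f", [";", ","])
print(table)  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

With a single separator this reduces to an ordinary split, so the 1-D case is just the special case of a one-element separator list.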

I guess forcing "table context" on a list would effectively turn it from
a 1-D n-array into a 2-D 1xn-array.

Scalar context on a 2-D table should return some sort of count
(analogous with a list in scalar context), but maybe not the number of
elements in the table. I think the number of records would typically be
more useful.

And list context on a table... it might return a list of array [refs],
each containing a record -- in other words, convert the table into a
p5-style nested data structure that simulates a true 2-D array. On the
other hand, maybe list context simply returns a single plain list
consisting of the table "headers".

Actually, if we have "headings", that's very handy for DB modules, but
we've gone beyond a plain array in two dimensions. A table with named
fields would really be more of a 2-D hash...
my %rec is shape(Int; <foo bar bat>);
%rec<0;foo>="Silence is";
%rec<1>=<foosball barbell batboy>; #assign whole record at once(?)

Except those Int keys are effectively used as strings that just happen
to look like ints, right? That is, I'm not getting all the arrayary
goodness (like pushing or popping or ordering). What I really want here
is a hybrid hash-array. I suspect that there's no way to do that though
(other than creating my own class and overloading array stuff to handle
it for the numeric key(s)).

- David "2-D or not 2-D" Green

Craig DeForest

Jan 13, 2005, 6:03:08 PM

to perl6-l...@perl.org, David Green

Hmmm... David, you seem to have covered all the issues with that rather
lucid screed [attached at bottom]. I have a couple of dragon-nits to pick,
one involving infrastructure and one involving syntax.

First: it seems strange to me to add yet another property ("but
used_to_be_scalar") to the output of the LoL semicolon, when there's a
perfectly good distinction still floating around (the one that is
deliberately blurred between [1] and 1). Using the scalarness of a LoL
element has the potential to make some really esoteric but useful things very
easy to do, since it's a general syntax that could be used for any LoL, not
just for slicing.

But then, there may be good reasons (not obvious to me) to use a property.

Second: Regardless of the LoL infrastructure, we seem to be adopting opposing
viewpoints on whether a scalar-ish element in the LoL should be a scalar by
default, with a syntactic hook to denote it as a list-of-1; or whether it
should be a list-of-1 by default, with a syntactic hook to denote it as a
scalar. There are good reasons why either case will surprise a nontrivial
subset of people. Maybe we can sidestep the issue by making use of list
context as you pointed out.

Hmmm... What about replacing '*' entirely, with the yadda-yadda-yadda?
Certainly '0...' ought to enumerate (lazily) everything from 0 to infinity,
so it ought to return everything. But then why not make term-'...' into
an abbreviation for '0...'? That sort of keeps the conceptual crud down
while letting everything get written straightforwardly.

How about this scenario?

Ground rules: list expressions slice; scalar expressions index;

Basic cases:
    [5 ; 0..6]      # scalar exps are scalar by default (index)
    [(5) ; 0..6]    # 5 is now in list context (slice)
    [1..1 ; 0..6]   # '..' returns a list of 1 element (slice)
    [0... ; 0..6]   # '0...' is a valid lazy list, if ugly (whole axis)
    [... ; 0..6]    # '...' acts like '0...' (whole axis)
    [* ; 0..6]      # OK, have it your way - '*' is a synonym for '0...'.

Cases that return nothing: (either an empty list or undef...)
    [ ; 0..6]       # An empty list yields nothing
    [undef; 0..6]   # undefined value yields nothing
    [1..0; 0..6]    # '..' gives list of no elements; you get nothing

Forcing syntax:
    [+(5) ; 0..6]   # unary '+' forces scalar context (index always)
    [(5) ; 0..6]    # List-context parens force list context (slice always)
    [5* ; 0..6]     # postfix-* forces a slice? (no conflicts?)

Not accepted syntax:
    [*5 ; 0..6]     # does something strange -- tries to index with a glob.

The only reason to gripe about that (for me) is that surrounding parens change
sense compared to perl5/PDL slicing -- but that seems trivial compared to
hammering out something that's convenient for everyone and still makes some
sort of coherent sense.

Quoth David Green on Wednesday 12 January 2005 05:40 am,