Spare brackets :-)

mar...@kurahaupo.gen.nz

Jan 27, 2003, 6:11:18 PM

to perl6-l...@perl.org

This may sound like a silly idea but ...

Has anyone considered removing the syntactic distinction between
numeric and string indexing -- that is, between array and hash lookup?

In particular, it would seem that

%foo[$key]

would be just as easy for the compiler to grok as

%foo{$key}

but would mean that we could stop worrying about the precedence of
postfix/infix "{", and things like

if %test { $count++ }

would not require whitespace before the "{" to be disambiguated.

I don't have a complete solution, as anonymous array and hash construction would
still need different syntaxes, but has anyone else thought about this?

- Martin

Damian Conway

Jan 27, 2003, 6:39:19 PM

to perl6-l...@perl.org

> This may sound like a silly idea

It's been suggested previously.

> Has anyone considered removing the syntactic distinction between
> numeric and string indexing -- that is, between array and hash lookup?

Yes. We rejected the idea.

> In particular, it would seem that
>
> %foo[$key]
>
> would be just as easy for the compiler to grok as
>
> %foo{$key}

Sure. But then is this:

$ref[$key]

an array or hash look-up???

Damian

Piers Cawley

Jan 28, 2003, 3:47:34 AM

to Damian Conway, perl6-l...@perl.org

Damian Conway <dam...@conway.org> writes:

Decided at runtime?

--
Piers

John Williams

Jan 27, 2003, 6:38:36 PM

to perl6-l...@perl.org

ECMAscript already tried this.

Bad idea.

If your hash keys happen to look like large numbers (e.g. you
have 7-digit product codes) as soon as you store one of them, it says:
"Oh, this looks like a number, so we'll store it like an array" and
happily creates a million empty array entries for you.

~ John Williams

Damian Conway

Jan 28, 2003, 11:14:37 AM

to perl6-l...@perl.org

>>Sure. But then is this:
>>
>> $ref[$key]
>>
>>an array or hash look-up???
>
> Decided at runtime?

Doesn't help if $ref refers to a type that has both hash-like and array-like
accessibility. And that will be very common, since all Perl 6 regexes return
such objects.
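
A rough Perl 5 sketch of such a dual-access object, to make the ambiguity concrete. The class name DualMatch and its fields are hypothetical, and this uses Perl 5's overload pragma rather than anything Perl 6 regexes actually return; it only shows that the same subscript can mean two different look-ups on one object.

```perl
use strict;
use warnings;

# Hypothetical class: one object that answers to both [] and {}.
package DualMatch;
use overload
    '@{}' => sub { ${ $_[0] }->{positional} },   # array-style access
    '%{}' => sub { ${ $_[0] }->{named} };        # hash-style access

sub new {
    my ($class, %parts) = @_;
    my $data = {%parts};
    # Bless a scalar ref so that neither overload handler recurses.
    return bless \$data, $class;
}

package main;

my $match = DualMatch->new(
    positional => ["alpha", "beta"],
    named      => { 0 => "zero-key" },
);

my $by_index = $match->[0];   # array look-up
my $by_key   = $match->{0};   # hash look-up with the same subscript
print "$by_index / $by_key\n";
```

With a single bracket style, nothing in the source would say which of the two results was wanted.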

Damian

Austin Hastings

Jan 28, 2003, 12:24:50 PM

to Dan Sugalski, Piers Cawley, Damian Conway, perl6-l...@perl.org

--- Dan Sugalski <d...@sidhe.org> wrote:
> At 8:47 AM +0000 1/28/03, Piers Cawley wrote:

> >Damian Conway <dam...@conway.org> writes:
> > > Sure. But then is this:
> >>
> >> $ref[$key]
> >>
> >> an array or hash look-up???
> >
> >Decided at runtime?
>

> How? People use strings as array indices and ints/floats as hash
> indices, and count on autoconversion to Make It Work.

On the one hand: Java/ECMA/J-script does it.

All objects are associative arrays. All arrays can be associative, on
demand. C<a["foo"] = 1;> Presto. Associative array.

On the other hand: This gets dangerous really quickly, since Perl's
autoconversion works differently. Specifically, since we treat things
as "strings unless they need to be otherwise" rather than treating them
as "the type that they were when you created them". (Javascript doesn't
have much in the way of I/O, so the act of getting data in is a bit of
an effort, and that effort usually has the side effect of providing
type data.)

Writing a roulette game may get challenging:

@colors["0"] = Green;
@colors["00"] = Green;
@colors["000"] = Green;

Do those get autoconverted to numbers? (They can, obviously. But they
shouldn't.)
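
A small Perl 5 sketch of the roulette problem, assuming today's separate hash and array semantics (the variable names are illustrative only). As hash keys the three strings stay distinct; as array indices they all numify to 0.

```perl
use strict;
use warnings;

my (%colors, @colors);
for my $pocket ("0", "00", "000") {
    $colors{$pocket} = "Green";   # hash: the key stays a string
    $colors[$pocket] = "Green";   # array: the index is numified to 0
}

my $hash_slots  = scalar keys %colors;   # three distinct pockets
my $array_slots = scalar @colors;        # one element, index 0
print "hash: $hash_slots, array: $array_slots\n";
```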

This kind of thing points back at a discussion we had once before about
"more kinds of context" -- meaning at the time "numeric" versus
"string" versus ... whatever.

I think that if we do this we'd better know more about what we're
expecting versus what we're losing.

On the "losing" side, the difference between lowercase-array and
lowercase-hash is probably significant, performance-wise. Merging the
array and hash notions may cost a lot of speed for a lot of people.

On the "gaining/expecting" side is ... what? Freeing up curly braces?
Improving the syntax? Other stuff not obvious to me right now?

If we go that route, we could certainly include a new "pure array"
type:

my @trough = slop();
my PureArray @pa = ducks();
my PureHash @ph = dictionary();

=Austin

Dan Sugalski

Jan 28, 2003, 11:49:42 AM

to Piers Cawley, Damian Conway, perl6-l...@perl.org

At 8:47 AM +0000 1/28/03, Piers Cawley wrote:
>Damian Conway <dam...@conway.org> writes:

> > Sure. But then is this:
>>
>> $ref[$key]
>>
>> an array or hash look-up???
>
>Decided at runtime?

How? People use strings as array indices and ints/floats as hash
indices, and count on autoconversion to Make It Work.

--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                        have teddy bears and even
                                      teddy bears get drunk

Piers Cawley

Jan 28, 2003, 12:26:46 PM

to Dan Sugalski, Damian Conway, perl6-l...@perl.org

Dan Sugalski <d...@sidhe.org> writes:

> At 8:47 AM +0000 1/28/03, Piers Cawley wrote:
>>Damian Conway <dam...@conway.org> writes:
>> > Sure. But then is this:
>>>
>>> $ref[$key]
>>>
>>> an array or hash look-up???
>>
>>Decided at runtime?
>
> How? People use strings as array indices and ints/floats as hash
> indices, and count on autoconversion to Make It Work.

Nope. They count on the fact that, at runtime, you'll know whether $ref
is a hash or an array. But I'm not actually arguing for this, just
pointing out that it's not necessarily impossible (just way harder to
optimize).

--
Piers

Aaron Sherman

Jan 28, 2003, 4:17:31 PM

to Dan Sugalski, Perl6 Language List

On Tue, 2003-01-28 at 11:49, Dan Sugalski wrote:
> At 8:47 AM +0000 1/28/03, Piers Cawley wrote:
> >Damian Conway <dam...@conway.org> writes:
> > > Sure. But then is this:
> >>
> >> $ref[$key]
> >>
> >> an array or hash look-up???
> >
> >Decided at runtime?
>
> How? People use strings as array indices and ints/floats as hash
> indices, and count on autoconversion to Make It Work.

And it would. I'm not on one side or the other here, but I do see the
proposal working just fine.

$ref.fetch($key)

works even when $key is a string and $ref is an ARRAY ref.

$ref[$key]

is shorthand for

$ref.[$key]

which is in turn shorthand for

$ref.fetch($key)

and/or

$ref.store($key, $value)

depending on how it's used.

Allowing $ref to be a hash reference doesn't change anything because
HASH's fetch and store methods (no matter how builtin or pre-optimized
they may be) will do the conversion.
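
A rough Perl 5 sketch of that dispatch. MyArray and MyHash are hypothetical stand-ins, not real Perl 6 classes; the only point is that a single fetch call can be resolved by the referent's class at run time, with each class converting the key as it sees fit.

```perl
use strict;
use warnings;

# Hypothetical array-like container: fetch treats the key as an index.
package MyArray;
sub new   { my ($class, @items) = @_; return bless [@items], $class }
sub fetch { my ($self, $key) = @_; return $self->[$key] }

# Hypothetical hash-like container: fetch treats the key as a string.
package MyHash;
sub new   { my ($class, %items) = @_; return bless {%items}, $class }
sub fetch { my ($self, $key) = @_; return $self->{$key} }

package main;

my $array_ref = MyArray->new(qw(zero one two));
my $hash_ref  = MyHash->new(1 => "uno");

# The same call, with the same string key, works on both referents.
my @results = map { $_->fetch("1") } ($array_ref, $hash_ref);
print "@results\n";
```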

You still need C<{}> vs. C<[]> for anonymous types, but I don't think
you NEED them for indexing. Now the question becomes, do you WANT them
for readability?

--
Aaron Sherman <a...@ajs.com>
This message (c) 2003 by Aaron Sherman,
and granted to the Public Domain in 2023.
Fight the DMCA and copyright extension!

Dan Sugalski

Jan 28, 2003, 4:34:23 PM

to Aaron Sherman, Perl6 Language List

At 4:17 PM -0500 1/28/03, Aaron Sherman wrote:
> Now the question becomes, do you WANT them
>for readability?

Given that Larry's answer has been a resounding "yes" all along, the
technical reasons (Which are, themselves, sufficient) are pretty
irrelevant.

Aaron Sherman

Jan 28, 2003, 5:07:45 PM

to Dan Sugalski, Perl6 Language List

On Tue, 2003-01-28 at 16:34, Dan Sugalski wrote:
> At 4:17 PM -0500 1/28/03, Aaron Sherman wrote:
> > Now the question becomes, do you WANT them
> >for readability?
>
> Given that Larry's answer has been a resounding "yes" all along,

I'm not sure that this specific case was brought up. I remember Larry
weighing in on the thread about using [] as the only list constructor,
but that's a different issue. Granted, I've been away focusing on work,
so I may have missed it.

> the
> technical reasons (Which are, themselves, sufficient) are pretty
> irrelevant.

I'm not sure I recall the sufficient, yet irrelevant technical reasons.
I certainly can't think of anything. It also helps in the case of
objects that are not truly arrayish or hashish:

my SuperTree $foo;
$foo["Munge"]; # Returns the node whose value is "Munge"
$foo[0]; # Returns a node based on tree-position

The only remaining behavior of braces that I can think of as different
from brackets is auto-quoting, and is there a good reason that brackets
could not auto-quote?

Dan Sugalski

Jan 28, 2003, 5:12:04 PM

to Aaron Sherman, Perl6 Language List

At 5:07 PM -0500 1/28/03, Aaron Sherman wrote:
>On Tue, 2003-01-28 at 16:34, Dan Sugalski wrote:
>> At 4:17 PM -0500 1/28/03, Aaron Sherman wrote:
>> > Now the question becomes, do you WANT them
>> >for readability?
>>
>> Given that Larry's answer has been a resounding "yes" all along,
>
>I'm not sure that this specific case was brought up.

I am. Larry was clear--square brackets for array access, squiggle
brackets for hash access.

Adam Turoff

Jan 28, 2003, 7:29:06 PM

to Austin Hastings, perl6-l...@perl.org

On Tue, Jan 28, 2003 at 09:24:50AM -0800, Austin Hastings wrote:
> --- Dan Sugalski <d...@sidhe.org> wrote:
> > At 8:47 AM +0000 1/28/03, Piers Cawley wrote:
> > >> $ref[$key]
> > >>
> > >> an array or hash look-up???
> > >
> > >Decided at runtime?
> >
> > How? People use strings as array indices and ints/floats as hash
> > indices, and count on autoconversion to Make It Work.
>
> On the one hand: Java/ECMA/J-script does it.

That's nice, but Perl isn't Java/ECMAScript/JavaScript/JScript/C/C++/Pascal.
It's Perl. Perl uses square brackets for arrays, and curly braces for
hashes. Period. And Perl 6 will continue in the path of Perl 1..5,
*not* in the path of some other broken syntax.

If you have any questions about this, please refer to the 1st, 2nd, or
3rd editions of Programming Perl, or to any of the millions of Perl
programmers who have that distinction hard-wired into their wetware.

Z.

John Williams

Jan 28, 2003, 7:38:43 PM

to Aaron Sherman, Perl6 Language List

On 28 Jan 2003, Aaron Sherman wrote:

> I'm not sure I recall the sufficient, yet irrelevant technical reasons.
> I certainly can't think of anything. It also helps in the case of
> objects that are not truly arrayish or hashish:
>
> my SuperTree $foo;
> $foo["Munge"]; # Returns the node whose value is "Munge"
> $foo[0]; # Returns a node based on tree-position
>
> The only remaining behavior of braces that I can think of as different
> from brackets is auto-quoting, and is there a good reason that brackets
> could not auto-quote?

I think you are still overlooking the autovivification behavior.
i.e. What is the difference between these:

1) $a{1234567} = 1;

2) $a[1234567] = 1;

Answer: #1 creates 1 element. #2 creates 1,234,567 elements!
I think that is a big enough difference that perl should not be asked
to guess.
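
A minimal Perl 5 sketch of the difference (Perl 5 semantics only; the variable names are illustrative, and other implementations may store arrays more sparsely). The hash ends up with one key, while the array's length jumps to the index plus one.

```perl
use strict;
use warnings;

my (%h, @a);
$h{1234567} = 1;   # one key/value pair
$a[1234567] = 1;   # array is extended up to index 1234567

my $hash_count  = scalar keys %h;
my $array_count = scalar @a;
print "hash: $hash_count, array: $array_count\n";
```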

Similar precedents are + vs ~, == vs eq, etc.

~ John Williams

Michael G Schwern

Jan 29, 2003, 12:04:08 AM

to mar...@kurahaupo.gen.nz, perl6-l...@perl.org

On Tue, Jan 28, 2003 at 12:11:18PM +1300, mar...@kurahaupo.gen.nz wrote:
> This may sound like a silly idea but ...
>
> Has anyone considered removing the syntactic distinction between
> numeric and string indexing -- that is, between array and hash lookup?

PHP works this way.
http://www.php.net/manual/en/language.types.array.php

So that makes a nice case study to investigate.

--

Michael G. Schwern <sch...@pobox.com> http://www.pobox.com/~schwern/
Perl Quality Assurance <per...@perl.org> Kwalitee Is Job One

mar...@kurahaupo.gen.nz

Jan 29, 2003, 5:09:33 AM

to perl6-l...@perl.org

>> In particular, it would seem that
>> %foo[$key]
>> would be just as easy for the compiler to grok as
>> %foo{$key}

On Mon, 27 Jan 2003 15:39:19 -0800, Damian Conway <dam...@conway.org> wrote:
> Sure. But then is this:
>
> $ref[$key]
>
> an array or hash look-up???

Yes, well I suppose that could be considered one of the things I hadn't figured
out yet.

But it seems to me that if we're changing "$X[$n]" to "@X[$n]", then it would
be more consistent to change "$ref->[$key]" to "@$ref[$key]". Except of course
that mixing prefix and postfix notation is horrible, so perhaps "$ref@[$key]" and
"$ref%[$key]". (I'd assumed that "%[" and "@[" would be single symbols?)

> Decided at runtime?

That might be OK, except (as others have pointed out) for auto-vivification,
where the object doesn't exist before we operate on it.

Maybe we would get away with the shorthand "$ref[$index]" *except* where
autovivification is desired, and then we'd have to use the long-hand
"$ref@[$index]" and "$ref%[$index]" versions?

Hmmmmm, actually, I think I could class that as a feature, if the reader --
human or compiler -- could know just by looking whether auto-viv is expected.

-Martin

Sam Vilain

Jan 29, 2003, 5:23:02 AM

to Michael G Schwern, mar...@kurahaupo.gen.nz, perl6-l...@perl.org

On Wed, 29 Jan 2003 18:04, Michael G Schwern wrote:
> On Tue, Jan 28, 2003 at 12:11:18PM +1300, mar...@kurahaupo.gen.nz wrote:
> > This may sound like a silly idea but ...
> >
> > Has anyone considered removing the syntactic distinction between
> > numeric and string indexing -- that is, between array and hash lookup?
>
> PHP works this way.
> http://www.php.net/manual/en/language.types.array.php
>
> So that makes a nice case study to investigate.

Stick this in yer pipe n smoke it. I haven't actually written the module
yet, just the man page. But I think it might just be crazy enough to
implement :-).

The restriction that a key is a scalar and a value is a blessed object
could be removed, but it was a useful pre-condition for my use.

The DWIMy function that tells the difference between an int and a string is
at the end.

=head1 NAME

Container::Object - set/array/hash/bag/whatever of objects

=head1 SYNOPSIS

use Container::Object;
$coll = Container::Object->new();

push @$coll, foo => (bless { }, "bar");
print ref $coll->{foo}; # prints "bar"
print ref $coll->[0]; # prints "bar"
print ref( ($coll->members)[0] ); # prints "bar"

=head1 DESCRIPTION

This module implements a generic container of objects, that is, an
unordered, ordered, or keyed collection of objects, with or without
duplication of items.

This means that depending on how you treat your container, it
automagically behaves like a Set, Bag, Array, Hash or some perverse
combination of a subset of the above.

=head2 EH?

In the beginning, there was the array. Then Larry empowered the Hash
with Perl. Some other geezer followed up with C<Tie::IxHash>, hashes
that also retain ordering. Jean-Louis Leroy wrote C<Set::Object>,
which was a bit like an array but without the ordering, and no
duplicate elements. And Java as well as PHP have a poorly considered
bastard half and half mix of the two.

This container class tries to implement as many of the above classes
as possible, with DWIM and no more than O(log n) execution time being
the overriding principle.

=head2 Keys vs Values

Keys to objects in Container::Object MUST be unblessed,
non-overloaded, scalar values. You can supply a tied scalar wherever
a key is accepted, but the value is read once.

Values MUST be blessed objects. They can be scalars, hashes,
references, whatever. It doesn't really matter.

=head2 Indexes vs Keys

A very DWIMy function is used to tell the difference between a string
that is intended for use as a hash key, and a whole number that is
intended for use as an index lookup.

This is required to get correct behaviour from looking up the element
at index 0, versus the item with hash key "0".

=head2 Duplicate Items

It is possible to have two objects with the same hash key, or even
indeed an object may appear twice in the container with a single hash
key.

However, if objects are to be fetched by (hash) key, and duplicate
objects are found, the I<last> object inserted with that key is
returned (that is, the one with the highest numerical index). This
emulates the behaviour of a hash.

To avoid unnecessary, stale or duplicate entries in the container
lying around, you can use the C<replace> method instead of C<insert>
to put your objects in.

=head2 Key Fabrication

If objects are inserted without hash keys, they are inserted with an
empty hash key (from a user's perspective; internally a hash is
computed from the object's address in memory for objects with
undefined keys so the container is still well balanced).

If objects are inserted with hash keys, they are added to the logical
end of the container.

=head2 Algorithms and execution time

Simple N-bucket hashing with complete re-indexing is used for
simplicity of the code and speed for the general target case.

Each bucket contains a series of hash table entries, which is an array
of pointers to C<hek> structures and C<SV>s. Each entry also contains
a reference to the next entry in the numerical index, and copy of its
hash key to fill the bucket entry to four CPU words.

The bucket index is a flat array of pointers, plus a flat array of
C<(index # => pointer)> pairs to various points in the index array
chain; so that inserts can be performed in near constant time (the
index is merely updated).

=head1 CLASS METHODS

=head2 new( [ [ I<key> =E<gt> ] I<value>] ... )

Return a new C<Container::Object> containing the elements passed in
I<list>. The elements must be objects. Keys are optional.

=head1 INSTANCE METHODS

=head2 insert( [ [ I<key> =E<gt> ] I<value>] ... )

Add objects to the C<Container::Object>.

Adding the same object several times is not an error, but any
C<Container::Object> will contain at most one occurrence of the same
hash key/object pair, and the numeric index always remains continuous,
no holes, starting at zero. Returns the number of elements that were
actually added.

When adding elements to the collection's numerical index, inserts into
the list at the decided (or given) position.

=head2 replace( [ [ I<key> =E<gt> ] I<value>] ... )

Replaces objects in the C<Container::Object>.

Virtually identical to C<insert()>, but likes to replace elements
rather than duplicate them. It is not an error if the object, key or
key/value pair does or does not exist.

Will only replace entries in the numerical list if the passed key is
itself numeric.

You probably want to use this function when treating the collection as
a hash, otherwise you will end up with multiple values sharing the
same key.

=head2 includes( [ I<keys>, ] [ I<values> ])

Return C<true> if all the objects in I<values> are members of the
C<Container::Object>. The argument list may be empty, in which case
C<true> is returned.

If string keys are passed, the function behaves more like Perl's
C<exists> function; if numeric keys are passed, it checks that the
index passed is in the valid range of the container. If C<key =E<gt>
value> I<pairs> are passed, then the objects must exist at the given
location.

=head2 members( [I<keys>] )

Return the objects contained in the C<Container::Object> as an array,
in the order that they were inserted - or, if I<keys> is non-null, it
is considered to be equivalent to a hash slice function (add C<undef>
to the end of the list to force this behaviour - it always returns no
element). Lookup failures return C<undef> in the returned list.

When slicing, if an element exists twice with the same hash key, the
one with the higher numerical index is returned.

=head2 member( [I<key>] )

If I<key> is given, returns the member at that key, or C<undef> if it
does not exist; much like the C<members()> function. However, on
subsequent calls to the C<member> function, the next key in the
numerical index is returned. Returns an EOT mark (I mean, C<undef>)
after all the items have been returned.

This means if you ask for the I<second> member of the container with
C<$container-E<gt>member(1)>, the next call to
C<$container-E<gt>member()> will return the I<third> member of the
container - ie, the same as C<$container-E<gt>members(2)>. The
I<last> call to C<$container-E<gt>member()> that returns a value will
return the I<last> member of the container. After that, you'll get a
single C<undef>, and then it will start at the I<first> element.

In list context, this function returns all the items in the container
with the passed key.

=head2 size

Return the number of elements in the C<Container::Object>.

=head2 remove( [ I<keys>, ] [I<values>] )

Remove objects from a C<Container::Object>.

Removing the same object more than once, or removing an object absent
from the C<Container::Object> is not an error.

Returns the number of elements that were actually removed.

=head2 utsl BLOCK

Use the Schwartz, Luke.

This function uses BLOCK to perform a sort, setting C<$a> and C<$b> to
the keys of the items in the hash and just updating index values to perform
the sort.

=head2 sort BLOCK

This function uses BLOCK to perform an in-place sort, setting C<$a>
and C<$b> to the VALUES of the items in the hash (or, rather, blessed
references to them), and preserving hash keys.

=head2 clear

Empty this C<Container::Object>.

=head2 as_string

Return a textual Smalltalk-ish representation of the
C<Container::Object>. Also available as overloaded operator "".

=head2 intersection( [I<list>] )

Return a new C<Container::Object> containing the intersection of the
C<Container::Object>s passed as arguments. Also available as
overloaded operator *.

The following container will actually be empty, as distinct C<key
=E<gt> value> pairs are considered unique for set intersection
purposes.

$o = new Object;
$newset = Container::Object->new(hi => $o) *
Container::Object->new($o);

=head2 union( [I<list>] )

Return a new C<Container::Object> containing the union of the
C<Container::Object>s passed as arguments. Also available as
overloaded operator +.

The following container will end up with two entries of C<$o>:

$o = new Object;
$newset = Container::Object->new($o, $o) +
Container::Object->new($o);

=head2 subset( I<set> )

Return C<true> if this C<Container::Object> is a subset of I<set>.
Also available as operator <=.

For subset comparison, container indexes are ignored.

=head2 proper_subset( I<set> )

Return C<true> if this C<Container::Object> is a proper subset of
I<set>. Also available as operator <.

For subset comparison, container indexes are ignored.

=head2 superset( I<set> )

Return C<true> if this C<Container::Object> is a superset of I<set>.
Also available as operator >=.

For superset comparison, container indexes are ignored.

=head2 proper_superset( I<set> )

Return C<true> if this C<Container::Object> is a proper superset of
I<set>. Also available as operator >.

For superset comparison, container indexes are ignored.

# This function is used to differentiate between an integer and a
# string for use by the hash container types

sub ish_int {
    my $scalar = shift;

    my $i;
    eval { $i = _ish_int($scalar) };

    if ($@) {
        if ($@ =~ /overload/i) {
            if (my $sub = UNIVERSAL::can($scalar, "(0+")) {
                return ish_int(&$sub($scalar));
            } else {
                return undef;
            }
        } elsif ($@ =~ /tie/i) {
            my $x = $scalar;
            return ish_int($x);
        }
    } else {
        return $i;
    }
}

int
_ish_int(sv)
    SV *sv
PROTOTYPE: $
CODE:
    double dutch;
    int innit;
    STRLEN lp; // world famous in NZ
    SV * MH;
    // This function returns the integer value of a passed scalar, as
    // long as the scalar can reasonably be considered to already be a
    // representation of an integer. This means if you want strings to
    // be interpreted as integers, you're going to have to add 0 to
    // them.

    if (SvMAGICAL(sv)) {
        // probably a tied scalar
        //mg_get(sv);
        Perl_croak(aTHX_ "Tied variables not supported");
    }

    if (SvAMAGIC(sv)) {
        // an overloaded variable. need to actually call a function to
        // get its value.
        Perl_croak(aTHX_ "Overloaded variables not supported");
    }

    if ( SvIOK(sv) ) {
        // IOK - the scalar is a true integer.
        RETVAL = SvIV(sv);
    } else if (SvNOK(sv)) {
        // NOK - the scalar is a double

        if (SvPOK(sv)) {
            // POK - the scalar is also a string.

            // we have to be careful; a scalar "2am" or, even worse, "2e6"
            // may satisfy this condition if it has been evaluated in
            // numeric context. Remember, we are testing that the value
            // could already be considered an _integer_, and AFAIC 2e6 and
            // 2.0 are floats, end of story.

            // So, we stringify the numeric part of the passed SV, turn off
            // the NOK bit on the scalar, so as to perform a string
            // comparison against the passed in value. If it is not the
            // same, then we almost certainly weren't given an integer.

            // mortal copy, so the temporary is freed on every return path
            MH = sv_2mortal(newSVnv(SvNV(sv)));
            sv_2pv(MH, &lp);
            SvNOK_off(MH);
            if (sv_cmp(MH, sv) != 0) {
                XSRETURN_UNDEF;
            }
        }
        dutch = SvNV(sv);
        innit = (int)dutch;
        // fabs() so that negative non-integers such as -2.5 are rejected too
        if (fabs(dutch - innit) < (0.000000001)) {
            RETVAL = innit;
        } else {
            XSRETURN_UNDEF;
        }
    } else {
        XSRETURN_UNDEF;
    }
OUTPUT:
    RETVAL

Leopold Toetsch

Jan 29, 2003, 5:29:22 AM

to John Williams, Aaron Sherman, Perl6 Language List

John Williams wrote:

> I think you are still overlooking the autovivification behavior.
> i.e. What is the difference between these:
>
> 1) $a{1234567} = 1;
>
> 2) $a[1234567] = 1;
>
> Answer: #1 creates 1 element. #2 creates 1,234,567 elements!

Not currently: 2) does
- generate a sparse hole between old size and up to ~index
- generate one data chunk near index
- store the PerlInt at index

Reading nonexistent data in the hole then generates PerlUndefs on the fly.

> ~ John Williams

leo

Austin Hastings

Jan 29, 2003, 10:36:50 AM

to Sam Vilain, Michael G Schwern, mar...@kurahaupo.gen.nz, perl6-l...@perl.org

--- Sam Vilain <s...@vilain.net> wrote:

> =head2 includes( [ I<keys>, ] [ I<values> ])

Where the <keys> and/or <values> are obviously junctions.

if ($container.includes(any("ant", "beaver", "cow", "duck"))) {
...
}

This is *SO* cool.

=Austin

Aaron Sherman

Jan 29, 2003, 2:46:11 PM

to Leopold Toetsch, Perl6 Language List

On Wed, 2003-01-29 at 05:29, Leopold Toetsch wrote:
> John Williams wrote:
>
> > I think you are still overlooking the autovivification behavior.
> > i.e. What is the difference between these:
> >
> > 1) $a{1234567} = 1;
> >
> > 2) $a[1234567] = 1;
> >
> > Answer: #1 creates 1 element. #2 creates 1,234,567 elements!

> Not currently: 2) does
> - generate a sparse hole between old size and up to ~index
> - generate one data chunk near index
> - store the PerlInt at index

I covered this under the term "storage". The storage is different for
arrays and hashes. This we know.

But, why do I need to waste a set of balanced tokens to indicate that
difference? Historical compatibility with Perl5? Perhaps. That's not a
bad reason actually, given how much this could wig people out (just look
at the response on this list :)

Also, you don't always pre-declare in Perl, and the following would be
ambiguous:

$x[7] = 8;

That could auto-vivify an array ref or a hash ref, and choosing one or
the other is kind of scary. I think you could work around that, but it
would require a real dedication to the IDEA that Perl has a generic
container type.

Smylers

Jan 29, 2003, 3:15:51 PM

to perl6-l...@perl.org

Michael G Schwern wrote:

> On Tue, Jan 28, 2003 at 12:11:18PM +1300, mar...@kurahaupo.gen.nz wrote:
>
> > Has anyone considered removing the syntactic distinction
> > between numeric and string indexing -- that is, between array and
> > hash lookup?
>
> PHP works this way.

Well, for some definition of "work", anyway.

> http://www.php.net/manual/en/language.types.array.php
>
> So that makes a nice case study to investigate.

The manual is strangely quiet on how this duality fits together, but
through using PHP I've managed to work some of it out.

Basically in PHP there are just hashes, but hashes retain the order in
which elements are stored. When iterating with C<foreach> or C<print_r> you
get elements back in the order you put them in:

$colour = array('grass' => 'green', 'sky' => 'blue',
'orange' => 'orange');
print_r($colour);

# output:
Array
(
[grass] => green
[sky] => blue
[orange] => orange
)

If you add an element that doesn't have a key, then PHP finds the
maximum key that looks like a non-negative integer, adds one to it, and
uses that as the key for the new element, or uses zero if no elements
have such keys.

This means that if you never assign any explicit keys you can make it
seem like the hash is a 'normal' array:

$colour = array('purple', 'black', 'silver');
print_r($colour);

Array
(
[0] => purple
[1] => black
[2] => silver
)

But it isn't. Remember that elements retain the order in which they
were stored. So if you set them not in numerical order then they remain
in that order:

$colour = array();
$colour[1] = 'taupe';
$colour[3] = 'red';
$colour[0] = 'zephyr';
$colour[2] = 'pink';
print_r($colour);

Array
(
[1] => taupe
[3] => red
[0] => zephyr
[2] => pink
)

And if you remove an element from the middle of an array, the existing
elements keep their previous indices, so the indices are no longer
consecutive:

$colour = array('purple', 'black', 'silver', 'gold');
unset($colour[2]);
print_r($colour);

Array
(
[0] => purple
[1] => black
[3] => gold
)

And if you put that element back, well it doesn't go there but appears
at the end:

$colour = array('purple', 'black', 'silver', 'gold');
unset($colour[2]);
$colour[2] = 'bronze';
print_r($colour);

Array
(
[0] => purple
[1] => black
[3] => gold
[2] => bronze
)

The ordering means that it's possible to shift elements off the
beginning of a hash, which is fine:

$colour = array('grass' => 'green', 'sky' => 'blue',
'orange' => 'orange');
array_shift($colour);
print_r($colour);

Array
(
[sky] => blue
[orange] => orange
)

There's also something in there which checks for non-negative integer
keys, and if so renumbers them after shifting. This helps keep up the
pretence that the hash can be used as an array:

$colour = array('purple', 'black', 'silver', 'gold');
array_shift($colour);
print_r($colour);

Array
(
[0] => black
[1] => silver
[2] => gold
)

But this can have surprising consequences if you are trying to associate
numbers with particular elements:

$days = array(1 => 'day', 7 => 'week', 14 => 'fortnight',
365 => 'year');
array_shift($days);
print_r($days);

Array
(
[0] => week
[1] => fortnight
[2] => year
)

Zero days in a week? And the same thing happens even if the keys are
quoted, so that they are strings that happen to look like integers but
not actual integers:

$days = array('1' => 'day', '7' => 'week', '14' => 'fortnight',
'365' => 'year');

Trying that on a hash where some of the keys happen to look like
integers and some don't results in the former being renumbered and the
latter being left alone.

Is that enough?

The amazing thing is that most of the time, despite the above, what PHP
calls arrays do actually work well enough. But really PHP has hashes
with elements that retain their storage order plus some awkward hacks
that may make them seem more like arrays in certain circumstances. It's
messy. It's bordering on evil.

And I've never found it to have any advantage over the Perl way of
allowing the programmer to specify which data structure is required
rather than making the interpreter guess.

I cannot think of a worse example for Perl to follow.

Smylers