In defense of zero-indexed arrays.

Michael G Schwern

Dec 5, 2002, 5:45:39 AM
to perl6-l...@perl.org
I'm going to ask something that's probably going to launch off into a long,
silly thread. But I'm really curious what the results will be so I'll ask
it anyway. Think of it as an experiment.

So here's your essay topic:

Explain how having indexes (arrays, substr, etc...) in Perl 6 start at 0
will benefit most users. Do not invoke legacy. [1]


[1] ie. "because that's how most other languages do it" or "everyone is used
to it by now" are not valid arguments. Ask any Pascal programmer. :)


--

Michael G. Schwern <sch...@pobox.com> http://www.pobox.com/~schwern/
Perl Quality Assurance <per...@perl.org> Kwalitee Is Job One
Follow me to certain death!
http://www.unamerican.com/

Luke Palmer

Dec 5, 2002, 6:34:30 AM
to sch...@pobox.com, perl6-l...@perl.org
> Date: Thu, 5 Dec 2002 02:45:39 -0800
> From: Michael G Schwern <sch...@pobox.com>
>
> I'm going to ask something that's probably going to launch off into a long,
> silly thread. But I'm really curious what the results will be so I'll ask
> it anyway. Think of it as an experiment.
>
> So here's your essay topic:
>
> Explain how having indexes (arrays, substr, etc...) in Perl 6 start at 0
> will benefit most users. Do not invoke legacy. [1]

Through years of experience: "Because it's cleaner that way."

from 1:      A                           Z
             ↓                           ↓
         +------+------+------+------+------+
    $x:  | "1"  | "2"  | "3"  | "4"  | "5"  |
         |      |      |      |      |      |
         +------+------+------+------+------+
         ↑      ↑                    ↑      ↑
from 0:  a      b                    y      z

They're just different ways of thinking. If you start from 1, you're
talking about the elements themselves; operations are [i,j]
(inclusive). If you start from 0, you're talking about the positions
between elements; operations are [i,j) (inclusive, exclusive).

Say you have $x as above, and you wish to partition it into two
strings "12" and "345". In the "1" paradigm:

$part = 3;
$first = substr $x, 1, $part-1;
$last = substr $x, $part, 5;

In the "0":

$part = 2;
$first = substr $x, 0, $part;
$last = substr $x, $part, 5;

In the former, you can call $part 2 if you want; it's equally ugly.
I'm having flashbacks to my QBASIC days, where anything that
manipulated arrays seemed to be flooded with +1 and -1 in that way.
They say C has off-by-one errors; they have not tried BASIC.

I know this wasn't a strong argument, but in summary, most algorithms
are more elegant when working with spaces between elements than with
the indices of the elements themselves. And it only makes sense to
number them from zero then (otherwise you get length+1 as the end,
which doesn't make any sense).
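
The zero-based partition above can be sketched in runnable Perl 5. One caveat: Perl 5's real substr takes an offset and a length rather than two endpoints, so slice_between below is a hypothetical helper (not a built-in) that takes two between-element positions and returns the half-open range [i,j):

```perl
use strict;
use warnings;

# Hypothetical helper: return the characters between positions $i and $j,
# where positions number the gaps between elements, starting at 0.
# This is the half-open range [i, j).
sub slice_between {
    my ($str, $i, $j) = @_;
    return substr($str, $i, $j - $i);    # the length is simply $j - $i
}

my $x    = "12345";
my $part = 2;                            # the gap after the second element

my $first = slice_between($x, 0, $part);            # "12"
my $last  = slice_between($x, $part, length $x);    # "345"

print "$first $last\n";                  # prints "12 345"
```

The same $part serves as the end of the first piece and the start of the second, with no +1 or -1 anywhere.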

Luke

Richard Proctor

Dec 5, 2002, 6:20:33 AM
to Michael G Schwern, perl6-l...@perl.org
On Thu 05 Dec, Michael G Schwern wrote:
> So here's your essay topic:
>
> Explain how having indexes (arrays, substr, etc...) in Perl 6 start at 0
> will benefit most users. Do not invoke legacy. [1]
>
> [1] ie. "because that's how most other languages do it" or "everyone is
> used to it by now" are not valid arguments. Ask any Pascal programmer. :)

Many years ago I was involved with a project where all the software
people referred to the hardware as planes 0 and 1 (it was a duplicated
system) and the hardware people always used 1 and 2. To avoid confusion
we settled on using 0 and 2.

Any way of indexing arrays has its proponents. Perl currently has the
heavily deprecated $[ to allow playing with this base; changing it has
nasty effects at a distance.

Long long ago some computer languages did base their arrays at 1 rather
than 0. Hopefully they are dead now - it led to confusion and bad practices.
But that is a legacy argument.

There was an argument when computer languages were close to the hardware,
when to index an array you added the index (multiplied by the size of
the element) to the base of the array to find what you wanted. This is
probably insignificant and not an issue today.

To conclude: other than a very large legacy argument, there is probably
no strong reason to base arrays at 0 rather than 1. I would not want to
change.

Richard


--
Personal Ric...@waveney.org http://www.waveney.org
Telecoms Ric...@WaveneyConsulting.com http://www.WaveneyConsulting.com
Web services Ric...@wavwebs.com http://www.wavwebs.com
Independent Telecomms Specialist, ATM expert, Web Analyst & Services

Austin Hastings

Dec 5, 2002, 10:37:04 AM
to Michael G Schwern, perl6-l...@perl.org
> Explain how having indexes (arrays, substr, etc...) in Perl 6 start
> at 0 will benefit most users.

The languages which do not start their indices at 0 are dead or dying.

> Do not invoke legacy.

How about FUD? :-)

=Austin

Brian Ingerson

Dec 6, 2002, 3:53:53 AM
to Michael G Schwern, perl6-l...@perl.org
On 05/12/02 02:45 -0800, Michael G Schwern wrote:
> I'm going to ask something that's probably going to launch off into a long,
> silly thread. But I'm really curious what the results will be so I'll ask
> it anyway. Think of it as an experiment.
>
> So here's your essay topic:
>
> Explain how having indexes (arrays, substr, etc...) in Perl 6 start at 0
> will benefit most users. Do not invoke legacy. [1]

With languages like Perl that have negative subscripts, using a zero
base gives continuity. @INC[-2..2] should continue to DWIM.
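
A small Perl 5 sketch of that continuity, using a made-up five-element array in place of @INC so the result is predictable:

```perl
use strict;
use warnings;

my @letters = qw(a b c d e);

# With a zero base, the indexes -2, -1, 0, 1, 2 run without a gap:
# negative indexes count back from the end, and 0 is the first element.
my @wrapped = @letters[-2 .. 2];

print "@wrapped\n";    # prints "d e a b c"
```

With a base of 1, index 0 would fall in the middle of that range and refer to nothing obvious.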

Cheers, Brian

Damien Neil

Dec 5, 2002, 2:55:14 PM
to perl6-l...@perl.org
On Thu, Dec 05, 2002 at 02:45:39AM -0800, Michael G Schwern wrote:
> Explain how having indexes (arrays, substr, etc...) in Perl 6 start at 0
> will benefit most users. Do not invoke legacy. [1]

Answer 1: Ignoring legacy, it won't.

Answer 2: Because C uses 0-based indexes, Parrot is written in C, and
it would be just painful to switch back and forth when working on
different layers of the system. (Not a legacy argument, unless you
want to argue that Parrot is a legacy system.)

Answer 3: In a lower-level language than Perl, an array is usually a
block of memory divided into array elements. The index is the offset
from the start of the array. In languages like C which allow pointer
arithmetic, it makes sense for the array index to be the element offset,
to allow a[i] to be equal to *(a + i). Higher level languages should
follow this convention, for consistency. (Again, not a legacy argument,
since it offers a first-principles rationale for 0-based arrays in
certain contexts.)

- Damien

Agent Secret

Dec 5, 2002, 3:20:30 PM
to Michael G Schwern, perl6-l...@perl.org
> 2002-12-05 10:45:39, Michael G Schwern <sch...@pobox.com> wrote:
> I'm going to ask something that's probably going to launch off into a
> long, silly thread. But I'm really curious what the results will be so
> I'll ask it anyway. Think of it as an experiment.
>
> So here's your essay topic:
>
> Explain how having indexes (arrays, substr, etc...) in Perl 6 start at 0
> will benefit most users. Do not invoke legacy. [1]
>
> [1] ie. "because that's how most other languages do it" or "everyone is
> used to it by now" are not valid arguments. Ask any Pascal programmer.
> :)


The other (reverse) way out, i'm not trying to make an essay, just think
out loud but if you have $string = "Hello World", and you want the last
three chars, you do:

$wanted = substr $string, -3;

If the first index was 1, it could be ok too, but what would be offset
0? What if someone was looking at his string backwards?

$pos = 1; # 0 | 1
# ---------
substr $string, $pos--, 1; # 'H' | 'e'
substr $string, $pos--, 1; # 'd' | 'H'
substr $string, $pos--, 1; # 'l' | '' ?
substr $string, $pos--, 1; # 'r' | 'd' ?


Don't ask me why someone would do that... But i expect to get the last
$string's char with $pos == -1, not 0.

I also find the 'offset' idea to be consistent with binary math. After
all, with bytes, 0x7F + 1 == 0x80, which is +128 unsigned but also -128
signed... and i found it sometimes useful to be able to mix signed and
unsigned values.

One could argue it's not the way to go, it's tricky, you don't mix
signed/unsigned... blah. Walking is tricky, bicycling is tricky,
remember the first time you tried and you fell?

blah. no more args :-)
(yet another lurker)

Brad Hughes

Dec 6, 2002, 5:16:43 PM
to Damien Neil, perl6-l...@perl.org
Damien Neil wrote:
> On Thu, Dec 05, 2002 at 02:45:39AM -0800, Michael G Schwern wrote:
>
>>Explain how having indexes (arrays, substr, etc...) in Perl 6 start at 0
>>will benefit most users. Do not invoke legacy. [1]
>
>
> Answer 1: Ignoring legacy, it won't.

Bingo.

> Answer 2: Because C uses 0-based indexes, Parrot is written in C, and
> it would be just painful to switch back and forth when working on
> different layers of the system. (Not a legacy argument, unless you
> want to argue that Parrot is a legacy system.)

I doubt "most users" will be writing Parrot.

> Answer 3: In a lower-level language than Perl, an array is usually a
> block of memory divided into array elements. The index is the offset
> from the start of the array.

Assuming the base index of the array is 0. More generally, the index of an
array element is that element's offset from the base index of the array.
Your argument is somewhat circular. I have oodles of arrays declared to
start at 1980. Most of my arrays start at index 1. But then I'm a Fortran
programmer. (And I hope that's not an opening for a language war thread.)

Choice of language aside, having max_index == num_elements appeals to me. YMMV.

In any case, the choice of default base index is less important for Perl than
for other languages given how seldom arrays in Perl are accessed by index as
opposed to manipulated by push, pop, for $x (@array) loops and such.

brad

Larry Wall

Dec 6, 2002, 8:59:33 PM
to perl6-l...@perl.org
On Thu, Dec 05, 2002 at 02:45:39AM -0800, Michael G Schwern wrote:
: I'm going to ask something that's probably going to launch off into a long,
: silly thread. But I'm really curious what the results will be so I'll ask
: it anyway. Think of it as an experiment.
:
: So here's your essay topic:
:
: Explain how having indexes (arrays, substr, etc...) in Perl 6 start at 0
: will benefit most users. Do not invoke legacy. [1]

How about, because I like it? You may, of course, see that as a
legacy argument, depending on our relative ages... :-)

Anyway, that aside, I see no reason why we couldn't have array types
that are explicitly declared with array bases other than 0. Perhaps
even the built-in types can just take a range property:

my @array is range(1...);

One could even go so far as to have a pragma that causes all arrays declared
in the current *lexical* scope to be based at 1. Call it

use fortran;

or some such...

This is not problematical in the same way that $[ was, since we're
limiting the effect to the current lexical scope. In fact, speaking
of legacy, you'll recall that the "fix" for Perl 5 was to make

$[ = 1;

really do a lexically scoped declaration despite having the appearance
of a global assignment.

By the way, I noticed when visiting Uruguay that the elevators number
the floors ...-2, -1, 0, 1, 2..., where 0 is the ground floor, and
basement floors are negative. Way cool. Now all we have to do is
convince everyone that the year 1 B.C. is the same as year 0 A.D.,
and 2 B.C. is the same as -1 A.D., and so on.

Larry

Damian Conway

Dec 6, 2002, 9:14:24 PM
to perl6-l...@perl.org
Larry wrote:

> : Explain how having indexes (arrays, substr, etc...) in Perl 6 start at 0
> : will benefit most users. Do not invoke legacy. [1]
>
> How about, because I like it? You may, of course, see that as a
> legacy argument, depending on our relative ages... :-)

A practical argument in its favour is that it makes circular-lists-via-modulo:

@list[++$nextidx%7] = $nextval;

and cyclic-value-mapping-via-modulo:

$day_name = <<Sun Mon Tue Wed Thu Fri Sat>>[$day%7];

both work correctly.
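
Translated into runnable Perl 5 as a rough sketch (the starting value of $nextidx, the 1..10 test values and the @day_names array are assumptions made for the example):

```perl
use strict;
use warnings;

# A 7-slot ring buffer: with zero-based indexes, "% 7" yields 0..6,
# which are exactly the valid slots, so no correction term is needed.
my @list;
my $nextidx = -1;                      # so the first ++ lands on slot 0
for my $nextval (1 .. 10) {
    $list[ ++$nextidx % 7 ] = $nextval;
}
print "@list\n";                       # prints "8 9 10 4 5 6 7"

# Cyclic value mapping: any non-negative day number maps to a name.
my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
my $day       = 9;
my $day_name  = $day_names[ $day % 7 ];
print "$day_name\n";                   # prints "Tue"
```

Values 8, 9 and 10 wrap around and overwrite slots 0, 1 and 2, which is the circular behaviour the modulo is there to provide.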


> Anyway, that aside, I see no reason why we couldn't have array types
> that are explicitly declared with array bases other than 0. Perhaps
> even the built-in types can just take a range property:
>
> my @array is range(1...);

Surely, that would be:

my @array is domain(1...);

???

Damian

Uri Guttman

Dec 6, 2002, 11:15:07 PM
to Damian Conway, perl6-l...@perl.org
>>>>> "DC" == Damian Conway <dam...@conway.org> writes:

DC> A practical argument in its favour is that it makes
DC> circular-lists-via-modulo:

DC> @list[++$nextidx%7] = $nextval;

DC> $day_name = <<Sun Mon Tue Wed Thu Fri Sat>>[$day%7];

DC> both work correctly.

not to defend 1 based arrays but all you have to do with the above is
add the base offset to them:

DC> @list[++$nextidx%7 + 1] = $nextval;
DC> $day_name = <<Sun Mon Tue Wed Thu Fri Sat>>[$day%7 + 1];

in any case i like 0 based. the best argument i have seen so far is that
it makes -1 a meaningful index. that is something pl1 and fortran could
never do without a range declaration.

and larry's property and pragma ideas are fine solutions for those who
want impaired indexing. :)

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Chromatic

Dec 7, 2002, 1:42:44 PM
to perl6-l...@perl.org
On Fri, 06 Dec 2002 14:16:43 +0000, Brad Hughes wrote:

> In any case, the choice of default base index is less important for Perl than
> for other languages given how seldom arrays in Perl are accessed by index as
> opposed to manipulated by push, pop, for $x (@array) loops and such.

I slice a lot of lists, though, and expect the base index of a loop to
have a certain resemblance to the base index of an array.

-- c

Mark J. Reed

Dec 9, 2002, 3:05:01 PM
to perl6-l...@perl.org
On 2002-12-06 at 17:59:33, Larry Wall wrote:
> Now all we have to do is
> convince everyone that the year 1 B.C. is the same as year 0 A.D.,
> and 2 B.C. is the same as -1 A.D., and so on.
Well, since that's already true, it hopefully won't take much
convincing. :) If you mean to convince the general public to actually
*use* 0 and negative years AD instead of BC, though, that'll take some doing.
(Astronomers do that already, but they don't count as the general public;
they're more a specific public.) :)

--
Mark REED | CNN Internet Technology
1 CNN Center Rm SW0831G | mark...@cnn.com
Atlanta, GA 30348 USA | +1 404 827 4754

Michael D. Adams

Dec 13, 2002, 4:00:28 AM
sch...@pobox.com (Michael G Schwern) wrote in message news:<20021205104...@blackrider.schwern.org>...

> I'm going to ask something that's probably going to launch off into a long,
> silly thread. But I'm really curious what the results will be so I'll ask
> it anyway. Think of it as an experiment.
>
> So here's your essay topic:
>
> Explain how having indexes (arrays, substr, etc...) in Perl 6 start at 0
> will benefit most users. Do not invoke legacy. [1]
>
>
> [1] ie. "because that's how most other languages do it" or "everyone is used
> to it by now" are not valid arguments. Ask any Pascal programmer. :)

Consider how you specify a range.

Zero length range
[$x, $y] version: $s.substr($x, $x - 1)
[$x, $y) version: $s.substr($x, $x)

Length of a range
[$x, $y] version: $y - $x + 1
[$x, $y) version: $y - $x

The [$x, $y) version wins hands down. It is just weird to write
"$s.substr(0,-1)".
--
Given that we want [$x, $y), consider what happens when we specify the
complete range.

zero based index: $s.substr(0, $s.length())
one based index: $s.substr(1, $s.length() + 1)

The zero based version is better looking and more logical.
--
As mentioned in another post pointer arithmetic is also a reason for
zero based. It gets really important in multi-dimensional stuff.

x,y -> offset in zero based: $offset = $x + $y * $row_size
offset -> x,y in zero based: $x = $offset % $row_size;
    $y = int($offset / $row_size)
x,y -> offset in one based: $offset = ($x - 1) + ($y - 1) * $row_size + 1
offset -> x,y in one based: $x = ($offset - 1) % $row_size + 1;
    $y = int(($offset - 1) / $row_size) + 1

The division in that algorithm is a problem though because divides are
very slow. It can be optimized in zero based when $row_size is of the
form $row_size = 2**$k. In that case the zero based can be optimized
to purely bitwise operations:

x,y -> offset: $offset = $x | $y << $k
offset -> x,y: $x = $offset & ~$mask; $y = ($offset & $mask) >> $k

Where $mask depends on $k such as 0b11111000 if $k == 3 (formally
$mask == (~0) << $k). The latter case is of more importance because
hardware can just take the top 5 bits as $y and the bottom 3 bits as
$x, which is way faster than *anything*. No such optimization exists
for one based indexing.
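
A rough Perl 5 sketch of the zero-based conversions, in both the general and the power-of-two form. The sub names are made up for illustration, and $low_mask here is the complement of the $mask described above (it selects the low $k bits rather than the high ones):

```perl
use strict;
use warnings;

my $k        = 3;
my $row_size = 2**$k;            # 8 columns per row
my $low_mask = $row_size - 1;    # 0b111: selects the low $k bits

# Zero-based, general form: multiply, modulo and integer divide.
sub to_offset   { my ($x, $y) = @_; return $x + $y * $row_size }
sub from_offset { my ($o) = @_; return ($o % $row_size, int($o / $row_size)) }

# Zero-based, power-of-two row size: shifts and masks only.
sub to_offset_bits   { my ($x, $y) = @_; return $x | ($y << $k) }
sub from_offset_bits { my ($o) = @_; return ($o & $low_mask, $o >> $k) }

my ($x, $y) = (5, 2);
my $offset  = to_offset($x, $y);                    # 5 + 2*8
print "$offset\n";                                  # prints "21"
print join(",", from_offset_bits($offset)), "\n";   # prints "5,2"
```

Both pairs of subs agree for every offset as long as $x stays below $row_size, which is what lets the shift-and-mask form stand in for the divide.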
--
Finally, contrary to the argument that the general public uses one
based indexing, we actually do use zero based indexing in everyday
life, especially in time related things. Someone who has just been
born is zero years old until they have lived a full year, at which
point they are a one year old. So if you were to label the years you
have lived you would have your 0th year, 1st year, 2nd year, etc.
Same for time (24hr, not 12hr, which is messed up). Time starts at hour
00:00, then 01:00, all the way to 23:59. We do use one based indexing
in day of month, but that makes calculations hard. It exhibits the
same problems as in the pointer arithmetic example.
--
One interesting case is a (perl5/C style) for loop indexing over
elements of an array.

zero based: for ($i = 0; $i < scalar(@x); $i++)
one based: for ($i = 1; $i <= scalar(@x); $i++)

Neither case is significantly better because of the "<" vs. "<="
thing. However we must consider what using the other would mean. For
example what would the following mean?

zero based: for ($i = 0; $i <= scalar(@x); $i++)
one based: for ($i = 1; $i < scalar(@x); $i++)

Frankly I don't know, but I would be interested to know if anyone can
come up with an idiomatic meaning for these.
--
After considering those cases, zero is the best default. Being able to
declare a per array offset is okay, but in most cases you really need
zero based.

Michael D. Adams
