Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Why do arrays begin at zero?

34 views
Skip to first unread message

Chris

unread,
Jul 1, 1997, 3:00:00 AM7/1/97
to

Anyone know why arrays begin at zero? Wouldn't it be alot easier to
start at one so the last piece of data was the size of the array, not
size -1?

why do they do that?

Chris

Curtis Bass

unread,
Jul 1, 1997, 3:00:00 AM7/1/97
to

The simple reason is, "Because computers start counting from zero." It's
simply efficient for arrays to start at zero; the value "zero" is a
perfectly valid value, so why waste it by starting from one?

Once you get used to it, it makes sense.

Curtis

Phlip C Plumlee

unread,
Jul 1, 1997, 3:00:00 AM7/1/97
to Chris

Chris wrote:

> Anyone know why arrays begin at zero? Wouldn't it be alot easier to
> start at one so the last piece of data was the size of the array, not
> size -1?

The actual reason is because 'array [x]', 'x [array]', and '*(array +
x)' are all exactly the same. They are not "similar" or "equivalent" -
they are exactly the same expression.

Now let's dissect '*(array + x)'. We take the address of the first array
element, add an offset to it, and dereference this element. If you
expect the compiler to get the first element when x is 1, we must change
the definition of 'array'. But that means '*(array)' will now illegally
dereference un-owned memory. And I do not want to use a compiler that
changes the meaning of my variable if there is a '+' next to it!

If you feel the urge to base an array on any other number besides 0,
create a class that overloads 'operator[]'. Then add or subtract any
number you like to the index inside this operator function. But your
compiler will now have a more difficult time optimizing this.

BTW, "a lot" is two words.

-- Phlip
====== http://users.deltanet.com/~tegan/home.html ======
-- Read my lips - no new platitudes! --


John Nagle

unread,
Jul 2, 1997, 3:00:00 AM7/2/97
to

c.na...@mail.utexas.edu*nospam* (Chris) writes:
>Anyone know why arrays begin at zero? Wouldn't it be alot easier to
>start at one so the last piece of data was the size of the array, not
>size -1?

>why do they do that?

It seems to work out better. In FORTRAN, arrays start at 1,
and you seem to have "+1" all over the place in programs. In
Pascal, you get to pick a lower bound, and programs with 0 seem to
be cleaner than programs with 1. I've tried it both ways, and
it does seem cleaner to start from 0.

John Nagle

Sam Thomas

unread,
Jul 2, 1997, 3:00:00 AM7/2/97
to

On Tue, 01 Jul 1997 19:06:39 -0400, Curtis Bass
<c-b...@freedomnet.com> wrote:

>Chris wrote:
>>
>> Anyone know why arrays begin at zero? Wouldn't it be alot easier to
>> start at one so the last piece of data was the size of the array, not
>> size -1?
>

>The simple reason is, "Because computers start counting from zero." It's
>simply efficient for arrays to start at zero; the value "zero" is a
>perfectly valid value, so why waste it by starting from one?
>
>Once you get used to it, it makes sense.

Chris and Curtis:
I have to agree with you, Curtis.. That's the only real
reason that an array would start counting at 0.. A computer counts in
bits (all 0's and 1's), so the best place to start is 0.. I don't
know if this makes sense, but it does if you work with computers
enough (I hope)..

=======================================
Sam Thomas
sjt...@psu.edu
http://www.personal.psu.edu/sjt123/
=======================================

Charles Lin

unread,
Jul 2, 1997, 3:00:00 AM7/2/97
to

Chris (c.na...@mail.utexas.edu*nospam*) wrote:
|| Anyone know why arrays begin at zero? Wouldn't it be alot easier to
|| start at one so the last piece of data was the size of the array, not
|| size -1?

One reason is the relation between arrays and pointers. When you
have an array, say, x, and call it with index, i (that is, x[ i ]),
you are really doing *(x + i), where x is a location in memory pointing
to the beginning of an array. Now this uses pointer arithmetic,
so x + i is really more like addr(x) + sizeof(*x) * i; In other words,
if an array holds integers, and integers take 4 bytes each, then
x[ 2 ] is 8 (i.e., 4 * 2, where 4 is the number of bytes per integer,
and 2 is the index) bytes past x.

So, notice that x[ 0 ] is the same as *(x + 0) or equivalent
*x. And it makes more sense, in this case, to have x start at
data.

In any case, you get used to the fact that the arrays don't
go up to N, but only go to N - 1. Just as you get very aware
that testing for equality in C requires ==, not =.

--
Charles Lin
cl...@cs.umd.edu

Herbert Peters

unread,
Jul 2, 1997, 3:00:00 AM7/2/97
to

One reason is historic. The programs would run a little faster, and it was
easier to apply pointer arithmetics. Consider this

char string [20];
char *strptr = string;

strptr += 5; // To which element does strptr point now? To element 5 or 6?

For the compiler this is easy to translate and understand as it is for
humans. IMHO it is very unintuitive in regards of pointer if the string
starts from 1 to 20. See the example with pointer arithmetics above.

Chris <c.na...@mail.utexas.edu*nospam*> 次寫入到主題
<33b98981...@163.181.250.1>...


> Anyone know why arrays begin at zero? Wouldn't it be alot easier to
> start at one so the last piece of data was the size of the array, not
> size -1?
>

> why do they do that?
>

> Chris
>

Scott McMahan

unread,
Jul 2, 1997, 3:00:00 AM7/2/97
to

: Anyone know why arrays begin at zero?

In C, array syntax was syntactical sugar that internally turned into
pointer addressing. Pointers are *offsets* into memory from a specific
address, so they start at 0 (that is, the first element is at a zero
offset from the beginning). You're not talking about an element of the
array, you're talking about an offset from the beginning of the array.

C was designed by operating system writers for operating system
writers, and this kind of "offset thinking" is more natural than
1-based thinking when you're dealing with the machine level. The fact
that C broadened in its appeal to where even BASIC and FORTRAN
programmers started using it was not forseen by its authors!

C++ just inherits C's way of looking at things. In C++, you can create
an array class that can use any indexing scheme you prefer.

Scott


collin

unread,
Jul 2, 1997, 3:00:00 AM7/2/97
to

Because an array is roughly just a series of sequential memory
addresses. The zeroth element starts at the beginning. The 1st
element is one size unit down the memory addresses (or up depending on
how you look at it).


Jim shaw

unread,
Jul 2, 1997, 3:00:00 AM7/2/97
to


Chris <c.na...@mail.utexas.edu*nospam*> wrote in article


<33b98981...@163.181.250.1>...
> Anyone know why arrays begin at zero? Wouldn't it be alot easier to

Because each array element is an offset from the start. The first element
is always at the start of the array, and since arrays are really pointers
(historically) to memory, the first element is 0*sizeof(array) bytes offset
from the beginning. It was due to optimization reasons as well.

> start at one so the last piece of data was the size of the array, not
> size -1?

Logically it would be nice to! Actually, a lot of bugs are due to loop
errors, array offsets. I just recently fixed a memory overwrite (another
companies code) due to this offset problem. It solved nine different bugs
that were related to one line of code.

>
> why do they do that?
>
> Chris
>

Ciao!
Jim Shaw


Roger Vaughn

unread,
Jul 2, 1997, 3:00:00 AM7/2/97
to

"Herbert Peters" <hrpe...@ksts.seed.net.tw> writes:

>
> One reason is historic. The programs would run a little faster, and it was
> easier to apply pointer arithmetics. Consider this

>...

> For the compiler this is easy to translate and understand as it is for
> humans. IMHO it is very unintuitive in regards of pointer if the string
> starts from 1 to 20. See the example with pointer arithmetics above.

This isn't entirely correct. Arrays begin at 0 *because* of pointer
arithmetic, and because of the semantics of the array syntax.

The compiler translates an array reference, say:

a[i] = 1;

into an equivalent pointer-arithmetic statement:

*(a+i) = 1;

before interpreting it. This shows explicitly how the index is used as
a pointer offset to access the array. Since the first element of an array
is always 0 bytes from the array base address, you have to add 0 to the base
to get there. Backtracking into the array syntax, this means that the
first array index must be 0.

Now, it should be possible for the compiler to automatically subtract one
from the array index during the translation, but this has the serious
side-effects of compromising semantics of the language. Remember that
in C/C++, arrays are not really a complete type - they are simply a special
use of pointers. If we were to use 1 as the array base, then a[i] and
(a+i) would no longer be semantically equivalent, and believe me, would
be much more confusing to programmers everywhere.

You can prove all of this to yourself. If you have an array reference,
say:

a[i] = 1;

try writing it this way:

i[a] = 1;

It will still work because of how the compiler treats array syntax -
(a+i) and (i+a) are equivalent (because the "+" operator is commutative).

BTW, please don't make a practice of reversing your array refs in code -
it will just confuse everyone!


Roger Vaughn

Dag Henriksson

unread,
Jul 4, 1997, 3:00:00 AM7/4/97
to

Chris wrote:

> Anyone know why arrays begin at zero? Wouldn't it be alot easier to

> start at one so the last piece of data was the size of the array, not
> size -1?
>

> why do they do that?
>
> Chris

Think of the subscript operator [] as this:int a[10];
a[i] will give you the object pointed to by ( 'a' incremented (++) 'i'
times )
Thinking of it this way it would be even more confusing if 'a' should be
incremented 'i - 1' times...

By definition, the expression a[i] means *( a + i ); since addition
is commutative this is equivalent to *( i + a ). It follows therefore,
that i[a] mens the same thing as a[i].
This might not be true for overloaded operators. There is no guarantee
that i[a] == a[i] for user-defined types.


M. Kilgore

unread,
Jul 16, 1997, 3:00:00 AM7/16/97
to

Look at the situation from a binary stand point. If you exclude zero then
your index into the array has one less position it can point to before you
have to go to the next larger variable type. For instance: an 8-bit
unsigned integer can represent 256 discrete states. If you consider zero to
be an invalid state then if you want to index into an array with more than
256 elements you wouldn't be able to with just 8-bits.

EigenJosh <eige...@bigfoot.com> wrote in article
<33cd1711...@news.ioc.net>...
> Well, I think it makes bit shifts and offsets easier to understand.

EigenJosh

unread,
Jul 16, 1997, 3:00:00 AM7/16/97
to

George Kinney

unread,
Jul 17, 1997, 3:00:00 AM7/17/97
to

Arrays are just contiguous areas of memory.

Instead of thinking of an array index as just that, an index,
think of it as an offset. (which it is, in other words, how
many items precede this one in this list? If it is the first,
then none.)

example: array[0] is item at 0 offset (number of items to this
point is 0) from the beginning of the array(i.e. first item)

Also, the last item in an array of size n is at offset n-1, thus
the notation.

And for what it is worth, even in languages that commonly use 1,
0 is usually the proper base. It's a design due to the way
computers are actually designed and operate, not just a fluke
pulled out of the blue.

EigenJosh <eige...@bigfoot.com> wrote in article
<33cd1711...@news.ioc.net>...

Curtis Crowson

unread,
Jul 17, 1997, 3:00:00 AM7/17/97
to

George Kinney wrote:
>
> Arrays are just contiguous areas of memory.
>
> Instead of thinking of an array index as just that, an index,
> think of it as an offset. (which it is, in other words, how
> many items precede this one in this list? If it is the first,
> then none.)
>
> example: array[0] is item at 0 offset (number of items to this
> point is 0) from the beginning of the array(i.e. first item)
>
> Also, the last item in an array of size n is at offset n-1, thus
> the notation.
>
> And for what it is worth, even in languages that commonly use 1,
> 0 is usually the proper base. It's a design due to the way
> computers are actually designed and operate, not just a fluke
> pulled out of the blue.

It was an original design decision in C that had to be carried over to
C++. It does not generate any better code. C was originally thought of
as a portable assembly language. So you have offsets of 0 into memory
would be the first element.

In languages where you can vary the offset to match the data in the
array like pascal you can declare array test[-5..5, 25..30]. The
compiler can generate the offset calculations to be just as efficient C.
(Maybe more if you have to include code in C to transform your indexes
to the 0 begining format).

This is a religious issue with a lot of C programmers. You will here a
lot of comments about how more natural it is and how it generates better
code, and etc.. This was just a language decision that the original
designers made.

Most other languages do not do it this way and the original designers of
those languages could have made the same choice but didn't for their own
reasons.

George Kinney

unread,
Jul 18, 1997, 3:00:00 AM7/18/97
to

Curtis Crowson <curtis_...@emory.org> wrote in article
<33CE57...@emory.org>...
> George Kinney wrote:
[CLIP]

> > And for what it is worth, even in languages that commonly use 1,
> > 0 is usually the proper base. It's a design due to the way
> > computers are actually designed and operate, not just a fluke
> > pulled out of the blue.
[CLIP]

> In languages where you can vary the offset to match the data in the
> array like pascal you can declare array test[-5..5, 25..30]. The
> compiler can generate the offset calculations to be just as efficient C.
> (Maybe more if you have to include code in C to transform your indexes
> to the 0 begining format).

I wasn't trying to start a religious war, I use several languages
regularly.
I just used C/C++ as an example 'cause it was posted to c.l.c++. I'm a
serious
heathen when it comes to computer languages. :)

But even if you declare an array as [-5..5, 25..30], it will still map
the elements to a single contiguous piece of memory, and the first
element will still be at offset 0. In other words, it is not that it
won't generate code as efficiently as a C style array, it will generate
the same thing. You just didn't type it that way, but the computer still
works the same way. I would argue that there is no need to use any other
base
than 0 since that is what is actually produced. It is a design issue, and
does
vary from language to language, but I don't think it was done just to
confuse people.

Also, if you had to recode your indexes in C to match a 0 based array,
then your algorythm probably has fundamental problems to begin with. You
should know full well the types and amounts of data you will be handling,
and
how you will handle them before you handle them. That's just good design
practice, regardless of your language of choice.

The only advantage I can see is in the readability of the source code.
Which has it's place, but a given piece of code's readability is often
in direction relation to the reader's understanding of the language.
(obfuscated code, and shrouding methods are of course exceptions, they
don't make sense to most people, except perhaps to Fortran gurus :)

I have nothing against higher level languages, and more abstract data
types,
which can be much easier for humans to read, but I tend to be cautious
around
them. Such methods can easily lead to sloppy coding, and poorly designed
algorythms. Maybe I'm just a control geek though.

Of course with the pace at which compilers are advancing, in many cases the
compiler can optimize and emit better code than you are likely to write, so
this whole issue may likely be moot before long.

Mike Rubenstein

unread,
Jul 18, 1997, 3:00:00 AM7/18/97
to

"George Kinney" <goo...@mail.net> wrote:

Note that this is not true in all languages. In PL/I, for example, an
array is not necessarily a contigous piece of memory. Even with
constant subscripts, the offset cannot in general be computed at
compile time (though it can in many common situations).

>
> Also, if you had to recode your indexes in C to match a 0 based array,
> then your algorythm probably has fundamental problems to begin with. You
> should know full well the types and amounts of data you will be handling,
> and
> how you will handle them before you handle them. That's just good design
> practice, regardless of your language of choice.

Or perhaps the algorithm does not have a fundamental problem -- it
might just be that your statement is too broad.

As an example, consider the common data structures used in solving the
n queens problem.

Typically one will have an array to indicate whether there is a queen
on a column (or row, but both row and column arrays are not needed).
It's reasonably natural to number the columns from 0 through n - 1.

One also needs to keep track of which diagonals have queens on them.
For up diagonals (i.e., those going up as we go from left to right)
the sum of the row and column number is constant and and the natural
array to keep track of this has subscripts running from 0 through 2n
- 2.

For down diagonals, the difference of the row minus the column is
constant and the natural array has subscripts going from -n + 1
through n - 1.

These structures make the algorithm easy. When we place a queen on
row, col we also place it on up diagonal row + col and down diagonal
row - col.

Note that I'm not saying that handling this is at all difficult or
that it is a reason for introducing non-zero based arrays, but it is
an example where a good algorithm will use them and will need some
(albeit very easy) translation for C[++]. In C we must place the down
diagonal using subscript row - col + n - 1, easy, but certainly not as
clear.


Michael M Rubenstein

Mike Rubenstein

unread,
Jul 18, 1997, 3:00:00 AM7/18/97
to

mik...@ix.netcom.com (Mike Rubenstein) wrote:

> One also needs to keep track of which diagonals have queens on them.
> For up diagonals (i.e., those going up as we go from left to right)
> the sum of the row and column number is constant and and the natural
> array to keep track of this has subscripts running from 0 through 2n
> - 2.

I should have mentioned that I was assuming that the board is numbered
with the upper left square being 0,0. Obviously, if you number the
board differently the details will change, but the princple remains
the same.

Michael M Rubenstein

Russell Corfman

unread,
Jul 18, 1997, 3:00:00 AM7/18/97
to

But in C or C++ this can be handled easily. You just have to
keep track of the change when deallocating memory (if memory
is allocated.)

For example, using the n queens problem quoted below:

WhateverType downDiagArray[2 * n];
WhateverType *downDiag = downDiagArray + n - 1;

Now you can access the downDiagArray via downDiag using
subscripts (1 - n) to (n - 1):

for (int idx = 1-n; idx < n; ++idx)
downDiag[idx] = blah;

Note that there is absolutely no loss of efficiency compared
to using zero based subscripts (except for setting up the
pointer at the start). Actually, it may be more efficient
since you don't have to do things like:

for (int idx = 1-n; idx < n; ++idx)
downDiagArray[idx-n+1] = blah;

I converted some Fortran programs to C and it made it a lot
simpler to adjust the pointer by one than to worry about
changing the array indices from being one based to zero based.

This is exactly what is done in "Numerical Recipes in C" too.

Adios,
Russell Corfman

In article <33d14e8c....@nntp.ix.netcom.com>,


Mike Rubenstein <mik...@ix.netcom.com> wrote:
>Or perhaps the algorithm does not have a fundamental problem -- it
>might just be that your statement is too broad.

>As an example, consider the common data structures used in solving the
>n queens problem.

>Typically one will have an array to indicate whether there is a queen
>on a column (or row, but both row and column arrays are not needed).
>It's reasonably natural to number the columns from 0 through n - 1.

>One also needs to keep track of which diagonals have queens on them.


>For up diagonals (i.e., those going up as we go from left to right)
>the sum of the row and column number is constant and and the natural
>array to keep track of this has subscripts running from 0 through 2n
>- 2.

>For down diagonals, the difference of the row minus the column is

Mike Rubenstein

unread,
Jul 19, 1997, 3:00:00 AM7/19/97
to

corf...@agcs.com (Russell Corfman) wrote:

I do not like your fix. It's particulary troublesome that you
discuss it on a newsgroup frequented by beginners without warning that
one is very limited in the adjustments one can do to the subscripting.
A novice might assume, for example, that one could use this as a
general method of adjusting subscripts.

Most troubling is that you claim to convert code from Fortran by
adjusting the array by 1. This is illegal. Code like

int a_arr[10];
int* a = a_arr - 1;
a[1] = 0;

results in undefined behavior. If p is a pointer to an element of an
array, then p + n and p - n must fall within the array or one past the
last element. If this is not the case, the behavior is undefined
(i.e., anything may happen) (draft 5.7). This is also true in C (ISO
6.3.6).

As I said, it's not difficult, but your response demonstrates the
problem. There are a lot of people who don't know C[++} very well who
will code an incorrect fix.

Michael M Rubenstein

Fred

unread,
Jul 19, 1997, 3:00:00 AM7/19/97
to

Andy Connors <n...@nospam.com> wrote in article
<33d176da...@news.mindspring.com>...

> On 2 Jul 1997 16:54:22 GMT, "Jim shaw" <James_...@msn.com> wrote:
>
> >
> >
> >Logically it would be nice to! Actually, a lot of bugs are due to loop
> >errors, array offsets. I just recently fixed a memory overwrite (another
> >companies code) due to this offset problem. It solved nine different
bugs
> >that were related to one line of code.
>
>

K & R defined an expression like this a[5] to *(a + 5). So, it is clear
that if array was not starting at zero, it should not be possible to access
the first item.


0 new messages