Why index starts in C from 0 and not 1

kapilk

Aug 16, 2004, 7:38:08 AM

Sir,

I know that array indices in C start from 0 and not 1. Can anybody
please tell me the reason?

Is it because the subscript can be an unsigned integer, and these
start from 0?

Thanks

Allan Bruce

Aug 16, 2004, 7:47:37 AM

"kapilk" <kapilp...@rediffmail.com> wrote in message
news:692a8f14.04081...@posting.google.com...

I think it is due to the way that the compilers work. If you have an array
of sometype then the way to access these uses the notation

addressOfStartOfArray + (index * sizeof(sometype))

if the accesses were from 1, then this would add extra computation and
therefore be slower. Also, almost every programming language adopts 0 as
the initial index.
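
As a rough illustration of the address formula above, the following sketch computes an element's address by hand, in bytes, and compares it with what the compiler produces for &a[index]. The helper names element_address and check_element_address are made up for this example, and the element type is assumed to be int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: computes the address of element `index` by hand,
   as base + index * sizeof(int), working in bytes. */
const int *element_address(const int *base, size_t index)
{
    const unsigned char *start = (const unsigned char *)base;
    return (const int *)(start + index * sizeof(int));
}

/* Compares the hand-computed address with the compiler's own. */
void check_element_address(void)
{
    int a[4] = {10, 20, 30, 40};
    assert(element_address(a, 0) == &a[0]);   /* index 0 adds nothing */
    assert(element_address(a, 3) == &a[3]);
    assert(*element_address(a, 2) == 30);
}
```

With zero-based indices the first element needs no adjustment at all: index 0 contributes 0 bytes to the sum.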

Allan

Does It Matter

Aug 16, 2004, 8:58:20 AM

I would suspect it has something to do with the fact that C language is a
language designed to work closely with the hardware architecture and most
assembly languages that have an indexed addressing mode start at zero.

On the other hand, C language originated on a PDP-11. The PDP-11 assembly
language just uses a fixed source and destination for things like
assignment (MOV), addition (ADD), subtraction (SUB) and comparison (CMP).
In other words, there is no address+offset mode like the C68000 or more
modern processors.

If you believe this is why C starts at zero you'll have to ask the
question, why does assembly language start at zero? But you'll have to ask
it in an assembly language newsgroup.

--
Send e-mail to: darrell at cs dot toronto dot edu
Don't send e-mail to vice.pr...@whitehouse.gov

Thomas Dickey

Aug 16, 2004, 10:43:08 AM

Does It Matter <dar...@no.unwanted.email.thanks.com> wrote:

> On the other hand, C language originated on a PDP-11. The PDP-11 assembly
> language just uses a fixed source and destination for things like
> assignment (MOV), addition (ADD), subtraction (SUB) and comparison (CMP).
> In other words, there is not address+offset mode like the C68000 or more
> modern processors.

That's incorrect (the PDP-11 has 8 addressing modes - including offsets
from a register value).

--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net

Thomas Dickey

Aug 16, 2004, 10:45:11 AM

Does It Matter <dar...@no.unwanted.email.thanks.com> wrote:

> If you believe this is why C starts at zero you'll have to ask the
> question, why does assembly language start at zero? But you'll have to ask
> it in an assembly language newsgroup.

...or Pascal, or other languages that don't date from 1959.

Default User

Aug 16, 2004, 12:38:23 PM

Probably because the array indexing operator is really syntactic sugar
for pointer operations.

ptr[i] == *(ptr + i);

Obviously, when using pointer arithmetic, the first element is at ptr +
0, so the first element when using [] to access it is ptr[0].
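
A small sketch of that equivalence, for anyone who wants to try it. The function names are invented for this example; the checks themselves rely only on the standard definition of the [] operator.

```c
#include <assert.h>

/* Returns 1 if all spellings of "element i" agree for the given pointer. */
int subscript_forms_agree(const int *ptr, int i)
{
    return ptr[i] == *(ptr + i)     /* the definition of [] */
        && ptr[i] == i[ptr]         /* + commutes, so this is legal too */
        && &ptr[i] == ptr + i;      /* same address, not just same value */
}

void check_subscript_forms(void)
{
    int a[3] = {7, 8, 9};
    assert(subscript_forms_agree(a, 0));
    assert(subscript_forms_agree(a, 2));
    assert(*a == a[0]);             /* the first element sits at offset 0 */
}
```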

Brian Rodenborn

Gordon Burditt

Aug 16, 2004, 2:06:59 PM

My answer to this is that C starting from zero is likely to be influenced
by a lot of *MATHEMATICS* starting from zero.

Also, it is more likely that loading or storing an element of an array can
be accomplished with a single machine instruction if you don't have to
deal with the offset of 1.

Gordon L. Burditt

Thomas Stegen

Aug 16, 2004, 3:27:10 PM

Maybe because the index value is really the offset from the start
of the array...

One never knows though.

--
Thomas.

Lew Pitcher

Aug 16, 2004, 3:44:08 PM

Thomas Stegen wrote:

> kapilk wrote:
>
>> Sir,
>> I know that the array index starts in C from 0 and not 1 can any
>> body pls. tell me the reason.
>>
>> Is it because in the subscript i can have a unsigned integer and
>> these start from 0
>>
>
> Maybe because the index value is really the offset from the start
> of the array...

Bingo!

"Rather more surprising, at least at first sight, is the fact that a reference
to a[i] can also be written as *(a+i). In evaluating a[i], C converts it to
*(a+i) immediately; the two forms are completely equivalent. Applying the
operator & to both parts of this equivalence, it follows that &a[i] and a+i are
identical: a+i is the address of the i-th element beyond a." (from Section 5.3
of "The C Programming Language" by Brian W. Kernighan and Dennis M. Ritchie, (c)
1978)

So, the genesis of C has a+i being the same as a[i]. If a is an array, then
&a[1] is the same as a+1, and thus a+0 must be the same as &a[0]. This makes
arrays zero based.

This is not to say that the C standard retains this bias. Simply that it came
from the fact that the index value of an array was really the offset of the
specific item from the start of the array.

--
Lew Pitcher
IT Consultant, Enterprise Application Architecture,
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed are my own, not my employers')

E. Robert Tisdale

Aug 16, 2004, 4:44:41 PM

Lew Pitcher wrote:

> Thomas Stegen wrote:
>
>>kapilk wrote:
>>

>>> I know that the array index starts in C from 0 and not 1.
>>> Can anybody please tell me the reason?

>>>
>>> Is it because in the subscript i can have a unsigned integer
>>> and these start from 0

No.

>> Maybe because the index value is really
>> the offset from the start of the array...
>
> Bingo!
>
> "Rather more surprising, at least at first sight,
> is the fact that a reference to a[i] can also be written as *(a+i).
> In evaluating a[i], C converts it to *(a+i) immediately;
> the two forms are completely equivalent.
> Applying the operator & to both parts of this equivalence,
> it follows that &a[i] and a+i are identical:
> a+i is the address of the i-th element beyond a."
> (from Section 5.3 of "The C Programming Language"
> by Brian W. Kernighan and Dennis M. Ritchie, (c) 1978)
>
> So, the genesis of C has a+i being the same as a[i].
> If a is an array, then &a[1] is the same as a+1,
> and thus a+0 must be the same as &a[0]. This makes arrays zero based.
>
> This is not to say that the C standard retains this bias.
> Simply that it came from the fact that the index value of an array
> was really the offset of the specific item from the start of the array.

You forgot to answer, "Why?"

In order to reference element a[i],
the computer must first calculate its address.
If you use a [one-based] index,
the compiler would be obliged to calculate

(a + i - 1)

Today, good optimizing C compilers
would eliminate the superfluous subtraction
but, when K & R were designing C,
compilers usually didn't have the resources
(fast processors and large memories)
required to perform such optimizations.
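
To make the (a + i - 1) calculation concrete, here is a minimal sketch of what a one-based subscript would have to do. The accessor at1 is hypothetical, written only for this illustration; C has no such built-in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-based accessor: element 1 is the first element.
   The caller must pass i >= 1. */
int at1(const int *a, size_t i)
{
    return *(a + (i - 1));   /* the extra "- 1" that one-based indexing implies */
}

void check_at1(void)
{
    int a[3] = {10, 20, 30};
    assert(at1(a, 1) == 10);     /* one-based "first" is a[0] */
    assert(at1(a, 3) == 30);     /* one-based "last" is a[2] */
    assert(at1(a, 2) == a[1]);
}
```

Whether the subtraction costs anything at run time depends on the compiler and on whether the index is known at compile time.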

Martin Ambuhl

Aug 16, 2004, 5:13:06 PM

Does It Matter wrote:

> On the other hand, C language originated on a PDP-11. The PDP-11 assembly
> language just uses a fixed source and destination for things like
> assignment (MOV), addition (ADD), subtraction (SUB) and comparison (CMP).
> In other words, there is not address+offset mode like the C68000 or more
> modern processors.

This is just silly. Please check the eight addressing modes in the
PDP-11 before posting more (just barely topical) "information."

Joe Wright

Aug 16, 2004, 7:29:15 PM

Because I like it that way! But really, it's hard to say.

IBM was the first major OEM disk drive maker. IBM numbers tracks
from 0 and sectors from 1. Why? Seagate, Western Digital, Maxtor,
etc. do the same. Why?

Bytes in a record are numbered from 0 while columns on a punch card
number from 1. Go figure.
--
Joe Wright mailto:joeww...@comcast.net
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Dan Pop

Aug 17, 2004, 10:20:51 AM

In <692a8f14.04081...@posting.google.com> kapilp...@rediffmail.com (kapilk) writes:

> I know that the array index starts in C from 0 and not 1 can any
>body pls. tell me the reason.

Because the language designers decided to make array[i] an alternate
notation for *(array + i). They could have chosen to make array[i]
an alternate notation for *(array + i - 1), in which case array
indices would have been 1-based, but they didn't.

I don't know if this is an original C feature or merely inherited from
one of its predecessors (CPL, BCPL, B).

To someone with a solid assembly background, 0-based indexing appears as
the most natural option, because this is how indexed addressing modes
work on most processors supporting them. And the processor for which
C was originally designed was no exception.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan...@ifh.de

Dan Pop

Aug 17, 2004, 10:25:11 AM

In <cfq6sn$q5u$1...@news.freedom2surf.net> "Allan Bruce" <all...@TAKEAWAYf2s.com> writes:

>if the accesses were from 1, then this would add extra computation and
>therefore be slower. Also, almost every programming language adopts 0 as
>the initial index.

The most popular languages at the time C was designed used 1-based
indexing: FORTRAN, BASIC, Pascal.

Richard Tobin

Aug 17, 2004, 11:58:18 AM

In article <cfq6sn$q5u$1...@news.freedom2surf.net>,
Allan Bruce <all...@TAKEAWAYf2s.com> wrote:

>I think it is due to the way that the compilers work. If you have an array
>of sometype then the way to access these uses the notation
>
>addressOfStartOfArray + (index * sizeof(sometype))
>
>if the accesses were from 1, then this would add extra computation and
>therefore be slower.

Only if the compilers were particularly stupid.

Real compilers would just produce

(addressOfStartOfArray - sizeof(sometype)) + (index * sizeof(sometype))

where the first parenthesized expression is known at compile time.

C arrays start at zero because it's The Right Thing to do.

-- Richard

boa

Aug 17, 2004, 12:22:22 PM

Richard Tobin wrote:

> In article <cfq6sn$q5u$1...@news.freedom2surf.net>,
> Allan Bruce <all...@TAKEAWAYf2s.com> wrote:
>
>
>>I think it is due to the way that the compilers work. If you have an array
>>of sometype then the way to access these uses the notation
>>
>>addressOfStartOfArray + (index * sizeof(sometype))
>>
>>if the accesses were from 1, then this would add extra computation and
>>therefore be slower.
>
>
> Only if the compilers were particularly stupid.
>
> Real compilers would just produce
>
> (addressOfStartOfArray - sizeof(sometype)) + (index * sizeof(sometype))
>
> where the first parenthesized expression is known at compile time.

Always? Even when the "array" is a pointer to dynamically allocated memory?

>
> C arrays start at zero because it's The Right Thing to do.

Agreed. ;-)

boa@home

Richard Tobin

Aug 17, 2004, 1:02:05 PM

In article <2hqUc.172$8c.1...@juliett.dax.net>,
boa <ro...@localhost.com> wrote:

>>>addressOfStartOfArray + (index * sizeof(sometype))
>>>
>>>if the accesses were from 1, then this would add extra computation and
>>>therefore be slower.

>> Real compilers would just produce

>>
>> (addressOfStartOfArray - sizeof(sometype)) + (index * sizeof(sometype))
>>
>> where the first parenthesized expression is known at compile time.

>Always? Even when the "array" is a pointer to dynamically allocated memory?

True, I was assuming addressOfStartOfArray was supposed to be a constant.

But in many common cases, other optimizations will remove the
overhead. For example, when looping over the array, the index can be
adjusted instead of the base.
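
A sketch of the loop case, under the assumption that the array is summed front to back. Both functions are hypothetical and written by hand; the second only shows the kind of form an optimizer might reduce the first to, not what any particular compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* One-based loop written naively: subtracts 1 on every access. */
long sum_one_based(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 1; i <= n; i++)
        sum += a[i - 1];
    return sum;
}

/* Equivalent pointer walk: no per-access adjustment remains. */
long sum_pointer_walk(const int *a, size_t n)
{
    long sum = 0;
    for (const int *p = a; p != a + n; p++)
        sum += *p;
    return sum;
}

void check_sums(void)
{
    int a[4] = {1, 2, 3, 4};
    assert(sum_one_based(a, 4) == 10);
    assert(sum_pointer_walk(a, 4) == 10);
    assert(sum_one_based(a, 0) == sum_pointer_walk(a, 0));  /* empty range */
}
```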

-- Richard

Keith Thompson

Aug 17, 2004, 1:34:59 PM

Dan...@cern.ch (Dan Pop) writes:
> In <cfq6sn$q5u$1...@news.freedom2surf.net> "Allan Bruce"
> <all...@TAKEAWAYf2s.com> writes:
>
> >if the accesses were from 1, then this would add extra computation and
> >therefore be slower. Also, almost every programming language adopts 0 as
> >the initial index.
>
> The most popular languages at the time C was designed used 1-based
> indexing: FORTRAN, BASIC, Pascal.

Quibble: Pascal allows arrays to be based however the user specifies.
For example (if I remember the syntax correctly):

type
My_Array = array[37 .. 42] of Integer;

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

kal

Aug 17, 2004, 8:30:00 PM

ric...@cogsci.ed.ac.uk (Richard Tobin) wrote in message news:<cftdmd$1k89$1...@pc-news.cogsci.ed.ac.uk>...

> True, I was assuming addressOfStartOfArray was supposed to be a constant.
>
> But in many common cases, other optimizations will remove the
> overhead. For example, when looping over the array, the index can be
> adjusted instead of the base.

Not a satisfactory explanation. Your earlier statement was incorrect.

kal

Aug 17, 2004, 9:14:10 PM

Dan...@cern.ch (Dan Pop) wrote in message news:<cft483$d3$2...@sunnews.cern.ch>...

> To someone with a solid assembly background, 0-based indexing appears as
> the most natural option, because this is how indexed addressing modes
> work on most processors supporting them. And the processor for which
> C was originally designed was no exception.

I don't have a solid background in assembly, or in anything
else for that matter; still, I find zero based counting
(not to mention indexing) convenient.

For instance, if the years had started at 0 we wouldn't have
had all the argument about if 2000 or 2001 is the start of the
new millennium.

Zero based counting also uses the full value set of any given
number of bits.

Then there are conveniences such as:

string[length] = 0;

for(i = 0; i < size; i++) a[i] += b[i%8];
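
The two one-liners above can be wrapped into a small compilable sketch. The function names and the test data are invented for this example; the table b is assumed to have exactly 8 entries, as the i%8 suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cuts a string down to `length` characters: index `length` is exactly
   where the terminator belongs, with no adjustment. */
void truncate_at(char *string, size_t length)
{
    string[length] = 0;
}

/* Adds an 8-entry table cyclically to a: i % 8 is already a valid index. */
void add_cyclic(int *a, size_t size, const int b[8])
{
    for (size_t i = 0; i < size; i++)
        a[i] += b[i % 8];
}

void check_conveniences(void)
{
    char s[] = "zero-based";
    truncate_at(s, 4);
    assert(strcmp(s, "zero") == 0);
    assert(strlen(s) == 4);

    int a[10] = {0};
    const int b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    add_cyclic(a, 10, b);
    assert(a[0] == 1 && a[7] == 8);
    assert(a[8] == 1 && a[9] == 2);   /* wrapped around to b[0], b[1] */
}
```

With one-based indices both lines would need a correction term: string[length + 1] for the terminator and b[i % 8 + 1] for the table lookup.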

IMHO the list is quite long. So, the question should not be
why the indexing starts at zero in C but rather why it doesn't
start at zero in some languages.

<OT>
> Dan
Hope you had a great vacation.
</OT>

Dan Pop

Aug 18, 2004, 6:50:31 AM

In <ln1xi5v...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:

>Dan...@cern.ch (Dan Pop) writes:
>> In <cfq6sn$q5u$1...@news.freedom2surf.net> "Allan Bruce"
>> <all...@TAKEAWAYf2s.com> writes:
>>
>> >if the accesses were from 1, then this would add extra computation and
>> >therefore be slower. Also, almost every programming language adopts 0 as
>> >the initial index.
>>
>> The most popular languages at the time C was designed used 1-based
>> indexing: FORTRAN, BASIC, Pascal.
>
>Quibble: Pascal allows arrays to be based however the user specifies.
>For example (if I remember the syntax correctly):
>
> type
> My_Array = array[37 .. 42] of Integer;

So does F77, IIRC, but the default is 1-based indexing.

Richard Tobin

Aug 18, 2004, 8:31:30 AM

In article <a5da4cc1.04081...@posting.google.com>,
kal <k_a...@yahoo.com> wrote:

>Not a satisfactory explanation.

Not a satisfactory explanation of what?

-- Richard

Does It Matter

Aug 20, 2004, 3:08:45 PM

On Mon, 16 Aug 2004, Thomas Dickey wrote:

> Does It Matter <dar...@no.unwanted.email.thanks.com> wrote:
>
> > On the other hand, C language originated on a PDP-11. The PDP-11 assembly
> > language just uses a fixed source and destination for things like
> > assignment (MOV), addition (ADD), subtraction (SUB) and comparison (CMP).
> > In other words, there is not address+offset mode like the C68000 or more
> > modern processors.
>
> That's incorrect (the PDP-11 has 8 addressing modes - including offsets
> from a register value).

Obviously, from the number of people who corrected me, you are correct. It
has been a long time since I used PDP-11 assembly and my memory has
obviously failed me.

Thank you for correcting me.

> --
> Thomas E. Dickey
> http://invisible-island.net
> ftp://invisible-island.net
>

--

James Dow Allen

Aug 23, 2004, 3:12:55 AM

kapilp...@rediffmail.com (kapilk) wrote in message news:<692a8f14.04081...@posting.google.com>...

> I know that the array index starts in C from 0 and not 1 can any
> body pls. tell me the reason.

I've appended an excerpt from
http://tinyurl.com/2452h/lesson5.htm

James

* * * * * * * * * * * *

Realizing that *ptr and ptr[0] are synonyms
[when evaluated as expressions],
and that & is just the inverse of *,
we know immediately that all of the following
(or rather that subset of them
legal in a given context) must be equivalent:

* ptr
* * & ptr
* & * ptr
& * * ptr /* legal if (*ptr) is a pointer */
& & * * * ptr /* legal if (**ptr) is a pointer */
* (ptr + 0)
ptr [0]

Appreciate the sheer simplicity and elegance !
Note that C's choice of 0 for the first index of
an array follows as the night the day as long
as we insist that

ptr == ptr + 0

.....

Chris Torek

Aug 23, 2004, 4:01:40 AM

In article <news:266426e1.04082...@posting.google.com>
James Dow Allen <jdall...@yahoo.com> wrote (in part):

>I've appended an excerpt from
>http://tinyurl.com/2452h/lesson5.htm

> & & * * * ptr /* legal if (**ptr) is a pointer */

This one is wrong. The rest appear to be correct, at least in
C99.

C89 and C99 are rather different here -- in C89, "&" and "*" do
not "automatically cancel", as it were, so if p == NULL, then the
expression:

p == NULL

is always OK (and produces the "int" value 1), but:

&(*p) == NULL

is an error in C89, but valid in C99. (The C99 way is "better",
in my opinion. The C89 restriction allows a really stupid compiler
to generate code to follow the pointer, even though the result of
that indirection is never used. I find it hard to imagine that a
compiler could do this and yet still get the right answer -- the
"int" value 0 -- when p is both valid and non-NULL.)

The problem with "& & * * * ptr" is that the operand of the first
"&" is another "&", which produces a value ("rvalue") rather than
an object ("lvalue", more or less, except for the bizarre redefinition
in C99). The "&" operator can only be applied to objects.

Clearly, if we were to have the inner "&*" pair cancel out first,
then (not legal C syntax):

(&(&*)*) *ptr

would have the inner parenthesized sequence drop out, leaving:

(&*) *ptr

which would then have the parenthesized pair drop out, leaving just
"*ptr". Unfortunately, nothing says the cancelling is done inside-out
like this; and in fact, gcc -- assuming gcc is correct here! --
says:

foo.c:2: invalid lvalue in unary `&'

(under both -std=c89 and -std=c99).
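
For the C99 side of the &(*p) discussion, a minimal sketch, assuming a compiler in C99 mode or later. It relies on the C99 rule (6.5.3.2) that when the operand of & is the result of a unary * or a [] operator, neither operator is evaluated. The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* In C99, &*p is treated as if both operators were omitted,
   so this is valid even when p is a null pointer. */
int amp_star_is_null(int *p)
{
    return &*p == NULL;
}

void check_amp_star(void)
{
    int x = 5;
    int a[3] = {0, 0, 0};
    assert(amp_star_is_null(NULL) == 1);
    assert(amp_star_is_null(&x) == 0);
    assert(&a[3] == a + 3);   /* same rule for []: no element is read */
}
```

The doubled form, & applied directly to the result of another &, is still rejected, as the gcc diagnostic quoted above shows, so it cannot appear in a compilable example.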
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.