On Monday, August 29, 2022 at 4:16:00 AM UTC-7, Juha Nieminen wrote:
(snip)
> I don't know if it's the *original* reason, but I would assume that at least
> in C one of the main reasons is the principle of maximum efficiency.
> In many processor architectures the concept of "array" exists, at least
> when it comes to values of the register sizes (ie. usually 1-byte,
> 2-byte, 4-byte and 8-byte elements, the last one at least on 64-bit
> architectures). Prominently the concept of an indexable array exists
> in the x86 architecture. (I don't remember now if it also exists in
> the ARM architecture, but I would guess so.)
BCPL, a predecessor of C, was written on a word addressable
machine. Pointer arithmetic was integer arithmetic. It was with
C that variables could be larger than the addressable unit,
and that pointers (addresses) incremented and decremented
in appropriate sized units.
In any case, early C (and B and BCPL) programmers were likely
also assembly programmers, and would have been used
to 0 origin indexing.
> Generally when a processor architecture supports the concept of an "array",
> it does so by having instructions that take (at least) two registers as
> the input or the output parameter: A base address, and an offset. The
> memory location of the element is calculated by adding those two. (The
> number of bytes that an offset of 1 jumps depends on the instruction,
> and thus multi-byte elements are supported.)
For many machines, you have to multiply by the element size,
but yes some do it for you. C pointer arithmetic is defined in terms
of the size of the pointed-to object, and array indexing is defined
in terms of pointer arithmetic.
> Thus zero-indexing is extraordinarily natural in processor architectures:
> The "index" is actually an offset. It's a value you add to the base
> address in order to get to the location you want. Thus, the first element
> is at index/offset 0.
> Since that's the case, the most optimal way to handle low-level arrays is
> to have 0-based indexing in the programming language as well. That way
> you don't need to be subtracting 1 from the index every time an array is
> accessed (or you don't need an extraneous unused element at the beginning
> of the array, consuming memory for no reason).
In the case of Fortran, starting with static arrays, the compiler can subtract
before generating the address constant, such that the subtract is done
at compile time. No run-time cost.
For allocatable arrays, it can adjust the offset once, and use that.
For automatic variables on the stack, the stack pointer offset is
computed once.
The way C pointers are defined, that would not be quite as
easy to do. Or, C pointers are defined the way they are to make it easier.
Fortran was originally Formula Translation, and mathematicians did,
and still do, use 1 based indexing for matrices and vectors.
> Also, since C supports pointer arithmetic, many operations become simpler.
> Such as getting the index of an element when what you have is a pointer to
> it (and the pointer to the start of the array).
Note that it gets more interesting for Fortran. When you pass an array
as an argument to a subroutine, it loses the original origin.
(Well, most of the time. Remembering when it does and doesn't is
probably harder than figuring out 0 vs.1 origin.)
One that C programmers like to do, and I suspect was faster on
many early machines, is index through arrays with a pointer to an array,
indexing it as it goes. That means no subscript calculation in the loop.
Many early C processors didn't have multiply, or had slow multiply.
However, one of the favorite optimizations for Fortran compilers
is strength reduction, converting multiply inside a loop into addition
as the loop increments. The result is pretty much the same.