Indexing

3,836 views
Skip to first unread message

Noam Yorav-Raphael

unread,
Feb 23, 2012, 7:06:36 PM2/23/12
to juli...@googlegroups.com
Hello,

My name is Noam Yorav-Raphael. I found out about Julia two days ago and got really excited - I stayed up until 4am and read the manual and experimented.

I want to add my word about why I think 1-based indexing is wrong, in the hope that there's some chance that you'll change it. (I tried to reply to the thread, but couldn't)

One reason is that it will make Julia a scientific-only language. All of today's general-purpose languages are zero-based. Using 1-based indexes will mean that it will be cumbersome to interface with many things written in other languages, and that people who are looking for a general-purpose language will be skeptical about Julia. I'm using Python a lot, and numpy/scipy/matplotlib quite a bit, and I find the fact that you can do both scientific stuff and other stuff (GUI, web, files, DB) a great advantage. When I read about Julia I thought, "Wow, a language better than Python and Matlab!" With 1-based indexing I'm afraid it will just be better than Matlab.

The other reason is that I think that 0-based indexing, with (start <= i < stop) ranges are great. The moment I saw the following diagram, when learning Python, I felt that suddenly years of adding and subtracting ones have come to an end.

0   1   2   3   4   5
|   |   |   |   |   |
+---+---+---+---+---+
| a | b | c | d | e |
+---+---+---+---+---+

The diagram shows an array with 5 elements, with a number line on top of it. Instead of numbers specifying cells (which have size), they specify places - just like coordinates do. So a[2] means "the value which comes right after 2", which is ok. It shines when you realize that a[2:4] means "cut the rectangle between places 2 and 4 and take what's in between". It's so clear! You want to take the last three elements of the array? Just use a[len(a)-3:len(a)] and it's obvious why it works. Compare this to a[len(a)-2:len(a)]. And I'm not talking about Python's a[-3:] shortcut, which wouldn't make sense in 1-based. You want to drop the first three elements? It's a[3:len(a)], not a[4:len(a)]. You just don't need to do plus ones and minus ones all the time.

And generally, about counting from 0 --

Why does the 18th century include all the year that start with 17--, except for 1700 which belongs to the 17th century and 1800 which belongs to the 18th? Because we count starting with 1. If the first year since Jesus' fourth birthday was called year 0, and the first century since was called the 0th century, the century number 17 would simply include the years 1700-1799.

What I'm saying is that the only advantage of 1-based indexing is that we're used to it. Besides that, it just causes trouble. Starting to count from 0 is the right thing to do. That's the reason why the assembly code to get the third element of an array will always include the number 2 - it's not some computer-sciency stuff. It's because its *real* index is 2, not 3.

Mathematicians use 1-based indices when it doesn't matter. Whenever they use 1-based indexing, they could use 0-based indexing and it will work just as fine. However, try to use 1-based indexing to index polynomial coefficients (a_0 + a_1*x + a_2*x^2) or simply digits (123 = 1*10^2 + 2*10^1 + 3*10^0) - it just won't work.

Sometimes, using 1-based indices won't hurt you. Sometimes it will. Using 0-based indices will always work. That's why 0-based arrays in Python+numpy work fine. That's why older programming languages (fortran, cobol, algol, not including lisp) use 1-based indices and today all common languages are 0-based (*). You'd have thought that since people tend to stick with the familiar, all programming languages today would be 1-based. But in spite of people used to counting from 1, and people used to programming languages counting from 1, people moved to languages that count from 0. Isn't that a sure sign that there were enough problems with 1-based so people moved to 0-based?


So, if you want a language that would be more than a Matlab replacement, and I think that you do, go with 0-based indexing. It will require some getting used to from Matlab programmers (and after that it'll be fine), but it will let you join the wider programming community.

I hope I wasn't rude. The idea of writing your own ultimate language really excited me, so when I spotted something I thought is wrong I had to try to convince you to fix it :)

Best luck,
Noam

Jeff Bezanson

unread,
Feb 23, 2012, 7:46:48 PM2/23/12
to juli...@googlegroups.com
I must say, this is a very nice argument. The only problem is that
ultimately, I can't believe it is so important. You cannot seriously
argue that it's impossible to write a gui or database with 1-based
indexing.

Also, it is important to realize that julia is very unlike matlab and
python, in that the language imposes very little. The language only
provides a type system, function definition, and a compiler. You can
define other types that use 0-based indexing. You could even fork our
standard library (which is 100% julia) and change it to 0-based
indexing. That is not terribly practical, but the important thing here
is the standard library, and you can write alternate standard
libraries for julia that would put a very different "skin" on the
language, with the same performance.

I'm more of a C programmer than matlab programmer myself, and I
understand the index arithmetic issues. I don't believe simply picking
0 or 1 can solve all indexing problems or eliminate off-by-one errors.
I got used to 1-based quickly and it doesn't bother me any more.

Thanks for all the positive impressions of julia, it means a lot!

-Jeff

Bill Hart

unread,
Feb 23, 2012, 9:13:18 PM2/23/12
to juli...@googlegroups.com
I also found Julia the other day and think 1-based is a problem for
me. But, I think there is another way to do 0-based indexing.

Julia can eval its own code. Therefore it is possible to present a
front-end which simply converts all 0-based indexing to 1-based
indexing. I don't even think it would be that hard to do, and I have
considered doing it for my own purposes. It shouldn't be inefficient.

I also think the "end" statement is largely superfluous (the "dangling
else" problem notwithstanding) and I had thought to make them optional
in any such front end I wrote for myself.

I am not working on this at the moment though. The first priority is a
decent bignum interface with GMP. I've had a bit of a play with that,
and despite some things that aren't quite clear to me yet, it seems to
be pretty easy to write a first approximation of this. Julia probably
even makes it possible to do a twos complement fixed precision bignum
interface using GMP's mpn interface. I don't know of any other
languages that would make that even vaguely efficient.

Julia presents so many possibilities. I have also thought about
writing an optional statically typed front end with type inference.
Julia seems like the perfect language to implement such a project.

Where has Julia been this past two years!? I had searched for it high
and low, day and night, to the point of nearly driving myself insane.
It's not even listed here:

http://llvm.org/ProjectsWithLLVM/

Bill.

Viral Shah

unread,
Feb 23, 2012, 9:14:33 PM2/23/12
to juli...@googlegroups.com
There is this tension between julia's general purpose nature and its
specific target for technical computing. I could give examples of how
mathematical code look much nicer with 1-based indexing for
multi-dimensional arrays. The thing with 0-based indexing is that you
always then have to write code of the type for i=0:len(a)-1 when
iterating. Either ways, there is always some indexing arithmetic.

I agree with Jeff that even though I was not sure about 1-based
indexing, it very quickly grew on me, and I do feel it is the right
choice. Even though we can easily support arbitrary base indexing with
some effort, accommodating it will slow down things - and we still
need our indexing to become faster than it is.

-viral

--
-viral

Viral Shah

unread,
Feb 23, 2012, 9:18:53 PM2/23/12
to juli...@googlegroups.com
Julia was mentioned here when LLVM 3.0 released.

http://www.llvm.org/releases/3.0/docs/ReleaseNotes.html

-viral

--
-viral

Viral Shah

unread,
Feb 23, 2012, 9:24:54 PM2/23/12
to juli...@googlegroups.com
GMP support would be really useful to have!

I guess I am not that religious about the use of end for function
definitions, but it would make it difficult to then define inner
functions. It would be useful to have a discussion on syntax, and how
to improve it.

We could think of additional syntax for supporting 0 or arbitrary
based indexing, but then there is always the question of picking a
default. We need to think harder about this issue of supporting more
general indexing without giving up the elegance or performance that we
currently have.

-viral

On Fri, Feb 24, 2012 at 7:43 AM, Bill Hart <goodwi...@googlemail.com> wrote:

--
-viral

Yvan Godin

unread,
Feb 24, 2012, 4:13:51 AM2/24/12
to juli...@googlegroups.com

2012/2/24 Viral Shah <vi...@mayin.org>

GMP support would be really useful to have!
+1 

I guess I am not that religious about the use of end for function definitions, but it would make it difficult to then define inner functions. It would be useful to have a discussion on syntax, and how to improve it.
à la python with 2 spaces indent as an alternate syntax ??
but END is not really a big problem. 

We could think of additional syntax for supporting 0 or arbitrary based indexing, but then there is always the question of picking a default. We need to think harder about this issue of supporting more general indexing without giving up the elegance or performance that we
currently have.
Julia is now V1.0 so please do not change the rules,  keep index 1 based. 
 

Just my point
    Yvan

Konrad Hinsen

unread,
Feb 24, 2012, 8:07:21 AM2/24/12
to juli...@googlegroups.com
Bill Hart writes:

> Julia can eval its own code. Therefore it is possible to present a
> front-end which simply converts all 0-based indexing to 1-based
> indexing. I don't even think it would be that hard to do, and I have
> considered doing it for my own purposes. It shouldn't be inefficient.
>
> I also think the "end" statement is largely superfluous (the "dangling
> else" problem notwithstanding) and I had thought to make them optional
> in any such front end I wrote for myself.

From a purely technical point of view you are right, and the idea of
writing your own personal syntax for Julia using its metaprogramming
features will certainly tempt many programmers, in particular those
coming from a Lisp background.

But please consider the consequences: the Julia community will become
fragmented as no one will be at ease reading someone else's code, even
though everything works fine together. That's in my opinion what
prevented Lisp from gaining serious momentum: The number of dialects
is hard to estimate, and then on top of each dialect there are dozens
of personal adaptations in the form of macro libraries.

Konrad.

Noam Yorav-Raphael

unread,
Feb 24, 2012, 10:18:54 AM2/24/12
to juli...@googlegroups.com
On Fri, Feb 24, 2012 at 2:46 AM, Jeff Bezanson <jeff.b...@gmail.com> wrote:
I must say, this is a very nice argument.
‎Thanks! 
The only problem is that
ultimately, I can't believe it is so important. You cannot seriously
argue that it's impossible to write a gui or database with 1-based
indexing.
It's surely possible. What I meant was that using other libraries that handle that stuff (for example, GTK) will be cumbersome since you'll have to use 1-based for things written for julia and 0-based for wrapped libraries (or try to convert the indices at the wrapper level, which will have its share of complications.) I think that a great deal of Python's popularity comes from the fact that it's easy to wrap C and C++ libraries, which makes it a great "glue language".

Also, it is important to realize that julia is very unlike matlab and
python, in that the language imposes very little. The language only
provides a type system, function definition, and a compiler. You can
define other types that use 0-based indexing. You could even fork our
standard library (which is 100% julia) and change it to 0-based
indexing. That is not terribly practical, but the important thing here
is the standard library, and you can write alternate standard
libraries for julia that would put a very different "skin" on the
language, with the same performance.

The fact that most of julia is written in julia itself was one of the things that got me excited. All of Python's basic types are written in C because writing them in Python would be inefficient. Not having to go to another language whenever you need some more performance is a great gain.

It's great that changing the indexing is a matter of editing the standard library and not the core language. It makes the move to 0-based more feasible. However, I don't think that having two standard libraries is a good idea. It will divide the community. I think that's one of the reasons why the D language hasn't gotten much traction: it has two standard libraries. If I fork the standard library, I will actually be creating a new language, with its own separate community. I don't want to do that.

I think that at this point you, the core developers of the language, have to decide whether to continue with 1-based indexing or to move to 0-based. There's still not a lot of external code base that will have to be converted, so it's still possible. It will require a substantial effort to port the whole standard library, but it's doable. I hope I'll be able to help if you decide of this.

1-based indexing is certainly possible. But I predict it will create a barrier between you and the rest of the programming community. Julia will be used and developed mainly by those who are using Matlab now, and the rest, who are used to C/Python/Java/Javascript/Ruby/Perl will mostly stay outside. Both because people will have to get themselves used to another mental model (and an inferior one, sorry to say), and because the interfaces between the languages will be clunky.

Why not join the rest of the programming community? It's certainly possible - Python is a great general-purpose language, and numpy/scipy/matplotlib make it a very decent technical programming language. There are a lot of people who are interested in a high level programming language which is also efficient. The more people you embrace, the more chance your language has to survive and prosper.


I'm more of a C programmer than matlab programmer myself, and I
understand the index arithmetic issues. I don't believe simply picking
0 or 1 can solve all indexing problems or eliminate off-by-one errors.
I got used to 1-based quickly and it doesn't bother me any more.

I don't think it will eliminate all errors, but I do believe that it will make all programs either more elegant and less error-prone, or as elegant as they were before. 

Thanks for all the positive impressions of julia, it means a lot!

You deserve it! 

Thanks for listening,
Noam

P.S. I have to add another reason for 0-based which I thought of. You can safely ignore it.
In Python you can use the slice syntax in assignments. The following works:
>>> a = "abcde"
>>> a[2:4]
"cd"
>>> a[2:4] = 'X'
>>> a
"abXe"

(It actually doesn't work, because strings are immutable. But it works with lists, and the string syntax will keep me from a lot of typing)
This you can translate easily to 1-based. But how do you insert a string in the middle of a string? It's simple:
>>> a = "abcde"
>>> a[2:2] = "HELLO"
>>> a
"abHELLOcde"

Remember that 2 is the line between "b" and "c", so that's exactly what you'd expect. I guess that the 1-based translation to the inserting operation is:
>>> a[3:2] = "HELLO"
which I find incomprehensible. And prepending a string is a[1:0] = "prefix", which gives you a zero index!

This is related to the reddit discussion, which brought Dijkstra's argument that zero-length slices are ugly in inclusive ranges. People said that it doesn't matter, since zero-length slices aren't meaningful. So here you see they are - they give you a great syntax for inserting a sequence into another.

Noam Yorav-Raphael

unread,
Feb 24, 2012, 10:23:44 AM2/24/12
to juli...@googlegroups.com
On Fri, Feb 24, 2012 at 4:14 AM, Viral Shah <vi...@mayin.org> wrote:
There is this tension between julia's general purpose nature and its
specific target for technical computing. I could give examples of how
mathematical code look much nicer with 1-based indexing for
multi-dimensional arrays. The thing with 0-based indexing is that you
always then have to write code of the type for i=0:len(a)-1 when
iterating. Either ways, there is always some indexing arithmetic.

Can you give some concrete examples? I believe that the properly translated code would be either simpler or of the same complexity as 1-based code.

For example, the code you mentioned will be "for i=0:len(a)", since ranges do not include the stop index. You see? No indexing arithmetic!

Noam

Konrad Hinsen

unread,
Feb 24, 2012, 11:08:48 AM2/24/12
to juli...@googlegroups.com
Noam Yorav-Raphael writes:

> On Fri, Feb 24, 2012 at 4:14 AM, Viral Shah <vi...@mayin.org> wrote:
>
> There is this tension between julia's general purpose nature and its
> specific target for technical computing. I could give examples of how
> mathematical code look much nicer with 1-based indexing for
> multi-dimensional arrays. The thing with 0-based indexing is that you
> always then have to write code of the type for i=0:len(a)-1 when
> iterating. Either ways, there is always some indexing arithmetic.
>
> Can you give some concrete examples? I believe that the properly translated code would
> be either simpler or of the same complexity as 1-based code.

I suspect the reason for 1-based indexing in Fortran and Matlab comes
from the pervasive use of 1-based indexing in mathematics. Grab any linear
algebra textbook; the sums over element of vectors or matrices always
run from 1 to N.

That said, I prefer 0-based indexing as well, coming from a C and
Python background, but I don't think it's such a big issue.

Konrad.

Joe Groff

unread,
Feb 24, 2012, 3:40:43 PM2/24/12
to juli...@googlegroups.com
On Thu, Feb 23, 2012 at 6:24 PM, Viral Shah <vi...@mayin.org> wrote:
> We could think of additional syntax for supporting 0 or arbitrary
> based indexing, but then there is always the question of picking a
> default. We need to think harder about this issue of supporting more
> general indexing without giving up the elegance or performance that we
> currently have.

As they say, all problems can be solved by another layer of
abstraction. If Julia efficiently supports and promotes sequence
manipulation using higher-order functions and/or generic abstractions
such as iterators or D-style ranges, then the underlying indexing base
becomes less of an issue.

-Joe

Jeff Bezanson

unread,
Feb 24, 2012, 4:23:47 PM2/24/12
to juli...@googlegroups.com

Good point. I think we will need a small design iteration at some
point around abstracting indexes and iteration over them a bit more.

As for automatically converting code to a different index base, I
don't think it's that easy. First, many uses of brackets might refer
to other data structures such as hash tables. At the same time,
function arguments might be indexes semantically, with no easy way to
tell --- f(0) might need to become f(1).
The only realistic approach, just to play along for a bit, is to
redefine ref and assign in array.j, at a minimum. The next problem is
all the library code that uses "for i=1:n", which I admit is a
weakness and should be more abstract.

Peer Stritzinger

unread,
Feb 24, 2012, 5:06:19 PM2/24/12
to juli...@googlegroups.com
I really can't understand why many programmers think about a new
language like this: "Hey cool new language, please make it like my old
favorite language or I won't use it".

I also recently discovered Julia and read the manual and played with
it some. And to get it out of the way at the beginning: I like what
you did so far -- Julia will definitely be in my toolbox from now on.

Now to the complaints about syntax and indexing:

* Syntax is not as important as many programmers think, its semantics
you should care more about.

Syntax is just something to get used to and if you don't think so
learn more programming languages until you don't.

What I'm not saying is that thinking about syntax is not important for
creating a new language. I'm only talking about using. I see the
same arguments again and again in different programming languages
mailing lists, from small tweaks wanted to the syntax sucks totally.
But even the small tweaks usually come with hidden pitfalls.

That said there is one property of programming language syntax I care
about: it shouldn't be too verbose. I see this like how much of my
finger joints do I have to use up to get a piece of code done (If you
don't know what I'm talking about: you will if you stay coding through
your life).

To conclude this rambling: you did well with the syntax IMHO, not too
verbose, concise and powerful.

* Indexing: well there are arguments for 0 based arrays and there are
arguments for 1 based arrays. But there is definitely no *right* way
to do this. One possible cop out is having it both ways or even have
arrays with indices from 2:5 or even negative.

0-based: good for some things, pretty hard to translate a numerical
algorithm to 0-based arrays. Getting a numerical algorithm implemented
right is hard enough as it is.

1-based: good for other things -- especially mathematical
calculation as mentioned, maybe a bit of confusion when interfacing to
the external (non Fortran/Matlab) world but this interfacing doesn't
have to be done with arrays if this should be a problem.

n-based-settable: well thats the cop out but is it really worth the
confusion? Never knowing where the arrays start ...

Talking about 0-based "like all the other languages" is not very
accurate since there are plenty of languages that have 1-base arrays.
Or different bases for different data structures like 1-based of list
and 0-based for buffers.

Again with indices like Julia the way it is and I don't want all my
programming languages the same ... otherwise why learn a new one.

Keep up the good work.

Cheers
-- Peer

Jeff Bezanson

unread,
Feb 24, 2012, 5:11:46 PM2/24/12
to juli...@googlegroups.com
Good points, and thank you for the kind words.
I think as a compromise we should switch to 2-based indexing :D

Viral Shah

unread,
Feb 25, 2012, 1:36:45 AM2/25/12
to juli...@googlegroups.com
I propose we switch to pi based indexing.

-viral

Jeff Bezanson

unread,
Feb 25, 2012, 1:41:36 AM2/25/12
to juli...@googlegroups.com
That gives me an idea: index arrays with polar coordinates!

Viral Shah

unread,
Feb 25, 2012, 2:03:36 AM2/25/12
to juli...@googlegroups.com
Hi Noam,

Putting jokes on indexing aside, ranges in julia (matlab also) do include the endpoints. In fact, for writing scientific codes, I find this to be rather convenient.

julia> for i=1:5; println(i); end


1
2
3
4
5

julia> for i=Range1(1,5); println(i); end


1
2
3
4
5

Although many people like you and I are perfectly comfortable with 0-based or 1-based indexing, I suspect that people who only use very high level languages such as Matlab and R with no other exposure to programming otherwise will find 0 based indexing a bit unnatural. At least this is my experience working with scientists from various domains.

I do agree with the other comment by Konrad about forking the codebase and having too many dialects. I would request if we can try this out for some time. It would be great if we could figure out a way to keep everyone happy - and there seem to be some suggestions made on the list.

-viral

Noam Yorav-Raphael

unread,
Feb 25, 2012, 1:20:21 PM2/25/12
to juli...@googlegroups.com
You mean tau based indexing ;)
(see tauday.com)

Noam

Noam Yorav-Raphael

unread,
Feb 25, 2012, 1:36:08 PM2/25/12
to juli...@googlegroups.com
Hi Viral,
Indeed, I think that indexing should be 0-based and ranges should not include the last number.

I think that the situation can be summarized like this: 0-based, a<=i<b code will always be more elegant or equally elegant to 1-based, a<=i<=b code. However, it takes some effort for people used to 1-based code to get themselves used to 0-based code, and it takes some effort to convert algorithms written in 1-based notation to 0-based code.

You have to choose: make a special-purpose language familiar to Matlab and R programmers, or make a general-purpose language which will require some effort of them to adjust.

Noam

Viral Shah

unread,
Feb 25, 2012, 1:52:48 PM2/25/12
to juli...@googlegroups.com
Yes, you sum it up quite nicely.

Just want to say that it won't be me or the original julia developers that will choose, but it should be community driven. We have just started this journey, and I am sure that the julia community will come up with its own style of making decisions, implementing features, etc.

-viral

Jeff Bezanson

unread,
Feb 25, 2012, 3:33:51 PM2/25/12
to juli...@googlegroups.com
I think of it in terms of porting code. Porting python code to julia
is not something that ever occurred to me; it doesn't seem too useful.
But porting matlab code can potentially save you time and money. And
converting from 1-based to 0-based for complex numerical algorithms
can be a nightmare.

jonas.a...@gmail.com

unread,
Feb 26, 2012, 5:56:29 PM2/26/12
to juli...@googlegroups.com
Oh, the excitement when I found out about Julia! Not only is it the name of my sister, but finally someone is correcting the sorry state of my scientific computing, and what an impressive speed! I took for granted indexing would start at zero, that being the natural starting point for natural numbers and all. Oh, the disappointment! In fact, the main reason I want to do away with Matlab and Octave is their one-based indexing. I would have used Julia instead of them even if the performance situation was reversed (i.e., Matlab and Octave being much faster than Julia) if only Julia started indexing at zero.

Once we get zero-based indexing in Julia (I believe in you to do the right thing!), the language is saved. Then, perhaps, one might want to look at arrays indexed by integers (including negative numbers), rather than natural numbers. It could be useful for convolution kernels of non-causal systems, for example.


On Friday, February 24, 2012 4:18:54 PM UTC+1, Noam wrote:
This is related to the reddit discussion, which brought Dijkstra's argument that zero-length slices are ugly in inclusive ranges.

Allow me to link to said document, wherein Dijkstra beautifully argues that ranges for a natural number i should be written as "2 ≤ i < 13" (with the inequalities just so) and, as a corollary, concludes that array indexing should start at zero: EWD831

Jeff Bezanson

unread,
Feb 26, 2012, 6:11:29 PM2/26/12
to juli...@googlegroups.com
This is an important thing to think about, and we are thinking about
it. You may be right about this, but for now I will just disagree that
0-based indexing is more important than performance :)

Bill Hart

unread,
Feb 26, 2012, 6:51:31 PM2/26/12
to juli...@googlegroups.com
On 26 February 2012 23:11, Jeff Bezanson <jeff.b...@gmail.com> wrote:
> This is an important thing to think about, and we are thinking about
> it. You may be right about this, but for now I will just disagree that
> 0-based indexing is more important than performance :)
>

Not if you want the language to be popular. Look at Python. The
performance sucks, yet because of the stylish syntax it is one of the
most popular languages out there. In general people don't care about
performance, and even worse, sometimes can't tell the difference.
Flexibility and familiar syntax mean far more to a lot of people.

My opinion is different of course. Performance is everything for me!
Trendy syntax reminds me of a friend of mine who whilst watching Star
Trek exclaimed, "Oh look at me, I am an alien. I have superficial
cosmetic differences"!

Having said that, the established computer science trend (for general
purpose languages) is 0-based indexing and the mathematical trend (for
technical languages like Fortran and Matlab) is 1-based. Twenty years
ago when I seriously started programming I found 0-based indexing odd.
Now I have a 20 year habit that I'm not going to be able to kick and
large code bases all using 0-based indexing. Most experienced
programmers coming from general purpose languages will think
similarly.

I personally feel like the answer might be something like 0-based and
1-based modes, the 1-based mode being the default in line with the
original target of the Julia language as a Matlab replacement. I
haven't look at quite how the language is implemented, but this would
presumably only affect the parser.

Bill.

Jason Nielsen

unread,
Feb 26, 2012, 7:13:34 PM2/26/12
to juli...@googlegroups.com
I think the julia devs have made very good decisions in terms of syntax given this languages goals.  Quite honestly I've programmed more than a couple of lines of C, common lisp, Fortran, Matlab, Python/Numpy/SciPy and R and indexing doesn't bother me too much.  To be honest the only place that I find indexing an issue is when using Numpy since in the linear algebra context I find 0-based indexing annoying.  That said I suspect this is primarily because I had programmed quite a bit in Fortran, Matlab and R before trying Numpy but to be honest any language that supports syntax for slicing multi-dimensional arrays makes 0-based indexing rather unappetizing to me.

All this aside the devs have made some choices about syntax and I think they should stick with their initial decisions as making any serious changes as time goes on becomes progressively more difficult and unproductive... including spending their precious time responding to this thread when they could be using their talents to move julia forward.

In truth if I were to pick a syntax the best choice would be a type-inferenced lisp/scheme sitting on top of llvm (yummy!)... that would be great but I doubt anyone would flock to julia in that case.  I could of course list of fa bunch of good reason and quote a bunch of smart people as to why that would be the ideal choice... but I'll leave it at that for another post.... not!  

Jason

Jonathan Dursi

unread,
Feb 26, 2012, 7:14:55 PM2/26/12
to juli...@googlegroups.com
On 26 Feb 6:51PM, Bill Hart wrote:

> I personally feel like the answer might be something like 0-based and
> 1-based modes,

That way lies madness. Imagine tracking a weird numerical bug through
several routines, some of which use 0-based "mode" and some of which use
1.

I'm not just speculating here; I work at an HPC centre and I see a lot
of modern fortran code, some of which uses the adjustable lower bounds
option, where you can specify the lower bound on an array-by-array basis.

It seems like such a useful, reasonable feature on paper, especially for
things like applying stencils, etc. But I promise you I've never seen
use of this feature solve more readability problems than it causes.

I think the julia prime movers have to pull a Steve Jobs on this one;
make a design choice and stick with it. People telling you what they
think they want is useless; if people knew what they wanted, they'd have
it already.

0 or 1 is fine by me. (Anyone who has passionate feelings about how a
language labels the first item of a list/array needs to rethink how they
live their lives.) If you are going for the underserved
scientific/technical computing market, then 1 is a perfectly sensible
choice; as someone upthread succinctly put it, they're indices, not
offsets. If you want people to pull their matlab or R code over to
julia -- and again, working at an HPC centre, every time you convert
someone away from matlab, an angel gets their wings -- then 1 is a good
tactical choice. If you're more worried about looking pythonic, then 0
works best.

- Jonathan

--
Jonathan Dursi <ljd...@gmail.com>

Jeff Bezanson

unread,
Feb 26, 2012, 7:18:31 PM2/26/12
to juli...@googlegroups.com
Well, we're not going to settle this now for sure, but see my previous
comments. Given "f(0)", you can't tell whether the zero represents an
index. So this is not a parser tweak.

0-based indexing isn't the heart of python's stylish syntax.

As far as python, it would make more sense to talk about a
julia<->python calling interface instead of changing core decisions to
match up with python.

Won g'Faere

unread,
Feb 26, 2012, 7:41:19 PM2/26/12
to juli...@googlegroups.com
On Sun, Feb 26, 2012 at 7:14 PM, Jonathan Dursi <ljd...@gmail.com> wrote:
On 26 Feb 6:51PM, Bill Hart wrote:

I personally feel like the answer might be something like 0-based and
1-based modes,

That way lies madness.  Imagine tracking a weird numerical bug through several routines, some of which use 0-based "mode" and some of which use 1.


Yes, I agree.  Both would be worse than making a choice.

Having both origins was tried in the APL language and it was a failure since what happened was that programmers started writing their programs so that they worked in both modes which just added more effort and complexity to all programs needlessly. 

BTW, check out this video on 0-origin as it relates to APL:

Bill Hart

unread,
Feb 26, 2012, 8:14:46 PM2/26/12
to juli...@googlegroups.com
On 27 February 2012 00:18, Jeff Bezanson <jeff.b...@gmail.com> wrote:
> Well, we're not going to settle this now for sure, but see my previous
> comments. Given "f(0)", you can't tell whether the zero represents an
> index. So this is not a parser tweak.

Yes, I see the problem now. The base of the indexing needs to be a
property of the array, not of the [].

>
> 0-based indexing isn't the heart of python's stylish syntax.

No, obviously not. I only meant that (unfortunately) it is often
stupid irrelevant things that make languages popular. Aiming to be
really popular is perhaps not a worthy aim. Lots of really terrible
things are popular.

>
> As far as python, it would make more sense to talk about a
> julia<->python calling interface instead of changing core decisions to
> match up with python.

I don't personally use Python regularly, but I could see the sense in
that. It's become a fairly popular language for technical computing.

Bill.

Stefan Karpinski

unread,
Feb 26, 2012, 9:16:28 PM2/26/12
to juli...@googlegroups.com
Let me try to briefly convey how I learned to stop worrying and love 1-based indexing. Basically it comes down to the fact that there are three significant numbers you have to think about when it comes to an array: it's initial index, final index, and length. In 0-based indexing, these are all different: 0, n-1, and n. In 1-based indexing, the final index and length are the same: 0, n, n. That's one less thing to think about and keep track of when coding. It seems trivial, but when you're doing something tricky and juggling a lot of complicated things in your head, even the tiniest alleviation of cognitive load helps, I find.

The way Python does things where v[a:b] is a slice of length b-a only works because their slices don't include the final endpoint — which strikes me as very unintuitive and strange. I get why it's done that way: it's precisely so that they have this property. But still, it's confusing. It's more acceptable because Python slices aren't first-class objects — they're just notation. Having a:b by itself be the the range containing a, a+1, ..., b-1 as a standalone object seems weirder still. The diagram in Noam's original post is great, but the very fact that you *need* a diagram to explain how this works in Python seems like a sign that maybe it's not very intuitive. You don't need to explain 1-based indexing to anyone and you certainly don't need a diagram.

Then there's the fact that Julia looks and feels a lot like Matlab. That's intentional — it's a tactical move that will help it appeal to and spread among non-computer-science academics who already have exposure to Matlab. The choice is also due to the fact that Matlab really does get a lot of the syntax and semantics for linear algebra right. Plain Python, on the other hand, does *not* at all get the semantics for linear algebra right — nor was it designed to. NumPy is tacked on and has to completely change the behavior of arrays to make things work — hence the major distinction between Python arrays and NumPy vectors and matrices. Since numerical linear algebra is in so many ways the heart and soul of scientific computing, getting that part right is essential. When it comes to stealing array semantics from Matlab versus Python, it's a no-brainer: you steal from the language that has ruled numerical linear algebra for decades, not the one that needs to add incompatible array semantics in order to be able to do linear algebra reasonably. Given the syntactic and semantic similarity to Matlab, Matlab is really the only language that is possibly going to be trivial to port code from — and there's tons of really useful Matlab scientific codes out there, waiting to be ported and contributed to Julia's ecosystem. If we used 0-based indexing, then that one advantage would be out the window: linear algebra code from Matlab would never be trivial to port — it would always be a massive pain in the ass.

Patrick O'Leary

unread,
Feb 26, 2012, 10:50:17 PM2/26/12
to juli...@googlegroups.com
On Sun, Feb 26, 2012 at 20:16, Stefan Karpinski <ste...@karpinski.org> wrote:
> Then there's the fact that Julia looks and feels a lot like Matlab. That's
> intentional — it's a tactical move that will help it appeal to and spread
> among non-computer-science academics who already have exposure to Matlab.
> The choice is also due to the fact that Matlab really does get a lot of the
> syntax and semantics for linear algebra right. Plain Python, on the other
> hand, does *not* at all get the semantics for linear algebra right — nor was
> it designed to. NumPy is tacked on and has to completely change the behavior
> of arrays to make things work — hence the major distinction between Python
> arrays and NumPy vectors and matrices. Since numerical linear algebra is in
> so many ways the heart and soul of scientific computing, getting that part
> right is essential. When it comes to stealing array semantics from Matlab
> versus Python, it's a no-brainer: you steal from the language that has ruled
> numerical linear algebra for decades, not the one that needs to add
> incompatible array semantics in order to be able to do linear algebra
> reasonably. Given the syntactic and semantic similarity to Matlab, Matlab is
> really the only language that is possibly going to be trivial to port code
> from — and there's tons of really useful Matlab scientific codes out there,
> waiting to be ported and contributed to Julia's ecosystem. If we used
> 0-based indexing, then that one advantage would be out the window: linear
> algebra code from Matlab would never be trivial to port — it would always be
> a massive pain in the ass.

In support of Stefan's point, porting MATLAB code with intent to
contribute to the ecosystem is exactly what I'm doing right now, and
it's going quite smoothly thus far, even if what I have isn't
completely idiomatic Julia yet. (WIP:
https://github.com/pao/Julia/tree/odefuns)

I've attempted to use NumPy, and it feels wrong--but not because of
indexing; I have no problem with the indexing while writing Python. As
a whole NumPy just feels tacked on (which, in a sense, it is). I know
plenty of people are perfectly happy with it, but at work when you
have MATLAB sitting there waiting to be used with its spiffy IDE and
better performance and it's somewhere I'm comfortable, there's no
contest.

But if I can trivially port that code to Julia, and start
parallelizing it without giving MathWorks more money? Or write
performance-critical code without limiting myself to the Embedded
MATLAB Subset, then forgetting to regenerate and recompile when I make
a change? Interface with existing C code without generating a wrapper
with Legacy Code Tool or writing tedious MEX boilerplate? That's a
pretty good business plan. That's what excites me about Julia.

--
Patrick O'Leary
patrick...@gmail.com

Fredrik Johansson

unread,
Feb 27, 2012, 7:42:00 AM2/27/12
to juli...@googlegroups.com
Hello everybody,

I'd like to offer my two cents on this simultaneously trivial and
important issue (you can never spend too much time arguing about
arbitrary conventions). To be perfectly clear, the following is not
meant to suggest that Julia should do one thing or the other, but just
to provide some arguments and ideas that don't seem to have been
brought up yet.

I have used Matlab and Mathematica (another language using one-based
indexing) as well as C and Python, all fairly extensively, and feel
comfortable with both conventions.

My general opinion is that zero- or one-based indexing doesn't matter.
What's most important is that the language provides constructs that
are powerful and consistent with the convention used. If the standard
way to denote a range of numbers is the Matlab style 1:n, then that
will obviously make 1-based indexing seem slightly nicer because you
don't need to subtract 1 from the length to write the range. But in
Python, range(n) or [:n] works out just as nicely. (Saying that one
set of notations/functions is a poor fit for the other's indexing
convention is rather begging the question.)

I'd like to be able to use functional constructs like map, reduce and
list comprehensions to avoid indices when possible; when indices are
necessary, something like Python's enumerate(), perhaps with slightly
more compact syntax, would be nice. Often, when iterating over a
collection of items, what you want to express is "all the indices of
this collection", not 1:n or 0:n-1 specifically -- with "all the
indices", the same code might even work automatically with hash
tables, sparse arrays, etc. as input. Functions like first(), rest()
and last() can provide some level of abstraction for common indexing
operations. Of course, you can't avoid indices entirely, and quite
often plain indices are much clearer than any other way of writing the
code.

I happen to agree with Dijkstra's argument, although I don't think
it's strong enough to make the case by itself. I can think of two
other, practical arguments in favor of zero-based indexing.

The first is that indexing strided, blocked, or multidimensional data
is more elegant. Adding one dimension is just a mul-add, and removing
one dimension is just a div-rem:

array[i*n+j]
array[i/n][i%n]

With one-based indexing, you need extra adjustments:

array[(i-1)*n+j]
array[(i-1)/n+1][(i-1)%n+1]

This is a vital simplification in a language like C where you often
end up doing pointer arithmetic manually. It's somewhat less of a
nuisance in a language with high-level constructs for working with
multidimensional arrays, slices, etc., though it's still common enough
to have to do such indexing manually.

A related, convenient property is that the ranges of unsigned integer
types neatly correspond to the indices of arrays of the same size. A
rather common operation is to use a byte to look something up in a
table of size 256. Or, as an extreme case, indexing a table of
false/true cases can be done on the truth value (or parity of an
integer) directly (in a language where false == 0, true == 1). Again,
high-level constructs make such concerns somewhat less important.

The second argument is that for many types of data, the first index
corresponds to the value 0 in a natural way. When the data *does*
happen to start at 1, one can often just add a dummy entry in a
zero-based array to avoid making ugly adjustments to formulas; the
converse in one-based indexing is obviously not possible. Some
examples:

-- In histograms or similar, one is likely to include a zero count of something.

-- The first entry in a list of coefficients of a polynomial or Taylor
series is the coefficient of x^0. Ditto for many other mathematical
objects. (Actually, Matlab represents polynomials in reverse order,
wich is objectively in poor taste. Presumably, making the order
entirely wrong makes it seem like less of a wart than having them in
the right order but just off by one Any language aiming for
compatibility with this convention will be tainted, but oh well. ;-)

-- In signal processing or simulations, the first sample often
corresponds to time t = 0 or position x = 0, and the first entry in
the DFT of the same data corresponds to the frequency f = 0.
Translating between indices and coordinates (in real or Fourier
space), then, is just scaling by a factor, rather than a scaling and
an offset. This often simplifies formulas.

Those things said, as recently as yesterday, I was writing some C code
for solving a four-dimensional recurrence equation, which had been
formulated with 0 as the starting coordinates. But to accommodate
boundary conditions, it turned out to be convenient to insert extra
planes corresponding to the position -1. So zero-based indexing didn't
save me from having to insert a few offsets in the end. Another
problem was that the domain was triangular along one axis, so
allocating a single large cube would have used twice as much memory as
necessary. I ended up using nested arrays instead of just making a
single large array; I could have allocated a large array of half the
size and done triangular offsets with pointer arithmetic, but it would
probably have been tedious to debug.

At times like that, what would be convenient is to have some
higher-level functionality to be able to declare "arrays" in terms of
more general index sets than the usual hyperrectangles in N^n. For
example, it would be nice to be able to use arbitrary affine
constraints over the integers, writing something like

array(indices=(x,y,z: -1 <= x <= N, -1 <= y <= N, -1 <= z <= x))

and having the implementation take care of packing the data together
and doing all the offset calculations.

Of course, for prototyping one can just overallocate or use a hash
table, but often this isn't good enough (in my case, overallocating
would have used twice as much memory, and a hash table would have
taken several times as much space). The point is to have the data
packed together contiguously in memory, to get the performance
benefits of a plain array, but with complex indexing done
automatically and efficiently, taking advantage of JIT compilation.

Again, this is probably well beyond the scope of a basic array type
and semantics thereof, and more an idea for additional high-level
functionality (that might solve some issues related to zero vs one
based indexing). In Matlab, a common idiom is to use an arbitrary
array as an index set for another array. This is very general and
actually quite elegant, but also obviously inefficient. One can think
about ways to achieve similar functionality without the drawbacks.

If you support "arrays with an offset", you do end up with extra cases
that need handling. Most code shouldn't really need to become more
complicated, as the language should provide constructs for iteration
and indexing that make the offset irrelevant. But there will still
probably be ambiguous cases -- what happens if you attempt a pointwise
addition of two arrays of the same size but different offsets, for
example? (Though, making the offset explicitly part of the type can be
argued to be a good thing in such a case, to catch errors.) It's
probably not worth trying to provide such functionality for the basic
array type. When in doubt, simple is better than complex.

Anyway, as I said at the start of this long rant, zero or one based
indexing is just an arbitrary convention; a language should stick to
one convention and do it well. The most common use is to iterate from
one point to another, and off-by-one errors occur no matter what.
Besides, C and Python users are likely to be better programmers than
Matlab users, so they will have less trouble porting their code ;-)

Fredrik

Viral Shah

unread,
Feb 27, 2012, 7:47:21 AM2/27/12
to juli...@googlegroups.com
This is very well argued. I have myself come across these kinds of
adjustments of indexing in various different types of codes, similar
to the examples you provide.

The convincing argument is your closing sentence: "Besides, C and


Python users are likely to be better programmers than Matlab users, so

they will have less trouble porting their code". :-)

-viral

--
-viral

jonas.a...@gmail.com

unread,
Feb 27, 2012, 4:03:18 PM2/27/12
to juli...@googlegroups.com
On Monday, February 27, 2012 1:42:00 PM UTC+1, Fredrik Johansson wrote:

Besides, C and Python users are likely to be better programmers than
Matlab users, so they will have less trouble porting their code ;-)

Let us instead take the hint from the languages the better programmers are using!

Furthermore, I can't help beginning to think about arrays as functions whose domains are subsets of the set of all integers. That way, you could have a three-element array for which valid indices were, say, -3, -2 and 17, but nothing else. That's more general than what I have seen suggested in the thread so far (or possibly equivalent to how it works in some other language that has been referred to). This may or may not affect the ongoing discussion.

Jeff Bezanson

unread,
Feb 27, 2012, 5:21:59 PM2/27/12
to juli...@googlegroups.com
Of course sparse index sets and such are valuable. But the purpose of
the standard Array type is to be a dirt-simple dense array, as close
to "char*" as we dare. That way other things can be efficiently
layered on top. I can imagine very elaborate array models where you
have metadata (e.g. "missing" codes), arbitrary index sets, named
dimensions and indexes, distributions, compressed storage, etc.

Noam Yorav-Raphael

unread,
Feb 28, 2012, 4:37:25 AM2/28/12
to juli...@googlegroups.com
Hi,

Just wanted to say that I'm glad that my post brought such an interesting discussion. I think that no matter what the decision will be, it will be much more well thought of.

About converting algorithms from 1-based to 0-based: Indeed, automatically converting 1-based to 0-based code is a problem. However, if you explicitly mark indices it can be much easier. Let's assume julia was 0-based and had a<=i<b ranges. I would define two functions:

function i1(i) # convert 1-based index to 0-based index
  i-1
end

function irange(a, b) # inclusive range
  a:b+1
end

And then, in my julia code, define all indexes and ranges using these functions. Then it would be easy to write a tool that would convert the code to a code that doesn't use these functions. You could even have a tool that checks that you haven't forgotten to use these functions by parsing the code and making sure that all variables that go into ranges were set using i1(), and no ranges were defined without irange().
<