3,819 views

Skip to first unread message

Feb 23, 2012, 7:06:36 PM2/23/12

to juli...@googlegroups.com

Hello,

My name is Noam Yorav-Raphael. I found out about Julia two days ago and got really excited - I stayed up until 4am and read the manual and experimented.

I want to add my word about why I think 1-based indexing is wrong, in the hope that there's some chance that you'll change it. (I tried to reply to the thread, but couldn't)

One reason is that it will make Julia a scientific-only language. All of today's general-purpose languages are zero-based. Using 1-based indexes will mean that it will be cumbersome to interface with many things written in other languages, and that people who are looking for a general-purpose language will be skeptical about Julia. I'm using Python a lot, and numpy/scipy/matplotlib quite a bit, and I find the fact that you can do both scientific stuff and other stuff (GUI, web, files, DB) a great advantage. When I read about Julia I thought, "Wow, a language better than Python and Matlab!" With 1-based indexing I'm afraid it will just be better than Matlab.

The other reason is that I think that 0-based indexing, with (start <= i < stop) ranges are great. The moment I saw the following diagram, when learning Python, I felt that suddenly years of adding and subtracting ones have come to an end.

0 1 2 3 4 5

| | | | | |

+---+---+---+---+---+

| a | b | c | d | e |

+---+---+---+---+---+

The diagram shows an array with 5 elements, with a number line on top of it. Instead of numbers specifying cells (which have size), they specify places - just like coordinates do. So a[2] means "the value which comes right after 2", which is ok. It shines when you realize that a[2:4] means "cut the rectangle between places 2 and 4 and take what's in between". It's so clear! You want to take the last three elements of the array? Just use a[len(a)-3:len(a)] and it's obvious why it works. Compare this to a[len(a)-2:len(a)]. And I'm not talking about Python's a[-3:] shortcut, which wouldn't make sense in 1-based. You want to drop the first three elements? It's a[3:len(a)], not a[4:len(a)]. You just don't need to do plus ones and minus ones all the time.

And generally, about counting from 0 --

Why does the 18th century include all the year that start with 17--, except for 1700 which belongs to the 17th century and 1800 which belongs to the 18th? Because we count starting with 1. If the first year since Jesus' fourth birthday was called year 0, and the first century since was called the 0th century, the century number 17 would simply include the years 1700-1799.

What I'm saying is that the only advantage of 1-based indexing is that we're used to it. Besides that, it just causes trouble. Starting to count from 0 is the right thing to do. That's the reason why the assembly code to get the third element of an array will always include the number 2 - it's not some computer-sciency stuff. It's because its *real* index is 2, not 3.

Mathematicians use 1-based indices when it doesn't matter. Whenever they use 1-based indexing, they could use 0-based indexing and it will work just as fine. However, try to use 1-based indexing to index polynomial coefficients (a_0 + a_1*x + a_2*x^2) or simply digits (123 = 1*10^2 + 2*10^1 + 3*10^0) - it just won't work.

Sometimes, using 1-based indices won't hurt you. Sometimes it will. Using 0-based indices will always work. That's why 0-based arrays in Python+numpy work fine. That's why older programming languages (fortran, cobol, algol, not including lisp) use 1-based indices and today all common languages are 0-based (*). You'd have thought that since people tend to stick with the familiar, all programming languages today would be 1-based. But in spite of people used to counting from 1, and people used to programming languages counting from 1, people moved to languages that count from 0. Isn't that a sure sign that there were enough problems with 1-based so people moved to 0-based?

So, if you want a language that would be more than a Matlab replacement, and I think that you do, go with 0-based indexing. It will require some getting used to from Matlab programmers (and after that it'll be fine), but it will let you join the wider programming community.

I hope I wasn't rude. The idea of writing your own ultimate language really excited me, so when I spotted something I thought is wrong I had to try to convince you to fix it :)

Best luck,

Noam

Feb 23, 2012, 7:46:48 PM2/23/12

to juli...@googlegroups.com

I must say, this is a very nice argument. The only problem is that

ultimately, I can't believe it is so important. You cannot seriously

argue that it's impossible to write a gui or database with 1-based

indexing.

ultimately, I can't believe it is so important. You cannot seriously

argue that it's impossible to write a gui or database with 1-based

indexing.

Also, it is important to realize that julia is very unlike matlab and

python, in that the language imposes very little. The language only

provides a type system, function definition, and a compiler. You can

define other types that use 0-based indexing. You could even fork our

standard library (which is 100% julia) and change it to 0-based

indexing. That is not terribly practical, but the important thing here

is the standard library, and you can write alternate standard

libraries for julia that would put a very different "skin" on the

language, with the same performance.

I'm more of a C programmer than matlab programmer myself, and I

understand the index arithmetic issues. I don't believe simply picking

0 or 1 can solve all indexing problems or eliminate off-by-one errors.

I got used to 1-based quickly and it doesn't bother me any more.

Thanks for all the positive impressions of julia, it means a lot!

-Jeff

Feb 23, 2012, 9:13:18 PM2/23/12

to juli...@googlegroups.com

I also found Julia the other day and think 1-based is a problem for

me. But, I think there is another way to do 0-based indexing.

me. But, I think there is another way to do 0-based indexing.

Julia can eval its own code. Therefore it is possible to present a

front-end which simply converts all 0-based indexing to 1-based

indexing. I don't even think it would be that hard to do, and I have

considered doing it for my own purposes. It shouldn't be inefficient.

I also think the "end" statement is largely superfluous (the "dangling

else" problem notwithstanding) and I had thought to make them optional

in any such front end I wrote for myself.

I am not working on this at the moment though. The first priority is a

decent bignum interface with GMP. I've had a bit of a play with that,

and despite some things that aren't quite clear to me yet, it seems to

be pretty easy to write a first approximation of this. Julia probably

even makes it possible to do a twos complement fixed precision bignum

interface using GMP's mpn interface. I don't know of any other

languages that would make that even vaguely efficient.

Julia presents so many possibilities. I have also thought about

writing an optional statically typed front end with type inference.

Julia seems like the perfect language to implement such a project.

Where has Julia been this past two years!? I had searched for it high

and low, day and night, to the point of nearly driving myself insane.

It's not even listed here:

http://llvm.org/ProjectsWithLLVM/

Bill.

Feb 23, 2012, 9:14:33 PM2/23/12

to juli...@googlegroups.com

There is this tension between julia's general purpose nature and its

specific target for technical computing. I could give examples of how

mathematical code look much nicer with 1-based indexing for

multi-dimensional arrays. The thing with 0-based indexing is that you

always then have to write code of the type for i=0:len(a)-1 when

iterating. Either ways, there is always some indexing arithmetic.

specific target for technical computing. I could give examples of how

mathematical code look much nicer with 1-based indexing for

multi-dimensional arrays. The thing with 0-based indexing is that you

always then have to write code of the type for i=0:len(a)-1 when

iterating. Either ways, there is always some indexing arithmetic.

I agree with Jeff that even though I was not sure about 1-based

indexing, it very quickly grew on me, and I do feel it is the right

choice. Even though we can easily support arbitrary base indexing with

some effort, accommodating it will slow down things - and we still

need our indexing to become faster than it is.

-viral

--

-viral

Feb 23, 2012, 9:18:53 PM2/23/12

to juli...@googlegroups.com

Julia was mentioned here when LLVM 3.0 released.

http://www.llvm.org/releases/3.0/docs/ReleaseNotes.html

-viral

--

-viral

Feb 23, 2012, 9:24:54 PM2/23/12

to juli...@googlegroups.com

GMP support would be really useful to have!

I guess I am not that religious about the use of end for function

definitions, but it would make it difficult to then define inner

functions. It would be useful to have a discussion on syntax, and how

to improve it.

We could think of additional syntax for supporting 0 or arbitrary

based indexing, but then there is always the question of picking a

default. We need to think harder about this issue of supporting more

general indexing without giving up the elegance or performance that we

currently have.

-viral

On Fri, Feb 24, 2012 at 7:43 AM, Bill Hart <goodwi...@googlemail.com> wrote:

--

-viral

Feb 24, 2012, 4:13:51 AM2/24/12

to juli...@googlegroups.com

2012/2/24 Viral Shah <vi...@mayin.org>

GMP support would be really useful to have!

+1

I guess I am not that religious about the use of end for function definitions, but it would make it difficult to then define inner functions. It would be useful to have a discussion on syntax, and how to improve it.

à la python with 2 spaces indent as an alternate syntax ??

but END is not really a big problem.

We could think of additional syntax for supporting 0 or arbitrary based indexing, but then there is always the question of picking a default. We need to think harder about this issue of supporting more general indexing without giving up the elegance or performance that we

currently have.

Julia is now V1.0 so please do not change the rules, keep index 1 based.

Just my point

Yvan

Feb 24, 2012, 8:07:21 AM2/24/12

to juli...@googlegroups.com

Bill Hart writes:

> Julia can eval its own code. Therefore it is possible to present a

> front-end which simply converts all 0-based indexing to 1-based

> indexing. I don't even think it would be that hard to do, and I have

> considered doing it for my own purposes. It shouldn't be inefficient.

>

> I also think the "end" statement is largely superfluous (the "dangling

> else" problem notwithstanding) and I had thought to make them optional

> in any such front end I wrote for myself.

From a purely technical point of view you are right, and the idea of

writing your own personal syntax for Julia using its metaprogramming

features will certainly tempt many programmers, in particular those

coming from a Lisp background.

But please consider the consequences: the Julia community will become

fragmented as no one will be at ease reading someone else's code, even

though everything works fine together. That's in my opinion what

prevented Lisp from gaining serious momentum: The number of dialects

is hard to estimate, and then on top of each dialect there are dozens

of personal adaptations in the form of macro libraries.

Konrad.

Feb 24, 2012, 10:18:54 AM2/24/12

to juli...@googlegroups.com

On Fri, Feb 24, 2012 at 2:46 AM, Jeff Bezanson <jeff.b...@gmail.com> wrote:

I must say, this is a very nice argument.

Thanks!

The only problem is that

ultimately, I can't believe it is so important. You cannot seriously

argue that it's impossible to write a gui or database with 1-based

indexing.

It's surely possible. What I meant was that using other libraries that handle that stuff (for example, GTK) will be cumbersome since you'll have to use 1-based for things written for julia and 0-based for wrapped libraries (or try to convert the indices at the wrapper level, which will have its share of complications.) I think that a great deal of Python's popularity comes from the fact that it's easy to wrap C and C++ libraries, which makes it a great "glue language".

Also, it is important to realize that julia is very unlike matlab and

python, in that the language imposes very little. The language only

provides a type system, function definition, and a compiler. You can

define other types that use 0-based indexing. You could even fork our

standard library (which is 100% julia) and change it to 0-based

indexing. That is not terribly practical, but the important thing here

is the standard library, and you can write alternate standard

libraries for julia that would put a very different "skin" on the

language, with the same performance.

The fact that most of julia is written in julia itself was one of the things that got me excited. All of Python's basic types are written in C because writing them in Python would be inefficient. Not having to go to another language whenever you need some more performance is a great gain.

It's great that changing the indexing is a matter of editing the standard library and not the core language. It makes the move to 0-based more feasible. However, I don't think that having two standard libraries is a good idea. It will divide the community. I think that's one of the reasons why the D language hasn't gotten much traction: it has two standard libraries. If I fork the standard library, I will actually be creating a new language, with its own separate community. I don't want to do that.

I think that at this point you, the core developers of the language, have to decide whether to continue with 1-based indexing or to move to 0-based. There's still not a lot of external code base that will have to be converted, so it's still possible. It will require a substantial effort to port the whole standard library, but it's doable. I hope I'll be able to help if you decide of this.

1-based indexing is certainly possible. But I predict it will create a barrier between you and the rest of the programming community. Julia will be used and developed mainly by those who are using Matlab now, and the rest, who are used to C/Python/Java/Javascript/Ruby/Perl will mostly stay outside. Both because people will have to get themselves used to another mental model (and an inferior one, sorry to say), and because the interfaces between the languages will be clunky.

Why not join the rest of the programming community? It's certainly possible - Python is a great general-purpose language, and numpy/scipy/matplotlib make it a very decent technical programming language. There are a lot of people who are interested in a high level programming language which is also efficient. The more people you embrace, the more chance your language has to survive and prosper.

I'm more of a C programmer than matlab programmer myself, and I

understand the index arithmetic issues. I don't believe simply picking

0 or 1 can solve all indexing problems or eliminate off-by-one errors.

I got used to 1-based quickly and it doesn't bother me any more.

I don't think it will eliminate all errors, but I do believe that it will make all programs either more elegant and less error-prone, or as elegant as they were before.

Thanks for all the positive impressions of julia, it means a lot!

You deserve it!

Thanks for listening,

Noam

P.S. I have to add another reason for 0-based which I thought of. You can safely ignore it.

In Python you can use the slice syntax in assignments. The following works:

>>> a = "abcde"

>>> a[2:4]

"cd"

>>> a[2:4] = 'X'

>>> a

"abXe"

(It actually doesn't work, because strings are immutable. But it works with lists, and the string syntax will keep me from a lot of typing)

This you can translate easily to 1-based. But how do you insert a string in the middle of a string? It's simple:

>>> a = "abcde"

>>> a[2:2] = "HELLO"

>>> a

"abHELLOcde"

Remember that 2 is the line between "b" and "c", so that's exactly what you'd expect. I guess that the 1-based translation to the inserting operation is:

>>> a[3:2] = "HELLO"

which I find incomprehensible. And prepending a string is a[1:0] = "prefix", which gives you a zero index!

This is related to the reddit discussion, which brought Dijkstra's argument that zero-length slices are ugly in inclusive ranges. People said that it doesn't matter, since zero-length slices aren't meaningful. So here you see they are - they give you a great syntax for inserting a sequence into another.

Feb 24, 2012, 10:23:44 AM2/24/12

to juli...@googlegroups.com

On Fri, Feb 24, 2012 at 4:14 AM, Viral Shah <vi...@mayin.org> wrote:

There is this tension between julia's general purpose nature and its

specific target for technical computing. I could give examples of how

mathematical code look much nicer with 1-based indexing for

multi-dimensional arrays. The thing with 0-based indexing is that you

always then have to write code of the type for i=0:len(a)-1 when

iterating. Either ways, there is always some indexing arithmetic.

Can you give some concrete examples? I believe that the properly translated code would be either simpler or of the same complexity as 1-based code.

For example, the code you mentioned will be "for i=0:len(a)", since ranges do not include the stop index. You see? No indexing arithmetic!

Noam

Feb 24, 2012, 11:08:48 AM2/24/12

to juli...@googlegroups.com

Noam Yorav-Raphael writes:

> On Fri, Feb 24, 2012 at 4:14 AM, Viral Shah <vi...@mayin.org> wrote:

>

> There is this tension between julia's general purpose nature and its

> specific target for technical computing. I could give examples of how

> mathematical code look much nicer with 1-based indexing for

> multi-dimensional arrays. The thing with 0-based indexing is that you

> always then have to write code of the type for i=0:len(a)-1 when

> iterating. Either ways, there is always some indexing arithmetic.

>

> Can you give some concrete examples? I believe that the properly translated code would

> be either simpler or of the same complexity as 1-based code.

I suspect the reason for 1-based indexing in Fortran and Matlab comes

from the pervasive use of 1-based indexing in mathematics. Grab any linear

algebra textbook; the sums over element of vectors or matrices always

run from 1 to N.

That said, I prefer 0-based indexing as well, coming from a C and

Python background, but I don't think it's such a big issue.

Konrad.

Feb 24, 2012, 3:40:43 PM2/24/12

to juli...@googlegroups.com

On Thu, Feb 23, 2012 at 6:24 PM, Viral Shah <vi...@mayin.org> wrote:

> We could think of additional syntax for supporting 0 or arbitrary

> based indexing, but then there is always the question of picking a

> default. We need to think harder about this issue of supporting more

> general indexing without giving up the elegance or performance that we

> currently have.

> We could think of additional syntax for supporting 0 or arbitrary

> based indexing, but then there is always the question of picking a

> default. We need to think harder about this issue of supporting more

> general indexing without giving up the elegance or performance that we

> currently have.

As they say, all problems can be solved by another layer of

abstraction. If Julia efficiently supports and promotes sequence

manipulation using higher-order functions and/or generic abstractions

such as iterators or D-style ranges, then the underlying indexing base

becomes less of an issue.

-Joe

Feb 24, 2012, 4:23:47 PM2/24/12

to juli...@googlegroups.com

Good point. I think we will need a small design iteration at some

point around abstracting indexes and iteration over them a bit more.

As for automatically converting code to a different index base, I

don't think it's that easy. First, many uses of brackets might refer

to other data structures such as hash tables. At the same time,

function arguments might be indexes semantically, with no easy way to

tell --- f(0) might need to become f(1).

The only realistic approach, just to play along for a bit, is to

redefine ref and assign in array.j, at a minimum. The next problem is

all the library code that uses "for i=1:n", which I admit is a

weakness and should be more abstract.

Feb 24, 2012, 5:06:19 PM2/24/12

to juli...@googlegroups.com

I really can't understand why many programmers think about a new

language like this: "Hey cool new language, please make it like my old

favorite language or I won't use it".

language like this: "Hey cool new language, please make it like my old

favorite language or I won't use it".

I also recently discovered Julia and read the manual and played with

it some. And to get it out of the way at the beginning: I like what

you did so far -- Julia will definitely be in my toolbox from now on.

Now to the complaints about syntax and indexing:

* Syntax is not as important as many programmers think, its semantics

you should care more about.

Syntax is just something to get used to and if you don't think so

learn more programming languages until you don't.

What I'm not saying is that thinking about syntax is not important for

creating a new language. I'm only talking about using. I see the

same arguments again and again in different programming languages

mailing lists, from small tweaks wanted to the syntax sucks totally.

But even the small tweaks usually come with hidden pitfalls.

That said there is one property of programming language syntax I care

about: it shouldn't be too verbose. I see this like how much of my

finger joints do I have to use up to get a piece of code done (If you

don't know what I'm talking about: you will if you stay coding through

your life).

To conclude this rambling: you did well with the syntax IMHO, not too

verbose, concise and powerful.

* Indexing: well there are arguments for 0 based arrays and there are

arguments for 1 based arrays. But there is definitely no *right* way

to do this. One possible cop out is having it both ways or even have

arrays with indices from 2:5 or even negative.

0-based: good for some things, pretty hard to translate a numerical

algorithm to 0-based arrays. Getting a numerical algorithm implemented

right is hard enough as it is.

1-based: good for other things -- especially mathematical

calculation as mentioned, maybe a bit of confusion when interfacing to

the external (non Fortran/Matlab) world but this interfacing doesn't

have to be done with arrays if this should be a problem.

n-based-settable: well thats the cop out but is it really worth the

confusion? Never knowing where the arrays start ...

Talking about 0-based "like all the other languages" is not very

accurate since there are plenty of languages that have 1-base arrays.

Or different bases for different data structures like 1-based of list

and 0-based for buffers.

Again with indices like Julia the way it is and I don't want all my

programming languages the same ... otherwise why learn a new one.

Keep up the good work.

Cheers

-- Peer

Feb 24, 2012, 5:11:46 PM2/24/12

to juli...@googlegroups.com

Good points, and thank you for the kind words.

I think as a compromise we should switch to 2-based indexing :D

I think as a compromise we should switch to 2-based indexing :D

Feb 25, 2012, 1:36:45 AM2/25/12

to juli...@googlegroups.com

I propose we switch to pi based indexing.

-viral

Feb 25, 2012, 1:41:36 AM2/25/12

to juli...@googlegroups.com

That gives me an idea: index arrays with polar coordinates!

Feb 25, 2012, 2:03:36 AM2/25/12

to juli...@googlegroups.com

Hi Noam,

Putting jokes on indexing aside, ranges in julia (matlab also) do include the endpoints. In fact, for writing scientific codes, I find this to be rather convenient.

julia> for i=1:5; println(i); end

1

2

3

4

5

julia> for i=Range1(1,5); println(i); end

1

2

3

4

5

Although many people like you and I are perfectly comfortable with 0-based or 1-based indexing, I suspect that people who only use very high level languages such as Matlab and R with no other exposure to programming otherwise will find 0 based indexing a bit unnatural. At least this is my experience working with scientists from various domains.

I do agree with the other comment by Konrad about forking the codebase and having too many dialects. I would request if we can try this out for some time. It would be great if we could figure out a way to keep everyone happy - and there seem to be some suggestions made on the list.

-viral

Feb 25, 2012, 1:20:21 PM2/25/12

to juli...@googlegroups.com

You mean tau based indexing ;)

Feb 25, 2012, 1:36:08 PM2/25/12

to juli...@googlegroups.com

Hi Viral,

Indeed, I think that indexing should be 0-based and ranges should not include the last number.

I think that the situation can be summarized like this: 0-based, a<=i<b code will always be more elegant or equally elegant to 1-based, a<=i<=b code. However, it takes some effort for people used to 1-based code to get themselves used to 0-based code, and it takes some effort to convert algorithms written in 1-based notation to 0-based code.

You have to choose: make a special-purpose language familiar to Matlab and R programmers, or make a general-purpose language which will require some effort of them to adjust.

Noam

Feb 25, 2012, 1:52:48 PM2/25/12

to juli...@googlegroups.com

Yes, you sum it up quite nicely.

Just want to say that it won't be me or the original julia developers that will choose, but it should be community driven. We have just started this journey, and I am sure that the julia community will come up with its own style of making decisions, implementing features, etc.

-viral

Feb 25, 2012, 3:33:51 PM2/25/12

to juli...@googlegroups.com

I think of it in terms of porting code. Porting python code to julia

is not something that ever occurred to me; it doesn't seem too useful.

But porting matlab code can potentially save you time and money. And

converting from 1-based to 0-based for complex numerical algorithms

can be a nightmare.

is not something that ever occurred to me; it doesn't seem too useful.

But porting matlab code can potentially save you time and money. And

converting from 1-based to 0-based for complex numerical algorithms

can be a nightmare.

Feb 26, 2012, 5:56:29 PM2/26/12

to juli...@googlegroups.com

Oh, the excitement when I found out about Julia! Not only is it the name of my sister, but finally someone is correcting the sorry state of my scientific computing, and what an impressive speed! I took for granted indexing would start at zero, that being the natural starting point for natural numbers and all. Oh, the disappointment! In fact, the main reason I want to do away with Matlab and Octave is their one-based indexing. I would have used Julia instead of them even if the performance situation was reversed (i.e., Matlab and Octave being much faster than Julia) if only Julia started indexing at zero.

Once we get zero-based indexing in Julia (I believe in you to do the right thing!), the language is saved. Then, perhaps, one might want to look at arrays indexed by integers (including negative numbers), rather than natural numbers. It could be useful for convolution kernels of non-causal systems, for example.

On Friday, February 24, 2012 4:18:54 PM UTC+1, Noam wrote:

Allow me to link to said document, wherein Dijkstra beautifully argues that ranges for a natural number i should be written as "2 ≤*i* < 13" (with the inequalities just so) and, as a corollary, concludes that array indexing should start at zero: EWD831

Once we get zero-based indexing in Julia (I believe in you to do the right thing!), the language is saved. Then, perhaps, one might want to look at arrays indexed by integers (including negative numbers), rather than natural numbers. It could be useful for convolution kernels of non-causal systems, for example.

On Friday, February 24, 2012 4:18:54 PM UTC+1, Noam wrote:

This is related to the reddit discussion, which brought Dijkstra's argument that zero-length slices are ugly in inclusive ranges.

Allow me to link to said document, wherein Dijkstra beautifully argues that ranges for a natural number i should be written as "2 ≤

Feb 26, 2012, 6:11:29 PM2/26/12

to juli...@googlegroups.com

This is an important thing to think about, and we are thinking about

it. You may be right about this, but for now I will just disagree that

0-based indexing is more important than performance :)

it. You may be right about this, but for now I will just disagree that

0-based indexing is more important than performance :)

Feb 26, 2012, 6:51:31 PM2/26/12

to juli...@googlegroups.com

On 26 February 2012 23:11, Jeff Bezanson <jeff.b...@gmail.com> wrote:

> This is an important thing to think about, and we are thinking about

> it. You may be right about this, but for now I will just disagree that

> 0-based indexing is more important than performance :)

>

> This is an important thing to think about, and we are thinking about

> it. You may be right about this, but for now I will just disagree that

> 0-based indexing is more important than performance :)

>

Not if you want the language to be popular. Look at Python. The

performance sucks, yet because of the stylish syntax it is one of the

most popular languages out there. In general people don't care about

performance, and even worse, sometimes can't tell the difference.

Flexibility and familiar syntax mean far more to a lot of people.

My opinion is different of course. Performance is everything for me!

Trendy syntax reminds me of a friend of mine who whilst watching Star

Trek exclaimed, "Oh look at me, I am an alien. I have superficial

cosmetic differences"!

Having said that, the established computer science trend (for general

purpose languages) is 0-based indexing and the mathematical trend (for

technical languages like Fortran and Matlab) is 1-based. Twenty years

ago when I seriously started programming I found 0-based indexing odd.

Now I have a 20 year habit that I'm not going to be able to kick and

large code bases all using 0-based indexing. Most experienced

programmers coming from general purpose languages will think

similarly.

I personally feel like the answer might be something like 0-based and

1-based modes, the 1-based mode being the default in line with the

original target of the Julia language as a Matlab replacement. I

haven't look at quite how the language is implemented, but this would

presumably only affect the parser.

Bill.

Feb 26, 2012, 7:13:34 PM2/26/12

to juli...@googlegroups.com

I think the julia devs have made very good decisions in terms of syntax given this languages goals. Quite honestly I've programmed more than a couple of lines of C, common lisp, Fortran, Matlab, Python/Numpy/SciPy and R and indexing doesn't bother me too much. To be honest the only place that I find indexing an issue is when using Numpy since in the linear algebra context I find 0-based indexing annoying. That said I suspect this is primarily because I had programmed quite a bit in Fortran, Matlab and R before trying Numpy but to be honest any language that supports syntax for slicing multi-dimensional arrays makes 0-based indexing rather unappetizing to me.

All this aside the devs have made some choices about syntax and I think they should stick with their initial decisions as making any serious changes as time goes on becomes progressively more difficult and unproductive... including spending their precious time responding to this thread when they could be using their talents to move julia forward.

In truth if I were to pick a syntax the best choice would be a type-inferenced lisp/scheme sitting on top of llvm (yummy!)... that would be great but I doubt anyone would flock to julia in that case. I could of course list of fa bunch of good reason and quote a bunch of smart people as to why that would be the ideal choice... but I'll leave it at that for another post.... not!

Jason

Feb 26, 2012, 7:14:55 PM2/26/12

to juli...@googlegroups.com

On 26 Feb 6:51PM, Bill Hart wrote:

> I personally feel like the answer might be something like 0-based and

> 1-based modes,

That way lies madness. Imagine tracking a weird numerical bug through

several routines, some of which use 0-based "mode" and some of which use

1.

I'm not just speculating here; I work at an HPC centre and I see a lot

of modern fortran code, some of which uses the adjustable lower bounds

option, where you can specify the lower bound on an array-by-array basis.

It seems like such a useful, reasonable feature on paper, especially for

things like applying stencils, etc. But I promise you I've never seen

use of this feature solve more readability problems than it causes.

I think the julia prime movers have to pull a Steve Jobs on this one;

make a design choice and stick with it. People telling you what they

think they want is useless; if people knew what they wanted, they'd have

it already.

0 or 1 is fine by me. (Anyone who has passionate feelings about how a

language labels the first item of a list/array needs to rethink how they

live their lives.) If you are going for the underserved

scientific/technical computing market, then 1 is a perfectly sensible

choice; as someone upthread succinctly put it, they're indices, not

offsets. If you want people to pull their matlab or R code over to

julia -- and again, working at an HPC centre, every time you convert

someone away from matlab, an angel gets their wings -- then 1 is a good

tactical choice. If you're more worried about looking pythonic, then 0

works best.

- Jonathan

--

Jonathan Dursi <ljd...@gmail.com>

Feb 26, 2012, 7:18:31 PM2/26/12

to juli...@googlegroups.com

Well, we're not going to settle this now for sure, but see my previous

comments. Given "f(0)", you can't tell whether the zero represents an

index. So this is not a parser tweak.

comments. Given "f(0)", you can't tell whether the zero represents an

index. So this is not a parser tweak.

0-based indexing isn't the heart of python's stylish syntax.

As far as python, it would make more sense to talk about a

julia<->python calling interface instead of changing core decisions to

match up with python.

Feb 26, 2012, 7:41:19 PM2/26/12

to juli...@googlegroups.com

On Sun, Feb 26, 2012 at 7:14 PM, Jonathan Dursi <ljd...@gmail.com> wrote:

On 26 Feb 6:51PM, Bill Hart wrote:That way lies madness. Imagine tracking a weird numerical bug through several routines, some of which use 0-based "mode" and some of which use 1.

I personally feel like the answer might be something like 0-based and

1-based modes,

Yes, I agree. Both would be worse than making a choice.

Having both origins was tried in the APL language and it was a failure since what happened was that programmers started writing their programs so that they worked in both modes which just added more effort and complexity to all programs needlessly.

BTW, check out this video on 0-origin as it relates to APL:

Feb 26, 2012, 8:14:46 PM2/26/12

to juli...@googlegroups.com

On 27 February 2012 00:18, Jeff Bezanson <jeff.b...@gmail.com> wrote:

> Well, we're not going to settle this now for sure, but see my previous

> comments. Given "f(0)", you can't tell whether the zero represents an

> index. So this is not a parser tweak.

> Well, we're not going to settle this now for sure, but see my previous

> comments. Given "f(0)", you can't tell whether the zero represents an

> index. So this is not a parser tweak.

Yes, I see the problem now. The base of the indexing needs to be a

property of the array, not of the [].

>

> 0-based indexing isn't the heart of python's stylish syntax.

No, obviously not. I only meant that (unfortunately) it is often

stupid irrelevant things that make languages popular. Aiming to be

really popular is perhaps not a worthy aim. Lots of really terrible

things are popular.

>

> As far as python, it would make more sense to talk about a

> julia<->python calling interface instead of changing core decisions to

> match up with python.

I don't personally use Python regularly, but I could see the sense in

that. It's become a fairly popular language for technical computing.

Bill.

Feb 26, 2012, 9:16:28 PM2/26/12

to juli...@googlegroups.com

Let me try to briefly convey how I learned to stop worrying and love 1-based indexing. Basically it comes down to the fact that there are three significant numbers you have to think about when it comes to an array: it's initial index, final index, and length. In 0-based indexing, these are all different: 0, n-1, and n. In 1-based indexing, the final index and length are the same: 0, n, n. That's one less thing to think about and keep track of when coding. It seems trivial, but when you're doing something tricky and juggling a lot of complicated things in your head, even the tiniest alleviation of cognitive load helps, I find.

The way Python does things where v[a:b] is a slice of length b-a only works because their slices don't include the final endpoint — which strikes me as very unintuitive and strange. I get why it's done that way: it's precisely so that they have this property. But still, it's confusing. It's more acceptable because Python slices aren't first-class objects — they're just notation. Having a:b by itself be the the range containing a, a+1, ..., b-1 as a standalone object seems weirder still. The diagram in Noam's original post is great, but the very fact that you *need* a diagram to explain how this works in Python seems like a sign that maybe it's not very intuitive. You don't need to explain 1-based indexing to anyone and you certainly don't need a diagram.

Then there's the fact that Julia looks and feels a lot like Matlab. That's intentional — it's a tactical move that will help it appeal to and spread among non-computer-science academics who already have exposure to Matlab. The choice is also due to the fact that Matlab really does get a lot of the syntax and semantics for linear algebra right. Plain Python, on the other hand, does *not* at all get the semantics for linear algebra right — nor was it designed to. NumPy is tacked on and has to completely change the behavior of arrays to make things work — hence the major distinction between Python arrays and NumPy vectors and matrices. Since numerical linear algebra is in so many ways the heart and soul of scientific computing, getting that part right is essential. When it comes to stealing array semantics from Matlab versus Python, it's a no-brainer: you steal from the language that has ruled numerical linear algebra for decades, not the one that needs to add incompatible array semantics in order to be able to do linear algebra reasonably. Given the syntactic and semantic similarity to Matlab, Matlab is really the only language that is possibly going to be trivial to port code from — and there's tons of really useful Matlab scientific codes out there, waiting to be ported and contributed to Julia's ecosystem. If we used 0-based indexing, then that one advantage would be out the window: linear algebra code from Matlab would never be trivial to port — it would always be a massive pain in the ass.

Feb 26, 2012, 10:50:17 PM2/26/12

to juli...@googlegroups.com

On Sun, Feb 26, 2012 at 20:16, Stefan Karpinski <ste...@karpinski.org> wrote:

> Then there's the fact that Julia looks and feels a lot like Matlab. That's

> intentional — it's a tactical move that will help it appeal to and spread

> among non-computer-science academics who already have exposure to Matlab.

> The choice is also due to the fact that Matlab really does get a lot of the

> syntax and semantics for linear algebra right. Plain Python, on the other

> hand, does *not* at all get the semantics for linear algebra right — nor was

> it designed to. NumPy is tacked on and has to completely change the behavior

> of arrays to make things work — hence the major distinction between Python

> arrays and NumPy vectors and matrices. Since numerical linear algebra is in

> so many ways the heart and soul of scientific computing, getting that part

> right is essential. When it comes to stealing array semantics from Matlab

> versus Python, it's a no-brainer: you steal from the language that has ruled

> numerical linear algebra for decades, not the one that needs to add

> incompatible array semantics in order to be able to do linear algebra

> reasonably. Given the syntactic and semantic similarity to Matlab, Matlab is

> really the only language that is possibly going to be trivial to port code

> from — and there's tons of really useful Matlab scientific codes out there,

> waiting to be ported and contributed to Julia's ecosystem. If we used

> 0-based indexing, then that one advantage would be out the window: linear

> algebra code from Matlab would never be trivial to port — it would always be

> a massive pain in the ass.

> Then there's the fact that Julia looks and feels a lot like Matlab. That's

> intentional — it's a tactical move that will help it appeal to and spread

> among non-computer-science academics who already have exposure to Matlab.

> The choice is also due to the fact that Matlab really does get a lot of the

> syntax and semantics for linear algebra right. Plain Python, on the other

> hand, does *not* at all get the semantics for linear algebra right — nor was

> it designed to. NumPy is tacked on and has to completely change the behavior

> of arrays to make things work — hence the major distinction between Python

> arrays and NumPy vectors and matrices. Since numerical linear algebra is in

> so many ways the heart and soul of scientific computing, getting that part

> right is essential. When it comes to stealing array semantics from Matlab

> versus Python, it's a no-brainer: you steal from the language that has ruled

> numerical linear algebra for decades, not the one that needs to add

> incompatible array semantics in order to be able to do linear algebra

> reasonably. Given the syntactic and semantic similarity to Matlab, Matlab is

> really the only language that is possibly going to be trivial to port code

> from — and there's tons of really useful Matlab scientific codes out there,

> waiting to be ported and contributed to Julia's ecosystem. If we used

> 0-based indexing, then that one advantage would be out the window: linear

> algebra code from Matlab would never be trivial to port — it would always be

> a massive pain in the ass.

In support of Stefan's point, porting MATLAB code with intent to

contribute to the ecosystem is exactly what I'm doing right now, and

it's going quite smoothly thus far, even if what I have isn't

completely idiomatic Julia yet. (WIP:

https://github.com/pao/Julia/tree/odefuns)

I've attempted to use NumPy, and it feels wrong--but not because of

indexing; I have no problem with the indexing while writing Python. As

a whole NumPy just feels tacked on (which, in a sense, it is). I know

plenty of people are perfectly happy with it, but at work when you

have MATLAB sitting there waiting to be used with its spiffy IDE and

better performance and it's somewhere I'm comfortable, there's no

contest.

But if I can trivially port that code to Julia, and start

parallelizing it without giving MathWorks more money? Or write

performance-critical code without limiting myself to the Embedded

MATLAB Subset, then forgetting to regenerate and recompile when I make

a change? Interface with existing C code without generating a wrapper

with Legacy Code Tool or writing tedious MEX boilerplate? That's a

pretty good business plan. That's what excites me about Julia.

--

Patrick O'Leary

patrick...@gmail.com

Feb 27, 2012, 7:42:00 AM2/27/12

to juli...@googlegroups.com

Hello everybody,

I'd like to offer my two cents on this simultaneously trivial and

important issue (you can never spend too much time arguing about

arbitrary conventions). To be perfectly clear, the following is not

meant to suggest that Julia should do one thing or the other, but just

to provide some arguments and ideas that don't seem to have been

brought up yet.

I have used Matlab and Mathematica (another language using one-based

indexing) as well as C and Python, all fairly extensively, and feel

comfortable with both conventions.

My general opinion is that zero- or one-based indexing doesn't matter.

What's most important is that the language provides constructs that

are powerful and consistent with the convention used. If the standard

way to denote a range of numbers is the Matlab style 1:n, then that

will obviously make 1-based indexing seem slightly nicer because you

don't need to subtract 1 from the length to write the range. But in

Python, range(n) or [:n] works out just as nicely. (Saying that one

set of notations/functions is a poor fit for the other's indexing

convention is rather begging the question.)

I'd like to be able to use functional constructs like map, reduce and

list comprehensions to avoid indices when possible; when indices are

necessary, something like Python's enumerate(), perhaps with slightly

more compact syntax, would be nice. Often, when iterating over a

collection of items, what you want to express is "all the indices of

this collection", not 1:n or 0:n-1 specifically -- with "all the

indices", the same code might even work automatically with hash

tables, sparse arrays, etc. as input. Functions like first(), rest()

and last() can provide some level of abstraction for common indexing

operations. Of course, you can't avoid indices entirely, and quite

often plain indices are much clearer than any other way of writing the

code.

I happen to agree with Dijkstra's argument, although I don't think

it's strong enough to make the case by itself. I can think of two

other, practical arguments in favor of zero-based indexing.

The first is that indexing strided, blocked, or multidimensional data

is more elegant. Adding one dimension is just a mul-add, and removing

one dimension is just a div-rem:

array[i*n+j]

array[i/n][i%n]

With one-based indexing, you need extra adjustments:

array[(i-1)*n+j]

array[(i-1)/n+1][(i-1)%n+1]

This is a vital simplification in a language like C where you often

end up doing pointer arithmetic manually. It's somewhat less of a

nuisance in a language with high-level constructs for working with

multidimensional arrays, slices, etc., though it's still common enough

to have to do such indexing manually.

A related, convenient property is that the ranges of unsigned integer

types neatly correspond to the indices of arrays of the same size. A

rather common operation is to use a byte to look something up in a

table of size 256. Or, as an extreme case, indexing a table of

false/true cases can be done on the truth value (or parity of an

integer) directly (in a language where false == 0, true == 1). Again,

high-level constructs make such concerns somewhat less important.

The second argument is that for many types of data, the first index

corresponds to the value 0 in a natural way. When the data *does*

happen to start at 1, one can often just add a dummy entry in a

zero-based array to avoid making ugly adjustments to formulas; the

converse in one-based indexing is obviously not possible. Some

examples:

-- In histograms or similar, one is likely to include a zero count of something.

-- The first entry in a list of coefficients of a polynomial or Taylor

series is the coefficient of x^0. Ditto for many other mathematical

objects. (Actually, Matlab represents polynomials in reverse order,

wich is objectively in poor taste. Presumably, making the order

entirely wrong makes it seem like less of a wart than having them in

the right order but just off by one Any language aiming for

compatibility with this convention will be tainted, but oh well. ;-)

-- In signal processing or simulations, the first sample often

corresponds to time t = 0 or position x = 0, and the first entry in

the DFT of the same data corresponds to the frequency f = 0.

Translating between indices and coordinates (in real or Fourier

space), then, is just scaling by a factor, rather than a scaling and

an offset. This often simplifies formulas.

Those things said, as recently as yesterday, I was writing some C code

for solving a four-dimensional recurrence equation, which had been

formulated with 0 as the starting coordinates. But to accommodate

boundary conditions, it turned out to be convenient to insert extra

planes corresponding to the position -1. So zero-based indexing didn't

save me from having to insert a few offsets in the end. Another

problem was that the domain was triangular along one axis, so

allocating a single large cube would have used twice as much memory as

necessary. I ended up using nested arrays instead of just making a

single large array; I could have allocated a large array of half the

size and done triangular offsets with pointer arithmetic, but it would

probably have been tedious to debug.

At times like that, what would be convenient is to have some

higher-level functionality to be able to declare "arrays" in terms of

more general index sets than the usual hyperrectangles in N^n. For

example, it would be nice to be able to use arbitrary affine

constraints over the integers, writing something like

array(indices=(x,y,z: -1 <= x <= N, -1 <= y <= N, -1 <= z <= x))

and having the implementation take care of packing the data together

and doing all the offset calculations.

Of course, for prototyping one can just overallocate or use a hash

table, but often this isn't good enough (in my case, overallocating

would have used twice as much memory, and a hash table would have

taken several times as much space). The point is to have the data

packed together contiguously in memory, to get the performance

benefits of a plain array, but with complex indexing done

automatically and efficiently, taking advantage of JIT compilation.

Again, this is probably well beyond the scope of a basic array type

and semantics thereof, and more an idea for additional high-level

functionality (that might solve some issues related to zero vs one

based indexing). In Matlab, a common idiom is to use an arbitrary

array as an index set for another array. This is very general and

actually quite elegant, but also obviously inefficient. One can think

about ways to achieve similar functionality without the drawbacks.

If you support "arrays with an offset", you do end up with extra cases

that need handling. Most code shouldn't really need to become more

complicated, as the language should provide constructs for iteration

and indexing that make the offset irrelevant. But there will still

probably be ambiguous cases -- what happens if you attempt a pointwise

addition of two arrays of the same size but different offsets, for

example? (Though, making the offset explicitly part of the type can be

argued to be a good thing in such a case, to catch errors.) It's

probably not worth trying to provide such functionality for the basic

array type. When in doubt, simple is better than complex.

Anyway, as I said at the start of this long rant, zero or one based

indexing is just an arbitrary convention; a language should stick to

one convention and do it well. The most common use is to iterate from

one point to another, and off-by-one errors occur no matter what.

Besides, C and Python users are likely to be better programmers than

Matlab users, so they will have less trouble porting their code ;-)

Fredrik

Feb 27, 2012, 7:47:21 AM2/27/12

to juli...@googlegroups.com

This is very well argued. I have myself come across these kinds of

adjustments of indexing in various different types of codes, similar

to the examples you provide.

adjustments of indexing in various different types of codes, similar

to the examples you provide.

The convincing argument is your closing sentence: "Besides, C and

Python users are likely to be better programmers than Matlab users, so

they will have less trouble porting their code". :-)

-viral

--

-viral

Feb 27, 2012, 4:03:18 PM2/27/12

to juli...@googlegroups.com

On Monday, February 27, 2012 1:42:00 PM UTC+1, Fredrik Johansson wrote:

Besides, C and Python users are likely to be better programmers than

Matlab users, so they will have less trouble porting their code ;-)

Let us instead take the hint from the languages the better programmers are using!

Furthermore, I can't help beginning to think about arrays as functions whose domains are subsets of the set of all integers. That way, you could have a three-element array for which valid indices were, say, -3, -2 and 17, but nothing else. That's more general than what I have seen suggested in the thread so far (or possibly equivalent to how it works in some other language that has been referred to). This may or may not affect the ongoing discussion.

Furthermore, I can't help beginning to think about arrays as functions whose domains are subsets of the set of all integers. That way, you could have a three-element array for which valid indices were, say, -3, -2 and 17, but nothing else. That's more general than what I have seen suggested in the thread so far (or possibly equivalent to how it works in some other language that has been referred to). This may or may not affect the ongoing discussion.

Feb 27, 2012, 5:21:59 PM2/27/12

to juli...@googlegroups.com

Of course sparse index sets and such are valuable. But the purpose of

the standard Array type is to be a dirt-simple dense array, as close

to "char*" as we dare. That way other things can be efficiently

layered on top. I can imagine very elaborate array models where you

have metadata (e.g. "missing" codes), arbitrary index sets, named

dimensions and indexes, distributions, compressed storage, etc.

the standard Array type is to be a dirt-simple dense array, as close

to "char*" as we dare. That way other things can be efficiently

layered on top. I can imagine very elaborate array models where you

have metadata (e.g. "missing" codes), arbitrary index sets, named

dimensions and indexes, distributions, compressed storage, etc.

Feb 28, 2012, 4:37:25 AM2/28/12