Also, it is important to realize that julia is very unlike matlab and
python, in that the language imposes very little. The language only
provides a type system, function definition, and a compiler. You can
define other types that use 0-based indexing. You could even fork our
standard library (which is 100% julia) and change it to 0-based
indexing. That is not terribly practical, but the important thing here
is the standard library, and you can write alternate standard
libraries for julia that would put a very different "skin" on the
language, with the same performance.
I'm more of a C programmer than matlab programmer myself, and I
understand the index arithmetic issues. I don't believe simply picking
0 or 1 can solve all indexing problems or eliminate off-by-one errors.
I got used to 1-based quickly and it doesn't bother me any more.
Thanks for all the positive impressions of julia, it means a lot!
-Jeff
Julia can eval its own code. Therefore it is possible to present a
front-end which simply converts all 0-based indexing to 1-based
indexing. I don't even think it would be that hard to do, and I have
considered doing it for my own purposes. It shouldn't be inefficient.
I also think the "end" statement is largely superfluous (the "dangling
else" problem notwithstanding) and I had thought to make them optional
in any such front end I wrote for myself.
I am not working on this at the moment though. The first priority is a
decent bignum interface with GMP. I've had a bit of a play with that,
and despite some things that aren't quite clear to me yet, it seems to
be pretty easy to write a first approximation of this. Julia probably
even makes it possible to do a two's complement fixed-precision bignum
interface using GMP's mpn interface. I don't know of any other
languages that would make that even vaguely efficient.
Julia presents so many possibilities. I have also thought about
writing an optional statically typed front end with type inference.
Julia seems like the perfect language to implement such a project.
Where has Julia been these past two years!? I had searched for it high
and low, day and night, to the point of nearly driving myself insane.
It's not even listed here:
http://llvm.org/ProjectsWithLLVM/
Bill.
I agree with Jeff that even though I was not sure about 1-based
indexing, it very quickly grew on me, and I do feel it is the right
choice. Even though we could support arbitrary-base indexing with
some effort, accommodating it would slow things down - and we still
need our indexing to become faster than it is.
-viral
http://www.llvm.org/releases/3.0/docs/ReleaseNotes.html
-viral
I guess I am not that religious about the use of end for function
definitions, but it would make it difficult to then define inner
functions. It would be useful to have a discussion on syntax, and how
to improve it.
We could think of additional syntax for supporting 0 or arbitrary
based indexing, but then there is always the question of picking a
default. We need to think harder about this issue of supporting more
general indexing without giving up the elegance or performance that we
currently have.
-viral
On Fri, Feb 24, 2012 at 7:43 AM, Bill Hart <goodwi...@googlemail.com> wrote:
GMP support would be really useful to have!
> Julia can eval its own code. Therefore it is possible to present a
> front-end which simply converts all 0-based indexing to 1-based
> indexing. I don't even think it would be that hard to do, and I have
> considered doing it for my own purposes. It shouldn't be inefficient.
>
> I also think the "end" statement is largely superfluous (the "dangling
> else" problem notwithstanding) and I had thought to make them optional
> in any such front end I wrote for myself.
From a purely technical point of view you are right, and the idea of
writing your own personal syntax for Julia using its metaprogramming
features will certainly tempt many programmers, in particular those
coming from a Lisp background.
But please consider the consequences: the Julia community will become
fragmented as no one will be at ease reading someone else's code, even
though everything works fine together. That's in my opinion what
prevented Lisp from gaining serious momentum: The number of dialects
is hard to estimate, and then on top of each dialect there are dozens
of personal adaptations in the form of macro libraries.
Konrad.
I must say, this is a very nice argument. The only problem is that
ultimately, I can't believe it is so important. You cannot seriously
argue that it's impossible to write a gui or database with 1-based
indexing.
There is this tension between julia's general purpose nature and its
specific target for technical computing. I could give examples of how
mathematical code looks much nicer with 1-based indexing for
multi-dimensional arrays. The thing with 0-based indexing is that you
always then have to write code of the type for i=0:len(a)-1 when
iterating. Either way, there is always some indexing arithmetic.
> On Fri, Feb 24, 2012 at 4:14 AM, Viral Shah <vi...@mayin.org> wrote:
>
> There is this tension between julia's general purpose nature and its
> specific target for technical computing. I could give examples of how
> mathematical code looks much nicer with 1-based indexing for
> multi-dimensional arrays. The thing with 0-based indexing is that you
> always then have to write code of the type for i=0:len(a)-1 when
> iterating. Either way, there is always some indexing arithmetic.
>
> Can you give some concrete examples? I believe that the properly translated code would
> be either simpler or of the same complexity as 1-based code.
I suspect the reason for 1-based indexing in Fortran and Matlab comes
from the pervasive use of 1-based indexing in mathematics. Grab any linear
algebra textbook; the sums over elements of vectors or matrices always
run from 1 to N.
That said, I prefer 0-based indexing as well, coming from a C and
Python background, but I don't think it's such a big issue.
Konrad.
As they say, all problems can be solved by another layer of
abstraction. If Julia efficiently supports and promotes sequence
manipulation using higher-order functions and/or generic abstractions
such as iterators or D-style ranges, then the underlying indexing base
becomes less of an issue.
-Joe
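A small Python sketch of Joe's point (Python is used here only for illustration; `dot`, `product`, and `running_totals` are hypothetical helper names): when sequence code is written with iterators and higher-order functions, the index base never appears at all.

```python
from functools import reduce
from itertools import accumulate
import operator


def dot(xs, ys):
    """Dot product with no explicit index: zip pairs the elements up."""
    return sum(x * y for x, y in zip(xs, ys))


def product(xs):
    """Fold with multiplication; the starting index never appears."""
    return reduce(operator.mul, xs, 1)


def running_totals(xs):
    """Prefix sums from an iterator instead of an index loop."""
    return list(accumulate(xs))
```

Called as `dot([1, 2, 3], [4, 5, 6])`, this returns 32 regardless of how the containers number their elements.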
Good point. I think we will need a small design iteration at some
point around abstracting indexes and iteration over them a bit more.
As for automatically converting code to a different index base, I
don't think it's that easy. First, many uses of brackets might refer
to other data structures such as hash tables. At the same time,
function arguments might be indexes semantically, with no easy way to
tell --- f(0) might need to become f(1).
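Jeff's objection can be demonstrated with a deliberately naive source-to-source rewrite. The sketch below uses Python's `ast` module as a stand-in host (it assumes Python 3.9 or later, where `Subscript.slice` holds the index expression directly); `ShiftSubscripts` and `run_one_based` are hypothetical names. It runs in the opposite direction from Bill's proposal (1-based source on 0-based Python lists), but the pitfall is the same: every `x[i]` becomes `x[i - 1]`, which is right for a list, silently wrong for a dictionary keyed by 1, and blind to plain function arguments such as `f(1)` that are semantically indices.

```python
import ast


class ShiftSubscripts(ast.NodeTransformer):
    """Naive front end: rewrite every x[i] as x[i - 1] (slices left alone)."""

    def visit_Subscript(self, node):
        self.generic_visit(node)  # rewrite nested subscripts first
        if not isinstance(node.slice, ast.Slice):
            node.slice = ast.BinOp(left=node.slice, op=ast.Sub(),
                                   right=ast.Constant(1))
        return node


def run_one_based(source, env):
    """Execute 1-based source against 0-based Python containers."""
    tree = ShiftSubscripts().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    exec(compile(tree, "<one-based>", "exec"), env)
    return env


# Lists behave as intended: a[1] now means the first element.
env = run_one_based("first = a[1]", {"a": [10, 20, 30]})
print(env["first"])

# A dict keyed by 1 breaks, because d[1] was rewritten to d[0].
try:
    run_one_based("v = d[1]", {"d": {1: "one"}})
except KeyError as err:
    print("KeyError:", err)
```

The transformer has no way to know which brackets are array indexing and which are hash-table lookups, which is exactly the information a real front end would need.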
The only realistic approach, just to play along for a bit, is to
redefine ref and assign in array.j, at a minimum. The next problem is
all the library code that uses "for i=1:n", which I admit is a
weakness and should be more abstract.
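Jeff's "redefine ref and assign" route puts the base into the array type rather than into the syntax. As a rough Python analogue (the class `OneBasedList` is hypothetical, and `__getitem__`/`__setitem__` stand in for Julia's `ref`/`assign`), indexing is overridden once and everything else stays ordinary code. Loops that hard-code their bounds would still have to change, which is the weakness Jeff mentions; the `indices()` method is one assumed way to avoid that.

```python
class OneBasedList:
    """A list wrapper whose valid indices are 1..len(self)."""

    def __init__(self, items):
        self._data = list(items)

    def _offset(self, i):
        if not 1 <= i <= len(self._data):
            raise IndexError(f"index {i} out of range 1..{len(self._data)}")
        return i - 1  # translate to Python's 0-based storage

    def __getitem__(self, i):
        return self._data[self._offset(i)]

    def __setitem__(self, i, value):
        self._data[self._offset(i)] = value

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._data)

    def indices(self):
        """The valid index range, so callers need not hard-code 1..n."""
        return range(1, len(self._data) + 1)
```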
I also recently discovered Julia and read the manual and played with
it some. And to get it out of the way at the beginning: I like what
you did so far -- Julia will definitely be in my toolbox from now on.
Now to the complaints about syntax and indexing:
* Syntax is not as important as many programmers think; it's the
semantics you should care more about.
Syntax is just something to get used to, and if you don't think so,
learn more programming languages until you do.
I'm not saying that thinking about syntax is unimportant when creating
a new language; I'm only talking about using one. I see the same
arguments again and again on the mailing lists of different
programming languages, from requests for small tweaks to "the syntax
totally sucks". But even the small tweaks usually come with hidden
pitfalls.
That said, there is one property of programming language syntax I
care about: it shouldn't be too verbose. I think of it in terms of how
much of my finger joints I have to use up to get a piece of code done
(if you don't know what I'm talking about: you will if you keep coding
through your life).
To conclude this rambling: you did well with the syntax IMHO, not too
verbose, concise and powerful.
* Indexing: well, there are arguments for 0-based arrays and there are
arguments for 1-based arrays. But there is definitely no *right* way
to do this. One possible cop-out is having it both ways, or even
allowing arrays with indices from 2:5, or even negative ones.
0-based: good for some things, but it's pretty hard to translate a
numerical algorithm to 0-based arrays. Getting a numerical algorithm
implemented right is hard enough as it is.
1-based: good for other things -- especially mathematical
calculation, as mentioned. Maybe a bit of confusion when interfacing
to the external (non-Fortran/Matlab) world, but this interfacing
doesn't have to be done with arrays if that should be a problem.
n-based-settable: well, that's the cop-out, but is it really worth the
confusion? Never knowing where the arrays start ...
Talking about 0-based "like all the other languages" is not very
accurate, since there are plenty of languages that have 1-based
arrays, or different bases for different data structures, like
1-based for lists and 0-based for buffers.
Again, with indexing I like Julia the way it is, and I don't want all
my programming languages to be the same ... otherwise why learn a new
one?
Keep up the good work.
Cheers
-- Peer
Putting jokes on indexing aside, ranges in julia (matlab also) do include the endpoints. In fact, for writing scientific codes, I find this to be rather convenient.
julia> for i=1:5; println(i); end
1
2
3
4
5
julia> for i=Range1(1,5); println(i); end
1
2
3
4
5
Although many people like you and me are perfectly comfortable with 0-based or 1-based indexing, I suspect that people who only use very high-level languages such as Matlab and R, with no other exposure to programming, will find 0-based indexing a bit unnatural. At least, this is my experience working with scientists from various domains.
I do agree with the other comment by Konrad about forking the codebase and having too many dialects. I would ask that we try this out for some time. It would be great if we could figure out a way to keep everyone happy - and there seem to be some suggestions made on the list.
-viral
Just want to say that it won't be me or the original julia developers that will choose, but it should be community driven. We have just started this journey, and I am sure that the julia community will come up with its own style of making decisions, implementing features, etc.
-viral
This is related to the Reddit discussion, which brought up Dijkstra's argument that zero-length slices are ugly with inclusive ranges.
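Dijkstra's point (EWD831, "Why numbering should start at zero") can be checked in a few lines. Python's `range` is half-open, so `range(a, b)` has exactly `b - a` elements and the empty range is simply `a == b`; with an inclusive convention like Julia's `a:b`, the length is `b - a + 1` and an empty range needs the upper bound below the lower one. `inclusive` is a hypothetical helper that mimics `a:b`.

```python
def inclusive(a, b):
    """Mimic an inclusive range a:b (both endpoints included)."""
    return range(a, b + 1)


# Half-open: the length is just the difference of the bounds.
assert len(range(3, 7)) == 7 - 3
# Inclusive: an extra +1 is needed.
assert len(inclusive(3, 7)) == 7 - 3 + 1

# Empty ranges: half-open uses equal bounds...
assert len(range(5, 5)) == 0
# ...while inclusive needs upper = lower - 1.
assert len(inclusive(5, 4)) == 0

# Adjacent half-open slices split a sequence with no +1/-1 fix-ups.
xs = list(range(10))
k = 4
assert xs[0:k] + xs[k:10] == xs
```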
Not if you want the language to be popular. Look at Python. The
performance sucks, yet because of the stylish syntax it is one of the
most popular languages out there. In general people don't care about
performance, and even worse, sometimes can't tell the difference.
Flexibility and familiar syntax mean far more to a lot of people.
My opinion is different, of course. Performance is everything for me!
Trendy syntax reminds me of a friend of mine who, whilst watching Star
Trek, exclaimed, "Oh look at me, I am an alien. I have superficial
cosmetic differences"!
Having said that, the established computer science trend (for general
purpose languages) is 0-based indexing and the mathematical trend (for
technical languages like Fortran and Matlab) is 1-based. Twenty years
ago when I seriously started programming I found 0-based indexing odd.
Now I have a 20 year habit that I'm not going to be able to kick and
large code bases all using 0-based indexing. Most experienced
programmers coming from general purpose languages will think
similarly.
I personally feel like the answer might be something like 0-based and
1-based modes, the 1-based mode being the default in line with the
original target of the Julia language as a Matlab replacement. I
haven't looked at quite how the language is implemented, but this would
presumably only affect the parser.
Bill.
> I personally feel like the answer might be something like 0-based and
> 1-based modes,
That way lies madness. Imagine tracking a weird numerical bug through
several routines, some of which use 0-based "mode" and some of which use
1.
I'm not just speculating here; I work at an HPC centre and I see a lot
of modern fortran code, some of which uses the adjustable lower bounds
option, where you can specify the lower bound on an array-by-array basis.
It seems like such a useful, reasonable feature on paper, especially for
things like applying stencils, etc. But I promise you I've never seen
use of this feature solve more readability problems than it causes.
I think the julia prime movers have to pull a Steve Jobs on this one;
make a design choice and stick with it. People telling you what they
think they want is useless; if people knew what they wanted, they'd have
it already.
0 or 1 is fine by me. (Anyone who has passionate feelings about how a
language labels the first item of a list/array needs to rethink how they
live their lives.) If you are going for the underserved
scientific/technical computing market, then 1 is a perfectly sensible
choice; as someone upthread succinctly put it, they're indices, not
offsets. If you want people to pull their matlab or R code over to
julia -- and again, working at an HPC centre, every time you convert
someone away from matlab, an angel gets their wings -- then 1 is a good
tactical choice. If you're more worried about looking pythonic, then 0
works best.
- Jonathan
--
Jonathan Dursi <ljd...@gmail.com>
0-based indexing isn't the heart of python's stylish syntax.
As far as python, it would make more sense to talk about a
julia<->python calling interface instead of changing core decisions to
match up with python.
> On 26 Feb 6:51PM, Bill Hart wrote:
>> I personally feel like the answer might be something like 0-based and
>> 1-based modes,
>
> That way lies madness. Imagine tracking a weird numerical bug through
> several routines, some of which use 0-based "mode" and some of which use 1.
Yes, I see the problem now. The base of the indexing needs to be a
property of the array, not of the [].
>
> 0-based indexing isn't the heart of python's stylish syntax.
No, obviously not. I only meant that (unfortunately) it is often
stupid irrelevant things that make languages popular. Aiming to be
really popular is perhaps not a worthy aim. Lots of really terrible
things are popular.
>
> As far as python, it would make more sense to talk about a
> julia<->python calling interface instead of changing core decisions to
> match up with python.
I don't personally use Python regularly, but I could see the sense in
that. It's become a fairly popular language for technical computing.
Bill.
In support of Stefan's point, porting MATLAB code with intent to
contribute to the ecosystem is exactly what I'm doing right now, and
it's going quite smoothly thus far, even if what I have isn't
completely idiomatic Julia yet. (WIP:
https://github.com/pao/Julia/tree/odefuns)
I've attempted to use NumPy, and it feels wrong--but not because of
indexing; I have no problem with the indexing while writing Python. As
a whole NumPy just feels tacked on (which, in a sense, it is). I know
plenty of people are perfectly happy with it, but at work when you
have MATLAB sitting there waiting to be used with its spiffy IDE and
better performance and it's somewhere I'm comfortable, there's no
contest.
But if I can trivially port that code to Julia, and start
parallelizing it without giving MathWorks more money? Or write
performance-critical code without limiting myself to the Embedded
MATLAB Subset, then forgetting to regenerate and recompile when I make
a change? Interface with existing C code without generating a wrapper
with Legacy Code Tool or writing tedious MEX boilerplate? That's a
pretty good business plan. That's what excites me about Julia.
--
Patrick O'Leary
patrick...@gmail.com
I'd like to offer my two cents on this simultaneously trivial and
important issue (you can never spend too much time arguing about
arbitrary conventions). To be perfectly clear, the following is not
meant to suggest that Julia should do one thing or the other, but just
to provide some arguments and ideas that don't seem to have been
brought up yet.
I have used Matlab and Mathematica (another language using one-based
indexing) as well as C and Python, all fairly extensively, and feel
comfortable with both conventions.
My general opinion is that zero- or one-based indexing doesn't matter.
What's most important is that the language provides constructs that
are powerful and consistent with the convention used. If the standard
way to denote a range of numbers is the Matlab style 1:n, then that
will obviously make 1-based indexing seem slightly nicer because you
don't need to subtract 1 from the length to write the range. But in
Python, range(n) or [:n] works out just as nicely. (Saying that one
set of notations/functions is a poor fit for the other's indexing
convention is rather begging the question.)
I'd like to be able to use functional constructs like map, reduce and
list comprehensions to avoid indices when possible; when indices are
necessary, something like Python's enumerate(), perhaps with slightly
more compact syntax, would be nice. Often, when iterating over a
collection of items, what you want to express is "all the indices of
this collection", not 1:n or 0:n-1 specifically -- with "all the
indices", the same code might even work automatically with hash
tables, sparse arrays, etc. as input. Functions like first(), rest()
and last() can provide some level of abstraction for common indexing
operations. Of course, you can't avoid indices entirely, and quite
often plain indices are much clearer than any other way of writing the
code.
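A Python sketch of "all the indices of this collection": `enumerate` pairs each element with its index, and a small generic helper (hypothetical; `indices` is not a standard Python function) returns whatever the valid keys are, so the same loop body works on a list or a dict. `first`, `rest`, and `last` are likewise illustrative helpers that hide the base inside one place.

```python
from collections.abc import Mapping


def indices(collection):
    """All valid indices/keys of a collection, whatever its base."""
    if isinstance(collection, Mapping):
        return list(collection.keys())
    return list(range(len(collection)))


def first(xs):
    return xs[0]


def rest(xs):
    return xs[1:]


def last(xs):
    return xs[-1]


def scale_in_place(collection, factor):
    """Same code for a list or a dict: loop over its indices."""
    for i in indices(collection):
        collection[i] *= factor
    return collection


for i, name in enumerate(["a", "b", "c"]):
    print(i, name)  # index and element together, no manual counter
```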
I happen to agree with Dijkstra's argument, although I don't think
it's strong enough to make the case by itself. I can think of two
other, practical arguments in favor of zero-based indexing.
The first is that indexing strided, blocked, or multidimensional data
is more elegant. Adding one dimension is just a mul-add, and removing
one dimension is just a div-rem:
array[i*n+j]
array[i/n][i%n]
With one-based indexing, you need extra adjustments:
array[(i-1)*n+j]
array[(i-1)/n+1][(i-1)%n+1]
This is a vital simplification in a language like C where you often
end up doing pointer arithmetic manually. It's somewhat less of a
nuisance in a language with high-level constructs for working with
multidimensional arrays, slices, etc., though it's still common enough
to have to do such indexing manually.
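The formulas above can be checked mechanically. This Python sketch (function names are hypothetical) flattens a row-major 2-D index into a linear one and back, once 0-based and once 1-based. In the C-style formulas `/` means integer division, which `divmod` performs here; the 1-based versions need the extra `- 1`/`+ 1` adjustments Fredrik describes.

```python
def flatten0(i, j, n):
    """0-based (row i, column j) -> linear index, n columns per row."""
    return i * n + j


def unflatten0(k, n):
    """0-based linear index -> (row, column): just a div-rem."""
    return divmod(k, n)


def flatten1(i, j, n):
    """1-based (row i, column j) -> 1-based linear index."""
    return (i - 1) * n + j


def unflatten1(k, n):
    """1-based linear index -> 1-based (row, column)."""
    q, r = divmod(k - 1, n)
    return q + 1, r + 1
```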
A related, convenient property is that the ranges of unsigned integer
types neatly correspond to the indices of arrays of the same size. A
rather common operation is to use a byte to look something up in a
table of size 256. Or, as an extreme case, indexing a table of
false/true cases can be done on the truth value (or parity of an
integer) directly (in a language where false == 0, true == 1). Again,
high-level constructs make such concerns somewhat less important.
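Both cases are easy to show in Python, which is 0-based and where `bool` is a subclass of `int` (`False == 0`, `True == 1`). The bit-count table below is just an illustrative choice of a 256-entry table indexed directly by a byte value; the helper names are hypothetical. With 1-based indexing every lookup would need a `+ 1`.

```python
# A 256-entry table indexed directly by a byte value 0..255:
# here, the number of set bits in each byte.
POPCOUNT = [bin(b).count("1") for b in range(256)]


def bits_set(data: bytes) -> int:
    """Count set bits with one table lookup per byte, no offset needed."""
    return sum(POPCOUNT[b] for b in data)


def parity_name(n: int) -> str:
    """Index a two-entry table by the parity of an integer."""
    return ("even", "odd")[n % 2]


def sign_word(x: float) -> str:
    """Index a two-entry table by a truth value (False -> 0, True -> 1)."""
    return ("non-positive", "positive")[x > 0]
```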
The second argument is that for many types of data, the first index
corresponds to the value 0 in a natural way. When the data *does*
happen to start at 1, one can often just add a dummy entry in a
zero-based array to avoid making ugly adjustments to formulas; the
converse in one-based indexing is obviously not possible. Some
examples:
-- In histograms or similar, one is likely to include a zero count of something.
-- The first entry in a list of coefficients of a polynomial or Taylor
series is the coefficient of x^0. Ditto for many other mathematical
objects. (Actually, Matlab represents polynomials in reverse order,
which is objectively in poor taste. Presumably, making the order
entirely wrong makes it seem like less of a wart than having them in
the right order but just off by one. Any language aiming for
compatibility with this convention will be tainted, but oh well. ;-)
-- In signal processing or simulations, the first sample often
corresponds to time t = 0 or position x = 0, and the first entry in
the DFT of the same data corresponds to the frequency f = 0.
Translating between indices and coordinates (in real or Fourier
space), then, is just scaling by a factor, rather than a scaling and
an offset. This often simplifies formulas.
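With 0-based storage, the index of a polynomial coefficient is its power, and the index of a sample or DFT bin maps to time or frequency by a pure scaling. A short standard-library Python sketch (the sample rate `fs` and the helper names are illustrative assumptions; the DFT is the naive O(n^2) definition, not an FFT):

```python
import cmath


def poly_eval(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k): index k is exactly the power of x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def sample_time(k, fs):
    """Time of sample k when sample 0 is at t = 0."""
    return k / fs


def bin_frequency(k, n, fs):
    """Frequency of DFT bin k for n samples at rate fs: a pure scaling."""
    return k * fs / n


def dft(xs):
    """Naive DFT; bin 0 is the zero-frequency term."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(xs))
            for k in range(n)]
```

In a 1-based version each of these would carry a `k - 1`.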
Those things said, as recently as yesterday, I was writing some C code
for solving a four-dimensional recurrence equation, which had been
formulated with 0 as the starting coordinates. But to accommodate
boundary conditions, it turned out to be convenient to insert extra
planes corresponding to the position -1. So zero-based indexing didn't
save me from having to insert a few offsets in the end. Another
problem was that the domain was triangular along one axis, so
allocating a single large cube would have used twice as much memory as
necessary. I ended up using nested arrays instead of just making a
single large array; I could have allocated a large array of half the
size and done triangular offsets with pointer arithmetic, but it would
probably have been tedious to debug.
At times like that, what would be convenient is to have some
higher-level functionality to be able to declare "arrays" in terms of
more general index sets than the usual hyperrectangles in N^n. For
example, it would be nice to be able to use arbitrary affine
constraints over the integers, writing something like
array(indices=(x,y,z: -1 <= x <= N, -1 <= y <= N, -1 <= z <= x))
and having the implementation take care of packing the data together
and doing all the offset calculations.
Of course, for prototyping one can just overallocate or use a hash
table, but often this isn't good enough (in my case, overallocating
would have used twice as much memory, and a hash table would have
taken several times as much space). The point is to have the data
packed together contiguously in memory, to get the performance
benefits of a plain array, but with complex indexing done
automatically and efficiently, taking advantage of JIT compilation.
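As a small taste of such automatic packing, here is a hypothetical Python class for one triangular case, a 2-D domain with `0 <= z <= x < n`. It stores the data contiguously (an `array` of doubles) in `n*(n+1)/2` slots instead of `n*n`, and computes the offset `x*(x+1)/2 + z` itself. The general affine index sets Fredrik describes would need far more machinery; this sketch does not attempt that.

```python
from array import array


class LowerTriangular:
    """Packed storage for entries (x, z) with 0 <= z <= x < n."""

    def __init__(self, n, fill=0.0):
        self.n = n
        # n*(n+1)//2 doubles instead of n*n, stored contiguously.
        self.data = array("d", [fill]) * (n * (n + 1) // 2)

    def _offset(self, x, z):
        if not (0 <= z <= x < self.n):
            raise IndexError((x, z))
        # Rows 0..x-1 occupy 1 + 2 + ... + x = x*(x+1)/2 slots.
        return x * (x + 1) // 2 + z

    def __getitem__(self, key):
        return self.data[self._offset(*key)]

    def __setitem__(self, key, value):
        self.data[self._offset(*key)] = value
```

Usage: `t = LowerTriangular(4)` allocates 10 slots, and `t[3, 3] = 1.5` writes the last one.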
Again, this is probably well beyond the scope of a basic array type
and semantics thereof, and more an idea for additional high-level
functionality (that might solve some issues related to zero vs one
based indexing). In Matlab, a common idiom is to use an arbitrary
array as an index set for another array. This is very general and
actually quite elegant, but also obviously inefficient. One can think
about ways to achieve similar functionality without the drawbacks.
If you support "arrays with an offset", you do end up with extra cases
that need handling. Most code shouldn't really need to become more
complicated, as the language should provide constructs for iteration
and indexing that make the offset irrelevant. But there will still
probably be ambiguous cases -- what happens if you attempt a pointwise
addition of two arrays of the same size but different offsets, for
example? (Though, making the offset explicitly part of the type can be
argued to be a good thing in such a case, to catch errors.) It's
probably not worth trying to provide such functionality for the basic
array type. When in doubt, simple is better than complex.
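To make the ambiguity concrete, here is a minimal Python sketch of an "array with an offset" (the class `OffsetArray` and its behavior are hypothetical). Loops can go through its own `indices()` so they never mention the base, and pointwise addition of arrays whose index ranges differ is rejected rather than guessed at, which is one possible answer to Fredrik's question, not the only one.

```python
class OffsetArray:
    """1-D array whose valid indices are first, first+1, ..., first+len-1."""

    def __init__(self, first, items):
        self.first = first
        self.data = list(items)

    def indices(self):
        return range(self.first, self.first + len(self.data))

    def __getitem__(self, i):
        if i not in self.indices():
            raise IndexError(i)
        return self.data[i - self.first]

    def __setitem__(self, i, value):
        if i not in self.indices():
            raise IndexError(i)
        self.data[i - self.first] = value

    def __add__(self, other):
        # Same length but different offsets: treat it as an error
        # instead of guessing which elements should line up.
        if self.indices() != other.indices():
            raise ValueError("index ranges differ")
        return OffsetArray(self.first,
                           [a + b for a, b in zip(self.data, other.data)])
```

Note that with `OffsetArray(-1, ...)`, index `-1` is the first element, not the last as with a Python list; that kind of surprise is part of the cost Fredrik weighs.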
Anyway, as I said at the start of this long rant, zero or one based
indexing is just an arbitrary convention; a language should stick to
one convention and do it well. The most common use is to iterate from
one point to another, and off-by-one errors occur no matter what.
Besides, C and Python users are likely to be better programmers than
Matlab users, so they will have less trouble porting their code ;-)
Fredrik
The convincing argument is your closing sentence: "Besides, C and
Python users are likely to be better programmers than Matlab users, so
they will have less trouble porting their code". :-)
-viral