continued from "return type declarations (issue #1090)" (julia and static types)

683 views
Skip to first unread message

Jeff Bezanson

unread,
Sep 8, 2016, 4:35:25 PM9/8/16
to juli...@googlegroups.com
In reply to https://github.com/JuliaLang/julia/issues/1090#issuecomment-245709651

> if you want efficiency, you would better be in this category of languages

I agree, and julia does in fact use static type information for
performance. It's just that we don't use this information, or our
inability to determine it accurately, to report errors before run
time. One great virtue of this approach is that we're free to change
and improve type inference as much as we want without changing the set
of valid programs, or changing how they behave (except for performance
of course).

> If, on the other hand, the language allows you to write simple code that everyone understands how it translates to machine code

The key word here is "allows". Julia certainly *allows* you to write
simple code with a clear translation to machine code, and aids this
directly by providing tools like code_typed, code_llvm, and
code_native. The problem is that in the past, language designers
picked what they considered to have a sufficiently simple translation
to machine code, and *mandated* it.

> "clever" things that makes it harder for one person to actually understand the performance characteristics of the code

This is true to some extent; there is a technical notion of
"predictable performance" that Julia arguably lacks. But I see it as a
tooling problem: you can use profiling, code_typed, TypeCheck.jl, etc.
to understand what's going on in bottlenecks. I don't think a language
can or should try to mandate writing fast code.

Yichao Yu

unread,
Sep 8, 2016, 4:58:23 PM9/8/16
to Julia Dev
On Thu, Sep 8, 2016 at 4:35 PM, Jeff Bezanson <jeff.b...@gmail.com> wrote:
In reply to https://github.com/JuliaLang/julia/issues/1090#issuecomment-245709651

> if you want efficiency, you would better be in this category of languages

I agree, and julia does in fact use static type information for
performance. It's just that we don't use this information, or our
inability to determine it accurately, to report errors before run
time. One great virtue of this approach is that we're free to change
and improve type inference as much as we want without changing the set
of valid programs, or changing how they behave (except for performance
of course).

> If, on the other hand, the language allows you to write simple code that everyone understands how it translates to machine code

The key word here is "allows". Julia certainly *allows* you to write
simple code with a clear translation to machine code, and aids this
directly by providing tools like code_typed, code_llvm, and
code_native. The problem is that in the past, language designers
picked what they considered to have a sufficiently simple translation
to machine code, and *mandated* it.

> "clever" things that makes it harder for one person to actually understand the performance characteristics of the code

This is true to some extent; there is a technical notion of
"predictable performance" that Julia arguably lacks. But I see it as a

FWIW, julia has much more predictable performance compare to other languages (that I know of) with a JIT. If it was not obvious enough from the source code, the few tools (many of them builtin) can give a pretty accurate sense of what the performance should be.

Lucian Radu Teodorescu

unread,
Sep 8, 2016, 6:20:16 PM9/8/16
to julia-dev


On Thursday, September 8, 2016 at 11:35:25 PM UTC+3, Jeff Bezanson wrote:
In reply to https://github.com/JuliaLang/julia/issues/1090#issuecomment-245709651

> if you want efficiency, you would better be in this category of languages

I agree, and julia does in fact use static type information for
performance. It's just that we don't use this information, or our
inability to determine it accurately, to report errors before run
time. One great virtue of this approach is that we're free to change
and improve type inference as much as we want without changing the set
of valid programs, or changing how they behave (except for performance
of course).

But, on the long run, it will make the users unable to see how their code is mapped to the target hardware (or, in this case, LLVM assembly). When this happens you'll start to drift away from the efficiency goal; I argue that the majority of users will not be able to write efficient code.
 

> If, on the other hand, the language allows you to write simple code that everyone understands how it translates to machine code

The key word here is "allows". Julia certainly *allows* you to write
simple code with a clear translation to machine code, and aids this
directly by providing tools like code_typed, code_llvm, and
code_native. The problem is that in the past, language designers
picked what they considered to have a sufficiently simple translation
to machine code, and *mandated* it.

Sorry for the confusion, I meant a different thing: the ability to express simple concepts that gets translated into efficient code, and that the users will be able to understand how that code translates into machine code (to a certain degree), and they can easily detect if the code is efficient or not.

Let me give you an example:
The following Sparrow code sums the squares of all the odd numbers up to 100:
1..100 filter \isOdd map \sqr sum

After a short period of getting used to this way of writing code, you will acknowledge that this is pretty straight-forward to write. But how efficient will this be? Again, after a little bit of experience, 
or by looking at the definition of filter and map, you can infer that this code is similar to the following C++ code:
int s = 0
for ( int i=1; i<100; i++ ) {
    if ( !isOdd(i) ) continue; 
    int i2 = sqr(i) 
    s += i2
, which is probably the most efficient way of doing the computation (if you don't completely rewrite the problem)

If the language allows you to make this inference, then you can inspect your code for efficiency. Doing so tends to keep the performance high.

Jeff Bezanson

unread,
Sep 8, 2016, 8:38:45 PM9/8/16
to juli...@googlegroups.com
I'm all in favor of functional programming, but historically
functional languages (even statically typed ones) did *not* always
compile maps, filters, and lambdas to the equivalent of hand-written
fast C code. To get that, you need a certain amount of compiler magic,
which I'm also all in favor of. That involves subtle factors, e.g.
inlining heuristics. Inlining heuristics can be very important for
performance, but aren't captured by type systems (as far as I know).

> But, on the long run, it will make the users unable to see how their code is mapped to the target hardware

Except that julia has simple tools to show you exactly that, from the REPL.

It's popular to point out (and I agree) that programs spend 80% of
their time in 20% of the code (or maybe even 90/10). This is then used
to justify languages that are consistently slow, by design. We use it
instead to justify some amount of performance opacity. Yes, a lot of
your code might be slow, but it won't matter. You use tools to find
the 20% that matters, and then make small tweaks to speed it up, as
opposed to rewriting it in a lower-level language. Being able to
express fast code in the same language after reading the manual
chapter on performance and using a couple simple tools really makes a
difference. I believe it turns out to be drastically easier than
rewriting in C or C++.

However I don't think this scenario is the common case. Many people
get significant speedups the first time they port code from another
dynamically-typed language to julia.

Data Pulverizer

unread,
Sep 9, 2016, 5:09:51 AM9/9/16
to julia-dev
It would seem from this post (http://www.johnmyleswhite.com/notebook/2015/11/28/why-julias-dataframes-are-still-slow/) that the Any type is problematic in Julia. Pherphs there needs to be room for breaking atomicity in a arrays or for the compiler to recognise heretogeneous array and label them as a some mixed typed array instead of type hiding with Any. Any type should be less about type hiding and more about type genericism. Tuples in Julia do not provide array operations, they are essentially immutable. So even if you supply a fix for dataframes you are still essentially stuck with the same problem when you need type heterogenicity or recognition in another situation.


> Except that julia has simple tools to show you exactly that, from the REPL.

I guess this means that people that want to fully optimise code will have to learn assembly.

Mauro

unread,
Sep 9, 2016, 5:16:54 AM9/9/16
to juli...@googlegroups.com
On Fri, 2016-09-09 at 10:39, Data Pulverizer <data.pu...@gmail.com> wrote:
>> Except that julia has simple tools to show you exactly that, from the
> REPL.
>
> I guess this means that people that want to fully optimise code will have
> to learn assembly.

That is the same for any language though, no?

Chris Rackauckas

unread,
Sep 9, 2016, 5:31:34 AM9/9/16
to julia-dev
The Any type will be "performance problematic" in any language which has something like that. There's no way anyone can really optimize it, because by definition it can be, well, anything. For example, an array of Any could have a Float16 after a Float64, so you can't know where the ith value is since they could have different sizes (even if you could make this a contiguous array). However, it's a necessary feature for scientific computing because it allows you to make the tradeoff when necessary. 

For example, it may take days on a super computer to generate some of these values, but using Any you can gather values which are Ints, Floats, BigFloats, etc., plot them, print them, display them in a Notebook, all without changing the types. Is it not the most performant thing? Sure, but I don't care about the milliseconds vs microseconds when plotting, I care about the days vs weeks part from the computation. I won't use Any in the computation, I will use it for print/display where the performance is always good enough and I want things to just work. It's nice to have that option.

However, Julia gives you the tools to work as efficiently as possible on things like Array{Any}, that's type-specialization though Julia's main feature: multiple-dispatch. I detailed this the other day on StackOverflow. This is the type genericism on arrays that you're thinking about. In v0.5 you can do this without the loop by using the broadcasting syntax: if a is an Array{Any}, you write f which dispatches on the different types in a, and then f.(a) will specialize on the types it encounters. I should probably add that...

Tim Holy

unread,
Sep 9, 2016, 5:51:41 AM9/9/16
to juli...@googlegroups.com
On Friday, September 9, 2016 1:39:39 AM CDT Data Pulverizer wrote:
> It would seem from this post
> (http://www.johnmyleswhite.com/notebook/2015/11/28/why-julias-dataframes-are
> -still-slow/) that the Any type is problematic in Julia. Pherphs there needs
> to be room for breaking atomicity in a arrays or for the compiler to
> recognise heretogeneous array and label them as a some mixed typed array
> instead of type hiding with Any. Any type should be less about type hiding
> and more about type genericism. Tuples in Julia do not provide array
> operations, they are essentially immutable. So even if you supply a fix for
> dataframes you are still essentially stuck with the same problem when you
> need type heterogenicity or recognition in another situation.

"Hiding" and "genericism" are the same thing. CPUs have one set of
instructions for adding two Float64s, and another for adding two Int64s. Thus,
if you're iterating over a container where you *know* that every element is a
Float64, your loop can be compiled down to calls that just involve Float64; if
in contrast you have a mixed container, then your inner loop needs to compile
down to something that conceptually looks like this:

# x is the latest element you've pulled out of the container
if isa(x, Float64)
# CPU does the Float64 version
elseif isa(x, Int64)
# CPU does the Int64 version
end

This is true for any language, from assembly to the slowest interpreted
language you can come up with. Mostly it's a question of just how big a hit
you take from code like this (and it's fair to say that Julia could do more to
optimize how it handles this case).

> > Except that julia has simple tools to show you exactly that, from the
> > REPL.
>
> I guess this means that people that want to fully optimise code will have
> to learn assembly.

Out of curiosity, why do you think that? You should check out @code_warntype,
which does not involve assembly at all.

Moreover, why do you try to twist a good thing---the availability of
introspection tools---into a bad one?

Methinks you folks greatly underestimate just how good julia is at generating
optimized code, and over what a wide range of constructs it produces fully-
optimized code. It's fair to say that over time I've learned a few tips about
writing efficient Julia code, but for reference: I can often write a half day's
worth of code, without having to check the introspection tools once, and not
have a single type problem that matters for performance. And the performance
is awesome. :-) (As Jeff has said, it's really nice to be able to write type-
unstable code sometimes when it doesn't hurt performance and not get yelled at
by the compiler, and I take advantage of that freedom when I want/need to.)

Best,
--Tim

Simon Byrne

unread,
Sep 9, 2016, 6:01:11 AM9/9/16
to julia-dev
On Thursday, 8 September 2016 23:20:16 UTC+1, Lucian Radu Teodorescu wrote:
Let me give you an example:
The following Sparrow code sums the squares of all the odd numbers up to 100:
1..100 filter \isOdd map \sqr sum

I have no experience with sparrow, but before Julia I spent a bit of time playing around with Haskell, and I would disagree. Functional programming, while often elegant, often makes it too opaque to reason about in terms of performance. The canonical example here is QuickSort: it's incredibly easy to write in a functional language, but how do you know if the compiler will actually make it efficient by doing it in place as opposed to allocating a new array at each step?


Data Pulverizer

unread,
Sep 9, 2016, 7:26:45 AM9/9/16
to julia-dev
@Tim Holy

>> I guess this means that people that want to fully optimise code will have
>> to learn assembly.
> Moreover, why do you try to twist a good thing---the availability of
> introspection tools---into a bad one?

It was not my intention to twist the availability of introspection tools into a bad thing. I can only guess at the the hard work that has gone into creating Julia and its ecosystem. I definitely want to see Julia amongst other languages succeed. When people talk about code profiling in Julia I always assume that they mean assembly since outputting assembly instructions in Julia has been a headline feature of its profiling. Arch D. Robinson's presentation on high performance computing in Julia (https://www.youtube.com/watch?v=szE4txAD8mk) has now educated me on the listing of the available methods.

From the same presentation, I suspect that writing high performance code in Julia will look very much like like working with a static programming language. The presentation reveals that type discipline is important if you want performance. Speculatively speaking Julia may be converging to something like where Sparrow is.

Thanks for your comments on genericism and type hiding.


> Methinks you folks greatly underestimate just how good julia is at generating
> optimized code, and over what a wide range of constructs it produces fully-
> optimized code.

I think this may be likely, and I agree Julia has a lot to offer. More presentations like Arch's would be really great to educational the community. I can see that there are some other interesting videos in JuliaCon 2016 that I have yet to watch.

I also think the topics in this thread are very interesting: https://groups.google.com/forum/#!topic/julia-dev/hUv9z5NsWDc

Steven G. Johnson

unread,
Sep 9, 2016, 8:02:45 AM9/9/16
to julia-dev


On Friday, September 9, 2016 at 7:26:45 AM UTC-4, Data Pulverizer wrote:
 When people talk about code profiling in Julia I always assume that they mean assembly since outputting assembly instructions in Julia has been a headline feature of its profiling.
 
No, code profiling means identifying performance-critical code sections through timing measurements.

And one of the most important tools in identifying performance problems is @code_warntype, which is also not at the assembly level.

I suspect that writing high performance code in Julia will look very much like like working with a static programming language. The presentation reveals that type discipline is important if you want performance.

Type discipline is important (though it is not the only consideration — avoiding temporary array allocations in inner loops is also a common consideration), but it is mainly about writing code where type-inference can succeed.  It doesn't require static type declarations, or code that only works with a predetermined set of types.

(That being said, there are also static languages that involve type-inference-based polymorphism.  As Jeff commented elsewhere, the key tradeoff is whether a failure of type inference is a compile-time error or a runtime performance problem, i.e. whether you are *only* allowed to write "fast" strongly typed code.)

The main thing, of course, is that writing fast code (in any language) requires some understanding of how a computer and a compiler actually works.   If you don't understand why Matlab or Python code is slow, it is hard to write Julia code that is fast.

Yichao Yu

unread,
Sep 9, 2016, 8:04:27 AM9/9/16
to Julia Dev
On Fri, Sep 9, 2016 at 7:26 AM, Data Pulverizer <data.pu...@gmail.com> wrote:
@Tim Holy
>> I guess this means that people that want to fully optimise code will have
>> to learn assembly.
> Moreover, why do you try to twist a good thing---the availability of
> introspection tools---into a bad one?

It was not my intention to twist the availability of introspection tools into a bad thing. I can only guess at the the hard work that has gone into creating Julia and its ecosystem. I definitely want to see Julia amongst other languages succeed. When people talk about code profiling in Julia I always assume that they mean assembly since outputting assembly instructions in Julia has been a headline feature of its profiling. Arch D. Robinson's presentation on high performance computing in Julia (https://www.youtube.com/watch?v=szE4txAD8mk) has now educated me on the listing of the available methods.

As I already mentioned at other places, `@code_native` should not be the first option to predict the performance of a julia function. Most of the time the typed ast (`code_typed` and `code_warntype`) and the LLVM IR (`code_llvm`) gives enough information about the performance of the code and should be much easier to understand. On top of the llvm code, the assembly codd (`code_native`) does give some more info in rare cases but it's not generally needed.

Tim Holy

unread,
Sep 9, 2016, 8:23:25 AM9/9/16
to juli...@googlegroups.com
Great, thanks for the clarifications.

Best wishes,
--Tim

Erik Schnetter

unread,
Sep 9, 2016, 9:45:02 AM9/9/16
to juli...@googlegroups.com
On Fri, Sep 9, 2016 at 5:16 AM, Mauro <maur...@runbox.com> wrote:
On Fri, 2016-09-09 at 10:39, Data Pulverizer <data.pu...@gmail.com> wrote:
>> Except that julia has simple tools to show you exactly that, from the
> REPL.
>
> I guess this means that people that want to fully optimise code will have
> to learn assembly.

That is the same for any language though, no?

These days, writing assembler code isn't good enough to get good performance. The CPU performs a large number of code transformations, in manners quite opaque to the assembler programmer.

To get fast code, you need to know about caches, SIMD vectorization, and multi-threading. And you need to be able to easily restructure large parts of the code, including how data structures are defined and how they are traversed. Assembler is the opposite of that, because that makes it very difficult to restructure your code. Instead, you need a high-level language that can manipulate code. I'm using Julia for its macros, because that allows me to write efficient code.

-erik

--

Lucian Radu Teodorescu

unread,
Sep 10, 2016, 7:36:59 AM9/10/16
to juli...@googlegroups.com
I agree with you that Haskell is not the most friendliest programming language. Yes, you can write very concise code, but that is not the actual problem. In my view, naturalness, is not that about the number of characters you type, but rather how simple is for a programmer to write, read and reason about a piece of code. And here is the point where Haskell fails IMHO. Yes, it’s true that long code tends to be harder to write/read/reason about, but let’s not turn the naturalness goal into a less typing goal.

QuickSort is just one of many examples you can find to prove that Haskell is terrible at efficiency too.

In my view, a great language needs to integrate this “naturalness” with (at least partial) the efficiency goal. For both of these goals, understanding what the code does is very important. Coming back to my example, as soon as you learn how operators work (basically you read things from left to right), you understand to what this is equivalent, then basically you can achieve both goals: naturalness + efficiency.
Reply all
Reply to author
Forward
0 new messages