Toys! New Toys!
On a quick look--very nice feature set and presentation
(website/docs). Two questions--
- who are "you", e.g. the team that developed this? what is the
history of it? it seems pretty far along already
- under what license is this distributed?
Thanks
Patrick
One thing I really wish Java would solve is the keyword problem - for
example, if a Fan class declared a method called "import", then it
wouldn't be usable from Java. C# solves this with the @ symbol.
Why?
My take as a JVM engineer (which is a limited but interesting
perspective) is that any of the good JVMs provides C-level
performance for many interesting Java codes, while the CLR provides
early-Java-level performance. The JVMs have been competing with each
other on performance for a decade, and it shows.
Performance isn't everything, but it often turns out to be important.
More thoughts here: http://blogs.sun.com/jrose/entry/bravo_for_the_dynamic_runtime
I'd love to see a Boo-like thing for the JVM someday. I enjoy
languages which cleverly integrate a small number of high-leverage
features, rather than juxtapose a bunch of shallow hacks.
-- John
Thanks for the thoughtful response. I can definitely see your point.
> How do you handle line numbers in Boo? Are we just missing something,
> or do you really have to generate a pdb file to disk?
>
Yes, pdb/mdb files are necessary. The System.Reflection.Emit API takes
care of everything though so it's just a matter of calling
ILGenerator.MarkSequencePoint at the appropriate times.
Cecil is also a great way of reading and writing .net assemblies and
can automatically handle debugging info generation as well.
But again I see your point. Java line table attributes provide a
simpler solution indeed.
Best wishes,
Rodrigo
That day is coming. :)
Thanks, John.
Rodrigo
Brian
Portability between the two platforms was never a high priority for
Scala. I'd say the main goal was to bring together object-oriented
programming and functional programming in a modern, type-safe
language. Given that most people agree now that the functional and
object-oriented paradigms should go together, Scala was quite
successful in its first goal.
I would say there is no way to write really portable Scala code.
Sooner or later, one would need to use platform libraries, even for
things as basic as I/O. The language designer favored 'seamless'
interoperability with the platform to give access to tons of existing
libraries. Of course, there could/should be Scala abstractions on top
of that, but we don't have them. It's not that it's impossible, or
even very hard, but it didn't happen. And while we're at it, how do
you deal with platform-dependent code? Say a class on top of File,
which uses either java.io.File or the .NET equivalent behind the
scenes? Conditional compilation?
> I think any new language on the JVM will probably be multi-paradigm -
> both OO and functional. In fact even Java and C# seem to be moving
> that way. So I don't think that will end up being a distinguishing
> feature. Rather key differences will be in features like the type
> system and libraries. I'm a framework guy, so over the next few years
> you'll see most of my effort in the libraries versus the language
> syntax (which we hope to keep relatively simple and stable). For
> example, my experience has been that maintaining an indexed database
> of installed types effects how you design libraries (such as a webapp
> framework) even more than language syntax does. So I think the
> divergence really happens in the upper layers of the stack.
The part that I find really interesting in Fan is the approach to
concurrency. Can you explain a bit more about how it is implemented?
Is it mostly in the libraries, or it has some compiler support (for
message passing, for instance). Is there a way to pattern match on
messages, like in Erlang?
Thanks,
Iulian
--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais
Running the SciMark benchmark on my 32-bit WinXP Athlon64 X2 4400+ 2Gb RAM
machine:
Sun JDK 6: 385
.NET 3.5: 367
Here .NET is 5% slower than the JVM.
Running my ray tracer benchmark in Java vs F#:
Sun JDK 6: 4.930s
.NET 3.5: 4.690s
Here, .NET is 5% faster than the JVM.
Indeed, I have never been able to reproduce any benchmark results that
substantiate your claim that "the CLR provides early-Java-level performance".
> I'd love to see a Boo-like thing for the JVM someday. I enjoy
> languages which cleverly integrate a small number of high-leverage
> features, rather than juxtapose a bunch of shallow hacks.
Until the JVM is brought up to date with respect to basic functionality like
tail calls, I'm afraid you won't be seeing any production-quality innovation
along the lines of F#.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
I hadn't actually noticed that the .NET port of SciMark was written by a Java
programmer who had crippled it by inserting unnecessary locks in the code.
Removing these locks for a fairer comparison, I get:
Sun JDK 6: 385
.NET 3.5: 396
So .NET is not slower at all.
As a historical note, my understanding was that it was a Microsoft JVM
that introduced JIT and associated performance improvements in a JVM
platform. I think it's safe to say that there are some smart guys at
Redmond working on those sorts of technologies.
Cheers,
James
> That allowed keywords to be escaped by surrounding them with an
> underscore, e.g. the Java keyword class could be escaped via _class_.
And _class_ would be escaped as __class__, and so on?
--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures
I quoted the combined figures for all benchmarks. The individual figures are:
Java:
FFT 326
Jacobi 499
Monte C 71.8
Sparse 446
LU 579
C# .NET:
FFT 325
Jacobi 505
Monte C 96.5
Sparse 415
LU 629
As you can see, the Monte Carlo benchmark is several times faster (was 27.0)
without the unnecessary lock and the performance is basically identical
between Java and C#.
> Note the authors of
> this paper used an identical, non-synchronised random number generator
> for all languages, therefore your comments about syncronization are
> addressed by their approach.
They benchmarked an extremely old version of .NET that predated generics.
> I think it's safe to say that there are some smart guys at
> Redmond working on those sorts of technologies.
My benchmark results disproved your belief.
> Case in point: Some of the CLR customers at Lang.NET were asking for
> vectorized loops. Nobody could help them, because nobody was working
> on loops in the JIT. Meanwhile, HotSpot recently improved its
> benchmark scores in part by vectorizing some common loops.
Then why has HotSpot's performance not improved?
> According to people who use it in CLR, tailcall is uncomfortably
> slow.
That has not been true for some time now.
> Serrano's CLR version of BigLoo turns tailcalls off by default
> as a result. Looks like a neglected stepchild feature to me.
If it were neglected, they would not have drastically improved its performance
in the latest .NET release.
They also improved the efficiency of structs which, apparently, the JVM
doesn't even have.
> As far as tailcall on the JVM goes, I know at least one researcher
> who is working on it; I wish we had it yesterday.... See my blog for
> how it will probably work.
Assuming the JVM does eventually get tail calls, how many years will it be
before their performance catches up with the CLR?
>> Actually, since I was in Redmond last January for three days talking
>> personally to those guys at Lang.NET, I feel safe to say they have
>> shelved the JIT for several years, and their optimizations have not
>> kept pace with those in the JVM. (See my blog entry previously
>> mentioned.)
>
> My benchmark results disproved your belief.
for what it's worth, in PyPy we discovered that hotspot produces much
better code than the CLR when the bytecode doesn't follow the standard
pattern produced by the java/c# compilers.
In particular, we heavily use exceptions to model control flow in our
RPython program (e.g., every "for" loop needs to catch StopIteration),
but the CLR JIT is not able to optimize such a case, and thus the first
versions of the CLI backend produced very slow code; to get
reasonable performance, we rely on our own inliner/malloc
removal/exception inliner, which gave a speedup of something like 30x, IIRC.
On the other hand, hotspot produces much better code[1], and moreover we
get faster code if we *disable* our own optimizations, since if we use
them it results in more code to analyze because of the inlining.
[1] http://blogs.sun.com/jrose/entry/longjumps_considered_inexpensive
ciao,
Anto
You generated code that turned out to be less efficient on the CLR in this
particular case but you cannot validly generalize that to all "non-standard
code".
Indeed, we know that is wrong because tail calls have the exact opposite
performance characteristics: you have to work around their complete
absence (not just inefficiency) on the JVM.
> In particular, we heavily use exceptions to model control flow in our
> RPython program (e.g., every "for" loop needs to catch StopIteration),
> but the CLR JIT it not able to optimize such a case, and thus the first
> versions of the CLI backend produced very slow code; to have
> reasonable performances, we rely on our own inliner/malloc
> removal/exception inliner, which gave a speedup of something like 30x,
> IIRC.
Why didn't you use tail calls instead?
> On the other hand, hotspot produces much better code[1], and moreover we
> get faster code if we *disable* our own optimizations, since if we use
> them it results in more code to analyze because of the inlining.
>
> [1] http://blogs.sun.com/jrose/entry/longjumps_considered_inexpensive
Sure but it looks as though you are unnecessarily applying a workaround for
the absence of tail calls on the JVM to the CLR when you could have just used
tail calls on the CLR. Moreover, they are easier to use and much faster than
anything equivalent on the JVM.
> You generated code that turned out to be less efficient on the CLR in this
> particular case but you cannot validly generalize that to all "non-standard
> code".
right, I can't generalize to all non-standard code, but it's surely true
for the kind of non-standard code PyPy generates :-).
The exception inlining was only an example; there are other areas where
the CLR JIT was worse, like code that makes heavy use of temp
variables instead of leaving the values on the stack.
[cut]
> Why didn't you use tail calls instead?
I honestly don't see how tail calls could help here; could you show me
an example please?
ciao,
Anto
Sure. Generating exceptions when not absolutely necessary will be a very bad
idea on the CLR, but it will also be a bad idea on the JVM because its
exception handling is slow.
> The exception inlining was only an example, there are other areas where
> the CLR jit was worse, like code that makes an heavy use of temp
> variables instead of leaving the values on the stack.
That's interesting.
> > Why didn't you use tail calls instead?
>
> I honestly don't see how tail calls could help here; could you show me
> an example please?
Sure. Consider the loop:
void run() {
    for (int i=0; i<3; ++i)
        if (foo(i) == 0) break;
    bar();
    baz();
}
Sounds like you were translating that into something like (F# code):
exception StopIteration

let run() =
    try
        for i=0 to 2 do
            if foo i=0 then raise StopIteration
    with StopIteration ->
        ()
    bar()
    baz()
But you could have translated it into:
let rec run_1 i =
    if foo i=0 then run_2() else
    if i<3 then run_1 (i + 1) else run_2()
and run_2() =
    bar()
    baz()

let run() =
    run_1 0
Where both calls to the continuation "run_2" inside the body of the "run_1"
function are tail calls.
Tail calls have lots of advantages here. The JIT is likely to generate a
simple branch, but it may well spot that the code blocks can be rearranged to
avoid even the branch! For example, it might rewrite the code into:
let rec run_1 i =
    if foo i<>0 && i<3 then run_1 (i + 1) else
    bar()
    baz()
You can pass as many values as you like as arguments to a continuation,
and they are highly likely to be kept in registers wherever your control
flow takes you (the formerly exceptional and non-exceptional routes are
now symmetric), for the best possible performance. This facilitates lots
of subsequent optimizations by the JIT.
Doing a quick benchmark on this code, I find that 10^6 iterations using your
exception-based technique gives:
CLR: 24s
JVM: 1.3s
Holy smokes, the JVM is 18x faster!
Now try the tail calls (only available on the CLR):
CLR: 0.025s
Holy smokes, the CLR is 52x faster!
Optimizing exception handling in the JVM before implementing tail calls was
premature optimization, IMHO.
Could you please provide the source code for this performance comparison?
Richard Warburton
> Sure. Generating exceptions unless absolutely necessary will be a very bad
> idea on the CLR but it will also be a bad idea on the JVM because its
> exception handling is slow.
not always; it's entirely possible that I recall wrongly, but I remember
that we had no penalty in using exceptions vs. a plain for loop.
> Sounds like you were translating that into something like (F# code):
>
> exception StopIteration
>
> let run() =
> try
> for i=0 to 2 do
> if foo i=0 then raise StopIteration
> with StopIteration ->
> ()
> bar()
> baz()
yes, more or less
> But you could have translated it into:
>
> let rec run_1 i =
> if foo i=0 then run_2() else
> if i<3 then run_1 (i + 1) else run_2()
> and run_2() =
> bar()
> baz()
>
> let run() =
> run_1 0
>
> Where both calls to the continuation "run_2" inside the body of the "run_1"
> function are tail calls.
well, but doing such a translation is not straightforward; it's doable,
but you need to write it, exactly as we wrote our own exception inliner
that compiles the original code into a plain for loop.
Honestly, I doubt that a tail call can be faster/much faster than a
plain for loop, but we should do some benchmarks, of course.
> Doing a quick benchmark on this code, I find that 10^6 iterations using your
> exception-based technique gives:
>
> CLR: 24s
> JVM: 1.3s
>
> Holy smokes, the JVM is 18x faster!
>
> Now try the tail calls (only available on the CLR):
>
> CLR: 0.025s
>
> Holy smokes, the CLR is 52x faster!
how does this compare with the first version of the loop you wrote?
void run() {
    for (int i=0; i<3; ++i)
        if (foo(i) == 0) break;
    bar();
    baz();
}
ciao,
Anto
Sure. The Java:
public class test
{
    int foo(int n)
    {
        return n - 1;
    }
    void bar()
    {
    }
    void baz()
    {
    }
    void run()
    {
        Exception e = new Exception("");
        try
        {
            for (int i=0; i<3; ++i)
            {
                if (foo(i) == 0) throw e;
            }
        }
        catch (Exception e2)
        {
        }
        bar();
        baz();
    }
    public static void main(String[] args)
    {
        for (int n=0; n<10; ++n)
        {
            long start = System.currentTimeMillis();
            for (int i=0; i<1000000; ++i)
                (new test()).run();
            System.out.println(System.currentTimeMillis() - start);
        }
    }
}
The F# (for both techniques):
#light

let foo n = n-1
let bar() = ()
let baz() = ()

exception StopIteration

let run1() =
    try
        for n=0 to 2 do
            if foo n=0 then raise StopIteration
    with StopIteration ->
        ()
    bar()
    baz()

let run2() =
    let rec run_1 n =
        if foo n=0 then run_2() else
        if n<3 then run_1(n+1) else run_2()
    and run_2() =
        bar()
        baz()
    run_1 0

do
    let t = new System.Diagnostics.Stopwatch()
    t.Start()
    for i=1 to 1000000 do
        run1()
    printf "Exceptions: %dms\n" t.ElapsedMilliseconds
    t.Reset()
    t.Start()
    for i=1 to 1000000 do
        run2()
    printf "Tail calls: %dms\n" t.ElapsedMilliseconds
    stdin.ReadLine()
public class Test
{
    int foo(int n)
    {
        return n - 1;
    }
    void bar()
    {
    }
    void baz()
    {
    }
    final Exception e = new Exception("");
    void run()
    {
        try
        {
            for (int i=0; i<3; ++i)
            {
                if (foo(i) == 0) throw e;
            }
        }
        catch (Exception e2)
        {
        }
        bar();
        baz();
    }
    public static void main(String[] args)
    {
        Test t = new Test();
        for (int n=0; n<10; ++n)
        {
            long start = System.currentTimeMillis();
            for (int i=0; i<1000000; ++i)
                //(new Test()).run();
                t.run();
            System.out.println(System.currentTimeMillis() - start);
        }
    }
}
--
Venlig hilsen / Kind regards,
Christian Vest Hansen.
See http://blogs.sun.com/jrose/entry/longjumps_considered_inexpensive
for some notes on exceptions and performance. Try this change:
public class test_reuse
{
    public static final Exception EXCEPTION = new Exception("") {
        public Throwable fillInStackTrace() { return null; }
    };
    int foo(int n)
    {
        return n - 1;
    }
    void bar()
    {
    }
    void baz()
    {
    }
    void run()
    {
        //Exception e = new Exception("");
        try
        {
            for (int i=0; i<3; ++i)
            {
                if (foo(i) == 0) throw EXCEPTION;
            }
        }
        catch (Exception e2)
        {
        }
        bar();
        baz();
    }
    public static void main(String[] args)
    {
        for (int n=0; n<10; ++n)
        {
            long start = System.currentTimeMillis();
            for (int i=0; i<1000000; ++i)
                (new test_reuse()).run();
            System.out.println(System.currentTimeMillis() - start);
        }
    }
}
also, baz() and bar() are noise in this benchmark, as they should be
dropped by the JVM as no-ops.
patrick@patrick-wrights-computer:~/tmp$
>java -cp . test
1572
1575
1570
1572
1576
1572
1603
1614
1575
1576
patrick@patrick-wrights-computer:~/tmp$
>java -cp . -server test
1480
1457
1468
1461
1470
1456
1459
1464
1455
1456
patrick@patrick-wrights-computer:~/tmp$
>java -cp . test_reuse
133
127
129
129
132
129
130
128
128
127
patrick@patrick-wrights-computer:~/tmp$
>java -cp . -server test_reuse
25
12
10
9
10
10
8
9
9
9
patrick@patrick-wrights-computer:~/tmp$
>java -version
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)
IMO, this discussion is becoming a bit of a food fight, which is
unnecessary and not really useful. You have points to make, Jon, but
everyone throwing benchmarks at each other usually just wastes time.
There are better forums to discuss these things.
Cheers!
Patrick
Whilst I don't disagree with your overall comment that the JVM should
implement tail call elimination, I'm not entirely sure why people are
so interested in using exceptions to implement their specific control
flow semantics of choice anyway. For example, replacing the contents
of the run method with:
for (int i=0; i<3; ++i)
{
    if (foo(i) == 0) break;
}
bar();
baz();
yielded performance improvements on the order of 100-200x over the
exception-based control flow approach. I would expect that writing
a similar program in C# would produce similar results. Since it was
only running for < 20 milliseconds, it's hard to gauge a truly
accurate time. I can see why an interprocedural control flow sequence
would map nicely onto exceptions - but if you are moving around within
a method then surely it would be preferable to stick to goto-based
control flow. This would assume that you are using bytecode as your
preferred method of output, rather than Java source, but I think that's
a reasonable assumption anyway.
Richard Warburton
Identical performance to using "break" here.
> Sure. The Java:
>
[cut]
> void run()
> {
> Exception e = new Exception("");
> try
> {
> for (int i=0; i<3; ++i)
> {
> if (foo(i) == 0) throw e;
> }
> }
> catch (Exception e2)
> {
> }
> bar();
> baz();
> }
>
> public static void main(String[] args)
> {
> for (int n=0; n<10; ++n)
> {
> long start = System.currentTimeMillis();
> for (int i=0; i<1000000; ++i)
> (new test()).run();
> System.out.println(System.currentTimeMillis() - start);
> }
> }
> }
this is not a good benchmark, for two reasons:
1) you are allocating a new object on every loop iteration, but we are
benchmarking the loops, not the garbage collector :-); you should use
static methods instead, IMHO;
2) you are allocating a new exception every time; the optimization
described here [1] works only if the exception is pre-allocated.
[1] http://blogs.sun.com/jrose/entry/longjumps_considered_inexpensive
Here is my modified benchmark which tries to address these issues:
public class loop2
{
    public static final Exception exc = new Exception("");
    static int foo(int n)
    {
        return n - 1;
    }
    static void bar()
    {
    }
    static void baz()
    {
    }
    static void run()
    {
        try
        {
            for (int i=0; i<3; ++i)
            {
                if (foo(i) == 0) throw exc;
            }
        }
        catch (Exception e2)
        {
        }
        bar();
        baz();
    }
    public static void main(String[] args)
    {
        for (int n=0; n<10; ++n)
        {
            long start = System.currentTimeMillis();
            for (int i=0; i<1000000; ++i)
                run();
            System.out.println(System.currentTimeMillis() - start);
        }
    }
}
And here are the results:
antocuni@viper tmp $ java loop1
1923
2032
2032
2052
2031
2078
2058
2035
2067
2063
antocuni@viper tmp $ java loop2
9
3
1
1
1
1
1
1
1
1
Trying to interpret the numbers, I think that after the first iteration
hotspot decided to JIT-compile the loop, and since it can inline the
exception it ends up with a completely empty loop which is thrown away.
ciao,
Anto
Actually I tried hoisting the allocation of "test" and it makes the code
consistently slower. I have no idea why.
> 2) you are allocating a new exception every time; the optimization
> described here [1] works only if the exception is pre-allocated.
> [1] http://blogs.sun.com/jrose/entry/longjumps_considered_inexpensive
I think that is not thread safe. Specifically, when the branch conveys
information (passed as arguments using a tail call, or embedded in the
exception) then you must use a locally allocated exception, right?
Yes. However, tail calls are not restricted to the body of a single method.
> > CLR: 0.025s
>
> how does this compare with the first version of the loop you wrote?
>
> void run() {
> for (int i=0; i<3; ++i)
> if (foo(i) == 0) break;
> bar();
> baz();
> }
25ms with tail calls drops to 7ms with "break" on the CLR, 10ms for break on
the JVM.
From TFA:
"A similar technique, not so widely used yet, is to clone a
pre-allocated exception and throw the clone. This can be handy if
there is information (such as a return value) which differs from use
to use; the variable information can be attached to the exception by
subclassing and adding a field. The generated code can still collapse
to a simple goto, and the extra information will stay completely in
registers, assuming complete escape analysis of the exception. (This
level of EA is on the horizon.)"
> > 2) you are allocating a new exception every time; the optimization
> > described here [1] works only if the exception is pre-allocated.
> > [1] http://blogs.sun.com/jrose/entry/longjumps_considered_inexpensive
>
> I think that is not thread safe. Specifically, when the branch conveys
> information (passed as arguments using a tail call, or embedded in the
> exception) then you must use a locally allocated exception, right?
Yes, you must. However, what makes allocating an exception expensive
is the fillInStackTrace method, which has to walk the JVM stack. If you
override that in your exception class with a do-nothing method, then
locally allocating exceptions is very cheap.
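To make that concrete, here is a minimal Java sketch (class and method names are my own, not from the thread) of a flow-control exception whose fillInStackTrace is overridden to a no-op, so that even a freshly allocated instance carrying a payload is cheap to throw:

```java
public class CheapJump {
    // Hypothetical flow-control exception: overriding fillInStackTrace to a
    // no-op skips the expensive stack walk at construction time, so a fresh,
    // thread-safe instance per use stays cheap.
    static class BreakException extends RuntimeException {
        final int value;  // payload carried across the non-local jump

        BreakException(int value) { this.value = value; }

        @Override
        public Throwable fillInStackTrace() { return this; }  // do nothing
    }

    // Uses the exception as a "break with a value": returns the index of the
    // first zero in the array, or -1 if none is found.
    static int findFirstZero(int[] xs) {
        try {
            for (int i = 0; i < xs.length; i++)
                if (xs[i] == 0) throw new BreakException(i);  // cheap to allocate
        } catch (BreakException e) {
            return e.value;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findFirstZero(new int[]{5, 3, 0, 7}));  // prints 2
    }
}
```

Because each use allocates its own instance, the payload is thread-safe without any thread-local machinery; the stack walk, not the allocation, is the dominant cost being avoided.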
On Sat, Apr 19, 2008 at 6:53 PM, John Rose <John...@sun.com> wrote:
>
> On Apr 19, 2008, at 3:16 PM, Rodrigo B. de Oliveira wrote:
>
> > On Sat, Apr 19, 2008 at 5:30 PM, Brian Frank
> > <brian...@gmail.com> wrote:
> >> ... I personally
> >> think the JVM is a much better platform for alternate languages
> >> than .NET.
> >
> > Why?
>
> My take as a JVM engineer (which is a limited but interesting
> perspective) is that any of the good JVMs provides C-level
> performance for many interesting Java codes, while the CLR provides
> early-Java-level performance. The JVMs have been competing with each
> other on performance for a decade, and it shows.
>
> Performance isn't everything, but it often turns out to be important.
>
> More thoughts here: http://blogs.sun.com/jrose/entry/bravo_for_the_dynamic_runtime
>
> I'd love to see a Boo-like thing for the JVM someday. I enjoy
> languages which cleverly integrate a small number of high-leverage
> features, rather than juxtapose a bunch of shallow hacks.
>
> -- John
Actually that is exactly a case handled by tail calls: the method is
parameterized over the continuations that it will call. This is very common
in functional programming and is called continuation passing style (CPS).
Some functional compilers (e.g. SML/NJ) automatically do this to all code.
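As a rough Java illustration of the idea (names are hypothetical, and note the JVM will not actually eliminate these tail calls, which is the thread's whole complaint): the method takes its success and failure continuations as explicit parameters, so "raising" is just a call to one of them.

```java
import java.util.function.IntConsumer;

public class CpsDemo {
    // CPS version of "loop, then break on a condition": instead of throwing
    // an exception, the loop invokes one of two explicit continuations.
    // With tail-call elimination each marked call compiles to a jump; on
    // today's JVM they remain ordinary, stack-consuming calls.
    static void find(int[] xs, int i, IntConsumer found, Runnable notFound) {
        if (i >= xs.length) { notFound.run(); return; }  // tail call: failure path
        if (xs[i] == 0)     { found.accept(i); return; } // tail call: success path
        find(xs, i + 1, found, notFound);                // tail call: next iteration
    }

    public static void main(String[] args) {
        find(new int[]{5, 3, 0, 7}, 0,
             i  -> System.out.println("found at " + i),
             () -> System.out.println("not found"));
    }
}
```

The exceptional and non-exceptional exits are symmetric here: both are just calls, which is what lets a tail-call-optimizing JIT keep the arguments in registers on either route.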
So it works if you do a CPS transformation on all your code leaving
your frames on the heap. In that case you can tail call a continuation
to simulate the exception. I am interested in this approach. I like
the flexibility that CPS style gives (perhaps different exception
models to the norm). However, wouldn't this approach have other
performance consequences (mainly the heap-based frames)?
Sorry to follow up my own post with more thoughts.
What I'm getting at is: if CPS transformation and tail calls were so
performant for exceptions, then why bake exceptions into the CIL and
into the JVM bytecodes? I really appreciate Scheme because it provides
the primitives to implement high-level constructs like exceptions (and
coroutines, backtracking), but I figured that because the "big guys" in
runtime systems baked in particular exception systems, the CPS approach
wasn't considered fast enough for this common case.
--
On my machine, this benchmark is only 17% faster in Java for the small
problems and 0.04% faster for the large problems. That is well within the
variation between individual tests, the largest of which is the CLR being
2.44x faster on the polynomial multiplication test.
JRuby uses this technique since we frequently have flow-control
exceptions that contain different state. It's fast...very very fast. The
stack trace is basically *all* the cost, but John Rose's version also
eliminates the object allocation cost. For some of our exceptions we do
have a single instance.
- Charlie
-Tom
--
Blog: http://www.bloglines.com/blog/ThomasEEnebo
Email: en...@acm.org , tom....@gmail.com
It's worth mentioning that in order to implement non-local flow control,
IronRuby has to use exceptions just like JRuby. And any benchmarks
involving exceptions or non-local flow control are far slower on
IronRuby than on even the C version of Ruby. JRuby is consistently a lot
faster on all such benchmarks, largely because the cost of exceptions is
so low on the JVM.
- Charlie
I use exceptions like this too. I have an exception instance per
thread held in thread-local storage. I currently leave the
fillInStackTrace method alone so that the stack trace points to the
creating location, which has a comment saying that if you get here via
a printed stack trace you have found a bug (as I should have caught it
before you see it).
As I only create one per thread it's not a significant performance hit.
John Wilson
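A minimal sketch of that scheme (my own naming, not John's actual code): one pre-allocated exception per thread, with a mutable payload slot that is safe precisely because only the owning thread ever sees its instance. The stack is walked once, at construction, so throws are cheap and any printed trace points at the creation site.

```java
public class PerThreadJump {
    // Reusable flow-control exception. fillInStackTrace is deliberately left
    // alone: the trace is captured once, at creation, and points here.
    static class JumpException extends RuntimeException {
        int payload;  // mutable slot; safe because the instance is per-thread
        JumpException() {
            super("flow-control exception; a printed trace here indicates a bug");
        }
    }

    // One instance per thread, created lazily on first use.
    static final ThreadLocal<JumpException> JUMP =
            ThreadLocal.withInitial(JumpException::new);

    static int indexOfZero(int[] xs) {
        try {
            for (int i = 0; i < xs.length; i++) {
                if (xs[i] == 0) {
                    JumpException e = JUMP.get();  // no allocation, no stack walk
                    e.payload = i;
                    throw e;
                }
            }
        } catch (JumpException e) {
            return e.payload;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfZero(new int[]{9, 0, 4}));  // prints 1
    }
}
```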
For C++ and Java, I downloaded the source, compiled it with "make" and ran it.
For C#, I copied the source into a VS project, set it to "Release" mode,
built and ran it.
> I did not see version of arguments to JVM (or .Net runtime).
Neither did I. :-)
> You are using -server I assume...
Yes. That is the default here.
> 2008/4/24 Steven Shaw <ste...@gmail.com>:
> What I'm getting at is if CPS transformation and tailcalls were so
> performant for exceptions then why bake exceptions into the CIL and
> into the JVM bytecodes?
For CLR/CIL: exceptions (SEH) are an OS-level primitive on Windows, and
exception control flow can pass through other languages, including
native code (C++, C, Delphi etc.). If your CIL code is being called via
a callback from native code, you may want to be able to throw an
exception and catch it on the other side. Not highly recommended, of
course.
-- Barry
Yes.
> > In that case you can tail call a continuation
> > to simulate the exception.
Exactly.
> > I am interested in this approach.
You may like the book "Compiling with continuations" by Appel.
> > I like
> > the flexibility that CPS style gives (perhaps different exception
> > models to the norm). However, wouldn't this approach have other
> > performance consequences (mainly the heap-based frames)?
Yes. Current CPS implementations are typically ~50% slower for ordinary code
but there are lots of non-trivial trade-offs involved and lots of interesting
optimization potential elsewhere.
One important benefit is that moving stack frames onto the heap eliminates
stack overflows, simplifies the run-time (no stack to crawl!) and can improve
incrementality because the stack is often crawled atomically. You also get
callcc for free, which can be extremely useful in some circumstances.
On the other hand, the stack is often used for implicit thread-local storage
with concurrent GCs as an optimization and pushing everything onto the heap
burdens the GC. I believe CPS also complicates FFI.
> What I'm getting at is if CPS transformation and tailcalls were so
> performant for exceptions then why bake exceptions into the CIL and
> into the JVM bytecodes?
I believe there are two main reasons:
. Debugging: exceptions provide a lot of trace information. Without a stack,
you don't even get a stack trace with CPS.
. Business: industry values the old far more than it values the new, and it
wants to see minimal overhead added to old techniques. This was seen before in
C++: "you don't pay for what you don't use". The JVM and the CLR were
designed for business and largely adopted this mentality as a consequence.
Don't forget that, when they were introduced, many users feared garbage
collection let alone tail calls!
Microsoft have put considerable effort into features not found on the JVM,
like efficient tail calls, not only supporting them but even continuing to
aggressively optimize them.
> I really appreciate Scheme because it provides
> the primitives to implement high-level constructs like exceptions (and
> coroutines, backtracking) but I figured because the "big guys" in
> runtime systems baked in particular exception systems then it wasn't
> considered fast enough for this common case.
I believe the designs of the JVM and (to a lesser extent) the CLR were much
more backward looking than forward thinking because that is essential for
commercial success. Had the same effort been put into the best theoretical
design then I'm sure we could have had something much more productive (but
totally incompatible).
> > > So it works if you do a CPS transformation on all your code leaving
> > > your frames on the heap.
>
> Yes.
You don't have to use the heap. You can do what Chicken Scheme does:
CPS convert everything, but leave the calls as ordinary calls with the
call frames on the stack; then when the stack gets too big, long-jump
(fire an exception) to reset it and carry on. The calls never return,
so this is safe.
Chicken goes further: it allocates all objects on the stack as well,
and then when the stack is reset all live objects are copied to the
heap. This makes the stack function as the nursery generation of a
multigenerational heap.
See http://home.pipeline.com/~hbaker1/CheneyMTA.html for a brief explication.
Yes. And I have run them on more than one machine and obtained the same
results.
I also found a bug in the SciGMark code, specifically the C++ was printing the
wrong score for the MultPoly test.
> The claim that Java is faster than C# seems quite reasonable given this set
> of results
You are cherry picking one set of results from one machine that are dominated
by one test (LU). That is bad science.
> - why are yours different than other peoples?
I've Googled for SciMark benchmark results on similar hardware and everything
indicates that my results are perfectly representative.
You never described your machine and operating system precisely. Are you
comparing the JVM running on 64-bit Mac OS X with the CLR running in emulated
32-bit Windows?
> All, Java, C#, & C, tests are on Windows XP running under Parallels on
> a Mac Book Pro., 2.33 GHz Intel Core 2 Duo, 2 GB 667 MHz DDR2 SDRAM.
Benchmarking under a virtualized OS? Kirk just wrote recently about
that: <http://kirk.blog-city.com/can_i_bench_with_virtualization.htm>
Attila.
A Java-to-Java comparison (VMWare/Windows to Mac OS X) would be
interesting for VMWare aficionados.
Scimark is a good benchmark for basic CPU/FPU use. It is sensitive
to loop optimizations and array usage patterns, as well as to stray
oddities like how your random number generator is designed. The JVM
does well on loop opts., and there is always more to do (current
bleeding edge is SIMD).
A couple of scimark benchmarks use 2-D arrays (not surprising!) and
the JVM is a little weak there because of the lack of true 2-D
arrays. We have long known how to fix this under the covers, but as
we soberly prioritize our opportunities, we've chosen to work on
other things. An excellent outcome of the OpenJDK is that the
community can now vote with code about which optimizations are most
important.
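The "under the covers" fix John alludes to is essentially what numeric Java programmers already do by hand: replace a `double[][]` (an array of separately allocated row objects) with one flat `double[]` plus explicit index arithmetic. A sketch of the two layouts, with illustrative names of my own (not taken from scimark):

```java
public class Arrays2D {
    // Java's "2-D" array is an array of independent row objects:
    // each a[i][j] access loads a row pointer, bounds-checks it,
    // then indexes into that row.
    static double sumNested(double[][] a) {
        double s = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                s += a[i][j];
        return s;
    }

    // Flattened layout: one contiguous block with row-major indexing.
    // This is roughly what a true 2-D array would hand the JIT directly,
    // with no row-pointer chasing.
    static double sumFlat(double[] a, int rows, int cols) {
        double s = 0;
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                s += a[i * cols + j];
        return s;
    }

    public static void main(String[] args) {
        int rows = 3, cols = 4;
        double[][] nested = new double[rows][cols];
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++) {
                nested[i][j] = i * cols + j;
                flat[i * cols + j] = i * cols + j;
            }
        System.out.println(sumNested(nested) == sumFlat(flat, rows, cols)); // true
    }
}
```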
At best, this sort of small benchmark will reach C++ levels of
performance on the JVM. (At least until we do really aggressive task
decomposition and use our virtualization freedom to lie about data
structure layouts. But at present the state of the art is to require
heavy input from the programmer for such things.)
At the risk of prolonging the benchmark battle, I have to admit that
scimark is not the sort of app. I had in mind when I was bragging
about the JVM earlier on this thread. (Sorry Fan guys. Major thread
hijack here. Your stuff looks cool, esp. the library agnostic part.)
The JVM's most sophisticated tricks (as enumerated elsewhere) have to
do with optimistic profile-driven optimizations, with deoptimization
backoffs. These show up when the system is large and decoupled
enough that code from point A is reused at point B in a type context
that is unknown at A.
At that point, the JVM (following Smalltalk and Self traditions) can
fill in missing information accumulated during warm-up, which can
drive optimization of point A in terms of specific use cases at point B.
All of this works best when the optimizations are allowed to fail
when the use cases at B change (due to app. phase changes, e.g.) or
when points C and D show up and cause the compilation of A's use
cases to be reconsidered. Key methods get recompiled multiple times
as the app. finds its hot spot.
It is these sorts of optimistic, online optimizations that make the
JVM run faster than C++, when it does. (It does, e.g., when it
inlines hot interface calls and optimizes across the call boundary.)
Microsoft could do so with C# also, but probably not as long as the
C# JIT runs at application load time, which (as I am told by friendly
Microsoft colleagues) it does.
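Concretely, the kind of call site John describes looks like this. During warm-up the JVM profiles the receiver types seen at the interface call; if only one implementation has shown up, HotSpot can devirtualize, inline it, and optimize across the boundary, guarded by a deoptimization trap in case a second implementation appears later. The class names here are purely illustrative, not from any benchmark:

```java
public class CallSiteDemo {
    interface Op { double apply(double x); }
    static final class Square implements Op { public double apply(double x) { return x * x; } }
    static final class Half   implements Op { public double apply(double x) { return x / 2; } }

    // "Point A": reused against whatever receiver types callers ("point B")
    // supply. If profiling only ever sees Square here, op.apply can be
    // inlined through the interface; the first Half triggers a
    // deoptimization and recompilation covering both cases.
    static double sum(Op op, double[] xs) {
        double s = 0;
        for (double x : xs) s += op.apply(x);
        return s;
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3};
        System.out.println(sum(new Square(), xs)); // 14.0
        System.out.println(sum(new Half(), xs));   // 3.0
    }
}
```

The optimization itself is invisible at the source level; the point of the sketch is only that the type context at the call site is unknowable at compile time, which is exactly where profile-driven optimization pays off.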
A final note about C# vs. Java on Intel chips. We have noticed that
the Intel (and AMD) chips are remarkably tolerant of junky object
code. Part of the challenge of JVM engineering is to find
optimizations that tend to make code run better across a range of
platforms with other core competencies (like many-core SPARC,
obviously for Sun).
I speculate that Hotspot has been driven to work harder on clever
optimizations not only because we have competed with other excellent
implementations (IBM J9, BEA JRockit), but also because Java needs to
run on a wider range of chips than C#; some of them are less
forgiving than x86. A way to quantify the "chip factor" would be to
compare the gap between server and client JITs on a range of Java
apps., especially simpler, more "static" ones like scimark. More
forgiving chips would narrow the gap.
Best wishes,
-- John
Ah, I see. Can you cite a more suitable benchmark?
On Apr 25, 2008, at 4:08 AM, Attila Szegedi wrote:
> On 2008.04.25., at 12:54, hlovatt wrote:
>
>> All, Java, C#, & C, tests are on Windows XP running under
>> Parallels on
>> a Mac Book Pro., 2.33 GHz Intel Core 2 Duo, 2 GB 667 MHz DDR2 SDRAM.
>
> Benchmarking under a virtualized OS? Kirk just wrote recently about
> that: <http://kirk.blog-city.com/can_i_bench_with_virtualization.htm>
Thanks Jon. I really appreciate your whole reply. Appel's book is on
my todo list :)
> I believe the designs of the JVM and (to a lesser extent) the CLR were much
> more backward looking than forward thinking because that is essential for
> commercial success. Had the same effort been put into the best theoretical
> design then I'm sure we could have had something much more productive (but
> totally incompatible).
Hopefully with the DVM (MLVM) we will see a more forward thinking VM
in the near future.
Perhaps some languages would benefit from a VM with heap-based frames
and primitive closures that could live on top of the DVM, optimising
heap frames to stack frames, and applying other optimisations, when possible.
Steve.
Ruby would! Ruby would!
- Charlie