Why do several methods on numbers involve multiple indirections, including typeclasses and boxing?

Simon Ochsenreither

May 22, 2013, 7:42:36 PM

to scala-i...@googlegroups.com

For instance, take 1L.abs:

Instead of just computing and returning the result (as is done in RichInt), this is what happens:

1. RichLong inherits abs by extending ScalaNumberProxy[T].
2. ScalaNumberProxy[T] has an abstract protected implicit def num: Numeric[T].
3. This method is implemented in RichLong and points to scala.math.Numeric.LongIsIntegral.
4. The actual abs method then calls num.abs(self).

Same story for doubleValue, floatValue, longValue, intValue, byteValue, shortValue, min, max, signum.
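To make the chain of indirections concrete, here is a minimal sketch that mimics the two shapes. `NumberProxyLike`, `ProxyStyleLong` and `DirectLong` are hypothetical names, not the real scala.runtime classes (the real RichLong is also a value class; that is omitted here so the sketch runs as a plain script). The key point is that `Numeric[T].abs` is a generic, unspecialized method, so on the JVM the `Long` travels as a boxed `java.lang.Long` unless the JIT removes the boxing.

```scala
// Hypothetical model of the two shapes of abs on an enriched Long.
// These are NOT the real scala.runtime classes.

// Old shape: abs is inherited from a generic proxy trait and delegates to the
// Numeric[T] typeclass instance that the subclass supplies.
trait NumberProxyLike[T] {
  def self: T
  protected implicit def num: Numeric[T]
  // Numeric is not specialized, so this call erases to abs(Object): Object and
  // the Long is boxed on the way in and unboxed on the way out.
  def abs: T = num.abs(self)
}

final class ProxyStyleLong(val self: Long) extends NumberProxyLike[Long] {
  protected implicit def num: Numeric[Long] = scala.math.Numeric.LongIsIntegral
}

// New shape: abs is written directly against the primitive, with no typeclass
// dispatch and no boxing in the call path.
final class DirectLong(val self: Long) {
  def abs: Long = if (self < 0) -self else self
}
```

Both produce the same values; only the call path differs.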

I have not benchmarked it, but I think this is a "bit" slower than just implementing the method.

Is there a reason why we don't just do that?

Thanks,

Simon

Rex Kerr

May 22, 2013, 9:34:05 PM

I haven't benchmarked it just now, but I recall that after warm-up it is just as fast. I'm not sure whether it delays warm-up. Far worse is isNaN, which doesn't even take the value-class route.

--Rex


Simon Ochsenreither

May 23, 2013, 7:16:33 AM

> I haven't benchmarked it just now, but I recall that after warm-up it is just as fast. I'm not sure whether it delays warm-up. Far worse is isNaN, which doesn't even take the value-class route.

Then I'm wondering what the difference in RichInt is...

Rex Kerr

May 23, 2013, 11:26:30 AM

Just benchmarked it and it *does* make a difference--almost 50% on abs.

floatValue and kin are simply dreadful. They produce the same answer as .toFloat etc. but are 5x slower if you box through java.lang.Long (which is what happens if you just call .floatValue on something), and another 50% slower again if you use runtime.RichLong explicitly.
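The two conversion paths can be written side by side. This is only a sketch of the code shapes, not a benchmark, and `FloatConversions` is a made-up name: `toFloat` on a `Long` is a plain primitive conversion, whereas going through `java.lang.Long` first means a box (`Long.valueOf`) plus a method call before the same conversion happens.

```scala
object FloatConversions {
  // Primitive conversion: should compile to a single long-to-float instruction.
  def viaToFloat(x: Long): Float = x.toFloat

  // Box first (as Rex describes happening when you call .floatValue directly),
  // then call floatValue on the box. Same result, more work.
  def viaBoxed(x: Long): Float = java.lang.Long.valueOf(x).floatValue
}
```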

Do you want to open a ticket? Given 2.11's performance focus, this seems like a relevant "bug".

--Rex

P.S. Evidence for slowdown:

// Ints

scala> def fa = { var i,s = 0; while (i<10000) { s += (i*198237515).abs; i += 1 }; s }
fa: Int

scala> def fb = { var i,s = 0; while (i<10000) { s += math.abs(i*198237515); i += 1 }; s }
fb: Int

scala> def fc = { var i,s = 0; while (i<10000) { val x = i*198237515; s += (if (x<0) -x else x); i += 1 }; s }
fc: Int

scala> th.pbenchOff()(fa)(fb)
Benchmark comparison (in 907.0 ms)
Not significantly different (p ~= 0.9188)
Time ratio:    0.99971   95% CI 0.99398 - 1.00544   (n=20)
    First     9.738 us   95% CI 9.698 us - 9.777 us
    Second    9.735 us   95% CI 9.696 us - 9.774 us
res4: Int = -41212594

scala> th.pbenchOff()(fa)(fc)
Benchmark comparison (in 908.0 ms)
Not significantly different (p ~= 0.4564)
Time ratio:    1.00376   95% CI 0.99370 - 1.01382   (n=20)
    First     9.758 us   95% CI 9.689 us - 9.828 us
    Second    9.795 us   95% CI 9.726 us - 9.864 us
res5: Int = -41212594

scala> th.pbenchOff()(fb)(fc)
Benchmark comparison (in 892.9 ms)
Not significantly different (p ~= 0.6652)
Time ratio:    0.99876   95% CI 0.99306 - 1.00447   (n=20)
    First     9.734 us   95% CI 9.694 us - 9.773 us
    Second    9.722 us   95% CI 9.682 us - 9.761 us
res6: Int = -41212594

// Longs

scala> def ga = { var i,s = 0L; while (i<10000) { s += (i*1982375159817318957L).abs; i += 1 }; s }
ga: Long

scala> def gb = { var i,s = 0L; while (i<10000) { s += math.abs(i*1982375159817318957L); i += 1 }; s }
gb: Long

scala> def gc = { var i,s = 0L; while (i<10000) { val x = i*1982375159817318957L; s += (if (x<0) -x else x); i += 1 }; s }
gc: Long

scala> th.pbenchOff()(ga)(gb)
Benchmark comparison (in 823.5 ms)
Significantly different (p ~= 0)
Time ratio:    0.69559   95% CI 0.68437 - 0.70681   (n=20)
    First     76.82 us   95% CI 76.11 us - 77.53 us
    Second    53.43 us   95% CI 52.73 us - 54.14 us
res10: Long = 817698100697724512

scala> th.pbenchOff()(ga)(gc)
Benchmark comparison (in 560.6 ms)
Significantly different (p ~= 0)
Time ratio:    0.73010   95% CI 0.70759 - 0.75262   (n=20)
    First     74.02 us   95% CI 72.13 us - 75.90 us
    Second    54.04 us   95% CI 53.10 us - 54.98 us
Individual benchmarks not fully consistent with head-to-head (p ~= 2.553e-04)
    First     77.54 us   95% CI 75.12 us - 79.95 us
    Second    51.71 us   95% CI 51.54 us - 51.89 us
res11: Long = 817698100697724512

scala> th.pbenchOff()(gb)(gc)
Benchmark comparison (in 601.4 ms)
Significantly different (p ~= 0.0027)
Time ratio:    0.98614   95% CI 0.97753 - 0.99475   (n=20)
    First     52.18 us   95% CI 51.86 us - 52.50 us
    Second    51.46 us   95% CI 51.14 us - 51.78 us
res12: Long = 817698100697724512
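For readers who don't have the `th` benchmarking object used above (presumably Rex's Thyme harness, given the Ichi.jar on the classpath later in the thread; that is an assumption), the three Long variants can be reproduced as plain methods. The `time` helper below is hypothetical and crude: a fixed warm-up followed by one timed run, with none of the statistics that pbenchOff reports, so its numbers are rough at best.

```scala
object AbsVariants {
  val K = 1982375159817318957L

  // Enriched-method route: (i * K).abs goes through RichLong.
  def ga: Long = { var i, s = 0L; while (i < 10000) { s += (i * K).abs; i += 1 }; s }
  // Direct call to scala.math.abs.
  def gb: Long = { var i, s = 0L; while (i < 10000) { s += math.abs(i * K); i += 1 }; s }
  // Hand-written branch.
  def gc: Long = {
    var i, s = 0L
    while (i < 10000) { val x = i * K; s += (if (x < 0) -x else x); i += 1 }
    s
  }

  // Crude timing helper (hypothetical): warm up, then return the mean
  // nanoseconds per call over `reps` calls. No statistics, no outlier handling.
  def time(f: () => Long, reps: Int = 2000): Double = {
    var sink = 0L
    var r = 0
    while (r < reps) { sink += f(); r += 1 } // warm-up phase
    val t0 = System.nanoTime()
    r = 0
    while (r < reps) { sink += f(); r += 1 } // timed phase
    val t1 = System.nanoTime()
    // Use the accumulated results so the work is less likely to be optimized away.
    if (sink == 0L) println("sink was zero")
    (t1 - t0).toDouble / reps
  }
}
```

For example, compare `AbsVariants.time(() => AbsVariants.ga)` with `AbsVariants.time(() => AbsVariants.gb)`. On a current JVM and Scala version the gap may differ from the 2013 numbers above.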


Simon Ochsenreither

May 23, 2013, 12:01:39 PM

I'm on it.

Simon Ochsenreither

May 23, 2013, 12:19:55 PM

https://issues.scala-lang.org/browse/SI-7511

Simon Ochsenreither

Jun 23, 2013, 11:41:07 AM

Hi Rex,

I created a pull request here: https://github.com/scala/scala/pull/2676

Can you have a look?

Thanks,

Simon

Rex Kerr

Jun 23, 2013, 12:10:17 PM

Looks reasonable to me. Two things I spotted:

(1) Chars are unsigned. abs should just return itself.

(2) For types that must be widened to Int (i.e. Byte, Char, Short), min and max are faster if you do the direct comparison, e.g. def min(that: Byte) = if (self < that) self else that, so that you don't need to cast back to the smaller type.

--Rex
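Rex's two review points can be sketched on hypothetical `ByteOps` and `CharOps` wrappers (not the real RichByte/RichChar code from the pull request). The `math.min` route widens both operands to `Int` and needs a narrowing `toByte` afterwards; the direct comparison still compares widened values but returns one of the original `Byte`s, so no cast back is needed.

```scala
// Hypothetical wrapper for a Byte, showing the two styles of min/max.
final class ByteOps(val self: Byte) {
  // Via java.lang.Math: both operands are widened to Int, so the result
  // has to be narrowed back to Byte.
  def minViaMath(that: Byte): Byte = math.min(self.toInt, that.toInt).toByte
  // Direct comparison: returns one of the original Byte values, no narrowing cast.
  def minDirect(that: Byte): Byte = if (self < that) self else that
  def maxDirect(that: Byte): Byte = if (self > that) self else that
}

// Char is unsigned (0 to 65535), so abs can simply return the value itself.
final class CharOps(val self: Char) {
  def abs: Char = self
}
```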

Simon Ochsenreither

Jun 23, 2013, 12:18:35 PM

> (1) Chars are unsigned. abs should just return itself.

Fixed.

> (2) For types that must be widened to Int (i.e. Byte, Char, Short), min and max are faster if you do the direct comparison: def min(that: Byte) = if (self < that) self else that so that you don't need to cast back to the smaller type.

Are you sure? I'm assuming that the JVM's intrinsics are orders of magnitude faster than user-level code with branches.

Rex Kerr

Jun 23, 2013, 12:42:22 PM

You underestimate the effectiveness of the JIT compiler. I did test.

But the difference is small at most and not always present. I guess it's not worth worrying about. It's a little harder to mess up typing min vs. max than typing < vs >. An ideal JIT compiler will fix it regardless of which way it's done, and the Oracle JVM is at least very close in this regard.

--Rex

Paul Phillips

Jun 23, 2013, 12:45:37 PM

On Sun, Jun 23, 2013 at 9:42 AM, Rex Kerr <ich...@gmail.com> wrote:

>> Are you sure? I'm assuming here that the JVM's intrinsics are magnitudes faster than leveraging user-level code with branches.

> You underestimate the effectiveness of the JIT compiler. I did test.

What are these jvm intrinsics? java.lang.Math.max is in effect "if (x < y) y else x". I'm not sure how inlining that expression can be orders of magnitude slower.
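For `Int`, the "in effect" definition matches `math.max` exactly. For `Double` it does not: `java.lang.Math.max(double, double)` also propagates NaN and treats -0.0 as smaller than 0.0, which a bare comparison does not. A small sketch (the object name is hypothetical) checks both:

```scala
object MaxSemantics {
  // The "in effect" definition from the thread, for Int.
  def ifElseMax(x: Int, y: Int): Int = if (x < y) y else x
  // The same shape written for Double; NOT equivalent to math.max for
  // NaN inputs or for -0.0 versus 0.0.
  def ifElseMaxD(x: Double, y: Double): Double = if (x < y) y else x
}
```

This matters only if someone swaps `math.max` for a hand-written comparison on floating-point types; for the integral types discussed here the two forms agree.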

Rex Kerr

Jun 23, 2013, 1:10:15 PM

Apparently on JRockit from May 2010, it can be 3x slower!

$ /home/kerrr/pkg/jrmc-4.0.1-1.6.0/bin/java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Oracle JRockit(R) (build R28.0.1-21-133393-1.6.0_20-20100512-2126-linux-x86_64, compiled mode)
$ /home/kerrr/pkg/jrmc-4.0.1-1.6.0/bin/java -cp /jvm/scala-library.jar:/jvm/Ichi.jar:. TestMax
Benchmark comparison (in 5.915 s)

Significantly different (p ~= 0)

Time ratio:    3.22987   95% CI 3.16799 - 3.29175   (n=30)
    math.max 787.8 ns   95% CI 773.4 ns - 802.2 ns
    if-else   2.545 us   95% CI 2.530 us - 2.559 us

And I see no difference on J9 or JDK 7. (JDK 6 has if-else a tiny bit faster in some cases.)

Anyway, enough testing. Leave it as max/min.

--Rex

Simon Ochsenreither

Jun 23, 2013, 2:49:17 PM

> What are these jvm intrinsics? java.lang.Math.max is in effect "if (x < y) y else x". I'm not sure how inlining that expression can be orders of magnitude slower.

If you're referring to the Java implementation, that's correct. The difference is that most JVMs these days replace the implementation of java.lang.Math and parts of java.lang.Integer/java.lang.Long with branchless, processor-specific instructions, so the actual Java implementation is never run.

Ismael Juma

Jun 23, 2013, 2:49:44 PM

On Sun, Jun 23, 2013 at 5:45 PM, Paul Phillips <pa...@improving.org> wrote:

> What are these jvm intrinsics? java.lang.Math.max is in effect "if (x < y) y else x". I'm not sure how inlining that expression can be orders of magnitude slower.

Actually min and max are HotSpot intrinsics. See lines 644-645:

http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/file/fc8a1a5de78e/src/share/vm/classfile/vmSymbols.hpp

Ismael

Ismael Juma

Jun 23, 2013, 2:51:14 PM

On Sun, Jun 23, 2013 at 7:49 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

>> What are these jvm intrinsics? java.lang.Math.max is in effect "if (x < y) y else x". I'm not sure how inlining that expression can be orders of magnitude slower.

> If you refer to the Java implementation, that's correct. The difference is that most JVM's these days replace the implementation of java.lang.Math and parts of java.lang.Integer/java.lang.Long with branch-less processor-specific instructions, so the actual Java implementation is never run.

That's right.

Best,

Ismael

Paul Phillips

Jun 23, 2013, 2:53:29 PM

On Sun, Jun 23, 2013 at 11:49 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

> If you refer to the Java implementation, that's correct. The difference is that most JVM's these days replace the implementation of java.lang.Math and parts of java.lang.Integer/java.lang.Long with branch-less processor-specific instructions, so the actual Java implementation is never run.

What's the general strategy for taking this sort of thing into account? Do you guys sit around reading HotSpot code?

(I'm all too aware that scala includes its share of similar "how would anyone know" performance factors.)

Ismael Juma

Jun 23, 2013, 3:05:04 PM

On Sun, Jun 23, 2013 at 7:53 PM, Paul Phillips <pa...@improving.org> wrote:

> What's the general strategy for taking this sort of thing into account? Do you guys sit around reading hotspot code?

A few things:

1. Check the file I referenced. It includes most, if not all, of the intrinsics for HotSpot.

2. Call the relevant Java method for low-level things that are likely to have hardware-level instructions (this typically applies to methods in Math, primitives, String, crypto, concurrency, etc.). As new instructions are added to CPUs, the relevant methods in the Java standard library are replaced by intrinsified versions. In other cases, the JIT just uses the new instructions without code changes (for example, Intel Haswell has a fused multiply-add instruction which is likely to be used inside loops automatically if applicable).
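As an illustration of point 2, here is a hand-rolled bit count and leading-zero count next to the `java.lang.Integer`/`java.lang.Long` methods that do the same job. The library calls are the kind HotSpot can replace with dedicated instructions (POPCNT, LZCNT) when the CPU and JVM version support it; whether that happens on a given machine is not guaranteed. The object name `BitTricks` is hypothetical.

```scala
object BitTricks {
  // Hand-rolled population count: clears the lowest set bit until none remain.
  def popCountManual(x: Int): Int = {
    var v = x
    var n = 0
    while (v != 0) { v &= v - 1; n += 1 }
    n
  }

  // Library version; HotSpot may compile this to a POPCNT instruction when the
  // CPU supports it (decided by the VM, not by this code).
  def popCountLibrary(x: Int): Int = java.lang.Integer.bitCount(x)

  // Hand-rolled count of leading zero bits, scanning from the top bit down.
  def leadingZerosManual(x: Long): Int = {
    var n = 0
    var mask = 1L << 63
    while (mask != 0L && (x & mask) == 0L) { n += 1; mask >>>= 1 }
    n
  }

  // Library version; a candidate for an LZCNT-based intrinsic on suitable CPUs.
  def leadingZerosLibrary(x: Long): Int = java.lang.Long.numberOfLeadingZeros(x)
}
```

The manual versions compute the same results; the argument for the library calls is that the JVM knows about them and may do better.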

Best,

Ismael

Paul Phillips

Jun 23, 2013, 2:58:34 PM

On Sun, Jun 23, 2013 at 11:49 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Actually min and max are HotSpot intrinsics. See lines 644-645:

Would it be unreasonable to ask that java.lang.Math bytecode then not include this:

    public static int max(int, int);
      Code:
         0: iload_0
         1: iload_1
         2: if_icmplt 9
         5: iload_0
         6: goto 10
         9: iload_1
        10: ireturn

Maybe something more like

    public static native int max(int, int);

I understand native and intrinsic aren't the same thing, but "native" must be closer than "here's some bytecode which never runs."

Simon Ochsenreither

Jun 23, 2013, 3:07:21 PM

> What's the general strategy for taking this sort of thing into account? Do you guys sit around reading hotspot code?

From the documentation:

> Code generators are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of Math methods.

In general, I just consider all math/bit-related methods in Math, Integer and Double to be intrinsic and try to reuse them as much as possible instead of writing my own code.

Ismael Juma

Jun 23, 2013, 3:07:47 PM

The standard library is used by many JVMs, not just HotSpot. And a JNI-based method would be slower than the pure Java one. I agree with you that a better way to declare intrinsics would be welcome.

Best,

Ismael

Simon Ochsenreither

Jun 23, 2013, 3:18:53 PM

> The standard library is used by many JVMs, not just HotSpot. And a JNI-based method would be slower than the pure Java one. I agree with you that a better way to declare intrinsics would be welcome.

There is no reason why native methods in the JDK would have to pay the JNI tax (and I'm pretty sure they don't).

Anyway, I agree with Paul: for the JDK, there is no big distinction between native methods and non-native intrinsic methods, although non-native methods have one nicety:
If you are running on a JVM that lacks better implementations, or you run your code on a Pentium 1, the JDK comes with a default implementation for free!

Ismael Juma

Jun 23, 2013, 3:47:02 PM

On Sun, Jun 23, 2013 at 8:18 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

> There is no reason why native methods in the JDK would have to pay the JNI tax

You have just described the difference between an intrinsic method and a normal native method.

> (and I'm pretty sure they don't).

Do you have any evidence for this? As far as I know, there is no other mechanism apart from intrinsic methods to avoid the JNI tax.

> Anyway, I agree with Paul, for the JDK, there is not a big distinction between native methods and non-native, intrinsic methods, although non-native methods have one nicety:
> If you are running on a JVM which lacks better implementations or you run your code on a Pentium1, then the JDK comes with a default implementation of it, for free!

The point is that if the intrinsic is missing, you will run whatever is implemented by normal means. If that is a native method, it will rely on JNI. If that is slower than the pure Java version, there is little benefit, as it's also more work to implement.

Best,

Ismael

√iktor Ҡlang

Jun 23, 2013, 3:49:25 PM

Well, isn't the sad story that the JVM does favors for the JDK that it won't do for other *DKs?


Rex Kerr

Jun 23, 2013, 4:03:03 PM

Most JVMs _also_ recognize that bytecode emitted by if (x < y) y else x can be reduced to something more efficient and do so. JRockit is notoriously bad at dealing with primitives.

The whole premise of JIT compilation is that it's able to convert bytecode into something that takes reasonable advantage of what's available on the machine. If you make it do something like

if (x < y) x-1 else y+1

then you suddenly get 4x slower, as you should.


Simon Ochsenreither

Jun 23, 2013, 4:04:16 PM

> Well, isn't the sad story that the JVM does the JDK favors it won't do for other *DKs?

The good news is that you can implement all the optimizations you ever wanted here: https://github.com/ReadyTalk/avian

On a related note, with https://github.com/scala/scala/pull/2675 and https://github.com/scala/scala/pull/2678 reviewed (*hint*) and merged (*hint* *hint*), we are one failing test case away from gaining a second runtime with various features lacking from HotSpot and friends.

Simon Ochsenreither

Jun 23, 2013, 4:41:30 PM

> Do you have any evidence for this? As far as I know, there is no other mechanism apart from intrinsic methods to avoid the JNI tax.

I just asked someone who should know. The answer: native methods inside the OpenJDK don't pay JNI overhead.

Paul Phillips

Jun 23, 2013, 4:59:02 PM

On Sun, Jun 23, 2013 at 1:41 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

> I just asked someone who is supposed to know it, answer: Native methods inside the OpenJDK don't pay JNI overhead.

Here is what appears to be some support for Simon's information, at least in the specific case of sun.misc.Unsafe, though even if it's true, I don't know whether it generalizes.

http://stackoverflow.com/questions/11174231/compare-direct-and-non-direct-bytebuffer-get-put-operations

There is some debate in comments about exactly when the JNI tax is collected.

Paul Phillips

Jun 23, 2013, 5:06:25 PM

Ah, the same guy in another question confirms my suspicion about Unsafe-specificity: Normal native methods are not inlined. The native methods in Unsafe can be.

http://stackoverflow.com/questions/7823665/why-jni-call-to-native-method-is-slower-than-similar-in-sun-misc-unsafe/7825191

And the other guy responds:

> Yes, you was right. Unsafe actually inlined as intrinsic. Code of getInt easy to find in vm/prims/unsafe.cpp: DEFINE_GETSETOOP(jint, Int); then GET_FIELD(obj, offset, jboolean, v); and then
>
>     #define GET_FIELD(obj, offset, type_name, v) \
>       oop p = JNIHandles::resolve(obj); \
>       type_name v = (type_name)index_oop_from_field_offset_long(p, offset)

Paul Phillips

Jun 23, 2013, 5:10:37 PM

Now getting around to examining

http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/file/fc8a1a5de78e/src/share/vm/classfile/vmSymbols.hpp

Boy, if they'd put a little snowman on the javadoc for those intrinsified methods, wouldn't that be a nice assist? There must be some way to squeeze the knowledge out of the JVM at runtime, though... hopefully something short of logging the generated assembly and analyzing it.

Simon Ochsenreither

Jun 23, 2013, 5:13:19 PM

You could put a breakpoint on them and observe how it isn't triggered? :-)

Paul Phillips

Jun 23, 2013, 5:15:36 PM

On Sun, Jun 23, 2013 at 1:41 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

> I just asked someone who is supposed to know it, answer: Native methods inside the OpenJDK don't pay JNI overhead.

According to my no-horse-in-the-race googling, this appeal to unknown authority should be presumed false unless it is bolstered. It should not be terribly difficult to prove or disprove via reproducible demonstration; but what's-happening-on-the-metal is not my specialty so I defer to others.

Ismael Juma

Jun 23, 2013, 5:25:54 PM

On Sun, Jun 23, 2013 at 9:41 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

>> Do you have any evidence for this? As far as I know, there is no other mechanism apart from intrinsic methods to avoid the JNI tax.

> I just asked someone who is supposed to know it, answer: Native methods inside the OpenJDK don't pay JNI overhead.

This is not evidence. :) For what it's worth, I know for a fact that methods inside OpenJDK do pay JNI overhead. There are countless issues about this in Sun's bug database, including ones about gzip compression.

Best,

Ismael

Ismael Juma

Jun 23, 2013, 5:27:23 PM

On Sun, Jun 23, 2013 at 9:59 PM, Paul Phillips <pa...@improving.org> wrote:

> On Sun, Jun 23, 2013 at 1:41 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:
>
>> I just asked someone who is supposed to know it, answer: Native methods inside the OpenJDK don't pay JNI overhead.
>
> Here is what appears some support for simon's information, at least in the specific case of sun.misc.Unsafe - though even if true, I don't know if it generalizes.
>
> http://stackoverflow.com/questions/11174231/compare-direct-and-non-direct-bytebuffer-get-put-operations

Unsafe is special indeed. It is another mechanism besides intrinsics. I should have mentioned that.

Best,

Ismael

Paolo G. Giarrusso

Jun 23, 2013, 8:15:56 PM

On Sunday, June 23, 2013 11:13:19 PM UTC+2, Simon Ochsenreither wrote:

> You could put a breakpoint on them and observe how it isn't triggered? :-)

So are you assuming you could observe a bug in the debugger, or did you actually try it? I haven't tried it, but I assume otherwise.

JVM debugging is not C debugging: the design rule (from Self onwards) is that if optimizations change the behavior visible under debugging, they are undone. But only where you can see them, so that the rest of the code runs at full speed. There's some mind-boggling technology working for you during JVM debugging.

By your reasoning, you could also observe optimizers reordering instructions in the debugger, by seeing the line pointer jump back and forth in the source without any loops. When I programmed in C (circa 2004-2006) and debugged optimized code, that was the rule.

Disclaimer: I took a course with Lars Bak, who ages ago worked on Hotspot and made us read some papers about its ancestor.

Simon Ochsenreither

Jun 24, 2013, 6:02:13 AM

> JVM debugging is not C debugging: the design rule (from Self onwards) is that if optimizations change the behavior visible under debugging, they are undone. But only where you can see them, so that the rest of the code runs at full speed. There's some mind-boggling technology working for you during JVM debugging.

> By your reasoning, you could observe that optimizers reorder instructions on the debugger, by seeing the line pointer jumping back and forth on source without any loops. When I programmed in C (circa 2004-2006) and debugged optimized code, that was the rule.

It's not an optimization! The JIT never sees the "real code". It happens far earlier than that.

Johannes Rudolph

Jun 24, 2013, 6:03:04 AM

Here are the intrinsics implemented in HotSpot:

http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/file/fc8a1a5de78e/src/share/vm/opto/library_call.cpp

Look in the method starting at line 1893. It seems min and max usually generate a comparison. However, there's also some optimization code which tries to infer the result from "dominating comparisons". So the cases where the intrinsic is actually faster than the manual version would probably be more complex ones, where the result can be inferred from the context.

It also contains this comment:

    // %%% This folding logic should (ideally) be in a different place.
    // Some should be inside IfNode, and there to be a more reliable
    // transformation of ?: style patterns into cmoves.  We also want
    // more powerful optimizations around cmove and min/max.

which I read as saying that the intrinsic might not be needed if the folding logic were more general (which may in fact already be the case, since you never know whether comments like this get cleaned up when something is improved elsewhere).
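The "dominating comparisons" mentioned above are cases like the following hypothetical function, where an earlier test already decides what `math.max` must return. Whether C2 actually folds this particular case is not verified here; the sketch only shows the kind of context from which the result could be inferred.

```scala
object DominatedMax {
  // Inside the first branch x < y already holds, so math.max(x, y) can only be y;
  // an optimizer that tracks dominating comparisons could drop the max entirely.
  def afterCheck(x: Int, y: Int): Int =
    if (x < y) math.max(x, y) else x

  // What the first branch reduces to once the dominating comparison is used.
  def folded(x: Int, y: Int): Int =
    if (x < y) y else x
}
```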

I've run an example with -XX:+PrintAssembly; here's the result (on an Intel i7-3840QM, 64-bit Linux):

https://github.com/jrudolph/math-intrinsics/blob/master/output.txt

It seems that with this simple example both the custom and the intrinsic version translate into something similar to this:

    0x00007f0b6ce5a2ec: vucomisd %xmm1,%xmm0
    0x00007f0b6ce5a2f0: jbe 0x00007f0b6ce5a2fe  ;*dreturn
                                                ; - TestIntrinsics$::myMax@11 (line 5)
    0x00007f0b6ce5a2f2: add $0x10,%rsp
    0x00007f0b6ce5a2f6: pop %rbp
    0x00007f0b6ce5a2f7: test %eax,0xa450d03(%rip)  # 0x00007f0b772ab000
                                                ;   {poll_return}
    0x00007f0b6ce5a2fd: retq
    0x00007f0b6ce5a2fe: vmovapd %xmm1,%xmm0
    0x00007f0b6ce5a302: jmp 0x00007f0b6ce5a2f2
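The actual test program lives in the linked repository and is not reproduced here. As a stand-in, a hypothetical Scala object with a hand-written `myMax` and a `math.max` call, driven by a hot loop so that both get JIT-compiled, could look roughly like this. HotSpot's diagnostic options `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` (which needs the hsdis disassembler plugin) dump the compiled code, and `-XX:+PrintIntrinsics` under the same unlock flag reports intrinsic inlining in C2. Note that the hand-written version is not a drop-in replacement for `math.max` on doubles: it differs for NaN and for signed zero.

```scala
// Hypothetical probe; not the program from the linked repository.
object IntrinsicsProbe {
  // Hand-written max, the "custom" version.
  // Differs from math.max for NaN and for -0.0 vs 0.0.
  def myMax(a: Double, b: Double): Double = if (a > b) a else b

  // Library max, which C2 may treat as an intrinsic.
  def libMax(a: Double, b: Double): Double = math.max(a, b)

  // Hot loop so that both methods get JIT-compiled. For the finite,
  // non-negative inputs used here both versions agree, so acc stays 0.0.
  def run(n: Int): Double = {
    var acc = 0.0
    var i = 0
    while (i < n) {
      val x = (i % 1000).toDouble
      acc += myMax(x, 500.0) - libMax(x, 500.0)
      i += 1
    }
    acc
  }

  def main(args: Array[String]): Unit = println(run(10000000))
}
```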


--
Johannes

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net

Simon Ochsenreither

Jun 24, 2013, 6:07:31 AM

I think this is what I meant: http://stackoverflow.com/questions/15085294/java-lang-math-log-replaced-by-intrinsic-call-why-not-java-lang-math-exp

Johannes Rudolph

Jun 24, 2013, 6:09:34 AM

On Mon, Jun 24, 2013 at 12:03 PM, Johannes Rudolph <johannes...@googlemail.com> wrote:

> Here are the intrinsics implemented in hotspot:
>
> http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/file/fc8a1a5de78e/src/share/vm/opto/library_call.cpp

I should say that's the server JIT (C2) implementation; I'm not sure whether all the other variants in HotSpot (interpreter, C1, Zero, Shark) implement min/max as an intrinsic.

Johannes Rudolph

Jun 24, 2013, 6:27:18 AM

On Sun, Jun 23, 2013 at 11:25 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> This is not evidence. :) For what it's worth, I know for a fact that methods inside OpenJDK do pay JNI overhead. There are countless issues about this in Sun's bug DB. Including things like gzip compression.

That sounds like you could easily do better in general. You always need something like JNI if a native library that doesn't know about the VM has to work on data structures (= memory locations) defined inside the VM. Unsafe's methods are obviously special: their implementations know everything about the data structures in the VM but still can't be written in Java, so they have to be declared native in the class files. They don't need the JNI overhead, though, because they never actually call out of the VM.

Ismael Juma

Jun 24, 2013, 6:43:16 AM

On Mon, Jun 24, 2013 at 11:27 AM, Johannes Rudolph <johannes...@googlemail.com> wrote:

> On Sun, Jun 23, 2013 at 11:25 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>
>> This is not evidence. :) For what is worth, I know for a fact that methods inside OpenJDK do pay JNI overhead. There are countless issues about this in Sun's bug DB. Including things like gzip compression.
>
> That sounds like you could easily do better in general. You always need something like JNI if a native library that doesn't know about the VM should work on data structures (= memory locations) defined inside the VM. Unsafe's methods are obviously special because their implementations know everything about data structure in the VM but still can't be implemented in Java code so they have to be declared native in the classfiles but don't need the JNI overhead because they don't actually call out of the VM.

You can definitely do better, and Unsafe methods are an example of that, as you say. But there are plenty of methods inside the JDK that do pay the JNI tax, and there have been countless discussions about ways to reduce it. Java 2D and compression (gzip) have been two cases where the JNI penalty has been an issue. Over time, new methods were added to Unsafe that helped in some cases, but definitely not all (or even most).

And Unsafe is named that for a reason. Here is an example of code with a memory-corruption bug, from Gil Tene:

https://groups.google.com/d/msg/mechanical-sympathy/X-GtLuG0ETo/xy21CqkOW9IJ

Best,

Ismael
