Java Generics Performance Puzzler


Martin Thompson

Jun 13, 2013, 8:45:14 AM
to mechanica...@googlegroups.com
Lovely little Java performance issue with Generics.  Thanks to Gil Tene for pointing this out.

http://developer.amd.com/community/blog/a-java-generics-performance-puzzler/

What a shame Generics were not reified and done right!

Martin....

Simone Bordet

Jun 13, 2013, 8:55:44 AM
to Martin Thompson, mechanica...@googlegroups.com
Hi,
Seems more a non optimal JIT implementation to me.
The JIT should be able to remove the checkcast bytecode once it sees
that's not needed, no ?

I mean the JRockit JIT compiler was able to inline to a constant even
3 MethodHandler calls (see
http://medianetwork.oracle.com/video/player/589206011001, at 12:30) ,
would not HotSpot be able to just remove an unneeded checkcast ?

--
Simone Bordet
http://bordet.blogspot.com
---
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless. Victoria Livschitz

Martin Thompson

Jun 13, 2013, 9:00:29 AM
to mechanica...@googlegroups.com, Martin Thompson
In the following case, how would a JVM be able to remove the checkcast?

   while (idxSrc < NUMOBJS) {
      MyClass myc = aListSrc.get(idxSrc++);
      if (idxSrc % 2 == 0)
         aListDest1.add(myc);
      else
         aListDest2.add(myc);
   }

Because of type erasure, the collection has nothing to say about what is inside it other than that they are Objects, and here you need to assign them to a MyClass.  HotSpot is removing the checkcast in the other case because add() takes an Object after type erasure.
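To make the erasure argument concrete, here is a minimal sketch (not taken from the blog post) of where the compiler-inserted cast bites. The class and method names are hypothetical, and it assumes a standard javac, which inserts a checkcast when the result of get() is assigned to a MyClass but not when it is assigned to an Object.

```java
import java.util.ArrayList;
import java.util.List;

final class CheckcastDemo {
    static final class MyClass { }

    // Builds a List<MyClass> that actually holds a String ("heap pollution").
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<MyClass> pollutedList() {
        List<MyClass> typed = new ArrayList<MyClass>();
        List raw = typed;          // raw view of the same list
        raw.add("not a MyClass");  // compiles with a warning; nothing is checked at runtime
        return typed;
    }

    // The target type is Object, so no cast is needed and the wrong element goes unnoticed.
    static boolean readAsObjectSucceeds() {
        Object o = pollutedList().get(0);
        return o instanceof String;
    }

    // The target type is MyClass, so the inserted checkcast fails on the String.
    static boolean readAsMyClassThrows() {
        try {
            MyClass m = pollutedList().get(0);
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readAsObjectSucceeds());
        System.out.println(readAsMyClassThrows());
    }
}
```

Running javap -c on a class like this is one way to see exactly which assignments carry a checkcast.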

Simone Bordet

Jun 13, 2013, 9:02:03 AM
to mechanica...@googlegroups.com
On Thu, Jun 13, 2013 at 2:55 PM, Simone Bordet <simone...@gmail.com> wrote:
> Hi,
>
> On Thu, Jun 13, 2013 at 2:45 PM, Martin Thompson <mjp...@gmail.com> wrote:
>> Lovely little Java performance issue with Generics. Thanks to Gil Tene for
>> pointing this out.
>>
>>
>> http://developer.amd.com/community/blog/a-java-generics-performance-puzzler/
>>
>> What a shame Generics were not reified and done right!
>
> Seems more a non optimal JIT implementation to me.
> The JIT should be able to remove the checkcast bytecode once it sees
> that's not needed, no ?
>
> I mean the JRockit JIT compiler was able to inline to a constant even
> 3 MethodHandler calls (see
> http://medianetwork.oracle.com/video/player/589206011001, at 12:30) ,
> would not HotSpot be able to just remove an unneeded checkcast ?

The final part (that I overlooked) talks about not being possible to
remove the checkcast because it _may_ throw an exception, but I find
the reasoning similar to not being able to remove the null checks -
yet there are ways to do this.

Plus, is not the semantic broken ?
If I were able to put something that was not of the right type into
the source list, then

destList.add(sourceList.get(i))

should throw the same ClassCastException, no ?

Martin Thompson

Jun 13, 2013, 10:55:26 AM
to mechanica...@googlegroups.com

On Thursday, June 13, 2013 2:02:03 PM UTC+1, Simone Bordet wrote:
> The final part (that I overlooked) talks about not being possible to
> remove the checkcast because it _may_ throw an exception, but I find
> the reasoning similar to not being able to remove the null checks -
> yet there are ways to do this.
>
> Plus, is not the semantic broken ?
> If I were able to put something that was not of the right type into
> the source list, then
>
> destList.add(sourceList.get(i))
>
> should throw the same ClassCastException, no ?

Generics are erased from the source when bytecode is generated by the compiler, therefore the runtime has no means of telling what can be put in a collection.  So how would it check what should be put in?  Generics are really only a compile-time hack to the language that was not reified in order to remain backwards compatible.  If they were reified they would be a lot more powerful.
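As a small illustration of the "no means of telling" point, the sketch below (class and method names are hypothetical) shows that two differently parameterised lists share one runtime class, and that a plain ArrayList accepts a wrong element without complaint. For contrast it uses Collections.checkedList from the JDK, which carries an explicit Class token and so can check on insertion; that is a library workaround, not reification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ErasureDemo {
    // Both parameterisations are the same class at runtime.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();
        return strings.getClass() == ints.getClass();
    }

    // A plain list cannot reject the wrong element type on insertion.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean plainListAcceptsWrongType() {
        List<Integer> ints = new ArrayList<Integer>();
        List raw = ints;
        raw.add("oops");
        return ints.size() == 1;
    }

    // A checked view holds Integer.class and tests every inserted element against it.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean checkedListRejectsWrongType() {
        List<Integer> ints = Collections.checkedList(new ArrayList<Integer>(), Integer.class);
        List raw = ints;
        try {
            raw.add("oops");
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(plainListAcceptsWrongType());
        System.out.println(checkedListRejectsWrongType());
    }
}
```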

Wojciech Kudla

Jun 13, 2013, 10:59:00 AM
to Martin Thompson, mechanica...@googlegroups.com
Correct me if I'm wrong, but using some simple heuristics, version 2 (with the local variable) could be optimised by the JIT or even the bytecode compiler into version 1, thus eliminating the checkcast. Or am I missing something?



Jakub K

Jun 13, 2013, 3:02:45 PM
to mechanica...@googlegroups.com, Martin Thompson
Exactly - that's the point. I've checked the same versions with Caliper, and the results are totally different from the ones presented in the article:

https://microbenchmarks.appspot.com/runs/6f0ffd35-b084-471d-af3a-20de9e6b0121

Kind regards,
Kuba



Simone Bordet

Jun 13, 2013, 3:49:05 PM
to Jakub K, mechanica...@googlegroups.com, Martin Thompson
Ahh, Martin,

see what you've done ?

Now we're all running with -XX:+PrintOptoAssembly to figure out what's
happening.

:)

On Thu, Jun 13, 2013 at 9:02 PM, Jakub K <jkubr...@gmail.com> wrote:
> Exactly - that's the point. I've checked same versions by Caliper, and
> results are totally different from ones presented in the article:
>
> https://microbenchmarks.appspot.com/runs/6f0ffd35-b084-471d-af3a-20de9e6b0121
>
> Kind regards,
> Kuba

--
Simone Bordet
----
http://cometd.org
http://webtide.com
http://intalio.com
Developer advice, training, services and support
from the Jetty & CometD experts.
Intalio, the modern way to build business applications.

Kirk Pepperdine

Jun 13, 2013, 4:22:35 PM
to Jakub K, mechanica...@googlegroups.com, Martin Thompson
wow, a bad microbenchmark, what a surprise!

-- Kirk


Martin Thompson

Jun 13, 2013, 4:34:35 PM
to mechanica...@googlegroups.com, Jakub K, Martin Thompson, sbo...@intalio.com
The blog is 4 years old.  It would be interesting to see whether the JVM has changed its behaviour with regard to this in that time.  Has anyone got a JVM build from before then to test with?

If the checkcast is being optimised away, it would be good to know the scope in which that is possible.

Fascinating little example :-)

Jakub Kubrynski

Jun 13, 2013, 4:39:53 PM
to mechanica...@googlegroups.com, Jakub K, Martin Thompson, sbo...@intalio.com
As far as I know, the last big changes in this area were made in 1.6.0_14, which was released at the end of May 2009. So we have to run this benchmark on 1.6.0_13 and 1.6.0_14 to make sure they were not right ;)

Kuba

Jakub Kubrynski

Jun 13, 2013, 5:07:19 PM
to mechanica...@googlegroups.com, Jakub K, Martin Thompson, sbo...@intalio.com
Ok,

I've checked my benchmark on 1.6.0_13 - in fact the first version is about 3-5% faster, but this could be measurement inaccuracy.

Kind regards,
Kuba

Gil Tene

Jun 13, 2013, 8:36:29 PM
to mechanica...@googlegroups.com
Guys, measuring speed and seeing whether the microbenchmark numbers stayed consistent over 4 years is missing the point. I wasn't going after "this is x% slower". I was going after "there is an unavoidable checkcast operation, with the associated header access and potential extra cache miss". We can emphasize the penalty of this by using a fat (e.g. 300 byte) object and accessing a field at the end of it after the get.

The point can be seen by looking at the generated assembly code, and seeing the checkcast staring back at you. The caller is required to place a checkcast there because it is casting an Object to a specific class and is about to use the object as such. I don't care if it's for a type-specific put, a getfield, or an invoke, it can't do any of those safely without the check, and must throw an exception if the class is wrong (which can easily be made to happen in valid unchecked ways by storing a wrong object type into the collection).

So if someone can show generated assembly for this sort of thing (a get from a generic collection that had a million things deposited in it before, followed by a getfield on the thing you got) which somehow does not perform the checkcast, I'm very interested.

And BTW, AFAIK the reason null checks can be optimized away when this can't is simple: null checks are implicitly (hardware) supported on all access through virtual memory: low-address memory is protected such that an access through a null will always SEGV, and the JVMs are smart enough to catch the SEGV and realize what happened ("I was accessing this thing through a null pointer"). There is no equivalent hardware assist for detecting "I accessed a field on a wrong object type" - that just results in undetected data corruption.
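The "fat object" experiment suggested above could be set up roughly as follows. This is only a sketch of the access pattern, not a benchmark harness: the class and field names are hypothetical, the 36 padding fields are just a way to reach roughly 300 bytes, and HotSpot is free to lay fields out in a different order, so 'tail' is not guaranteed to be the last word of the object.

```java
import java.util.ArrayList;
import java.util.List;

final class FatObjectSketch {
    // 36 padding longs plus 'tail' give roughly 300 bytes of fields.
    // Assumption: the JVM may reorder fields, so 'tail' need not sit at the very end.
    static final class Fat {
        long p00, p01, p02, p03, p04, p05, p06, p07, p08;
        long p09, p10, p11, p12, p13, p14, p15, p16, p17;
        long p18, p19, p20, p21, p22, p23, p24, p25, p26;
        long p27, p28, p29, p30, p31, p32, p33, p34, p35;
        long tail;

        Fat(long tail) { this.tail = tail; }
    }

    static List<Fat> build(int n) {
        List<Fat> list = new ArrayList<Fat>(n);
        for (int i = 0; i < n; i++) {
            list.add(new Fat(i));
        }
        return list;
    }

    static long sumTails(List<Fat> src) {
        long sum = 0;
        for (int i = 0; i < src.size(); i++) {
            Fat f = src.get(i);  // checkcast Fat: reads the class word in the object header
            sum += f.tail;       // getfield: reads a word that is likely far from the header
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumTails(build(1000)));
    }
}
```

With objects this large, the header and the field read in the loop will usually sit on different cache lines, which is the situation the checkcast cost argument is about.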

Heath Borders

Jun 13, 2013, 9:33:53 PM
to mechanica...@googlegroups.com
Couldn't javac avoid the cast by moving the while body into a method?

static <T> void split(List<T> src, List<T> odd, List<T> even, int index) {
    // T erases to Object, so no cast to a concrete class is needed here
    T t = src.get(index);
    if (index % 2 == 0)
        even.add(t);
    else
        odd.add(t);
}
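A runnable variant of this helper, as a sketch with hypothetical class names: because T erases to Object inside split, an element of the wrong type passes straight through without a ClassCastException, which suggests no checkcast is executed on that path. It says nothing about what javac could do for the original inline loop.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitDemo {
    static final class MyClass { }

    // Inside the generic method T is just Object, so the local needs no cast.
    static <T> void split(List<T> src, List<T> odd, List<T> even, int index) {
        T t = src.get(index);
        if (index % 2 == 0)
            even.add(t);
        else
            odd.add(t);
    }

    // Pollutes the source list with a String and moves it through split().
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean wrongTypePassesThrough() {
        List<MyClass> src = new ArrayList<MyClass>();
        ((List) src).add("not a MyClass");
        List<MyClass> odd = new ArrayList<MyClass>();
        List<MyClass> even = new ArrayList<MyClass>();
        split(src, odd, even, 0);  // completes without a ClassCastException
        return even.size() == 1 && odd.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(wrongTypePassesThrough());
    }
}
```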

-Heath
From my iTouch5

Gil Tene

Jun 14, 2013, 1:09:09 PM
to mechanica...@googlegroups.com
Let's make it simple so we can focus on where the checkcast is sticking out like a sore thumb but is unavoidable:

   ArrayList<MyClass> aListSrc, aListDest1, aListDest2;
   ...
   // Assume lists are constructed, and that some other thread
   // populates aListSrc with a bunch of MyClass objects
   .... 
   while (idxSrc < NUMOBJS) {
      MyClass myc = aListSrc.get(idxSrc++);
      if (myc.getOddOrEven() % 2 == 0)
         aListDest1.add(myc);
      else
         aListDest2.add(myc);
   }

Knowledge of the class of myc is actually needed to access its fields or call its methods. The JVM can only do these things if it knows for sure that the type is actually a MyClass. Since the generic MyClass type in "ArrayList<MyClass> aListSrc" was erased, the JVM cannot assume the list has anything more specific than Objects in it. So even though the source code clearly indicates that things in aListSrc are all of type MyClass, the JVM can't know that. As far as the JVM is concerned, the thing it got back from myc = aListSrc.get() is only known to be an Object, and nothing more specific than that. This results in an unavoidable checkcast after the collection's get() call, and that checkcast accesses the header in myc.

Heath Borders

Jun 14, 2013, 1:18:17 PM
to Gil Tene, mechanica...@googlegroups.com

Thanks. I get why the JVM needs to pull the MyClass header if the local variable bytecode is emitted, and doubly so if you actually call a method on the local variable.

My question was why can't javac detect that the local variable is just used as Object (as in the original example, not yours where you call getOddOrEven) and emit bytecode equivalent to my static method? Since this is inside a method body, it should be safe.

--

Jean-Philippe BEMPEL

Jun 14, 2013, 4:19:20 PM
to mechanica...@googlegroups.com
Hello Guys !

I have simplified the code to be more understandable under disassembly:

Version 1:

import java.util.ArrayList;

public class TestJIT
{
    private static final int NUMOBJS = 1000*1000;
   
    static ArrayList<MyClass>  aListSrc = new ArrayList<TestJIT.MyClass>();
    static ArrayList<MyClass> aListDest1 = new ArrayList<TestJIT.MyClass>();
     
    public static void bench()
    {
        int idxSrc = 0;       
        aListDest1.add(aListSrc.get(idxSrc++));
    }
   
    public static void main(String[] args) throws Throwable
    {
        for (int i = 0; i < NUMOBJS; i++)
        {
            aListSrc.add(new MyClass());
        }
        for (int i = 0; i < 10000; i++)
        {           
            bench();
        }
        Thread.sleep(1000);
    }
   
    private static class MyClass
    {
       
    }
}

Version 2:

    public static void bench()
    {
        int idxSrc = 0;
        MyClass myc = aListSrc.get(idxSrc++);
        aListDest1.add(myc);
    }


Here is the PrintAssembly output in i386 (yeah, I am on Windows, and having a hard time building the amd64 version of hsdis :/)
Note: I have used -XX:-Inline in order to make it more readable. I do not think it fundamentally changes the fact that there is a checkcast anyway.

Version 1:

  # {method} 'bench' '()V' in 'com/bempel/sandbox/TestJIT'
  #           [sp+0x20]  (sp of caller)
  0x02529f60: mov    DWORD PTR [esp-0x3000],eax
  0x02529f67: push   ebp
  0x02529f68: sub    esp,0x18           ;*synchronization entry
                                        ; - com.bempel.sandbox.TestJIT::bench@-1 (line 26)
  0x02529f6e: mov    ebx,0x150
  0x02529f73: mov    ecx,DWORD PTR [ebx+0x568c588]
                                        ;*getstatic aListSrc
                                        ; - com.bempel.sandbox.TestJIT::bench@5 (line 27)
                                        ;   {oop('com/bempel/sandbox/TestJIT')}
  0x02529f79: test   ecx,ecx
  0x02529f7b: je     0x02529fb3         ;*invokevirtual get
                                        ; - com.bempel.sandbox.TestJIT::bench@12 (line 27)
  0x02529f7d: xor    edx,edx
  0x02529f7f: mov    ebx,0x154
  0x02529f84: mov    ebp,DWORD PTR [ebx+0x568c588]
                                        ;*getstatic aListDest1
                                        ; - com.bempel.sandbox.TestJIT::bench@2 (line 27)
                                        ;   {oop('com/bempel/sandbox/TestJIT')}
  0x02529f8a: nop
  0x02529f8b: call   0x0250d040         ; OopMap{ebp=Oop off=48}
                                        ;*invokevirtual get
                                        ; - com.bempel.sandbox.TestJIT::bench@12 (line 27)
                                        ;   {optimized virtual_call}
  0x02529f90: mov    ebx,DWORD PTR [eax+0x4]  ; implicit exception: dispatches to 0x02529ff0
  0x02529f93: cmp    ebx,0x568cc10      ;   {oop('com/bempel/sandbox/TestJIT$MyClass')}
  0x02529f99: jne    0x02529fd1         ;*checkcast
                                        ; - com.bempel.sandbox.TestJIT::bench@15 (line 27)
  0x02529f9b: test   ebp,ebp
  0x02529f9d: je     0x02529fc1
  0x02529f9f: mov    ecx,ebp
  0x02529fa1: mov    edx,eax
  0x02529fa3: call   0x0250d040         ; OopMap{off=72}
                                        ;*invokevirtual add
                                        ; - com.bempel.sandbox.TestJIT::bench@18 (line 27)
                                        ;   {optimized virtual_call}
  0x02529fa8: add    esp,0x18
  0x02529fab: pop    ebp
  0x02529fac: test   DWORD PTR ds:0x180000,eax
                                        ;   {poll_return}
  0x02529fb2: ret   
  0x02529fb3: mov    ecx,0xfffffff6
  0x02529fb8: xchg   ax,ax
  0x02529fbb: call   0x0250c700         ; OopMap{off=96}
                                        ;*invokevirtual get
                                        ; - com.bempel.sandbox.TestJIT::bench@12 (line 27)
                                        ;   {runtime_call}
  0x02529fc0: int3                      ;*invokevirtual get
                                        ; - com.bempel.sandbox.TestJIT::bench@12 (line 27)
  0x02529fc1: mov    ecx,0xfffffff6
  0x02529fc6: mov    ebp,eax
  0x02529fc8: xchg   ax,ax
  0x02529fcb: call   0x0250c700         ; OopMap{ebp=Oop off=112}
                                        ;*invokevirtual add
                                        ; - com.bempel.sandbox.TestJIT::bench@18 (line 27)
                                        ;   {runtime_call}
  0x02529fd0: int3                      ;*invokevirtual add
                                        ; - com.bempel.sandbox.TestJIT::bench@18 (line 27)
  0x02529fd1: mov    ecx,0xffffffde
  0x02529fd6: mov    DWORD PTR [esp],eax
  0x02529fd9: xchg   ax,ax
  0x02529fdb: call   0x0250c700         ; OopMap{ebp=Oop [0]=Oop off=128}
                                        ;*checkcast
                                        ; - com.bempel.sandbox.TestJIT::bench@15 (line 27)
                                        ;   {runtime_call}
  0x02529fe0: int3                      ;*invokevirtual get
                                        ; - com.bempel.sandbox.TestJIT::bench@12 (line 27)
  0x02529fe1: mov    ecx,eax
  0x02529fe3: jmp    0x02529fe7
  0x02529fe5: mov    ecx,eax            ;*invokevirtual add
                                        ; - com.bempel.sandbox.TestJIT::bench@18 (line 27)
  0x02529fe7: add    esp,0x18
  0x02529fea: pop    ebp
  0x02529feb: jmp    0x02528440         ;   {runtime_call}
  0x02529ff0: mov    ecx,0xfffffff4
  0x02529ff5: xchg   ax,ax
  0x02529ff7: call   0x0250c700         ; OopMap{ebp=Oop off=156}
                                        ;*checkcast
                                        ; - com.bempel.sandbox.TestJIT::bench@15 (line 27)
                                        ;   {runtime_call}
  0x02529ffc: int3                      ;*checkcast
                                        ; - com.bempel.sandbox.TestJIT::bench@15 (line 27)
  0x02529ffd: hlt   
  0x02529ffe: hlt   
  0x02529fff: hlt   
[Stub Code]
  0x0252a000: mov    ebx,0x0            ;   {no_reloc}
  0x0252a005: jmp    0x0252a005         ;   {runtime_call}
  0x0252a00a: mov    ebx,0x0            ;   {static_stub}
  0x0252a00f: jmp    0x0252a00f         ;   {runtime_call}
[Exception Handler]
  0x0252a014: jmp    0x02527180         ;   {runtime_call}
[Deopt Handler Code]
  0x0252a019: push   0x252a019          ;   {section_word}
  0x0252a01e: jmp    0x0250da40         ;   {runtime_call}
[Constants]
  0x0252a023: .byte 0x0


Version 2:

  # {method} 'bench' '()V' in 'com/bempel/sandbox/TestJIT'
  #           [sp+0x10]  (sp of caller)
  0x025778e0: mov    DWORD PTR [esp-0x3000],eax
  0x025778e7: push   ebp
  0x025778e8: sub    esp,0x8            ;*synchronization entry
                                        ; - com.bempel.sandbox.TestJIT::bench@-1 (line 26)
  0x025778ee: mov    ebx,0x150
  0x025778f3: mov    ecx,DWORD PTR [ebx+0x56dc5e0]
                                        ;*getstatic aListSrc
                                        ; - com.bempel.sandbox.TestJIT::bench@2 (line 27)
                                        ;   {oop('com/bempel/sandbox/TestJIT')}
  0x025778f9: test   ecx,ecx
  0x025778fb: je     0x02577933
  0x025778fd: xor    edx,edx
  0x025778ff: call   0x0255d040         ; OopMap{off=36}
                                        ;*invokevirtual get
                                        ; - com.bempel.sandbox.TestJIT::bench@9 (line 27)
                                        ;   {optimized virtual_call}
  0x02577904: mov    ecx,DWORD PTR [eax+0x4]  ; implicit exception: dispatches to 0x02577970
  0x02577907: cmp    ecx,0x56dcc38      ;   {oop('com/bempel/sandbox/TestJIT$MyClass')}
  0x0257790d: jne    0x02577951         ;*checkcast
                                        ; - com.bempel.sandbox.TestJIT::bench@12 (line 27)
  0x0257790f: mov    ecx,0x154
  0x02577914: mov    ecx,DWORD PTR [ecx+0x56dc5e0]
                                        ;*getstatic aListDest1
                                        ; - com.bempel.sandbox.TestJIT::bench@16 (line 28)
                                        ;   {oop('com/bempel/sandbox/TestJIT')}
  0x0257791a: test   ecx,ecx
  0x0257791c: je     0x02577941
  0x0257791e: mov    edx,eax
  0x02577920: xchg   ax,ax
  0x02577923: call   0x0255d040         ; OopMap{off=72}
                                        ;*invokevirtual add
                                        ; - com.bempel.sandbox.TestJIT::bench@20 (line 28)
                                        ;   {optimized virtual_call}
  0x02577928: add    esp,0x8
  0x0257792b: pop    ebp
  0x0257792c: test   DWORD PTR ds:0x180000,eax
                                        ;   {poll_return}
  0x02577932: ret   
  0x02577933: mov    ecx,0xfffffff6
  0x02577938: xchg   ax,ax
  0x0257793b: call   0x0255c700         ; OopMap{off=96}
                                        ;*invokevirtual get
                                        ; - com.bempel.sandbox.TestJIT::bench@9 (line 27)
                                        ;   {runtime_call}
  0x02577940: int3                      ;*invokevirtual get
                                        ; - com.bempel.sandbox.TestJIT::bench@9 (line 27)
  0x02577941: mov    ecx,0xfffffff6
  0x02577946: mov    ebp,eax
  0x02577948: xchg   ax,ax
  0x0257794b: call   0x0255c700         ; OopMap{ebp=Oop off=112}
                                        ;*invokevirtual add
                                        ; - com.bempel.sandbox.TestJIT::bench@20 (line 28)
                                        ;   {runtime_call}
  0x02577950: int3                      ;*invokevirtual add
                                        ; - com.bempel.sandbox.TestJIT::bench@20 (line 28)
  0x02577951: mov    ecx,0xffffffde
  0x02577956: mov    ebp,eax
  0x02577958: xchg   ax,ax
  0x0257795b: call   0x0255c700         ; OopMap{ebp=Oop off=128}
                                        ;*checkcast
                                        ; - com.bempel.sandbox.TestJIT::bench@12 (line 27)
                                        ;   {runtime_call}
  0x02577960: int3                      ;*invokevirtual get
                                        ; - com.bempel.sandbox.TestJIT::bench@9 (line 27)
  0x02577961: mov    ecx,eax
  0x02577963: jmp    0x02577967
  0x02577965: mov    ecx,eax            ;*invokevirtual add
                                        ; - com.bempel.sandbox.TestJIT::bench@20 (line 28)
  0x02577967: add    esp,0x8
  0x0257796a: pop    ebp
  0x0257796b: jmp    0x02578440         ;   {runtime_call}
  0x02577970: mov    ecx,0xfffffff4
  0x02577975: xchg   ax,ax
  0x02577977: call   0x0255c700         ; OopMap{off=156}
                                        ;*checkcast
                                        ; - com.bempel.sandbox.TestJIT::bench@12 (line 27)
                                        ;   {runtime_call}
  0x0257797c: int3                      ;*checkcast
                                        ; - com.bempel.sandbox.TestJIT::bench@12 (line 27)
  0x0257797d: hlt   
  0x0257797e: hlt   
  0x0257797f: hlt   
[Stub Code]
  0x02577980: mov    ebx,0x0            ;   {no_reloc}
  0x02577985: jmp    0x02577985         ;   {runtime_call}
  0x0257798a: mov    ebx,0x0            ;   {static_stub}
  0x0257798f: jmp    0x0257798f         ;   {runtime_call}
[Exception Handler]
  0x02577994: jmp    0x02577180         ;   {runtime_call}
[Deopt Handler Code]
  0x02577999: push   0x2577999          ;   {section_word}
  0x0257799e: jmp    0x0255da40         ;   {runtime_call}
[Constants]
  0x025779a3: .byte 0x0

In conclusion, both versions contain the following:

  0x02529f90: mov    ebx,DWORD PTR [eax+0x4]  ; implicit exception: dispatches to 0x02529ff0
  0x02529f93: cmp    ebx,0x568cc10      ;   {oop('com/bempel/sandbox/TestJIT$MyClass')}
  0x02529f99: jne    0x02529fd1         ;*checkcast

which is basically an instanceof. So I do not see why the performance would be different or worse in either case.

Cheers,
Jean-Philippe

Simone Bordet

Jun 14, 2013, 5:02:56 PM
to mechanica...@googlegroups.com
Hi,

On Fri, Jun 14, 2013 at 10:19 PM, Jean-Philippe BEMPEL
<jpbe...@gmail.com> wrote:
> In conclusion, both versions contain the following:
>
>   0x02529f90: mov    ebx,DWORD PTR [eax+0x4]  ; implicit exception: dispatches to 0x02529ff0
>   0x02529f93: cmp    ebx,0x568cc10      ;   {oop('com/bempel/sandbox/TestJIT$MyClass')}
>   0x02529f99: jne    0x02529fd1         ;*checkcast
>
> which is basically an instanceof. So I do not see why the performance would be
> different or worse in either case.

I admit I am not a JIT guru, but I can't see the header access ?
Seems all registers and constants to me, but then again I'm not sure I
understand the assembly right (or if it is the right spot to look at).

Thanks!

Gil Tene

Jun 14, 2013, 7:01:33 PM
to mechanica...@googlegroups.com
The very first instruction in the sequence (mov ebx,DWORD PTR [eax+0x4]) is the load of the klass field from the header.
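For readers less used to assembly, the three-instruction sequence (load the class word, compare, branch) can be modelled in Java roughly as below. This is only an analogy under simplifying assumptions: the names are hypothetical, MyClass is final so an exact class comparison is a complete test, and the real compiled code branches to a JVM runtime call rather than throwing directly.

```java
final class CheckcastModel {
    static final class MyClass { }

    // Rough Java-level model of the compiled checkcast for a class with no subclasses.
    static Object checkcastMyClass(Object o) {
        if (o == null) {
            return null;  // checkcast lets null through
        }
        // Corresponds to: mov ebx,[eax+0x4] (load class word) ; cmp ebx,<MyClass> ; jne <slow path>
        if (o.getClass() == MyClass.class) {
            return o;
        }
        // Slow path: in this model we simply throw.
        throw new ClassCastException(o.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(checkcastMyClass(new MyClass()) != null);
    }
}
```

The o.getClass() call is the part that has to touch the object header, which is the memory access being discussed in this thread.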

Simone Bordet

Jun 14, 2013, 7:09:58 PM
to Gil Tene, mechanica...@googlegroups.com


On 15 Jun 2013 01:01, "Gil Tene" <g...@azulsystems.com> wrote:
>
> The very first instruction in the sequence (mov ebx,DWORD PTR [eax+0x4]) is the load of the klass field from the header.

Thanks !

Michael Barker

Jun 14, 2013, 7:34:48 PM
to Simone Bordet, Gil Tene, mechanica...@googlegroups.com
BTW, there is a simple (and a little bit evil) way to avoid the checkcast:

static ArrayList<MyClass> aListSrc = new ArrayList<TestJIT.MyClass>();
static ArrayList<MyClass> aListDest1 = new ArrayList<TestJIT.MyClass>();
static ArrayList aListDest2 = aListDest1;

public static void bench()
{
    int idxSrc = 0;
    aListDest2.add(aListSrc.get(idxSrc++));
}

Obviously you'll get a warning and like most micro-optimisations,
probably pointless for most code. However, it was well encapsulated
and the method in question was at the top of the profiler hot list...

Note, I didn't actually check the assembler, but in the bytecode the
checkcast is removed:

Old:

public static void bench();
  Code:
     0: iconst_0
     1: istore_0
     2: getstatic     #24  // Field aListDest1:Ljava/util/ArrayList;
     5: getstatic     #22  // Field aListSrc:Ljava/util/ArrayList;
     8: iload_0
     9: iinc          0, 1
    12: invokevirtual #32  // Method java/util/ArrayList.get:(I)Ljava/lang/Object;
    15: checkcast     #36  // class TestJIT$MyClass
    18: invokevirtual #38  // Method java/util/ArrayList.add:(Ljava/lang/Object;)Z
    21: pop
    22: return

Updated:

public static void bench();
  Code:
     0: iconst_0
     1: istore_0
     2: getstatic     #27  // Field aListDest2:Ljava/util/ArrayList;
     5: getstatic     #23  // Field aListSrc:Ljava/util/ArrayList;
     8: iload_0
     9: iinc          0, 1
    12: invokevirtual #35  // Method java/util/ArrayList.get:(I)Ljava/lang/Object;
    15: invokevirtual #39  // Method java/util/ArrayList.add:(Ljava/lang/Object;)Z
    18: pop
    19: return

Mike.

Gil Tene

Jun 14, 2013, 11:38:10 PM
to mechanica...@googlegroups.com, Simone Bordet, Gil Tene
;-)

You can get the same without using generics at all, of course. But even then, when you actually want to manipulate methods or fields of something you got out of a (generic or non-generic) collection, you still cause an inevitable [implicit or explicit] cast and carry the runtime cost of a checkcast. That's sort of the point: generics (due to type erasure and types effectively not being visible at runtime) still leave you with the cost of casting, unlike the more strongly typed, non-generic object arrays that Java has had from the start, which don't require runtime casting. This represents one of the key gaps in Java's runtime cost of accessing objects in collections (compared to environments where strong typing is carried through in some way that avoids runtime checkcasts, like those that support reified generics).

The point I was going for at the start of this was that we "wish" (and often assume) that:

ArrayList<MyClass> collection;
MyClass myc;
...
myc = collection.get(index);
myc.myMethod();

would have virtually the same runtime cost as: 

MyClass[] array;
MyClass myc;
...
myc = array[index];
myc.myMethod();

Especially when the get() happens in a loop of some sort. But instead, the first case has a cast on each get, which carries the cost of an extra memory access and quite potentially an extra cache miss per get() call.
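The difference between the two cases can be seen from where the type check happens. In the sketch below (hypothetical class names), the array knows its element type at runtime and rejects a wrong element when it is stored, so loads need no check; the generic list accepts the wrong element and only fails at the checkcast on the way out.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayVsListDemo {
    static class MyClass {
        int myMethod() { return 1; }
    }

    // Arrays are checked on the store, so a later load can be trusted.
    static boolean arrayRejectsAtStore() {
        Object[] objs = new MyClass[1];  // the array carries its element type at runtime
        try {
            objs[0] = "not a MyClass";
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
    }

    // The erased list takes anything; the failure surfaces at the cast after get().
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean listRejectsAtRead() {
        List<MyClass> list = new ArrayList<MyClass>();
        ((List) list).add("not a MyClass");  // accepted without any check
        try {
            list.get(0).myMethod();  // checkcast MyClass runs before the call
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(arrayRejectsAtStore());
        System.out.println(listRejectsAtRead());
    }
}
```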

Michael Barker

Jun 15, 2013, 12:22:05 AM
to Gil Tene, mechanica...@googlegroups.com, Simone Bordet
Also I suspect that the reason why there used to be a marked
difference between the 2 versions on the original blog is that there
was a bug in javac that meant for version 1 it erroneously omitted the
checkcast byte code (just a guess though). So in up to date versions
of the JDK they're equally bad.

Mike.

Dr Heinz M. Kabutz

Jun 15, 2013, 1:19:24 AM
to Michael Barker, Gil Tene, mechanica...@googlegroups.com, Simone Bordet
Gil, you talk a lot about the cost of the checkcast, but isn't it just
the object load that is killing the performance? Thus this is a
rather contrived example (although fascinating), because in most cases
we will want to do something with an object, which means we'd have to
load the object anyway.
--
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java(tm) Specialists' Newsletter"
Sun Java Champion
IEEE Certified Software Development Professional
http://www.javaspecialists.eu
Tel: +30 69 75 595 262
Skype: kabutz

Gil Tene

Jun 15, 2013, 1:57:29 AM
to Dr Heinz M. Kabutz, Michael Barker, mechanica...@googlegroups.com, Simone Bordet
For any object larger than 1-2 cache lines in size, the checkcast will always cause an extra cache line to be loaded and missed on. Even for trivially sized objects (e.g. Long) you'll end up with the header and body not being in the same cache line one time out of 4.

Sent from my iPad

Gil Tene

Jun 15, 2013, 2:08:27 AM
to Dr Heinz M. Kabutz, Michael Barker, mechanica...@googlegroups.com, Simone Bordet
To clarify: I'm pointing this out in the context of the common "the header isn't being looked at" code patterns. These include practically all field access (either directly or via getter/setter methods), and a large majority of method calls (all the ones that manage to bypass runtime checks, including static or final methods, and all CHA optimized virtual methods). In this context, the checkcast is additional memory access to a line that in many cases (dynamically counting) would not be otherwise looked at.

Sent from my iPad

Mani Sarkar

Jun 15, 2013, 11:37:35 AM
to mechanica...@googlegroups.com
Hi Jean-Philippe,

I hope I'm not misreading your example, but the block of code below only adds the first object from aListSrc to aListDest1 each time bench() is called, as you are setting idxSrc to 0 and incrementing by 1:

    public static void bench()
    {
        int idxSrc = 0;        
        aListDest1.add(aListSrc.get(idxSrc++));
    }

Is that irrelevant here? Don't you intend to add all the 1m objects from aListSrc into aListDest1? Is this construct serving a different purpose?

Regards,
mani