False sharing, cache line padding and field reordering

Alexandru Nedelcu

Jun 27, 2014, 6:14:46 AM
to mechanica...@googlegroups.com

Hi all,

I just read a nice article on false sharing by Nitsan Wakart:
http://psy-lob-saw.blogspot.ro/2014/06/notes-on-false-sharing.html?m=1

I asked whether the fields added for padding should be volatile, because from what I understand the JVM can reorder fields. Nitsan replied that the only barrier to field reordering is inheritance.

Martin Thompson does use volatile for padding in one of his articles, and I couldn’t find a reliable reference on how field reordering works other than samples in blog posts.

The use of inheritance has left me scratching my head for a bit. Let's take this example:

private volatile long p1, p2, p3, p4, p5, p6 = 7L;

long myPaddedVar;

Questions:

  1. do those fields need to be inherited, or does the above work?
  2. do those fields used for padding need to be volatile?
  3. do those fields used for padding need to be public or will private do? This of course can be problematic as unused private fields could get eliminated, but I was under the impression that volatile private fields don’t get eliminated.
  4. the size of a cache line depends on the architecture, with 64 bytes being common for x86-64, but wouldn’t it make sense to assume a cache line length of 128 bytes?

Thanks,

--
Alexandru Nedelcu
www.bionicspirit.com

PGP Public Key:
https://bionicspirit.com/key.aexpk

Alexandru Nedelcu

Jun 27, 2014, 6:19:18 AM
to mechanica...@googlegroups.com
Also to add - I have no idea how to test and ensure that whatever I do doesn't lead to those fields being eliminated or reordered, other than to maybe run a benchmark. If I look at the generated bytecode, is that enough, or can this stuff happen at runtime? If you could give me some tips here, that would be great.

Thanks,

Martin Thompson

Jun 27, 2014, 6:29:55 AM
to mechanica...@googlegroups.com
A lot of the examples you can find online are experiments from over the years in trying to avoid false sharing. Some are failed experiments, as various JVMs changed their header and field layouts; others just failed due to a lack of understanding. I've found it even more variable with the non-Hotspot JVMs. Some early experiments tried using volatile and setting one of the fields to play on memory model semantics.

The only reasonably reliable method I've found to order field groups, not individual fields within a class, is to use inheritance. For practical layout reasons a base class cannot have its fields reordered with a sub-class's fields. Otherwise how would we treat a sub-class as a base-class via a base-class reference?
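
A minimal sketch of that inheritance trick, assuming HotSpot lays out superclass fields ahead of subclass fields. The class names (LhsPadding, Value, RhsPadding, PaddedLong) are made up for illustration, and every field is a long so that nothing smaller can be slotted into an alignment gap:

```java
// Hypothetical names; the layout is a HotSpot implementation detail,
// not something the Java language specification guarantees.

// 7 x 8 = 56 bytes of padding placed ahead of the hot field.
class LhsPadding {
    long p1, p2, p3, p4, p5, p6, p7;
}

// The field that should not share a cache line with other hot data.
class Value extends LhsPadding {
    volatile long value;
}

// 56 bytes of padding placed after the hot field.
class RhsPadding extends Value {
    long p9, p10, p11, p12, p13, p14, p15;
}

final class PaddedLong extends RhsPadding {
    long get() {
        return value;
    }

    void set(long v) {
        value = v;
    }

    public static void main(String[] args) {
        PaddedLong counter = new PaddedLong();
        counter.set(42L);
        System.out.println(counter.get());
    }
}
```

Seven longs on each side target a 64-byte line. It is worth checking the resulting layout on the JVM you actually run rather than trusting the source code.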

To discover field layout at runtime you could use Unsafe. A tool that wraps up this approach for you is JOL (Java Object Layout).
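
The Unsafe route can be sketched roughly as below. This assumes sun.misc.Unsafe is reachable through its private theUnsafe field and that objectFieldOffset is still supported, which holds on the JVMs of this era but is deprecated on recent JDKs. LayoutBase, LayoutDerived and FieldOffsets are hypothetical names used only for the demonstration:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import sun.misc.Unsafe;

// Hypothetical two-level hierarchy, used only to have something to inspect.
class LayoutBase {
    long a;
}

class LayoutDerived extends LayoutBase {
    long b;
}

final class FieldOffsets {
    private static final Unsafe UNSAFE = loadUnsafe();

    private FieldOffsets() {
    }

    // Unsafe has no public constructor; read its private singleton reflectively.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    // Offset in bytes of an instance field from the start of the object.
    static long offsetOf(Class<?> owner, String fieldName) {
        try {
            return UNSAFE.objectFieldOffset(owner.getDeclaredField(fieldName));
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Prints offset and name for every instance field, walking up the hierarchy.
    static void printLayout(Class<?> type) {
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    System.out.println(UNSAFE.objectFieldOffset(f) + "\t"
                            + c.getSimpleName() + "." + f.getName());
                }
            }
        }
    }

    public static void main(String[] args) {
        printLayout(LayoutDerived.class);
    }
}
```

The printed offsets depend on the JVM, its bitness and flags such as compressed oops, so treat them as observations about one configuration rather than a contract.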


If only we could have @Contended available to 3rd party libraries then we could avoid such shenanigans.

Martin...

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Vitaly Davidovich

Jun 27, 2014, 7:50:34 AM
to mechanica...@googlegroups.com

I see Martin has given you some insight already, but just to answer a few of your other questions:

3) private fields are not eliminated/removed, so padding can stay private.

4) some processors use 2 (adjacent) cache lines as the coherence unit so writing to one line will invalidate the other as well.  So if you wanted to cater to all sorts of architectures and play it safe in general, 128 byte padding would be it; for practical purposes and not to waste memory, 64 bytes should be sufficient.
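
To put numbers on that trade-off, here is a small sketch of the arithmetic, assuming long padding fields and an 8-byte aligned long value. PaddingMath and its methods are illustrative names, not part of any library:

```java
final class PaddingMath {
    private PaddingMath() {
    }

    // Number of long fields needed on EACH side of a long value so that no other
    // data can fall into the same isolationBytes-sized aligned block as the value.
    // The value itself fills one 8-byte slot, hence the "- 1".
    static int longsPerSide(int isolationBytes) {
        if (isolationBytes < Long.BYTES || isolationBytes % Long.BYTES != 0) {
            throw new IllegalArgumentException("expected a positive multiple of 8");
        }
        return isolationBytes / Long.BYTES - 1;
    }

    // Bytes of instance fields used by padding plus the value, excluding the object header.
    static int totalBytes(int isolationBytes) {
        return (2 * longsPerSide(isolationBytes) + 1) * Long.BYTES;
    }

    public static void main(String[] args) {
        for (int bytes : new int[] {64, 128}) {
            System.out.println(bytes + "-byte isolation: " + longsPerSide(bytes)
                    + " longs per side, " + totalBytes(bytes) + " bytes of fields");
        }
    }
}
```

Under these assumptions, 64-byte isolation costs 7 longs per side (120 bytes of fields) and 128-byte isolation costs 15 per side (248 bytes), which is the memory cost being weighed here.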

Sent from my phone


Alexandru Nedelcu

Jun 27, 2014, 8:08:59 AM
to mechanica...@googlegroups.com
Thanks a lot Martin and Vitaly, this helps a lot.

It would be cool if @Contended also happened as a compiler plugin for older/current Java versions, otherwise it will be a long time before we start using it.

Martin Thompson

Jun 27, 2014, 8:34:02 AM
to mechanica...@googlegroups.com
On 27 June 2014 13:08, Alexandru Nedelcu <al...@bionicspirit.com> wrote:

It would be cool if @Contended also happened as a compiler plugin for older/current Java versions, otherwise it will be a long time before we start using it.

Backwards compatibility is an interesting problem. It is good to target these things when other beneficial language features come along. For example, I can see benefits in targeting a Java 8 build that uses things like method handles and @Contended, if publicly available. We can then have a build for Java 8 that makes it compelling over previous versions and encourages migration. Maybe by Java 9 we can have a nice workable alternative to Unsafe, and the likes of @Contended could be available at the same time.

Sometimes the reasons we use the likes of Unsafe and other low-level things are subtle. For example, I wanted to avoid the use of Unsafe in the Disruptor but had to give in and use it because of too many concurrency bugs in AtomicLongFieldUpdater across the different JVMs; Unsafe was more reliable. Many think it was added for performance, but that was not the case. Now many of these bugs are fixed, but some people are on old JVMs, so dropping that support is tricky.

As we go forward with Unsafe and the like, it will be important to consider backwards compatibility if we are to be successful with migration to new JVMs.

Kirk Pepperdine

Jun 27, 2014, 9:04:29 AM
to mechanica...@googlegroups.com
When to migrate.. common problems… at some point you have to cut the strings… but it’s tricky to decide when that point is….

Regards,
Kirk

Martin Thompson

Jun 27, 2014, 9:07:33 AM
to mechanica...@googlegroups.com
It is a really good point that I think I rambled around too much in that last post.

What I meant to say is that it can be the right time when other compelling and related things happen at a language level that are worth targeting together.

Aleksey Shipilev

Jun 27, 2014, 10:05:18 AM
to mechanica...@googlegroups.com
On 06/27/2014 02:29 PM, Martin Thompson wrote:
> The only reasonably reliable method I've found to order field groups,
> not individual fields within a class, is to use inheritance. For
> practical layout reasons a base class cannot have its fields reordered
> with a sub-class's fields. Otherwise how would we treat a sub-class as a
> base-class via a base-class reference?

The synthetic example is here in JOL samples:
http://hg.openjdk.java.net/code-tools/jol/file/b4bc510cbad0/jol-samples/src/main/java/org/openjdk/jol/samples/JOLSample_05_InheritanceBarrier.java

...and practical performance bench in JMH samples:
http://hg.openjdk.java.net/code-tools/jmh/file/251f914ff0c1/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_22_FalseSharing.java

> If only we could have @Contended available to 3rd party libraries then
> we could avoid such shenanigans.

-XX:-RestrictContended :]

-Aleksey.

Darach Ennis

Jun 27, 2014, 10:06:45 AM
to mechanica...@googlegroups.com
-XX:-RestrictContended \o/


Howard Chu

Jun 27, 2014, 10:06:58 AM
to mechanica...@googlegroups.com


On Friday, June 27, 2014 4:50:34 AM UTC-7, Vitaly Davidovich wrote:

4) some processors use 2 (adjacent) cache lines as the coherence unit so writing to one line will invalidate the other as well.  So if you wanted to cater to all sorts of architectures and play it safe in general, 128 byte padding would be it; for practical purposes and not to waste memory, 64 bytes should be sufficient.

Itanium used 128 byte cache lines. In my code we made cacheline size a compile-time conditional because of this but now that Itanium has been abandoned it's less of an issue.

Vitaly Davidovich

Jun 27, 2014, 10:25:24 AM
to mechanica...@googlegroups.com

There's also the case where the line *is* 64 bytes but adjacent line prefetching is used, which will induce false sharing effects on writes within a 128-byte region.

Sent from my phone


Francis Stephens

Jun 27, 2014, 10:28:30 AM
to mechanica...@googlegroups.com
Vitaly, can you list a few CPU arches which employ this technique?

Vitaly Davidovich

Jun 27, 2014, 10:38:56 AM
to mechanica...@googlegroups.com
I think some Intel chips do it: https://software.intel.com/en-us/blogs/2009/08/24/what-you-need-to-know-about-prefetching. Specifically, the 3rd paragraph (emphasis mine):

Hardware prefetching is implemented by your processor and will be different depending on which processor you use. Most recent Intel processors have several different hardware prefetchers. The Core™ i7 processor and Xeon® 5500 series processors, for example, have some prefetchers that bring data into the L1 cache and some that bring data into the L2. There are also different algorithms – some monitor data access patterns for a particular cache and then try to predict what addresses will be needed in the future. Others use simpler algorithms, such as fetching 2 adjacent cache lines. The pattern matching and detection algorithms used by the set of hardware prefetchers on the Core i7 and Xeon 5500 is improved from our last generation, and we continue to optimize these algorithms with each new processor architecture. 

I believe this is part of the reason @Contended pads to 128 bytes, despite most common architectures (at least Hotspot supported ones) using 64 bytes.

Aleksey may have more details on this.



Nitsan Wakart

Jun 27, 2014, 11:09:06 AM
to mechanica...@googlegroups.com
Adjacent cache line prefetching is a problem I hit at some point but never got round to blogging about. As Vitaly points out, it makes 64-byte cache lines act like 128-byte ones, so in my code I pad to 128. This wastes memory, but if the padding is not fetched then it does not hurt CPU cache utilisation more than normal padding.

The post links to a previous post on the memory layout topic:
http://psy-lob-saw.blogspot.com/2013/05/know-thy-java-object-memory-layout.html
which covers the motivation for using the inheritance method and demonstrates the use of JOL to discover object layout. A worthwhile exercise is running JOL for the same object on several configurations, e.g. 32bit/64bit with and without compressedOops. (In my latest play with JOL I also tried out the experimental estimator. It is still experimental; you can't rely on it being the same as actually testing in the different modes.)

JOL has evolved since and now also lets you print out the graph layout in memory of a given object. A recommended exercise for all who believe that objects allocated together (i.e. in program order) end up next to each other in memory ;-).

-XX:-RestrictContended is great, but not something I would rely on for library code unless you mean to discover the flag at runtime and throw/log an error when all your padding is gone.
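
A rough sketch of such a runtime check, assuming it is enough to look at the arguments the JVM reports about itself through the standard RuntimeMXBean. ContendedCheck and restrictionLifted are hypothetical names, and the check only understands the literal flag spellings shown:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class ContendedCheck {
    private ContendedCheck() {
    }

    // True if the reported JVM arguments switch RestrictContended off.
    // The last occurrence of the flag wins, mirroring how repeated -XX flags behave.
    static boolean restrictionLifted(List<String> jvmArgs) {
        boolean lifted = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:-RestrictContended")) {
                lifted = true;
            } else if (arg.equals("-XX:+RestrictContended")) {
                lifted = false;
            }
        }
        return lifted;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (!restrictionLifted(jvmArgs)) {
            // A library could log like this, or throw, depending on how much it relies on padding.
            System.err.println("WARNING: -XX:-RestrictContended not set; @Contended padding is likely ignored");
        }
    }
}
```

This only tells you what was passed on the command line, not what layout the JVM actually produced, so it complements rather than replaces a layout check.
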
Thanks for reading :-)

Martin Thompson

Jun 27, 2014, 1:26:48 PM
to mechanica...@googlegroups.com
Unfortunately you cannot rely on this if producing a library. The default should have been reversed for usability. It is a shame the "security" reason for this was not shared. Security via obscurity is not a great way to go.