StructuredArray and ObjectLayout


Gil Tene

unread,
Jul 18, 2013, 3:09:59 AM7/18/13
to mechanica...@googlegroups.com
Martin and I have been working together on something we've been calling a StructuredArray, which is part of an "ObjectLayout" project that is now up on github. I think the concept is becoming mature enough to have other people start throwing mud at it. So here goes...

First, a bit of background:

StructuredArray originated with a friendly argument Martin and I were having some time in 2012. Martin was rightfully complaining about Java's inability to represent the common "array of structs" data layout available in C/C++/C# and other languages, and the inefficiency of array objects and all other available collections that arises directly from both indirection costs and the lack of a striding pattern to stream on. Martin was advocating the common position that some form of structs (and arrays of them) needs to be added to the language as first-class citizens. I was taking the contrarian view (as I often do for fun), claiming, "instinctively," that there has to be a way to address this problem through carefully stated semantics captured in regular "vanilla" Java class form, which could then be intrinsified by future JDK versions but would not require ANY language changes. After all, java.util.concurrent and things like AtomicLong evolved in exactly that way in the pre-Java-5 days. My take was that rather than a new struct thing, we can get regular Java objects to play the needed role, and that the actual solution needs to come in the form of restricting semantics (through carefully selected Java class contracts), not expanding them with new language features or JVM specs.

Somehow, we both decided to actually explore this notion. Martin started off by capturing StructuredArray in his code examples area on github, and I went off and brainstormed with other Azul engineers on how a JVM would/could intrinsify something that would provide the needed value. Together, we evolved StructuredArray over several months, discovering fundamental issues that needed to be worked around as we went, as well as fundamental properties that make StructuredArray much more powerful and useful than an array of structs could ever be (IMHO).

For example, we deduced early on that a StructuredArray should not have a public constructor, with instances created by static factory methods instead. This distinction is driven by a fundamental issue: in Java, all constructed objects have their allocation site (the new bytecode) occur before their initialization and construction code ever sees its parameters. This would make it "hard" to intrinsify the construction of a variable-sized object like a StructuredArray, as the needed allocation size is not known at "new" time. A factory method, on the other hand, can easily be intrinsified such that allocation can consider the parameters that control the array size.
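The factory-method shape described above can be sketched in vanilla Java roughly like this (the class and method names here are illustrative, not the actual ObjectLayout API, and the int-sized Object[] backing store is just the obvious vanilla fallback):

```java
// Sketch only: a variable-sized container whose size-determining parameters
// are visible to the factory before any allocation happens, which is exactly
// what a public constructor cannot offer.
public class StructuredArrayish<T> {
    private final Object[] elements; // vanilla backing store; an intrinsified
                                     // JVM could lay members out flat instead

    // No public constructor: by the time we get here, the factory has already
    // seen the length, so an intrinsified allocation could size itself.
    private StructuredArrayish(Class<T> elementClass, long length) throws Exception {
        elements = new Object[(int) length]; // vanilla impl limited to int sizes
        for (int i = 0; i < length; i++) {
            elements[i] = elementClass.getDeclaredConstructor().newInstance();
        }
    }

    public static <T> StructuredArrayish<T> newInstance(Class<T> elementClass, long length) {
        try {
            return new StructuredArrayish<>(elementClass, length);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @SuppressWarnings("unchecked")
    public T get(long index) { return (T) elements[(int) index]; }

    public long getLength() { return elements.length; }
}
```

Note that members are constructed eagerly by the factory, so `get` always returns a live, full-blown object rather than a slot that may be null.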

We further discovered that with first-class Java objects as members, liveness logic can be made significantly different from array-of-structs, and that this Java-natural liveness behavior has an elegance level that makes things "click" together elegantly (I credit Michael Wolf with coming up with that one). In arrays-of-structs-like things, the container must remain logically "live" as long as any logical reference to a member of it exists. This is necessary since "structs" are usually not full-blown objects, and most struct-supporting environments lack the ability to track individual struct liveness. However, when container members are full-blown objects, they can also have (and actually "prefer" to have) individual liveness, which means that there is no need for live member objects to implicitly keep their containers alive. This realization not only makes an intrinsic implementation much simpler, it also makes StructuredArray much more generically useful. E.g. a member of a StructuredArray can also be placed in a HashMap or some other collection, and there is no new mess to deal with if it is.

At some point I started feeling that this "Challenge Accepted!" experiment was showing enough promise that we might actually build something based on it. We moved the code to the newly formed ObjectLayout project on github and kept hacking at it. We added multi-dimensional support. We added generic construction support for member objects (so that, e.g., even objects with final fields can be natural members of StructuredArrays), and we went back and forth on various implementation details.

StructuredArray was made part of a project we called ObjectLayout because it is not the only example of an intrinsically optimized object layout opportunity. You can think of StructuredArray as the first of three (so far identified) forms of object layout abstractions that can be useful for intrinsically improving memory layout in Java, all without requiring any language changes. The other two forms, not yet captured or completely figured out, are "structs with structs inside" (objects with final object members that are initialized with a factory call) and "struct with array at the end" (objects that inherit from arrays). For now, I'm focusing on StructuredArray both because I think it's the most useful form, and because [I think] I know exactly how to make a JDK intrinsify it, both in Zing and in future OpenJDKs.

My intent is to mature the vanilla definitions of StructuredArray as open source code, and to demonstrate their value with both fully supported Zing JDKs and some experimental vanilla-OpenJDK derivatives that include intrinsic StructuredArray implementations, laying it out as a flat "array of object structures" in the heap while maintaining the exact same class contract as the vanilla implementations. When we can show and demonstrate the value of doing things "this way," and after working out the kinks, I would hope to drive it into some future Java SE version (10?) by properly upstreaming it into the OpenJDK project.

Before you jump in and look through the JavaDoc and code, it's probably important to note what ObjectLayout is about, and what it is NOT about:

ObjectLayout is focused on in-heap, POJO objects. Its goal is to allow normal Java objects and classes (including existing classes) to benefit from optimized heap layout arrangements, without restricting their ability to participate in other object graphs (both by referring to other objects, and by being referred to from other objects), and without restricting their ability to act as first-class citizens and be passed around to various libraries that would expect that ability (meaning that participating objects can still be synchronized on, for example, and can still have identity hash codes).

The ObjectLayout "vanilla" Java class implementations (such as the current StructuredArray) are NOT intended to provide the same performance and layout benefits as their eventual intrinsic implementations will. Instead, they are intended to provide a fully functional implementation of the contract represented by the class, such that code will run portably across all (Java SE 6 and above) JDKs, while being able to "scream" in newer JDKs that would recognize them and intrinsify them (think of AtomicLong running on Java SE 5 vs. "vanilla" Java 1.4.2).

ObjectLayout does NOT make any attempt to deal with the various needs for off-heap structured access. It is orthogonal to other work done for that purpose.

ObjectLayout does NOT try to pack memory or save memory. The way I see it, memory is super-cheap and abundant, to the point where our real problem is finding good creative ways to waste more of it in order to gain real benefit (not that ObjectLayout wastes much memory). So instead, ObjectLayout is focused on reducing indirection, improving locality, and exposing regular striding patterns to hardware (or software) prefetchers.


So here it is.... Have at it. Comments (both sane and otherwise) are welcome: 


-- Gil.

Chris Vest

unread,
Jul 18, 2013, 4:23:43 AM7/18/13
to mechanica...@googlegroups.com
It sounds like there is some overlap between this, and the musings John Rose has been doing on "Arrays 2.0":

Interesting nonetheless. I can't say which approach is best, or whether the two fit together nicely.




--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Sand Stone

unread,
Jul 18, 2013, 4:31:16 AM7/18/13
to mechanica...@googlegroups.com
This is cool hacking. 

For C#, besides struct or value type support, there is also a supported way to do "union" (http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.structlayoutattribute.aspx). Struct without union is less useful. Personally I am not a fan of multi-dimensional arrays. Of course union with an array address will bring us into the "unsafe" world of the off-GC heap.

There are other problems with proper array-style indexer support in Java: [] cannot be overloaded (C# supports this), and of course the "int" array size in the "big memory" era (C# has the same problem).

Talking about memory, it would be cool at the VM level to support a virtually allocated array where the memory is committed on access or usage.




Gil Tene

unread,
Jul 18, 2013, 4:37:37 AM7/18/13
to mechanica...@googlegroups.com
There is certainly some overlap. We cover some similar goals and problems, and John does a great and very comprehensive job describing the various existing needs and wishes people have for better array support. E.g. he covers off-heap needs, sparse array needs, variations on multi-dimensionality, and many other things. There is also the IBM PackedObjects work (e.g. http://www.slideshare.net/mmitran/ibm-java-packed-objects-mmit-20121120 ).

We probably differ in both scope and approach in several ways. E.g., ObjectLayout is looking to address in-heap (only) needs with something that requires absolutely no language changes, and can run on older JVMs by adding some library classes (and, hopefully and eventually, be added to Java SE as a core library). It is also looking to completely avoid the notion of lighter-than-object structs, and leaves object mutability decisions to the individual member class chosen by the user.

Time (and work) will tell how these things fit together, and which way makes more sense where they contradict...

Martin Thompson

unread,
Jul 18, 2013, 4:58:11 AM7/18/13
to mechanica...@googlegroups.com
Gil might know a JVM that does this :-)

Gil Tene

unread,
Jul 18, 2013, 5:09:01 AM7/18/13
to
For C#, besides struct or value type support, there is also a supported way to do "union" (http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.structlayoutattribute.aspx). Struct without union is less useful. Personally I am not a fan of multi-dimensional arrays. Of course union with an array address will bring us into the "unsafe" world of the off-GC heap.

The fact that this is an array of actual objects (and not of structs) changes things. And not just in small subtle ways.

For example, since members are proper objects, the equivalent of the C/C++/C# union part is quite simple. Member object contents access can (and probably should usually) be done with accessor methods, and your accessor methods can decide to interpret and manipulate the member object state in any way you want, including overlapping meanings (very useful for e.g. packet headers for varying protocols) or simply conserving space. 

Similarly, since sub-dimensions (array.getSubArray()) are also proper StructuredArray objects (even though they can be intrinsically embedded in other StructuredArrays with no indirection), you can do things like include rows of an array in other collections while still maintaining optimal (flat) array layout even for multiple dimensions. Again, something you can't quite do with arrays of structs.

There are other problems with proper array-style indexer support in Java: [] cannot be overloaded (C# supports this), and of course the "int" array size in the "big memory" era (C# has the same problem).

As you can see, StructuredArray provides semantic support for long indexes. We're not trying to mess with language features, and certainly not with operator overloading, focusing on defined functionality instead. StructuredArray<T> is simply a generic container class with accessor methods: T t = array.get(x);

Talking about memory, it would be cool at the VM level to support a virtually allocated array where the memory is committed on access or usage.

We haven't really tried to define or capture the various sparse and on-demand-allocation representations some people in the HPC world often use. John Rose's Arrays 2.0 talk covers variations of those in some detail. However, for such sparse array variations to include proper object members (that can point to other objects, for example) would introduce "interesting" practical problems we would still need to think a lot about. Maybe a fourth useful form when we figure it out.

Rüdiger Möller

unread,
Jul 18, 2013, 11:46:24 AM7/18/13
to
Well, I have not changed my mind that much since the last discussion...

I think your API is useful as long as no VM-backed "struct" equivalent is present in Java. Since the introduction of native "struct" support is urgently necessary, I think we will get it some day.

So here is my piece of mud:

  • the API patches a problem which would not be evident once real structs are introduced
  • One has to rely heavily on VM magic; however, in the high performance arena we need exact control
  • According to my benchmarks, it's not only the layout of memory, but also the need for lots of dereferencing, which hurts cache locality.
    I have implemented a byte-code-instrumentation-backed struct emulation (it still lacks documentation, sanity checks, etc., so it's unreleased):
    https://code.google.com/p/fast-serialization/wiki/StructsIntroduction
    If you look at the benchmarks
    https://fast-serialization.googlecode.com/files/structiter.html you'll see that using (pseudo)pointers brings the real performance kick, as it is not necessary to access different memory locations in order to iterate a data structure or parts of it. In fact, some benchmarks using structs are up to 3 times faster than their on-heap equivalents, mostly because one can reduce the need for dereferencing using (safe)pointers. The slowness of some "offheap" (structs) access benchmarks is due to the insane code generated behind the scenes to relocate the actual memory an object is using.
    It's not meant as a general solution (that would require VM backing); it's just a giant hack to overcome the lack of structs in Java.
As far as I understood, your API lacks the following advantages compared to a real struct language extension:
  • no interprocess communication/message encoding by just "copyMemory" (this also requires a kind of void*-like cast, e.g. "MyStructPointer p = (MyStructPointer)bytes[]")
  • no reduction in the number of object references. Using the struct approach, embedded objects are second-class regarding GC. It should be possible to make this manageable like C# did. Even if OpenJDK had Zing's C4 collector, a program would still profit from a reduction of allocation/GC work
  • no deterministic memory layout to work with, as each VM may choose a different representation.
  • no rewriting of embedded objects
However, as long as there is no struct support, your design proposal looks like a useful approach. So if we can't have the cake, we'll probably embrace your cookie ;-)

Martin Thompson

unread,
Jul 18, 2013, 1:34:13 PM7/18/13
to mechanica...@googlegroups.com
Arrays of structures can mean many things to many people. For me, what is missing from Java is arrays of types that are not just primitives. If structs are added to the language, like in C#, then structs are just one of the types of arrays that can be useful. Arrays of objects are potentially much more useful for data structures. For example, with an array of objects laid out in a contiguous fashion I could build a much better performing version of HashMap that avoids layers of indirection and strides for access. I could also build B/B+/B* trees that exhibit better locality per node and less indirection. To really get the benefits, our StructuredArray needs to be reified. Normal Java does not allow for this because Generics are a compromise, by choosing erasure, over what they could have been. However, Gil has some nice ideas about how he can effectively reify many types of collection at runtime by tracking the most specific type that gets inserted. The very act of thinking about how to make this work is suggesting a whole bunch of benefits to existing code patterns that go beyond our initial goals, which is super nice.
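The runtime type-tracking idea mentioned above might be sketched like this (a hand-waved illustration inferred from the description, not Gil's actual mechanism): on each insert, widen a recorded class just enough to cover every element seen so far, so the collection always knows the most specific common element type at runtime despite erasure.

```java
import java.util.ArrayList;

// Illustration only: track the narrowest class that covers all inserted
// elements, walking up the superclass chain only when an insert forces it.
class TypeTrackingList<T> {
    private final ArrayList<T> backing = new ArrayList<>();
    private Class<?> mostSpecific; // narrowest class covering all elements so far

    void add(T element) {
        Class<?> c = element.getClass();
        if (mostSpecific == null) {
            mostSpecific = c;
        } else {
            // Widen until the tracked class is a supertype of the new element.
            while (!mostSpecific.isAssignableFrom(c)) {
                mostSpecific = mostSpecific.getSuperclass();
            }
        }
        backing.add(element);
    }

    Class<?> observedElementType() { return mostSpecific; }
}
```

A runtime (or intrinsified collection) holding this information could then specialize layout or dispatch for the common case where only one concrete type is ever inserted.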

See embedded notes below:

On Thursday, July 18, 2013 4:28:08 PM UTC+1, Rüdiger Möller wrote:
Well, I did not change my mind that much since last discussion ..

I think your API is useful as long as no VM-backed "struct" equivalent is present in Java. Since the introduction of native "struct" support is urgently necessary, I think we will get it some day.

  • According to my benchmarks, it's not only the layout of memory, but also the need for lots of dereferencing, which hurts cache locality.
  • I have implemented a byte-code-instrumentation-backed struct emulation (it still lacks documentation, sanity checks, etc., so it's unreleased):
    https://code.google.com/p/fast-serialization/wiki/StructsIntroduction
    If you look at the benchmarks
    https://fast-serialization.googlecode.com/files/structiter.html you'll see that using (pseudo)pointers brings the real performance kick, as it is not necessary to access different memory locations in order to iterate a data structure or parts of it. In fact, some benchmarks using structs are up to 3 times faster than their on-heap equivalents, mostly because one can reduce the need for dereferencing using (safe)pointers. The slowness of some "offheap" (structs) access benchmarks is due to the insane code generated behind the scenes to relocate the actual memory an object is using.
    It's not meant as a general solution (that would require VM backing); it's just a giant hack to overcome the lack of structs in Java.
Having this memory layout removes dereferencing in the intrinsified case, and a simple bump-the-pointer calculation is all that is required for iteration. You get the benefits of off-heap contiguous layout with full support of in-built garbage collection.
 
As far as I understood, your API lacks the following advantages compared to a real struct language extension:
  • no interprocess communication/message encoding by just "copyMemory" (this also requires a kind of void*-like cast, e.g. "MyStructPointer p = (MyStructPointer)bytes[]")
This is explicitly outside the scope of what we are looking at as Gil highlighted.
 
  • no reduction in the number of object references. Using the struct approach, embedded objects are second-class regarding GC. It should be possible to make this manageable like C# did. Even if OpenJDK had Zing's C4 collector, a program would still profit from a reduction of allocation/GC work
Provided a type in the array has no onward references, it would be an equal amount of work. When the types are known, with some light reification this is a trivial optimisation.
  • no deterministic memory layout to work with, as each VM may choose a different representation.
The layout is deterministic for the array. We are not, at this stage, looking for deterministic layout within the type. The goal is to design better on-heap data structures, not to focus on native data exchange, which the IBM PackedObjects work is doing a good job on.
  • no rewriting of embedded objects
We do plan to look at how objects can be co-located/embedded as a future extension.  That is why we are calling the project ObjectLayout and not just StructuredArray.
 

Rüdiger Möller

unread,
Jul 18, 2013, 3:16:43 PM7/18/13
to
I hope my post did not come across as offensive; I am not a native speaker... well, at least Gil called for mud ;-)

I read that off-heaping and struct emulation are out of scope; however, I feel there could be a solution which covers most of the issues I listed, so a scope extension might be a good thing, as it's a pretty similar problem domain.
One point puzzles me: you highlight that embedded objects are fully GC'ed first-class objects. I don't think this is an advantage in general. For like 90% of data structures, when the "containing" object gets collected, all Strings and other embedded objects are also likely to get collected. Their lifetime is bound 99% of the time to the lifetime of the container object.
So the idea to *not* track embedded objects with GC (except outgoing refs, of course) could reduce the amount of GC work massively for a lot of applications. Since memory is cheap, we can live with the rare case that an outside reference to a "subobject" keeps the "container" object alive. This way the "real" GC'ed object graph of an application is effectively reduced by up to 90% compared to vanilla allocation, resulting in much less GC overhead.

I can emulate this with my struct library (linked in my previous post) by allocating a single byte array for each "major" object (e.g. Trade or Contract) plus an access wrapper. The byte array is then freed as soon as no reference to the major object or its embedded object instances exists. However, since the object graph was reduced massively by this kind of data representation (which behaves exactly like a normal heap object except for "identityHashCode" issues), GC overhead was reduced by a factor of >10 compared to a vanilla allocation solution. I still get GC, but the GC operates on a less fine-grained object granularity.
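The single-allocation idea can be illustrated in plain Java (without the bytecode instrumentation the library linked above actually uses; the Trade fields here are made up) by backing one "major" object with a single flat buffer, so the GC sees one object instead of a small graph:

```java
import java.nio.ByteBuffer;

// Illustration only: one allocation backs the whole record. The "embedded"
// fields are fixed offsets into the buffer rather than separate heap objects,
// so collecting the record collects everything at once.
class TradeRecord {
    private static final int QUANTITY_OFFSET = 0;  // long at bytes 0..7
    private static final int PRICE_OFFSET    = 8;  // double at bytes 8..15
    private static final int SIZE            = 16;

    private final ByteBuffer buf = ByteBuffer.allocate(SIZE);

    long getQuantity()        { return buf.getLong(QUANTITY_OFFSET); }
    void setQuantity(long q)  { buf.putLong(QUANTITY_OFFSET, q); }
    double getPrice()         { return buf.getDouble(PRICE_OFFSET); }
    void setPrice(double p)   { buf.putDouble(PRICE_OFFSET, p); }
}
```

The trade-off is exactly the one noted above: you lose per-field object identity (and things like identityHashCode on the "embedded" parts), in exchange for coarser GC granularity.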

BTW: PackedObjects is a proprietary extension to the IBM VM, which might not help solve real issues anytime soon...

Sand Stone

unread,
Jul 18, 2013, 4:43:49 PM7/18/13
to mechanica...@googlegroups.com

> Member object contents access can (and probably should usually) be done with accessor methods, and your accessor methods can decide to interpret and manipulate the member object state in any way >you want,
If I understand it correctly, you mean:
             class/struct AcmeData {
                    byte t; /* long, double, subarray, ... */

                    //union {
                    long data1;
                    double data2;
                    AcmeData data3;
                    // }

                    //accessor methods
                    byte getType() { return t; }
                    long getLong() { return data1; }
                    double getDouble() { return data2; }
                    ...
             }
Without actual union/overlapping, besides complicating the wire transfer story, the above struct triples the memory. When one has 1M/B such elements, the "wasted extra" memory is not a small charge anymore.





Gil Tene

unread,
Jul 18, 2013, 6:15:47 PM7/18/13
to mechanica...@googlegroups.com
No. I meant something like this:

class AcmeData {
    byte t; /* data1, data2, data3... */

    long opaqueLongData0;
    long opaqueLongData1;
    //union {
    //    long  data1;
    //    double data2;
    //    struct {
    //      int subData3;
    //      short subData4;
    //      short subData5;
    //      float subData6;
    //      int   subData7;
    //    } data3;
    // }

    //accessor methods
    byte   getType()     { return t; }
    long   getData1()    { return UnionUtils.dataAsLong(opaqueLongData0); }
    double getData2()    { return UnionUtils.dataAsDouble(opaqueLongData0); }
    int    getSubData3() { return UnionUtils.dataAsInt(opaqueLongData0, 0); }
    short  getSubData4() { return UnionUtils.dataAsShort(opaqueLongData0, 2); }
    short  getSubData5() { return UnionUtils.dataAsShort(opaqueLongData0, 3); }
    float  getSubData6() { return UnionUtils.dataAsFloat(opaqueLongData1, 0); }
    int    getSubData7() { return UnionUtils.dataAsInt(opaqueLongData1, 1); }
}

You get a nice external API to what is an effective union from a storage, access, and performance point of view.

UnionUtils [not shown here] can be built very simply using shifts/ands and safe type conversions where needed (e.g. Double.longBitsToDouble()), and (if done carefully) can get nicely JIT-compiled down to the access ops you want (the same access ops that a C struct/union would have ended up with). It's all very contained (it's a union, after all, not an array) and is not visible outside of the class. Alignment tendencies make it so that any one field tends to fit into a single opaqueLongDataX member, keeping the code simple.
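For what it's worth, one possible shape for such helpers (an assumed sketch, since UnionUtils is not shown in the post), with the slot argument expressed in units of the accessed type's size to match the getSubDataN accessors above:

```java
// Assumed sketch of the UnionUtils described above: pure shift/mask bit
// extraction plus safe bit-pattern conversions, no Unsafe required.
final class UnionUtils {
    private UnionUtils() {}

    static long dataAsLong(long bits) { return bits; }

    static double dataAsDouble(long bits) { return Double.longBitsToDouble(bits); }

    // slot is 0 or 1: which 32-bit half of the long to read
    static int dataAsInt(long bits, int slot) {
        return (int) (bits >>> (slot * 32));
    }

    // slot is 0..3: which 16-bit quarter of the long to read
    static short dataAsShort(long bits, int slot) {
        return (short) (bits >>> (slot * 16));
    }

    static float dataAsFloat(long bits, int slot) {
        return Float.intBitsToFloat(dataAsInt(bits, slot));
    }
}
```

Note that this fixes a slot-numbering convention in Java code, independent of the hardware byte order, which is fine precisely because the "union" never escapes the class's accessor methods.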

I didn't do a sub-object because I don't know how to. I think that the sub-object thing (data3) you tried to represent below is something you can't do with a union either; i.e., I can't think of a reasonable way to have a union of a reference and scalar data, regardless of data layout (or of two references, for that matter).

While the internal state here is just as packed as a hand-laid C struct/union would be, let's be clear that there is waste here, and I'm not trying to hide it. Since this is an object, it will have a header of some sort. In a StructuredArray, each member "wastes" header space (one or two words, depending on the JVM) that creates a gap between elements in the array. I just don't see that as a big deal given the benefits involved, and when comparing to alternatives for on-heap representation of flat arrays of things.

Sand Stone

unread,
Jul 18, 2013, 9:47:51 PM7/18/13
to mechanica...@googlegroups.com
Thanks for the clarification. 

I thought long and hard about your struct below a couple of years ago for my problem space.  The issue is that one has to know the size of an object in advance, so it does not have the dynamic nature that I would need. 

In pseudo C, 
        struct Acme {
           char t; 
           union {
               long data1; 
               double data2;
               char data3[1];  //this allows me to put a variable size object 
           } 
        }
I thought it was impossible to do this in Java (even C# is not flexible enough), until I learned about Unsafe. I never looked back :-)

In the Python world, one has something like Cython to allow one to write "unsafe but high performance" libraries. Why not Java?



Gil Tene

unread,
Jul 18, 2013, 10:23:48 PM7/18/13
to <mechanical-sympathy@googlegroups.com>, mechanica...@googlegroups.com
This is what I (and others, probably) call the "struct with array at the end" form. It is one of the three use forms we've identified for ObjectLayout so far.

The classic and semi-obvious use case is String representation, but construction semantics may come in the way of supporting that one transparently. A high performance string storage class with slightly different semantics than the current one is also possible, and other commonly identified use cases are messages and packets of all sorts. I'm sure many other examples exist.

At present, the vanilla-Java representation I have in mind for this form would involve a subclassable array class. The vanilla (non-optimal-layout) implementation is not hard, but we are still wrestling with semantics, like the wish to avoid public constructors conflicting with the wish to represent this as subclassable arrays. But I think we'll find our way through it.
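A vanilla-Java sketch of what this form might look like (all names here are hypothetical; the real design is still being worked out, as noted above): fixed "struct" fields plus a trailing array, created through a factory so that an intrinsified JVM could co-allocate the fields and the array contiguously:

```java
// Hypothetical sketch of the "struct with array at the end" form: the vanilla
// implementation just holds a separate array, while an intrinsified JVM could
// lay the body out flat after the fields.
class PacketBuffer {
    final long timestamp;       // fixed "struct" fields...
    final int protocol;
    private final byte[] body;  // ...with the variable-length array "at the end"

    // No public constructor, for the same reason as StructuredArray: the
    // factory sees the length before any allocation happens.
    private PacketBuffer(long timestamp, int protocol, int bodyLength) {
        this.timestamp = timestamp;
        this.protocol = protocol;
        this.body = new byte[bodyLength];
    }

    static PacketBuffer newInstance(long timestamp, int protocol, int bodyLength) {
        return new PacketBuffer(timestamp, protocol, bodyLength);
    }

    byte get(int i)          { return body[i]; }
    void set(int i, byte b)  { body[i] = b; }
    int length()             { return body.length; }
}
```

The subclassable-array variant discussed above would replace the private body field with actual array inheritance, which is where the construction-semantics wrestling comes in.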

For the intrinsic layout in an optimized JVM (a future Zing, and a future vanilla-OpenJDK-based enhanced thing) I'm thinking of the array and the fields being on two opposite sides of the header in the layout (other runtimes have done that before). Such a layout would allow subclassing support in the presence of a variable-length array...

But I'm also toying with a non-subclassable form that may be more straightforward to implement intrinsically...

Sent from Gil's iPhone

Sand Stone

unread,
Jul 19, 2013, 12:57:23 AM7/19/13
to mechanica...@googlegroups.com

>"struct with array at the end" ... this form would involve a subclassable array class.
Hmm, this (partial) struct could be a scalar or an element of a container (and could be recursive). Decades ago, this was called a multi-list.

Such a data structure with lots of elements (of different types) would be really hard for a GC to walk efficiently. Manual ref counting is better.

Anyway, I used a different approach. I am open to seeing what you and Martin come up with. Happy hacking.

Kirk Pepperdine

unread,
Jul 19, 2013, 4:07:50 AM7/19/13
to mechanica...@googlegroups.com
Hi all,

The argument that we should follow C#, the garbage can of language ideas, isn't one that I subscribe to easily. In my not so humble opinion, Java has suffered too long from inappropriate mixing of different ideas and responsibilities between the language, the class libraries, and the JVM. For example, the mixing of primitives with objects exposes the inner workings of the JVM in a way that has complicated the language and resulted in other problems. We tried fixing these problems with generics and autoboxing, two steps in the wrong direction, and look at the increase in complexity and problems that came with those decisions/implementations. And now we're talking about adding structs, continuing to blur the line between the programmer's API (what the classes are responsible for) and what the JVM is responsible for. Again, we seem to be looking at a problem that needs to be solved, yet the steps we're taking are moving us towards more complexity, IOWs a step in the wrong direction. These steps, if not taken carefully, will start to tie specific implementations to specific hardware platforms and result in unnecessarily complex implementations which are currently impossible for most developers to debug. The ideas put forward are old ideas that were great for the languages in which they were developed, but all in all, I don't believe they've been good ideas for Java.

When I was working at GemStone we also had a huge need for control over locality, which didn't come naturally within Smalltalk or Java runtimes like it did with DB technologies. To work within the constraints set by Smalltalk and then Java, we used a special class known as ClusterBucket. The class had special support built into the runtime to ensure things were packed in a way that made sense for the hardware we were running on. In other words, there was a good separation of responsibilities between the class hierarchy, application code, and the JVM.

Sorry, this email deserves more time than I can give it at the moment, but I think we should be looking towards a Java solution to this problem and not a C++/C# etc. solution.

Regards,
Kirk


Nitsan Wakart

unread,
Jul 19, 2013, 5:21:54 AM7/19/13
to mechanica...@googlegroups.com
>I can't think of a reasonable way to have a union of a reference and scalar data regardless of data layout (or of two references for that matter).

Funny you should say that. I recently read a paper on a queue implementation requiring exactly this trick, where you have a union of adjacent data and a pointer and you want to CAS both in one op (assuming the address is compressed so it fits in an int):
class Node {
    private static final long DATA_OFFSET;
    static {
        // a crude way of finding out if a compressed ref fits into an int
        int refScale = UnsafeAccess.UNSAFE.arrayIndexScale(Object[].class);
        if (refScale != 4) {
            throw new RuntimeException("this is not going to work...");
        }
        try {
            DATA_OFFSET = UnsafeAccess.UNSAFE.objectFieldOffset(Node.class.getDeclaredField("data"));
        } catch (NoSuchFieldException e) {
            throw new RuntimeException(e);
        }
    }
    long data; // union: low 4 bytes hold a compressed ref, high 4 bytes hold an int counter
    Object object() { return UnsafeAccess.UNSAFE.getObject(this, DATA_OFFSET); }
    int counter() { return UnsafeAccess.UNSAFE.getInt(this, DATA_OFFSET + 4); }
    void object(Object o) { UnsafeAccess.UNSAFE.putObject(this, DATA_OFFSET, o); }
    void counter(int counter) { UnsafeAccess.UNSAFE.putInt(this, DATA_OFFSET + 4, counter); }
}

Rüdiger Möller

unread,
Jul 19, 2013, 6:47:58 AM7/19/13
to mechanica...@googlegroups.com
Kirk,

I agree on being very reluctant regarding language extensions; the convenience added by generics wasn't worth the complexity and quirks resulting from it, a bad trade. However, the existence of primitive types in Java was one of the reasons why I (and many others) started programming Java. I have not seen any VM implementation achieve reasonable performance using first-class objects for numbers as Smalltalk did. A language is used to create applications solving real-world problems, so it is perfectly reasonable to make concessions to the metal at the language-design level. If we had to switch to C in order to do some realtime calculations (at least at half C-speed), a lot of Java applications would not exist.  

My major concern with Gil's (without doubt well-engineered) proposal is that we end up having several scattered solutions for a single major problem domain: one solution for on-heap locality, one API for heap-offloading/GC-friendly data storage, and one for fast (yes: hardware-dependent) IPC.

To sum it up:
  • GC is still a major issue (also for client-side Java, which is in fact not that dead; at least in Europe many in-house/mission-critical client applications are Java). It's not only the financial industry; there is an "uber-trend" of "going realtime". Batch processing systems are moved to realtime, and query-based application patterns are moved to realtime updates.
    => There is a need to represent mostly static (e.g. reference) data in a GC-neutral and convenient way. In nearly all my Java projects there was a need to "hide" reference or other mass data from the VM (e.g. in byte[]) in order to reduce GC pauses/overhead.
  • Modern hardware architecture requires control over locality and memory access patterns (that's what Gil addresses) in order to perform well. We all know there is no such thing as "fast enough".
    => There is a need to control the location and alignment of data structures (whether this can be achieved by VM magic alone or requires explicit hints is beyond my competence).
  • Growth in system size and increasing 24x7 uptime requirements raise the need for fast inter-process communication (hot failover, hot update, distributed systems). It's crazy that I am hardly able to saturate an Infiniband network, because my Java apps spend 90% of the day encoding and decoding messages (the other 10% are GC ;-) ).
    => There is a need to interchange structured data quickly without explicit encoding and decoding. This can be hardware-dependent, as clusters are homogeneous regarding hardware most of the time.
I think a "struct"-oriented solution (which is not necessarily a language change, but might still require some VM backing) can solve all these issues. In my probably-somewhat-but-not-completely-humble opinion, solutions addressing only parts of the issue are bad long term from a Java-platform point of view.

regards,
Rüdiger  

Nitsan Wakart

unread,
Jul 19, 2013, 6:51:00 AM7/19/13
to mechanica...@googlegroups.com
2 cents/JVM wishlist:
This falls under "stuff most developers using Java couldn't care less about", and for them I agree with Gil's approach and in part with what Kirk is saying, namely that we need to make it possible for the JVM to pick this up and fix it. The end result should look and smell and dance like objects, but the JVM should be able to cleverly work out locality and win the performance for us. I'm not holding my breath.
A middle ground can be found in the form of memory layout hints as annotations:
@Contended - coming in JDK8, but not for public consumption apparently
@Aligned - not coming :(. But it would be a great help if it were, so I could ask the JVM to allocate my class aligned to the cache line. This is a pain for the JVM as it requires collaboration with the GC mechanism.
@Inlined - not coming :(. Would serve to embed/flatten data structures into one another. Would cover an array at the end of a class, but also embedding classes into each other, and in particular embedding types into arrays.
@FixedFieldOrder - not coming :(. Would serve to stop the JVM from reordering fields, which is part of what people want from structs.
If we had the above, the need for structs/value types would decrease significantly. The nice thing about annotations is that they are just hints: if your JVM doesn't do anything with them, stuff still works. 
I can see how a solution like StructuredArray delivers some of the above and hides the fluff behind a factory method, and that would be very cool indeed.
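To make the wishlist concrete, here is a sketch of what such hint annotations and their use might look like. None of these annotations exist (JDK8's @Contended is the closest real thing); the names, targets, and the Order/Price classes are all made up for illustration:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical layout hints; a JVM that ignores them changes nothing.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface Aligned { int boundary() default 64; }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Inlined {}

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface FixedFieldOrder {}

class Price {
    long mantissa;
    int exponent;
}

@Aligned(boundary = 64) // wish: start each instance on a cache line
@FixedFieldOrder        // wish: keep the declared field order
class Order {
    long id;
    @Inlined Price price = new Price(); // wish: flatten Price into Order's layout
}
```

Because they are plain annotations, this code compiles and runs identically on a JVM that knows nothing about them, which is exactly the appeal of hints over new language constructs.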


________________________________
From: Kirk Pepperdine <ki...@kodewerk.com>


> And now we're talking about adding Struct's to continue futzing up the the line between the programmers API or what the classes are responsible for with what the JVM is responsible for. 

> ... we used a special class known as ClusterBucket. The class has special support built into the run time to ensure things were packed in way that made sense for the hardware we were running on. 

On 2013-07-18, at 7:34 PM, Martin Thompson <mjp...@gmail.com> wrote:

> For me what is missing from Java is arrays of types that are not just primitives... Arrays of objects are potentially much more useful for data structures. 

Peter Lawrey

unread,
Jul 19, 2013, 7:42:36 AM7/19/13
to mechanica...@googlegroups.com
Since we are talking about wish lists, I would add struct-like capabilities, ideally without changing the syntax, but I think that really is wishful thinking.  I would also like to see the const keyword support immutable access. It is already a reserved word; it just isn't valid to use.


Martin Thompson

unread,
Jul 19, 2013, 8:31:52 AM7/19/13
to mechanica...@googlegroups.com
I'm with you on the quality of GC we get from the standard JVMs.  You should try other runtimes like Mono, JavaScript, PHP, Python, etc. to see how GC can be even worse! :-)

I'm not sure your "struct" solves all the issues.

Take IPC for example.  Rather than introduce structs, add putOrderedX(), compareAndSetX(), and getAndAddX() atomics to ByteBuffer, and now you have everything you need for an IPC implementation using memory-mapped files, and it is pure Java with no language changes.  For IPC we need to extend the memory model; it is not about structures.  Just map a flyweight over the buffer to access your structures.  This would be a trivial addition to the core libraries.  I've been meaning to get a discussion going on this on the concurrency-interest list.
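As a sketch of the flyweight idea (using plain ByteBuffer reads/writes, since the proposed atomics don't exist on ByteBuffer; the record layout and all names here are invented for illustration):

```java
import java.nio.ByteBuffer;

// A flyweight mapped over a buffer: no per-record objects, just one reusable
// window repositioned over fixed-size records. With a memory-mapped file as
// the buffer, plus the proposed ordered/CAS primitives, this would be the
// building block for IPC.
final class TradeFlyweight {
    static final int ID_OFFSET = 0;   // long
    static final int QTY_OFFSET = 8;  // int
    static final int RECORD_SIZE = 12;

    private ByteBuffer buffer;
    private int base;

    TradeFlyweight wrap(ByteBuffer buffer, int recordIndex) {
        this.buffer = buffer;
        this.base = recordIndex * RECORD_SIZE;
        return this;
    }

    long id()            { return buffer.getLong(base + ID_OFFSET); }
    void id(long v)      { buffer.putLong(base + ID_OFFSET, v); }
    int quantity()       { return buffer.getInt(base + QTY_OFFSET); }
    void quantity(int v) { buffer.putInt(base + QTY_OFFSET, v); }
}
```

One flyweight instance can walk an entire buffer of records, so reading and writing allocates nothing.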

I get to profile a lot of applications and have to agree that encoding is one of the largest performance hits.  However, it is often not an issue, even for some pretty extreme applications, if they use a binary format.  When using binary formats it is usually a modelling or data normalisation issue rather than encoding/decoding.  The big issue is encoding and decoding to Strings.  The Java libraries for this are quite simply crap: they turn everything into UCS-2 strings, yet most protocols are ASCII, and they generate huge amounts of garbage in the process.  For encoding/decoding the common types in typical protocols I can usually beat what Java has to offer by 10-40X. Why don't we have an efficient AsciiBuffer class with the ability to read and write primitives???
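As a sketch of the sort of garbage-free ASCII handling being asked for (the class and method names are invented; this just shows the idea of writing digits straight into a byte[] without ever creating a String):

```java
// Writes a non-negative long as ASCII digits into dst at offset, returning
// the number of bytes written. No String, no char[], no garbage.
final class AsciiEncoding {
    static int putLong(byte[] dst, int offset, long value) {
        if (value == 0) {
            dst[offset] = '0';
            return 1;
        }
        int digits = 0;
        for (long v = value; v > 0; v /= 10) digits++;   // count digits first
        int i = offset + digits;
        for (long v = value; v > 0; v /= 10) {
            dst[--i] = (byte) ('0' + (v % 10));           // fill right-to-left
        }
        return digits;
    }
}
```

A matching getLong would walk the bytes back into a long, giving the read/write primitive pair the hypothetical AsciiBuffer would need.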

I've generally not found an issue saturating a 10GigE network connection from Java.  Data needs to be framed with sympathy for the underlying stack, and a lot of the default kernel tunables for Linux require attention.  The lack of understanding of networking among mainstream programmers is staggering.

The goals Gil and I have for this are quite modest.  If we have better control of memory layout then we can build much better performing collections and other data structures.  Not every programmer needs to understand how the intrinsics work, but they can benefit from significant performance improvements when their Maps, Trees, and other goodies get compacted and accelerated.

Martin...

Rüdiger Möller

unread,
Jul 19, 2013, 11:13:17 AM7/19/13
to mechanica...@googlegroups.com
see inline comments ..
-Rüdiger


Am Freitag, 19. Juli 2013 14:31:52 UTC+2 schrieb Martin Thompson:
I'm with you on the quality of GC we get from the standard JVMs.  You should try other runtimes like, Mono, Javascript, PHP, Python, etc. to see how GC can be even worse! :-)


I did, sigh...
 
I'm not sure your "struct" solves all the issues.

Take IPC for example.  Rather than introduce structs, add putOrderedX(), compareAndSetX(), and getAndAddX() atomics to ByteBuffer and now you have everything you need for a IPC implementation using memory mapped files and it is pure Java with no language changes.  For IPC we need to extend the memory model.  It is not about structures.  Just map a flyweight over the buffer to access your structures.  This would be a trivial addition to the core libraries.  I've been meaning to get a discussion going on this on the concurrency-interest list.


That's exactly what "my structs" currently do. They support creating and reading structured data, but with *arbitrary* object nesting levels, embedded arrays of any type, and without the handcrafting (done using class generation at runtime). Access and reference to embedded objects is via flyweight objects (I call them pointers). It's just a tool to ease handling of 'flattened' data structures. Btw, it interacts nicely with disruptor-like processing designs ;-).
 
I get to profile a lot of applications and have to agree that encoding is one of the largest performance hits.  However it is often not an issue, even for some pretty extreme applications, if they use a binary format.  When using binary formats it is usually a modelling or data normalisation issue rather than encoding/decoding.  The big issue is encoding and decoding to Strings.  The Java libraries for this are quite simply crap.  They turn everything into UCS-2 strings, yet most protocols are ASCII, and generate huge amounts of garbage in the process.  For encoding/decoding the common types in typical protocols I can usually beat what Java has to offer by 10-40X. Why don't we have an efficient AsciiBuffer class with the ability to read and write primitives???

I've generally not found an issue saturating a 10GigE network connection from Java.  Data needs to be framed with sympathy for the underlying stack and a lot of the default kernel tunables for Linux require attention.  The lack of understanding in a mainstream programmer for networking is staggering.


Our Infiniband is >40Gbit. The bottleneck is definitely encoding. Please do not underestimate my experience; I've done a system processing and re-routing the complete data of one of the largest derivative exchanges in the world (market data incl. order book depth of 200,000 instruments, plus all order and trade transaction traffic) in realtime. So it's not like I am talking out of my ass ..
 
The goals Gil and I have for this is quite modest.  If we have better control of memory layout then we can build much better performing collections and other data structures.  Every programmer does not need to understand how the intrinsics work but they can benefit from some significant performance improvement when their Maps, Trees, and other goodies get compacted and accelerated.


No problem with your approach; I am just convinced there exists a solution covering a larger part of the problem domain, instead of a fraction. Maybe you'll find ways to extend your existing design.

Martin Thompson

unread,
Jul 19, 2013, 11:27:21 AM7/19/13
to mechanica...@googlegroups.com
Not underestimating your experience at all.  Anyone trying to saturate IB is right on the frontier.

I was just illustrating the common problems I see.  At the extremes I encode/decode with Unsafe and deal with some of the largest feeds, so I have empathy.  This requires not just a native binary method of encoding but also alignment, striding, and branching considerations, which I'm sure you well know :-) 

Thanks for the feedback!

On Friday, July 19, 2013 4:13:17 PM UTC+1, Rüdiger Möller wrote:
see inline comments ..
-Rüdiger

Sand Stone

unread,
Jul 19, 2013, 12:23:29 PM7/19/13
to mechanica...@googlegroups.com
+Like 

Gil Tene

unread,
Jul 19, 2013, 12:46:48 PM7/19/13
to mechanica...@googlegroups.com
 
The goals Gil and I have for this is quite modest.  If we have better control of memory layout then we can build much better performing collections and other data structures.  Every programmer does not need to understand how the intrinsics work but they can benefit from some significant performance improvement when their Maps, Trees, and other goodies get compacted and accelerated.

no problem with your approach, I am just convinced there exists a solution covering a larger part of the problem domain, instead of a fraction. Maybe you find ways to extend your existing design.

I think that this question, of which parts of the problem domain to address in a single shot, is central to the discussion/argument/pondering that we have been doing. John Rose's Arrays 2.0 talk does a great job of covering many things people want from structs, arrays, and layout in general, and for many different reasons. It is my opinion that it covers multiple disparate domains.

These domains are, in my opinion, often conflated together. And (again in my opinion) that conflation makes people look for common solutions that aim at too much of a lowest common denominator. Let me speak in specifics:

On-heap and Off-heap data is fundamentally (as in FUNDAMENTALLY) different in capability, benefit, and scope. They also have dramatically different motivations when it comes to end-user-programmers (as opposed to people who build infrastructure libraries).

On-heap stuff has clean, safe definitions that are supported by the full brunt of a strong specification, memory model, and tight verification (as in: the JVM will refuse to accept code that breaks the rules). This means that very useful invariants are maintained, like knowing where all the heap references are at all times, and knowing the difference between a heap reference and a scalar at all times and in all places. These invariants mean that (and are pretty much required so that) people can use Classes, Objects, Collections, Weak References, and all sorts of nice objecty things interchangeably in an automatic-memory-management world, crossing library and package boundaries safely, and providing a tremendous amount of code leverage. Simply put: on-heap stuff is easily usable in Java, using regular Java programming idioms.

Off-heap stuff is crisp, precise, and unsafe (as in: it needs new disciplines and lets you do unchecked stuff). It is also often "fast" (when the on-heap stuff lacks that speed in some way). It relies on discipline for correctness and for speed. It is absolutely necessary for some things (like IPC, and handling OS and hardware things that are not Java). It is useful when absolute control gets you some extra speed. It is also useful for getting around pesky problems that I (me, not everyone) attribute to inadequate JVM handling of on-heap behavior. But off-heap is not for the off-heap-untrained. It is not Java with Java programming idioms. It has extra behaviors (for people who have learned to expect automatic memory management) and extra burdens. And it is highly restricted in its ability to use Objects. And when I say Objects I mean those things that subclass the Object class and can be referenced, hash-coded, locked, placed in collections, and passed around to the multitude of code that may or may not do something useful with them.

On-heap and Off-heap are not about good and bad. They are not even about slow and fast. They are two dramatically different domains both from a data representation and programming rules point of view. They both live together in many applications for good reason, and crossing the boundary between them is something many of you here have developed interesting art for.

But trying to come up with something that will address needs (like wanting memory layout things like structs) that would span both will leave both ends unsatisfied:

On the off-heap side, structs MUST provide precise placement and memory layout control. Off-heap stuff is often shared with non-Java code and with other things (other processes, hardware, OS buffers, etc.), which means that the data layout needs to be precisely defined. This means that data storage sizes need to be exactly defined and agreed on, down to the number of bits in each field, the endian-ness of multi-byte (or multi-bit) numbers, and the alignment expectations of structures. The lack of standard language and library support for memory layout description (in the form of structs, unions, arrays, etc.) for off-heap contents is clear. Wanting good libraries for that is legitimate, and many of you have developed interesting libraries (both via Unsafe and by just overlaying ByteBuffers) that provide such functionality. Work like IBM's PackedObjects is trying to address some of these needs, and throwing more effort behind it would be good.

On the On-heap side, opposite rules apply. Precise placement and layout control is BAD for on-heap behavior. It would dramatically restrict the ability to do certain things, like adapting to new architectures, optimizations, and algorithms under the hood. All you need to do to realize that is imagine what Java would look like today if precise placement control for on-heap data had been provided when 64-bit CPUs and memory sizes bigger than 1GB were considered exotic. Instead, "strong hints" are a much more useful approach for longevity. I'm all for @Contended and @Aligned hints, as well as various @KeepOrder hints, but I'm totally against anything that expects a reference field to have 64 bits in it, or expects a long (which has well-defined signed 64-bit semantics) to be stored in exactly one aligned 64-bit word. I'm also obviously against things that assume cache lines are 64 bytes, or that pages are 4KB, or that virtual memory does or does not exist. I also strongly object to things that carry a belief that objects can be expected to stay in one location for longer than the current instruction, or that their internal layout representation will not change in the next instruction.

On the other hand, there are things On-heap needs and wants that would be BAD (as in probably impossible) to try to impose on Off-heap data. Those usually start with wanting to hold references to other On-heap objects inside these On-heap things, and to refer to On-heap stuff from other things. To do that (in our nice, automatically memory-managed, don't-have-to-write-destructors-or-keep-ref-counts world), all data layout has to be SAFELY well described, preventing things like storage slots that are imprecise or unresolvably (by the runtime) ambiguous about their scalar vs. reference nature. The benefit of keeping these qualities is clear just from looking at the tremendous success Java has had to date. New data layout things would benefit greatly from keeping those qualities and being able to play in the world of real Objects, which would be hard to do if the same layout stuff were to try to satisfy the (contradictory) Off-heap needs at the same time.

ObjectLayout is squarely focused on on-heap data layout. It is not (quite) ignoring the Off-heap world; it is just assuming that other work will address that world, and sees no contradiction with it. I would strongly encourage people working on the Off-heap problem to take a similar approach and avoid trying to address the On-heap world in their solutions, as trying to do so is likely to either strongly restrict what they can do, or make unreasonable assumptions about the behavior of On-heap mechanisms going forward.

We've discovered things that are much cooler than we expected about what happens when you give up on introducing a new construct (a struct) and use Objects in its place. Some have to do with transparency (e.g. being able to improve the behavior of existing code and APIs), and some have to do with capability. The ability to place a member of an optimally-laid-out, struct-like object array in seventeen other collections and not worry about liveness rules turns out to be very useful. 

We've also found places and annoyances in current coding/speed where being able to use existing classes as members seems like it will be very useful. E.g. the ability to hold an array of AtomicLongs that would be laid out flat will eliminate one of the common uses of atomic field updaters and their relative slowness (current AtomicLongArray implementations have an extra indirection level, which makes many people roll their own). The ability to do the same for member fields in the future struct-in-struct-like thing we are thinking of for ObjectLayout may give similar benefits. Eliminating an indirection step from the commonly used (and commonly hot) HashMap.get() call without changing any APIs is another. None of this would be possible if we didn't accept regular objects and existing classes as members in these new layouts...
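For contrast, a sketch of the two layouts available today (nothing here is hypothetical except the comments about what a flattened array of AtomicLongs would improve):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

final class CounterBanks {
    // n separate heap objects, scattered wherever the allocator put them;
    // striding through them gives no cache-line locality guarantees.
    static AtomicLong[] scattered(int n) {
        AtomicLong[] counters = new AtomicLong[n];
        for (int i = 0; i < n; i++) counters[i] = new AtomicLong();
        return counters;
    }

    // Values packed contiguously in a long[], but every access goes through
    // the wrapper object to reach the backing array: the extra indirection
    // mentioned above. A flattened array of AtomicLongs would keep the object
    // API of the first form with the contiguous layout of the second.
    static AtomicLongArray packed(int n) {
        return new AtomicLongArray(n);
    }
}
```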

Bottom line: I'm trying to make the claim that bifurcating the implementations (between On-heap and Off-heap domains) is a good thing...

Somehow I think this debate will not be over quickly. And I think that's a good thing.


Sand Stone

unread,
Jul 19, 2013, 2:08:02 PM7/19/13
to mechanica...@googlegroups.com
Gil, you have captured it well.  

>But trying to come up with something that will address needs (like wanting memory layout things like structs) that would span both will leave both ends unsatisfied:
I would like to add that it's not a binary choice, with people picking one or the other (in regards to on-heap and off-heap). In non-trivial systems, programmers have to use both, and have to do interop between the two. 

Traditionally, the story is to use JNI. Besides the pain of maintaining a separate set of C/C++ code for different platforms, it's also hard not to try to leverage the cool features in the portable C runtime embedded in the JVM (OpenJDK in particular): threading, memory model, locks, ... 

 




Rüdiger Möller

unread,
Jul 20, 2013, 6:11:05 AM7/20/13
to mechanica...@googlegroups.com
+1, very reasonable argument, Gil :-)

Maybe I have a somewhat narrowed view due to the issues I am facing in my current work. Imo there is a strong need to make off-heap data more manageable. Current solutions are way too handcrafted, and thinking of larger development teams, it's hard to avoid misuse and confusion for people who choose to *not* spend most of their lives in front of computer screens (actually I am on vacation, er .. should be). So to me it seems more important to find solutions in this area. Additionally this would cover locality issues at least partially. 

If you decide to treat on-heap and off-heap data as two completely separate concerns, there will always be the need to do costly "conversion" between those memory models. So if the need for off-heap rises (IPC, big data) you get stuck in there and there is no cheap way back to the vanilla Java heap; the whole code base might get infected with quirky off-heap structure handling. So if you develop a solution covering on-heap locality and layout, you should also have a pretty detailed idea (if not an API) of how to represent off-heap structures and how to bridge those worlds, as (you know that) tiny constraints made in the on-heap spec might have a huge impact on the feasibility of a future off-heap structure implementation. 

Another issue I'd like to point out is an often-seen tendency towards "pseudo encapsulation". I mean: if the hints are rather vague (e.g. @Contended and similar stuff), I, as a "user" of the VM, will start to figure out how the VM I am currently using for my concrete project handles this stuff. I will not program against the API, but will target the "implementation behind the API". This is not a good thing; e.g. I'd rather have an @Inline annotation than having to split methods in order to have the JIT inline them (the next JDK update will probably not, so retesting is required). From the POV of VM implementation you likely want as much freedom as possible; however, from the POV of a Java programmer, things get "magic" and you'll add another 100 entries to the "Best Java Performance Tricks" websites. So stronger assumptions about the underlying implementation can make sense even if this sacrifices some freedom of VM implementation.

regards,
Rüdiger

Rüdiger Möller

unread,
Jul 20, 2013, 6:35:40 AM7/20/13
to mechanica...@googlegroups.com
Martin,

see inline comments ..


Am Freitag, 19. Juli 2013 17:27:21 UTC+2 schrieb Martin Thompson:
Not underestimating your experience at all.  Anyone trying to saturate IB is right on the frontier.


Actually we don't have to saturate it in order to meet requirements, but I tried and it was not possible. Most of the benchmarks focus on raw byte[] throughput; however, filling these bytes with structured data (de-/encoding) in a manageable way is often more of an issue ... Especially decoding without object allocation is hardly doable if you'd like to have a kind of manageable business-code layer.
 
I was just illustrating the common problems I see.  At the extremes I encode/decode with Unsafe and deal with some of the largest feeds so I have empathy.  This requires not just a native binary method of encoding but also alignment, striding and branching considerations which I'm sure you well know :-) 


There are some very interesting blogs out there covering this ;-) In practice it's pretty hard to actually make use of locality in Java; usually the data to be encoded comes in right from the business layer and is scattered across the heap, so you get cache misses anyway. Additionally, locality is very fragile in Java; tiny changes break it easily. However, a more strategic approach to increasing locality showed much better results (use open-addressed hashmaps, copy tiny objects directly into the container, etc.). This is why I think you should focus on keeping locality when objects are mutated in ObjectLayout. High locality after instantiation will help, but well .. objects change over time.
You are invited to provide a faster serialization than mine (without cutting corners) .. I have experimented somewhat with alignment and hot-field placement; however, the results were minimal. The major issue is the hashmap lookup needed to track references to identical objects. Flat primitives+strings can be encoded very fast using Unsafe and the x86 number layout (I always confuse big and little endian).  Bench: https://fast-serialization.googlecode.com/files/result-v1.13.html. However, mechanical sympathy is not used that much there. Additionally, object size matters too: if I can put more objects per UDP packet it raises effective throughput as well, so alignment can be counterproductive overall ..

regards, Rüdiger 

Martin Thompson

unread,
Jul 20, 2013, 7:25:45 AM7/20/13
to mechanica...@googlegroups.com

On Saturday, July 20, 2013 11:33:02 AM UTC+1, Rüdiger Möller wrote:
Martin,

Am Freitag, 19. Juli 2013 17:27:21 UTC+2 schrieb Martin Thompson:
Not underestimating your experience at all.  Anyone trying to saturate IB is right on the frontier.


actually we don't have to saturate it in order meet requirements, but i tried and it was not possible. Most of the benchmarks focus on how to get byte[] throughput, however filling these bytes with structured data (de-/encoding) in a manageable way is often more of an issue ... Especially Decoding without Object allocation is hardly doable if you like to have kind of manageable business code layer.

Absolutely, for decoding you need to use a flyweight or callback pattern.  Object allocation in this path is too expensive.  My measurements have shown this is not due to the actual decoding but more to do with cache invalidation from washing the newly allocated objects through the cache.
 
 I was just illustrating the common problems I see.  At the extremes I encode/decode with Unsafe and deal with some of the largest feeds so I have empathy.  This requires not just a native binary method of encoding but also alignment, striding and branching considerations which I'm sure you well know :-) 


There are some very interesting blogs out there covering this ;-) In practice its pretty hard to actually make use of locality in Java, usually the data to be encoded comes in right from the business layer and is cluttered across the heap, so you get cache misses anyway. Additionally locality is very fragile in java, tiny changes break it easily. However a more strategic approach to increase locality showed much better results (use open adressed Hashmaps, copy tiny Objects directly to the container etc.). This is why i think you should focus on keeping locality if objects are mutated in ObjectLayout. High locality after instantiation will help, but well .. objects change over time.
You are invited to provide a faster serialization than mine (without cutting corners) ..i have experimented somewhat with alignment and hot field placement, however the results were minimal. Major issue is the hashmap lookup in order to track references to identical objects. Flat primitives+strings can be encoded very fast using Unsafe and x86 number layout (always confuse Big and Little Endian).  bench: https://fast-serialization.googlecode.com/files/result-v1.13.html. However mechanical symphaty is not used that much. Additionally object size matters also, as if i can put more Objects/UDP packet it raises effective throughput as well, so alignment can be contraproductive overall ..

You've hit on my main motivation for the object layout work.  Based on measurements I've found that cache misses when using maps and other structures are the biggest performance bottleneck.  On output I do not generate objects but instead write events to the outgoing buffer via a flyweight pattern to keep the cache hot.  Encoding an event in binary format may take a few 10s of instructions but a single cache miss can cost high 100s of lost opportunity instruction cycles.

I like how you refer to the entire "encoding/decoding" process as not just the bit twiddling to fill a buffer, but the whole business of locating an object and getting its representation into a buffer hot for sending to an IO device, and vice versa. :-)

Martin... 

p.s. for some reason your links try to download rather than follow.

Gil Tene

unread,
Jul 20, 2013, 11:36:58 AM7/20/13
to mechanica...@googlegroups.com
The way I see it, there will forever be a conversion between On-heap and Off-heap. It is, by definition, unavoidable when Off heap storage is used.

Objects (things inherited from Object) can never reside in Off-heap memory. So things that are Off-heap are by definition not Objects. A different way to think about it is that the heap is not a range of memory addresses. It is the collection of all places in which something inherited from Object could possibly exist. So if you ever found a way to put an Object somewhere outside of the current heap, all you'd be doing is expanding the heap to also include that place, placing it under the heap's management, and losing the ability to do Off-heap things in it...

The things you put Off-heap are opaque bit-buckets as far as the On-heap stuff is concerned, and as far as the JVM is concerned. The two data spaces will never hold shared data, and there will always be a conversion step if the same data is represented in both On-heap and Off-heap spaces. The conversion may be "fast" (as in just copying the data with no interpretation), but it will still be there as a logical step for various simple reasons (like endian-ness). 

Those opaque bit-buckets that are Off-heap memory are capable of providing the same controls and lifecycle management capabilities that a C/C++ environment does for all of memory, but no more. They currently have a very thin layer of (Java accessible) library code to manipulate them with, if you don't count JNI. As a result, Off-heap memory is usually used in a fairly static or very-slow-moving form (as in statically allocated storage, mapped files, and occasionally allocated large buffers). You could certainly build good "commonly used" libraries to manipulate that data space as its own heap (malloc/free, new/delete, retain/release, ARC), and build layers and layers of Java-accessible code that would give you C/C++/Objective-C heap semantics for that bit-bucket space, but for some (good, in my opinion) reason this practice has not emerged as a widely used one. I.e. where those libraries exist, they usually are not leveraged outside a specific project or small set of people.

Most of the "leveraged by others" libraries I've seen used for Off-heap manipulation (and there aren't many of those) expose one or more of the following three behaviors:
1. Convenient and explicit data layout control
2. Fast data manipulation capabilities
3. Some high level collection-style APIs and large storage capacity

#1 is usually motivated by Java code communicating with non-Java-data stuff, including wire protocols, hardware and OS buffers, and non-Java processes, all of which require fixed, well-agreed-on data structures for the bits involved. Many people roll their own here (with variations of flyweights or other wrappers for opaque bit buckets), and some people have placed actual libraries out there that may be leverage-able by others to save them work. Some infrastructure work that may evolve into future standard or de-facto standard ways to do this on JVMs is also being done (e.g. IBM PackedObjects).

#2 is usually motivated by speed, and is distinguished by its use even when no non-Java code is interacting with the data. There are good reasons people do this (they can measure the benefit), but I always look at this practice (when only Java code ever accesses the data) as an indication of missing on-heap performance capabilities, as the code would have probably used on-heap stuff if the same (or very close) performance were possible there. This is where I draw my main motivation for ObjectLayout. As I see it, where ObjectLayout produces good, high-performance data manipulation capabilities on-heap, code that would have needed to move to off-heap data purely for speed reasons will be able to remain on-heap instead, and will be able to benefit from real Object capabilities and the more commonly understood Java programming idioms, not to mention the ability to leverage all that third-party code that doesn't seem to understand non-Object-derived data.

#3 is usually motivated by wanting to move data volume off the heap (usually to collection stuff like caches, K/V storage, hash tables, etc.) in order to reduce pressure on GC and avoid the other negative impacts associated with GC in most JVMs (pausing, inefficiency). I call these libraries "EMS-based" collection frameworks. As you all probably know by now, my opinion is that this category will simply go away over time, as GC is now a provably solved problem both from a pause time and efficiency perspective (Zing being a practical, available-right-now existence proof point). While annoyingly pausing and non-efficient collector implementations are still commonplace right now in other server JVMs, I fully expect badly behaving GCs to be weeded out over the next decade, making workaround memory management solutions like these as rare for Java as EMS became after Windows95 came out.

So from an Off-heap perspective my ObjectLayout work is focused on category #2 above, since I see #3 as a non-problem (I've already solved that one), and I see #1 as a separate problem worth solving (by someone else).

From an On-heap perspective (which is where most of the Java world lives), I see ObjectLayout as a way to improve performance, period. That's because [currently] most people faced with the choice between living with the performance overhead (of extra indirection and non-streaming memory access) and moving to non-Object-based data simply choose to stay with lower-performing Object-based data. I'm hoping ObjectLayout gives them some ways to enjoy both.

Rüdiger Möller

unread,
Jul 21, 2013, 5:38:56 AM7/21/13
to mechanica...@googlegroups.com


Absolutely, for decoding you need to use a flyweight or callback pattern.  Object allocation in this path is too expensive.  My measurements have shown this is not due to actual decoding but more to do with cache invalidation due to washing the newly allocated objects through the cache.

I never thought about cache-related issues regarding slowness of allocation .. however I have an empirical + benchmark-backed aversion to allocating objects ;-). It's evil. However, sooner or later received data likely results in allocation if it is processed by business logic. That's why I'd like to represent all important business data as 'structs', so the need for encoding/decoding (inside a homogeneous cluster) vanishes. That might be one of the reasons I try to push ObjectLayout toward my problem domain ;-)
 
 
You've hit on my main motivation for the object layout work.  Based on measurements I've found that cache misses when using maps and other structures are the biggest performance bottleneck.  On output I do not generate objects but instead write events to the outgoing buffer via a flyweight pattern to keep the cache hot.  Encoding an event in binary format may take a few 10s of instructions but a single cache miss can cost high 100s of lost opportunity instruction cycles.


Encoding cost still matters; copyMemory to put an int array is 5 to 10 times faster than a Java-written loop converting little- to big-endian.
Will the intrinsic backing of ObjectLayout be available as an OpenJDK patch or for Zing only?
 
I like how you refer to the entire "encoding/decoding" process as not just the bit twiddling to fill a buffer, but the whole business of locating an object and getting its representation into a buffer hot for sending to an IO device, and vice versa. :-)


I have an experimental open-addressed HashMap working completely off-heap (after byte-code instrumentation). With some additional offheap-specific code tweaks (plain instrumented on-heap byte code is sub-optimal) it will likely outperform any possible on-heap map implementation er.. except an ObjectLayout-based one maybe.
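For readers unfamiliar with the technique: an open-addressed map keeps keys and values in flat primitive arrays and resolves collisions by probing adjacent slots, so a lookup walks contiguous memory instead of chasing per-Entry references. Below is a deliberately minimal on-heap long→long sketch (no resizing, power-of-two capacity, zero reserved as the "empty" key, the key itself used as the hash) — an illustration of the general idea, not Rüdiger's implementation:

```java
// Minimal open-addressed long->long map: a probe walks contiguous
// array slots, so a lookup touches a few adjacent cache lines
// instead of chasing per-Entry object references.
// Caveats of this sketch: no resizing, no real hash mixing, and
// key 0 cannot be stored (0 marks an empty slot).
final class LongLongMap {
    private final long[] keys;   // 0 means "empty slot"
    private final long[] values;
    private final int mask;

    LongLongMap(int capacityPow2) {
        keys = new long[capacityPow2];
        values = new long[capacityPow2];
        mask = capacityPow2 - 1;
    }

    void put(long key, long value) {
        int i = (int) (key & mask);
        while (keys[i] != 0 && keys[i] != key) {
            i = (i + 1) & mask;          // linear probe: next adjacent slot
        }
        keys[i] = key;
        values[i] = value;
    }

    long get(long key, long missing) {
        int i = (int) (key & mask);
        while (keys[i] != 0) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;
        }
        return missing;
    }
}

public class OpenAddressDemo {
    public static void main(String[] args) {
        LongLongMap map = new LongLongMap(16);
        map.put(3, 30);
        map.put(19, 190); // collides with slot 3, probes to slot 4
        System.out.println(map.get(3, -1) + " " + map.get(19, -1)); // → 30 190
    }
}
```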

>> p.s. for some reason your links try to download rather than follow

Google Code does that in order to prevent users from hosting websites, I assume ..

regards,
Rüdiger

Rüdiger Möller

unread,
Jul 21, 2013, 6:38:03 AM7/21/13
to


Am Samstag, 20. Juli 2013 17:36:58 UTC+2 schrieb Gil Tene:
The way I see it, there will forever be a conversion between On-heap and Off-heap. It is, by definition, unavoidable when Off heap storage is used.


You are right.
 
Objects (things inherited from Object) can never reside in Off-heap memory. So things that are Off-heap are by definition not Objects. A different way to think about it is that the heap is not a range of memory addresses. It is the collection of all places in which something inherited from Object could possibly exist. So if you ever found a way to put an Object somewhere outside of the current heap, all you'd be doing is expanding the heap to also include that place, placing it under the heap's management, and losing the ability to do Off-heap things in it...

The things you put Off-heap are opaque bit-buckets as far as the On-heap stuff is concerned, and as far as the JVM is concerned. The two data spaces will never hold shared data, and there will always be a conversion step if the same data is represented in both On-heap and Off-heap spaces. The conversion may be "fast" (as in just copying the data with no interpretation), but it will still be there as a logical step for various simple reasons (like endian-ness). 


Agree, i'd like to put emphasis on the word "fast" here ;-)
 
Those opaque bit-buckets that are Off-heap memory are capable of providing the same controls and lifecycle management capabilities that a C/C++ environment does for all of memory, but no more. They currently have a very thin layer of (Java accessible) library code to manipulate them with, if you don't count JNI. As a result, Off-heap memory is usually used in a fairly static or very-slow-moving form (as in statically allocated storage, mapped files, and occasionally allocated large buffers). You could certainly build good "commonly used" libraries to manipulate that data space as its own heap (malloc/free, new/delete, retain/release, ARC), and build layers and layers of Java-accessible code that would give you C/C++/Objective-C heap semantics for that bit-bucket space, but for some (good, in my opinion) reason this practice has not emerged as a widely used one. I.e. where those libraries exist, they usually are not leveraged outside a specific project or small set of people.


There is some middle ground. Structs are actually not off-heap!

Consider the following very common design pattern of "Value Classes"

class Person [extends Struct] {
    String name;
    TimeRange contractValidity;
    Date birth;
    ...
}

It is obvious that locality of this class is likely to be poor (especially if it has been mutated several times). Business code operating on this would likely incur several cache misses. Additionally, a lot of per-object memory is used and GC overhead is created (regardless of how efficient the GC implementation is). ObjectLayout will help here, but only if the "Person" is immutable.

One could represent this as a single flat Person "bit-bucket" with an "AccessHelper" object wrapping it. The Person object then is a first-class heap object, while the embedded value objects are rewritable by value. They are not first-class heap objects, and nobody will have a problem with that. The condition is that you can't have references from such an opaque object to the heap (which can be enforced by requiring all struct-embedded objects to inherit from a "Struct" class). However, this is not really off-heap: "Person" objects as a whole are subject to GC, so there is no need to implement manual memory management. Locality is excellent and is kept regardless of mutation (which of course has limits because of the fixed-size data layout).
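A minimal sketch of that idea, using a plain ByteBuffer as the flat bit-bucket and hand-written accessor methods in place of a generated "AccessHelper" (all field names, offsets, and sizes are invented for illustration):

```java
import java.nio.ByteBuffer;

// Sketch of the "flat bit-bucket plus accessor" idea: Person is a normal
// heap object (subject to GC), but its embedded value fields live in one
// contiguous byte[], so mutating them rewrites in place and never
// disturbs locality. Offsets below are invented for illustration.
final class Person {
    private static final int BIRTH_MILLIS = 0;   // 8 bytes, embedded "Date"
    private static final int VALID_FROM   = 8;   // 8 bytes, embedded "TimeRange"
    private static final int VALID_TO     = 16;  // 8 bytes
    private static final int RECORD_SIZE  = 24;

    String name;  // ordinary reference field, lives on the heap as usual
    private final ByteBuffer data = ByteBuffer.allocate(RECORD_SIZE);

    // Embedded values are read and written by value through the container.
    long birthMillis()       { return data.getLong(BIRTH_MILLIS); }
    void birthMillis(long v) { data.putLong(BIRTH_MILLIS, v); }

    long validFrom()         { return data.getLong(VALID_FROM); }
    long validTo()           { return data.getLong(VALID_TO); }
    void contractValidity(long from, long to) {
        data.putLong(VALID_FROM, from).putLong(VALID_TO, to);
    }
}

public class FlatPersonDemo {
    public static void main(String[] args) {
        Person p = new Person();
        p.name = "Ada";
        p.birthMillis(1234L);
        p.contractValidity(10L, 20L);
        p.birthMillis(5678L); // mutation rewrites in place: locality kept
        System.out.println(p.name + " " + p.birthMillis() + " " + p.validFrom());
    }
}
```

The drawbacks Rüdiger mentions show up immediately: the embedded values have no identity of their own, so handing one to outside code requires either copying it out or materializing a wrapper on demand.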

A challenge is, to define semantics which do not break the simplicity and clearness of the Java language's Object Model.

 
Most of the "leveraged by others" libraries I've seen used for Off-heap manipulation (and there aren't many of those) expose one or more of the following three behaviors:
1. Convenient and explicit data layout control
2. Fast data manipulation capabilities
3. Some high level collection-style APIs and large storage capacity

#1 is usually motivated by Java code communicating with non-Java-data stuff, including wire protocols, hardware and OS buffers, and non-Java processes, all of which require fixed, well-agreed-on data structures for the bits involved. Many people roll their own here (with variations of flyweights or other wrappers for opaque bit buckets), and some people have placed actual libraries out there that may be leverage-able by others to save them work. Some infrastructure work that may evolve into future standard or de-facto standard ways to do this on JVMs is also being done (e.g. IBM PackedObjects).

I admit I have not looked into packed objects in detail .. anyway, I can't wait that long.

#2 is usually motivated by speed, and is distinguished by its use even when no non-Java code is interacting with the data. There are good reasons people do this (they can measure the benefit), but I always look at this practice (when only Java code ever accesses the data) as an indication of missing on-heap performance capabilities, as the code would have probably used on-heap stuff if the same (or very close) performance were possible there. This is where I draw my main motivation for ObjectLayout. As I see it, where ObjectLayout produces good, high-performance data manipulation capabilities on-heap, code that would have needed to move to off-heap data purely for speed reasons will be able to remain on-heap instead, and will be able to benefit from real Object capabilities and the more commonly understood Java programming idioms, not to mention the ability to leverage all that third-party code that doesn't seem to understand non-Object-derived data.

nothing to disagree here .. 

#3 is usually motivated by wanting to move data volume off the heap (usually to collection stuff like caches, K/V storage, hash tables, etc.) in order to reduce pressure on GC and avoid the other negative impacts associated with GC in most JVMs (pausing, inefficiency). I call these libraries "EMS-based" collection frameworks. As you all probably know by now, my opinion is that this category will simply go away over time, as GC is now a provably solved problem both from a pause time and efficiency perspective (Zing being a practical, available-right-now existence proof point). While annoyingly pausing and non-efficient collector implementations are still commonplace right now in other server JVMs, I fully expect badly behaving GCs to be weeded out over the next decade, making workaround memory management solutions like these as rare for Java as EMS became after Windows95 came out.


I remember David Ungar predicting similar things 30 years ago .. Even if you have a non-pausing GC, it still consumes CPU. On the client side, memory efficiency is and will be an issue forever.
 
So from an Off-heap perspective my ObjectLayout work is focused on category #2 above, since I see #3 as a non-problem (I've already solved that one), and I see #1 as a separate problem worth solving (by someone else).

From an On-heap perspective (which is where most of the Java world lives), I see ObjectLayout as a way to improve performance, period. That's because [currently] most people faced with the choice between living with the performance overhead (of extra indirection and non-streaming memory access) and moving to non-Object-based data simply choose to stay with lower-performing Object-based data. I'm hoping ObjectLayout gives them some ways to enjoy both.
 
Regarding "leveraged by others": I expect Java to replace C++ in many places in the future. Currently we see fancy new languages gaining ground in the Productive-But-Slow arena, which is where Java has dominated for a long time. At the same time, improved hardware and VM implementations enable the use of Java for applications which were in the C/C++ domain. So I'd expect some low-level shortcomings of Java to become much more important.

Anyway, the discussion is stalling. I see your points and I can't come up with a grown-up alternative or a concrete idea for an ObjectLayout extension. So .. looking forward to a beta VM-backed ObjectLayout implementation ;-)

BTW: Will intrinsic support for ObjectLayout be available for Zing only, or are you considering an OpenJDK patch?

regards,
Rüdiger

ps: management has approved evaluating Zing with release 2.2

Martin Thompson

unread,
Jul 21, 2013, 6:30:20 AM7/21/13
to mechanica...@googlegroups.com


On Sunday, July 21, 2013 10:38:56 AM UTC+1, Rüdiger Möller wrote:
 
You've hit on my main motivation for the object layout work.  Based on measurements I've found that cache misses when using maps and other structures are the biggest performance bottleneck.  On output I do not generate objects but instead write events to the outgoing buffer via a flyweight pattern to keep the cache hot.  Encoding an event in binary format may take a few 10s of instructions but a single cache miss can cost high 100s of lost opportunity instruction cycles.


Encoding cost still matters; copyMemory to put an int array is 5 to 10 times faster than a Java-written loop converting little- to big-endian.

If you are using ByteBuffer.order(ByteOrder.nativeOrder()) then this is not such an issue.
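For example, with the buffer set to the platform's native byte order, a bulk put of an int[] needs no per-element endian conversion (on little-endian x86 it can compile down to a straight memory copy):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Encoding an int[] with the buffer in the platform's native byte order:
// the bulk put involves no manual byte-swapping loop, and on x86
// (little-endian) the JIT can treat it as a plain memory copy.
public class NativeOrderDemo {
    public static void main(String[] args) {
        int[] payload = {1, 2, 3, 4};

        ByteBuffer buf = ByteBuffer.allocateDirect(payload.length * 4)
                                   .order(ByteOrder.nativeOrder());
        IntBuffer ints = buf.asIntBuffer();
        ints.put(payload);  // bulk transfer, no manual endian loop

        // Reading back through the same native-order view.
        System.out.println(buf.asIntBuffer().get(2)); // → 3
    }
}
```

The trade-off is that native order is only safe when both ends of the wire agree on it (e.g. inside a homogeneous x86 cluster); a fixed network byte order still requires conversion on one side.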

Will the intrinsic backing of ObjectLayout be available as an OpenJDK patch or for Zing only?

The plan is to support both Zing and provide a patch for OpenJDK.
 

Gil Tene

unread,
Jul 21, 2013, 11:23:55 AM7/21/13
to
And, assuming we get enough interest going, I'd hope to develop this into an actual JEP and JSR and to try to push it into a future Java SE version, e.g. Java SE 10. Java SE 9 would be great, but my guess is that its contents will be locked down too early for this to affect it unless things move very fast.

I do hope to get some external (non-Azul) involvement with carrying the Non-Zing stuff though, so any JVM hackers out there that want to play with this are welcome to chime in.

There is a lot of consideration under the hood here for the "how will a JVM intrinsify this stuff?" questions, which loop back into making sure semantics are carefully restricted to allow practical intrinsification. We've thought through how this would look for various collectors as well, as most of the JVM change has to do with teaching the collectors about this new thing, and adding some fairly simple intrinsics for the actual implementations of factory methods and get() methods.

So far, I'd say we have a very good feel for how this will work both in Zing's C4 (obviously) and in ParallelGC, and neither is very challenging or complex. It is almost surprising how cleanly this fits into place and how "un-foreign" the stuff is to current JVM code when full-blown Object qualities are retained. E.g. there is little need for adding the performant tracking and backtracking mechanisms heaps would otherwise need when "references to the middle of an object" become possible, as all currently envisioned ObjectLayout things don't add those semantics to the heap. E.g. a marker has no need to traverse from a contained object to a container object (like a StructuredArray), because liveness semantics don't require a contained object to keep its container alive. That is a huge plus from an implementation point of view, and ends up being much more natural from a programmer's point of view as well (other collections in Java share this quality already). Most of the collector-related work will be in teaching relocation code to move contained objects, such as those identified as contained by a marker reaching them directly from a container, by moving their entire container.

I believe that extending the work to the CMS case will be nearly trivial. CMS itself never relocates objects, so the relocation logic which will only occur in newgen and in FullGC will be identical to ParallelGC, and we just need the marker to mark contained objects as contained. I think the G1 case wouldn't be very hard either, but can't say that I've covered it in detail.


Gil Tene

unread,
Jul 21, 2013, 12:28:44 PM7/21/13
to mechanica...@googlegroups.com


On Sunday, July 21, 2013 3:24:40 AM UTC-7, Rüdiger Möller wrote:
...


There is some middle ground. Structs are actually not off-heap!

Consider the following very common design pattern of "Value Classes"

class Person [extends Struct] {
    String name;
    TimeRange contractValidity;
    Date birth;
    ...
}

It is obvious that locality of this class is likely to be poor (especially if it has been mutated several times). 

This is a good example for discussion.

First, it raises a good point about one of the use limitations of ObjectLayout. When ObjectLayout things provide a flat layout in which some object contains some other objects, as in StructuredArray, the references from the container to the contained objects must by definition remain immutable (and in fact, must be knowable at container allocation time). This in no way means that the contained object needs to be immutable (I expect the opposite to be common), but it does mean that using StructuredArrays with immutable element types will have relatively limited use. The same will likely be true in other use cases (such as the struct-in-struct use case we haven't put together yet).

As a result, the above example, in which all member objects under Person are represented as mutable references to immutable types, will not fit well into ObjectLayout things, just as it won't fit well into any other struct-like thing. However, with ObjectLayout you would probably represent this sort of thing using immutable references to mutable objects. And since container objects will be full-blown objects, you can mix and match embedded and non-embedded objects.

For example:

class Person {
  String fullNameThatIsRarelyUsed;
  final MutableString name = StructuredObject.newEmbeddedInstance(MutableString.class, 128, "Placeholder");
  final MutableTimeRange contractValidity = StructuredObject.newEmbeddedInstance(MutableTimeRange.class);
  final MutableDate birth = StructuredObject.newEmbeddedInstance(MutableDate.class);
...
}

If all you need is a good, completely flat value object and you are willing to converse about its contents with only scalar values, flyweights and other access-method-based things already provide that ability on top of objects, even without any ObjectLayout stuff. See the Union example I posted here earlier. 

It's when you want to be able to converse about the members in more-complex-than-scalar form, e.g. when you want to pass that birth date thing around to other code without having it understand the containing Person structure, that the struct vs. Object thing really comes into play. I expect most concerns would be the same for either: e.g. the thing's contents will need to be mutable either way if you actually want to change it, and they can remain immutable [only] if they are truly final values from the Person structure's point of view.

I guess my take right now is that passing those things around as objects presents more value than the space savings that having a struct type would give. I see little semantic benefits to structs that go beyond the "save the wasted header space" argument.

... Business code operating on this would likely incur several cache misses. Additionally, a lot of per-object memory is used and GC overhead is created (regardless of how efficient the GC implementation is). ObjectLayout will help here, but only if the "Person" is immutable.

The above example would take care of the cache misses. It will not remove the per-object memory overhead, and depending on the intrinsic implementation may or may not remove the header gaps between the member fields (affecting locality). We've already thought up some quirky intrinsic ways to keep the headers away from the bodies in some cases if we really wanted to, some of which are practically performance-neutral (paid for with extra memory that is never accessed and doesn't make it into the cache), but I doubt we'll go there in the near future. Having header gaps between flatly laid out instances is probably a secondary (mostly cache locality) concern.

... Additionally, a lot of per-object memory is used and GC overhead is created (regardless of how efficient the GC implementation is). ...

I need to point out that the GC overhead worry is not really an issue in my view. Or shouldn't be. GC efficiency is being traded off against pause times right now, and (almost) nothing else. For ALL current collectors available and actually used in JVMs, GC efficiency math is easy to control without resorting to flattening out objects (note I'm not talking about pauses, but about efficiency): for every doubling of empty memory in the heap, you roughly double the efficiency of GC for the exact same code, data, and data layout. This is such a powerful (and cheap) lever that it makes GC overhead concerns around the GC cost of traversing references arbitrarily small. Of course, if your collector exhibits stop-the-world behavior that grows with heap size, this may present a problem, but it's a problem Zing doesn't have, and it's something I'd expect all collectors to learn to do right over the next decade+.

So yes, GC markers will spend energy and effort traversing the references in live objects. But the good ones will do this in background threads that do not affect the latency of other work. The amount of time spent on this can be made so minuscule as a percentage of overall work in the system that optimizing it further (e.g. reducing it by programming better structures) can always be made a seventh-order concern. For most of our Zing customers, for example, total cycles spent on GC amount to less than 3% of the total cycles spent by the entire JVM process (that's where we usually recommend they size their heap, and in many logs I see it's less than 0.5%). And since none of that GC work time is spent in a stop-the-world pause, application threads just don't care or feel it. When performance matters, people will happily burn a few extra GB of empty memory to keep those efficiency numbers where they want them. Unless all your CPUs are saturated it's a non-issue. And even then, you still control the % of system CPU time spent on GC by deciding how much empty memory to give it.
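The "doubling empty memory roughly doubles GC efficiency" claim falls out of a simple model: a tracing collector's work per cycle scales with the live set L, while each cycle reclaims roughly H − L bytes, so the cost per reclaimed byte is proportional to L / (H − L). A back-of-envelope sketch (all numbers hypothetical):

```java
// Back-of-envelope model behind "doubling empty memory roughly doubles
// GC efficiency": a tracing collector's work per cycle scales with the
// live set L, and each cycle reclaims roughly H - L bytes, so the cost
// per reclaimed byte is proportional to L / (H - L).
public class GcEfficiencyModel {
    // Relative GC cost per byte of allocation, in arbitrary units.
    static double costPerByte(double liveSetGb, double heapGb) {
        return liveSetGb / (heapGb - liveSetGb);
    }

    public static void main(String[] args) {
        double live = 2.0; // GB of live data (hypothetical)
        // 2 GB empty vs 4 GB empty: cost per reclaimed byte halves.
        System.out.println(costPerByte(live, 4.0)); // → 1.0
        System.out.println(costPerByte(live, 6.0)); // → 0.5
    }
}
```

This is the lever Gil describes: adding empty memory is cheap, and it divides the per-byte GC cost without any change to the code or data layout.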

Rüdiger Möller

unread,
Jul 22, 2013, 5:45:12 AM7/22/13
to mechanica...@googlegroups.com


Am Sonntag, 21. Juli 2013 18:28:44 UTC+2 schrieb Gil Tene:


On Sunday, July 21, 2013 3:24:40 AM UTC-7, Rüdiger Möller wrote:
...

There is some middle ground. Structs are actually not off-heap!

Consider the following very common design pattern of "Value Classes"

class Person [extends Struct] {
    String name;
    TimeRange contractValidity;
    Date birth;
    ...
}

It is obvious that locality of this class is likely to be poor (especially if it has been mutated several times). 

This is a good example for discussion.

First, it raises a good point about one of the use limitations of ObjectLayout. When ObjectLayout things provide a flat layout in which some object contains some other objects, as in StructuredArray, the references from the container to the contained objects must by definition remain immutable (and in fact, must be knowable at container allocation time). This in no way means that the contained object needs to be immutable (I expect the opposite to be common), but it does mean that using StructuredArrays with immutable element types will have relatively limited use. The same will likely be true in other use cases (such as the struct-in-struct use case we haven't put together yet).

As a result, the above example, in which all member objects under Person are represented as mutable references to immutable types, will not fit well into ObjectLayout things, just as it won't fit well into any other struct-like thing. However, with ObjectLayout you would probably represent this sort of thing using immutable references to mutable objects. And since container objects will be full-blown objects, you can mix and match embedded and non-embedded objects.

For example:

class Person {
  String fullNameThatIsRarelyUsed;
  final MutableString name = StructuredObject.newEmbeddedInstance(MutableString.class, 128, "Placeholder");
  final MutableTimeRange contractValidity = StructuredObject.newEmbeddedInstance(MutableTimeRange.class);
  final MutableDate birth = StructuredObject.newEmbeddedInstance(MutableDate.class);
...
}

If all you need is a good, completely flat value object and you are willing to converse about its contents with only scalar values, flyweights and other access-method-based things already provide that ability on top of objects, even without any ObjectLayout stuff. See the Union example I posted here earlier. 

It's when you want to be able to converse about the members in more-complex-than-scalar form, e.g. when you want to pass that birth date thing around to other code without having it understand the containing Person structure, that the struct vs. Object thing really comes into play. I expect most concerns would be the same for either: e.g. the thing's contents will need to be mutable either way if you actually want to change it, and they can remain immutable [only] if they are truly final values from the Person structure's point of view.


This is in effect pretty similar to a structs approach; basically my structs hackery separates the object header from the data (taking a hit on locality), so I create a "real" object on demand when I need to pass an embedded object to outer code (with some drawbacks regarding identity and locking, of course). Unfortunately you get the same semantic problems with your approach: 
  • If the embedded objects are mutable, you had better not pass references to other code, as this may create hard-to-find bugs. 
  • If you make them immutable, you cannot mutate without losing locality. 
One approach is to read/write by value; however, in this case you would not need object headers and would be better off with a structs approach. Unfortunately this does not fit well into the Java language and would require the introduction of at least new operators like ":=" (the J9 proposal). Enforcing by-value access would also hurt other use cases of ObjectLayout. 
So yes, some issues regarding locality are solvable, but there is a "workaroundish" smell. That's why I get a gut feeling that some kind of language extension/modification would be required to solve this dilemma in a proper way.
 
I guess my take right now is that passing those things around as objects presents more value than the space savings that having a struct type would give. I see little semantic benefits to structs that go beyond the "save the wasted header space" argument.

... Business code operating on this would likely incur several cache misses. Additionally, a lot of per-object memory is used and GC overhead is created (regardless of how efficient the GC implementation is). ObjectLayout will help here, but only if the "Person" is immutable.

The above example would take care of the cache misses. It will not remove the per-object memory overhead, and depending on the intrinsic implementation it may or may not remove the header gaps between the member fields (affecting locality). We've already thought up some quirky intrinsic ways to keep the headers away from the bodies in some cases if we really wanted to, some of which are practically performance-neutral (paid for with extra memory that is never accessed and doesn't make it into the cache), but I doubt we'll go there in the near future. Having header gaps between flatly laid out instances is probably a secondary (mostly cache locality) concern.

... Additionally a lot of per object memory is used and GC overhead is created (regardless how efficient this GC implementation is). ...

I need to point out that the GC overhead worry is not really an issue in my view. Or shouldn't be. GC efficiency is being traded off against pause times right now, and (almost) nothing else. For ALL current collectors available and actually used in JVMs, GC efficiency is easy to control without resorting to flattening out objects (note I'm not talking about pauses, but about efficiency): for every doubling of empty memory in the heap, you roughly double the efficiency of GC for the exact same code, data, and data layout. This is such a powerful (and cheap) lever that it makes concerns about the GC cost of traversing references arbitrarily small. Of course, if your collector exhibits stop-the-world behavior that grows with heap size, this may present a problem, but it's a problem Zing doesn't have, and it's something I'd expect all collectors to learn to do right over the next decade+.
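The "double the empty heap, double the efficiency" lever follows from simple arithmetic. Here is a back-of-envelope sketch (a deliberate simplification, not a model of any real collector): each GC cycle traverses roughly the live set, and a cycle is needed each time allocation fills the empty portion of the heap, so GC work per unit time is inversely proportional to empty memory:

```java
// Toy model: GC work rate = (cycles per second) * (work per cycle).
// All numbers below are made-up illustration values, not measurements.
public class GcEfficiencyModel {
    static double gcWorkRate(double liveSetGb, double heapGb, double allocGbPerSec) {
        double emptyGb = heapGb - liveSetGb;
        double cyclesPerSec = allocGbPerSec / emptyGb; // a cycle each time empty space fills
        return cyclesPerSec * liveSetGb;               // each cycle traverses the live set
    }

    public static void main(String[] args) {
        double live = 4, alloc = 2;                 // 4GB live set, 2GB/s allocation rate
        double small = gcWorkRate(live, 8, alloc);  // 4GB of empty heap
        double big = gcWorkRate(live, 12, alloc);   // 8GB of empty heap: doubled
        System.out.println(small / big);            // prints 2.0: half the GC work
    }
}
```

So buying empty memory buys down GC overhead linearly, which is why flattening object graphs to reduce traversal cost is a second-order lever by comparison.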


Agreed for the server side. Memory efficiency is not an issue (for most applications). And still: a Java application needs something like 4 to 5 times the heap size compared to a C++ application. That's still a disadvantage (a minor one on the server side).
 
So yes, GC markers will spend energy and effort traversing the references in live objects. But the good ones will do this in background threads that do not affect the latency of other work. The amount of time spent on this can be made so minuscule as a percentage of overall work in the system that optimizing it further (e.g. by programming better structures) can always be made a seventh-order concern. For most of our Zing customers, for example, total cycles spent on GC amount to less than 3% of the total cycles spent by the entire JVM process (that's what we usually recommend they size their heap for, and in many logs I see it's less than 0.5%). And since none of that GC work is spent in a stop-the-world pause, application threads just don't care or feel it. When performance matters, people will happily burn a few extra GB of empty memory to keep those efficiency numbers where they want them. Unless all your CPUs are saturated it's a non-issue. And even then, you still control the % of system CPU time spent on GC by deciding how much empty memory to give it.


Gil, I can understand it hurts your eyes seeing people complain about solved problems. However, IMHO you are only focusing on the server side. There is still the client side and mobile/embedded devices, where memory and CPU constraints are much harder. Also, a decade is a very long time from a commercial point of view. So it's well worth introducing optimizations, even if they become obsolete within a few years.

regards, 
Rüdiger

Regarding VM hackery: I'd be interested to get into VM programming (it would be nice to be able to roll my own intrinsics), however adding this to my interests and activities on top of my job would require the introduction of a 36-hour day, which would likely have a severe impact on real-world performance ..

Rüdiger Möller

unread,
Jul 22, 2013, 5:49:13 AM7/22/13
to mechanica...@googlegroups.com



If you are using ByteBuffer.order(ByteOrder.nativeOrder()) then this is not such an issue

bytebuffers suck ;-)
 

Simone Bordet

unread,
Jul 22, 2013, 5:54:23 AM7/22/13
to mechanica...@googlegroups.com
Hi,

On Mon, Jul 22, 2013 at 11:45 AM, Rüdiger Möller <moru...@gmail.com> wrote:
> Regarding VM hackery: I'd be interested to get into VM programming (would be
> nice to be able to roll my own intrinsics), however adding this to my
> interests and activities on top of my job would require introduction of a 36
> hours day, which would likely have severe impact on real world performance

I have this problem too.

I tried to clone myself to stay within the 24h timeframe, but then
performance sucked because of cache misses.

--
Simone Bordet
http://bordet.blogspot.com
---
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless. Victoria Livschitz

Rüdiger Möller

unread,
Jul 22, 2013, 8:43:28 AM7/22/13
to mechanica...@googlegroups.com
Yes, cloning is expensive (did that 2 times) .. never underestimate the maintenance cost of the cloning device ;-) 

Mikhail Khludnev

unread,
Oct 24, 2014, 4:20:20 PM10/24/14
to mechanica...@googlegroups.com
Hello,

I attended the talk about the subject this week. I really liked it. Though I'm not able to contribute to it, I wonder if it's possible to avoid the ConstructorMagic thread-local? Really sorry about the off-topic question. I suppose I found something relevant in my favorite book, API Design. 
I suppose we can apply this "friend code" trick if we make StructuredArray final and move newInstance() to some friend "Accessor". Is it worth looking into? Or is there a blocker? 

Thanks

On Thursday, 18 July 2013 11:09:59 UTC+4, Gil Tene wrote:
Martin and I have been working together on something we've been calling a StructuredArray, which is part of an  "ObjectLayout" project which is now up on github. I think the concept is becoming mature enough to have other people start throwing mud at it. So Here goes...

First, a bit of background:

StructuredArray originated with a friendly argument Martin and I were having some time in 2012. Martin was rightfully complaining about Java's lack of ability to represent the common "array of structs" data layout available in C/C++/C# and other languages, and the inefficiency of array objects and all other available collections that directly arises from both indirection costs and the lack of striding pattern to stream on. Martin was advocating the common position that there is a need for some form of structs (and arrays of them) to be added into the language as first class citizens. I was taking the contrarian view (as I often do for fun), claiming, "instinctively," that there has to be a way to address this problem through carefully stated semantics captured in regular "vanilla" Java class form, which can then be intrinsified by future JDK versions, but would not require ANY language changes. After all, java.util.concurrent and things like AtomicLong evolved in exactly that way in the pre Java 5 days. My take was that rather than a new struct thing, we can get regular java objects to play the needed role, and that the actual solution needs to come in the form of restricting semantics (through carefully selected Java class contracts), and not of expanding them with new language features or JVM specs.

Somehow, we both decided to actually explore this notion. Martin started off by capturing StructuredArray in his code examples code area on github, and I went off and brainstormed with other Azul engineers on how a JVM would/could intrinsify something that would provide the needed value. Together, we evolved StructuredArray over several months, discovering fundamental issues that needed to be worked around as we went, as well as fundamental properties that make StructuredArray much more powerful and useful than an array of structs could ever be (IMHO).

For example, we deduced early on that a StructuredArray should not have a public constructor, with instances created via static factory methods instead. This distinction is driven by a fundamental issue: in Java, all constructed objects have their allocation site (the new bytecode) occur before their initialization and construction code ever sees its parameters. This would make it "hard" to intrinsify the construction of a variable-sized object like a StructuredArray, as the needed allocation size is not known at "new" time. A factory method, on the other hand, can easily be intrinsified such that allocation can consider the parameters that control the array size.
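The factory-vs-constructor distinction can be sketched in a few lines. This is a simplified illustration, not the actual ObjectLayout API; the class and method names here are invented:

```java
// Why a static factory (rather than "new") lets an intrinsified JVM size
// the allocation: the length reaches the factory before any allocation
// happens, whereas bytecode "new" allocates before the constructor runs.
public class FactorySketch<T> {
    private final Object[] elements;

    // Private: direct "new FactorySketch(n)" from outside is impossible.
    private FactorySketch(int length) {
        elements = new Object[length];
    }

    // With "new", the object is allocated before the constructor ever sees
    // the length argument. Here the factory sees the length first, so an
    // intrinsic implementation could allocate one flat, right-sized chunk.
    public static <T> FactorySketch<T> newInstance(int length) {
        return new FactorySketch<T>(length);
    }

    public int getLength() { return elements.length; }

    public static void main(String[] args) {
        FactorySketch<String> a = FactorySketch.newInstance(16);
        System.out.println(a.getLength()); // prints 16
    }
}
```

In the vanilla (non-intrinsified) form the factory just delegates to a constructor as above; the point is that the contract leaves an intrinsified JDK free to do the sizing at the factory call site.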

We further discovered that with first-class Java objects as members, liveness logic can be made significantly different from array-of-structs, and that this Java-natural liveness behavior has an elegance level that makes things "click" together elegantly (I credit Michael Wolf with coming up with that one). In arrays-of-structs-like things, the container must remain logically "live" as long as any logical reference to a member of it exists. This is necessary since "structs" are usually not full-blown objects, and most struct-supporting environments lack the ability to track individual struct liveness. However, when container members are full blown objects, they can also have (and actually "prefer" to have) individual liveness, which means that there is no need for live member objects to implicitly keep their containers alive. This realization not only makes an intrinsic implementation much simpler, it also makes StructuredArray much more generically useful. E.g. a member of a StructuredArray can also be placed in a HashMap or some other collection, and there is no new mess to deal with if it does.

At some point I started feeling that this "Challenge Accepted!" experiment was showing enough promise that we may actually build something based on it. We moved the code to the newly formed ObjectLayout project on github, and kept hacking at it. We added multi-dimensional support. We added generic construction support for member objects (so that e.g. even objects with final fields can be natural members of StructuredArrays), and we went back and forth on various implementation details.

StructuredArray was made part of a project we called ObjectLayout because it is not the only example of an intrinsically optimized object layout opportunity. You can think of StructuredArray as the first of three (so far identified) forms of object layout abstractions that can be useful for intrinsically improving memory layout in Java, all without requiring any language changes. The other two forms, not yet captured or completely figured out, are "structs with structs inside" (objects with final object members that are initialized with a factory call) and "struct with array at the end" (objects that inherit from arrays). For now, I'm focusing on StructuredArray both because I think it's the most useful form, and because [I think] I know exactly how to make a JDK intrinsify it, both in Zing and in future OpenJDKs.

My intent is to mature the vanilla definitions of StructuredArray as open source code, and to demonstrate their value with both fully supported Zing JDKs and some experimental vanilla-OpenJDK derivatives that include intrinsic StructuredArray implementations laying it out as a flat "array of object structures" in the heap, while maintaining the exact same class contract as the vanilla implementations. When we can show and demonstrate the value of doing things "this way", and after working out the kinks, I would hope to drive it into some future Java SE version (10?) by properly upstreaming it into the OpenJDK project.

Before you jump in and look through the JavaDoc and code, it's probably important to note what ObjectLayout is about, and what it is NOT about:

ObjectLayout is focused on in-heap, POJO objects. Its goal is to allow normal Java objects and classes (including existing classes) to benefit from optimized heap layout arrangements, without restricting their ability to participate in other object graphs (both by referring to other objects, and by being referred to from other objects), and without restricting their ability to act as first-class citizens and be passed around to various libraries that would expect that ability (meaning that participating objects can still be synchronized on, for example, and can still have identity hash codes).

The ObjectLayout "vanilla" Java class implementations (such as the current StructuredArray) are NOT intended to provide the same performance and layout benefits as their eventual intrinsic implementations will. Instead, they are intended to provide a fully functional implementation of the contract represented by the class, such that code will run portably across all (Java SE 6 and above) JDKs, while being able to "scream" in newer JDKs that would recognize them and intrinsify them (think of AtomicLong running on Java SE 5 vs. "vanilla" Java 1.4.2).

ObjectLayout does NOT make any attempt to deal with the various needs for off-heap structured access. It is orthogonal to other work done for that purpose.

ObjectLayout does NOT try to pack memory or save memory. The way I see it, memory is super-cheap and abundant. To the point where our real problem is finding good creative ways to waste more of it in order to gain real benefit (not that ObjectLayout wastes much memory). So instead, ObjectLayout is focused on reducing indirection, improving locality, and making regular striding patterns exposed to hardware (or software) prefetchers.


So here it is.... Have at it. Comments (both sane and otherwise) are welcome: 


-- Gil.

Gil Tene

unread,
Oct 24, 2014, 7:29:22 PM10/24/14
to mechanica...@googlegroups.com


On Friday, October 24, 2014 1:20:20 PM UTC-7, Mikhail Khludnev wrote:
Hello,

I attended the talk about the subject this week. I really liked it. Though I'm not able to contribute to it, I wonder if it's possible to avoid the ConstructorMagic thread-local? Really sorry about the off-topic question. I suppose I found something relevant in my favorite book, API Design. 
I suppose we can apply this "friend code" trick if we make StructuredArray final and move newInstance() to some friend "Accessor". Is it worth looking into? Or is there a blocker? 

The main reason we need something like the ConstructorMagic trick in both StructuredArray and the variations on AbstractPrimitiveArray is that we want those classes to be sub-classable (hence they cannot be declared final). Subclassing allows the arrays to also carry class-specific fields and behavior via an "is a" relationship, rather than incur the extra dereferencing cost of going through a "has a" relationship.

This allows for a much cleaner and object oriented coding style that promotes the right abstraction layers. E.g. look at the Octagons class in the examples src directory.

You can't support subclassing without exposing a constructor and making it possible for anyone (a subclass, or someone else) to "new" an instance of your class. But we can't allow that "new" to occur if we want the JVM to be able to lay out the object flat, because with "new", the information about the size needed for allocation is not known until the constructor is called, which happens *after* allocation has already happened. So we must force instantiation to occur only via the newInstance() and copyInstance() variants, while still exposing constructors. The way to do both is to make direct construction never actually work, by throwing exceptions on any construction that does not go through the prescribed factory methods. That's what ConstructorMagic achieves.
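The mechanism described above can be sketched with a thread-local flag that is only set while inside the sanctioned factory. This is a hedged illustration of the idea, not the real ObjectLayout code; all names here are invented:

```java
// Sketch of the ConstructorMagic idea: the constructor is public (so
// subclasses can exist), but it throws unless the current thread is
// inside the factory method that set the per-thread "magic" flag.
public class MagicSketch {
    private static final ThreadLocal<Boolean> constructorMagic =
            new ThreadLocal<Boolean>() {
                @Override protected Boolean initialValue() { return Boolean.FALSE; }
            };

    // Exposed for subclassing, but unusable without the factory.
    public MagicSketch() {
        if (!constructorMagic.get()) {
            throw new IllegalStateException("use newInstance(), not new");
        }
    }

    public static MagicSketch newInstance() {
        constructorMagic.set(Boolean.TRUE);
        try {
            return new MagicSketch(); // the only path on which construction succeeds
        } finally {
            constructorMagic.set(Boolean.FALSE); // re-arm the guard
        }
    }

    public static void main(String[] args) {
        System.out.println(MagicSketch.newInstance() != null); // prints true
        try {
            new MagicSketch(); // direct construction: rejected
        } catch (IllegalStateException expected) {
            System.out.println("direct new rejected");
        }
    }
}
```

A thread-local (rather than a plain static flag) keeps the guard correct when multiple threads construct instances concurrently.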

There may be other ways (other than ConstructorMagic, that is) to achieve this, but I don't think there is a way around the requirements themselves, which are to enable subclassing, and to prevent direct construction. 

Kirk Pepperdine

unread,
Oct 26, 2014, 5:35:18 AM10/26/14
to mechanica...@googlegroups.com
Hi Mikhail,

Maybe you could add a patch to see how it affects the code. The code is very approachable in that it’s very well structured and very readable, so I don’t think you’ll have many problems in that regard.

Kind regards,
Kirk


Mikhail Khludnev

unread,
Oct 27, 2014, 4:22:30 PM10/27/14
to mechanica...@googlegroups.com
Thanks Gil!

Octagons makes it clear.

public class Octagons extends StructuredArray<Octagon> {

    public Octagons() {
    }

    public Octagons(Octagons source) {
        super(source);
    }
}
