I am trying to understand what barriers, if any, exist to supporting C-
style structs in JVM languages. I have considered trying to add
support of such to Clojure for my own research programming (I work on
local search algorithms for bioinformatics applications). I assume
that the right way to do this will involve emitting byte code, but I
have not yet studied the byte code or its verification by the JVM in
enough detail. I thought this group might be able to answer the
following questions before I invest time in a fruitless search for a
good solution:
1) Is there any inherent limitation in the bytecode, or the JVM
verification algorithm, which makes it impractical to represent
structured, non-reference types? Intuitively it seems that functions
and operations on structs are reducible to functions and operations on
basic types (int, float and friends). But, while C# supports structs,
Java does not and I have not seen any indication that it will, and I
have not seen a JVM language which does either. (Maybe Scala does?
Not familiar enough with it to be sure). I can't tell if this is just
for "religious" reasons, but it seems to me an obviously useful
feature and there should be a good reason it isn't included.
2) If this has already been implemented for some language, can someone
provide a link to details of how it was implemented?
This feature is very important for certain algorithmic applications,
where one wants to store a very large number of small records -- the
memory/performance penalty (especially memory) of having one or more
extra pointers per record using reference types (Objects) is really
frustrating. I'm sure this must be well-trodden ground for JVM
language implementors but I haven't been able to track down a
comprehensive discussion online.
On Monday 07 December 2009 17:26:03 Joe.Whitney wrote:
> Hello,
> I am trying to understand what barriers, if any, exist to supporting C-
> style structs in JVM languages. I have considered trying to add
> support of such to Clojure for my own research programming (I work on
> local search algorithms for bioinformatics applications).
You would need to add support to the JVM as well, of course. Unless you are thinking of using Clojure's CLR backend.
> 1) Is there any inherent limitation in the bytecode, or the JVM
> verification algorithm, which makes it impractical to represent
> structured, non-reference types? Intuitively it seems that functions
> and operations on structs are reducible to functions and operations on
> basic types (int, float and friends). But, while C# supports structs,
> Java does not and I have not seen any indication that it will, and I
> have not seen a JVM language which does either. (Maybe Scala does? > Not familiar enough with it to be sure).
The JVM is incapable of expressing structs so no JVM-based language can support them. CLR-based languages supporting structs (e.g. C# and F#) simply delegate the work to the CLR.
> I can't tell if this is just > for "religious" reasons, but it seems to me an obviously useful
> feature and there should be a good reason it isn't included.
When the JVM was designed and built it was common to rely upon a uniform representation and efficient allocation by the runtime to obtain decent performance without having to support arbitrary value types. Languages like OCaml also do this.
Since then, the CLR has shown that value types (structs and primitive types) can relieve a lot of stress from the GC and that this is much more important in the context of multicores because concurrent GCs are (and will probably always be) inherently slow.
> 2) If this has already been implemented for some language, can someone
> provide a link to details of how it was implemented?
> This feature is very important for certain algorithmic applications,
> where one wants to store a very large number of small records -- the
> memory/performance penalty (especially memory) of having one or more
> extra pointers per record using reference types (Objects) is really
> frustrating. I'm sure this must be well-trodden ground for JVM
> language implementors but I haven't been able to track down a
> comprehensive discussion online.
Yes. In a recent discussion on this list, structs were cited as probably the single most important feature to add to the JVM.
I wouldn't hold your breath though: I waited for tail calls to be added to the JVM for years before giving up and writing my own VM with structs and tail calls specifically designed for scientific computing...
I'm very interested in this issue too.
We also want to hold very large list of small fixed structures in memory.
There is very large (somehow segmented) routing graph in our scenario.
Every segment has list of nodes with properties, list of edges with
properties, and maybe some additional info.
We want to create very large file on disk and mmap particular segments
of graph into memory (some JNI code).
Every node and edge has some fixed structure so the natural approach
in C-style is array of structures.
As JVM doesn't know anything about structures (AFAIK), our workarround
is structure (object) of primitive arrays.
So instead of accessing "nodes[i].property" we'll try to do it this
way: "nodes.getPropertyArray()[i]".
With a bit of JNI hacking in custom mmap library it could be possible.
I hope so. :)
But primitive structures would be much better for cache coherency, I guess.
On Mon, Dec 7, 2009 at 6:26 PM, Joe.Whitney <jwhit...@gmail.com> wrote:
> Hello,
> I am trying to understand what barriers, if any, exist to supporting C-
> style structs in JVM languages. I have considered trying to add
> support of such to Clojure for my own research programming (I work on
> local search algorithms for bioinformatics applications). I assume
> that the right way to do this will involve emitting byte code, but I
> have not yet studied the byte code or its verification by the JVM in
> enough detail. I thought this group might be able to answer the
> following questions before I invest time in a fruitless search for a
> good solution:
> 1) Is there any inherent limitation in the bytecode, or the JVM
> verification algorithm, which makes it impractical to represent
> structured, non-reference types? Intuitively it seems that functions
> and operations on structs are reducible to functions and operations on
> basic types (int, float and friends). But, while C# supports structs,
> Java does not and I have not seen any indication that it will, and I
> have not seen a JVM language which does either. (Maybe Scala does?
> Not familiar enough with it to be sure). I can't tell if this is just
> for "religious" reasons, but it seems to me an obviously useful
> feature and there should be a good reason it isn't included.
> 2) If this has already been implemented for some language, can someone
> provide a link to details of how it was implemented?
> This feature is very important for certain algorithmic applications,
> where one wants to store a very large number of small records -- the
> memory/performance penalty (especially memory) of having one or more
> extra pointers per record using reference types (Objects) is really
> frustrating. I'm sure this must be well-trodden ground for JVM
> language implementors but I haven't been able to track down a
> comprehensive discussion online.
> Many thanks in advance for your insights,
> Joe
> --
> You received this message because you are subscribed to the Google Groups "JVM Languages" group.
> To post to this group, send email to jvm-languages@googlegroups.com.
> To unsubscribe from this group, send email to jvm-languages+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/jvm-languages?hl=en.
This example can be made to work by statically expanding center into
circle and then rewriting c.center.* to access the expanded fields.
The following, however, cannot be made to work (without a data copy):
boolean foo(circle c1, circle c2) {
return contains(c1, c2.center);
}
The problem is the access to c2.center as a top-level structure.
Recall from the previous step that you had to expand its field's into
the top level structure. In the case where a struct contains at most
1 other struct, you can cheat and use inheritance to get the correct
behavior, but that is a total hack and does not work in the general
case. What you would need to do, is generate a method to pack
circle's center fields into a point struct on demand. This would
incur a data-copy when you directly reference a sub-struct, but the
common case of only accessing leaves would still be able to implement
efficiently. For more deeply nested structures, recurse ;-)
On Mon, Dec 7, 2009 at 1:01 PM, John Wilson <tugwil...@gmail.com> wrote:
> I don't think you can do this in a straightforward way (but possibly
> somebody will be long soon to enlighten both of us!).
> When I want a large number of identical "structures" in Java I use an
> array of arrays. Would this work for you?
> John Wilson
> --
> You received this message because you are subscribed to the Google Groups "JVM Languages" group.
> To post to this group, send email to jvm-languages@googlegroups.com.
> To unsubscribe from this group, send email to jvm-languages+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/jvm-languages?hl=en.
On Mon, Dec 7, 2009 at 2:19 PM, Jon Harrop <j...@ffconsultancy.com> wrote:
> The JVM is incapable of expressing structs so no JVM-based language can
> support them. CLR-based languages supporting structs (e.g. C# and F#) simply
> delegate the work to the CLR.
The obvious approach, though I haven't worked out all details, is to
expand structs into groups of variables, including any nested structs,
of course. This would nicely handle all uses of structs in local
variables, assignments, and procedure calls, though it wouldn't allow
methods on structs (do we really need those, given that struct
instance variables are public?) If you want to allow non-default
constructors, you have to be able to inline them, though, so they
can't be too complicated.
The struct still needs to be represented by a classfile at compile
time so that the compiler knows what to do, but the class never needs
to be loaded.
"For arrays-of-small-structs (e.g. arrays of Complex), you are
correct: Java's lousy there (and I added that to C's strengths). When
doing performance sensitive arrays-of-small-structs I turn the
implementation 90-degrees and implement a small-struct-of-arrays.
It's clearly a work-around over a clunky language issue... but the
performance is solidly there (both in memory footprint and in access
speed)."
On Monday 07 December 2009 18:31:55 John Cowan wrote:
> On Mon, Dec 7, 2009 at 2:19 PM, Jon Harrop <j...@ffconsultancy.com> wrote:
> > The JVM is incapable of expressing structs so no JVM-based language can
> > support them. CLR-based languages supporting structs (e.g. C# and F#)
> > simply delegate the work to the CLR.
> The obvious approach, though I haven't worked out all details, is to
> expand structs into groups of variables,
Yes. Structs are aggregate registers in the LLVM too.
> including any nested structs, > of course. This would nicely handle all uses of structs in local
> variables, assignments, and procedure calls, though it wouldn't allow
> methods on structs (do we really need those, given that struct
> instance variables are public?) If you want to allow non-default
> constructors, you have to be able to inline them, though, so they
> can't be too complicated.
Users will want custom comparison, equality, hashing and serialization for their structs.
On Mon, Dec 7, 2009 at 2:49 PM, Jon Harrop <j...@ffconsultancy.com> wrote:
> Users will want custom comparison, equality, hashing and serialization for
> their structs.
Equality can be structural, and hashing can be straightforward.
Serialization can be default. If you want structs, you have to play
nice. The more serious problem is putting a struct into a collection,
which wouldn't work under this interpretation.
Comparison, maybe, although Comparator would cover it.
> You received this message because you are subscribed to the Google Groups "JVM Languages" group.
> To post to this group, send email to jvm-languages@googlegroups.com.
> To unsubscribe from this group, send email to jvm-languages+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/jvm-languages?hl=en.
you could wrap the struct into the generated class, a la
java.lang.Integer and other boxed structures when needed. I suppose
for the simple cases (Comparable, hashCode, equals), escape analysis
would make the performance pretty good
On Mon, Dec 7, 2009 at 5:13 PM, John Cowan <johnwco...@gmail.com> wrote:
> On Mon, Dec 7, 2009 at 2:49 PM, Jon Harrop <j...@ffconsultancy.com> wrote:
>> Users will want custom comparison, equality, hashing and serialization for
>> their structs.
> Equality can be structural, and hashing can be straightforward.
> Serialization can be default. If you want structs, you have to play
> nice. The more serious problem is putting a struct into a collection,
> which wouldn't work under this interpretation.
> Comparison, maybe, although Comparator would cover it.
>> You received this message because you are subscribed to the Google Groups "JVM Languages" group.
>> To post to this group, send email to jvm-languages@googlegroups.com.
>> To unsubscribe from this group, send email to jvm-languages+unsubscribe@googlegroups.com.
>> For more options, visit this group at http://groups.google.com/group/jvm-languages?hl=en.
> You received this message because you are subscribed to the Google Groups "JVM Languages" group.
> To post to this group, send email to jvm-languages@googlegroups.com.
> To unsubscribe from this group, send email to jvm-languages+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/jvm-languages?hl=en.
> Users will want custom comparison, equality, hashing and serialization for
> their structs.
This points out there are two (or three) inter-twined problem areas:
(1) The specification and properties of structs at the JVM level.
(2) The specification and properties of structs at the source level
(Java and other languages).
(3) Work in the JVM implementations are use structs efficiently.
Some design isues:
* Should the fields of a struct be 'final' - i.e. read-only?
Such "value structs" have a number of optimization advantages,
since it makes it easier for the VM to copy them around. And
most use-cases are fine with immutable value-structs - if you
need to modify a field, just assign a new value on top of the
old one. (This is cheap and optimzable since we don't need
allocation.) At least the HotSpot engineers have told me that
immutable structs are going to be easier to optimize and so
should be what we focus on first.
* How do structs interoperate with Object, if at all?
I.e. how do you box and unbox structs?
* Related to the above are issues of equality and identity.
Clearly a struct does not have "identity". Presumably the
== operator should be defined in terms of equals, which should
default to field-by-field equality. But now when the struct
is boxed, we suddenly get identity, and the meaning of
== changes.
We have the same problem with the existing primitive types,
where the meaning of == is "wrong" when they're boxed. A
fix is for valueOf to always intern; then == and equals are
always the same, and so does not break with boxing. (Of
course the interning can be conceptual and optimized away
for pure value types, as long as == is defined to be equals.)
-- --Per Bothner
p...@bothner.com http://per.bothner.com/
Not really what you're after, but in order to allow system native
methods to be implemented in Java on the iSeries we put in place a
mechanism to allow one object to be mapped over another, and then a
way to be able to specify the precise layout of fields. As a
simplistic example, the Double.doubleToLongBits method could be
implemented by creating a Double object, then mapping a Long object
over the top of it and extracting the long value. (This specific use
wasn't really useful, of course, since the cost of creating the Long
object would swamp any savings.) Unsafe as Hell, of course, since
fields could be mapped arbitrarily and unchecked casting performed,
along with arbitrary address arithmetic, but OK for "trustworthy" code
that was replacing unsafe C native methods anyway.
(Since this was pre-annotation, we used a String literal in the class
to inform us of the mappings, etc. The literal would only be looked
for and interpreted for classes loaded from a privileged directory.)
The major advantage was that the Java could then be inlined by the
JIT, resulting in the efficiency of JIT "built-ins" without the
attendant complexity and maintenance difficulties.
On Dec 7, 11:26 am, "Joe.Whitney" <jwhit...@gmail.com> wrote:
> I am trying to understand what barriers, if any, exist to supporting C-
> style structs in JVM languages. I have considered trying to add
> support of such to Clojure for my own research programming (I work on
> local search algorithms for bioinformatics applications). I assume
> that the right way to do this will involve emitting byte code, but I
> have not yet studied the byte code or its verification by the JVM in
> enough detail. I thought this group might be able to answer the
> following questions before I invest time in a fruitless search for a
> good solution:
> 1) Is there any inherent limitation in the bytecode, or the JVM
> verification algorithm, which makes it impractical to represent
> structured, non-reference types? Intuitively it seems that functions
> and operations on structs are reducible to functions and operations on
> basic types (int, float and friends). But, while C# supports structs,
> Java does not and I have not seen any indication that it will, and I
> have not seen a JVM language which does either. (Maybe Scala does?
> Not familiar enough with it to be sure). I can't tell if this is just
> for "religious" reasons, but it seems to me an obviously useful
> feature and there should be a good reason it isn't included.
> 2) If this has already been implemented for some language, can someone
> provide a link to details of how it was implemented?
> This feature is very important for certain algorithmic applications,
> where one wants to store a very large number of small records -- the
> memory/performance penalty (especially memory) of having one or more
> extra pointers per record using reference types (Objects) is really
> frustrating. I'm sure this must be well-trodden ground for JVM
> language implementors but I haven't been able to track down a
> comprehensive discussion online.
Various solutions using arrays will "work" for some situations, and I
have in fact used arrays, sparse matrices, etc to represent structured
information. Usually the transform is from struct.field to field
[indexofstruct]. This means I end up using integers (or short, long,
whatever) as the "type" of every (logical) structure. I think even a
set of macros to do this kind of transform would be better than
nothing, though it would be far from trivial to get right because you
have to re-invent memory management in some form, even if it's just
passing the burden explicitly to the user. Doing it "manually"
involves all sorts of bug-prone boilerplate littering my code.
For example I might have a logical struct implemented as a bunch of
arrays: int struct_field1[SIZE], double struct_field2[SIZE], etc. I
can use integers as "pointers", so that struct_field2[structp] can be
used in place of (C-style) p->struct_field2. But suppose I want to
create an array of structs sorted using some special predicate? If I
don't want every instance of struct to be in this new array, I have to
either create a separate set of arrays sorted_struct_field1[OTHERSIZE]
etc, or use an index,length pair to define this logical array in terms
of the "master" array, so that logically sorted_structs[index] is
represented as struct_field1[sorted_structs_offset+index]. Needless
to say, without having these logical entities reified more explicitly
the code becomes crazy-making fast. I have made a few attempts at
frameworks to make this less painful, but ultimately threw up my hands
because Java's treatment of basic types makes it impossible to encode
the relevant operations in a generic (unboxed) way.
I'm also convinced that my performance suffers in such a scheme even
when I maintain arrays manually, because I usually want to access
multiple fields of one struct, not the same field of multiple structs,
consecutively.
Hope the above rant makes sense, perhaps others have discovered a
disciplined way of simulating structs which does not drive them crazy.
-Joe
On Dec 7, 1:01 pm, John Wilson <tugwil...@gmail.com> wrote:
Thanks very much for the jrose link. It gives me a much better sense
of what the *specific* limitations are. Of those he mentions, the
lack of a "vreturn" opcode for returning multiple values seems the
most difficult to hack around without using intermediate boxing of
structs.
On Dec 7, 1:35 pm, Patrick Wright <pdoubl...@gmail.com> wrote:
> "For arrays-of-small-structs (e.g. arrays of Complex), you are
> correct: Java's lousy there (and I added that to C's strengths). When
> doing performance sensitive arrays-of-small-structs I turn the
> implementation 90-degrees and implement a small-struct-of-arrays.
> It's clearly a work-around over a clunky language issue... but the
> performance is solidly there (both in memory footprint and in access
> speed)."