int -> double[] map

29 views
Skip to first unread message

Nick Gerner

unread,
Apr 14, 2013, 12:31:15 PM4/14/13
to java-high-performance...@googlegroups.com
I'm currently using IntObjectOpenHashMap to implement a set of sparse vectors.

So far it's been great, but I'm finding the generics overhead to be far too high: the class casting is about twice as expensive as just iterating over elements, far more expensive than, say taking the log or exponentiating a double.

So I'm looking for a non-generic int -> double[] map.

My first thought was to fork the template processor and support double[] as a built in type, but that seems to be a little bit tricky (since the syntax won't follow the established patterns for primitives).

My options now look like:
* build a new set of templates for arrays of primitives (as in KTypeVTypeArrayOpenHashMap). But I haven't dug into the source enough yet for that.
* refactor my code so instead of one map int -> double[] I have several maps int -> double (which seems like a big memory overhead)
* bail on hppc and try out something else like jblas or something that supports sparse vectors.

Any thoughts on how to proceed here?

Dawid Weiss

unread,
Apr 14, 2013, 3:54:00 PM4/14/13
to java-high-performance...@googlegroups.com
Let me just clarify a few things to make sure I understand you well.

> So I'm looking for a non-generic int -> double[] map.

I assume you were using IntObjectOpenHashMap<double[]>, right (with a
primitive array as the value)? And this still has overhead too large
for you? How many unique keys do you have? How many doubles per that
array, on average?

> So far it's been great, but I'm finding the generics overhead to be far too
> high: the class casting is about twice as expensive as just iterating over
> elements, far more expensive than, say taking the log or exponentiating a
> double.

Class cast (to double[]) should be a very cheap operation once it goes
over the full JIT (c2 compiler with stats). For hotspot this means >
10k loop/ code fragment executions where the compiler trigger is hit.
You can't really compare ant of this to operations on primitive values
since these are implemented directly in the cpu, are typically
executed in processor superscalar pipelines etc. This isn't a fair
comparison but I still think casting itself is not a problem --
perhaps bounds checking is or something else is.

> So I'm looking for a non-generic int -> double[] map.

I'd say one of the easiest things to do is to create a single backing
double[] and then use int->int map where the value points to that
double[] buffer. Depending on your data you may then use skip pointers
or even prebuffered allocations in that static buffer... So not a
single data structure but a combination of them to fit your problem/
data.

> * bail on hppc and try out something else like jblas or something that
> supports sparse vectors.

Don't know how these 'sparse vectors' are implemented but I have a
feeling I don't quite get your scenario and hence the confusion.
Sebastiano Vigna's fastutil has some "large" collections (which can be
indexed using long pointers instead of ints). Perhaps this will help
you somehow.

Dawid

Nick Gerner

unread,
Apr 15, 2013, 3:43:49 PM4/15/13
to java-high-performance...@googlegroups.com

I assume you were using IntObjectOpenHashMap<double[]>, right (with a
primitive array as the value)? And this still has overhead too large
for you? How many unique keys do you have? How many doubles per that
array, on average?

Yes, IntObjectOpenHashMap<double[]>
I have some trivial test cases that include every 3rd or 7th int from 1 to 1 million. But my production data is not so uniform. If I had to guess I bet it's much more sparse than that.
The array has between 1 and 15 values.


Class cast (to double[]) should be a very cheap operation once it goes
over the full JIT (c2 compiler with stats). For hotspot this means >
10k loop/ code fragment executions where the compiler trigger is hit.
You can't really compare ant of this to operations on primitive values
since these are implemented directly in the cpu, are typically
executed in processor superscalar pipelines etc. This isn't a fair
comparison but I still think casting itself is not a problem --
perhaps bounds checking is or something else is.

I might have overstated the cost of the class cast, but it's still non-trivial. 
Take a map of 1 million elements and 0.5 fill factor (these are averages over 1000s of trials):
I can iterate over the entire map of 1 mil elements in 3ms, including allocated checks.
I can pull out the specific keys in question in 5ms
I can pull out the values as Object in 7ms
but to coerce them into double[] takes maybe 18ms.
 

> So I'm looking for a non-generic int -> double[] map.

I'd say one of the easiest things to do is to create a single backing
double[] and then use int->int map where the value points to that
double[] buffer. Depending on your data you may then use skip pointers
or even prebuffered allocations in that static buffer... So not a
single data structure but a combination of them to fit your problem/
data.

This is a very interesting idea. I like it. 
You're proposing something like:

IntIntOpenHashMap indexMap;
double[][] data;
int nextIndex;

double[] datum = data[indexMap.get(key)];

and populating it something like:

void put(int key, double[] datum) {
  //probably check if key exists
  if(data.length <= nextIndex) {
    reallocateDataByDoubling(data.length*2);
  }
  data[nextIndex] = datum;
  indexMap.put(key, nextIndex++);
}

I'll need to refactor some code to do that (since I'm directly using IntObjectOpenHashMap in a couple of places). But this is very interesting none-the-less. I guess I better get a little bit more certain about the overhead of the class casting, but regardless, it can't possibly be good. I bet I can get a 20%+ perf improvement by avoiding it.
 

> * bail on hppc and try out something else like jblas or something that
> supports sparse vectors.

Don't know how these 'sparse vectors' are implemented but I have a
feeling I don't quite get your scenario and hence the confusion.
Sebastiano Vigna's fastutil has some "large" collections (which can be
indexed using long pointers instead of ints). Perhaps this will help
you somehow. 

The sparse vector is just a map instead of an array. As in, v_i = map.get(i) instead of v_i = v[i]
The reason it's a map int -> double[] is because I actually have set of vectors that are indexed the same way (I guess this is a matrix, but I'm not doing any matrix operations, so I try not to think of it that way).

I'll probably futz around with your int->int + backing double[][] idea, see if that makes a difference.

Thanks!

--Nick

Dawid Weiss

unread,
Apr 15, 2013, 5:00:48 PM4/15/13
to java-high-performance...@googlegroups.com
> I might have overstated the cost of the class cast, but it's still
> non-trivial.
> Take a map of 1 million elements and 0.5 fill factor (these are averages
> over 1000s of trials):
> I can iterate over the entire map of 1 mil elements in 3ms, including
> allocated checks.
> I can pull out the specific keys in question in 5ms
> I can pull out the values as Object in 7ms
> but to coerce them into double[] takes maybe 18ms.

Your estimates feel very wrong to me. Don't know how you measure these
but it just doesn't seem right. Hotspot is very clever in terms of
class casts and optimizing code away, especially in tight loops. You
should *not* try to estimate every operation in isolation because it's
simply not how JVM works at lower level.

> This is a very interesting idea. I like it.
> You're proposing something like:
>
> IntIntOpenHashMap indexMap;
> double[][] data;
> int nextIndex;

No. What I proposed was to use:

IntIntOpenHashMap indexMap;
double[] data;
int firstEmptySlotInData;

double[][] is basically an array of objects (of type double[]) so the
overhead is significant. If you know there will be a few values in
each list associated with a key then you can use a single double[] and
encode the list in "blocks" of consecutive elements, overallocating a
bit. So the first block is, say, a maximum of 3 elements + header, the
next one is 7 elements + header etc. The header (first double on the
list) is typically used to encode the number of elements in the block
(if negative, for example) or the jump pointer to the next block.

All this requires some bit-fiddling and trickiness (like encoding
integer values using the first double -- Double.doubleToRawLongBits,
etc.) but will conserve memory.

> overhead of the class casting, but regardless, it can't possibly be good. I
> bet I can get a 20%+ perf improvement by avoiding it.

Again, I think you're trying to overoptimize things that you shouldn't
worry about until you really hit a performance problem.

D.
Reply all
Reply to author
Forward
0 new messages