Preliminary comparison to Trove 2.1 and FastUtil 5.1


Karl The Pagan

Jun 19, 2010, 5:45:26 PM
to ning-tr13-users
On an Atom N280 1.66GHz / 2GB

trove-2.1.0.jar 802KB
fastutil-5.1.5.jar 13617KB

jdk 1.6.0_20 ia32
-server -Xms768m -Xmx768m

Avoiding string storage is of course the major savings here.
Differences between Trove, FastUtil, and j.u.HashMap help reveal the
more general compactness:

small-trie.txt (recorded on separate runs):
j.u HashMap: 188256k
Trove TObjectIntHashMap: 176439k (adds 51454k in same run)
FastUtil Object2IntOpenHashMap: 161862k (adds 36865k in same run)
Tr13: 20636k

Performance (best of 7):
j.u 1203ms
Trove 1985ms
FastUtil 2454ms
Tr13 14844ms

My only practical concern is that in tight memory situations Tr13's
performance was penalized by up to 10s. I increased from 512 to 768
megs to see the difference. jdk7 b96 with G1 GC and escape analysis
produced similar performance savings.
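(For reference, a rough sketch of how footprint and timing numbers like these can be collected. This harness is illustrative, not the one actually used in the runs above: it loads a plain java.util.HashMap and samples used heap via Runtime, which is only approximate.)

```java
import java.util.HashMap;
import java.util.Map;

public class MapFootprint {
    // Approximate used-heap sample: force a GC, then diff total vs free.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeap();
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put("key-" + i, i);
        }
        long after = usedHeap();
        System.out.println("approx bytes used: " + (after - before));

        // Lookup timing, best of 7 as in the numbers above
        long best = Long.MAX_VALUE;
        for (int run = 0; run < 7; run++) {
            long t0 = System.nanoTime();
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += map.get("key-" + i);
            }
            long elapsed = System.nanoTime() - t0;
            if (elapsed < best) best = elapsed;
            if (sum < 0) throw new AssertionError(); // keep the JIT honest
        }
        System.out.println("best lookup pass: " + (best / 1_000_000) + "ms");
    }
}
```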

I'm pretty excited about the potential for very lightweight client
applications that take advantage of in-memory indexes.

Tatu Saloranta

Jun 21, 2010, 6:21:13 PM
to ning-tr...@googlegroups.com
On Sat, Jun 19, 2010 at 2:45 PM, Karl The Pagan <karlth...@gmail.com> wrote:
> On an Atom N280 1.66GHz / 2GB
>
> trove-2.1.0.jar 802KB
> fastutil-5.1.5.jar 13617KB
>
> jdk 1.6.0_20 ia32
> -server -Xms768m -Xmx768m
>
> Avoiding string storage is of course the major savings here.
> Differences between Trove, FastUtil, and j.u.HashMap help reveal the
> more general compactness:
>
> small-trie.txt (recorded on separate runs):
> j.u HashMap: 188256k
> Trove TObjectIntHashMap: 176439k (adds 51454k in same run)
> FastUtil Object2IntOpenHashMap: 161862k (adds 36865k in same run)
> Tr13: 20636k
>
> Performance (best of 7):
> j.u 1203ms
> Trove 1985ms
> FastUtil 2454ms
> Tr13 14844ms

Interesting results, thanks for sharing. Space-wise, the results look
similar to what I saw.
Speed-wise there's a bit of a difference, but it's roughly similar
(for me the factor was only about 5x, but given how many operations
are done, big relative differences are not surprising).
It is a bit surprising to see Trove and FastUtil being slower than
the regular java.util.HashMap.

> My only practical concern is that in tight memory situations Tr13's
> performance was penalized by up to 10s. I increased from 512 to 768
> megs to see the difference. jdk7 b96 with G1 GC and escape analysis
> produced similar performance savings.

Interesting. Did you use the byte-array based version? That would be
more prone to issues.

Also, one improvement I need to make is to replace the default UTF-8
codec (which just uses the JDK's converter) with one better optimized
for handling short keys. I should be able to re-purpose something from
Woodstox or Jackson code; I just haven't had time to do it yet. This
might help with performance and also reduce garbage production.

> I'm pretty excited for the potential for very lightweight client
> applications which take advantage of in-memory indexes.

That is the kind of use case I was building this for.

Also, given that the structure is sorted (usually alphabetically), I
think it's even possible to do simple range queries if that seems
useful. As things are, the sorting is not taken much advantage of
(since I can't think of a good way to do it), but range queries --
find an iterator at or after a given key, iterate from there on --
would seem like a useful additional feature for some use cases.
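To make the intended semantics concrete: tr13 doesn't expose this yet, so java.util.TreeMap stands in here for any sorted structure; the pattern is simply "position at or after a key, then iterate forward".

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RangeQuerySketch {
    public static void main(String[] args) {
        TreeMap<String, Integer> index = new TreeMap<>();
        index.put("apple", 1);
        index.put("banana", 2);
        index.put("cherry", 3);

        // "find iterator at or after given key, iterate from there on":
        // tailMap returns a view starting at the first key >= "b"
        SortedMap<String, Integer> from = index.tailMap("b");
        from.forEach((k, v) -> System.out.println(k + " -> " + v));
        // prints: banana -> 2, then cherry -> 3
    }
}
```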

-+ Tatu +-

karlthepagan

Jun 22, 2010, 2:13:20 AM
to ning-tr...@googlegroups.com
On Mon, Jun 21, 2010 at 3:21 PM, Tatu Saloranta <tsalo...@gmail.com> wrote:
Did you use byte-array based version? That would be more
prone to issues.

Yes, byte-array based.

Also, one improvement I need to do is to replace default UTF-8 codec
(which just uses JDK's converter) with one more optimized to handling
short keys. I should be able to re-purpose something from Woodstox or
Jackson code, just haven't had time to do it yet. This might help with
performance and also reduce garbage production.

I'm also pretty interested in UTF-8 handling (searching, matching, and UI) in constant-memory situations. I've played with making a CharacterIterator flyweight for addressing text in a ByteBuffer or byte[].
 
Also, given that structure is sorted (usually alphabetically), I think
it's even possible to do simple range queries if that seems useful.

I think that would be particularly useful for a data structure where the key (and unique ID of a request) is something like an incremental UID or a milliseconds or nanoseconds time value. I frequently build up a Map (concurrent for servers, but for my mobile apps that's unnecessary) where the key is a Long nanos value. The nanos value is used as the request ID, and maintenance code occasionally takes a sub-map and expires all of its requests.
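A minimal sketch of this pattern, using java.util.concurrent.ConcurrentSkipListMap for the sorted concurrent map (class and method names here are illustrative, not from tr13):

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class RequestExpiry {
    // Keys are monotonic nanos request IDs; maintenance clears the
    // head sub-map of everything older than a cutoff.
    private final ConcurrentSkipListMap<Long, String> pending =
            new ConcurrentSkipListMap<>();

    void register(long nanos, String request) {
        pending.put(nanos, request);
    }

    int expireOlderThan(long cutoffNanos) {
        // headMap is a live view (exclusive of the cutoff);
        // clearing it removes the entries from the backing map
        ConcurrentNavigableMap<Long, String> expired = pending.headMap(cutoffNanos);
        int n = expired.size();
        expired.clear();
        return n;
    }

    int size() {
        return pending.size();
    }
}
```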

The big advantage could be that in a segmented implementation of Tr13, the expired buffer parts of the structure could be freed (or recycled!) as they expire. Within a few minutes I expect all requests to either be handled or forced to expire, so such a system would constantly recover or recycle segments.


Mutating a Tr13 in such a way that the key or data length remains unchanged is another requirement for such a use case. It looks like it is possible to express a VInt with extra leading 0 bits, so that shouldn't be a problem. Do you see any other possible obstacles to that feature?
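The "extra leading 0 bits" idea can be sketched with a generic little-endian base-128 varint (high bit = continuation); tr13's actual VInt layout may differ, but the principle is the same: pad with redundant zero groups so the encoded length stays fixed and an in-place overwrite never shifts surrounding bytes.

```java
public class PaddedVInt {
    // Encode value into exactly len bytes, padding with redundant
    // zero continuation groups. Throws if the value doesn't fit.
    static void encode(long value, byte[] buf, int off, int len) {
        for (int i = 0; i < len; i++) {
            int b = (int) (value & 0x7F);
            value >>>= 7;
            // continuation bit set on all but the final byte
            buf[off + i] = (byte) (i < len - 1 ? b | 0x80 : b);
        }
        if (value != 0) {
            throw new IllegalArgumentException("value too large for " + len + " bytes");
        }
    }

    static long decode(byte[] buf, int off) {
        long value = 0;
        int shift = 0;
        while (true) {
            byte b = buf[off++];
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }
}
```

Any value up to the slot's capacity can then be rewritten in place without relocating neighbors.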

Colin Fleming

Jun 22, 2010, 6:50:21 AM
to ning-tr...@googlegroups.com
> I think that would be particularly useful for a data structure where the key
> (and uuid of a request) is something like an incremental UID or milliseconds
> or nanoseconds time value. I frequently build up a Map (concurrent for
> servers, but for my mobile apps that's unnecessary) where the key is a Long
> nanos. The nanos value is used as the request ID and maintenance code
> occasionally takes a sub-map and expires all of the requests.

Interesting use case, but wouldn't that require the trie to be
mutable? Unless I misunderstood, the tr13 tries have to be built ahead
of time from sorted data. Although if you're only ever appending keys
which are greater than any key in the map and expiring from the lowest
item, you could probably make this mutable fairly efficiently.

I was wondering whether it would be useful to include a more standard
mutable trie in tr13 that you could then convert to a more efficient
immutable one. I think for a lot of indexing tasks you need an easy
way to build the index, then you could convert to an efficient static
one and either serialise it or use it directly.

Cheers,
Colin

Tatu Saloranta

Jun 22, 2010, 2:52:12 PM
to ning-tr...@googlegroups.com
On Mon, Jun 21, 2010 at 11:13 PM, karlthepagan <karlth...@gmail.com> wrote:
> On Mon, Jun 21, 2010 at 3:21 PM, Tatu Saloranta <tsalo...@gmail.com>
> wrote:
>>
>> Did you use byte-array based version? That would be more
>> prone to issues.
>
> Yes, byte-array based.
>>
>> Also, one improvement I need to do is to replace default UTF-8 codec
>> (which just uses JDK's converter) with one more optimized to handling
>> short keys. I should be able to re-purpose something from Woodstox or
>> Jackson code, just haven't had time to do it yet. This might help with
>> performance and also reduce garbage production.
>
> I'm also pretty interested in UTF-8 handling (searching, matching, and ui)
> in constant-memory situations. I've played with making a CharacterIterator
> flyweight made  for addressing text in a ByteBuffer or byte[].

Yeah, dealing with ByteBuffer is an interesting area, since so far I
know how to optimize byte[]-backed things well.
But ByteBuffer is a bit different.

>> Also, given that structure is sorted (usually alphabetically), I think
>> it's even possible to do simple range queries if that seems useful.
>
> I think that would be particularly useful for a data structure where the key
> (and uuid of a request) is something like an incremental UID or milliseconds
> or nanoseconds time value. I frequently build up a Map (concurrent for
> servers, but for my mobile apps that's unnecessary) where the key is a Long
> nanos. The nanos value is used as the request ID and maintenance code
> occasionally takes a sub-map and expires all of the requests.
> The big advantage could be that in a segmented implementation of Tr13 the
> expired buffer parts of the structure could be freed (or recycled!) as they
> expire. Within a few minutes I expect all requests to either be handled or
> force them to expire so such a system would constantly recover or recycle
> segments.

Interesting idea... yes, in fact another use case that I have uses
similar storage. I think that is more and more common: essentially
keeping a bounded in-memory queue of things, possibly offloading the
rest to secondary storage, or just cleaning it up.

> Mutating a Tr13 in such a way which the key or data length remains unchanged
> is another requirement for such a use-case. It looks like it is possible to
> express a VInt with extra leading 0 bits so that shouldn't be a problem. Do
> you see any other possible obstacles to that feature?

It may be tricky to split up the trie structure into bounded (strictly
or loosely) segments.
But it seems like an interesting goal... I need to think about that
some more. Like I said, this could be ideal for the queue-like system
I am building.
Right now I will just use tr13 to access keys within segments, but I
have been thinking of expanding this for actual data storage as well;
this works as long as iterator support exists.
And when that is done, it would even be possible to use a file-backed
ByteBuffer to use disk (SSD-based most likely) as a sort of virtual
memory -- newest blocks/segments in memory, older ones (less
frequently needed) on disk.
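A minimal sketch of the file-backed ByteBuffer idea via NIO memory mapping (the file name is illustrative): the OS pages cold segments out to disk while hot segments stay resident, which is exactly the virtual-memory effect described.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedSegment {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("segment.bin", "rw");
             FileChannel ch = raf.getChannel()) {
            // Map a 1 KB segment; reads and writes go through the mapping
            MappedByteBuffer seg = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            seg.put(0, (byte) 42);          // write through the mapping
            seg.force();                    // flush dirty pages to disk
            System.out.println(seg.get(0)); // prints 42
        }
    }
}
```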

Very interesting ideas!

-+ Tatu +-

Tatu Saloranta

Jun 22, 2010, 2:58:12 PM
to ning-tr...@googlegroups.com

Yes, having a true mutable trie would be useful. For initial systems I
am thinking of using simple hybrid solutions, with the smaller mutable
part using just java.util.HashMap (etc.), and then merging the
immutable tr13 with this to produce the next-generation tr13.
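The hybrid described above can be sketched like this (TreeMap stands in for the immutable sorted tr13 generation; the class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class HybridIndex {
    // Small mutable overlay in front of an immutable generation.
    private final Map<String, Integer> overlay = new HashMap<>();
    private TreeMap<String, Integer> immutable = new TreeMap<>();

    void put(String key, int value) {
        overlay.put(key, value);
    }

    Integer get(String key) {
        // Newest writes win: check the overlay first, then the base.
        Integer v = overlay.get(key);
        return v != null ? v : immutable.get(key);
    }

    // Merge the overlay into a new sorted generation -- as tr13 would
    // be rebuilt from sorted input -- then clear the overlay.
    void compact() {
        TreeMap<String, Integer> next = new TreeMap<>(immutable);
        next.putAll(overlay);
        immutable = next;
        overlay.clear();
    }
}
```

Lookups pay one extra hash probe until compaction; the immutable part keeps its compactness between merges.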

The challenge with mutability is limiting the amount of data being
moved around. The way tr13 data is structured (which I must document
soon; it's simple, but non-obvious reading from the code), additions
would lead to extensive relocation of data. Since branch nodes only
contain length information for all of their descendants (not offsets),
and lengths are expressed as VInts, changes in the length of children
lead to changes in the length indicators as well. One option would be
to allow fluff within the structure, if insertion points were likely
to be known... but I don't have a good idea of the details here.

However -- if additions are only at the end, it might be less drastic.
And/or maybe an append-only -- or, better yet, remove-from-head,
add-to-tail (or vice versa) -- structure could be designed to
accommodate this.
Maybe we could have q4 for "queue" (or qu3?). :-)

-+ Tatu +-

karlthepagan

Jun 22, 2010, 4:55:12 PM
to ning-tr...@googlegroups.com
On Tue, Jun 22, 2010 at 11:58 AM, Tatu Saloranta <tsalo...@gmail.com> wrote:
Yeah, dealing with ByteBuffer is an interesting area, since so far I
know how to optimize byte[] backed things well.
But ByteBuffer is bit different.

One of the hypotheses I regularly apply to buffer optimizations (in Java's ByteBuffer or AS3's ByteArray) is whether it is less expensive to address the bytes individually (as you seem to do with byte[]) or to address them in larger units and use shift-and-mask tricks to isolate the slower paths.

Of course, the major platform reality is that this bit-twiddling can have wildly varying payback on different VMs, but it seems quite effective on LLVM, and I have hopes for it on Android systems. I will absolutely be doing performance runs and micro-benchmarks on my Android devices as well as in the emulator.
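The two styles being compared can be sketched like this: per-byte assembly of a big-endian int versus extracting a byte from an already-packed word with shift and mask (illustrative helpers, not from tr13):

```java
public class WordAccess {
    // Style 1: address bytes individually and assemble the wider unit.
    static int readIntPerByte(byte[] b, int off) {
        return ((b[off]     & 0xFF) << 24)
             | ((b[off + 1] & 0xFF) << 16)
             | ((b[off + 2] & 0xFF) << 8)
             |  (b[off + 3] & 0xFF);
    }

    // Style 2: work on the wider unit and isolate a byte with
    // shift-and-mask; index 0 is the most significant byte.
    static int byteAt(int word, int index) {
        return (word >>> (24 - 8 * index)) & 0xFF;
    }
}
```

Which style wins is exactly the VM-dependent question raised above: bounds checks favor fewer, wider reads, while per-byte access avoids the shift/mask work.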
 
One option would be to allow
fluff within structure, if insertion points were likely to be known...
but I don't have good idea of details here.

Many applications' key or value lengths can either be fixed or have a "probable upper bound" - take the average word length plus 1 to 3 standard deviations, and you only have to shift bytes when the value is outside that bound. IMO, let the user set a fixed value length for a mutable Tr13 and make them detect when a new value is too long to set. For fixed-memory systems I favor an error-condition return (i.e. public boolean put(k,v) returns false) or else silently truncating the value (yes, meaning mutating the last byte of the VInt). After all, the library user can detect when their value violates the fixed length they set.
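The proposed error-condition contract could look like this (FixedSlotMap and its methods are hypothetical names, just to make the put-returns-false behavior concrete):

```java
import java.util.Arrays;

public class FixedSlotMap {
    // Values live in fixed-length slots; put() fails rather than
    // shifting neighboring bytes when a value exceeds its slot.
    private final byte[] slots;
    private final int slotLen;

    FixedSlotMap(int slotCount, int slotLen) {
        this.slots = new byte[slotCount * slotLen];
        this.slotLen = slotLen;
    }

    boolean put(int index, byte[] value) {
        if (value.length > slotLen) {
            return false; // error condition: caller detects the violation
        }
        System.arraycopy(value, 0, slots, index * slotLen, value.length);
        return true;
    }

    byte[] slot(int index) {
        return Arrays.copyOfRange(slots, index * slotLen, (index + 1) * slotLen);
    }
}
```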

Tatu Saloranta

Jun 22, 2010, 8:30:22 PM
to ning-tr...@googlegroups.com
On Tue, Jun 22, 2010 at 1:55 PM, karlthepagan <karlth...@gmail.com> wrote:
> On Tue, Jun 22, 2010 at 11:58 AM, Tatu Saloranta <tsalo...@gmail.com>
> wrote:
>>
>> Yeah, dealing with ByteBuffer is an interesting area, since so far I
>> know how to optimize byte[] backed things well.
>> But ByteBuffer is bit different.
>
> One of the hypothesis I regularly apply to buffer optimizations (in Java's
> ByteBuffer or AS3's ByteArray) is whether it is less expensive to address
> the bytes individually (as you seem to do with byte[]) or address them in
> larger units and use shift and mask tricks to isolate the slower paths.
>
> Of course, the major platform reality is that this bit-twiddling can have
> wildly varying payback on different VM's, but it seems quite effective on
> LLVM, and I have hope for it on Android systems. I will absolutely be doing
> performance runs and micro-benchmarks on my Android devices as well as in
> the emulator.

Yup, this would be very useful. I mostly test on Mac/Linux JVMs, and
Android is probably quite a different beast.
Although so far there has been reasonable correlation (for the Jackson
JSON processor at least).

>> One option would be to allow
>> fluff within structure, if insertion points were likely to be known...
>> but I don't have good idea of details here.
>
> Many applications key or value lengths can either be fixed or have "probable
> upper bound" - take the average word length +1 to 3 stddev and you only have
> to shift bytes when the value is outside that bound. IMO let the user set a
> fixed-length value for a mutable Tr13 and make them detect when the new
> value is too long to set. For fixed memory systems I favor an error
> condition return (i.e. public boolean put(k,v) returns false) or else
> silently truncating the value (yes meaning mutate the last byte for VInt).
> After all, the library user can detect when their value violates the fixed
> length they set.

True. Plus for some types the size is fixed anyway.

One relatively easy expansion path could be just allowing modification
of existing payloads, at least for the byte[]-to-byte[] type.
Synchronization would be problematic, of course, if writes are
allowed, but that will be the case for any modifications.

Yet another possibility would be to define a segmented/chunked version
where individual entries are never split, and segments could be
split/merged based on size changes.
This would limit the need for shuffling to within the segment in
question. That would require one 'root' segment, and some way to
branch off at various points.

-+ Tatu +-
