technique   ||     mean |      std | iters between gc
Direct_0    || 0.342401 | 0.023697 | 3
Direct_4    || 1.109712 | 0.088039 | 2
IDC_0       || 0.318936 | 0.016013 | 5
DC_4        || 0.337229 | 0.013684 | 5
InDirect_0  || 0.343316 | 0.025373 | 5
InDirect_4  || 0.870705 | 0.167930 | 2
techniques:
Direct - a new ByteBuffer.allocateDirect for each call to kryo serialize
InDirect - a new ByteBuffer.allocate for each call to kryo serialize
IDC - InDirect, Cached (the buffer is reused)
DC - Direct, Cached
_0 - use a buffer slightly larger than what's required
_4 - use a buffer that's 16 times longer than _0
the test object has a single field - a long array (1 << 20) of random
ints. after writing the array to the buffer, the sum of the bytes is
printed (to try to account for any difference between OS (direct) and
java heap (indirect) memory). on some runs i used -verbose:gc to
produce the gc column - it's the number of iterations between gc
passes (higher is better).
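for reference, the four strategies look roughly like this (a minimal
sketch - class and method names are mine, the raw putLong loop just
stands in for the kryo serialize call so it runs on its own, and the
sizes are illustrative; _4 would multiply the capacity by 16):

import java.nio.ByteBuffer;
import java.util.Random;

// sketch of the four allocation strategies from the table above
public class BufferStrategies {
    static final int CAPACITY = (1 << 20) * 8 + 64;                       // _0: slightly larger than required
    static final ByteBuffer CACHED_HEAP   = ByteBuffer.allocate(CAPACITY);       // IDC
    static final ByteBuffer CACHED_DIRECT = ByteBuffer.allocateDirect(CAPACITY); // DC

    static void serialize(ByteBuffer buffer, long[] data) {
        for (long l : data) buffer.putLong(l);      // stand-in for kryo writing the test object
    }

    public static void main(String[] args) {
        long[] data = new long[1 << 20];
        Random rnd = new Random();
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextInt();

        serialize(ByteBuffer.allocate(CAPACITY), data);        // InDirect: new heap buffer per call
        serialize(ByteBuffer.allocateDirect(CAPACITY), data);  // Direct: new direct buffer per call
        CACHED_HEAP.clear();   serialize(CACHED_HEAP, data);   // IDC: reuse one heap buffer
        CACHED_DIRECT.clear(); serialize(CACHED_DIRECT, data); // DC: reuse one direct buffer
    }
}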
problems:
i couldn't think of a way to monitor gc passes programmatically, so i
didn't interleave the tests, and i paused and System.gc()'d before each
technique. can anyone think of an easy way to PROGRAMMATICALLY MONITOR
GC ACTIVITY?
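one possibility might be polling the collector counts exposed by
java.lang.management (i haven't tried wiring this into the benchmark),
something like:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCounter {
    // total number of collections reported by all collectors so far; sampling
    // this before and after a batch of iterations would give the gc column
    // without needing -verbose:gc
    public static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = bean.getCollectionCount();
            if (c >= 0) count += c;   // -1 means the collector doesn't expose a count
        }
        return count;
    }
}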
my conclusions:
this was 1 run of 15 iterations of each technique, but i've done many
runs and they all look about the same. direct vs indirect doesn't seem
to matter much, and for "small" buffers, allocating on each kryo call
doesn't seem to matter either. for larger buffers (64M in this case),
per-call allocation isn't efficient. so i'm planning to go forward with
the flushing meta serializer.
any thoughts?
seth
1. use a pool of huge buffers, which requires a lot of memory
allocated statically
2. allocate a huge buffer for each call - i did my benchmark to see if
this was practical - it isn't
3. use some sort of catch / retry for overflows (rough sketch below) -
inefficient (n log n, i think), and still a memory hog
the buffers in #1 and #2 need to be huge to accommodate the largest
possible object. i can certainly live with any of these options, but
they leave the app with the responsibility of bolting on some sort of
mechanism. if there were an easy way to simplify the management of the
buffer(s), i think it'd be worth it ...
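for what it's worth, the catch / retry idea in #3 would look roughly
like this - a sketch against plain ByteBuffer (in the real thing the
write would be the kryo call, and the exception whatever kryo throws
when the buffer fills):

import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class RetryingWriter {
    // grows the buffer (doubling) and retries until the write fits; each failed
    // attempt re-does the whole write, which is where the inefficiency comes from
    public static ByteBuffer writeWithRetry(long[] data, int initialCapacity) {
        ByteBuffer buffer = ByteBuffer.allocate(initialCapacity);
        while (true) {
            try {
                buffer.clear();
                for (long l : data) buffer.putLong(l);   // stand-in for the kryo serialize call
                return buffer;
            } catch (BufferOverflowException e) {
                buffer = ByteBuffer.allocate(buffer.capacity() * 2);   // double and retry
            }
        }
    }
}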
nate wrote:
>> >> Interestingly, this could be built without needing changes to Kryo. I
>> >> am curious how well this would work. Anyone feel like implementing it? :)
i've added a couple of proof of concept serializers that show it's
possible to implement a "streaming" writing api without any changes to
kryo - http://code.google.com/p/kryo/issues/detail?id=10
while these work, they're not perfect - they either fall back to #3
above or impose roughly a 15% runtime penalty (i'm assuming that's due
to intercepting all the serializer calls). i think my next attempt will
be more invasive - i'll see if i can eliminate the penalty by moving
the streaming code into the serializers themselves (instead of wrapping
them).
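the wrapping idea is roughly this (hypothetical interface and names -
this is not kryo's actual serializer api, just the shape of the
per-call interception that i suspect costs the ~15%):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// hypothetical stand-in for a serializer's write method
interface BufferWriter {
    void write(ByteBuffer buffer, Object object);
}

// wraps another writer and flushes the buffer to a channel whenever it gets
// close to full, so a fixed-size buffer can stream out large object graphs;
// every write goes through this extra layer, which is the suspected source
// of the runtime penalty. note: a single delegated write bigger than the
// whole buffer would still need something like the catch / retry in #3.
class FlushingWriter implements BufferWriter {
    private final BufferWriter delegate;
    private final WritableByteChannel channel;
    private final int headroom;

    FlushingWriter(BufferWriter delegate, WritableByteChannel channel, int headroom) {
        this.delegate = delegate;
        this.channel = channel;
        this.headroom = headroom;
    }

    public void write(ByteBuffer buffer, Object object) {
        if (buffer.remaining() < headroom) flush(buffer);
        delegate.write(buffer, object);
    }

    private void flush(ByteBuffer buffer) {
        buffer.flip();
        try {
            while (buffer.hasRemaining()) channel.write(buffer);
        } catch (IOException e) {
            throw new RuntimeException("flush failed", e);
        }
        buffer.clear();
    }
}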
seth
NOTE on benchmarks: given the complexity and ad hoc rules of the jit,
benchmarks tend to be pretty crude. yes, you can guard against gc,
allow a long burn-in, and limit which classes you use, and you can
make any particular test give great results. but under real-life
conditions it's pretty hard to reproduce those results, especially if
your app is large, uses third-party components, or is ultimately a
library that somebody else uses.