the future of Kryo: v5

Nate

Apr 25, 2018, 9:13:00 AM

to kryo-users

Hello everyone,

I have been working on cleaning up and improving the Kryo library for a new major release. You can see the current state here:
https://github.com/EsotericSoftware/kryo/tree/kryo-5.0.0-dev

I'll ramble about some of the major changes:

Generics has been redone to support all scenarios. GenericsUtil thoroughly discovers the generic types that are known at compile time (tests), which avoids as much work as possible during serialization. The generic types known only at serialization time are tracked by Kryo's instance of Generics, which maintains a stack of GenericType instances (lots of nice javadocs in those classes).

How does using the new generics API look? For a simple example with just one type parameter, see CollectionSerializer. To get the class, call kryo.getGenerics().nextGenericClass() then after reading the child objects call kryo.getGenerics().popGenericType(). Simple!

nextGenericClass() is a shortcut for the common case of a class with a single type parameter. When there are multiple type parameters, like Map, use nextGenericTypes() instead, then call resolve() to get each class. Also, the last parameter is made current automatically (meaning it is pushed by Generics#nextGenericTypes()), so the other parameter(s) have to be pushed and popped manually. Somewhat less simple!

It's a little tricky, but these two patterns are all serializers need to worry about. It provides full support for nested generic types, eg HashMap<ArrayList<Integer>, ArrayList<String>>, which wasn't possible before.
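As a rough illustration of what "generic types known at compile time" means, here is a minimal sketch that uses only java.lang.reflect to read the type arguments declared on a field such as HashMap<ArrayList<Integer>, ArrayList<String>>. It is not Kryo's GenericsUtil and does not use the Generics API described above; the names GenericsSketch, Holder, describe and describeField are made up for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.HashMap;

final class GenericsSketch {
    // Hypothetical data class with a nested generic field.
    static final class Holder {
        HashMap<ArrayList<Integer>, ArrayList<String>> map;
    }

    // Renders a declared type with its type arguments, recursing into nested parameters.
    static String describe(Type type) {
        if (type instanceof ParameterizedType) {
            ParameterizedType p = (ParameterizedType) type;
            StringBuilder sb = new StringBuilder(((Class<?>) p.getRawType()).getSimpleName());
            sb.append('<');
            Type[] args = p.getActualTypeArguments();
            for (int i = 0; i < args.length; i++) {
                if (i > 0) sb.append(", ");
                sb.append(describe(args[i]));
            }
            return sb.append('>').toString();
        }
        if (type instanceof Class) return ((Class<?>) type).getSimpleName();
        return type.getTypeName(); // type variables, wildcards, generic arrays
    }

    // Looks up a declared field and describes its generic type.
    static String describeField(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            return describe(field.getGenericType());
        } catch (NoSuchFieldException ex) {
            throw new IllegalArgumentException("No such field: " + fieldName, ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeField(Holder.class, "map"));
    }
}
```

Everything in this sketch is available from the field declaration alone, which is why that part of the work can be done once up front rather than for every object serialized.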

Feedback on all this is welcome, but the core of it is relatively complex and may induce a headache (it certainly did when writing it!). The important bits are 1) to handle all generics scenarios, and 2) to minimize work at serialization time. Digging through just the calls for generics made from serializers, you'll find minimal work is done and without allocations. Given this, generics are always enabled.

Serializer method signatures have changed for issue #146 in this commit. All Serializer classes need to be changed from Class<T> to Class<? extends T>.

Various serialization improvements. Eg, TaggedFieldSerializer now has acceptsNull=true, which saves 1 byte for all non-null objects.

Unsafe had permeated the API. IMO it should be sandboxed as much as possible. I began refactoring by removing Unsafe support completely, made everything nice, then put back the input/output streams. I haven't yet looked into what serializers would need to make use of all of the Unsafe features. With Java 9 dropping Unsafe, maybe it is not worth the considerable effort to support it. FWIW, I personally don't use Unsafe but I know others do. I assume some even choose Kryo specifically for Unsafe. I don't know how those people feel about Java 9.

I have not looked at Java 9 at all yet. I don't know if it provides APIs that Kryo can use to be more efficient.

FieldSerializer had gotten messy internally. It did a lot more work than necessary to build the cached fields. It did things with generics that were suspect and did expensive computations that were not even used. It had a lot of Unsafe logic.

Various API improvements. The Kryo class must not become a junk drawer for serializer settings. The SerializerFactory classes can be used, eg FieldSerializerFactory, which can create new, configured serializers (example). Having Input/Output varint and varlong be a hint was odd. If this ends up being needed again, maybe depending on what happens with Unsafe, it could be done in Input/Output subclasses without affecting the base class API. Javadocs are much improved.

Logging is improved. Some useless junk was logged, some important junk was not logged, and the formatting of log messages was not consistent.

Deserialization still temporarily modifies the input buffer. While in many cases this is fine, it can be quite an unexpected gotcha in some cases. I'd like to remedy this but string writing is important and any changes here need extensive benchmarks.
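For the Serializer signature change mentioned above (Class<T> to Class<? extends T>), here is a small self-contained sketch of what the wildcard buys the caller. The Reader interface is a stand-in invented for this example, not Kryo's Serializer class, and the explanation in the comments is my reading of the change rather than a quote from issue #146.

```java
final class SignatureSketch {
    // Stand-in for a serializer's read method; not Kryo's actual Serializer class.
    interface Reader<T> {
        T read(String input, Class<? extends T> type);
    }

    // Reads a Number, using the requested class to pick the concrete type.
    static final Reader<Number> NUMBER_READER = (input, type) -> {
        if (type == Integer.class) return Integer.valueOf(input);
        if (type == Double.class) return Double.valueOf(input);
        throw new IllegalArgumentException("Unsupported type: " + type);
    };

    static Number roundTrip(Number value) {
        // getClass() on a Number yields Class<? extends Number>, which the
        // wildcard parameter accepts directly. With a Class<T> parameter this
        // call would need an unchecked cast to Class<Number>.
        Class<? extends Number> type = value.getClass();
        return NUMBER_READER.read(value.toString(), type);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));
        System.out.println(roundTrip(1.5));
    }
}
```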

I'm sure everyone's first question will be about what happens to data serialized in an older Kryo version. Supporting that is noble for a minor version increment, but the changes above are too extensive for that to work. It would be dirty to have settings to disable free optimizations or enable bugs we've fixed, solely to attempt loading previously serialized bytes. I don't want the difficulties of data migration to prevent the library from evolving and I feel an improvement and maintenance pass like this is long overdue.

It may not make sense to upgrade your projects to a new major version. Kryo has been stable a long time. While it's nice to have as many projects on the latest as possible, you should not feel obligated to upgrade if the benefits do not make it worthwhile, given the pain involved. New projects of course benefit from using the latest version.

Updating Kryo versions when you care about previously serialized bytes has always been hard for all but the most trivial updates. Probably the safest way to do this is to load data with an old version, then write it with the new version. This can be done by using a class loader for the old Kryo classes. Maybe we could provide an example for this. For this to work, obviously the class files must not have changed, only the Kryo version.
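A minimal sketch of the class loader part of that idea, using only the JDK. It shows how an isolated URLClassLoader can load classes from a given set of jars without seeing the application classpath; the names OldVersionLoaderSketch, isolatedLoader and loadIsolated are hypothetical, and Java 9 or later is assumed for ClassLoader.getPlatformClassLoader(). Actually reading data with the old classes (through reflection) and writing it with the new version is left out.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

final class OldVersionLoaderSketch {
    // Creates a loader that sees only the given jars plus the JDK's own classes,
    // so classes on the application classpath (for example a newer Kryo) are not
    // picked up. Requires Java 9+ for getPlatformClassLoader().
    static URLClassLoader isolatedLoader(URL... jars) {
        return new URLClassLoader(jars, ClassLoader.getPlatformClassLoader());
    }

    // Loads a class by name through the isolated loader.
    static Class<?> loadIsolated(URLClassLoader loader, String className) {
        try {
            return Class.forName(className, true, loader);
        } catch (ClassNotFoundException ex) {
            throw new IllegalStateException("Not found in isolated loader: " + className, ex);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: pass the path to an old Kryo jar as the first argument.
        // Without an argument, a JDK class is loaded just to show the mechanism.
        URL[] jars = args.length > 0 ? new URL[] {new File(args[0]).toURI().toURL()} : new URL[0];
        String name = args.length > 0 ? "com.esotericsoftware.kryo.Kryo" : "java.lang.String";
        try (URLClassLoader loader = isolatedLoader(jars)) {
            Class<?> type = loadIsolated(loader, name);
            System.out.println(type.getName() + " loaded by " + type.getClassLoader());
        }
    }
}
```

In a real migration the jar list would presumably also need the old version's dependencies, and the data classes have to be visible to whichever loader reads them. Those details depend on the project and are not covered here.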

Please share your thoughts! Now is the time for us to make any big changes and improvements to both the API and serialized data that wouldn't make sense in a minor version bump. Any places in the API you don't like, are awkward, or confusing? Have any ideas for improving the API? Have any ideas for more efficient serialization?

Cheers,

-Nate

Jan Kotek

Apr 28, 2018, 12:52:26 AM

to kryo-users

Hi Nathan,

Good work!

I have non-recursive graph serialization in Elsa. How would you feel if I merged that into FieldSerializer?

Jan Kotek

Nate

May 1, 2018, 5:30:06 PM

to kryo-users

Hi Jan,

Non-recursive serialization would be nice. Whether it belongs in FieldSerializer depends on how much it complicates the class and whether it affects performance. It may be better suited as a separate serializer, possibly a FieldSerializer subclass.

Cheers,

-Nate

--
You received this message because you are subscribed to the "kryo-users" group.
http://groups.google.com/group/kryo-users
---
You received this message because you are subscribed to the Google Groups "kryo-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kryo-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nate

Jun 5, 2018, 8:29:57 AM

to kryo-users

More work has been done in the kryo-5.0.0-dev branch. I think it's pretty close to finished. Some notes:

Reading ASCII strings no longer modifies the buffer.
It has the same serialized bytes but copies bytes to chars before creating the string. It's nice to remove this gotcha.
Output/Input classes now use little endian everywhere (previously everything was big endian, except var ints/longs).
ByteBufferInput/Output no longer rely on the buffer's byte order (removes a bit of juggling, plus asXxxBuffer allocates), the input/output position and the buffer's position are kept in sync, and lots of other clean up.
writeInt(int, boolean) for letting the output decide if a fixed or variable length int is written is back. All inputs/outputs have setVariableLengthEncoding for ints and longs. Unsafe buffers don't turn this on by default, but doing so can be much faster (taking just 23% of the time in a very simple test, but of course producing larger output).
Unsafe buffers are back.
As before, the downside to using these is that the deserializing computer's native byte order must match the serializing computer. With Java 10 and a simple benchmark, using unsafe buffers completes in 59% of the time as Output/Input. If variable encoding is disabled, unsafe buffers complete in just 16% of the time (woo!).
FieldSerializer can use Unsafe to read object fields again. Unlike unsafe buffers, using unsafe for this doesn't have any downsides. Currently it uses unsafe if possible, then ReflectASM if possible, then reflection as a last resort. Since it degrades gracefully, I'm not sure users ever need to disable unsafe or ReflectASM (though we might for tests). Are there ever cases where unsafe is worse than ReflectASM or reflection, or ReflectASM is worse than reflection? TBH with Java 10 and a simple benchmark, I don't see much difference between the three.
I haven't added the FieldSerializer "memory regions" features back due to this ominous comment:
https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java#L92-L100
Leo, is this feature actually usable?
Is anyone using it?
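To make the fixed versus variable length trade-off in the notes above concrete, here is a small sketch of the two ways to write an int: 4 bytes in little endian order, or 7 bits per byte with a continuation bit. This is a common varint scheme written from scratch for illustration; it is an assumption that it resembles Kryo's encoding, and it is not claimed to be byte-for-byte compatible with it. All names are made up for this example.

```java
import java.io.ByteArrayOutputStream;

final class IntEncodingSketch {
    // Fixed length: always 4 bytes, least significant byte first (little endian).
    static byte[] writeFixed(int value) {
        return new byte[] {
            (byte) value, (byte) (value >>> 8), (byte) (value >>> 16), (byte) (value >>> 24)
        };
    }

    // Variable length: 7 bits per byte, high bit set when more bytes follow.
    // Small non-negative values take 1 byte; negative values take 5.
    static byte[] writeVar(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    static int readFixed(byte[] b) {
        return (b[0] & 0xFF) | (b[1] & 0xFF) << 8 | (b[2] & 0xFF) << 16 | (b[3] & 0xFF) << 24;
    }

    static int readVar(byte[] b) {
        int result = 0;
        for (int i = 0, shift = 0; i < b.length; i++, shift += 7) {
            result |= (b[i] & 0x7F) << shift;
            if ((b[i] & 0x80) == 0) break; // no continuation bit: last byte
        }
        return result;
    }

    public static void main(String[] args) {
        for (int value : new int[] {1, 300, 1 << 28, -1}) {
            System.out.println(value + ": fixed=" + writeFixed(value).length
                + " bytes, variable=" + writeVar(value).length + " bytes");
        }
    }
}
```

The variable length path loops and branches per byte, while the fixed path is four unconditional stores, which is one plausible reason for the speed difference reported above at the cost of larger output.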

Thoughts on the above and eyes on the kryo-5.0.0-dev branch would be appreciated.

Cheers,

-Nate

Martin Grotzke

Jun 5, 2018, 6:02:41 PM

to kryo-...@googlegroups.com

Hi Nate,

many thanks for this enormous effort!

I wanted to go through issues to see if there are ones left breaking compatibility, so that they should be included.

I didn't go through all issues one by one but checked some I still had in mind, and as it seems you have already fixed these:

- Type registration should be required #398

- fixed API for read #146

One that _might_ affect compatibility is

- Optimizations for common special cases #439

(but I'm not sure how much this brings, maybe you want to check that one?)

Have you gone through open issues already, so that I don't have to do this completely?

Should we have a migration guide in place that tells what needs to be changed in user code? I think we should ;-)

Before finally releasing 5.0 I'd suggest publishing one or two release candidates to get some early feedback.

WDYT?

Cheers,

Martin

Nate

Jun 6, 2018, 7:09:15 PM

to kryo-users

I haven't yet gone through all the issues, so your help there is appreciated.

#439 looks good, I'd like to play with that soon. Now is a good time to think about other similar improvements.

A migration guide would be great!

Agreed about release candidates, we have no need to rush.

Nate

Jun 7, 2018, 8:53:33 PM

to kryo-users

I've done some optimizations for #439:
https://github.com/EsotericSoftware/kryo/commit/27ae2c4fcdb1ceaaa5caac61ded16c52e3250b31#diff-a515ebc107e699c9fb01510048368d47R83

It looks a bit complex, but it's just that we only need to do checks or write bytes in certain cases. It's usually smaller than the old code and only a byte or so larger in a few cases. In other cases the savings can be 1 byte per item.

I'm not sure it's worth iterating all the keys and values in a map to do something similar for maps. Also it's quite a bit of convoluted code and would be doubled for map keys and values. I probably won't bother doing it for maps.

ObjectArraySerializer could see similar treatment, though I'd guess it gets used less often than CollectionSerializer.

Nate

Jun 8, 2018, 3:48:06 AM

to kryo-users

FWIW, I've updated the jvmserializers project. I cleaned up the benchmark, named the runs appropriately, and updated it to Kryo 4.0.2, though I didn't re-run the full test and update the wiki. Here's a very simplistic comparison of Kryo 2.23.0, 4.0.2 and 5.0.0-dev:
http://n4te.com/x/4331-charts.html

This benchmark is pretty janky for timing and doesn't exercise much of Kryo, but the size metrics are right. Nothing stands out as terribly broken.

mongonix

Jun 8, 2018, 5:24:43 PM

to kryo-users

On Tuesday, June 5, 2018 at 5:29:57 AM UTC-7, Nate wrote:

> More work has been done in the kryo-5.0.0-dev branch. I think it's pretty close to finished. Some notes:
> Reading ASCII strings no longer modifies the buffer.
> It has the same serialized bytes but copies bytes to chars before creating the string. It's nice to remove this gotcha.

Very nice!

> Output/Input classes now use little endian everywhere (previously everything was big endian, except var ints/longs).
> ByteBufferInput/Output no longer rely on the buffer's byte order (removes a bit of juggling, plus asXxxBuffer allocates), the input/output position and the buffer's position are kept in sync, and lots of other clean up.
> writeInt(int, boolean) for letting the output decide if a fixed or variable length int is written is back. All inputs/outputs have setVariableLengthEncoding for ints and longs. Unsafe buffers don't turn this on by default, but doing so can be much faster (taking just 23% of the time in a very simple test, but of course producing larger output).
> Unsafe buffers are back.
> As before, the downside to using these is that the deserializing computer's native byte order must match the serializing computer. With Java 10 and a simple benchmark, using unsafe buffers completes in 59% of the time as Output/Input. If variable encoding is disabled, unsafe buffers complete in just 16% of the time (woo!).

I'm glad you changed your mind. I remember you didn't like Unsafe buffers first ;-) But in terms of performance they are hard to beat!

> FieldSerializer can use Unsafe to read object fields again. Unlike unsafe buffers, using unsafe for this doesn't have any downsides. Currently it uses unsafe if possible, then ReflectASM if possible, then reflection as a last resort. Since it degrades gracefully, I'm not sure users ever need to disable unsafe or ReflectASM (though we might for tests). Are there ever cases where unsafe is worse than ReflectASM or reflection, or ReflectASM is worse than reflection? TBH with Java 10 and a simple benchmark, I don't see much difference between the three.
>
> I haven't added the FieldSerializer "memory regions" features back due to this ominous comment:
> https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java#L92-L100
> Leo, is this feature actually usable?
> Is anyone using it?

No, I'm not aware of anyone really using it. It was more of implementing an optimization that could be potentially useful. So, if you simplified the code and removed it, that's fine.

Also, I really like your clean-ups of the code for handling generics. Great job!

-Leo

Nate

Jun 8, 2018, 6:05:27 PM

to kryo-users

>> Unsafe buffers are back.
>> As before, the downside to using these is that the deserializing computer's native byte order must match the serializing computer. With Java 10 and a simple benchmark, using unsafe buffers completes in 59% of the time as Output/Input. If variable encoding is disabled, unsafe buffers complete in just 16% of the time (woo!).
> I'm glad you changed your mind. I remember you didn't like Unsafe buffers first ;-) But in terms of performance they are hard to beat!

True, mostly because in my projects I can't be sure the computers that read the data will be compatible. I saw a big difference using unsafe buffers in a simple test, but interestingly almost no difference with the jvmserializers test data. Of course we know that test data only exercise a tiny part of Kryo.

>> I haven't added the FieldSerializer "memory regions" features back due to this ominous comment:
>> https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java#L92-L100
>> Leo, is this feature actually usable?
>> Is anyone using it?
> No, I'm not aware of anyone really using it. It was more of implementing an optimization that could be potentially useful. So, if you simplified the code and removed it, that's fine.

OK, let's leave it out then. If it makes sense, we can add it back later.

> Also, I really like your clean-ups of the code for handling generics. Great job!

Thanks! I got some gray hairs doing that. The tests are pretty thorough though, so it should handle all situations and it's pretty light at serialization time.

Cheers,

-Nate

seth/nqzero

Jun 8, 2018, 6:14:24 PM

to kryo-users

> Now is the time for us to make any big changes and improvements to both the ...

i've posted issues for each of these but advocating for them here too:

- thread safe kryo instances

- partial deserialization (or field access) would be a big efficiency win

- async support

kryo pools become very expensive when you have many threads and many different kryo configurations

is reflectASM going to get an update pass ? i have an outstanding pull request there

Nate

Jun 9, 2018, 2:25:53 AM

to kryo-users

Unfortunately none of those things are terribly easy.

- Thread safety is better done outside of Kryo.
- The most common serializer (FieldSerializer) doesn't lend itself to doing partial deserialization. Serializers are pluggable and others could be written to do this.

- I don't see a good way to provide access to only a portion of deserialized objects. If you use your own serializers, you can do your own callbacks, probably specific to your data.
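One way to read "thread safety is better done outside of Kryo" is to give each thread its own instance rather than sharing one. Below is a minimal sketch of that pattern with ThreadLocal. Codec is a made-up stand-in for any non-thread-safe serializer object; nothing here is Kryo API.

```java
final class PerThreadSketch {
    // Stand-in for a non-thread-safe serializer instance (e.g. one holding reusable buffers).
    static final class Codec {
        private final StringBuilder scratch = new StringBuilder(); // mutable state, unsafe to share
        final Thread owner = Thread.currentThread(); // thread that created this instance

        String encode(int value) {
            scratch.setLength(0);
            return scratch.append("v=").append(value).toString();
        }
    }

    // Each thread lazily gets its own Codec, so no locking is needed inside Codec.
    static final ThreadLocal<Codec> CODECS = ThreadLocal.withInitial(Codec::new);

    static String encode(int value) {
        return CODECS.get().encode(value);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> System.out.println(
            Thread.currentThread().getName() + " -> " + encode(2)), "worker");
        worker.start();
        worker.join();
        System.out.println(Thread.currentThread().getName() + " -> " + encode(1));
    }
}
```

This does not remove the cost raised earlier in the thread: with many threads and many configurations, one instance per thread per configuration is still created.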

ReflectASM might see some love, but I'm likely to run out of OSS steam very soon and go back into hibernation, aka real work. :(

Cheers,

-Nate

Nate

Jun 9, 2018, 10:16:45 PM

to kryo-users

Here are some results from the JMH benchmarks in kryo-5.0.0-dev:

-f 4 -wi 5 -i 3 -t 2 -w 2s -r 2s
Benchmark                            (chunked) (references)   Mode Cnt     Score     Error Units
FieldSerializerBenchmark.compatible       true          true thrpt   12 1666.434 ± 37.651 ops/s
FieldSerializerBenchmark.compatible       true         false thrpt   12 1687.522 ± 37.220 ops/s
FieldSerializerBenchmark.compatible      false          true thrpt   12 2006.795 ± 150.495 ops/s
FieldSerializerBenchmark.compatible      false         false thrpt   12 2080.588 ± 23.313 ops/s
FieldSerializerBenchmark.custom            N/A          true thrpt   12 4254.326 ± 641.937 ops/s
FieldSerializerBenchmark.custom            N/A         false thrpt   12 3319.830 ± 120.158 ops/s
FieldSerializerBenchmark.field             N/A          true thrpt   12 2985.312 ± 38.829 ops/s
FieldSerializerBenchmark.field             N/A         false thrpt   12 3266.215 ± 112.011 ops/s
FieldSerializerBenchmark.tagged           true          true thrpt   12 2494.667 ± 178.776 ops/s
FieldSerializerBenchmark.tagged           true         false thrpt   12 2695.727 ± 78.062 ops/s
FieldSerializerBenchmark.tagged          false          true thrpt   12 3563.448 ± 46.638 ops/s
FieldSerializerBenchmark.tagged          false         false thrpt   12 3596.008 ± 83.828 ops/s
FieldSerializerBenchmark.version           N/A          true thrpt   12 3963.666 ± 84.364 ops/s
FieldSerializerBenchmark.version           N/A         false thrpt   12 3949.319 ± 169.012 ops/s
StringsBenchmark.readAsciiLong             N/A           N/A     ss   12     3.156 ±   0.070   s/op
StringsBenchmark.readString                N/A           N/A     ss   12     1.316 ±   0.015   s/op
StringsBenchmark.readStringLong            N/A           N/A     ss   12     3.876 ±   0.030   s/op
StringsBenchmark.writeAsciiLong            N/A           N/A     ss   12     2.254 ±   0.051   s/op
StringsBenchmark.writeString               N/A           N/A     ss   12     0.755 ±   0.023   s/op
StringsBenchmark.writeStringLong           N/A           N/A     ss   12     2.949 ±   0.134   s/op

For ops/s higher is better. For s/op lower is better. Benchmarking is tricky, so it would be nice to get more eyes on these. Likely the object graph (a single Sample instance) was too small to be meaningful. Some other test POJOs are committed but not wired up yet.

Not sure why tagged would beat field, maybe because TaggedFieldSerializer has no subclasses? I expected from best to worst:

custom, field, version, tagged, compatible

It's strange that custom with references would be higher than without, but then the +/- error for custom with references is very large. This probably points again to the need for a larger test.

It shows the impact of chunked encoding.

For strings, everything seems pretty OK (after a few recent commits to remove CharSequence). I'd love for it to be faster, but haven't found a way.

Joachim Durchholz

Jun 10, 2018, 4:02:37 AM

to kryo-...@googlegroups.com

> Not sure why tagged would beat field, maybe because
> TaggedFieldSerializer has no subclasses?

AFAIK Hotspot compiles to native depending on call frequency and does
not care much about whether there's a subclass or not.
Details can vary considerably between JVM versions. E.g. for Java 8,
compilation strategy is described in
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/2b2511bd3cc8/src/share/vm/runtime/advancedThresholdPolicy.hpp#l34
.
https://stackoverflow.com/a/35602023/6944068 mentions a case where
escape analysis happened after a million iterations, long after a lot of
other optimizations were done.

I think one really needs to know what the JIT of the JVM in question is
doing: What mechanisms exist, which of them have kicked in and which
haven't, that kind of stuff.
Lather, rinse, repeat for other JVM versions.
I suspect that a benchmark that aims at a single number per test is at
best futile, at worst misleading (because you start optimizing the wrong
cases).

1. I'd first define some use cases. One that comes to mind is
high-throughput with a significant time spent inside Kryo. Another would
be low-latency. All of them long-running.
2. I'd make time series. What's the metric initially, how does it change
over time? That's going to give the users much more confidence in the
results.
3. Graphs help tremendously in interpreting the numbers. If tests ran
repeatedly, smoke graphs would be helpful, too. ("Smoke graph" is not a
standard term, I mean graphs like those produced by "smoke ping", see
https://www.google.com/imgres?imgurl=https%3A%2F%2Foss.oetiker.ch%2Fsmokeping%2Fdoc%2Freading_detail.png&imgrefurl=http%3A%2F%2Foss.oetiker.ch%2Fsmokeping%2Fdoc%2Freading.en.html&docid=HYqKuuEPsC7HEM&tbnid=AMRykYXB66kdPM%3A&vet=10ahUKEwj5seHZ0MjbAhWKK5oKHbZEBVAQMwg0KAAwAA..i&w=697&h=321&client=ubuntu&bih=906&biw=1541&q=%22smoke%20graph%22&ved=0ahUKEwj5seHZ0MjbAhWKK5oKHbZEBVAQMwg0KAAwAA&iact=mrc&uact=8
for a list of examples.)

Regards,
Jo

Nate

Jun 10, 2018, 5:33:04 AM

to kryo-users

On Sun, Jun 10, 2018 at 10:02 AM, Joachim Durchholz <j...@durchholz.org> wrote:

>> Not sure why tagged would beat field, maybe because TaggedFieldSerializer has no subclasses?

> AFAIK Hotspot compiles to native depending on call frequency and does not care much about whether there's a subclass or not.

It can devirtualize calls. Like a lot of JIT things, it's complex.
https://shipilev.net/blog/2015/black-magic-method-dispatch/

> Details can vary considerably between JVM versions. E.g. for Java 8, compilation strategy is described in http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/2b2511bd3cc8/src/share/vm/runtime/advancedThresholdPolicy.hpp#l34 .
> https://stackoverflow.com/a/35602023/6944068 mentions a case where escape analysis happened after a million iterations, long after a lot of other optimizations were done.
>
> I think one really needs to know what the JIT of the JVM in question is doing: What mechanisms exist, which of them have kicked in and which haven't, that kind of stuff.
> Lather, rinse, repeat for other JVM versions.

JVM versions can change things for sure. Ideally we at least address any major issues that affect all JVMs.

> I suspect that a benchmark that aims at a single number per test is at best futile, at worst misleading (because you start optimizing the wrong cases).

Agreed, benchmark op/s (and error) are just numbers. Why they are the way they are needs to be understood (at least a little bit).

> 1. I'd first define some use cases. One that comes to mind is high-throughput with a significant time spent inside Kryo. Another would be low-latency. All of them long-running.

In the FieldSerializerBenchmark ops/s are how many round trip serializations per second. The biggest factor is likely the test data: how deep is the hierarchy, how much data are in the classes, what serializers do they use, etc.

> 2. I'd make time series. What's the metric initially, how does it change over time? That's going to give the users much more confidence in the results.

JMH handles a lot, warm up, etc. It's hard enough to measure something accurately. I'm not sure it's important to track the metric throughout the benchmark.

> 3. Graphs help tremendously in interpreting the numbers. If tests ran repeatedly, smoke graphs would be helpful, too. ("Smoke graph" is not a standard term, I mean graphs like those produced by "smoke ping", see https://www.google.com/imgres?imgurl=https%3A%2F%2Foss.oetiker.ch%2Fsmokeping%2Fdoc%2Freading_detail.png&imgrefurl=http%3A%2F%2Foss.oetiker.ch%2Fsmokeping%2Fdoc%2Freading.en.html&docid=HYqKuuEPsC7HEM&tbnid=AMRykYXB66kdPM%3A&vet=10ahUKEwj5seHZ0MjbAhWKK5oKHbZEBVAQMwg0KAAwAA..i&w=697&h=321&client=ubuntu&bih=906&biw=1541&q=%22smoke%20graph%22&ved=0ahUKEwj5seHZ0MjbAhWKK5oKHbZEBVAQMwg0KAAwAA&iact=mrc&uact=8 for a list of examples.)

We can get the data out of JMH with its Java API and easily make charts (eg a Google chart URL).

Joachim Durchholz

Jun 10, 2018, 6:15:16 AM

to kryo-...@googlegroups.com

Am 10.06.2018 um 11:32 schrieb Nate:
> On Sun, Jun 10, 2018 at 10:02 AM, Joachim Durchholz <j...@durchholz.org> wrote:
>
> Not sure why tagged would beat field, maybe because
> TaggedFieldSerializer has no subclasses?
>
> AFAIK Hotspot compiles to native depending on call frequency and
> does not care much about whether there's a subclass or not.
>
>
> It can devirtualize calls.

Last time I read up on this (which is a while ago, and probably not 100%
accurate anymore), what they did is this:
* Whenever there's a polymorphic call, count the number of calls for
each target type.
* Continue doing this until the devirtualization threshold is reached.
* Pick the target type with the highest number of calls.
* Replace the call site with the equivalent of this code:
    if (target.getClass() == clazz) {
        static call to clazz.fn with target and parameters
    } else {
        normal polymorphic call
    }

It was too dumb to devirtualize more than one target class per call
site. (Or maybe making it smarter would have been too much overhead to
really be worth it, on the average.)
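Written out by hand in Java, the guarded call site described above could look like the sketch below. This is only an illustration of the idea: the JIT applies it to compiled code based on profiling, not to source, and Shape, Square, Circle and guardedArea are names made up for this example.

```java
final class DevirtualizationSketch {
    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    // Hand-written equivalent of the guarded call site, assuming profiling
    // found Square to be the most frequent receiver type.
    static double guardedArea(Shape target) {
        if (target.getClass() == Square.class) {
            Square square = (Square) target;
            return square.side * square.side; // body of Square.area() inlined
        } else {
            return target.area(); // normal polymorphic call
        }
    }

    public static void main(String[] args) {
        System.out.println(guardedArea(new Square(3)));
        System.out.println(guardedArea(new Circle(1)));
    }
}
```

The result is the same on both paths; only the cost of the call differs when the guard matches.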

> Like a lot of JIT things, it's complex.
> https://shipilev.net/blog/2015/black-magic-method-dispatch/

Interesting page, though he uses instanceof instead of direct
.getClass() calls, which is more expensive.

One thing that might help (or maybe not) could be final classes or
methods. JIT should then know right at class load time what's
polymorphic and what isn't.
Not sure if this information is being used; very little Java code
bothers to set that keyword, and it may not be worth the extra test for
the JVM.

> 1. I'd first define some use cases. One that comes to mind is
> high-throughput with a significant time spent inside Kryo. Another
> would be low-latency. All of them long-running.
>
> In the FieldSerializerBenchmark ops/s are how many round trip
> serializations per second. The biggest factor is likely the test data:
> how deep is the hierarchy, how much data are in the classes, what
> serializers do they use, etc.
>
> 2. I'd make time series. What's the metric initially, how does it
> change over time? That's going to give the users much more
> confidence in the results.
>
> JMH handles a lot, warm up, etc. It's hard enough to measure something
> accurately. I'm not sure it's important to track the metric throughout
> the benchmark.

Users of Kryo will then know whether they need to bother about some
slowness they see.
They will also get an idea of how thorough your benchmarking was. And
from what parts of the graph you are drawing your conclusions.

It will also tell you things about the typical duration of the warm-up
phase, repeatability of test results, and similar stuff.
It's the difference between having individual data points and aggregate
data. Aggregate tends to hide details; if the details contain a
surprise, it's good to investigate more, if the details contain no
surprise, that's of interest to the users of the benchmarked code.

> 3. Graphs help tremendously in interpreting the numbers. If tests
> ran repeatedly, smoke graphs would be helpful, too. ("Smoke graph"
> is not a standard term, I mean graphs like those produced by "smoke
> ping", see
> https://www.google.com/imgres?imgurl=https%3A%2F%2Foss.oetiker.ch%2Fsmokeping%2Fdoc%2Freading_detail.png&imgrefurl=http%3A%2F%2Foss.oetiker.ch%2Fsmokeping%2Fdoc%2Freading.en.html&docid=HYqKuuEPsC7HEM&tbnid=AMRykYXB66kdPM%3A&vet=10ahUKEwj5seHZ0MjbAhWKK5oKHbZEBVAQMwg0KAAwAA..i&w=697&h=321&client=ubuntu&bih=906&biw=1541&q=%22smoke%20graph%22&ved=0ahUKEwj5seHZ0MjbAhWKK5oKHbZEBVAQMwg0KAAwAA&iact=mrc&uact=8
> for a list of examples.)
>
> We can get the data out of JMH with its Java API and easily make charts
> (eg a Google chart URL).

Sweet :-)

Martin Grotzke

Jun 10, 2018, 7:11:45 AM

to kryo-...@googlegroups.com

Regarding understanding numbers (such as ops/s), I recently had good experiences with flame graphs produced by async-profiler (https://github.com/jvm-profiling-tools/async-profiler), though I haven't tried that for Kryo.

Cheers,

Martin

Nate

Jun 10, 2018, 10:49:45 AM

to kryo-users

I fixed up the FieldSerializer benchmarks (let JMH do the loops). Here's a proper run with Java 10:

-f 4 -wi 5 -i 3 -t 2 -w 2s -r 2s
Benchmark                            (chunked) (references)  Mode  Cnt        Score        Error Units
FieldSerializerBenchmark.compatible       true          true thrpt   12   419567.981 ± 18168.102 ops/s
FieldSerializerBenchmark.compatible       true         false thrpt   12   426442.897 ± 24351.105 ops/s
FieldSerializerBenchmark.compatible      false          true thrpt   12   536883.949 ± 16173.512 ops/s
FieldSerializerBenchmark.compatible      false         false thrpt   12   510899.557 ± 17140.413 ops/s
FieldSerializerBenchmark.custom            N/A          true thrpt   12 1652782.163 ± 81337.461 ops/s
FieldSerializerBenchmark.custom            N/A         false thrpt   12 1663496.510 ± 35768.055 ops/s
FieldSerializerBenchmark.field             N/A          true thrpt   12 1388097.595 ± 114145.319 ops/s
FieldSerializerBenchmark.field             N/A         false thrpt   12 1428242.840 ± 43040.012 ops/s
FieldSerializerBenchmark.tagged           true          true thrpt   12   782582.701 ± 24919.783 ops/s
FieldSerializerBenchmark.tagged           true         false thrpt   12   793086.996 ± 28941.039 ops/s
FieldSerializerBenchmark.tagged          false          true thrpt   12 1263359.889 ± 28499.285 ops/s
FieldSerializerBenchmark.tagged          false         false thrpt   12 1255801.036 ± 34290.111 ops/s
FieldSerializerBenchmark.version           N/A          true thrpt   12 1428134.379 ± 52969.689 ops/s
FieldSerializerBenchmark.version           N/A         false thrpt   12 1401367.543 ± 31198.617 ops/s

This is more in line with what I expected to see.

Results JSON is here:
http://n4te.com/x/4334-jmh-result.json

Can drop the JSON into this website, though I'm not a fan of how it shows the data. We need to find a better tool or do the charts ourselves.
http://jmh.morethan.io/

Nate

Jun 10, 2018, 10:47:46 PM

to kryo-users

Welp, the benchmarks in my last email still had many problems. I think they are better now (really this time). Also I've added more data classes (the "media" classes from jvmserializers) and some IO benchmarks. Benchmark code is here:
https://github.com/EsotericSoftware/kryo/tree/kryo-5.0.0-dev/benchmarks/src/main/java/com/esotericsoftware/kryo/benchmarks

We now have charts, eg:

http://n4te.com/x/4338-fieldSerializer.png
http://n4te.com/x/4339-string.png
http://n4te.com/x/4340-variableEncoding.png
http://n4te.com/x/4341-array.png

The charts are done with R/ggplot2 magick:
https://github.com/EsotericSoftware/kryo/tree/kryo-5.0.0-dev/benchmarks/charts

There's a simple bash script that runs all the benchmarks and generates the charts:
https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/benchmarks/run.sh

Nate

Jun 12, 2018, 3:14:24 PM

to kryo-users

Brand new readme, updated for 5.0:
https://github.com/EsotericSoftware/kryo/tree/kryo-5.0.0-dev
