smile performance


Kireet Reddy

Mar 10, 2021, 2:55:30 PM
to jackson-user
Hello,
    We recently switched a lot of our jackson data to smile format. I did some benchmarks on some of our data and we saw good deserialization speedup of around 30% when using smile rather than text. We are mostly concerned with deserialization as we do many more of those ops.

However when I actually enabled it for all our data, we noticed things actually running slower. It seems one of our objects in particular got about 10% slower. I captured a java flight recording and it appears the hotspot is in ByteQuadsCanonicalizer._verifySharing (the Arrays.copyOf operations). I also enabled the afterburner module -- not sure if that would have any adverse impacts.

We also have a time of day when one of our objects gets larger. We did see slower performance for this object during that time, which was expected. But we also noticed a slow down during that time in other objects that don't change size. Is there any shared resource that could explain that?

Is it expected that smile could be slower for some data structures? I had assumed it would just generally be faster.

I think our configuration is pretty straightforward, we do have some custom serializers but otherwise it's just:

SmileFactoryBuilder factoryBuilder = new SmileFactoryBuilder(new SmileFactory())
.disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES)
.disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
.disable(StreamWriteFeature.AUTO_CLOSE_TARGET);

ObjectMapper m = new ObjectMapper(factoryBuilder.build());
m.configure(MapperFeature.USE_GETTERS_AS_SETTERS, false);
m.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
m.configure(SerializationFeature.CLOSE_CLOSEABLE, false);
m.setSerializationInclusion(JsonInclude.Include.NON_NULL);
m.registerModule(new AfterburnerModule());

Tatu Saloranta

Mar 10, 2021, 3:04:20 PM
to jackson-user
On Wed, Mar 10, 2021 at 11:55 AM 'Kireet Reddy' via jackson-user
<jackso...@googlegroups.com> wrote:
>
> Hello,
> We recently switched a lot of our jackson data to smile format. I did some benchmarks on some of our data and we saw good deserialization speedup of around 30% when using smile rather than text. We are mostly concerned with deserialization as we do many more of those ops.

Ok good, that makes sense.

One option you may want to try out, if deserialization matters more, is

SmileGenerator.Feature.CHECK_SHARED_STRING_VALUES

which defaults to false. Enabling it would typically reduce size and
speed up deserialization, if the content has repeated, not-super-long
String values.
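For reference, a minimal sketch of enabling it via the 2.x builder API (the class and method names around the Jackson calls are illustrative, not from the original code):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import com.fasterxml.jackson.dataformat.smile.SmileGenerator;

public class SharedStringsConfig {
    public static ObjectMapper sharedStringsMapper() {
        // Writer-side feature: the parser decodes back-references
        // automatically, so only content re-written with this enabled
        // benefits on the read side.
        SmileFactory f = SmileFactory.builder()
                .enable(SmileGenerator.Feature.CHECK_SHARED_STRING_VALUES)
                .build();
        return new ObjectMapper(f);
    }
}
```

Note that existing data would need to be re-serialized with the feature on before reads can benefit.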

> However when I actually enabled it for all our data, we noticed things actually running slower. It seems one of our objects in particular got about 10% slower. I captured a java flight recording and it appears the hotspot is in ByteQuadsCanonicalizer._verifySharing (the Arrays.copyOf operations). I also enabled the afterburner module -- not sure if that would have any adverse impacts.

I'll have to look into this to say for sure, but one possibility is
that `SmileParser` instances not being closed after use could produce
this symptom.
But I'll have a better idea later on.

As to Afterburner: no, it should not matter here I think (although
with HotSpot/JIT it is hard to rule out anything).

> We also have a time of day when one of our objects gets larger. We did see slower performance for this object during that time, which was expected. But we also noticed a slow down during that time in other objects that don't change size. Is there any shared resource that could explain that?
>
> Is it expected that smile could be slower for some data structures? I had assumed it would just generally be faster.

It should be, yes. One other thing you could try disabling (it is
enabled by default) would be

SmileGenerator.Feature.CHECK_SHARED_NAMES

which should not cause the issue, but could give some more information
about the issue.
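A sketch of disabling it for such an experiment, assuming the same builder API as above (it is a generator-side feature, so re-serialized output is what changes):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import com.fasterxml.jackson.dataformat.smile.SmileGenerator;

public class NoSharedNamesConfig {
    public static ObjectMapper noSharedNamesMapper() {
        // CHECK_SHARED_NAMES is on by default; disabling it makes the
        // generator write every field name in full rather than emitting
        // back-references to earlier occurrences.
        SmileFactory f = SmileFactory.builder()
                .disable(SmileGenerator.Feature.CHECK_SHARED_NAMES)
                .build();
        return new ObjectMapper(f);
    }
}
```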

> I think our configuration is pretty straightforward, we do have some custom serializers but otherwise it's just:
>
> SmileFactoryBuilder factoryBuilder = new SmileFactoryBuilder(new SmileFactory())
> .disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES)

Ohhhhh. No, you should probably NOT disable this ^^^^

> .disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
> .disable(StreamWriteFeature.AUTO_CLOSE_TARGET);
>
> ObjectMapper m = new ObjectMapper(factoryBuilder.build());
> m.configure(MapperFeature.USE_GETTERS_AS_SETTERS, false);
> m.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
> m.configure(SerializationFeature.CLOSE_CLOSEABLE, false);
> m.setSerializationInclusion(JsonInclude.Include.NON_NULL);
> m.registerModule(new AfterburnerModule());

Yes, these make sense.

-+ Tatu +-

>
> --
> You received this message because you are subscribed to the Google Groups "jackson-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to jackson-user...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/jackson-user/a3433732-9b29-4399-99ab-fd099da4f2d4n%40googlegroups.com.

Kireet Reddy

Mar 10, 2021, 3:26:27 PM
to jackson-user
Thanks for the reply!

I am pretty sure the parser gets closed, we only have one place where we do smile format deserialization and it's called like this:
smileObjectMapper.readValue(in, clazz);

where in is an InputStream. The Jackson code seems to close the created JsonParser in a try-with-resources block. I can still try disabling SmileGenerator.Feature.CHECK_SHARED_NAMES. Would it be OK to do this if we've already serialized data with it enabled, or do we need to regenerate all of our data?

Regarding JsonFactory.Feature.CANONICALIZE_FIELD_NAMES, we disabled this a long time ago as we had memory issues with it enabled. We have some structures with Maps with user generated keys, and I guess the canonicalization table may have gotten too big?

Tatu Saloranta

Mar 10, 2021, 3:59:49 PM
to jackson-user
On Wed, Mar 10, 2021 at 12:26 PM 'Kireet Reddy' via jackson-user
<jackso...@googlegroups.com> wrote:
>
> Thanks for the reply!
>
> I am pretty sure the parser gets closed, we only have one place where we do smile format deserialization and it's called like this:
> smileObjectMapper.readValue(in, clazz);

Ok. That will close the `in` if (but only if)
`StreamReadFeature.AUTO_CLOSE_SOURCE` (or its deprecated predecessor
in `JsonParser.Feature`) is enabled.
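In other words, with AUTO_CLOSE_SOURCE disabled the caller has to close the stream itself; a minimal sketch (the method and path handling here are illustrative, not from the original code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;

public class SmileReadExample {
    // try-with-resources closes 'in' regardless of whether
    // StreamReadFeature.AUTO_CLOSE_SOURCE is enabled on the mapper.
    static Map<?, ?> read(ObjectMapper smileObjectMapper, Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return smileObjectMapper.readValue(in, Map.class);
        }
    }
}
```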

> where in is an InputStream. The jackson code seems to close the create JsonParser in a try/resource block. I can still try to disable SmileGenerator.Feature.CHECK_SHARED_NAMES. Would it be ok to do this if we've already serialized data with this enabled, or do we need to regenerate all of our data?

CHECK_SHARED_NAMES does change output (generally reducing size), so to
see any effect you would need to regenerate.

> Regarding JsonFactory.Feature.CANONICALIZE_FIELD_NAMES, we disabled this a long time ago as we had memory issues with it enabled. We have some structures with Maps with user generated keys, and I guess the canonicalization table may have gotten too big?

That is possible -- but disabling this feature will definitely reduce
performance, since all the keys need to be re-created every time, for
every document.
Maximum symbol table size is bounded at something like 8k or 16k entries,
but I can see how that might still be too big for some cases.
One key question would then be whether there is just one (or a small
number of) `SmileFactory` instances (and typically `ObjectMapper`s that
reference them), or more: symbol tables are shared by all parsers
created by the same factory, but not across factories.
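In practice that argues for holding on to a single mapper (and thus a single factory) for all Smile reads, so the shared symbol table actually gets reused; a minimal sketch (the holder class name is illustrative):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

public final class SmileMappers {
    // One factory (inside one mapper) shared by all threads: every parser
    // it creates contributes to, and benefits from, a single shared
    // canonicalized symbol table.
    public static final ObjectMapper SMILE = new ObjectMapper(new SmileFactory());

    private SmileMappers() { }
}
```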

Canonicalization has similar effect for JSON and Smile, although
probably more drastic for JSON actually. So that alone should not
explain difference between formats.

But lack of canonicalization might explain something about
locking/thread-sync: maybe the JSON alternative has a bit more optimized
handling there (I recall working through issues in that area at some
point, and the code for Smile does differ a bit).

-+ Tatu +-

Tatu Saloranta

Mar 10, 2021, 7:52:10 PM
to jackson-user
On Wed, Mar 10, 2021 at 12:59 PM Tatu Saloranta <ta...@fasterxml.com> wrote:
>
> On Wed, Mar 10, 2021 at 12:26 PM 'Kireet Reddy' via jackson-user
> <jackso...@googlegroups.com> wrote:

Ok, so looking at this specifically:

...
> >> > However when I actually enabled it for all our data, we noticed things actually running slower. It seems one of our objects in particular got about 10% slower. I captured a java flight recording and it appears the hotspot is in ByteQuadsCanonicalizer._verifySharing (the Arrays.copyOf operations). I also enabled the afterburner module -- not sure if that would have any adverse impacts.
....

It is a bit odd. The copy operation would occur only if the table is
shared, and it would presumably have to be a fairly big symbol table for
the copy to take noticeable time (there is no synchronization in there).
But if canonicalization were disabled, it should not happen at all.

But now that I am looking through the bootstrapping, it looks like this
setting only affects JSON parsers, not Smile or CBOR parsers.
So I don't think disabling it here has any effect. Given that, it sounds
like your content does get a lot of unique keys, which is why a copy is
likely made every time.

I think this may explain what you are seeing, assuming documents have a
large number of distinct keys: with JSON, canonicalization being
disabled, there is a constant, steady overhead. With Smile this is not
the case at the moment, and the negative impact probably grows with the
size of the document (which has a larger set of unique names?).

Ideally the Smile and CBOR parsers would allow disabling of
canonicalization. I think I will file 2 issues; while not trivial to
implement (for JSON things are a bit easier as it can fall back to the
`Reader`-based parser), it should be relatively straightforward.
The downside is that this probably has to go in 2.13 and not a 2.12
patch, depending on how intrusive the changes are.

So it looks like I got some work on my plate. :)

-+ Tatu +-

Kireet Reddy

Mar 10, 2021, 8:09:38 PM
to jackso...@googlegroups.com
This explanation would make a lot of sense. I tested with some of our data where the keys are fixed, so it would explain why our benchmarks showed the expected speedup, but not all of our data did in production. Also you can ignore my time of day comment, I think we happened to have a code deployment around that time which would better explain the temporary slowdown. 

Thanks so much for your help and being such a great steward of this project!


Tatu Saloranta

Mar 10, 2021, 11:42:13 PM
to jackson-user
On Wed, Mar 10, 2021 at 5:09 PM 'Kireet Reddy' via jackson-user
<jackso...@googlegroups.com> wrote:
>
> This explanation would make a lot of sense. I tested with some of our data where the keys are fixed, so it would explain why our benchmarks showed the expected speedup, but not all of our data did in production. Also you can ignore my time of day comment, I think we happened to have a code deployment around that time which would better explain the temporary slowdown.
>
> Thanks so much for your help and being such a great steward of this project!

You are welcome! This is an interesting problem to tackle (Smile and
CBOR backends are something I like working on more than many other
components), and I created 2 issues wrt ability (or lack thereof) to
turn off canonicalization:

* https://github.com/FasterXML/jackson-dataformats-binary/issues/252 (smile)
* https://github.com/FasterXML/jackson-dataformats-binary/issues/253 (cbor)

-+ Tatu +-

Tatu Saloranta

Mar 15, 2021, 3:04:14 PM
to jackson-user
On Wed, Mar 10, 2021 at 8:42 PM Tatu Saloranta <ta...@fasterxml.com> wrote:
>
> On Wed, Mar 10, 2021 at 5:09 PM 'Kireet Reddy' via jackson-user
> <jackso...@googlegroups.com> wrote:
> >
> > This explanation would make a lot of sense. I tested with some of our data where the keys are fixed, so it would explain why our benchmarks showed the expected speedup, but not all of our data did in production. Also you can ignore my time of day comment, I think we happened to have a code deployment around that time which would better explain the temporary slowdown.
> >
> > Thanks so much for your help and being such a great steward of this project!
>
> You are welcome! This is an interesting problem to tackle (Smile and
> CBOR backends are something I like working on more than many other
> components), and I created 2 issues wrt ability (or lack thereof) to
> turn off canonicalization:
>
> * https://github.com/FasterXML/jackson-dataformats-binary/issues/252 (smile)
> * https://github.com/FasterXML/jackson-dataformats-binary/issues/253 (cbor)

Quick update here: I implemented both for 2.13(.0-SNAPSHOT), so
canonicalization can be turned off for both Smile and CBOR in
Jackson 2.13. There is some performance overhead for common use cases,
but it may be worthwhile for specific kinds of data.

Interestingly enough, 3.0.0-SNAPSHOT shows very little performance
degradation for the POJO use case: if we ever get there, this will be
a pretty useful setting -- one can effectively have BOTH stellar
performance for bounded key sets (POJOs) AND no overhead for unbounded
cases (Map<UUID, POJO> values....)!

-+ Tatu +-