Parsing 70k diff jsons per second, looking for optimal performance


gangum

Oct 30, 2014, 6:23:59 AM10/30/14
to jackso...@googlegroups.com

Hi,
I want to parse JSON strings of about 5986 bytes each, and I need the parsing to be really fast: a rate of at least 70k transactions per second. That means 70k strings of ~5986 bytes parsed per second, and each of those 70k documents per second would be different.

All my hopes are on Jackson, as it is the more mature library. Is there a way I can achieve this with Jackson? I tried ObjectMapper, but it takes a lot of time to parse the string.

Please let me know if what I'm trying to achieve is not feasible, or if there is a different way to do it.

The machine configuration I tested on was an Intel Core i5 at 2.5 GHz with 8 GB RAM.

The very basic code that I ran is below:

long startTime = System.currentTimeMillis();

JsonFactory factory = new JsonFactory();
ObjectMapper mapper = new ObjectMapper(factory);
JsonNode root = null;

for (int i = 0; i < 70000; i++) {
    try {
        root = mapper.readTree(jsonString);
    } catch (IOException e) { // also covers JsonProcessingException
        e.printStackTrace();
    }
}

long endTime = System.currentTimeMillis();
System.out.println("jackson processing --- " + (endTime - startTime));


Tatu Saloranta

Oct 30, 2014, 12:22:52 PM10/30/14
to jackso...@googlegroups.com
On Thu, Oct 30, 2014 at 10:23 AM, gangum <gsha...@gmail.com> wrote:

Hi,
I want to parse JSON strings of about 5986 bytes each, and I need the parsing to be really fast: a rate of at least 70k transactions per second. That means 70k strings of ~5986 bytes parsed per second, and each of those 70k documents per second would be different.


First thing would be to calculate the approximate amount of data: with 70,000 JSON documents of about 6 kB each, you would need to process about 420 megabytes of data per second. Without multi-threaded processing, that seems quite ambitious.

But that just means you will need multi-threaded handling; most modern processors have multiple cores.
With a single core you can still get processing speeds of at least 100 MB/sec, maybe more, depending on many things. There is some per-document overhead, but 6 kB documents are not particularly small.

But... are you sure you can actually get data at that rate? Networks typically cannot push data through that fast. Locally attached SSDs might.

Anyway, here's how to optimize your handling for the highest possible rate.
 

All my hopes are on Jackson, as it is the more mature library. Is there a way I can achieve this with Jackson? I tried ObjectMapper, but it takes a lot of time to parse the string.


First thing to consider: if your input comes from a stream, it is best NOT to convert it into a Java String; that is extra overhead. If the input comes as a String for some reason (not sure why; someone has to convert from bytes to String), passing the String is fine (i.e. do not convert it to anything else).
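As a minimal sketch of the difference (assuming Jackson 2.x on the classpath; the field name "temp" is just an illustration, not from the original post):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BytesVsString {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Parse directly from bytes: Jackson decodes UTF-8 itself,
    // skipping the extra byte[] -> String conversion pass.
    static JsonNode parseBytes(byte[] rawJson) throws IOException {
        return MAPPER.readTree(rawJson);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "{\"temp\":30}".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseBytes(raw).get("temp").asInt()); // prints 30
    }
}
```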
 

Please let me know if what I'm trying to achieve is not feasible or there is a different way to do it.

The machine configuration I tested on was an Intel Core i5 at 2.5 GHz with 8 GB RAM.

the very basic code that I ran is as below -

long startTime = System.currentTimeMillis();

JsonFactory factory = new JsonFactory();
ObjectMapper mapper = new ObjectMapper(factory);
JsonNode root = null;

for (int i = 0; i < 70000; i++) {
    try {
        root = mapper.readTree(jsonString);
    } catch (IOException e) { // also covers JsonProcessingException
        e.printStackTrace();
    }
}

long endTime = System.currentTimeMillis();
System.out.println("jackson processing --- " + (endTime - startTime));



Remember that running this test just once tells you very little: the JVM needs time to load classes and to compile bytecode with HotSpot.
You absolutely must run the test for multiple seconds before taking any measurements. So the numbers you get above are much lower than eventual steady-state performance.
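As a rough illustration (not a substitute for a proper harness such as jmh), a warm-up before timing might look like the following sketch; the round counts and sample document are arbitrary:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

public class WarmupTiming {
    // Parse the same document `reps` times and return elapsed milliseconds.
    static long timeReadTree(ObjectMapper mapper, String json, int reps) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            mapper.readTree(json); // result discarded on purpose
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        String json = "{\"temp\":30,\"city\":\"Pune\"}";
        // untimed warm-up rounds so HotSpot can compile the hot paths
        for (int round = 0; round < 5; round++) {
            timeReadTree(mapper, json, 70_000);
        }
        // only now take a measurement worth looking at
        System.out.println("warmed-up ms: " + timeReadTree(mapper, json, 70_000));
    }
}
```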

But as to usage: if you have actual object types to bind to, it is more efficient to use data-binding to POJOs instead of JsonNode. JsonNode instances take more memory than POJOs (they are backed by basic HashMaps and ArrayLists, which are less compact than POJOs).
So, you'd have something like:

  value = mapper.readValue(jsonString, MyValue.class);

Second: assuming you know the type, you can create an ObjectReader for that type first, outside the loop:

ObjectReader reader = mapper.reader(MyValue.class);

and then read using

  value = reader.readValue(jsonString);

which is a bit faster, as it omits the deserializer lookup.

You can also configure ObjectMapper to use static (declared) types, which may help, depending on POJO structure:

  mapper.enable(MapperFeature.USE_STATIC_TYPING);

After doing all of the above, and ensuring your test setup correctly warms up the system, you should be close to optimal speed for data-binding.

But there is one more thing that can help: the Afterburner module:

   https://github.com/FasterXML/jackson-module-afterburner

which you can enable by:

   mapper.registerModule(new AfterburnerModule());

and it will use byte-code generation to optimize some Reflection access; the speed-up for POJO reading is typically in the 20-30% range.
It will not, however, help with JsonNode or streaming parsing.

After doing all of the above, you should be close to optimal data-binding performance.
If you still want more speed, you will need to use the Streaming API: construct a JsonParser directly, iterate over JSON tokens, and construct your objects in Java code. Remember NOT to use 'parser.readValueAs()' (which would invoke data-binding).
Using the Streaming API could give maybe 10-20% faster performance if you write it optimally; naive usage could actually be slower than databind. :)
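A minimal sketch of that token-level approach might look like this (assuming Jackson 2.x; the field names "temp" and "city" are hypothetical, not from the original post):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.IOException;

public class StreamingSketch {
    private static final JsonFactory FACTORY = new JsonFactory();

    // Walk tokens directly and pull out one field, building no tree
    // and no POJO. Note: this simple loop assumes a flat object; it
    // stops at the first END_OBJECT it sees.
    static int readTemp(String json) throws IOException {
        int temp = -1;
        try (JsonParser p = FACTORY.createParser(json)) {
            while (p.nextToken() != JsonToken.END_OBJECT) {
                if (p.getCurrentToken() == JsonToken.FIELD_NAME
                        && "temp".equals(p.getCurrentName())) {
                    p.nextToken(); // advance to the field's value
                    temp = p.getIntValue();
                }
            }
        }
        return temp;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTemp("{\"temp\":30,\"city\":\"Pune\"}")); // prints 30
    }
}
```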

I hope this helps,

-+ Tatu +-

 


--
You received this message because you are subscribed to the Google Groups "jackson-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jackson-user...@googlegroups.com.
To post to this group, send email to jackso...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

gangum

Oct 30, 2014, 3:35:10 PM10/30/14
to jackso...@googlegroups.com
Hi Tatu,
thanks for the extensive reply. This really helps.

However, the problem I face is that I do not have POJOs for this JSON. I simply get the JSON string from an API. As for your question about the generation of the strings, that is taken care of by the API. In short, the API fires every second and sends me 70k JSON strings with different values, so I have to parse each and every one and fetch the values.
It's like sitting at the server for a bunch of very busy stores across the continent on Thanksgiving Day: every time an item is billed, I get the JSON and have to process it.

So it seems the only option left for me is to write the code using the Streaming API, right?

And yes, I'm using a multicore system, and the strings would be sent to different threads for processing. This is just for testing purposes.

Tatu Saloranta

Oct 31, 2014, 12:45:30 AM10/31/14
to jackso...@googlegroups.com
On Thu, Oct 30, 2014 at 12:35 PM, gangum <gsha...@gmail.com> wrote:
Hi Tatu,
thanks for the extensive reply. This really helps.

However, the problem I face is that I do not have POJOs for this JSON. I simply get the JSON string from an API. As for your question about the generation of the strings, that is taken care of by the API. In short, the API fires every second and sends me 70k JSON strings with different values, so I have to parse each and every one and fetch the values.
It's like sitting at the server for a bunch of very busy stores across the continent on Thanksgiving Day: every time an item is billed, I get the JSON and have to process it.

So it seems the only option left for me is to write the code using the Streaming API, right?

If it truly is arbitrarily structured JSON, yes. But what are you doing with it? If you don't know the structure, how do you find the pieces?

Yes, the Streaming API may then make sense, as it avoids building an intermediate tree representation. Depending on how you extract the data, it may not be much more difficult to use; and you can efficiently skip the parts you are not interested in.
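For example, a sketch of skipping uninteresting sub-trees with JsonParser.skipChildren() (the "details" and "temp" field names are invented for illustration):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.IOException;

public class SkipSketch {
    // Extract one top-level field, skipping whole nested sub-trees
    // we don't care about instead of tokenizing into objects.
    static int readTemp(String json) throws IOException {
        int temp = -1;
        try (JsonParser p = new JsonFactory().createParser(json)) {
            while (p.nextToken() != JsonToken.END_OBJECT) {
                if (p.getCurrentToken() == JsonToken.FIELD_NAME) {
                    String name = p.getCurrentName();
                    p.nextToken(); // move to the field's value
                    if ("temp".equals(name)) {
                        temp = p.getIntValue();
                    } else {
                        // fast-forwards past a nested object or array
                        // without materializing anything
                        p.skipChildren();
                    }
                }
            }
        }
        return temp;
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"details\":{\"a\":[1,2,3],\"b\":{\"c\":4}},\"temp\":50}";
        System.out.println(readTemp(json)); // prints 50
    }
}
```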

You may still want to see if the API could just pass byte[] instead of constructing Strings, outside of test cases.
It can improve overall throughput. It's not necessarily a big thing, but String decoding and encoding take up a big portion of the CPU time that JSON and XML parsers use (or, in this case, the API that produces the data).
I realize there are cases where you just get a String, e.g. if the data comes in as query parameters. But it's worth checking just in case.
 
And yes, I'm using a multicore system, and the strings would be sent to different threads for processing. This is just for testing purposes.

Ok.

-+ Tatu +-

 




gangum

Nov 2, 2014, 6:29:53 AM11/2/14
to jackso...@googlegroups.com
No, the structure of the strings remains the same; only the values vary. For example, for the variable temp the value in the first JSON was 30 and in the next 50.




Tatu Saloranta

Nov 2, 2014, 3:34:06 PM11/2/14
to jackso...@googlegroups.com
But then... why not use POJOs constructed to match the structure?
That's a common way to use data-binding: creating classes that are mostly there to bind JSON data into something more convenient to access than untyped JSON trees. Conversely, trees are good when there just isn't a stable structure to bind from and extra logic is needed for traversal.
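Since the structure is stable, a POJO mirroring it could look like the following sketch (the Reading class and its "temp"/"city" fields are invented for illustration, not from the actual data):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import java.io.IOException;

public class PojoSketch {
    // hypothetical POJO mirroring the (stable) document structure
    public static class Reading {
        public int temp;
        public String city;
    }

    // reusable, pre-resolved reader, created once outside any loop
    static final ObjectReader READER = new ObjectMapper().reader(Reading.class);

    public static void main(String[] args) throws IOException {
        Reading r = READER.readValue("{\"temp\":50,\"city\":\"Pune\"}");
        System.out.println(r.temp + " " + r.city); // prints "50 Pune"
    }
}
```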

-+ Tatu +-

gangum

Nov 2, 2014, 4:09:12 PM11/2/14
to jackso...@googlegroups.com
Tatu, first of all, thanks for your interest in my post; this is really helpful :)
Well, it's just that I'm short on time for delivery, and parsing the JSON string is the only option I have right now. Another thing that adds to the complexity is that this JSON is created by a utility that generates Apache Avro-compliant JSON. The Streaming API seems a bit of a pain to write, as the nesting involved in my JSON is very deep, and I was planning to avoid that effort, but I guess that's what I have to do for now. Maybe later I'll get hold of the Avro-generated classes, and that could make my life easier :)

Still, where can I find the benchmarking classes? I'm sure there must be some performance tests for strings and bytes of various sizes. What was the optimal performance using ObjectMapper for strings? I'm really curious whether what I'm trying to achieve is possible, because I tested other APIs as well and none of them were able to give the TPS I need. Not sure if I'm headed in the right direction... :(

Tatu Saloranta

Nov 2, 2014, 11:42:32 PM11/2/14
to jackso...@googlegroups.com
On Sun, Nov 2, 2014 at 9:09 PM, gangum <gsha...@gmail.com> wrote:
Tatu, first of all, thanks for your interest in my post; this is really helpful :)
Well, it's just that I'm short on time for delivery, and parsing the JSON string is the only option I have right now. Another thing that adds to the complexity is that this JSON is created by a utility that generates Apache Avro-compliant JSON. The Streaming

Ouch. The output of Avro/JSON is horrible. I don't know what its authors were smoking. :-(
It is a good example of odd binding to JSON, with problems similar to those of XML-to-JSON tools: adding unnecessary metadata, and complicating otherwise simple and natural binding of data with additional layers that are meant to be helpful yet end up being anything but. Apologies for the rant, but I just wish there were some kind of feedback loop so that developers (of such mapping schemes) learn to avoid these anti-patterns in the future.
 
But it does explain why it would be painful to try to use data-binding.

API seems a bit of a pain to write, as the nesting involved in my JSON is very deep, and I was planning to avoid that effort, but I guess that's what I have to do for now. Maybe later I'll get hold of the Avro-generated classes, and that could make my life easier :)

That, or perhaps try to get the structure changed. If you need to process 70k events per second, you will need to optimize many things, and it would make sense to also use a structure that is not unnecessarily inefficient, as Avro-JSON is (due to its odd twist of using wrapper names to indicate type, which is unnecessary).

Still, where can I find the benchmarking classes? I'm sure there must be some performance tests for strings and bytes of various sizes. What was the optimal performance using ObjectMapper for strings? I'm really curious whether what I'm trying to achieve is possible, because I tested other APIs as well and none of them were able to give the TPS I need. Not sure if I'm headed in the right direction... :(

One possibility is this project:

https://github.com/FasterXML/jackson-benchmarks

which uses a POJO similar to the one used by an earlier general benchmark:

https://github.com/eishay/jvm-serializers

But you'd probably want to use a modified version of the jackson-benchmarks classes, to use the objects you have; the default one has a data size of something like 400 bytes of JSON.
One good thing is that jackson-benchmarks uses the `jmh` benchmark tool, which produces pretty good (stable, relevant) results.

Also: there are tests for both POJO data-binding and tree-based handling (JsonNode), so you can see whether it makes a significant difference (assuming you can model POJOs). At the very least, it would give some perspective on achievable performance.

I hope this helps!

-+ Tatu +-

ps. I think your results are of interest not just to me and the Jackson team (we are always interested in real-world performance observed with Jackson), but also to other members of the Jackson community.