High object churn in serialization/deserialization process


Pawel Veselov

Jan 17, 2013, 12:08:26 AM
to google-gson
Hello.

First of all, sorry for the long introduction, but I believe it may be relevant.

We have a J2EE application that does a lot of JSON processing, both serializing and deserializing JSON objects. We use version 2.2.3 of google-gson for this. Each application node may process a high number of requests (up to 1000/second). We had no problems with any of this until today.

Actually, I don't quite know what happened, but the symptoms were that garbage collection stalled and the JVM was effectively unresponsive (though no OOMs were thrown). I've opened a separate thread about this on SO: http://stackoverflow.com/questions/14370738/java-heap-overwhelmed-by-unreachable-objects. After a restart, the VM would reach the same state within minutes.

Analyzing a heap dump taken from one of the VMs after it had reached that state, using Eclipse Memory Analyzer (MAT), I could see that about 1.9 GB of the heap was occupied by unreachable objects. While I'm dealing with that fact separately, the contents of this pile of unreachable objects are concerning on their own. The top occupants (quantity / space / class):

25M/804M/com.google.gson.internal.StringMap$LinkedEntry
19M/206M/com.google.gson.JsonPrimitive
5M/219M/com.google.gson.internal.StringMap
5M/199M/com.google.gson.internal.StringMap$LinkedEntry[]
9M/141M/java.lang.Integer
4M/73M/com.google.gson.JsonObject

The list goes on; there are a lot of reflection-related instances, etc. Further down, the first entry for my own bean class shows 11K objects. For this particular class, I estimate that at most 30 instances are created per request, so the dump covers at least ~370 requests (11K / 30). That tells me that roughly 70K StringMap$LinkedEntry instances (for example) are created per request (25M / ~370).

Unfortunately, I didn't include unreachable objects in the analysis, so I only have the list, not their origins.

All my Gson instances are static (I've made that mistake before).
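A minimal sketch of the shared-instance pattern, assuming a hypothetical holder class named `JsonSupport`. Gson's documentation states that `Gson` instances are thread-safe, so one static instance can serve all request threads; creating a new `Gson` per request would repeat its setup and throw away the type adapters it has already built.

```java
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

// Hypothetical holder class: one Gson instance, created once, shared by all threads.
final class JsonSupport {
    // Gson is documented as thread-safe, so a single static instance can be reused.
    static final Gson GSON = new GsonBuilder().create();

    private JsonSupport() {}

    static String toJson(Object value) {
        // Reusing GSON also reuses the type adapters it has already built and cached.
        return GSON.toJson(value);
    }
}
```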

Here is what I've done as a test (attached). I took sample JSON, like the JSON we produce in production, deserialized it into a map-based object, and serialized it back. I ran this operation on a single VM, only once, and took a heap dump afterwards. I then used MAT again to list the unreachable objects. Here is what I see after that single run (number of instances / class):

3,844 com.google.gson.internal.StringMap$LinkedEntry
2,924 com.google.gson.JsonPrimitive
1,038 com.google.gson.internal.StringMap$LinkedEntry[]
710 com.google.gson.internal.StringMap

These numbers seem quite high, even for an object of this complexity.
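The attached GTest.java is not reproduced here. The following is a hypothetical reconstruction of the kind of round trip described above (JSON text to a map-based object and back), assuming Gson's default handling of `Map<String, Object>`; the class name `RoundTrip` is illustrative only.

```java
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import java.lang.reflect.Type;
import java.util.Map;

// Hypothetical reconstruction of the round-trip test; not the attached GTest.java.
final class RoundTrip {
    private static final Gson GSON = new Gson();

    // Target type for a map-based representation of an arbitrary JSON object.
    private static final Type MAP_TYPE = new TypeToken<Map<String, Object>>() {}.getType();

    static String roundTrip(String json) {
        // JSON -> map-based object (nested objects become maps, arrays become lists).
        Map<String, Object> tree = GSON.fromJson(json, MAP_TYPE);
        // Map-based object -> JSON again.
        return GSON.toJson(tree);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("{\"name\":\"x\",\"tags\":[\"a\",\"b\"],\"nested\":{\"ok\":true}}"));
    }
}
```

Note that with this default mapping, JSON numbers come back as `Double`, so an input value of `1` is written back as `1.0`.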

Thank you,
  Pawel.

GTest.java.gz

Pawel

Jan 30, 2013, 9:39:41 PM
to googl...@googlegroups.com

Hi.

To follow up on this:

The problem really lies in how JSON serializers are used. I haven't looked into the deserialization part (our application mostly serializes). When custom serializers are used, and those serializers serialize a piece of the sub-tree through the JSON serialization context, a tree representation of that piece is created and then written out to the top-level JSON writer. If enough objects down some path of the tree have custom serialization, a lot of those trees may end up being created. Navigating each tree also creates many temporary Entry objects (and arrays of them). This really drives memory consumption up, even though all of these objects are eventually garbage-collected. I've seen millions of objects created while serializing certain object trees (granted, they had about 4000 terminal leaves), because on certain paths to those leaves custom serialization was invoked a few times.
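A minimal sketch of the tree-building pattern described above, using hypothetical `Order`/`Line` classes. A `JsonSerializer` that hands a sub-object to `JsonSerializationContext.serialize` gets back a `JsonElement`, so each such call materializes an intermediate tree (`JsonObject`, `JsonPrimitive` and the map entries behind them) before Gson walks that tree again to write it out. Nesting several such serializers along one path repeats this at every level. Only serialization is shown.

```java
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonSerializationContext;
import com.google.gson.JsonSerializer;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical domain classes used only for illustration.
class Line {
    final String sku;
    final int qty;
    Line(String sku, int qty) { this.sku = sku; this.qty = qty; }
}

class Order {
    final String id;
    final List<Line> lines;
    Order(String id, List<Line> lines) { this.id = id; this.lines = lines; }
}

// Tree-building serializer: every call returns a freshly allocated JsonElement tree.
class OrderSerializer implements JsonSerializer<Order> {
    @Override
    public JsonElement serialize(Order src, Type typeOfSrc, JsonSerializationContext context) {
        JsonObject obj = new JsonObject();
        obj.addProperty("id", src.id);
        // context.serialize() builds a whole JsonElement tree for the sub-list
        // (invoking any other registered serializers for its elements).
        obj.add("lines", context.serialize(src.lines));
        return obj; // Gson then walks this tree again to write it to the output.
    }
}

final class TreeDemo {
    static final Gson GSON = new GsonBuilder()
            .registerTypeAdapter(Order.class, new OrderSerializer())
            .create();

    static String write(Order order) {
        return GSON.toJson(order);
    }
}
```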

So, to fix all that, I converted all my custom deserializers into type adapters or type adapter factories. When I need to delegate serialization of an object, I simply chain to the appropriate adapter, retrieved from the current Gson instance.
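A minimal sketch of that approach, using hypothetical `Cart`/`Item` classes and a hypothetical `CartAdapterFactory`. The factory receives the `Gson` instance in `create()`, looks up the adapter for the nested type once via `gson.getAdapter(...)`, and the returned `TypeAdapter` streams straight to the `JsonWriter`, so no intermediate `JsonElement` tree is built. To keep it short, `read()` is left unsupported.

```java
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.TypeAdapter;
import com.google.gson.TypeAdapterFactory;
import com.google.gson.reflect.TypeToken;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;
import java.io.IOException;
import java.util.List;

// Hypothetical domain classes used only for illustration.
class Item {
    final String sku;
    final int qty;
    Item(String sku, int qty) { this.sku = sku; this.qty = qty; }
}

class Cart {
    final String id;
    final List<Item> items;
    Cart(String id, List<Item> items) { this.id = id; this.items = items; }
}

class CartAdapterFactory implements TypeAdapterFactory {
    @Override
    @SuppressWarnings("unchecked")
    public <T> TypeAdapter<T> create(Gson gson, TypeToken<T> type) {
        if (type.getRawType() != Cart.class) {
            return null; // not our type: let Gson try the next factory
        }
        // Looked up once, when Gson first builds the adapter for Cart.
        final TypeAdapter<List<Item>> itemsAdapter =
                gson.getAdapter(new TypeToken<List<Item>>() {});
        TypeAdapter<Cart> cartAdapter = new TypeAdapter<Cart>() {
            @Override
            public void write(JsonWriter out, Cart cart) throws IOException {
                if (cart == null) {
                    out.nullValue();
                    return;
                }
                out.beginObject();
                out.name("id").value(cart.id);
                out.name("items");
                itemsAdapter.write(out, cart.items); // chain to Gson's adapter, streaming
                out.endObject();
            }

            @Override
            public Cart read(JsonReader in) throws IOException {
                throw new UnsupportedOperationException("write-only sketch");
            }
        };
        return (TypeAdapter<T>) cartAdapter;
    }
}

final class StreamDemo {
    static final Gson GSON = new GsonBuilder()
            .registerTypeAdapterFactory(new CartAdapterFactory())
            .create();
}
```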

There are a few things I encountered along the way that I wanted to point out:

1) For deserialization, the TypeAdapter.read() API doesn't say which type of object should be deserialized. For hierarchy adapters this becomes a problem, since I don't know what type to instantiate, so an adapter factory has to be used instead to make the type available to the adapter.

2) Some of the serializers could be replaced with type adapters, but some had to be replaced with factories. Since the builder API takes a parameter of type Object when registering adapters, it was tedious to find all the places where factories were now being used (there were no compilation errors).

3) The adapter API doesn't provide a reference to the current Gson instance, so if one is needed, a factory has to be used instead, just to make that reference available to the adapter.
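A small check illustrating point 2, assuming the argument check that `GsonBuilder.registerTypeAdapter(Type, Object)` performs in Gson 2.x. Because the second parameter is declared as `Object`, passing a `TypeAdapterFactory` there compiles, and the mistake only surfaces as an `IllegalArgumentException` when the builder method runs; factories must go through `registerTypeAdapterFactory` instead. `NoopFactory` and `RegistrationCheck` are hypothetical names.

```java
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.TypeAdapter;
import com.google.gson.TypeAdapterFactory;
import com.google.gson.reflect.TypeToken;

// Hypothetical placeholder factory that never handles any type.
class NoopFactory implements TypeAdapterFactory {
    @Override
    public <T> TypeAdapter<T> create(Gson gson, TypeToken<T> type) {
        return null; // returning null defers to the other factories
    }
}

final class RegistrationCheck {
    // Returns true if the builder rejected the factory at registration time.
    static boolean wrongRegistrationRejected() {
        try {
            // Compiles, because the adapter parameter is declared as Object.
            new GsonBuilder().registerTypeAdapter(String.class, new NoopFactory());
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    static Gson correctRegistration() {
        // Factories are registered through the dedicated method.
        return new GsonBuilder().registerTypeAdapterFactory(new NoopFactory()).create();
    }
}
```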

Thank you,
  Pawel.

Inderjeet Singh

Jan 31, 2013, 3:49:47 PM
to googl...@googlegroups.com
Hi Pawel,

The adapter API doesn't provide a reference to the current Gson instance. That is intentional and is designed this way to maximize performance. The goal with a TypeAdapter is to do all the reflection once, and only once. The Type field is never looked at again. If you really need to look at the Type at run-time, you should be writing a TypeAdapterFactory, which is what you ended up doing.
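A minimal sketch illustrating this point, with hypothetical names (`Tag`, `CountingTagFactory`, `CacheDemo`). Per-type setup belongs in `TypeAdapterFactory.create()`; because a `Gson` instance caches the adapter a factory returns for a given type, `create()` runs once per `Gson` instance and type, not once per call. The counter exists only to make that observable.

```java
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.TypeAdapter;
import com.google.gson.TypeAdapterFactory;
import com.google.gson.reflect.TypeToken;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical value type used only for illustration.
class Tag {
    final String name;
    Tag(String name) { this.name = name; }
}

class CountingTagFactory implements TypeAdapterFactory {
    // Counts how often Gson asks this factory to build an adapter for Tag.
    final AtomicInteger created = new AtomicInteger();

    @Override
    @SuppressWarnings("unchecked")
    public <T> TypeAdapter<T> create(Gson gson, TypeToken<T> type) {
        if (type.getRawType() != Tag.class) {
            return null;
        }
        created.incrementAndGet(); // per-type setup (reflection, lookups) would go here
        TypeAdapter<Tag> adapter = new TypeAdapter<Tag>() {
            @Override
            public void write(JsonWriter out, Tag value) throws IOException {
                out.value(value.name);
            }

            @Override
            public Tag read(JsonReader in) throws IOException {
                return new Tag(in.nextString());
            }
        };
        return (TypeAdapter<T>) adapter.nullSafe();
    }
}

final class CacheDemo {
    static int createCallsAfterTwoWrites() {
        CountingTagFactory factory = new CountingTagFactory();
        Gson gson = new GsonBuilder().registerTypeAdapterFactory(factory).create();
        gson.toJson(new Tag("a"));
        gson.toJson(new Tag("b")); // served from Gson's adapter cache
        return factory.created.get();
    }
}
```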

HTH
Inder
----

Pawel Veselov

Jan 31, 2013, 4:38:47 PM
to googl...@googlegroups.com
Hi Inder.

It may be reasonable to add, say, a 'void selected(Gson g, Type p)' method to TypeAdapter. It would be called only once, when the adapter is picked for a specific type (in Gson.getAdapter(Type)), purely to reduce the amount of boilerplate code.
 

 
 



--
With best of best regards
Pawel S. Veselov

Inderjeet Singh

Feb 7, 2013, 12:52:34 PM
to googl...@googlegroups.com
Can you elaborate on the proposal? What would the adapter typically do in the selected() method?
A code snippet would be helpful as well.

Thanks
Inder

Pawel

Feb 7, 2013, 2:56:34 PM
to googl...@googlegroups.com

Hi.

Here is an example that uses the TypeToken:

Let's say I have a type adapter that serializes and deserializes objects as strings. Deserialization requires a public constructor with a (String) signature.

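import com.google.gson.Gson;
import com.google.gson.TypeAdapter;
import com.google.gson.reflect.TypeToken;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;
import java.io.IOException;
import java.lang.reflect.Constructor;

public class AsStringTA extends TypeAdapter<Object> {

  // Set by the proposed selected() hook: the public (String) constructor of the target type.
  private Constructor<?> c;

  @Override
  public void write(JsonWriter out, Object value) throws IOException {
    out.value(value.toString());
  }

  @Override
  public Object read(JsonReader reader) throws IOException {
    String s = reader.nextString();
    try {
      return c.newInstance(s);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  // Proposed hook, not part of Gson's TypeAdapter API: Gson would call it once,
  // from Gson.getAdapter(), when this adapter is selected for a type.
  // (Until TypeAdapter declares such a method, it cannot carry @Override.)
  public void selected(Gson gson, TypeToken<Object> tt) {
    try {
      c = tt.getRawType().getConstructor(String.class);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

}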

An example where the Gson instance is used is more obvious: a type adapter that calls Gson.getAdapter() to delegate serialization or deserialization of parts of an object.

The "selected()" method is really used in lieu of the constructor here, because we have to register an instance of a type adapter, not a class (if a class were passed, a constructor with the same arguments could have been mandated instead).

I understand that the above may be too dangerous and ambiguous (and therefore moot): it cannot be used for hierarchy type adapters, since selected() would be called multiple times on the same adapter with different type tokens, so it is simply safer to require type adapter factories instead. Even then, passing the Gson instance to the adapter wouldn't hurt, even if it were done multiple times.
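A minimal sketch of how the AsStringTA behavior above can be expressed with the existing API, following Inder's recommendation to use a TypeAdapterFactory: the constructor lookup moves into `create()`, where both the `Gson` instance and the `TypeToken` are available. `AsStringFactory` and its set-of-types parameter are hypothetical; the sketch assumes every listed type has a public `(String)` constructor and a `toString()` that yields the matching string.

```java
import com.google.gson.Gson;
import com.google.gson.TypeAdapter;
import com.google.gson.TypeAdapterFactory;
import com.google.gson.reflect.TypeToken;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.util.Set;

// Hypothetical factory: writes the listed types via toString() and
// reads them back through a public (String) constructor.
class AsStringFactory implements TypeAdapterFactory {
    private final Set<Class<?>> types;

    AsStringFactory(Set<Class<?>> types) { this.types = types; }

    @Override
    public <T> TypeAdapter<T> create(Gson gson, TypeToken<T> type) {
        final Class<? super T> raw = type.getRawType();
        if (!types.contains(raw)) {
            return null; // not one of ours: let Gson try the next factory
        }
        final Constructor<? super T> ctor;
        try {
            // Reflection happens once per type, when Gson first asks for the adapter.
            ctor = raw.getConstructor(String.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(raw + " has no public (String) constructor", e);
        }
        TypeAdapter<T> adapter = new TypeAdapter<T>() {
            @Override
            public void write(JsonWriter out, T value) throws IOException {
                out.value(value.toString());
            }

            @Override
            @SuppressWarnings("unchecked")
            public T read(JsonReader in) throws IOException {
                try {
                    return (T) ctor.newInstance(in.nextString());
                } catch (ReflectiveOperationException e) {
                    throw new IOException("cannot construct " + raw, e);
                }
            }
        };
        return adapter.nullSafe(); // handles JSON null in both directions
    }
}
```

Registered with `registerTypeAdapterFactory`, this covers each listed type with its own adapter instance, so the ambiguity of calling one hook repeatedly on a shared adapter does not arise.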

Thank you,
  Pawel.