Is there a way to limit the depth of the object graph that gets serialized?

Page bloom

Apr 4, 2018, 2:22:00 PM

to yamlbeans-users

We have a massive object graph, and that works fine due to the normal 'lazy load' feature in our ORM (Datanucleus), which works the same way as lazy loading in most other ORMs (e.g. Hibernate).

The app only loads objects in the graph as we navigate along relationships in the Java code. Usually only 10-20 objects are 'reached' (and therefore loaded from the DB or L2 cache), and in the worst case never more than a couple of thousand objects are 'reached' and loaded.

I am trying to serialize just a subset of the object graph, limited to a small subset of classes. However, one of those classes has a reference to an owner object, and the owner's class is part of a much larger chunk of the object graph that 'reaches' other classes with literally millions of instances in the database.

As YAMLBeans serializes, it uses reflection to discover every relationship of each object and then proceeds to load the related objects. That is normally a great feature, but in this case it ends up attempting to load many millions of objects from the DB into memory via the ORM, which obviously doesn't end well ;)

Without making any changes to our model, is it possible to instruct YAMLBeans not to follow a particular relationship in a given class when serializing?

I was wondering if there is (or could be) a YAMLBeans mechanism for specifying a class and relationship that YAMLBeans should not navigate past during serialization. It would be like "short circuiting", or putting a boundary at a point in the object graph that limits the scope of the serialization process and avoids the "serializing the entire world" problem.
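To make the idea concrete, here is a minimal sketch of such a boundary using plain Java reflection (Java 11+ for `String.repeat`). It is not a YamlBeans feature: `BoundedYamlDump`, `stopAt` and the model classes `Item`, `Owner` and `Catalog` are hypothetical. A relationship is named by class plus field name, and the walker checks that set before reading the field, so the object on the far side is never touched (which is what would keep a lazy-loading ORM from fetching it). It only handles fields declared on the runtime class, treats JDK types as scalars, and skips cycles; collections and arrays are out of scope.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch, not a YamlBeans API: a reflective walker that emits
// YAML-like text but never follows a (class, field) pair registered via stopAt().
public class BoundedYamlDump {
    private final Set<String> boundaries = new HashSet<>();

    // Register a relationship that must not be navigated, e.g. Owner#catalog.
    public BoundedYamlDump stopAt(Class<?> type, String fieldName) {
        boundaries.add(type.getName() + "#" + fieldName);
        return this;
    }

    public String dump(Object root) {
        StringBuilder out = new StringBuilder();
        // Identity-based set of objects on the current path, so cycles cannot recurse forever.
        Set<Object> path = Collections.newSetFromMap(new IdentityHashMap<>());
        path.add(root);
        write(root, 0, out, path);
        return out.toString();
    }

    private void write(Object obj, int depth, StringBuilder out, Set<Object> path) {
        String indent = "  ".repeat(depth);
        // Only fields declared on the runtime class itself; inherited fields are ignored here.
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            // Boundary check happens BEFORE the field is read, so the related object is never touched.
            if (boundaries.contains(obj.getClass().getName() + "#" + f.getName())) continue;
            Object value;
            try {
                f.setAccessible(true);
                value = f.get(obj);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value == null || value.getClass().getName().startsWith("java.")) {
                // null and JDK types (String, Integer, ...) are written as scalars via toString().
                out.append(indent).append(f.getName()).append(": ").append(value).append('\n');
            } else if (path.add(value)) {
                out.append(indent).append(f.getName()).append(":\n");
                write(value, depth + 1, out, path);
                path.remove(value);
            }
            // Otherwise the value is an ancestor on the current path (a cycle) and is skipped.
        }
    }

    // Hypothetical model: Item -> Owner -> Catalog, where Catalog stands for the "whole world".
    static class Catalog { String name = "everything"; }
    static class Owner { String name = "acme"; Catalog catalog = new Catalog(); }
    static class Item { String sku = "A-1"; int qty = 3; Owner owner = new Owner(); }

    public static void main(String[] args) {
        BoundedYamlDump dumper = new BoundedYamlDump().stopAt(Owner.class, "catalog");
        System.out.print(dumper.dump(new Item()));
    }
}
```

Running `main` writes `sku`, `qty` and the owner's `name`, but nothing under `catalog`. Matching on the exact runtime class keeps the sketch simple; a real version would probably also need to honour subclasses (including any proxy classes an ORM may generate).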

Aside: having experienced this 'issue', I did think that in another scenario, with enough RAM, it could be used as a 'feature' to migrate an app's entire object graph from its current DB to:

- a different RDBMS engine, e.g. MySQL -> Postgres
- the same RDBMS engine but a different schema, e.g. using the same class model but mapping to different table layouts, or to different table/column names, via changed ORM metadata
- a different type of datastore without changing any Java code, e.g. migrating from Datanucleus on an RDBMS to Datanucleus on a NoSQL datastore like Cassandra

or even from one schema to a radically different one (avoiding tricky SQL scripts).

Nate

Apr 4, 2018, 5:21:05 PM

to yamlbea...@googlegroups.com

YamlBeans serialization features aren't very customizable. I thought YAML was interesting when I started the project, but have since decided it's not really great. I don't even use YamlBeans in any of my projects. I suggest JsonBeans if using JSON is acceptable. With JsonBeans you can provide a serializer to customize what gets serialized. However, if you care about performance at all, I highly suggest Kryo, which does binary serialization. Kryo has more features and more serialization customization than JsonBeans, is fast, and has smaller output. I specifically recommend the kryo-5.0.0-dev branch, which has some API and other cleanup but is missing unsafe serialization support.

If you really want to use YamlBeans, I suggest contributing the ability to specify a serializer to customize output. Follow the JsonBeans approach and specify a YamlSerializer:
https://github.com/EsotericSoftware/jsonbeans/blob/master/src/com/esotericsoftware/jsonbeans/JsonSerializer.java

https://github.com/EsotericSoftware/jsonbeans/blob/master/src/com/esotericsoftware/jsonbeans/Json.java#L124-L128

Probably the YamlSerializer would receive an Emitter (and a Parser for reading) that it can use to customize output. Emitter is the lower-level layer that YamlWriter uses. The problem with this is that emitting YAML is complex; just look at what YamlWriter needs to do.

Without something like that, YamlBeans currently uses the Beans class and doesn't allow customization.
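A rough sketch of what such a per-type hook could look like, in the spirit of the JsonBeans `Json`/`JsonSerializer` files linked above. Everything here (`YamlSerializerSketch`, `YamlSerializer`, `setSerializer`, `scalar`, `object`, `writeValue`) is hypothetical and nothing like it exists in YamlBeans today; the string building stands in for the real Emitter, which, as noted, is the hard part. The custom `Owner` serializer is where the boundary lives: it writes only the owner's id and never reads `hugeGraph`. Requires Java 11+ for `String.repeat`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types or methods exist in YamlBeans.
public class YamlSerializerSketch {

    // Per-type hook: the serializer receives the writer and decides what to emit.
    interface YamlSerializer<T> {
        void write(YamlSerializerSketch yaml, T object);
    }

    private final Map<Class<?>, YamlSerializer<?>> serializers = new HashMap<>();
    private final StringBuilder out = new StringBuilder(); // stand-in for a real Emitter
    private int depth;

    public <T> void setSerializer(Class<T> type, YamlSerializer<T> serializer) {
        serializers.put(type, serializer);
    }

    // Emit "key: value" at the current indentation.
    public void scalar(String key, Object value) {
        out.append("  ".repeat(depth)).append(key).append(": ").append(value).append('\n');
    }

    // Emit "key:" and then the nested value one level deeper.
    public void object(String key, Object value) {
        out.append("  ".repeat(depth)).append(key).append(":\n");
        depth++;
        writeValue(value);
        depth--;
    }

    // Dispatch to the serializer registered for the value's exact class.
    @SuppressWarnings("unchecked")
    public void writeValue(Object value) {
        YamlSerializer<Object> s = (YamlSerializer<Object>) serializers.get(value.getClass());
        if (s == null) {
            throw new IllegalStateException("No serializer for " + value.getClass().getName());
        }
        s.write(this, value);
    }

    public String toYaml() {
        return out.toString();
    }

    // Hypothetical model: Owner sits in front of a huge part of the graph.
    static class Owner { long id = 42; Object hugeGraph = new Object(); }
    static class Item { String sku = "A-1"; Owner owner = new Owner(); }

    public static void main(String[] args) {
        YamlSerializerSketch yaml = new YamlSerializerSketch();
        yaml.setSerializer(Item.class, (y, item) -> {
            y.scalar("sku", item.sku);
            y.object("owner", item.owner);
        });
        // The boundary: an Owner is written as its id only; hugeGraph is never read.
        yaml.setSerializer(Owner.class, (y, owner) -> y.scalar("id", owner.id));
        yaml.writeValue(new Item());
        System.out.print(yaml.toYaml());
    }
}
```

This prints `sku: A-1`, then `owner:` with `id: 42` indented beneath it. In a real implementation, unregistered types would presumably fall back to the existing reflective Beans-based path; here they simply throw.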

Cheers,

-Nate

--
You received this message because you are subscribed to the "yamlbeans-users" group:
http://groups.google.com/group/yamlbeans-users

Page bloom

Apr 7, 2018, 9:11:53 PM

to yamlbea...@googlegroups.com

I'm not a big fan of JSON - only marginally less verbose than XML.

I like the way YAML displays hierarchical data (which most expressive "non trivial/boring" data is).

I am a big fan of binary serialization though. I created a binary serializer/deserializer for C++ back in the 90s called UFOS (Universal Format for Object Serialization), where the code to serialize/deserialize was generated from our visual classworks tool:

http://stepaheadsoftware.com/products/vcw/vcwf.htm

Ah, those were the days: multi-megabyte binary streams read and written at lightning-fast speed, because binary parsing is massively more efficient than text-based parsing. ints are ints; there's no need to parse the digits making up the number as a String and then call the text -> int converter methods.
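A small illustration of the "ints are ints" point using only `java.io`: the binary form of an int is a fixed 4 bytes read straight back with `readInt`, while the text form is a run of digit characters that has to go through `Integer.parseInt`. The class name `BinaryVsText` is hypothetical; this is a plain big-endian `DataOutputStream` encoding, not the format used by UFOS or Kryo, and it makes no timing claims.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BinaryVsText {
    // Binary: an int is always 4 bytes, copied out with no digit handling.
    static byte[] writeBinary(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return bytes.toByteArray();
    }

    static int readBinary(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Text: the number is stored as digit characters and must be parsed back.
    static byte[] writeText(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    static int readText(byte[] data) {
        return Integer.parseInt(new String(data, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        int n = 1_234_567_890;
        System.out.println(writeBinary(n).length + " bytes binary, " + writeText(n).length + " bytes text");
        System.out.println(readBinary(writeBinary(n)) == n && readText(writeText(n)) == n);
    }
}
```

`main` prints `4 bytes binary, 10 bytes text` followed by `true`.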

I ended up using SnakeYaml (I couldn't understand why people would use the Jackson YAML parser, which is just built on top of SnakeYaml).

I didn't know how to prevent the "bring in the whole world" issue with that either, so I posted a question on Stack Overflow, but then I worked out how to do it, so I posted an answer to my own question ;)

https://stackoverflow.com/questions/49659273/preventing-the-serializing-the-whole-world-issue-for-large-object-graph-in-sna
