Further optimization of arrays containing complex type elements and complex types in general.


ppy...@gmail.com

Jul 27, 2014, 7:07:40 PM
to universal-...@googlegroups.com
I'd like to propose an optimization for the large arrays of JSON objects typically returned from web services. Because every array element is the same well-known JSON object/type, my idea is to include the type definition once, up front, and then include only the values in the array. Today such a response looks like this:

[
   { id: 1, name: 'Test', age: 23, dateSubscribed: '2014/07/28' },
   { id: 3, name: 'Some', age: 33, dateSubscribed: '2014/07/27'},
   ... lots more like these
]

As you know, each row includes the same set of string IDs which bloats the data.
Instead of that, I include something like this:

{type:123, fields:{id:'int',name:'string',age:'uint8',dateSubscribed:'date'}}
[$123,#322
    1,Test,23,2014/07/28,
    3,Some,33,2014/07/27,
   ...
]

Of course the type definitions would use proper BJSON type characters, not strings like 'int', 'string', 'uint8', 'date' above, but you get the point.
I'm just not sure how to create new types (an arbitrary number of them) using standard BJSON, or how to refer to them within the $ type declaration (that would need more than a single character).
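To illustrate the savings in plain Python (a hypothetical sketch using JSON text rather than real BJSON type markers; `pack`, `unpack`, and the field list are my own names, not part of any spec):

```python
import json

# Field order declared once, playing the role of the {type:123, fields:...}
# header; each element is then emitted as a bare row of values.
FIELDS = ["id", "name", "age", "dateSubscribed"]

def pack(rows):
    """Return a (header, flat-values) pair instead of repeating keys."""
    header = {"type": 123, "fields": FIELDS}
    flat = [[row[f] for f in FIELDS] for row in rows]
    return header, flat

def unpack(header, flat):
    """Rebuild the original list of objects from the packed form."""
    return [dict(zip(header["fields"], values)) for values in flat]

rows = [
    {"id": 1, "name": "Test", "age": 23, "dateSubscribed": "2014/07/28"},
    {"id": 3, "name": "Some", "age": 33, "dateSubscribed": "2014/07/27"},
]
header, flat = pack(rows)
assert unpack(header, flat) == rows
# The packed form repeats no key strings, so its JSON text is shorter:
assert len(json.dumps(flat)) < len(json.dumps(rows))
```

The round trip is lossless as long as every row has the declared fields, and the per-row savings grow with the number of rows and the length of the property names.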

I also had to extend this to allow arbitrarily nested sub-objects when returning an entire object hierarchy as array elements.
For that I've added a similar type declaration mapped to a field name/path such as 'order.details.product', which would hold name, price, etc. in regular JSON; but for BJSON this could be done even better:

These nested types could be declared as a separate set of types with their own $ids declared prior to being referenced, something like this:
{type:120, name:'Product', fields:{id:'int',name:'string',price:'double'}},
{type:121, name:'Details', fields:{count:'int',product:'$120'}},
{type:122, name:'Order', fields:{id:'int',from:'string',date:'date',details:'$121'}}

Then the data array can include only values, as a flat table:
[$122,#3
    11,'John','2014/07/28',   1,  73,'Bike',122.11,
    12,'Marry','2014/07/28',  3,  82,'Pen',3.21,
    ...
]
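A rough Python sketch of how such recursive type declarations could drive the flattening (hypothetical names; the type ids follow the Product/Details/Order example above, with a tuple marking a field that references another declared type):

```python
# Type table: a plain field name means a value, a (name, type_id) tuple
# means a nested typed sub-object whose fields are inlined recursively.
TYPES = {
    120: ["id", "name", "price"],                  # Product
    121: ["count", ("product", 120)],              # Details
    122: ["id", "from", "date", ("details", 121)], # Order
}

def flatten(obj, type_id):
    """Walk the type definition and emit one flat row of values."""
    row = []
    for field in TYPES[type_id]:
        if isinstance(field, tuple):      # nested typed field: recurse
            name, sub_type = field
            row.extend(flatten(obj[name], sub_type))
        else:
            row.append(obj[field])
    return row

order = {"id": 11, "from": "John", "date": "2014/07/28",
         "details": {"count": 1,
                     "product": {"id": 73, "name": "Bike", "price": 122.11}}}
assert flatten(order, 122) == [11, "John", "2014/07/28", 1, 73, "Bike", 122.11]
```

The decoder would do the inverse walk over the same type table, so the flat rows carry no structural overhead at all.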

When using large data sets, this can reduce the data size quite significantly, especially when property names are long.
The same technique could be used for any single JSON object: include the type definition first, then use {$### followed by the data; not even a count would be needed in this case.
The serializer could keep track of each JSON type already generated, assign a new $ type id to new ones, and emit their definitions before first usage. Type names would be optional, provided by the serializer if known.
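That serializer bookkeeping could look something like this sketch (a hypothetical `TypeRegistry`, keying shapes by their field names and handing back a definition only the first time a shape is seen):

```python
class TypeRegistry:
    """Assigns a stable type id to each distinct object shape."""

    def __init__(self, first_id=120):
        self.next_id = first_id
        self.by_shape = {}   # frozenset of field names -> type id

    def type_for(self, obj):
        """Return (type_id, definition); definition is None when the
        shape was already registered, so it is emitted only once."""
        shape = frozenset(obj)
        if shape in self.by_shape:
            return self.by_shape[shape], None
        tid = self.next_id
        self.next_id += 1
        self.by_shape[shape] = tid
        return tid, {"type": tid, "fields": sorted(obj)}

reg = TypeRegistry()
tid1, defn1 = reg.type_for({"id": 1, "name": "Test"})
tid2, defn2 = reg.type_for({"id": 2, "name": "Other"})
assert tid1 == tid2 == 120
assert defn1 is not None and defn2 is None  # definition emitted only once
```

A real encoder would write `defn1` into the stream just before the first value row that references `$120`, then refer to the id alone from there on.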

So far I've used this technique only with standard JSON, but I'd like to optimize my data even further with BJSON implementing it, or some form of it.

-Piotr

Riyad Kalla

Jul 29, 2014, 6:04:24 PM
to universal-...@googlegroups.com
Piotr, 
You are definitely in line with some of the proposals we have been discussing (https://github.com/thebuzzmedia/universal-binary-json/issues/)

This is essentially support for embedded schemas, which I'm not opposed to; it just needs more time to discuss/digest.

BTW, you mentioned "BJSON" a few times, which is a separate spec here (http://bjson.org/) and this forum is really intended for the UBJSON spec - just wanted to make sure your suggestion was targeted at the right spec group :)

Best,

----------------
Riyad
http://thebuzzmedia.com


