MessagePack bindings for node.js


Peter Griess

May 25, 2010, 4:33:35 PM
to nodejs
The node-msgpack bindings haven't been rigorously tested, but they seem to be working fine for all the data types that I've thrown at them.

    http://github.com/pgriess/node-msgpack

From the MessagePack site, "MessagePack is a binary-based efficient object serialization library. It enables to exchange structured objects between many languages like JSON. But unlike JSON, it is very fast and small."
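
To make the "small" part of that claim concrete, here is a minimal sketch of a few MessagePack encodings in plain JavaScript. It is not how node-msgpack works internally (the bindings wrap the C++ library); `packSmall` is a hypothetical helper that covers only small non-negative integers, short strings, and small arrays and maps, and it assumes the modern Node.js `Buffer.from`/`Buffer.concat` API. The sample is the test object used in the benchmark later in this thread.

```javascript
// Hypothetical helper: encodes a tiny subset of MessagePack.
// Supported: integers 0..127, strings under 32 UTF-8 bytes,
// arrays and plain objects with fewer than 16 entries.
function packSmall(value) {
  if (Number.isInteger(value) && value >= 0 && value <= 0x7f) {
    return Buffer.from([value]); // positive fixint: the byte is the value
  }
  if (typeof value === 'string') {
    const bytes = Buffer.from(value, 'utf8');
    if (bytes.length > 31) throw new RangeError('string too long for this sketch');
    // Short string: 0xa0 plus the byte length, then the bytes.
    return Buffer.concat([Buffer.from([0xa0 | bytes.length]), bytes]);
  }
  if (Array.isArray(value)) {
    if (value.length > 15) throw new RangeError('array too long for this sketch');
    // fixarray: 0x90 plus the element count, then each element.
    return Buffer.concat([Buffer.from([0x90 | value.length]), ...value.map(packSmall)]);
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value);
    if (keys.length > 15) throw new RangeError('map too large for this sketch');
    // fixmap: 0x80 plus the pair count, then key/value pairs.
    const parts = [Buffer.from([0x80 | keys.length])];
    for (const key of keys) {
      parts.push(packSmall(key), packSmall(value[key]));
    }
    return Buffer.concat(parts);
  }
  throw new TypeError('type not supported by this sketch');
}

const sample = { abcdef: 1, qqq: 13, '19': [1, 2, 3, 4] };
const packed = packSmall(sample);
const json = Buffer.from(JSON.stringify(sample), 'utf8');
console.log('MessagePack bytes:', packed.length, 'JSON bytes:', json.length);
```

Small integers and container headers cost one byte each here, which is where most of the saving over the quoted, comma-separated JSON text comes from.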

Peter

Ryan Gahl

May 25, 2010, 4:46:59 PM
to nod...@googlegroups.com
Data types are one thing, but do you also have benchmarks compared to not using this? I'm just curious to know if you're potentially losing any perceived gains via the "v8obj->ToObject();" calls, which are one layer of serialization already. So the payload reduction is an apparent win, but is the serialization overhead actually reduced (unless I just don't grok the C++ correctly)? (Or were you simply going for the payload reduction gain to begin with?)

Peter Griess

May 25, 2010, 6:21:49 PM
to nodejs
Yeah, I've done some rudimentary benchmarking operations on the following object. I haven't made an attempt to narrow down which code-paths/data-types cause performance divergence.

{'abcdef' : 1, 'qqq' : 13, '19' : [1, 2, 3, 4]}

The time to serialize 500,000 instances was 7.17 seconds for JSON, and 5.80 seconds for node-msgpack. The time to serialize then deserialize 500,000 instances was 22.18 seconds for JSON, and 8.62 seconds for node-msgpack.

It's worth noting that node-msgpack produces and consumes Buffer objects, while the JSON family of methods operates on strings. I'd expect this to further help node-msgpack outperform the JSON object when doing I/O, as JSON requires an extra step to encode/decode strings into Buffer objects.

Of additional interest is that the built-in JSON object appears to memoize the result of calling JSON.stringify() for a given instance. That is, calling JSON.stringify() in a loop on the same object is insanely fast. To get an apples-to-apples comparison of doing actual work, I made 500,000 copies of the test object by calling JSON.parse(JSON.stringify()) and stashing the results.
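
The methodology described above could be sketched roughly as follows. This is not the script that produced the numbers in this post; `jsonCodec`, `makeCopies`, and `benchmark` are hypothetical names, the timing uses the modern `process.hrtime.bigint()`, and the JSON codec includes the string-to-Buffer step so that both sides would do comparable work.

```javascript
// Hypothetical codec shape: encode(obj) -> Buffer, decode(Buffer) -> obj.
const jsonCodec = {
  encode: (obj) => Buffer.from(JSON.stringify(obj), 'utf8'),
  decode: (buf) => JSON.parse(buf.toString('utf8')),
};

// Build `count` independent copies so that no two iterations
// serialize the same instance.
function makeCopies(template, count) {
  const copies = [];
  for (let i = 0; i < count; i++) {
    copies.push(JSON.parse(JSON.stringify(template)));
  }
  return copies;
}

// Time an encode-only pass and an encode-then-decode pass, in milliseconds.
function benchmark(codec, instances) {
  let start = process.hrtime.bigint();
  for (const obj of instances) codec.encode(obj);
  const encodeMs = Number(process.hrtime.bigint() - start) / 1e6;

  start = process.hrtime.bigint();
  for (const obj of instances) codec.decode(codec.encode(obj));
  const roundTripMs = Number(process.hrtime.bigint() - start) / 1e6;

  return { encodeMs, roundTripMs };
}

const template = { abcdef: 1, qqq: 13, '19': [1, 2, 3, 4] };
const instances = makeCopies(template, 10000); // the post above used 500,000
console.log('JSON via Buffer:', benchmark(jsonCodec, instances));
```

A MessagePack codec could be passed to `benchmark` in the same shape, for example `{ encode: msgpack.pack, decode: msgpack.unpack }`, assuming the module exposes both functions under those names.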

Can you expand on the cost of v8obj->ToObject(), Ryan? I assumed that this was merely doing a cast or wrap of some sort. I'm fairly new to both V8 and JavaScript.

Peter

--
You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com.
To unsubscribe from this group, send email to nodejs+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en.

r...@tinyclouds.org

May 25, 2010, 6:33:55 PM
to nod...@googlegroups.com
On Tue, May 25, 2010 at 3:21 PM, Peter Griess <p...@std.in> wrote:
> Can you expand on the cost of v8obj->ToObject(), Ryan? I assumed that this
> was merely doing a cast or wrap of some sort. I'm fairly new to both V8 and
> JavaScript.


As far as I know, it's a fast cast-like thing. Here's a partial trace:

http://github.com/ry/node/blob/e97a481785dc5920336f46c7b29c6d9ff5fe974b/deps/v8/src/api.cc#L1775-1789
http://github.com/ry/node/blob/e97a481785dc5920336f46c7b29c6d9ff5fe974b/deps/v8/src/execution.cc#L446-449
http://github.com/ry/node/blob/e97a481785dc5920336f46c7b29c6d9ff5fe974b/deps/v8/src/runtime.js#L527-535

Ryan Gahl

May 25, 2010, 6:47:41 PM
to nod...@googlegroups.com
On Tue, May 25, 2010 at 5:21 PM, Peter Griess <p...@std.in> wrote:
> Can you expand on the cost of v8obj->ToObject(), Ryan?

Not really :) -- wasn't sure if it was a cast or a coerce situation (which is why I asked).

I was just postulating based on my own limited knowledge of the APIs. What I'm curious about is the "when doing actual work" part -- meaning serializing... then _marshaling_, then deserializing (not just serializing and then deserializing in place as I can't think of why one would do that in the real world).

So you have an object foo, assumed to be of moderate size and complexity (nested objects but no circular references, which is another issue of course). Then you pack it up via "var foo_pack = msgpack.pack(foo);". So now it's a Buffer, right? But it's a "packed" Buffer, not necessarily something usable in places expecting a Buffer unless they expressly expect a "packed" Buffer and are ready to deserialize properly (I suppose that relegates the usefulness to local code or external code under your control). OK, so in talking through this I think I see the use case you're after. You just want to avoid using strings locally as an intermediate format when all you need is a Buffer, and when you control the serialization/deserialization mechanisms at both ends. It's not necessarily a replacement for the cases when you're actually marshaling data to an external endpoint, at least not generically.

OK, so then you have circular references to deal with. I see your comments warn against a possible stack overflow for deeply nested objects, so that says circular references aren't accounted for either. I know one answer is "circular references are bad, don't use them", but there are handfuls of valid cases for them, and a good serialization mechanism should have a way to deal with them. Well, rather, once again... a "generic" serialization mechanism, right?

Sorry, sort of free flow banter here. I get it now I think :) -- am I over-engineering this?

Peter Griess

May 25, 2010, 7:07:26 PM
to nodejs
On Tue, May 25, 2010 at 5:47 PM, Ryan Gahl <ryan...@gmail.com> wrote:
> So you have an object foo, assumed to be of moderate size and complexity (nested objects but no circular references, which is another issue of course). Then you pack it up via "var foo_pack = msgpack.pack(foo);". So now it's a Buffer, right? But it's a "packed" Buffer, not necessarily something usable in places expecting a Buffer unless they expressly expect a "packed" Buffer and are ready to deserialize properly (I suppose that relegates the usefulness to local code or external code under your control). OK, so in talking through this I think I see the use case you're after. You just want to avoid using strings locally as an intermediate format when all you need is a Buffer, and when you control the serialization/deserialization mechanisms at both ends. It's not necessarily a replacement for the cases when you're actually marshaling data to an external endpoint, at least not generically.

What I'm after is increased performance when transmitting a given JavaScript object over the wire. With node-msgpack we have an advantage in both CPU usage and space usage for transmission/storage.

FWIW the MessagePack wire format is documented and has implementations in several other languages. It's certainly a format that one could use when communicating with external endpoints, as long as the transport had a mechanism for expressing the content encoding correctly (MessagePack vs. plaintext JSON).
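
As a rough illustration of what "expressing the content encoding" might look like on the receiving side, the sketch below picks a decoder based on a Content-Type value. Everything here is hypothetical: `makeDecoderRegistry` is an invented helper, and the media type name for MessagePack in the comment is an assumption rather than something stated in this thread.

```javascript
// Hypothetical registry mapping a Content-Type value to a decoder function.
function makeDecoderRegistry() {
  const decoders = new Map();
  return {
    register(contentType, decode) {
      decoders.set(contentType.toLowerCase(), decode);
    },
    decode(contentType, body) {
      // Drop parameters such as "; charset=utf-8" before the lookup.
      const type = contentType.split(';')[0].trim().toLowerCase();
      const decode = decoders.get(type);
      if (!decode) throw new Error('unsupported content type: ' + type);
      return decode(body);
    },
  };
}

const registry = makeDecoderRegistry();
registry.register('application/json', (buf) => JSON.parse(buf.toString('utf8')));
// A MessagePack decoder would be registered the same way, for example:
// registry.register('application/x-msgpack', msgpack.unpack); // assumed names

const body = Buffer.from('{"a":13}', 'utf8');
console.log(registry.decode('application/json; charset=utf-8', body));
```

The point is only that both ends agree on a label for the payload format; the transport and the label itself are up to the application.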
 
> OK, so then you have circular references to deal with. I see your comments warn against a possible stack overflow for deeply nested objects, so that says circular references aren't accounted for either. I know one answer is "circular references are bad, don't use them", but there are handfuls of valid cases for them, and a good serialization mechanism should have a way to deal with them. Well, rather, once again... a "generic" serialization mechanism, right?
>
> Sorry, sort of free flow banter here. I get it now I think :) -- am I over-engineering this?

Ooh, yeah I'm glad you brought up circular references.

I'll be rejecting any object with circular references as invalid for serialization, failing the entire operation. The MessagePack format (and JSON as well, for that matter) has no facility to support this.

This is what the built-in JSON.stringify() does:

node> o['c'] = o;
{ a: 13, b: 14, c: [Circular] }
node> o
{ a: 13, b: 14, c: [Circular] }
node> JSON.stringify(o);
TypeError: Converting circular structure to JSON
    at Object.stringify (native)
    at REPLServer.<anonymous> (eval at <anonymous> (repl:68:28))
    at REPLServer.readline (repl:68:19)
    at Stream.<anonymous> (repl:29:19)
    at Stream.emit (events:25:26)
    at IOWatcher.callback (net:507:14)
    at node.js:204:9

Peter

Ryan Gahl

May 25, 2010, 7:21:33 PM
to nod...@googlegroups.com

> What I'm after is increased performance when transmitting a given JavaScript object over the wire. With node-msgpack we have an advantage in both CPU usage and space usage for transmission/storage.
>
> FWIW the MessagePack wire format is documented and has implementations in several other languages. It's certainly a format that one could use when communicating with external endpoints, as long as the transport had a mechanism for expressing the content encoding correctly (MessagePack vs. plaintext JSON).

Nice, OK. I'll definitely give it a go to see how it looks from a workflow perspective. I like the promise for sure.
 
 
> I'll be rejecting any object with circular references as invalid for serialization, failing the entire operation. The MessagePack format (and JSON as well, for that matter) has no facility to support this.

OK, fair enough. Just means we'd have to, from a framework or library perspective, translate any desired circular references as tokenized values that the receiving end understands how to deal with. Kris Zyp has a nice way of dealing with this via $ref (or now $link, IIRC) properties in his projects. It's certainly a valid thing to want to do --> i.e. a data model w/ children who have uncles who have brothers (pointing back to the fathers), etc...
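
One way such tokenizing could look is sketched below. This is not Kris Zyp's actual format or code; only the `$ref` property name is borrowed from the description above. `tokenize` and `resolve` are hypothetical helpers, and representing a reference as an array of keys leading to the first occurrence is an assumption made for illustration. Data that legitimately contains a `$ref` array property would be misread by this sketch.

```javascript
// Replace any repeated object reference with { $ref: <path> }, where the
// path is the list of keys leading to the first occurrence from the root.
function tokenize(root) {
  const seen = new Map(); // object -> path of its first occurrence
  function walk(value, path) {
    if (value === null || typeof value !== 'object') return value;
    if (seen.has(value)) return { $ref: seen.get(value) };
    seen.set(value, path);
    const copy = Array.isArray(value) ? [] : {};
    for (const key of Object.keys(value)) {
      copy[key] = walk(value[key], path.concat(key));
    }
    return copy;
  }
  return walk(root, []);
}

// Rebuild the original graph in place by following each { $ref } token.
function resolve(root) {
  const lookup = (path) => path.reduce((node, key) => node[key], root);
  function walk(node) {
    for (const key of Object.keys(node)) {
      const child = node[key];
      if (child === null || typeof child !== 'object') continue;
      if (Array.isArray(child.$ref)) node[key] = lookup(child.$ref);
      else walk(child);
    }
  }
  walk(root);
  return root;
}

// A father and an uncle who point at each other, plus a child.
const father = { name: 'father' };
const uncle = { name: 'uncle', brother: father };
father.brother = uncle;
const family = { father, child: { name: 'child', father, uncle } };

const wire = JSON.stringify(tokenize(family)); // no circular-structure error
const restored = resolve(JSON.parse(wire));
console.log(restored.child.uncle.brother === restored.father);
```

The tokenized form contains no cycles, so it could be handed to any serializer that rejects them, with the receiving end calling `resolve` after unpacking.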

Good stuff. Can't wait to get some time to tinker.

Peter Griess

May 25, 2010, 8:01:53 PM
to nodejs
Circular reference detection implemented with change 7b013447f6295f98e5b6.

node> a
[ 1, 2, 3, [Circular] ]
node> msgpack.pack(a);
TypeError: Cowardly refusing to pack object with circular reference

    at REPLServer.<anonymous> (eval at <anonymous> (repl:68:28))
    at REPLServer.readline (repl:68:19)
    at Stream.<anonymous> (repl:29:19)
    at Stream.emit (events:25:26)
    at IOWatcher.callback (net:507:14)
    at node.js:204:9
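
For readers who want the idea without reading the C++ change, a circular-reference check can be sketched in JavaScript as below. This is an illustration of the general technique (tracking the objects on the current path), not the code from the commit above; `hasCircularReference` is a hypothetical name. Being recursive, it shares the stack-depth caveat for deeply nested objects mentioned earlier in the thread.

```javascript
// Returns true if `value` contains a reference back to one of its own
// ancestors. Shared but non-circular references are allowed.
function hasCircularReference(value, ancestors = new Set()) {
  if (value === null || typeof value !== 'object') return false;
  if (ancestors.has(value)) return true; // we are already inside this object
  ancestors.add(value);
  for (const key of Object.keys(value)) {
    if (hasCircularReference(value[key], ancestors)) return true;
  }
  ancestors.delete(value); // leaving this branch
  return false;
}

const a = [1, 2, 3];
a.push(a); // same shape as the array in the REPL session above
const shared = { n: 1 };

console.log(hasCircularReference(a));
console.log(hasCircularReference([shared, shared]));
```

Removing an object from the set on the way back up is what lets the same object appear twice as a sibling without being flagged.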

Peter

Daniel Ly

May 26, 2010, 4:13:28 AM
to nodejs
Hello Peter

You are using C++ for MessagePack. Is that because there is no
Buffer.pack(), and Buffer.unpack() is very minimal?

I wrote http://github.com/nalply/node/commit/e2cf7839aafa3040577d385a9a05619ff39ab403
more than a month ago. It adds Buffer.pack() and Buffer.unpack() with the "d"
format. What do you think about a JavaScript-only MessagePack for
Node.js if Buffer has pack() and unpack() with enough formats?
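
As a small illustration of what one piece of a JavaScript-only encoder might involve, the sketch below packs a double the way MessagePack stores it. It does not use the Buffer.pack()/Buffer.unpack() from the commit above; it assumes "d" means an IEEE 754 double (as in other pack/unpack libraries) and relies on the `writeDoubleBE`/`readDoubleBE` methods of modern Node.js Buffers. `packDouble` and `unpackDouble` are hypothetical names.

```javascript
// MessagePack stores a 64-bit float as the type byte 0xcb followed by
// the IEEE 754 value in big-endian byte order (9 bytes in total).
function packDouble(value) {
  const buf = Buffer.alloc(9);
  buf.writeUInt8(0xcb, 0);
  buf.writeDoubleBE(value, 1);
  return buf;
}

function unpackDouble(buf) {
  if (buf.length < 9 || buf[0] !== 0xcb) {
    throw new TypeError('not a MessagePack float 64');
  }
  return buf.readDoubleBE(1);
}

const packedPi = packDouble(Math.PI);
console.log(packedPi.length, unpackDouble(packedPi) === Math.PI);
```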

--nalply

Peter Griess

May 26, 2010, 10:53:07 AM
to nodejs
I'm using C++ for MessagePack because I didn't want to re-implement and maintain the MessagePack pack/unpack routines myself. Even if node::Buffer supported the necessary pack and unpack directives, I'd be unlikely to use it for this reason.

However, it is nice to see the node::Buffer pack/unpack implementations being fleshed out. I was thinking of doing similar work myself to use in http://github.com/pgriess/node-msgr, but ultimately decided to just port an existing pure-JavaScript pack/unpack library to CommonJS, with the results winding up at http://github.com/pgriess/node-jspack.

Peter
