Java implementation of Buffer(str, 'binary')


Mike Kobyakov

Apr 22, 2014, 8:29:15 PM
to nod...@googlegroups.com

i have two services (one in node, one in java) sharing objects, which are compressed and, therefore, binary. 

nodejs does a str.toString('binary') on the compressed buffer.  this changes the buffer as in the following example.

enc   [ -108 1 72 116 104 105 115 32 105 115 32 116 104 101 32 115 116 114 105 110 103 32 70 19 0 70 18 0 -2 37 0 114 37 0 ]

enc.toString('binary')   [ -62 -108 1 72 116 104 105 115 32 105 115 32 116 104 101 32 115 116 114 105 110 103 32 70 19 0 70 18 0 -61 -66 37 0 114 37 0 ]

unfortunately, it appears that negative bytes become two negative bytes in 'binary' encoding, and i cannot find a java charset that can translate the latter into the former. 

i was told that 'binary' is simply 'latin1' encoding, but since these are binary buffers, it does not do as expected.  for example, if i take the second array from above in Java, and do 

new String(bytes, "latin1").getBytes("latin1"), 

it gives me back the same array.  [ -62 -108 1 72 116 104 105 115 32 105 115 32 116 104 101 32 115 116 114 105 110 103 32 70 19 0 70 18 0 -61 -66 37 0 114 37 0 ]

FWIW, i tracked the node buffer behavior down to the StringBytes::Write implementation in node's source, which calls into v8.  https://github.com/joyent/node/blob/23dfa71dd53617c3492f34787417ca60f03ea2ec/src/string_bytes.cc#L314

but unfortunately, that's where my C++ knowledge ends, and i am confused by the following line:

str->WriteOneByte(reinterpret_cast<uint8_t*>(buf), 0, buflen, flags);

i imagine if i knew what exactly that did, i could just reimplement it in java.
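A note on that line: as far as I can tell, WriteOneByte copies one byte per character of the JS string, keeping only the low 8 bits of each UTF-16 code unit. Under that assumption, a minimal Java sketch of both directions could look like the following. The class and method names (BinaryEncoding, fromBinaryString, toBinaryString) are made up for illustration and are not part of node or the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; names are illustrative only.
final class BinaryEncoding {
    // Rough equivalent of new Buffer(str, 'binary'):
    // keep the low 8 bits of each UTF-16 code unit.
    static byte[] fromBinaryString(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // narrowing cast drops the high byte
        }
        return out;
    }

    // Rough equivalent of buf.toString('binary'):
    // each byte becomes the char with the same value (0..255).
    static String toBinaryString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0x94, 0x01, 0x48, (byte) 0xFE };
        byte[] back = fromBinaryString(toBinaryString(raw));
        System.out.println(Arrays.equals(raw, back)); // prints true
    }
}
```

For strings whose chars are all in the range 0..255, str.getBytes(StandardCharsets.ISO_8859_1) gives the same result as the loop. The two differ only for chars above 0xFF, where Java substitutes '?' instead of truncating.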

thanks in advance!  hopefully, someone else has seen and/or done this already.  :/

mscdex

Apr 22, 2014, 9:31:08 PM
to nod...@googlegroups.com
On Tuesday, April 22, 2014 8:29:15 PM UTC-4, Mike Kobyakov wrote:
> nodejs does a str.toString('binary') on the compressed buffer.  this changes the buffer as in the following example.

Don't use the 'binary' encoding unless you absolutely have to. Just keep the data as a Buffer (which is what you get by default when receiving data from a socket for example). If the "object" you're describing is JSON, then you should be able to convert the Buffer via: JSON.parse(data.toString('utf8'));


Mike Kobyakov

Apr 22, 2014, 9:44:54 PM
to nod...@googlegroups.com
I understand that's the recommended route, but it is not possible for us to change this behavior at this time.  

regardless of whether it's recommended, it should still be possible to decode the 'binary' buffer.  

Aria Stewart

Apr 23, 2014, 2:23:23 PM
to nod...@googlegroups.com

On Apr 22, 02014, at 20:29, Mike Kobyakov <mkob...@gmail.com> wrote:

> i have two services (one in node, one in java) sharing objects, which are compressed and, therefore, binary.
>
> nodejs does a str.toString('binary') on the compressed buffer.  this changes the buffer as in the following example.
>
> enc   [ -108 1 72 116 104 105 115 32 105 115 32 116 104 101 32 115 116 114 105 110 103 32 70 19 0 70 18 0 -2 37 0 114 37 0 ]
>
> enc.toString('binary')   [ -62 -108 1 72 116 104 105 115 32 105 115 32 116 104 101 32 115 116 114 105 110 103 32 70 19 0 70 18 0 -61 -66 37 0 114 37 0 ]
>
> unfortunately, it appears that negative bytes become two negative bytes in 'binary' encoding, and i cannot find a java charset that can translate the latter into the former.

Wait, “Negative bytes”?!

That’s not a thing node does. Where is this object coming from and what kind is it?


mxk

Apr 23, 2014, 2:38:06 PM
to nod...@googlegroups.com
it's just a byte value, which when i print it out, it is shown as a signed number.  

from observation, anything negative appears to transform into two bytes, which are also negative.  -108 (0x94) -> -62 -108  (0xc294), -2 (0xfe) -> -61 -66 (0xc3be).  
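Those pairs are exactly what the UTF-8 two-byte form produces for code points U+0094 and U+00FE: the lead byte is 0xC0 plus the top bits, the trail byte is 0x80 plus the low six bits. Here is a small Java sketch of that arithmetic; the class and method names are invented for the example.

```java
// Hypothetical illustration of the UTF-8 two-byte form; names are made up.
final class Utf8TwoByte {
    // Encodes one code point in U+0080..U+07FF as two UTF-8 bytes
    // (returned as ints 0..255 so they print unsigned).
    static int[] encode(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not a two-byte code point");
        }
        int lead = 0xC0 | (codePoint >> 6);    // 110xxxxx
        int trail = 0x80 | (codePoint & 0x3F); // 10xxxxxx
        return new int[] { lead, trail };
    }

    public static void main(String[] args) {
        for (int cp : new int[] { 0x94, 0xFE }) {
            int[] e = encode(cp);
            // prints "94 -> c2 94" and then "fe -> c3 be"
            System.out.println(String.format("%02x -> %02x %02x", cp, e[0], e[1]));
        }
    }
}
```

Read as signed Java bytes, c2 94 is -62 -108 and c3 be is -61 -66, which matches the arrays in the first post.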

Aria Stewart

Apr 23, 2014, 3:02:33 PM
to nod...@googlegroups.com

On Apr 23, 02014, at 14:38, mxk <mkob...@gmail.com> wrote:

> it's just a byte value, which when i print it out, it is shown as a signed number.
>
> from observation, anything negative appears to transform into two bytes, which are also negative. -108 (0x94) -> -62 -108 (0xc294), -2 (0xfe) -> -61 -66 (0xc3be).

What version of node are you running that does this?

Rebecca Turner

Apr 23, 2014, 3:15:17 PM
to nod...@googlegroups.com

nodejs does a str.toString('binary') on the compressed buffer.  this changes the buffer as in the following example.

enc   [ -108 1 72 116 104 105 115 32 105 115 32 116 104 101 32 115 116 114 105 110 103 32 70 19 0 70 18 0 -2 37 0 114 37 0 ]

ok, let's try to reproduce this:

> var buf = new Buffer([ -108, 1, 72, 116, 104, 105, 115, 32, 105, 115, 32, 116, 104, 101, 32, 115, 116, 114, 105, 110, 103, 32, 70, 19, 0, 70, 18, 0, -2, 37, 0, 114, 37, 0 ]);

Results in:

<Buffer 94 01 48 74 68 69 73 20 69 73 20 74 68 65 20 73 74 72 69 6e 67 20 46 13 00 46 12 00 fe 25 00 72 25 00>

> var str = buf.toString('binary');

'”\u0001Hthis is the string F\u0013\u0000F\u0012\u0000þ%\u0000r%\u0000'

If we want to see its content, we can turn it back into a buffer with:

> new Buffer(str, 'binary');

<Buffer 94 01 48 74 68 69 73 20 69 73 20 74 68 65 20 73 74 72 69 6e 67 20 46 13 00 46 12 00 fe 25 00 72 25 00>

But I notice that if you leave the encoding off the second call, it'll encode the whole string as UTF-8 and you get:

> new Buffer(str);

<Buffer c2 94 01 48 74 68 69 73 20 69 73 20 74 68 65 20 73 74 72 69 6e 67 20 46 13 00 46 12 00 c3 be 25 00 72 25 00>

Which matches your:

enc.toString('binary')   [ -62 -108 1 72 116 104 105 115 32 105 115 32 116 104 101 32 115 116 114 105 110 103 32 70 19 0 70 18 0 -61 -66 37 0 114 37 0 ]

So that's your problem: when you convert the "binary" string back into a buffer, you're doing it with a utf8 encoding.  You need to keep the same encoding throughout.
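The same mismatch can be reproduced on the Java side. This is only a sketch, and the class and method names are invented for the example: decoding the raw bytes as ISO-8859-1 (Java's name for latin1) and re-encoding as UTF-8 gives the doubled bytes, and reversing the two steps gets the original back.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
final class EncodingMismatch {
    // What happens when a 'binary' (latin1) string is written out as UTF-8.
    static byte[] reencodeAsUtf8(byte[] raw) {
        String s = new String(raw, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Undo it: decode as UTF-8, then encode each char back to one latin1 byte.
    static byte[] recover(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = { -108, 1, 72, -2, 37, 0 };
        byte[] mangled = reencodeAsUtf8(raw);
        System.out.println(Arrays.toString(mangled));          // [-62, -108, 1, 72, -61, -66, 37, 0]
        System.out.println(Arrays.toString(recover(mangled))); // [-108, 1, 72, -2, 37, 0]
    }
}
```

The recovery step only works if the UTF-8 data really came from a latin1 string, i.e. every decoded char is at most U+00FF. Anything higher comes out of getBytes as '?'.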

-- Rebecca

mxk

Apr 23, 2014, 6:32:44 PM
to nod...@googlegroups.com
This is in Java.  i am examining two byte arrays, one raw and one which had .toString('binary') done on it.

mxk

Apr 23, 2014, 7:37:34 PM
to nod...@googlegroups.com
This is interesting.  this is a problem with my testing code: it looks like UTF-8 is the default encoding when writing to a file.  :/

i need to tweak to get a better example.  thanks.

Aria Stewart

Apr 24, 2014, 10:32:23 AM
to nod...@googlegroups.com

On Apr 23, 02014, at 18:32, mxk <mkob...@gmail.com> wrote:

> This is in Java. i am examining two byte arrays, one raw and one which had .toString('binary') done on it.


Oh, I see! That makes sense.

I’d suggest always outputting bytes as unsigned integers, preferably in hexadecimal. It’d have made spotting what was going on a lot easier — turns out the negative numbers were a red herring!
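In Java that could be as simple as masking each byte with 0xFF before formatting. A rough sketch, with a made-up class name and helper:

```java
// Hypothetical helper for printing bytes as unsigned hex; names are made up.
final class HexDump {
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // b & 0xFF widens to int and strips the sign extension.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = { -62, -108, 1, 72 };
        System.out.println(toHex(sample)); // prints: c2 94 01 48
    }
}
```

Printed this way, the c2 94 pair looks like UTF-8 right away, and the output lines up with what node shows for a Buffer.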

Glad you’re on your way to getting this straightened out.

Aria