Dart - Binary data type


Frank Pepermans

Mar 12, 2014, 4:41:56 AM
to mi...@dartlang.org
I'm extending Codec to build an Object <-> byte-array converter.

I've been looking for a ByteArray-like class in Dart (client-side) with an API along these lines:

writeInt() <-> readInt()
writeUint() <-> readUint()
writeBytes() <-> readBytes()
writeUTF8() <-> readUTF8()
...

Is there something in the core libs for this? Or perhaps a pub package?

Ivan Zaera Avellon

Mar 12, 2014, 5:11:34 AM
to mi...@dartlang.org
Have a look at "dart:typed_data". It has Uint8List and ByteDataView or something like that.

I'm not sure if it works in all browsers, though. I think it was failing in IE10...



Frank Pepermans

Mar 12, 2014, 5:35:58 AM
to mi...@dartlang.org
Thanks, found it :)

But unfortunately I do need to support IE10, so I might have to go with JSON instead of binary encoding...

Kasper Lund

Mar 12, 2014, 5:37:18 AM
to General Dart Discussion
If I remember correctly, it's only IE9 that does not support
dart:typed_data. IE10 should be fine.

Cheers,
Kasper

Alex Tatumizer

Mar 12, 2014, 1:16:42 PM
to mi...@dartlang.org
The class in question is called ByteData.
There's no way to write a String (UTF-8 or otherwise) into it. There's an open issue for it, but it's on hold, probably until JavaScript provides a counterpart, so (I'm speculating) we are talking about years or decades :).
But that's not the biggest problem.
ByteData is very slow, even in the VM.
When compiled to JS, it was not usable at all the last time I checked, half a year ago (please correct me if I'm wrong).
In the VM, there's an obvious workaround: see the pigeon_map project (pigeonson class) for details.
However, after compiling to JavaScript... no, I can't talk about it without tears.

Filipe Morgado

Mar 12, 2014, 2:08:51 PM
to mi...@dartlang.org
On Wednesday, 12 March 2014 17:16:42 UTC, Alex Tatumizer wrote:
There's no way to write String (UTF-8 or otherwise), into it.

I use something like this:

import 'dart:typed_data';

int writeCharCode(final Uint8List buffer, int index, final int code) {
  if (code < 0) {
    throw new ArgumentError('Invalid character code:  $code');
  } else if (code <= _UTF8_ONE_BYTE_LIMIT) {
    buffer[index++] = code;
  } else if (_isLeadSurrogate(code)) {
    throw new UnimplementedError('Surrogates are not supported:  $code');
  } else if (code <= _UTF8_TWO_BYTE_LIMIT) {
    buffer[index++] = 0xC0 | (code >> 6);
    buffer[index++] = 0x80 | (code & 0x3f);
  } else if (code <= _UTF8_THREE_BYTE_LIMIT) {
    buffer[index++] = 0xE0 | (code >> 12);
    buffer[index++] = 0x80 | ((code >> 6) & 0x3f);
    buffer[index++] = 0x80 | (code & 0x3f);
  } else {
    throw new ArgumentError('Invalid character code:  $code');
  }
  return index;
}

// UTF-8 constants.
const int _UTF8_ONE_BYTE_LIMIT = 0x7f;   // 7 bits
const int _UTF8_TWO_BYTE_LIMIT = 0x7ff;  // 11 bits
const int _UTF8_THREE_BYTE_LIMIT = 0xffff;  // 16 bits
const int _UTF8_FOUR_BYTE_LIMIT = 0x10ffff;  // 21 bits, truncated to Unicode max.

// UTF-16 constants.
const int _UTF8_SURROGATE_TAG_MASK = 0xFC00;
const int _UTF8_SURROGATE_VALUE_MASK = 0x3FF;
const int _UTF8_LEAD_SURROGATE_MIN = 0xD800;

bool _isLeadSurrogate(int codeUnit) =>
    (codeUnit & _UTF8_SURROGATE_TAG_MASK) == _UTF8_LEAD_SURROGATE_MIN;

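int writeString(final Uint8List buffer, int index, final String value) {
  final length = value.length;
  // Use a separate loop counter: reusing `index` here would shadow the
  // buffer offset, so the caller's start position would be ignored.
  for (var i = 0; i < length; i++) {
    index = writeCharCode(buffer, index, value.codeUnitAt(i));
  }
  return index;
}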

... but inside a custom growable ByteBuffer class.
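The snippet above throws on surrogates, so characters outside the Basic Multilingual Plane cannot be written. A minimal sketch of one way to handle them follows: combine each lead/trail pair into a single code point and emit the four-byte UTF-8 form. The function name writeStringUtf8 is hypothetical and not part of the code posted in this thread, and the caller is assumed to have sized the buffer large enough.

```dart
import 'dart:typed_data';

/// Writes [value] as UTF-8 into [buffer] starting at [index] and returns the
/// offset just past the last byte written. Hypothetical helper for illustration.
int writeStringUtf8(Uint8List buffer, int index, String value) {
  for (var i = 0; i < value.length; i++) {
    var code = value.codeUnitAt(i);
    if (code >= 0xD800 && code <= 0xDBFF) {
      // Lead surrogate: it must be followed by a trail surrogate.
      if (i + 1 >= value.length) {
        throw ArgumentError('Unpaired lead surrogate at $i');
      }
      final trail = value.codeUnitAt(i + 1);
      if (trail < 0xDC00 || trail > 0xDFFF) {
        throw ArgumentError('Unpaired lead surrogate at $i');
      }
      // Combine the pair into one code point above U+FFFF.
      code = 0x10000 + ((code & 0x3FF) << 10) + (trail & 0x3FF);
      i++; // The trail unit has been consumed.
    } else if (code >= 0xDC00 && code <= 0xDFFF) {
      throw ArgumentError('Unpaired trail surrogate at $i');
    }

    if (code <= 0x7F) {
      buffer[index++] = code;
    } else if (code <= 0x7FF) {
      buffer[index++] = 0xC0 | (code >> 6);
      buffer[index++] = 0x80 | (code & 0x3F);
    } else if (code <= 0xFFFF) {
      buffer[index++] = 0xE0 | (code >> 12);
      buffer[index++] = 0x80 | ((code >> 6) & 0x3F);
      buffer[index++] = 0x80 | (code & 0x3F);
    } else {
      buffer[index++] = 0xF0 | (code >> 18);
      buffer[index++] = 0x80 | ((code >> 12) & 0x3F);
      buffer[index++] = 0x80 | ((code >> 6) & 0x3F);
      buffer[index++] = 0x80 | (code & 0x3F);
    }
  }
  return index;
}

void main() {
  final buffer = Uint8List(8);
  final end = writeStringUtf8(buffer, 0, 'a\u{1F600}');
  print(buffer.sublist(0, end)); // [97, 240, 159, 152, 128]
}
```

Rejecting unpaired surrogates is a design choice in this sketch; an encoder could instead substitute U+FFFD.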

When dealing with strings, my experiments suggest ByteData is a little slower than String manipulation (but maybe consumes half the memory?).
However, ByteData is around twice as fast as strings when those need to be encoded/decoded to/from UTF-8 (when dealing with sockets/files).

It would be nice to have native functions to encode/decode JSON directly to/from UTF-8 bytes (JSON is 99% ASCII), instead of having to convert everything to/from Strings.
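For readers on a current Dart SDK, dart:convert offers JsonUtf8Encoder and lets the UTF-8 decoder be fused with the JSON decoder, which is close to what is asked for here. The sketch below assumes a modern SDK (these names may not match what shipped in 2014), and whether an intermediate String is actually avoided is an implementation detail that should be measured rather than assumed. The two helper function names are hypothetical.

```dart
import 'dart:convert';

/// Encodes a JSON-compatible value straight to UTF-8 bytes.
List<int> encodeJsonToUtf8(Object? value) => JsonUtf8Encoder().convert(value);

/// Decodes UTF-8 bytes to a JSON value through a single fused converter.
Object? decodeJsonFromUtf8(List<int> bytes) =>
    utf8.decoder.fuse(json.decoder).convert(bytes);

void main() {
  final bytes = encodeJsonToUtf8({'id': 7, 'name': 'ByteData'});
  final decoded = decodeJsonFromUtf8(bytes);
  print(decoded); // {id: 7, name: ByteData}
}
```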

Alex Tatumizer

Mar 12, 2014, 2:20:26 PM
to mi...@dartlang.org
There's a standard UTF-8 encoder/decoder (see https://api.dartlang.org/apidocs/channels/stable/#dart-convert.Utf8Encoder), but it converts between strings and List<int>.
You can then use setRange to copy the result into a Uint8List.
Currently, there's no way to do it in one hop.
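The two-hop approach described above can be sketched as follows. This uses the lowercase utf8 constant from current SDKs (at the time of this thread it was spelled UTF8); the helper name writeUtf8 is hypothetical, and the target buffer is assumed to be large enough.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Encodes [text] as UTF-8, copies it into [target] at [offset], and returns
/// the offset just past the last byte written.
int writeUtf8(Uint8List target, int offset, String text) {
  final List<int> encoded = utf8.encode(text); // hop 1: String -> bytes
  target.setRange(offset, offset + encoded.length, encoded); // hop 2: copy
  return offset + encoded.length;
}

void main() {
  final buffer = Uint8List(16);
  final end = writeUtf8(buffer, 2, 'h\u00e9llo');
  print(end); // 8
  print(buffer.sublist(2, end)); // [104, 195, 169, 108, 108, 111]
}
```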

Filipe Morgado

Mar 12, 2014, 2:32:10 PM
to mi...@dartlang.org
Using the UTF8 encoder/decoder is very slow, especially when dealing with large texts (such as JSON).

My point is that we usually don't need the intermediate String representation when dealing with JSON, XML, etc.
Parsing directly from bytes brings a very nice performance improvement.

So yeah, I'd like to see a little more love for bytes in the VM.

Ivan Zaera Avellon

Mar 12, 2014, 2:43:19 PM
to mi...@dartlang.org

Mmm. I'm using Uint8Lists all over cipher and haven't noticed them being that slow. At least not so noticeably slow that I'd think they were slowing everything down too much.

I'll keep an eye on it and report back if I see something strange.

Cheers,
Ivan


Alex Tatumizer

Mar 12, 2014, 3:49:33 PM
to mi...@dartlang.org
Uint8Lists are quite fast in the VM (though slow after translation to JS). What is slow is ByteData. You can benchmark something like setInt32 to see it.
(ByteData should not be confused with the TypedData interface, which all typed data classes implement. ByteData is a class for packing/unpacking ints and floats.)
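A rough way to run the suggested benchmark is sketched below: it writes the same 32-bit values once through ByteData.setInt32 and once byte by byte through Uint8List indexing, timing each with a Stopwatch. The function names and workload size are arbitrary choices for illustration, Endian.big is the current-SDK spelling, and the timings depend heavily on SDK version and platform, so the 2014 observation may not hold today.

```dart
import 'dart:typed_data';

/// Writes [rounds] 32-bit big-endian values through ByteData.setInt32.
void viaByteData(ByteData data, int rounds) {
  final slots = data.lengthInBytes ~/ 4;
  for (var i = 0; i < rounds; i++) {
    data.setInt32((i % slots) * 4, i, Endian.big);
  }
}

/// Writes the same values byte by byte through Uint8List indexing.
void viaIndexing(Uint8List bytes, int rounds) {
  final slots = bytes.length ~/ 4;
  for (var i = 0; i < rounds; i++) {
    final offset = (i % slots) * 4;
    bytes[offset] = (i >> 24) & 0xff;
    bytes[offset + 1] = (i >> 16) & 0xff;
    bytes[offset + 2] = (i >> 8) & 0xff;
    bytes[offset + 3] = i & 0xff;
  }
}

void main() {
  const rounds = 1000000; // arbitrary workload size
  final data = ByteData(1024);
  final bytes = Uint8List(1024);

  final watch = Stopwatch()..start();
  viaByteData(data, rounds);
  print('ByteData.setInt32: ${watch.elapsedMicroseconds} us');

  watch.reset(); // reset() zeroes the counter but keeps the stopwatch running
  viaIndexing(bytes, rounds);
  print('Uint8List indexing: ${watch.elapsedMicroseconds} us');
}
```

A single untimed warm-up pass before measuring would give the JIT a fairer chance; it is omitted here to keep the sketch short.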

Ivan Zaera Avellon

Mar 12, 2014, 4:42:39 PM
to mi...@dartlang.org
Ah, OK. I don't use them except in marginal cases, when converting values to little/big endian.

But the majority of the code just uses raw Uint8Lists, with no ByteData views at all. I just use [] operator to read and set bytes.

That explains it, I think ;-).
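The "marginal case" mentioned above, converting values to little or big endian, might look like the sketch below: a ByteData view is laid over an existing Uint8List only for the multi-byte write, and everything else keeps using plain indexing. The function name encodeBothWays is hypothetical, and Endian.big / Endian.little are the current-SDK names.

```dart
import 'dart:typed_data';

/// Returns 8 bytes: [value] as a big-endian uint32 followed by the same
/// value as a little-endian uint32.
Uint8List encodeBothWays(int value) {
  final bytes = Uint8List(8);
  // The view shares storage with `bytes`; nothing is copied.
  final view =
      ByteData.view(bytes.buffer, bytes.offsetInBytes, bytes.lengthInBytes);
  view.setUint32(0, value, Endian.big);
  view.setUint32(4, value, Endian.little);
  return bytes;
}

void main() {
  print(encodeBothWays(0x11223344)); // [17, 34, 51, 68, 68, 51, 34, 17]
}
```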



Andrew Skalkin

Sep 18, 2014, 11:24:54 PM
to mi...@dartlang.org
It's been half a year since this discussion took place, and it looks like nothing has changed in regard to binary string serialization; did I miss anything, such as a good pub package? The closest I have found is a fragment of Alex's pigeon_map library:
https://github.com/tatumizer/pigeon_map/blob/master/lib/src/pigeonson.dart

I guess voting for this issue might bring more attention to it? This is really a missing piece of the puzzle.
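For illustration, the kind of binary string serialization discussed in this thread can be hand-rolled with dart:typed_data and dart:convert alone. The sketch below is not taken from pigeon_map or any pub package: ByteWriter and ByteReader are hypothetical names, the wire format (big-endian int32, strings as a 4-byte length prefix plus UTF-8 bytes) is an arbitrary assumption, and the API names follow a current SDK.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical growable writer that appends big-endian values.
class ByteWriter {
  Uint8List _bytes = Uint8List(16);
  int _length = 0;

  /// Doubles the backing store until [extra] more bytes fit.
  void _reserve(int extra) {
    final needed = _length + extra;
    if (needed <= _bytes.length) return;
    var capacity = _bytes.length;
    while (capacity < needed) {
      capacity *= 2;
    }
    final grown = Uint8List(capacity);
    grown.setRange(0, _length, _bytes);
    _bytes = grown;
  }

  void writeInt32(int value) {
    _reserve(4);
    ByteData.view(_bytes.buffer).setInt32(_length, value, Endian.big);
    _length += 4;
  }

  /// Writes a 4-byte length prefix followed by the UTF-8 bytes of [text].
  void writeUtf8(String text) {
    final encoded = utf8.encode(text);
    writeInt32(encoded.length);
    _reserve(encoded.length);
    _bytes.setRange(_length, _length + encoded.length, encoded);
    _length += encoded.length;
  }

  /// Returns a copy of the bytes written so far.
  Uint8List toBytes() => _bytes.sublist(0, _length);
}

/// Hypothetical reader for the format produced by [ByteWriter].
class ByteReader {
  final Uint8List _bytes;
  final ByteData _view;
  int _offset = 0;

  ByteReader(Uint8List bytes)
      : _bytes = bytes,
        _view = ByteData.view(
            bytes.buffer, bytes.offsetInBytes, bytes.lengthInBytes);

  int readInt32() {
    final value = _view.getInt32(_offset, Endian.big);
    _offset += 4;
    return value;
  }

  String readUtf8() {
    final length = readInt32();
    final text = utf8.decode(_bytes.sublist(_offset, _offset + length));
    _offset += length;
    return text;
  }
}

void main() {
  final writer = ByteWriter()
    ..writeInt32(-42)
    ..writeUtf8('Dart \u2665 bytes');
  final reader = ByteReader(writer.toBytes());
  print(reader.readInt32()); // -42
  print(reader.readUtf8()); // Dart ♥ bytes
}
```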