Growable Untyped Data Buffer?

155 views
Skip to first unread message

Filipe Morgado

unread,
Aug 11, 2014, 11:46:03 AM8/11/14
to mi...@dartlang.org
Hi guys,

Just raising a point that seems kinda odd to me ... the lack of a growable untyped data buffer, with methods such as:
class Buffer {
 
void addByte(int byte) ...
 
void writeByte(int byte, int index) ...
 
void getByte() ...
 
int readByte(int index) ...
 
 
void addInt32(int value) ...
 
void writeInt32(int value, int index) ...
 
void getInt32() ...
 
int readInt32(int index) ...
 
 
... etc for each type, including typed lists ...
}

(Or maybe I missed something huge)

When working with binary data, we often need to write bytes, integers, floats, etc interchangeably to the same buffer.
However, there is no "easy" way to do it.
There is 'package:typed_data' which provides appendable typed buffers, but no general-use buffer.

Typed data such as integers and floats (and respective lists) require the use of ByteData, which is awfully slow, or the runtime allocation of typed lists as views to the underlying buffer, which cannot be cached because of alignments (and because there are potentially lots of them).

Most binary encodings and database drivers come with their own implementation of a growable buffer (using ByteData), or go to great length to pre-compute the the size of the output buffer, at the expense of performance.

Additionally, the implementation of a growable buffer requires another layer of bound checks (along with the native buffer's).

My point is that ... a native implementation of an untyped growable buffer, with methods to deal with integers, floats, Unicode, ASCII, Base64, Base16, etc ... could be very beneficial to anything that needs to deal with random binary data. And could (probably) be compiled to efficient JS.

Should we expect ByteData performance to get fixed?
Are we supposed to use typed list views instead of ByteData?
Are EventSinks really the way to go?
Are there plans to change something in this domain?
Thoughts?

Thank you :)

Vyacheslav Egorov

unread,
Aug 11, 2014, 11:57:49 AM8/11/14
to General Dart Discussion
Should we expect ByteData performance to get fixed?

ByteData performance should not be bad. Do you have an example benchmark that demonstrates "awful" slowness?


// Vyacheslav Egorov


--
For other discussions, see https://groups.google.com/a/dartlang.org/
 
For HOWTO questions, visit http://stackoverflow.com/tags/dart
 
To file a bug report or feature request, go to http://www.dartbug.com/new

To unsubscribe from this group and stop receiving emails from it, send an email to misc+uns...@dartlang.org.

Filipe Morgado

unread,
Aug 11, 2014, 1:23:35 PM8/11/14
to mi...@dartlang.org

My conclusion has been vaguely corroborated by other people.
But of course, I may be doing something wrong.

I can't currently test on other platforms.

32-bits integers and floats are consistently 4-6x slower with ByteData.
64-bits integers seem very slow on both ByteData and typed lists.

For each type, I compared ByteData to the type's correspondent list.

"NoCache" results represent the same benchs but without caching the ByteData or typed list views.

I got the following results:
Windows 7 Pro 64-bits
Intel Core2 Duo E7500 2.93GHz
Dart VM version: 1.6.0-dev.7.0 (Fri Aug 01 09:37:51 2014) on "windows_x64"

###########  Uint8
  Uint8 List          0.09
  Uint8 Data          0.13   40% slower
###########  Int32
  Int32 List          0.15
  Int32 Data          0.89   6x slower

###########  Uint32
  Uint32 List          0.23
  Uint32 Data          0.99  4x slower

###########  Uint32 NoCache
  Uint32 List NoCache          0.72
  Uint32 Data NoCache          0.99  40% slower

###########  Int64
  Int64 List          7.82
  Int64 Data          8.63  10% slower

###########  Uint64
  Uint64 List          4.09
  Uint64 Data          5.30  30% slower

###########  Uint64 NoCache
  Uint64 List NoCache          4.67
  Uint64 Data NoCache          5.49   18% slower

###########  Float32
  Float32 List          0.25
  Float32 Data          1.30   5x slower

###########  Float32 NoCache
  Float32 List NoCache          0.71
  Float32 Data NoCache          1.36  2x slower

###########  Float64
  Float64 List          0.23
  Float64 Data          1.29  5-6x slower

###########  Float64 NoCache
  Float64 List NoCache          0.71
  Float64 Data NoCache          1.58  2.2x slower

Hope it helps.
 

Filipe Morgado

unread,
Aug 11, 2014, 1:52:05 PM8/11/14
to
The benchmarks are so "micro", they may not be relevant, but the conclusion still holds when encoding/decoding large chunks in production code.

This is my take at a growable untyped buffer. There are probably a few mistakes.
(Other libs in the package are still being refactored/merged from company code, so please ignore).

A native version of that would make my day and, if performant, would probably be used by a lot of people on the server.

Vyacheslav Egorov

unread,
Aug 11, 2014, 5:12:14 PM8/11/14
to General Dart Discussion
Thank you for the benchmark!

It indeed reveals some ineffeciencies on the VM side.

setXYZ() accepts optional endianess argument which defaults to BIG_ENDIAN so you are not doing exactly the same operations in both cases. 

Large differences happens because we don't inline endiannes conversion and we have to travel into runtime for it, which is indeed wasteful.

With Int64/Uint64 we need a special support on ia32 to make it work efficiently. Our current Int64 support is not sufficient there. 

I will put this on my plate and return to this, as I am currently looking on related issues with Int32/Uint32 handling. 






// Vyacheslav Egorov


On Mon, Aug 11, 2014 at 7:52 PM, Filipe Morgado <pix...@gmail.com> wrote:
The benchmarks are so "micro", they may not be relevant, but the conclusion still holds when encoding/decoding large chunks in production code.

This is my take at a growable untyped buffer. There are probably a few mistakes.
(Other libs in the package are still being refactored/merged from company code, so please ignore).

A native version of that would make my day and would probably be used by a lot of people on the server.

Vyacheslav Egorov

unread,
Aug 11, 2014, 5:15:23 PM8/11/14
to General Dart Discussion
And BTW in Uint8 case part of the difference comes from the fact that in Uint8 we figure out that _Uint8ListBench.bytes always contains Uint8List of length 1024. We don't have such tracking support for ByteData, so we fail to propagate the length and eliminate bounds checking code. The rest is the cost of double indirection.

###########  Uint8
  Uint8 List          0.09
  Uint8 Data          0.13   40% slower


// Vyacheslav Egorov

Filipe Morgado

unread,
Aug 11, 2014, 5:36:49 PM8/11/14
to mi...@dartlang.org
On Monday, 11 August 2014 22:12:14 UTC+1, Vyacheslav Egorov wrote:
setXYZ() accepts optional endianess argument which defaults to BIG_ENDIAN so you are not doing exactly the same operations in both cases.

True >.< Thanks for the catch.

... And the answers :)

So I suppose we can keep using ByteData, as the inefficiencies will be corrected in the future?

And what about generic growable buffers? Will they stay in the realm of packages despite the (hypothetical?) performance gains?

Thanks :)

Vyacheslav Egorov

unread,
Aug 11, 2014, 5:46:36 PM8/11/14
to General Dart Discussion
So I suppose we can keep using ByteData, as the inefficiencies will be corrected in the future?

I certainly hope so, some of these (e.g. inling of endianess conversion) are easy, some would require more work (better support for IntN types, I'd say we are still figuring out and building foundations there). So I obviously can't give you any ETA.

And what about generic growable buffers?

I am not aware of any plans to add a generic growable typed data buffer. 

Overall I would not expect much gain --- it would likely be implemented in pure Dart anyways :)

 


// Vyacheslav Egorov


--

Lasse R.H. Nielsen

unread,
Aug 12, 2014, 2:50:08 AM8/12/14
to mi...@dartlang.org

We have growable typed lists in package: typed_data, so we could consider having a growable bytedata as well. I'll file a feature request for it.

/L

John McCutchan

unread,
Aug 12, 2014, 2:29:09 PM8/12/14
to mi...@dartlang.org
Just to clarify- this is not in dart:typed_data but a separate package.

Alex Tatumizer

unread,
Aug 12, 2014, 4:19:54 PM8/12/14
to mi...@dartlang.org
>
Overall I would not expect much gain --- it would likely be implemented in pure Dart anyways :)
Actually, pure dart is surprisingly good at these tasks.
Consider:
testNaiveByteDataSet(size, iterations) {
  var data=new Uint8List(size*4);
  for (int i=0; i<iterations; i++) {
    for (int j=0; j<size; j++) {
      int k=j*4;
      data[k]=j&0xFF;
      data[k+1]=(j>>8)&0xFF;
      data[k+2]=(j>>16)&0xFF;
      data[k+3]=(j>>24)&0xFF;
    }
  }
 }
Timing is: 4.1 nanoseconds/iteration (tested on 2.7GHz Intel i7)
That's quite fast! Ideally, if we could write it in assembly, and write the whole int32 at once (most processors support misaligned data), we could get 1 nano.
But even 4 nano translates to 1GB/sec.

As a contrast, compare with ByteData:
testLibraryByteDataSet(size, iterations) {
  var data=new ByteData(size*4);
  for (int i=0; i<iterations; i++) {
    for (int j=0; j<size; j++) {
      data.setInt32(j*4, j);
    }
  }
 
}

58 nanoseconds per iteration :(
I don't know the reason, but something goes wrong there.



Vyacheslav Egorov

unread,
Aug 12, 2014, 4:34:01 PM8/12/14
to General Dart Discussion
Actually, pure dart is surprisingly good at these tasks.

I know :) What I meant was: there would be no difference between SDK implementation and third party package implementation hosted on pub because both would likely be pure Dart. 

The only thing we could do for SDK we don't currently do for third_party packages is white list certain helper functions for inlining to make sure optimizer does the right thing even when it is short on the budget. 

I don't know the reason, but something goes wrong there.

I explained what goes wrong here above. It calls into runtime for endiannes conversion. We should start by just using pure Dart code for that (among other things)


// Vyacheslav Egorov


Alex Tatumizer

unread,
Aug 12, 2014, 5:06:00 PM8/12/14
to mi...@dartlang.org
Small improvement: It gets even faster if we remove unnecessary &0xFF :-)
When we write into Unit8List, dart takes just least significant byte and ignores the rest, no questions asked.
testNaiveByteDataSet(size, iterations) {
  var data=new Uint8List(size*4);
  for (int i=0; i<iterations; i++) {
    for (int j=0; j<size; j++) {
      int k=j*4;
      data[k]=j;
      data[k+1]=j>>8;
      data[k+2]=j>>16;
      data[k+3]=j>>24;
    }
  }
 }


improves performance by 10% - we get 3.6 nanosec/iteration :)
Looking forward to getting that out of box :-)

Frank Pepermans

unread,
Aug 12, 2014, 6:04:10 PM8/12/14
to mi...@dartlang.org
even a tiny faster, use int k=j<<2; instead of int k=j*4;

Alex Tatumizer

unread,
Aug 12, 2014, 6:20:22 PM8/12/14
to mi...@dartlang.org
This is interesting, I thought j*4 gets compiled to shift (so j*4 would be the same as j<<2), but it seems it is not. But on my hw, j*4 works a bit faster! Go figure...
(things might not be intuitive there; performance can be affected by code alignment and what-not)

Frank Pepermans

unread,
Aug 12, 2014, 6:37:19 PM8/12/14
to mi...@dartlang.org

Surprised me too, but in fact it is not optimized, for me the shift runs it faster.

Anyway, sure it will be addressed at some point, there's a lot of room left for the vm to bump in speed.

This is interesting, I thought j*4 gets compiled to shift (so j*4 would be the same as j<<2), but it seems it is not. But on my hw, j*4 works a bit faster! Go figure...
(things might not be intuitive there; performance can be affected by code alignment and what-not)

Lasse R.H. Nielsen

unread,
Aug 13, 2014, 1:37:27 AM8/13/14
to mi...@dartlang.org
On Wed, Aug 13, 2014 at 12:04 AM, Frank Pepermans <fr...@igindo.com> wrote:
> even a tiny faster, use int k=j<<2; instead of int k=j*4;

Seems like a missed optimization opportunity. For the VM, the two
operations are equivalent, so there is no reason to not use the same
(most efficient) code. :)

/L
--
Lasse R.H. Nielsen - l...@google.com
'Faith without judgement merely degrades the spirit divine'
Google Denmark ApS - Frederiksborggade 20B, 1 sal - 1360 København K -
Denmark - CVR nr. 28 86 69 84

Frank Pepermans

unread,
Aug 13, 2014, 1:58:50 AM8/13/14
to mi...@dartlang.org
Getting off topic, now on my desktop I get a different result:

RUN_A: avg about 110000 us
RUN_B: avg about 82480 us

making *4 faster than <<2

///RUN_A
import 'package:benchmark_harness/benchmark_harness.dart';

void main() {
  TemplateBenchmark.main();
}

int i, j;

class TemplateBenchmark extends BenchmarkBase {
  
  const TemplateBenchmark() : super("Template");

  static void main() => new TemplateBenchmark().report();
  
  void run() { for (i=0; i<10000000; i++) j=i<<2; }
}

//RUN_B
import 'package:benchmark_harness/benchmark_harness.dart';

void main() {
  TemplateBenchmark.main();
}

int i, j;

class TemplateBenchmark extends BenchmarkBase {
  
  const TemplateBenchmark() : super("Template");

  static void main() => new TemplateBenchmark().report();
  
  void run() { for (i=0; i<10000000; i++) j=i*4; }
}


Vyacheslav Egorov

unread,
Aug 13, 2014, 7:01:23 AM8/13/14
to General Dart Discussion
On x86 detecting overflow after mul operation is easy, it's just a branch on overflow, but not so much after shl unless you are shifting by 1 (~ multiplying by 2).

That's why we don't generally strength reduce j * 4 to j << 2. Keep in mind that Dart integers are arbitrary precision and can never be ignored unless operation is proven to be not overflowing.

In this particular example we could take range information computed by the range analysis into account and replace multiplication with shift. I will put a TODO on my list for that.




// Vyacheslav Egorov


Anton Moiseev

unread,
May 4, 2015, 4:10:19 PM5/4/15
to mi...@dartlang.org
Hi guys,

I was lately concerned with the question - what's better to use Uint8List or ByteData when dealing with large binary lists? From this thread it seems like a year ago Uint8List could offer a better performance, but what's the status of ByteData optimization at the moment? Is there an issue to track the progress?

Thanks!

Vyacheslav Egorov

unread,
May 4, 2015, 4:34:39 PM5/4/15
to mi...@dartlang.org
Just use class that suits your problem better (e.g. if you are only writing Uint8 values into a list - then use Uint8List, if you are writing different sized integers / floats into a list use ByteData).

If you have any performance issues with your code - file bugs.

We have some known outstanding work in that area - but it's not high on the priority list unless somebody is hitting performance issues.

Alex Tatumizer

unread,
May 4, 2015, 4:44:16 PM5/4/15
to mi...@dartlang.org
At some point, ByteData was so slow that when you needed to write int32 into buffer, you would be better off doing it like this (with Uint8List):
data[k] = j;
data[k + 1] = j >> 8;
data[k + 2] = j >> 16;
data[k + 3] = j >> 24;

This was fixed some time ago. Now ByteData performs fine, and it's faster (by almost a factor of 2) to use setInt32


Anton Moiseev

unread,
May 4, 2015, 5:07:30 PM5/4/15
to mi...@dartlang.org
Alex, thanks! Nice to hear ByteData performs better now.

Vyacheslav, no issues now. But frankly, I try to avoid doing any benchmarks myself when it comes to VM environments, it's too tricky :). Never know when you run into some special case you guys highly optimized, and you'll never get the same performance in a real app.
Reply all
Reply to author
Forward
0 new messages