Uint8List vs Stream<int> for StreamTransformer


Алексей Князев

Feb 19, 2014, 10:57:01 AM
to mi...@dartlang.org
I'm writing a StreamTransformer that processes large volumes of binary data (essentially splitting the raw stream into meaningful units, the way LineSplitter does).
The input events are List<int> (from File.openRead().listen or Socket.listen).
Which would be more efficient:
  • buffer the data while waiting for a delimiter, then emit a Uint8List instance
OR
  • emit a new Stream<int> (or a Stream<Uint8List>, for a little buffering) when a data unit starts, then close that Stream and create another one when the next unit starts?
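
For reference, the first option (buffer until a delimiter, then emit a Uint8List) could be sketched roughly as follows. This is a minimal illustration, not code from the thread: the class name DelimiterSplitter and the single-byte delimiter are assumptions, and it targets current Dart (StreamTransformerBase, BytesBuilder from dart:typed_data).

```dart
import 'dart:async';
import 'dart:typed_data';

/// Hypothetical transformer: buffers bytes until [delimiter] is seen, then
/// emits everything before it as one Uint8List. The delimiter is dropped.
class DelimiterSplitter extends StreamTransformerBase<List<int>, Uint8List> {
  final int delimiter;
  const DelimiterSplitter(this.delimiter);

  @override
  Stream<Uint8List> bind(Stream<List<int>> stream) async* {
    // copy: false keeps the lists we add instead of copying them again.
    final pending = BytesBuilder(copy: false);
    await for (final chunk in stream) {
      var start = 0;
      for (var i = 0; i < chunk.length; i++) {
        if (chunk[i] == delimiter) {
          // sublist() copies the bytes of this unit out of the chunk.
          pending.add(chunk.sublist(start, i));
          yield pending.takeBytes();
          start = i + 1;
        }
      }
      // Keep the unfinished tail until the next chunk arrives.
      if (start < chunk.length) pending.add(chunk.sublist(start));
    }
    // Flush a trailing unit that had no closing delimiter.
    if (pending.isNotEmpty) yield pending.takeBytes();
  }
}
```

It would be used as stream.transform(const DelimiterSplitter(10)) to split on newline bytes. Every unit is copied once here, which is the cost the view-based approach discussed later in the thread tries to avoid.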

Greg Lowe

Feb 19, 2014, 4:04:25 PM
to mi...@dartlang.org
This is how I do it:

import 'dart:convert';
import 'dart:io';

void main() {
  var path = ...;
  File(path)
      .openRead()
      .transform(utf8.decoder) // decode the byte chunks to text
      .transform(const LineSplitter()) // split the text into lines
      .forEach((l) => print('line: $l'));
}

Алексей Князев

Feb 19, 2014, 4:31:11 PM
to mi...@dartlang.org
The question is about the right way to implement the StreamTransformer abstract class, not about calling transform() on a stream.


Greg Lowe

Feb 19, 2014, 6:12:41 PM
to mi...@dartlang.org
Sorry, I didn't read that carefully.

Depending on your use case, you may be able to return a Stream of Lists that are just views into the original List<int> provided by File.openRead(). That way no copies need to be made.

I imagine Stream<int> would be pretty inefficient.
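
To illustrate the difference between a view and a copy, here is a small sketch. It assumes the chunk is a Uint8List and uses Uint8List.sublistView from current Dart; the variable names are made up for the example.

```dart
import 'dart:typed_data';

void main() {
  final chunk = Uint8List.fromList([10, 20, 30, 40, 50]);

  // A view shares memory with `chunk`: no bytes are copied.
  final view = Uint8List.sublistView(chunk, 1, 4);

  // sublist() allocates a new list and copies the bytes into it.
  final copy = chunk.sublist(1, 4);

  chunk[2] = 99;
  print(view); // [20, 99, 40] (the view sees the write)
  print(copy); // [20, 30, 40] (the copy does not)
}
```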





--
For other discussions, see https://groups.google.com/a/dartlang.org/
 
For HOWTO questions, visit http://stackoverflow.com/tags/dart
 
To file a bug report or feature request, go to http://www.dartbug.com/new

Greg Lowe

Feb 19, 2014, 8:09:54 PM
to mi...@dartlang.org, gr...@vis.net.nz
Here's an example of how I would approach this. This example splits a stream of binary chunks into a stream of binary chunks + sentinel values. It allows the splitting to happen with zero copying.

Not sure if it's the best way, or the fastest way. Feedback from resident gurus appreciated ;)

https://gist.github.com/xxgreg/9104926
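
The gist is not reproduced here. As a rough idea of what "chunks + sentinel values" can look like, the sketch below emits zero-copy views of each incoming chunk and a marker object at every unit boundary. It is not the gist's code: EndOfUnit, splitWithSentinels, and the single-byte delimiter are all hypothetical, and the chunks are assumed to be Uint8List.

```dart
import 'dart:async';
import 'dart:typed_data';

/// Hypothetical marker event emitted at each unit boundary.
class EndOfUnit {
  const EndOfUnit();
}

/// Emits views into each incoming chunk, plus an [EndOfUnit] sentinel
/// wherever [delimiter] occurred. No bytes are copied.
Stream<Object> splitWithSentinels(
    Stream<Uint8List> input, int delimiter) async* {
  await for (final chunk in input) {
    var start = 0;
    for (var i = 0; i < chunk.length; i++) {
      if (chunk[i] != delimiter) continue;
      // Emit the bytes before the delimiter as a view, if there are any.
      if (i > start) yield Uint8List.sublistView(chunk, start, i);
      yield const EndOfUnit();
      start = i + 1;
    }
    // The rest of the chunk belongs to a unit that continues later.
    if (start < chunk.length) yield Uint8List.sublistView(chunk, start);
  }
}
```

A unit that spans several chunks arrives as several views followed by one sentinel, so the consumer decides whether it ever needs to join them.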

Alex Tatumizer

Feb 20, 2014, 9:27:28 AM
to mi...@dartlang.org, gr...@vis.net.nz
On the surface, a view looks like a slam dunk, but there's a caveat.
It all depends on whether the segments you produce will have roughly the same lifetime.
If one segment out of, say, 100 is "special", and the program later keeps it (discarding the rest), then you have a memory problem:
the original blocks are still referenced by the "special" segments, which prevents them from being garbage collected.

Java came full circle on this with strings: at first there was no support for views, then they implemented the slam dunk (e.g. substring could return a "view" into another string, transparently to the program),
but later they changed their mind again, declared it a tragic mistake, and returned to the original implementation, for exactly the reason stated above.

Performance-wise, at the time of writing, views have no noticeable advantage over slices, but I saw some inlining for views checked in just last week (maybe not released yet), which will probably help.
But this is not only a performance issue (see above).
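
A small sketch of the retention problem and one possible way around it: copy the segment you intend to keep so it no longer references the large block. The helper name detach is hypothetical, and the example assumes current Dart with Uint8List views.

```dart
import 'dart:typed_data';

/// Copies [segment] into its own buffer so that it no longer
/// references the (possibly much larger) block it was cut from.
Uint8List detach(Uint8List segment) => Uint8List.fromList(segment);

void main() {
  final block = Uint8List(64 * 1024);
  final view = Uint8List.sublistView(block, 0, 16);

  // The 16-byte view keeps the whole 64 KiB buffer reachable.
  print(view.buffer.lengthInBytes); // 65536

  // The detached copy owns a buffer of just its own size.
  final kept = detach(view);
  print(kept.buffer.lengthInBytes); // 16
}
```

Copying only the segments that are kept long-term preserves the zero-copy path for everything that is discarded quickly.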
 

Greg Lowe

Feb 20, 2014, 1:50:02 PM
to mi...@dartlang.org, gr...@vis.net.nz
Yes, this is a good point. For intermediate steps in a chain of transformers this usually isn't an issue: at the end of the chain the data is typically converted to objects, strings, and ints, and the references to the original views are discarded.