Stream IO and back-pressure?

Istvan Soos

Oct 11, 2017, 6:19:35 AM10/11/17
to General Dart Discussion
I've encountered this over the past few months several times, and I
have no clue how to debug or resolve it:

Case 1: Reading a large file (> 10GB) line-by-line.
Case 2: Running a 'SELECT * FROM table;' in Postgres (>1M rows).

In both cases, the individual lines or rows are not too large, usually
a few hundred kilobytes at most, and the related processing is not a
big deal either: mostly some kind of aggregation, which may have an
async component in it.

However, when that async processing takes a bit more time, the IO
operations can fill up the buffers, and memory consumption becomes
much higher than it should be (e.g. in the 10 GB range instead of a
few hundred MB).

What is the proper way (or best practice) to apply back-pressure, so
that the source does not flood the memory?

What can we do if the source is a third-party library that doesn't do
the ideal thing (e.g. it may not propagate pause)?

Thanks,
Istvan

Lasse R.H. Nielsen

Oct 11, 2017, 12:38:42 PM10/11/17
to mi...@dartlang.org
On Wed, Oct 11, 2017 at 12:19 PM, Istvan Soos <istva...@gmail.com> wrote:
I've encountered this over the past few months several times, and I
have no clue how to debug or resolve it:

Case 1: Reading a large file (> 10GB) line-by-line.
Case 2: Running a 'SELECT * FROM table;' in Postgres (>1M rows).

In both cases, the individual lines or rows are not too large, usually
a few hundred kilobytes at most, and the related processing is not a
big deal either: mostly some kind of aggregation, which may have an
async component in it.

However, when that async processing takes a bit more time, the IO
operations can fill up the buffers, and memory consumption becomes
much higher than it should be (e.g. in the 10 GB range instead of a
few hundred MB).

What is the proper way (or best practice) to apply back-pressure, so
that the source does not flood the memory?

That would be StreamSubscription.pause when your buffer reaches a certain level.
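[Editor's note: in the simplest case, Dart's `await for` loop already applies this kind of back-pressure implicitly — it pauses the underlying subscription while the loop body awaits, so the source stops producing until the work for the current event is done. A minimal sketch for the line-by-line file case (the file path and the `process` step are placeholders, not anything from the thread):]

```dart
import 'dart:convert';
import 'dart:io';

// Hypothetical aggregation step; stands in for the real async work.
Future<void> process(String line) async {
  await Future<void>.delayed(const Duration(milliseconds: 1));
}

Future<int> countLines(String path) async {
  final lines = File(path)
      .openRead()
      .transform(utf8.decoder)
      .transform(const LineSplitter());

  var count = 0;
  // `await for` pauses the file subscription while the body awaits,
  // so OS-level reads stop until process() completes and the in-memory
  // buffers stay small.
  await for (final line in lines) {
    await process(line);
    count++;
  }
  return count;
}
```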
I don't remember whether we have a buffering stream transformer in package:async; if not, it might be useful to add one.
Say, "buffer at most n events, then pause until there's only k (<n) events available".
It should be relatively easy to write if we don't have it yet.
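[Editor's note: a rough sketch of the transformer described above — pause the source when the buffer reaches a high-water mark, resume once it drains to a low-water mark. All names and the structure are illustrative, not an existing package:async API:]

```dart
import 'dart:async';
import 'dart:collection';

/// Buffer at most [high] events; pause the source until the consumer
/// has drained the buffer down to [low] events.
Stream<T> buffered<T>(Stream<T> source, {int high = 64, int low = 16}) {
  final buffer = Queue<T>();
  late StreamSubscription<T> sub;
  late StreamController<T> controller;
  var sourceDone = false;

  void emit() {
    // Deliver buffered events while the downstream listener is ready.
    while (buffer.isNotEmpty && !controller.isPaused) {
      controller.add(buffer.removeFirst());
      if (sub.isPaused && buffer.length <= low) sub.resume();
    }
    if (sourceDone && buffer.isEmpty) controller.close();
  }

  controller = StreamController<T>(
    onListen: () {
      sub = source.listen((event) {
        buffer.add(event);
        if (buffer.length >= high) sub.pause(); // back-pressure upstream
        emit();
      }, onError: controller.addError, onDone: () {
        sourceDone = true;
        emit();
      });
    },
    onResume: emit, // downstream resumed: drain the buffer
    onCancel: () => sub.cancel(),
  );
  return controller.stream;
}
```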
 
What can we do if the source is a third-party library that doesn't do
the ideal thing (e.g. it may not propagate pause)?

Then the third-party library needs to provide some other way to ask the source to back off.
If there is no such feature, I guess your only option is to "process faster!" - not really a viable option :)
 
/L
--
Lasse R.H. Nielsen - l...@google.com  
'Faith without judgement merely degrades the spirit divine'
Google Denmark ApS - Frederiksborggade 20B, 1 sal - 1360 København K - Denmark - CVR nr. 28 86 69 84