Stream Processing [was [PiLuD] Queue Machines]


David Barbour

Oct 25, 2011, 10:24:27 PM
to pi...@googlegroups.com
On Tue, Oct 25, 2011 at 2:34 PM, Jeremy Tregunna <jeremy....@me.com> wrote:
Looking for performance characteristics, implications, etc. But it helps when you look for the right thing.

Stream processing has poor code locality (each element touches a lot of code), and uses a lot of memory (each queue is a buffer). OTOH, it composes well (pipelines, merges), parallelizes well (early parts of a pipeline can run in parallel with later parts, and multiple pipelines can easily run in parallel side-by-side), and has highly predictable performance characteristics (CPU, latency, memory costs).
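
For concreteness, a minimal sketch of that composition using Python generators (my example, purely illustrative):

    def source(n):
        # produce a stream of elements
        for i in range(n):
            yield i

    def square(xs):
        # a stateless stage: one element in, one element out
        for x in xs:
            yield x * x

    def running_sum(xs):
        # a stateful stage: the state lives at this location in the pipeline
        total = 0
        for x in xs:
            total += x
            yield total

    # stages compose by nesting; each buffers only the element in flight,
    # so per-element CPU and memory costs are easy to predict
    for value in running_sum(square(source(5))):
        print(value)   # 0, 1, 5, 14, 30

Each stage could just as well run as its own thread or process connected by queues, which is where the pipeline parallelism comes from.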

Consequently, stream processing is a very effective basis for:
* high-performance, scalable, parallel or distributed computing
* real-time computing on any architecture
* embedded systems where predictable memory is important

Stream processing is used heavily for these purposes, both in languages (such as Lucid and Lustre, the `synchronous reactive` languages often used in robotics) and in architectures (OpenGL shader pipelines, flow-based programming, etc.).

A benefit of stream processing is that it's `natural` to associate state with locations in a stream. Even in a pure approach, that state naturally supports caching, memoization, `diff` or `patch` processing, compression, and other optimizations. By comparison, there is no `natural` way to associate state with a behavior on a stack; stack activities are inherently ephemeral. Developers can work around the issue by adding a `heap` or static state to their model, albeit at the cost of complication.
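
For instance, a rough sketch of what that location-anchored state enables (hypothetical helpers, not from any particular system):

    def memoized_map(f, xs):
        # state at this point in the stream: a cache of prior results
        cache = {}
        for x in xs:
            if x not in cache:
                cache[x] = f(x)
            yield cache[x]

    def changes_only(xs):
        # a crude `diff`: emit only elements that differ from the previous one
        last = object()          # sentinel meaning `nothing seen yet`
        for x in xs:
            if x != last:
                yield x
                last = x

Because the state is anchored to a fixed location in the pipeline, it is easy to persist, cache, or discard without touching the rest of the program.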

There is considerable variation among stream processing systems regarding what elements of the stream `mean`, and how side-effects and state are supported. Elements in a stream might represent events or a time-varying value. Effects might be triggered on events, continuous, or shifted entirely to the edges. State might be internal, i.e. accumulate history while computing the stream, or externalized to a database or tuple space. There are many design concerns to weigh here: events and internal state allow a lot of expressiveness, but restricting the model to external state helps the language designer ensure useful properties like resilience against disruption and consistency among independent observers, and it pushes decisions about caching, memoization, and other optimizations out to a larger system view.
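
A loose sketch of the internal vs. external choice (the store, key, and counting example are just illustrations):

    def count_events(events):
        # internal state: the stage itself accumulates history
        n = 0
        for _ in events:
            n += 1
            yield n

    def count_events_external(events, store, key='count'):
        # externalized state: the stage is stateless; history lives in an
        # outside store (database, tuple space), so the larger system decides
        # about persistence, caching, and recovery after disruption
        for _ in events:
            store[key] = store.get(key, 0) + 1
            yield store[key]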

Consider: Functional Reactive Programming (FRP) and Reactive Demand Programming (RDP) are both stream processing models. FRP traditionally supports both events and behaviors, and allows internal state (causal behaviors, event accumulators), but forces developers to propagate values to the edge of the system to integrate effects. RDP restricts developers to time-varying values and forbids internal state, but does allow continuous effects in the middle (including access to external state).
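
Very roughly, and without pretending to capture either model faithfully, the contrast looks like this (sketch, my names):

    def accumulate(events, step, seed):
        # FRP-flavored: an event stream folded into internal state
        state = seed
        for e in events:
            state = step(state, e)
            yield state

    def behavior_map(signal, f):
        # RDP-flavored: a time-varying value transformed statelessly; any state
        # it observes or influences lives in an external resource
        for value in signal:
            yield f(value)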

Anyhow, there are many stream processing models, just as there are many stack-based programming models.

IMO, stack-based programming survives on path dependence. It was an effective design for a `central` processing unit running a single application with a complete absence of concurrency. But it's a rather poor fit for modern computation and service architectures - whether you're considering FPGAs, GPGPUs, OpenCL, or something larger like web services. While I mention complications for representing long-lived state, there are many other complications in the stack architecture - e.g. composing concurrent behaviors. Even if you do pursue a stack basis, you'd be better off at least modeling it as continuation passing rather than a true stack!
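
To illustrate that last point, compare an implicit call stack with explicit continuation passing (a toy sketch in Python, names are mine):

    def sum_to_stack(n):
        # the caller's frame must stay live until the callee returns
        if n == 0:
            return 0
        return n + sum_to_stack(n - 1)

    def sum_to_cps(n, k=lambda x: x):
        # the `rest of the computation` is an explicit value: the continuation k
        if n == 0:
            return k(0)
        return sum_to_cps(n - 1, lambda r: k(n + r))

Once the continuation is ordinary data it can be stored, handed to another process, or resumed concurrently with other work - none of which a hardware stack gives you for free.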

A stack is a fine example of a `simplistic` model that causes complications everywhere else! Stack-based programming hurts its users again, and again, and again. But it's hard to recognize this abuse until you've escaped it.

Language designers must choose between what is `simple` and what is `easy`.


spir

Oct 26, 2011, 4:12:19 AM
to pi...@googlegroups.com
On 10/26/2011 04:24 AM, David Barbour wrote:
> Language designers must choose between what is `simple` and what is `easy`.

Great remark.
(Both often go hand in hand, because simpler means easier to think about, as
long as it is also correct (!). But not when simplicity comes with
incorrectness, oversimplification, a lack of facilities, ...)

Denis
--
_________________
vita es estrany
spir.wikidot.com
