[ANN] Narrator: expressive, composable stream analysis

Zach Tellman


Nov 2, 2013, 6:28:27 PM

to clo...@googlegroups.com

This is a reimplementation of an approach I've discussed in several talks [1] [2], with an eye towards performance, memory efficiency, and flexibility w.r.t. how the event stream is represented. The readme does a good job of explaining how it works, but there have been a number of new event processing libraries recently (core.async, EEP, etc.), so I'll spend some time here describing how this differs.

First, this library is focused on aggregations over event streams, not arbitrary transformations. It is designed such that these aggregations can be automatically parallelized, and use non-thread-safe data structures (such as those in the excellent stream-lib [3]) without having to worry about coordination. As such, within this narrower application it has a richer set of operators, and should be a fair bit faster (millions of messages/sec/core).
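A rough Python sketch of why "automatically parallelized" and "non-thread-safe data structures" fit together. This is not Narrator's API (Narrator is Clojure). Each worker owns a private mutable accumulator, so no locking is needed, and the partial results are merged only at the end. The names `Summary`, `aggregate_partition`, and `parallel_aggregate` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Summary:
    """A mutable, deliberately non-thread-safe accumulator (hypothetical)."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def add(self, x: float) -> None:
        # Plain field mutation: safe only because a single worker owns this object.
        self.count += 1
        self.total += x
        self.maximum = max(self.maximum, x)

    def merge(self, other: "Summary") -> "Summary":
        # Merging is associative, so partial results can be combined in any grouping.
        return Summary(self.count + other.count,
                       self.total + other.total,
                       max(self.maximum, other.maximum))


def aggregate_partition(events):
    # Each partition gets its own accumulator; nothing is shared between workers.
    acc = Summary()
    for x in events:
        acc.add(x)
    return acc


def parallel_aggregate(events, workers=4):
    # Round-robin split into independent partitions.
    partitions = [events[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(aggregate_partition, partitions))
    # The only "coordination" is this final merge of finished partial results.
    result = Summary()
    for p in partials:
        result = result.merge(p)
    return result


if __name__ == "__main__":
    s = parallel_aggregate(list(range(1, 101)))
    print(s.count, s.total, s.maximum)  # 100 5050.0 100
```

In CPython the GIL keeps threads from speeding up a CPU-bound loop like this. The sketch is about the shape of the computation, which carries over to process pools or JVM threads.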

Second, this has support for time-series analysis of ordered streams, either historical or in real time. The input for either type of analysis can be normal sequences, core.async channels, or Lamina channels. At Factual we use this for aggregations across many of our real-time systems, and I also use it for both ad hoc queries and daily rollups of logs and other historical data.
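To illustrate interval-based analysis of an ordered stream, here is a small Python sketch. It is not Narrator's interface. Events are grouped into fixed, regular periods and reduced once per period. Because the input is assumed to be in timestamp order, the same code works on a list (historical data) or a lazy iterator (a live feed). The `(timestamp, value)` event shape, integer timestamps, the `period` parameter, and the name `per_interval` are all assumptions.

```python
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, float]  # assumed shape: (integer timestamp, value)


def per_interval(events: Iterable[Event], period: int) -> Iterator[Tuple[int, float]]:
    """Yield (interval_start, sum_of_values) for each fixed-width interval.

    Assumes events arrive in timestamp order, so an interval can be emitted
    as soon as the first event of a later interval is seen.
    """
    current_start = None
    total = 0.0
    for ts, value in events:
        start = (ts // period) * period
        if current_start is None:
            current_start = start
        # Close out every interval that ended before this event, including empty ones.
        while start > current_start:
            yield current_start, total
            current_start += period
            total = 0.0
        total += value
    if current_start is not None:
        yield current_start, total  # flush the final, possibly partial, interval


if __name__ == "__main__":
    evs = [(5, 2.0), (12, 1.0), (19, 3.0), (31, 4.0)]
    print(list(per_interval(evs, 10)))
    # [(0, 2.0), (10, 4.0), (20, 0.0), (30, 4.0)]
```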

On a personal note, I think this is one of the most interesting and useful libraries I've written. I'm really looking forward to seeing how people use it, and encourage feedback on how to make it better.

Zach

[1] http://www.infoq.com/presentations/analyze-running-system

[2] http://vimeo.com/45132054

[3] https://github.com/addthis/stream-lib

Ben Wolfson


Nov 2, 2013, 6:36:14 PM

to clo...@googlegroups.com

Seems kind of similar to Babbage: https://github.com/ReadyForZero/babbage/tree/1.1


--
Ben Wolfson
"Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure." [Larousse, "Drink" entry]

Zach Tellman


Nov 2, 2013, 7:16:18 PM

to clo...@googlegroups.com

I was aware of Babbage, but haven't used it. There is a certain similarity to the syntax, but I think most (if not all) of the things I listed differentiate Narrator from Babbage, as well. Please correct me if I'm wrong.


dm3


Nov 10, 2013, 6:03:33 AM

to clo...@googlegroups.com

I've read about Lamina and Narrator, watched the linked videos and I think I understand how it all fits together:

1) Instrument the applications using Lamina's `instrument` or `trace`

2) Probe the instrumented code somehow by channeling the traces to some endpoint. (How do you do this? Do you automatically probe everything and send it to some remote service? Do you have some way of dynamically enabling/disabling probes on your services, e.g. an embedded REPL or some management endpoint?)

3) Analyze the traces remotely using Narrator + networking code + a UI (that's the gist of Omphalos, as I understand it). If so, what would you say is the largest difference between Riemann's [1] stream analysis functionality and what Narrator provides? Would you say that Omphalos and Riemann serve completely different purposes?

Am I on the right track?

[1] http://riemann.io/

Zach Tellman


Nov 10, 2013, 5:09:18 PM

to clo...@googlegroups.com

Riemann is a service for receiving streams of events and causing one or more side effects (sending email, routing to Graphite, etc.). It can do arbitrary transformations on event streams (the effects of an input may be arbitrarily time-shifted), and it assumes that the inputs have a fixed structure (numbers, shallow maps, etc.).

Narrator is a library for analyzing streams of events. It returns either a single value representing the whole stream, or values representing fixed, regular intervals within the stream. The analysis can only operate on that interval (or on multiples of that interval, using 'moving'). Using the 'recur' operator, it can analyze arbitrarily nested structures.
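The "multiples of that interval" idea can be sketched in Python on top of per-interval results. Each output still lines up with one base interval but covers the last `n` intervals. This is an illustration under assumptions, not how Narrator implements 'moving'; `moving_sum` and its arguments are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def moving_sum(intervals: Iterable[Tuple[int, float]], n: int) -> Iterator[Tuple[int, float]]:
    """Given (interval_start, value) pairs for consecutive fixed intervals,
    yield (interval_start, sum of the last n values): one result per interval.
    """
    if n < 1:
        raise ValueError("window must span at least one interval")
    window = deque(maxlen=n)  # appending past maxlen drops the oldest interval
    for start, value in intervals:
        window.append(value)
        yield start, sum(window)


if __name__ == "__main__":
    per_interval_sums = [(0, 2.0), (10, 4.0), (20, 0.0), (30, 4.0)]
    print(list(moving_sum(per_interval_sums, 2)))
    # [(0, 2.0), (10, 6.0), (20, 4.0), (30, 4.0)]
```

The first `n - 1` outputs cover fewer than `n` intervals. A real implementation might instead suppress them or mark them as partial.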

At Factual, we use both of these in tandem. Since the trace data for function call trees is arbitrarily nested, we analyze it with Narrator, then separate the results into flattened sub-parts and pass them on to Riemann. We typically have a fixed set of functions that we're tracing (primarily entry points for HTTP requests), and we automatically send their traces via UDP to Omphalos [1]. Obviously, the instrumented functions that are called while creating a response may change; that is within the control of the authors of the libraries we use.
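As a hypothetical illustration of "separating nested data into flattened sub-parts," a recursive walk over a call tree can emit one flat record per node. The trace shape (`name`, `duration_ms`, `children`) and the dotted-path naming are assumptions, not the format Factual, Lamina, or Narrator actually uses. The output keys mirror the `service` and `metric` fields of a Riemann event, but this sketch doesn't send anything.

```python
from typing import Any, Dict, Iterator


def flatten_trace(node: Dict[str, Any], prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Recursively turn a nested call tree into flat {service, metric} records.

    Each node is assumed to look like
    {"name": str, "duration_ms": float, "children": [node, ...]}.
    """
    # Build a dotted path so each sub-part keeps its position in the tree.
    path = f"{prefix}.{node['name']}" if prefix else node["name"]
    yield {"service": path, "metric": node["duration_ms"]}
    for child in node.get("children", []):
        yield from flatten_trace(child, path)


if __name__ == "__main__":
    tree = {"name": "http", "duration_ms": 12.0, "children": [
        {"name": "db", "duration_ms": 8.0, "children": []},
        {"name": "render", "duration_ms": 3.0},
    ]}
    for record in flatten_trace(tree):
        print(record)
```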

Does this answer all your questions? I'm happy to elaborate on any of the above.

Zach

[1] This is discussed in more detail here: http://www.infoq.com/presentations/analyze-running-system
