Howto push a data steam to an output?

39 views
Skip to first unread message

Timothy Washington

unread,
Jan 2, 2018, 6:52:42 PM1/2/18
to onyx...@googlegroups.com
Hello, I have more of a design question.

A) With Aggregation & State Management, I'll have an input switch that can be flipped on and off. When on, the system will start consuming an external data stream, as input
  • ‎Within a workflow, is there a pattern to push that data steam to an output? Ie, this is not a pure function.
    • I can't think of an applicable plugin here.
    • Do flow conditions apply here?
    • Is there a concept such as sub-workflows? Challenge 1-3 was the closest possibility I saw.

B) Thinking of aggregation state inside of a workflow, I've only seen examples of basic data types used.
  • Can we have a stateful job with things like async channels? especially with distributed state? 
    • I saw an example in Challenge 4-3, where an atom was created inside of a Lifecycle.

C) I also want to multiplex multiple data streams in the same channel.
  • Is there a referentially transparent way to partition (for Kafka) a data steam? 

Of course I've already scanned through the "onyx tests, onyx-examples, learn-onyx" resources. But didn't see anything that quite fit the bill. Are there recommended approaches for each situation? 


Mike Drogalis

unread,
Jan 3, 2018, 10:01:46 AM1/3/18
to Onyx
On Tue, Jan 2, 2018 at 3:52 PM, Timothy Washington <twas...@gmail.com> wrote:
Hello, I have more of a design question.

A) With Aggregation & State Management, I'll have an input switch that can be flipped on and off. When on, the system will start consuming an external data stream, as input
  • ‎Within a workflow, is there a pattern to push that data steam to an output? Ie, this is not a pure function.
    • I can't think of an applicable plugin here.
    • Do flow conditions apply here?
    • Is there a concept such as sub-workflows? Challenge 1-3 was the closest possibility I saw.
It sounds like you may need a plugin to read from whatever data source you're ingesting from that responds to a global switch. You can either have a lifecycle periodically poll for the global on/off state somewhere, or feed through marker messages in your stream indicating what the state should be. You may want to move all of this logic out of the stream processor to present a simpler ingest layer -- it sounds like this piece is hairy, regardless of the technology.

There are no sub-workflows.
 
B) Thinking of aggregation state inside of a workflow, I've only seen examples of basic data types used.
  • Can we have a stateful job with things like async channels? especially with distributed state? 
    • I saw an example in Challenge 4-3, where an atom was created inside of a Lifecycle.

Stateful jobs cannot have non-serializable state that is persisted between jobs. State is snapshotted to S3 via checkpoints.
You can have things like async channels via lifecycles. We should probably remove that challenge from learn-onyx.
I wrote it up before we supported exactly-once state tracking with :windows and :triggers. While you can still have an
atom in a lifecycle, there's no reason not to use windows, and an atom doesn't have any real durability.  
 
C) I also want to multiplex multiple data streams in the same channel.
  • Is there a referentially transparent way to partition (for Kafka) a data steam? 

You can manually set the partition key for every outgoing message to Kafka.  
 
Of course I've already scanned through the "onyx tests, onyx-examples, learn-onyx" resources. But didn't see anything that quite fit the bill. Are there recommended approaches for each situation? 


--
You received this message because you are subscribed to the Google Groups "Onyx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to onyx-user+unsubscribe@googlegroups.com.
To post to this group, send email to onyx...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/onyx-user/CAADtM-Z9eJm28_XOL5LPNxwgsm6itH8sK-ioc%2BHArOqK0e5Fww%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages