Cutting one step in the real-time pipeline: stream-collector > kinesis > elasticsearch

André Ikeda

Apr 12, 2016, 2:22:07 AM

to snowpl...@googlegroups.com

Hi guys,

While implementing the real-time pipeline to handle a large volume of data, I ended up with this architecture:

AWS:

[Step 1]

-> Load balancer ->

[Step 2]

-> 3 collector instances ->

[Step 3]

-> Kinesis [6 shards for collector output] ->

[Step 4]

-> 3 enrichment instances ->

[Step 5]

-> Kinesis [6 shards for enrichment output] ->

[Step 6]

-> 1 sink instance, running 3 processes that sink into -> Elasticsearch [single node]

But during a debugging session to identify where I was "losing data", I realized that I could send the stdout of the collector in [Step 2] directly to an enrichment process on the same instance. That would cut the Kinesis stream with 6 shards at [Step 3] and eliminate the 3 dedicated instances at [Step 4].

The output of the enrichment process is still sent to Kinesis, only because I can't send data directly to Elasticsearch if my input comes from stdin.
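To make the wiring concrete, here is a minimal sketch of a stdout-to-stdin hand-off between two processes on one machine. The two commands are hypothetical stand-ins, small Python one-liners rather than the real collector and enrichment binaries, and `run_piped` is a name made up for this illustration.

```python
import subprocess
import sys

# Hypothetical stand-ins: in the real setup these would be the collector
# (writing events to stdout) and the enrichment process (reading stdin).
COLLECTOR_CMD = [sys.executable, "-c",
                 "for i in range(3): print('event-%d' % i)"]
ENRICH_CMD = [sys.executable, "-c",
              "import sys\n"
              "for line in sys.stdin: print('enriched:' + line.strip())"]


def run_piped(collector_cmd, enrich_cmd):
    """Connect collector stdout to enrich stdin; return enrich's output lines."""
    collector = subprocess.Popen(collector_cmd, stdout=subprocess.PIPE)
    enrich = subprocess.Popen(enrich_cmd, stdin=collector.stdout,
                              stdout=subprocess.PIPE, text=True)
    # Close the parent's copy of the pipe's read end, so the collector
    # receives SIGPIPE if the enrichment process exits early.
    collector.stdout.close()
    out, _ = enrich.communicate()
    collector.wait()
    return out.splitlines()


if __name__ == "__main__":
    for line in run_piped(COLLECTOR_CMD, ENRICH_CMD):
        print(line)
```

An OS pipe like this holds data only in memory, so anything in flight is gone if either process dies.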

Does this make sense? What are the downsides of this decision?

Thanks in advance,

André

Ihor Tomilenko

Apr 12, 2016, 2:33:20 AM

to Snowplow

Hi André,

We no longer actively support this group. Instead, we ask that you post any new topics or discussions to our newly created forum here: http://discourse.snowplowanalytics.com/

Could you please re-post this topic on Discourse? We will be happy to assist you there.

Regards,

Ihor
