--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+unsubscribe@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
I drafted an implementation outline in kafka-streams to address
the problem of sliding-window reordering (to cater for late
messages within the time window), it also caters for
de-duplication:
You can implement something similar in akka-streams I believe.
First thing that comes to mind is to sink messages into a sorted
map (keyed by event-time timestamp and msg key pair) and then a
new periodic source picks them up - and connect the two with a
Flow.fromSinkAndSource. You'll need to take care of offset commits
- after the windowing de-duplication stage, i.e. on restart you
don't want to lose the messages buffered in the map.
Looking forward to ideas how to do this better.
Cheers,
Michał
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
Michal Borowiecki | |||||||||||||||||||||||||||||
Senior Software Engineer L4 | |||||||||||||||||||||||||||||
|
|
|
|||||||||||||||||||||||||||
This message is confidential and intended only for the addressee. If you have received this message in error, please immediately notify the postm...@openbet.com and delete it from your system as well as any copies. The content of e-mails as well as traffic data may be monitored by OpenBet for employment and security purposes. To protect the environment please do not print this e-mail unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company registered in England and Wales. Registered no. 3134634. VAT no. GB927523612 |
I drafted an implementation outline in kafka-streams to address the problem of sliding-window reordering (to cater for late messages within the time window), it also caters for de-duplication:
You can implement something similar in akka-streams I believe.
First thing that comes to mind is to sink messages into a sorted map (keyed by event-time timestamp and msg key pair) and then a new periodic source picks them up - and connect the two with a Flow.fromSinkAndSource. You'll need to take care of offset commits - after the windowing de-duplication stage, i.e. on restart you don't want to lose the messages buffered in the map.
Looking forward to ideas how to do this better.
Cheers,
Michał
On 23/06/17 17:42, Shiva Ramagopal wrote:
Hi,
I'm looking for the latest and greatest techniques and thoughts in stream deduplication and would love to know if anyone here has done this at scale. Specifically, I'm looking for deduping that also handles late-arriving messages.
In the past few days of my search, I've mostly come across ideas and implementations like
- Batching the stream based on time windows (non-overlapping) and deduping within the batch- Possible improvements on the above technique using overlaping time windows- HDFS-specific cases where a stream is consumed (pretty batchy), written to HDFS and deduped there
My source is Kafka, if that helps.
ThanksShiv
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+unsubscribe@googlegroups.com.