Edward Ribeiro organized our last discussion of Designing Data-Intensive Applications, and it was a great discussion!
Here are a couple of my takeaways.
Chapter 11: Stream Processing
This was a good review of the history of streaming. The author first goes over the traditional (now somewhat dated) publish/subscribe model of messaging systems, and then moves on to the newer shift toward log-based message systems. I like the simplicity of the new systems compared with the complexity of the old ones. A log-based system offers fewer guarantees and fewer ways of working with messages, but there is strength in that simplicity: the simpler patterns are sufficient for most things you would want to do, and they are much easier to reason about.
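To make the "log-style" idea concrete, here is a minimal in-memory sketch, assuming a single partition and a hypothetical `Log`/`Consumer` pair (not any real broker's API). The broker only appends messages and serves reads by offset; each consumer tracks its own position, so several consumers can read the same log independently, and replaying is just rewinding the offset.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Hypothetical append-only log (a single partition), kept in memory."""
    messages: list = field(default_factory=list)

    def append(self, message) -> int:
        """Append a message and return its offset."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_count: int = 10) -> list:
        """Return up to max_count messages starting at offset; nothing is deleted."""
        return self.messages[offset:offset + max_count]


@dataclass
class Consumer:
    """Hypothetical consumer: the only state it needs is its current offset."""
    log: Log
    offset: int = 0

    def poll(self, max_count: int = 10) -> list:
        batch = self.log.read(self.offset, max_count)
        self.offset += len(batch)  # "commit" by advancing the offset
        return batch


log = Log()
for event in ["signup", "login", "purchase"]:
    log.append(event)

billing = Consumer(log)
analytics = Consumer(log)
print(billing.poll())               # ['signup', 'login', 'purchase']
print(analytics.poll(max_count=1))  # ['signup']
analytics.offset = 0                # rewind to replay from the beginning
print(analytics.poll())             # ['signup', 'login', 'purchase']
```

Compared with a traditional broker, there are no per-message acknowledgements or redelivery rules here, which is exactly the "fewer guarantees, easier to reason about" trade-off.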
One thing I was looking for in the chapter was how people handle stream processing when they need to join a stream to another data source: either another stream (to find related messages) or a database (to enrich messages). It turns out there's no special sauce here and no free lunch. If you want to join against another stream, you need to keep a window of recent events from both streams and emit a joined event whenever you find related messages. Similarly, when enriching stream messages with data from a database, the best approach is to copy the entire database into the process that handles the messages (and keep that copy up to date from the database's changelog) so that you can make fast, local joins against the data. (Ouch...)
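Both joins can be sketched in a few lines of Python. This is a simplified illustration, not how any particular stream processor implements them: it assumes events are `(timestamp, key, value)` tuples, uses hypothetical search/click and user data, and simulates arrival by merging the two streams in timestamp order. `window_join` buffers recent events from each side and evicts anything older than the window; `LocalTableCopy` is a hypothetical local replica of a database table that is updated by change events and used to enrich stream messages.

```python
from collections import deque


def window_join(left, right, window):
    """Join two streams of (ts, key, value) on key, within `window` time units."""
    # Simulate arrival order by merging both streams by timestamp.
    events = sorted(
        [(ts, "L", key, val) for ts, key, val in left]
        + [(ts, "R", key, val) for ts, key, val in right],
        key=lambda event: event[0],
    )
    buffers = {"L": deque(), "R": deque()}  # recent events per stream, oldest first
    results = []
    for ts, side, key, val in events:
        other = "R" if side == "L" else "L"
        # Evict buffered events that have fallen out of the time window.
        for buf in buffers.values():
            while buf and buf[0][0] < ts - window:
                buf.popleft()
        # Emit a joined event for every buffered match on the other stream.
        for _, other_key, other_val in buffers[other]:
            if other_key == key:
                pair = (val, other_val) if side == "L" else (other_val, val)
                results.append((key, *pair))
        buffers[side].append((ts, key, val))
    return results


class LocalTableCopy:
    """Hypothetical local copy of a database table, kept current by change events."""

    def __init__(self, snapshot):
        self.rows = dict(snapshot)

    def apply_change(self, key, row):
        if row is None:
            self.rows.pop(key, None)  # a deletion in the source database
        else:
            self.rows[key] = row      # an insert or update

    def enrich(self, event):
        user_id, action = event
        return (user_id, action, self.rows.get(user_id))  # local lookup, no network call


searches = [(0, "s1", "search:shoes"), (3, "s2", "search:hats")]
clicks = [(2, "s1", "click:shoe-42"), (20, "s2", "click:hat-7")]
matches = window_join(searches, clicks, window=10)
print(matches)  # [('s1', 'search:shoes', 'click:shoe-42')]; s2's click came too late

users = LocalTableCopy({"u1": {"country": "BR"}})
users.apply_change("u2", {"country": "US"})
enriched_u2 = users.enrich(("u2", "login"))
users.apply_change("u1", None)
enriched_u1 = users.enrich(("u1", "login"))
```

The memory cost is the "ouch": the window join holds every event from the last `window` time units, and the table copy holds the whole table.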
Chapter 12: The Future of Data Systems
I have not read this chapter yet, but from Edward's description it is going to serve a really neat purpose in framing the rest of the book. Throughout the book, the author discusses big chunks of infrastructure (databases, streams, batch processors), but in chapter 12 he reveals that he actually thinks of an entire infrastructure as analogous to one giant, unified database. For example, a log-based message system is really just a write-ahead log, like the one MySQL uses to ensure durability. And a search engine serves as a sort of cache for the "infrastructure database". So I'm looking forward to reading this chapter and then reconsidering how all the other chapters are really talking about pieces of the "infrastructure as a database". ... Maybe I should have read this chapter first!
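A toy version of the "infrastructure as a database" framing, under some loud assumptions: the changelog is a hypothetical list of `put`/`delete` events, and the two view builders are illustrative names, not any real system's API. The log plays the role of the write-ahead log (the system of record), and both the key-value "table" and the search index are derived views computed purely by replaying it. For brevity the index is rebuilt from scratch rather than updated incrementally.

```python
from collections import defaultdict

# The system of record: an ordered log of change events (hypothetical data).
changelog = [
    ("put", "doc1", "stream processing with logs"),
    ("put", "doc2", "batch processing with logs"),
    ("put", "doc1", "stream joins and windows"),
    ("delete", "doc2", None),
]


def build_kv_view(log):
    """Derived view 1: latest value per key, like the primary table of a database."""
    table = {}
    for op, key, value in log:
        if op == "put":
            table[key] = value
        else:
            table.pop(key, None)
    return table


def build_search_index(log):
    """Derived view 2: an inverted index (word -> doc keys), like a search engine."""
    index = defaultdict(set)
    for key, text in build_kv_view(log).items():
        for word in text.split():
            index[word].add(key)
    return dict(index)


kv = build_kv_view(changelog)
index = build_search_index(changelog)
print(kv)               # {'doc1': 'stream joins and windows'}
print(index["stream"])  # {'doc1'}
```

If either view is lost or needs a new shape, it can be rebuilt by replaying the same log, which is what makes the log behave like a database's write-ahead log rather than just a message pipe.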
Thanks for leading the discussion, Edward! Thanks for your input, Chang!