lindfadi sanjaye markieta


Adrienne Borgman

Aug 2, 2024, 10:59:51 AM
to raltailangbo

Now that I understand what "streaming data" is, I understand what Kafka and Kinesis mean when they bill themselves as processing/brokering middleware for applications with streaming data. But it has piqued my interest: can/should "stream middleware" like Kafka or Kinesis be used for non-streaming data, like traditional message brokers? And vice versa: can/should traditional MQs like RabbitMQ, ActiveMQ, Apollo, etc. be used for streaming data?

Let's take an example where an application will be sending its backend a constant barrage of JSON messages that need to be processed, and the processing is fairly complex (validation, transforms on the data, filtering, aggregations, etc.):

Based on how Kafka/Kinesis are billed and on my understanding of what "streaming data" is, they seem to be obvious candidates for Cases #1 (contiguous video data) and #2 (contiguous time-series data). However, I don't see any reason why a traditional message broker like RabbitMQ couldn't efficiently handle both of these inputs as well.

And with Case #3, we're only provided with an event that has occurred and we need to process a reaction to that event. So to me this speaks to needing a traditional broker like RabbitMQ. But there's also no reason why you couldn't have Kafka or Kinesis handle the processing of event data either.

So basically, I'm looking to establish a rubric that says: I have X data with Y characteristics. I should use a stream processor like Kafka/Kinesis to handle it. Or, conversely, one that helps me determine: I have W data with Z characteristics. I should use a traditional message broker to handle it.

So I ask: What factors about the data (or otherwise) help steer the decision between stream processor or message broker, since both can handle streaming data, and both can handle (non-streaming) message data?

Kafka deals in ordered logs of atomic messages. You can view it sort of like the pub/sub mode of message brokers, but with strict ordering and the ability to replay or seek around the stream of messages at any point in the past that's still being retained on disk (which could be forever).
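The "ordered log with replay" idea can be sketched in a few lines of Python. This is a toy in-memory stand-in, not Kafka's API: the names `Log`, `append`, and `read_from` are made up for illustration, and a real topic partition adds persistence, retention, and replication on top.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Toy stand-in for one ordered, append-only log (not Kafka's API)."""
    records: list = field(default_factory=list)

    def append(self, message):
        """Append a message and return its offset (its position in the log)."""
        self.records.append(message)
        return len(self.records) - 1

    def read_from(self, offset, max_records=None):
        """Return messages starting at `offset`, in the order they were appended."""
        end = None if max_records is None else offset + max_records
        return self.records[offset:end]


log = Log()
for reading in ["t=0 temp=20", "t=1 temp=21", "t=2 temp=23"]:
    log.append(reading)

# Reading does not remove anything, so a reader can seek back and replay.
first_pass = log.read_from(0)
replay_tail = log.read_from(1)
```

The point of the sketch is that reading is non-destructive: the same offset can be read any number of times for as long as the record is retained.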

Kafka's flavor of streaming stands opposed to remote procedure call like Thrift or HTTP, and to batch processing like in the Hadoop ecosystem. Unlike RPC, components communicate asynchronously: hours or days may pass between when a message is sent and when the recipient wakes up and acts on it. There could be many recipients at different points in time, or maybe no one will ever bother to consume a message. Multiple producers could produce to the same topic without knowledge of the consumers. Kafka does not know whether you are subscribed, or whether a message has been consumed. A message is simply committed to the log, where any interested party can read it.
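A rough way to picture this decoupling is to let each reader keep its own position in a shared log. The sketch below is a simplification under assumed names (`Consumer`, `poll`, and the service names are hypothetical); in practice Kafka can also store committed offsets for consumer groups, but the log itself still does not track who has read what.

```python
class Consumer:
    """Hypothetical consumer that remembers its own position in a shared log."""

    def __init__(self, name):
        self.name = name
        self.offset = 0  # position is tracked by the consumer, not by the log

    def poll(self, log):
        """Return every message appended since this consumer last polled."""
        new_messages = log[self.offset:]
        self.offset = len(log)
        return new_messages


topic = []  # shared append-only log; it holds no list of subscribers

# Two producers write without knowing who, if anyone, will read.
topic.append("orders-service: order 1 created")
topic.append("billing-service: invoice 1 issued")

alerts = Consumer("alerts")
seen_by_alerts = alerts.poll(topic)         # reads both messages right away

topic.append("orders-service: order 2 created")

audit = Consumer("audit")                   # wakes up much later
seen_by_audit = audit.poll(topic)           # still sees all three messages
seen_by_alerts_again = alerts.poll(topic)   # only the one new message
```

Because nothing is removed on read, a late consumer and an early consumer see the same history, each at its own pace.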

Unlike batch processing, you're interested in single messages, not just giant collections of messages. (Though it's not uncommon to archive Kafka messages into Parquet files on HDFS and query them as Hive tables).

Case 1: Kafka does not preserve any particular temporal relationship between producer and consumer. It's a poor fit for streaming video because Kafka is allowed to slow down, speed up, move in fits and starts, etc. For streaming media, we want to trade away overall throughput in exchange for low and, more importantly, stable latency (otherwise known as low jitter). Kafka also takes great pains to never lose a message. With streaming video, we typically use UDP and are content to drop a frame here and there to keep the video running. The SLA on a Kafka-backed process is typically seconds to minutes when healthy, hours to days when unhealthy. The SLA on streaming media is in tens of milliseconds.
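To see why stable latency matters separately from average latency, here is a small worked example. The latency figures are invented, and treating jitter as the standard deviation of per-message latency is one common convention rather than the only definition.

```python
from statistics import mean, pstdev


def latency_profile(latencies_ms):
    """Return (average latency, jitter), with jitter as the standard deviation."""
    return mean(latencies_ms), pstdev(latencies_ms)


# Made-up per-message delivery latencies in milliseconds.
steady = [40, 42, 41, 39, 38]   # arrives at an even pace
bursty = [5, 5, 5, 5, 180]      # fast most of the time, then a long stall

steady_avg, steady_jitter = latency_profile(steady)
bursty_avg, bursty_jitter = latency_profile(bursty)
```

Both series average 40 ms, but the bursty one has far higher jitter. That kind of stall is tolerable for a log consumer that catches up later and is very visible in live video.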

Case 3: You can use Kafka for this kind of thing, and we do, but you are paying some unnecessary overhead to preserve ordering. Since you don't care about order, you could probably squeeze some more performance out of another system. If your company already maintains a Kafka cluster, though, probably best to reuse it rather than take on the maintenance burden of another messaging system.

RabbitMQ: general-purpose messaging..., often used to allow web servers to respond to requests quickly instead of being forced to perform resource-heavy procedures while the user waits for the result. Use it when you need existing protocols like AMQP 0-9-1, STOMP, MQTT, or AMQP 1.0.

It may sometimes be useful to use both! For example, in Use Case #2, if this were a stream of data from a pacemaker, I would have the pacemaker transmit heartbeat data to a RabbitMQ message queue (using a cool protocol like MQTT), where it is immediately processed to see if the source's heart is still beating. This could power a dashboard and an emergency response system. The message queue would also deposit the time-series data into Kafka so that we could analyse the heartbeat data over time. For example, we might implement an algorithm to detect heart disease by noticing trends in the heartbeat stream.
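The two-stage idea can be sketched with plain Python lists standing in for the queue and the log. Everything here is hypothetical: the function names, the message fields, and the `min_bpm=30` threshold are placeholders for illustration, and no real RabbitMQ or Kafka client is involved.

```python
def process_heartbeat(reading, archive, alerts, min_bpm=30):
    """Queue-side handler: react immediately, then archive the reading."""
    if reading["bpm"] < min_bpm:  # made-up threshold for the sketch
        alerts.append(f"ALERT patient={reading['patient']} bpm={reading['bpm']}")
    archive.append(reading)  # stand-in for producing to a Kafka topic


def bpm_trend(archive, patient):
    """Log-side analysis: change in bpm from first to last archived reading."""
    readings = [r["bpm"] for r in archive if r["patient"] == patient]
    return readings[-1] - readings[0] if len(readings) >= 2 else 0


archive, alerts = [], []
incoming = [
    {"patient": "p1", "t": 0, "bpm": 62},
    {"patient": "p1", "t": 1, "bpm": 64},
    {"patient": "p1", "t": 2, "bpm": 12},
]
for reading in incoming:  # stand-in for messages delivered by the queue
    process_heartbeat(reading, archive, alerts)

trend = bpm_trend(archive, "p1")
```

The handler covers the "react now" path, while the archive keeps every reading in order so that slower trend analysis can run over the full history whenever it likes.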

Who would have thought facilitating payments for Beanie Baby trades could be so lucrative? PayPal is the only acquisition on our list whose value we can precisely measure: eBay spun it off into a stand-alone public company in July 2015. Its value at the time? A cool 31x what eBay paid in 2002.

Welcome to season three, episode nine of Acquired, the show about technology acquisitions and IPOs. I'm Ben Gilbert. I'm David Rosenthal, and we are your hosts. Today we are back with the Acquired version of Terminator 2, the second part of our Netflix episode. You like that, David? It's just for you. Oh man, that's great.

That's great. I love it. Listeners, if you remember, in the last episode we covered the DVD saga of Netflix, and we left our heroes in 2009, shortly before the epic launch of Qwikster. So today we're going to dive in on the era of streaming and, later, original content. So David, I wanted to have a fun fact to start us off on Netflix.

So as you remember, they were once a plucky startup mailing DVDs to customers, and, you know, a remnant of the pre-dot-com bubble, starting in '97. And they were doing this even before most people had DVD players; they were waiting for the DVD wave to crest. This company now accounts for 15% of all internet traffic.

Oh, no, that's in my show notes. Well, sorry to blow your cover early, but you know, streaming movies and TV as a category actually now makes up 58% of downstream internet traffic, and no single service accounts for more of that bandwidth than Netflix does. At peak times it can even account for 40% of the U.S.'s concurrent internet traffic.

So you could imagine maybe like 8:00 PM Eastern or something like that. Absolutely incredible. Yeah. And this is with some of the best compression and optimization technology that humans as a species have figured out. The last episode was about a company fighting to get its first 500,000 customers, and this episode is very much about sort of global domination.

All right. Listeners, we announced on the last episode that we had formally launched the Acquired Limited Partner program, and we've been just totally floored by how many of you have joined our LP community and are listening to the bonus show and are sending us really great questions for doing Q&A on the show.

David, last week's episode was very fun, so I'm pumped. I got to meet Dan, and thanks for bringing him on. Yeah, it was super fun. We had Dan Hill, who in addition to being the co-founder and CEO of Alma, Wave's first portfolio company, was Airbnb's head of growth for a long time and had just great stories about growing Airbnb from, you know, Series B days to $30 billion plus.

And there's so much to learn from him, so it was really fun to have him on the LP show. Anyway, listeners, if you want to hear Dan talk about why Airbnb was successful sort of in this space and how they chose their metrics and a bunch of other great stuff, you can click the link in the show notes to support the show or go to glow.fm/acquired.

Glow.fm/acquired. I feel like we really need a jingle for that. We could just play that every time. Yeah. Acquired needs better jingles, period. That might be one of my holiday projects. Back to the show. Now, before we dive in, as always, listeners, I want to thank our sponsor for all of season three, Silicon Valley Bank.

We have with us today, Al Guerrero, a managing director in the Santa Monica office. So Al focuses on digital media, e-sports, gaming, and other interactive frontiers. Perfect for this episode. Al, thank you for joining us. One question for you. What opportunities do you see for startups in digital media today?

Yeah, as a lot of people know, it's been a tricky year for a lot of digital media companies, and we're seeing a lot of them look at expanding their business model by doing live events and launching subscription models. But the one that I really want to focus in on is the companies that are selling product.

And so what's happening is a lot of these digital media companies are leveraging their audience and their engagement with the audience to authentically sell a product. As an example, there's a company here in LA called Clique Brands that creates female fashion-related and beauty-related content.

They're leveraging the insights they get from their audience to then launch and develop clothing lines, and they've actually partnered with Target to successfully sell a clothing line, all leveraging the data and the insights that they're getting from their audience. And again, it's a very authentic, natural extension of the content that they're creating on a daily basis.

Awesome. Super interesting. Thank you, Al. And listeners, Al just published a great piece on Medium called "What's Next for Digital Media Startups" that goes deeper into this product idea, as well as all the other newer revenue models that digital media startups are trying to use today. So if you're into this topic, you should check it out.
