I recently gave a talk with this title at Sisu's Future Data conference, and since I think in prose and not Powerpoint, I had to write the blog post before I could put the slides together. It's taken me a bit to put the final polish on this and release it to the world, but I hope you find it valuable. If you'd like to watch the talk in full you can find it here.

Data products have drawn lots of attention, raised a lot of capital, and generated a lot of traction over the past decade. They've created a tremendous amount of change in how the most data-forward organizations are run---Stitch Fix is a million miles away from being a traditional clothing retailer and Airbnb does not at all resemble a traditional hotelier. And data products have fundamentally changed the careers of many of us data professionals, creating space for entirely new job titles and elevating once-menial roles into highly strategic career paths.

We have several talks lined up at Coalesce (next week!) on how data products have changed careers and teams: starting an analytics engineering team, structuring a data team, and adopting a product mindset.

But for all of this change, I feel like we've hit a bit of a plateau over the past few years. I've personally been working in the "modern data stack" now since late 2015---five whole years! And during that time, the set of products that make up this best-of-breed stack has been reasonably consistent (this list is certainly not exhaustive):

- Ingestion: Fivetran, Stitch
- Warehousing: Redshift, Snowflake, BigQuery
- Transformation: dbt
- BI: Looker

What's more, while there certainly have been incremental advances in each of these products in that timeframe, none of their core user experiences has fundamentally changed. If you fell asleep, Rip Van Winkle-style, in 2016 and woke up today, you wouldn't really need to update your mental model of how the modern data stack works all that much. More integrations, better window function support, more configuration options, better reliability... All of these are very good things, but they suggest a certain maturity, a certain stasis. What happened to the massive innovation we saw from 2012-2016?

To be clear, all of the above applies to dbt just as much as it does to any of the other products above. If you compare dbt-circa-2016 to dbt-circa-2020 you'll find that, while the modern version is far more powerful, the core user experience is very similar. My goal here is not to cast aspersions, but rather to attempt to understand the dynamics of the product ecosystem that all of us are building our careers on top of.

This feels important to me. Humans are tool-building and tool-using creatures---our tooling defines our capabilities, and has, for our entire history as a species. As such, the progress of tooling in this space could not be more relevant to us as practitioners. When I first used Redshift in 2015 I felt like I had been granted superpowers. When am I getting more?

When Fishtown Analytics moved into our new office in November of 2019, one of the first things I did was to hang a painting on the wall. It's a piece of modern art from the '70s called Redshift, and I bought it in an auction on Everything But The House because I loved the name. In my opinion, the modern data stack catalyzed around the release of Amazon Redshift in October of 2012, and hanging this massive painting at the entry to our office memorialized its historic importance.

While several of these products were founded prior to Redshift's launch, the launch is what made their growth take off. Using these products in conjunction with Redshift made users dramatically more productive. Looker on Postgres is fine, but Looker on Redshift is awesome.

This night-and-day difference is driven by the internal architectural differences between MPP (massively parallel processing) / OLAP systems like Redshift and OLTP systems like Postgres. A complete discussion of these internals is beyond the scope of this post, but if you're not familiar I highly recommend learning more about this, as it shapes nearly everything about the modern data stack today.
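The core intuition can be caricatured in a few lines of Python. This is a toy sketch, not Redshift's actual storage engine: it only shows why an analytical query that aggregates a single column is cheaper when data is laid out column-by-column (OLAP-style) rather than row-by-row (OLTP-style).

```python
# Toy illustration of columnar vs. row-oriented layout.
# (Illustrative only -- real MPP databases add compression,
# zone maps, and parallel execution across nodes.)

N = 100_000

# Row-oriented layout: one record per tuple. To aggregate one
# field, you still have to walk every full record.
rows = [
    {"order_id": i, "customer_id": i % 100, "amount": float(i)}
    for i in range(N)
]

# Column-oriented layout: one contiguous array per field.
columns = {
    "order_id": list(range(N)),
    "customer_id": [i % 100 for i in range(N)],
    "amount": [float(i) for i in range(N)],
}

# SELECT SUM(amount): the row store touches every record...
row_total = sum(r["amount"] for r in rows)

# ...while the column store scans a single array, which is also
# far friendlier to compression and CPU caches.
col_total = sum(columns["amount"])

assert row_total == col_total
```

The same answer comes out either way; the difference is how much data had to be touched to get it, which is what makes analytical queries orders of magnitude faster on columnar systems.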

While Redshift is a very capable MPP database, it wasn't the first. MPP databases had been popularized in the prior decade, and many of those products had (and have) fantastic performance. But Redshift was the first cloud-native MPP database, the first MPP database that you could buy for $160 / month instead of $100k+ / year. And with that reduction in price point, all of a sudden the floodgates opened. Redshift was, at the time, AWS' fastest-growing service ever.

10-1000x performance increases tend to change the way that you think about building products. Prior to the launch of Redshift, the hardest problem in BI was speed: trying to do relatively straightforward analyses could be incredibly time-consuming on top of even medium-sized datasets, and an entire ecosystem was built to mitigate this problem.

Before wrapping up this section, I just want to say that my statements about Redshift's historical significance shouldn't be taken as a stance on which data warehouse is best today. BigQuery didn't release standard SQL until 2016 and so wasn't widely adopted before then, and Snowflake's product wasn't mature until the 2017-2018 timeframe (IMHO). In fact, if you looked at a breakdown of usage between the three products circa 2016, I think you'd see Redshift's usage at 10x the other two combined. So, for those of us building products in the modern data stack, Redshift was the ocean from which we evolved.

If Redshift launched so much innovation from 2012-2016, why did things start to slow down? This has been something I've been mulling over since 2018, when I first started to viscerally feel this decline in the rate of change. I realized that the stack of products we were recommending to our consulting clients had stayed the same since the day we started Fishtown Analytics, which really bothered me. Were we missing out on some groundbreaking new products? Were we getting stale?

It turns out that this is a normal cycle for industries to go through. A major enabling technology gets released, it spurs a bunch of innovation in the space, and then these products go through a deployment process as companies adopt them. You can watch this happen in even the largest technological shifts in history. In fact, I just searched "cumulative miles of railroad track," grabbed some data, and voila!---an S curve.

Each technology individually goes through its own "S" curve, from development to deployment; as each round of technologies matures, it both attracts new customers and improves technically.
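The shape being described here is the logistic curve: slow early development, rapid deployment, then saturation. A minimal sketch (the constants below are illustrative, not fitted to the railroad data):

```python
import math

def adoption(t, saturation=100.0, rate=0.5, inflection=10.0):
    """Logistic "S" curve: adoption at time t.

    saturation -- the ceiling (e.g. total track miles ever built)
    rate       -- how steeply deployment ramps
    inflection -- the midpoint, where growth is fastest
    """
    return saturation / (1.0 + math.exp(-rate * (t - inflection)))

# Development phase: adoption is barely off the floor.
early = adoption(0)        # well under 1% of saturation

# Deployment phase: the inflection point, exactly half saturated.
midpoint = adoption(10)    # 50.0

# Maturity: the curve flattens out near its ceiling.
late = adoption(20)        # above 99% of saturation
```

The interesting consequence for the essay's argument: a technology can be improving fastest (the steep middle of the curve) precisely when its user experience looks the most static from the outside.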

What we saw between 2005 (when Vertica was released) and 2012 (when Redshift was released) was the early development phase for the MPP database---the beginning of its S curve. And from there, it's gone warehouse >> BI >> ingestion >> transformation. Note that we are still in the early days of this deployment curve!

When I inspect this theory as a user, it checks out. I can tell you from first-hand knowledge that the experience of using literally every one of the products I listed above has improved dramatically over the past four years. Yes, Fivetran and Stitch still move data from point A to point B, but their reliability has improved dramatically, as has their connector coverage. The same is true for the other layers of the stack as well. dbt, whose path I know quite well, has been completely rearchitected since 2016 to be more modular, more performant, and more extensible---all this while not changing the fundamental UX.

This is what it looks like to traverse up the S curve. Early adopters are forgiving, but technologies need to improve to be adopted by larger and larger audiences. The telegraph went through the same thing: Thomas Edison invented a telegraph multiplexer in 1874, thereby enabling Western Union to quadruple the throughput of its existing lines. Same telegraph, more throughput.

Seen through this frame, this is actually quite exciting. We're seeing these foundational technologies mature: to extend their coverage to more use cases, to become more reliable. These are exactly the things that need to happen to enable the next wave of innovation in the modern data stack, which will be unlocked by these now-foundational technologies.

Let's summarize real quick. We saw a tremendous amount of innovation immediately following the launch of Redshift in 2012, unlocking new levels of performance, new efficiencies, and new behaviors. We then saw a maturation period as these nascent products were deployed by the market, improved their technology, and rounded out their feature sets. By now, these products are ready to act as a foundation on which successive innovations can be built.

I'm not an oracle, but I do spend a lot of time thinking about this stuff and have lots of conversations with interesting people building and investing in products in the space. I think we can take useful clues from the state of the world today: both the good and the bad. The good aspects represent the places of strength, our solid foundation to build on, while the bad aspects represent opportunity areas.

Governance is a product area whose time has come. This product category encompasses a broad range of use cases, including discovery of data assets, viewing lineage information, and just generally providing data consumers with the context needed to navigate the sprawling data footprints inside of data-forward organizations. This problem has only been made more painful by the modern data stack to date, since it has become increasingly easy to ingest, model, and analyze more data.

While there have been commercial products in this space for some time (Collibra and Alation are most often cited), they tend to be focused on the enterprise buyer and therefore haven't seen the broad adoption that is true for the rest of the modern data stack. As such, most companies don't use a governance product today.

I've written a lot about this topic, as it's one that's very adjacent to the work we do with dbt. dbt actually has its own extremely lightweight governance interface---dbt Docs---and we anticipate doing a lot of work to extend this existing functionality in the coming years.
