A data structure is not only used for organizing the data. It is also used for processing, retrieving, and storing data. There are different basic and advanced types of data structures that are used in almost every program or software system that has been developed. So we must have good knowledge about data structures.
In programming, an algorithm is a set of steps for solving a known problem. The problems solved by an algorithm could be sorting a set of data, searching through available data, or even encrypting data.
A data structure is a named location that can be used to store and organize data. And, an algorithm is a collection of steps to solve a particular problem. Learning data structures and algorithms allow us to write efficient and optimized computer programs.
Do you actually use data structures and algorithms on your day to day job? I've noticed a growing trend of people assuming algorithms are pointless questions that are asked by tech companies purely as an arbitrary measure. I hear more people complain about how all of this is a purely academic exercise. This notion was definitely popularized after Max Howell, the author of Homebrew, posted his Google interview experience:
While I've also never needed to use binary tree inversion, but I have come across everyday use cases of data structures and algorithms when working at Skype/Microsoft, Skyscanner and Uber. This included writing code and making decisions based on these concepts. But for the most part, I used this knowledge to understand how and why some things were built. Knowing these concepts made it easy to understand design and implementations referencing these.
This article is a set of real-world examples where data structures like trees, graphs, and various algorithms were used in production. I hope to illustrate that a generic data structures and algorithms knowledge is not "just for the interview" - but something that you'd likely find yourself reaching for when working at fast-growing, innovative tech companies. If you want to buy one book that teaches you everything you need to know for the majority of interviews, Grokking Algorithms is hands down the best guide.
I've used a very small subset of algorithms, but almost all data structures. It should be of no surprise that I am no fan of algorithm-heavy and non-practical interview questions with exotic data types like Red-Black trees or AVL trees. Never asked these, and never will. You can read about what I think about these interviews at the end of this article. Still, there is lots of value in being aware of what options for basic data types they can choose to tackle certain problems. With this, let's jump into examples.
Many frameworks that render UI elements use tree structures behind the scenes. React maintains a virtual DOM, and uses smart reconciliation - a "diffing" algorithm - to deliver great performance, by only re-rendering parts of the screen that's changed. Ryan Bas visualizes this process in his writeup on React reconciliation.
Multi-city was one of the features that took Skyscanner quite a bit of time to build - in all fairness, the difficulty was more on the product side, than anything. The best multi-city deals are calculated by using shortest path algorithms like Dijkstra or A*. Flight routes are represented as a directed graph, with each edge having a weight of the cost of the ticket. Calculating the cheapest price option between two cities was done via an implementation of a modified A* search algorithm per route. If you're interested in flights and shortest paths, the article on implementing the shortest flight search path using BFS by Sachin Malhotra is a good read.
There are definitely some real-world use cases where efficient sorting matters, and having control over what type of sorting you use, based on the data, can make a difference. Insertion sort can be useful when streaming realtime data in large chunks and building realtime visualization for these data sources. Merge sort can work well with divide-and-conquer approaches if it comes to large amounts of data stored on different nodes. I've not worked with these, so I'll still mark sorting algorithms as something with little day-to-day use, beyond the appreciation of the different approaches.
The most frequent data structure I've used regularly was hashtables and the hashing function. It's such a handy tool from counting, to detecting duplications, to caching, all the way to distributed systems use cases like sharding. After arrays, it's easily the most common data structure I've used on countless occasions. Almost all languages come with this data structure, and it's simple to implement if you'd need it.
The stack data structure will be very familiar to anyone who has debugged a language that has a stack trace. As a data structure, I've had a few problems to use it for, but debugging and performance profiling makes me intricately familiar with it. It's also an obvious choice when traversing a tree depth-first.
I rarely had to choose queues as data structures for my code, but I came across it plenty of times in codebases, code popping, or pushing values in it. A common use case is implementing BFS traversing of trees, where the queue data structure lends itself. You can also use queues for a variety of other use cases. I once read code scheduling jobs that made good use of priority queues, running the shortest jobs first, using the Python heap queue algorithm.
One of the most difficult and interesting problems to solve at Uber is how to optimize the pricing of trips, and the dispatching of partners. Prices can be dynamic, and drivers are constantly on the move. H3 is a grid system engineers built to both visualize and analyze data across cities, at an increasingly granular level. The data - and visualization - structure for this is a hexagonal grid with hierarchical indexes, and a couple of internal visualization tools are built on top of it.
The data structure has specific indexing, traversal, hierarchical grid, region, and unidirectional edge functions, detailed in the API reference. For a more detailed deep-dive, see the article on the H3 library, the source code, or the presentation on how and why this tool was built.
What I really liked about this experience is learning that creating your own specialized data structures can make sense in a niche area. There aren't many use cases where hexagonal grids with hierarchical indexes would make sense beyond combining mapping with various data levels within each cell. Still, if you're familiar with some data structures, understanding this new data structure is much easier - as it would be for you to design yet another data structure for a similarly specialized need.
Those were the highlights of the actual data structures and algorithms I've come across professionally between multiple companies and many years. So let's go back to the original tweet that complained about asking things like inverting a binary tree on a whiteboard. I'm on Max's side on this one.
Knowing how complex algorithms or exotic data structures work are not something you should need to know to work at a tech company. You should know what an algorithm is, and should be able to come up with simple ones on your own, like a greedy one, a breadth-first or depth-first search or a simpler sorting algorithm. You should also know about basic data structures that are pretty common, like hashtables, queues, or stacks. But more complicated algorithms like Quicksort, Dijkstra, A* and more advanced ones are not things you'd need to memorize: you'll have a reference for this. The most I did with algorithms beyond sorting was look them up and understand them myself. Same with exotic data structures like Red-Black trees or AVL trees. I've never had to use these data structures. Even if I did, I would look them up again. I have never asked questions that needed this kind of knowledge to solve them, and never will.
I am all for asking practical coding exercises, where there are many good solutions, from brute force or greedy approaches to potentially more sophisticated ones. For example, asking to implement a justify text function like this question is a fair one: it's something I did when building a text rendering component on Windows Phone. You can solve this problem just by using an array and a few if/else statements, without any fancy data structures.
The reality is that many teams and companies are going overboard with algorithmic challenges. I can see the appeal of algorithmic questions: they give you signal in 45 minutes or less, and questions can be easily swapped around; thus, there is little harm if the question leaks. They are also easy to scale when recruiting, as you can have a question pool of 100+ questions, and any interviewer can evaluate any of them. Especially in Silicon Valley, it is more and more common to hear questions geared for dynamic programming or exotic data structures. These questions will help hire strong engineers - but also result in turning away people who would have excelled at a job that doesn't need advanced algorithms knowledge.
To anyone reading whose company has a bar to hire people who know some of the advanced algorithms by heart: think again if this is what you need. I've hired fantastic teams at Skyscanner London and Uber Amsterdam without any tricky algorithm questions, covering no more than data structures and problem solving. You shouldn't need to know algorithms by heart. What you do need is awareness of the most common data structures and the ability to come up with simple algorithms to solve the problem at hand, as a toolset.
If you work in a fast-moving, innovative tech company, you will almost certainly come across all sorts of data structures and different algorithm implementations in the codebase. When you build new and innovative solutions, you will often find yourself reaching to the right data structure. This is when you'll want to be aware of the options to choose from and their tradeoffs.
aa06259810