If we want to store data about people we are related to, we use a family tree as the data structure. We choose a family tree as the data structure because we have information about people we are related to and how they are related, and we want an overview so that we can easily find a specific family member, several generations back.
Abstract Data Structures are higher-level data structures that are built using primitive data types and provide more complex and specialized operations. Some common examples of abstract data structures include arrays, linked lists, stacks, queues, trees, and graphs.
Algorithms are fundamental to computer programming as they provide step-by-step instructions for executing tasks. An efficient algorithm can help us to find the solution we are looking for, and to transform a slow program into a faster one.
The algorithms we will look at in this tutorial are designed to solve specific problems, and are often made to work on specific data structures. For example, the 'Bubble Sort' algorithm is designed to sort values, and is made to work on arrays.
Data structures and algorithms (DSA) go hand in hand. A data structure is not worth much if you cannot search through it or manipulate it efficiently using algorithms, and the algorithms in this tutorial are not worth much without a data structure to work on.
On the next page we will look at two different algorithms that prints out the first 100 Fibonacci numbers using only primitive data structures (two integer variables). One algorithm uses a loop, and one algorithm uses something called recursion.
Do you actually use data structures and algorithms on your day to day job? I've noticed a growing trend of people assuming algorithms are pointless questions that are asked by tech companies purely as an arbitrary measure. I hear more people complain about how all of this is a purely academic exercise. This notion was definitely popularized after Max Howell, the author of Homebrew, posted his Google interview experience:
While I've also never needed to use binary tree inversion, but I have come across everyday use cases of data structures and algorithms when working at Skype/Microsoft, Skyscanner and Uber. This included writing code and making decisions based on these concepts. But for the most part, I used this knowledge to understand how and why some things were built. Knowing these concepts made it easy to understand design and implementations referencing these.
This article is a set of real-world examples where data structures like trees, graphs, and various algorithms were used in production. I hope to illustrate that a generic data structures and algorithms knowledge is not "just for the interview" - but something that you'd likely find yourself reaching for when working at fast-growing, innovative tech companies. If you want to buy one book that teaches you everything you need to know for the majority of interviews, Grokking Algorithms is hands down the best guide.
I've used a very small subset of algorithms, but almost all data structures. It should be of no surprise that I am no fan of algorithm-heavy and non-practical interview questions with exotic data types like Red-Black trees or AVL trees. Never asked these, and never will. You can read about what I think about these interviews at the end of this article. Still, there is lots of value in being aware of what options for basic data types they can choose to tackle certain problems. With this, let's jump into examples.
When we built Skype of Xbox One, we worked on a barebones Xbox OS, that was missing key libraries. We were building one of the first full-fledged applications on the platform. We needed a navigation solution that we could hook up both to touch gestures and to voice commands.
We built a generic navigation framework on top of WinJS. To do so, we needed to maintain a DOM-like graph to keep track of the actionable elements. To find these elements, we did DOM traversal - basically, a tree traversal - across the existing DOM. This is a classic case of BFS or DFS (breadth-first search or depth-first search).
If you're doing web development, you already work with a tree structure: the DOM. All DOM nodes can have children and the browser renders nodes on-screen after traversing the DOM tree. If you are searching for a specific element, you can use built-in DOM methods to find it - such as getElementById - or you could implement a BFS or DFS search to go through all the nodes, similar to how it's done in this example.
Many frameworks that render UI elements use tree structures behind the scenes. React maintains a virtual DOM, and uses smart reconciliation - a "diffing" algorithm - to deliver great performance, by only re-rendering parts of the screen that's changed. Ryan Bas visualizes this process in his writeup on React reconciliation.
Uber's mobile architecture, RIBs uses trees as well - similar to most UI frameworks that render elements in a hierarchy. RIBs maintains a RIB tree for state management, attaching and detaching the RIBs that need to be rendered. When working with RIBs, we would sometimes sketch where the new RIBs would fit in the hierarchy, and discuss whether the RIBs in question should have views - making it a part of the view hierarchy - or just manage state.
If you ever find yourself needing to build visualization of hirearchical elements, a common approach is to use a tree-like structure, traverse the tree and render the elements you visit. I've come across many internal tools that use this approach. An example is the RIBs visualization tool built by Uber's Mobile Platform team, that you can see in a demo in this video.
Skyscanner finds the best deals on airline tickets. It does this by scanning all routes worldwide, then putting them together. While the nature of the problem is more on crawling, and less on caching - as airlines calculate the layover options - the multi-city planning option becomes the shortest path problem.
Multi-city was one of the features that took Skyscanner quite a bit of time to build - in all fairness, the difficulty was more on the product side, than anything. The best multi-city deals are calculated by using shortest path algorithms like Dijkstra or A*. Flight routes are represented as a directed graph, with each edge having a weight of the cost of the ticket. Calculating the cheapest price option between two cities was done via an implementation of a modified A* search algorithm per route. If you're interested in flights and shortest paths, the article on implementing the shortest flight search path using BFS by Sachin Malhotra is a good read.
With Skyscanner, the actual algorithm was far less important, though. Caching, crawling, and handling the varying website load were much more difficult things to crack. Still, a variation of the shortest paths problem comes up with many several travel companies that optimize for price based on combinations. Unsurprisingly, this topic was also a source of hallway discussions here.
Sorting is an algorithm family I rarely had an excuse to implement or needed to use in-depth. It's interesting to understand the different types of ways to sort, from bubble sort, insertion sort, merge sort, selection sort and - the most complex one - quicksort. Still, I found that there is rarely a reason to implement any of this, especially when you don't need to write sort functions as part of a library.
At Skype, I got to exercise a bit on this knowledge, though. One of the other engineers decided to implement an insertion sort for listing contacts. In 2013, when Skype connected to the network, contacts would arrive in bursts, and it would take some time for all the contacts to arrive. So this engineer thought it's more performant to build the contact list organized by name, using insertion sort. We had a back-and-forth on this, over why not just use the default sort algorithm. In the end, it was more work to properly test the implementation, and to benchmark it. I personally didn't see much point in doing so: but we were in the stage of the project that we had the time.
There are definitely some real-world use cases where efficient sorting matters, and having control over what type of sorting you use, based on the data, can make a difference. Insertion sort can be useful when streaming realtime data in large chunks and building realtime visualization for these data sources. Merge sort can work well with divide-and-conquer approaches if it comes to large amounts of data stored on different nodes. I've not worked with these, so I'll still mark sorting algorithms as something with little day-to-day use, beyond the appreciation of the different approaches.
The most frequent data structure I've used regularly was hashtables and the hashing function. It's such a handy tool from counting, to detecting duplications, to caching, all the way to distributed systems use cases like sharding. After arrays, it's easily the most common data structure I've used on countless occasions. Almost all languages come with this data structure, and it's simple to implement if you'd need it.
The stack data structure will be very familiar to anyone who has debugged a language that has a stack trace. As a data structure, I've had a few problems to use it for, but debugging and performance profiling makes me intricately familiar with it. It's also an obvious choice when traversing a tree depth-first.
I rarely had to choose queues as data structures for my code, but I came across it plenty of times in codebases, code popping, or pushing values in it. A common use case is implementing BFS traversing of trees, where the queue data structure lends itself. You can also use queues for a variety of other use cases. I once read code scheduling jobs that made good use of priority queues, running the shortest jobs first, using the Python heap queue algorithm.
b1e95dc632