Zanzibar Paper

Ramya Bradbury

Aug 3, 2024, 4:46:41 PM

to trucacstancil

Zanzibar is Google's purpose-built authorization system: a centralized authorization database that takes authorization queries from high-traffic apps and returns authorization decisions. An instance of Zanzibar hosts a list of permissions and responds to queries from many apps. Google presented a paper on this global authorization system at the 2019 USENIX Annual Technical Conference, and it has since become a popular resource for developers who are building authorization services.

Google has many high-traffic apps, like Search, Docs, Sheets, and Gmail. Google accounts are shared between those systems, so authorization decisions (that is, what actions a Google account can take) need to be coordinated. These apps operate at huge scales, so constant inter-service communication wasn't practical. Their authorization system needed to handle billions of objects shared by billions of users and needed to return results with very low latency. Also, their system needed to handle filtering questions, like "what documents can this user see?"

Zanzibar limits both user errors and system errors. To quote one of the designers, Lea Kissner, "The semantics in Zanzibar are very carefully designed to try and make it very difficult for you to shoot yourself in the foot." For a resource like a git repository, Zanzibar's API exposes who can see (or edit/delete/act upon) that repository, why they can see it, and how to stop it from being seen.

Zanzibar also limits system errors. Zanzibar is a distributed system, which means it takes time to propagate new permissions. To avoid data staleness, Zanzibar stores permissions in Google's Spanner database. Spanner provides strong consistency guarantees, so Zanzibar never applies old permissions to new content.
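The staleness problem can be illustrated with a toy versioned store. This is a minimal sketch under stated assumptions, not a description of how Spanner or Zanzibar work internally: `VersionedAcl`, `write`, and `check_at` are hypothetical names, and a plain integer counter stands in for a real timestamp. The idea shown is that a check should be evaluated at a snapshot at least as new as the content it protects.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedAcl:
    """Toy ACL log: every change is kept with the version at which it happened."""
    version: int = 0
    # Each entry is (version, user, allowed).
    log: list = field(default_factory=list)

    def write(self, user: str, allowed: bool) -> int:
        """Record a grant or revoke and return the new version number."""
        self.version += 1
        self.log.append((self.version, user, allowed))
        return self.version

    def check_at(self, user: str, snapshot: int) -> bool:
        """Evaluate access using only the changes visible at `snapshot`."""
        allowed = False
        for version, who, value in self.log:
            if version <= snapshot and who == user:
                allowed = value
        return allowed


acl = VersionedAcl()
v_grant = acl.write("alice", True)    # alice is granted access
v_revoke = acl.write("alice", False)  # alice is later removed

# Content saved after the revoke carries the version it was written at.
content_version = v_revoke

stale = acl.check_at("alice", snapshot=v_grant)          # too-old snapshot
fresh = acl.check_at("alice", snapshot=content_version)  # fresh enough
```

In this sketch the stale snapshot would still let alice in, while the check pinned to the content's version correctly denies her.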

Zanzibar uses several tricks to reduce latency. First, it uses several layers of caching. The outermost cache layer is Leopard, an indexing system built to respond quickly to authorization checks. Then, read requests are cached across the servers that store permissions. Also, calls between services inside Zanzibar are cached.
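The effect of caching check results can be sketched with Python's standard `functools.lru_cache`. This is only an illustration: `TUPLES`, `lookup_in_store`, and `cached_check` are hypothetical names, and putting a `snapshot` number in the cache key is an assumption made here so that a cached answer is never reused across a permission change.

```python
from functools import lru_cache

# Hypothetical permission data standing in for the backing store.
TUPLES = {("document:2", "viewer", "user:1")}

backend_lookups = 0  # counts how often the slow path runs


def lookup_in_store(obj: str, relation: str, user: str) -> bool:
    """Slow path: pretend this is a database read."""
    global backend_lookups
    backend_lookups += 1
    return (obj, relation, user) in TUPLES


@lru_cache(maxsize=1024)
def cached_check(obj: str, relation: str, user: str, snapshot: int) -> bool:
    """Fast path: identical checks at the same snapshot reuse one result."""
    return lookup_in_store(obj, relation, user)


first = cached_check("document:2", "viewer", "user:1", snapshot=7)
second = cached_check("document:2", "viewer", "user:1", snapshot=7)
third = cached_check("document:2", "viewer", "user:1", snapshot=8)
```

The second call is served from the cache, so the backing store is read only twice for three checks.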

On top of that, Zanzibar relies on some hand-tuning. In any authorization policy, some common permissions are used far more often than others. Zanzibar's team hand-tunes these hot spots, for instance by enabling cache prefetching.

Zanzibar is a centralized source of authorization decisions. That can be a useful approach for two reasons. First, it is a single source of truth. Each of your services can call Zanzibar and get a "yes" or "no" answer in response, and those answers are consistent between services. Second, each of those services calls the same API, which makes it easier to use across many services.

Zanzibar also supports reverse indexing (also known as data filtering). This means that after assigning a user many individual permissions, you can also ask, "what resources does this user have access to?" This is a common authorization request (e.g., for list endpoints). It's also useful for maintaining and debugging access controls.
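As a rough illustration of reverse indexing, the sketch below inverts a list of grants so that the "what can this user access?" question becomes a single lookup. The data and the names `GRANTS`, `build_reverse_index`, and `resources_for` are hypothetical; a production system would maintain such an index incrementally rather than rebuild it.

```python
from collections import defaultdict

# Hypothetical grants as (object, relation, user) triples.
GRANTS = [
    ("document:1", "viewer", "user:alice"),
    ("document:2", "owner", "user:alice"),
    ("document:2", "viewer", "user:bob"),
]


def build_reverse_index(grants):
    """Map each user to the set of (object, relation) pairs they hold."""
    index = defaultdict(set)
    for obj, relation, user in grants:
        index[user].add((obj, relation))
    return index


def resources_for(index, user, relation=None):
    """List objects a user can reach, optionally filtered by relation."""
    return sorted(
        obj for obj, rel in index.get(user, set())
        if relation is None or rel == relation
    )


index = build_reverse_index(GRANTS)
alice_docs = resources_for(index, "user:alice")
bob_owned = resources_for(index, "user:bob", relation="owner")
```

A list endpoint could call something like `resources_for` to return only the documents the caller is allowed to see.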

Zanzibar does not solve for enforcement; it only provides the "decision" part of authorization. You must determine where and when to call Zanzibar in your application. And when your application calls Zanzibar and gets an "access denied" decision, it's up to you to interpret that response and decide what should happen. A Zanzibar-like system gives you no drop-in is_allowed?() enforcement hook inside your application code.
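To make the decision/enforcement split concrete, here is a minimal sketch of application-side enforcement. Everything in it is hypothetical: `CheckFn` is an assumed shape for a decision client, `fake_check` stands in for a remote call, and `get_document` is an invented request handler. The point is that the application decides what a denial turns into.

```python
from typing import Callable

# Hypothetical signature for a decision service client.
CheckFn = Callable[[str, str, str], bool]


def get_document(user: str, doc_id: str, check: CheckFn) -> tuple[int, str]:
    """Application-side enforcement: ask for a decision, then act on it."""
    if not check(f"document:{doc_id}", "viewer", user):
        # The application, not the decision service, chooses the response.
        return 403, "Forbidden"
    return 200, f"contents of document {doc_id}"


def fake_check(obj: str, relation: str, user: str) -> bool:
    """Stand-in for a remote decision call; only alice can view document 2."""
    return (obj, relation, user) == ("document:2", "viewer", "user:alice")


allowed = get_document("user:alice", "2", fake_check)
denied = get_document("user:bob", "2", fake_check)
```

Returning a 403 is just one choice; the same handler could instead hide the resource with a 404 or redirect to an access-request page.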

One of Zanzibar's defining characteristics is data centralization. Implementing a Zanzibar-like model requires centralizing all the data you would ever need into a core database service (in Google's case, Spanner). Google has the resources to do this, but not every company can handle this type of tradeoff. We talk through these tradeoffs in our technical overview of Zanzibar, "Google Zanzibar for the rest of us."

Finally, Zanzibar is a major technical investment. Building your own Zanzibar takes at least a year of effort from a dedicated team. Airbnb's Himeji (a Zanzibar-alike) took more than a year of engineering work from a dedicated team. Using Zanzibar also takes engineering effort. At Google, the service is supported by a full-time team of engineers, plus several engineers from each service that uses Zanzibar. Most apps that use Zanzibar-like systems require hand-tuning to avoid hot spots.

Engineering teams are increasingly adopting services for core infrastructure components, and this applies to authorization too. There are a number of authorization-as-a-service options available to those who want something like what Google made available to its internal engineers via Zanzibar.

Determining whether online users are authorized to access digital objects is central to preserving privacy. This paper presents the design, implementation, and deployment of Zanzibar, a global system for storing and evaluating access control lists. Zanzibar provides a uniform data model and configuration language for expressing a wide range of access control policies from hundreds of client services at Google, including Calendar, Cloud, Drive, Maps, Photos, and YouTube. Its authorization decisions respect causal ordering of user actions and thus provide external consistency amid changes to access control lists and object contents. Zanzibar scales to trillions of access control lists and millions of authorization requests per second to support services used by billions of people. It has maintained 95th-percentile latency of less than 10 milliseconds and availability of greater than 99.999% over 3 years of production use.

Google published the Zanzibar whitepaper back in 2019, and it quickly gained attention. In fact, some companies, like Airbnb and Carta, started to shift their legacy authorization structures to Zanzibar-style systems.

In addition to these shifts at large tech companies, the number of Zanzibar-based solutions has grown over time. We're one of them, building an open source authorization service based on Google Zanzibar.

Zanzibar is Google's global authorization system that handles who can access or modify content across all its services and applications. It's a powerful and unified solution designed to keep things secure, consistent, and scalable. Think of it as the central brain that decides who gets to do what in Google's ecosystem.

Let me break it down for you. Zanzibar uses Access Control Lists (ACLs) to manage permissions. Imagine ACLs as lists of rules that specify who can perform certain actions on different objects. For instance, an ACL might say, "User A can view Document B" or "User C can edit File D".

Whenever you try to access a Google service, Zanzibar quickly checks these ACLs to see if you have the right permissions. This way, it ensures that only the right people can access or change the content, keeping everything secure and consistent across all Google services.

So, now you know what Zanzibar is. But why did Google need such a system? In the next section, we'll discuss the challenges Google faced in managing permissions across its many services and how Zanzibar solves these problems.

Flexibility: Zanzibar is like a Swiss Army knife for access control. It supports a wide array of access control policies, catering to both consumer and enterprise applications. This flexibility allows Google to define and extend different access control methods within one unified system, making it easier for applications to work together and manage permissions efficiently.

Low Latency: Speed is crucial. Authorization checks are often in the critical path of user interactions. Zanzibar makes these checks lightning-fast, even during peak times. This is especially important for things like search results, where hundreds of checks might be needed in a split second.

Security: By centralizing and standardizing authorization checks, Zanzibar enhances security across all Google services. It supports complex policies and maintains strict security standards, ensuring that all access control decisions are accurate and secure.

At the heart of Zanzibar is the concept of relational tuples, which are simple statements that define relationships between users and objects. These tuples look like "User X has relation Y to Object Z." For instance, "User 1 owns Document 2" is represented as document:2#owner@user:1.
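The tuple notation above is easy to handle in code. The following is a minimal sketch, assuming the `object#relation@subject` layout shown in the example; `RelationTuple` and `parse_tuple` are hypothetical helper names, not part of any Zanzibar API.

```python
from typing import NamedTuple


class RelationTuple(NamedTuple):
    """One 'subject has relation to resource' statement."""
    resource: str  # e.g. "document:2"
    relation: str  # e.g. "owner"
    subject: str   # e.g. "user:1"

    def __str__(self) -> str:
        return f"{self.resource}#{self.relation}@{self.subject}"


def parse_tuple(text: str) -> RelationTuple:
    """Split 'resource#relation@subject' into its three parts."""
    resource, hash_sep, rest = text.partition("#")
    relation, at_sep, subject = rest.partition("@")
    if not (hash_sep and at_sep and resource and relation and subject):
        raise ValueError(f"not a relation tuple: {text!r}")
    return RelationTuple(resource, relation, subject)


owner = parse_tuple("document:2#owner@user:1")
round_trip = str(owner)
```

Because only the first `#` and the first `@` are used as separators, a subject that itself contains a `#` (for example a group's member set) is kept intact in the `subject` field.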

By centralizing authorization, Zanzibar provides a consistent and secure way to manage permissions across all Google services. This centralization simplifies the process of defining and enforcing access control policies, making it easier to maintain security and consistency.

At the core of Zanzibar is its data model, which uses relational tuples to represent permissions. Each tuple is a simple statement that defines a relationship between a user and an object. The tuples are stored in a centralized database, which makes permissions quick to look up.

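Zanzibar provides several APIs that allow applications to interact with the authorization system. The paper describes five: read, write, watch, check, and expand. Together they cover reading and writing relational tuples, checking permissions, monitoring changes, and expanding the effective set of users for an object.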
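To give a feel for how such an API surface fits together, here is a toy in-memory sketch covering write, read, check, and watch. It is an assumption-heavy illustration: the class `TupleStore` and its method signatures are invented for this example, revisions are simple counters, and `check` only matches tuples that were written directly, whereas a real Zanzibar-like system also follows group membership and configured rewrite rules.

```python
class TupleStore:
    """In-memory stand-in for a tuple service (hypothetical API)."""

    def __init__(self):
        self._tuples = set()
        self._changelog = []  # ordered (revision, operation, tuple) records

    def write(self, resource: str, relation: str, subject: str) -> int:
        """Add a tuple and return the revision at which it was recorded."""
        entry = (resource, relation, subject)
        self._tuples.add(entry)
        revision = len(self._changelog) + 1
        self._changelog.append((revision, "add", entry))
        return revision

    def read(self, resource: str) -> list:
        """Return every stored tuple for one resource, in sorted order."""
        return sorted(t for t in self._tuples if t[0] == resource)

    def check(self, resource: str, relation: str, subject: str) -> bool:
        """Direct-match check only; no group or rewrite evaluation."""
        return (resource, relation, subject) in self._tuples

    def watch(self, since_revision: int = 0) -> list:
        """Return changes recorded after `since_revision`."""
        return [c for c in self._changelog if c[0] > since_revision]


store = TupleStore()
rev1 = store.write("document:2", "owner", "user:1")
rev2 = store.write("document:2", "viewer", "user:3")

can_view = store.check("document:2", "viewer", "user:3")
can_own = store.check("document:2", "owner", "user:3")
doc_tuples = store.read("document:2")
new_changes = store.watch(since_revision=rev1)
```

A client that remembers the last revision it saw can poll `watch` to keep a local index or cache in step with the store.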

Implementing a system like Zanzibar involves several high-level steps to ensure that it is scalable, consistent, and efficient. While we won't dive into the actual coding here, we'll provide an overview of what it takes to build such a system. For a more detailed example, check out our blog post "Exploring Google Zanzibar: A Demonstration of its Basics."

This post is a somewhat lengthy TL;DR version of the original Zanzibar Paper. If you want to learn more about Zanzibar, we've also written a post where we implemented the basics of Google Zanzibar.