Proposal: Storage API for Aggregated API Servers

Marko Mudrinić

Apr 24, 2018, 3:24:29 AM
to K8s API Machinery SIG
Hello,

I'm Marko Mudrinić, one of the GSoC students, and I will be working on the Storage API for Aggregated API Servers project.
This post explains what we're trying to accomplish and what it could look like. We would love to hear your opinions, suggestions, and feedback on this topic!

Abstract

Kubernetes offers two ways to extend the core API: by using CustomResourceDefinitions or by setting up an aggregated API server. This means users don't need to modify the core API to add the features their workflow needs, which in turn keeps the core API more stable and secure.

One missing part is how to efficiently store the data used by aggregated API servers. This project implements a Storage API whose main goal is to share the cluster's main etcd server with aggregated API servers, allowing them to use the cluster's main etcd just as they would use their own etcd server.

Problems we're trying to solve

Currently, one of the biggest problems blocking the implementation and use of aggregated API servers is how to store the data they use. Convenient direct access to the Kubernetes storage (etcd) is not possible; to access etcd from an aggregated API server, you would need to obtain:
  • Network access from the API server to etcd
  • Certificates for accessing etcd
This is not always possible. For example, if you're using a Kubernetes as a Service solution (e.g. GKE), you cannot acquire access to etcd.

Besides accessibility, this is also a big security problem. Even if we obtain direct network access and certificates, the aggregated API server would have full access to etcd, allowing it to read and/or modify all of Kubernetes' data. This is an especially big problem if you want to use third-party aggregated API servers.

Proposal for solving the problem

The solution to this problem is to expose an API through which aggregated API servers access etcd.
Under this proposal, aggregated API servers would send their requests to kube-apiserver, which would then forward them to the new API proposed by this document.

By exposing an API, we can control what data the aggregated API server can access, which resolves the security problems mentioned above. To further improve the security of data in etcd, this project is going to use etcd namespaces to additionally isolate an API server's data from Kubernetes data. Each aggregated API server would only be able to access the etcd namespace assigned to it. Access to other namespaces and/or Kubernetes data would not be possible. To additionally ensure this, we're going to start an etcd gRPC proxy from the new Storage API, scoped to the API server's etcd namespace.
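
To illustrate the isolation idea, here is a minimal sketch in Go of how a namespace could confine an aggregated API server to one key prefix. It uses only the standard library and does not talk to etcd. The `Namespace` type, its methods, and the example prefixes are assumptions made up for this illustration; they are not part of the proposed Storage API or of any etcd library.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Namespace maps the keys of one aggregated API server into a dedicated
// prefix of the shared etcd keyspace. The type and its methods are
// hypothetical and exist only for this illustration.
type Namespace struct {
	prefix string // always ends with "/"
}

// NewNamespace returns a namespace rooted at prefix, e.g. "aggregated/metrics".
func NewNamespace(prefix string) (Namespace, error) {
	if prefix == "" || prefix == "/" {
		return Namespace{}, errors.New("namespace prefix must not be empty")
	}
	if !strings.HasSuffix(prefix, "/") {
		prefix += "/"
	}
	return Namespace{prefix: prefix}, nil
}

// Key translates a key as seen by the aggregated API server into the key
// that would actually be stored in the shared etcd.
func (n Namespace) Key(clientKey string) string {
	return n.prefix + strings.TrimPrefix(clientKey, "/")
}

// ClientKey is the reverse translation. It reports false for keys that live
// outside the namespace, so they are never handed back to the client.
func (n Namespace) ClientKey(storedKey string) (string, bool) {
	if !strings.HasPrefix(storedKey, n.prefix) {
		return "", false
	}
	return "/" + strings.TrimPrefix(storedKey, n.prefix), true
}

func main() {
	ns, err := NewNamespace("aggregated/metrics")
	if err != nil {
		panic(err)
	}
	// Every key written by the aggregated API server lands under its prefix.
	fmt.Println(ns.Key("/pods/default/web-0"))
	// A key outside the prefix (here a made-up Kubernetes key) is not visible.
	_, visible := ns.ClientKey("registry/secrets/default/token")
	fmt.Println(visible)
}
```

Because every key is rewritten on the way in and filtered on the way out, the aggregated API server has no way to name a key outside its own prefix, which is the property the namespaced proxy is meant to provide.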

Besides resolving the security problem, the project resolves configuration problems. To better understand how, let's take a look at two possible ways of exposing this API:
  • By providing it as a standalone API located between kube-apiserver and etcd. Because it sits between kube-apiserver and etcd, it's easier to ensure and maintain direct network access between them. Once provided with network access and certificates (when initializing the new Storage API), it can be used by any number of aggregated API servers without additional configuration.
  • By integrating it into kube-apiserver. This is a zero-configuration option, because the new Storage API would already have network access and certificates for etcd through the delegation chain.
The first step would be creating a standalone API; then, over the course of the project, we will work on integrating it into kube-apiserver if the community agrees.
This project also resolves backup problems, as a backup of your etcd cluster would include the aggregated API servers' data as well.

Project Goals

The goal of this project is to provide the API described above.
Besides using the cluster's main etcd, we want to allow operators to use this API with standalone etcd clusters as well. This is especially important if the cluster's main etcd is already under load and cannot also handle aggregated API servers: you can still use the same API as you would with the cluster's main etcd.

For example, this can be very useful for cloud providers, as they can use this API to provide etcd instances for their customers' aggregated API servers.

If the community agrees, we're going to integrate this API into kube-apiserver, to make it easier to ensure access to etcd for aggregated API servers.

One of the final goals is to allow installing aggregated API servers using tools such as Helm.

Project Non-Goals

The non-goals of this project include allowing operators to place quotas on and throttle access to etcd from aggregated API servers, to ensure Kubernetes can always work as expected.

General Information about the project

Mentors: David Eads (deads2k), Stefan Schimanski (sttts).

Daniel Smith

Apr 24, 2018, 1:23:07 PM
to mudrin...@gmail.com, K8s API Machinery SIG
Please take a look at my laundry list here: https://docs.google.com/document/d/1i0xzRFB-uGLmLYueLMBTpHrOot9ScFxpkkcVcZHVbyA/edit#heading=h.fp4auewgnkh0

I'm not sure how I feel about serving etcd's API from kube-apiserver. I think I'd want to do that from a separate component. (I'd want the regular apiserver to be namespaced, as well, for example.)

--
You received this message because you are subscribed to the Google Groups "K8s API Machinery SIG" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-api-m...@googlegroups.com.
To post to this group, send email to kubernetes-sig...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-api-machinery/dff22d7e-42dc-40a6-a6e1-027c16aaef45%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Eads

Apr 24, 2018, 2:19:45 PM
to Daniel Smith, Marko Mudrinić, K8s API Machinery SIG
Since this exposes the actual etcd API, it doesn't introduce a new storage layer.  It fulfills the requirement of "we only operate against etcd" by simply exposing a controlled etcd endpoint and doesn't need to attempt to improve the storage layer.

It is already possible for someone to namespace the etcd of their regular apiserver using the existing primitives and pointing their kube-apiserver to the namespaced proxy they configure.

Marko Mudrinić

Apr 25, 2018, 7:51:57 AM
to K8s API Machinery SIG
There are several pros and cons to integrating the API into kube-apiserver.
One of the biggest pros is configuration: there is no need to supply certificates, because it would get them from the delegation chain. The same goes for network access: as the API is part of kube-apiserver, it would have network access to etcd.

The biggest cons are related to security, but as long as we limit it appropriately, it could work. For example, if we allow the API to access only etcd namespaces and nothing outside them, the Kubernetes data would be safe. Also, by implementing quotas and throttling, we can make sure an aggregated API server can never use more resources than allowed.

Of course, the API is supposed to work as a standalone component, and that's how we are going to implement it in the beginning. Integration into kube-apiserver can come once we have an API that works in standalone mode, and only if we agree it's a good idea.

Brian Grant

Apr 25, 2018, 1:37:51 PM
to David Eads, Daniel Smith, mudrin...@gmail.com, K8s API Machinery SIG, Clayton Coleman, Eric Tune
On Tue, Apr 24, 2018 at 11:19 AM David Eads <de...@redhat.com> wrote:
Since this exposes the actual etcd API, it doesn't introduce a new storage layer. 

Yes, it does.

There are 2 interfaces that are relevant.
1. The interface between the "master" apiserver and the storage backend. We agreed not to change that in the foreseeable future: https://github.com/kubernetes/kubernetes/issues/1957
2. Storage for aggregated API servers. We agreed they need to run their own storage. Our recommendation was to use etcd: https://github.com/kubernetes/kubernetes/issues/46351

The intent of both was to not expose ANY API that we'd then be locked into.

It's also gRPC, and so far we're only using gRPC for cases that meet the following requirements:
  • The API is imperative
  • A single in-system client and small number of server implementations
  • Especially for local (same node) calls -- outside the authentication and authorization model for the rest of Kubernetes
This doesn't satisfy those requirements.

So, my response is stronger than Daniel's: No, we should absolutely not do this.