I'm Marko Mudrinić, one of the GSoC students, and I will be working on the Storage API for Aggregated API Servers project.
This post explains what we're trying to accomplish and how we think it could look. We would love to hear your opinions, suggestions and feedback on this topic!
Kubernetes offers two ways to extend the core API: by using CustomResourceDefinitions or
by setting up an aggregated API server. This means users don't need to modify the core API in
order to add the features needed for their workflow, which in turn keeps the core API more stable
and secure.
One missing piece is how to efficiently store the data used by aggregated API servers. This project
implements a Storage API whose main goal is to share the cluster's main etcd server with
aggregated API servers, allowing them to use the cluster's main etcd just as they would use their own etcd.
Problems we're trying to solve
Currently, one of the biggest problems blocking the adoption and implementation of aggregated API servers is how to store their data. Direct access to the Kubernetes storage, etcd, is not convenient: to access etcd from an aggregated API server, you would need to obtain:
- Network access from the API server to etcd
- Certificates for accessing etcd
This is not always possible. For example, if you're using a Kubernetes-as-a-Service solution (e.g. GKE), you can't acquire access to etcd.
Besides accessibility, this also represents a big security problem. Even if we obtain direct network access and certificates, the aggregated API server would have full access to etcd, allowing it to read and/or modify all Kubernetes data. This is an especially big problem if you want to use third-party aggregated API servers.
Proposal for solving the problem
The proposed solution is to expose an API through which aggregated API servers can access etcd.
Under this proposal, aggregated API servers would send their requests to kube-apiserver, which would then forward them to the new Storage API.
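The request flow can be sketched as a minimal, stdlib-only Go model, assuming kube-apiserver forwards storage traffic with a plain reverse proxy. The `/storage/` path prefix, the `newForwarder` helper and both test servers are hypothetical stand-ins; the proposal does not specify the actual path or transport.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// storagePrefix is a hypothetical path under which kube-apiserver would
// expose the Storage API; the proposal does not define the real path.
const storagePrefix = "/storage/"

// newForwarder plays the role of kube-apiserver: requests under storagePrefix
// are forwarded to the Storage API backend, everything else is served locally.
func newForwarder(storageAPI *url.URL, local http.Handler) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(storageAPI)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, storagePrefix) {
			proxy.ServeHTTP(w, r)
			return
		}
		local.ServeHTTP(w, r)
	})
}

// get performs a GET request and returns the response body as a string.
func get(u string) (string, error) {
	resp, err := http.Get(u)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// demo wires a fake Storage API behind a fake kube-apiserver and issues one
// request that should be forwarded and one that should not.
func demo() (forwarded, local string, err error) {
	storage := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "storage-api:%s", r.URL.Path)
	}))
	defer storage.Close()

	target, err := url.Parse(storage.URL)
	if err != nil {
		return "", "", err
	}
	apiserver := httptest.NewServer(newForwarder(target, http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { fmt.Fprint(w, "kube-apiserver") })))
	defer apiserver.Close()

	if forwarded, err = get(apiserver.URL + "/storage/widgets/foo"); err != nil {
		return "", "", err
	}
	local, err = get(apiserver.URL + "/api/v1/pods")
	return forwarded, local, err
}

func main() {
	forwarded, local, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(forwarded) // storage-api:/storage/widgets/foo
	fmt.Println(local)     // kube-apiserver
}
```

The aggregated API server only ever talks to kube-apiserver; it never needs its own route or credentials to etcd.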
By exposing an API, we can control what data each aggregated API server can access, which resolves the security problems mentioned above. To further protect the data in etcd, this project is going to use etcd namespaces to additionally isolate each API server's data from Kubernetes data. Each aggregated API server would only be able to access the etcd namespace assigned to it; access to other namespaces and to Kubernetes data would not be possible. To enforce this, we're going to start an etcd gRPC proxy from the new Storage API to the API server's etcd namespace.
Besides the security problem, the project also resolves the configuration problem. To see how, let's look at two possible ways of exposing this API:
- By providing it as a standalone API located between kube-apiserver and etcd. Because it sits between kube-apiserver and etcd, it's easier to ensure and maintain direct network access to etcd. Once it has been provided with network access and certificates (when the new Storage API is initialized), it can be used by any number of aggregated API servers without additional configuration.
- By integrating it into kube-apiserver. This is a zero-configuration option: once integrated into kube-apiserver, the new Storage API would already have network access and certificates for etcd through the delegation chain.
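The standalone option could be configured along these lines. This sketch assumes a hypothetical `StorageAPIConfig` type; the field names, file paths and the rule for deriving one namespace per aggregated API server are illustrative only. It shows why the etcd endpoints and certificates only need to be set once: every additional server just gets a derived prefix.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// StorageAPIConfig is a hypothetical configuration for the standalone
// Storage API. Field names are illustrative, not part of the proposal.
type StorageAPIConfig struct {
	EtcdEndpoints []string // network access to etcd, configured once
	CertFile      string   // client certificate for etcd, configured once
	KeyFile       string
	CAFile        string
	NamespaceRoot string // parent prefix for all aggregated API server data
}

// Validate checks that the one-time etcd settings are present.
func (c StorageAPIConfig) Validate() error {
	if len(c.EtcdEndpoints) == 0 {
		return errors.New("at least one etcd endpoint is required")
	}
	if c.CertFile == "" || c.KeyFile == "" || c.CAFile == "" {
		return errors.New("etcd client certificate, key and CA are required")
	}
	if !strings.HasSuffix(c.NamespaceRoot, "/") {
		return errors.New("namespace root must end with /")
	}
	return nil
}

// NamespaceFor derives the etcd prefix for an aggregated API server from its
// name, so adding another server needs no new configuration. Names containing
// "/" are rejected: "a/" would otherwise be a prefix of "a/b/", letting one
// server read another's keys.
func (c StorageAPIConfig) NamespaceFor(server string) (string, error) {
	if server == "" || strings.Contains(server, "/") {
		return "", fmt.Errorf("invalid aggregated API server name %q", server)
	}
	return c.NamespaceRoot + server + "/", nil
}

func main() {
	cfg := StorageAPIConfig{
		EtcdEndpoints: []string{"https://etcd-0:2379"},
		CertFile:      "/etc/storage-api/etcd.crt",
		KeyFile:       "/etc/storage-api/etcd.key",
		CAFile:        "/etc/storage-api/etcd-ca.crt",
		NamespaceRoot: "/aggregated/",
	}
	if err := cfg.Validate(); err != nil {
		panic(err)
	}
	for _, s := range []string{"v1alpha1.widgets.example.com", "v1.gadgets.example.com"} {
		ns, err := cfg.NamespaceFor(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(ns)
	}
}
```

With the integrated option, the endpoint and certificate fields would simply be inherited from kube-apiserver instead of being set here.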
The first step would be creating a standalone API; later in the project, we will work on integrating it into kube-apiserver if the community agrees.
This project also resolves the backup problem, as a backup of your etcd cluster would include the aggregated API servers' data as well.
The goal of this project is to provide the API described above.
Besides the cluster's main etcd cluster, we want to allow operators to use this API with standalone etcd clusters as well. This is especially important if the cluster's main etcd is already under heavy load and cannot handle aggregated API servers: you can still use the same API as you would with the cluster's main etcd.
For example, this can be very useful for cloud providers, as they can use this API to provide their customers with etcd instances for aggregated API servers.
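Supporting both the main etcd cluster and standalone clusters could amount to a per-server routing decision inside the Storage API. The `Router` and `Backend` types below are hypothetical; the sketch only shows that the choice of backend can stay invisible to the aggregated API servers.

```go
package main

import "fmt"

// Backend describes one etcd cluster the Storage API can write to.
type Backend struct {
	Name      string
	Endpoints []string
}

// Router picks a backend per aggregated API server: the cluster's main etcd
// by default, or a standalone cluster when the operator configures one.
// Aggregated API servers talk to the same Storage API either way.
type Router struct {
	Main      Backend
	Overrides map[string]Backend // keyed by aggregated API server name
}

// BackendFor returns the configured override for server, or Main otherwise.
// Looking up a key in a nil map is safe in Go, so Overrides may be unset.
func (r Router) BackendFor(server string) Backend {
	if b, ok := r.Overrides[server]; ok {
		return b
	}
	return r.Main
}

func main() {
	r := Router{
		Main: Backend{Name: "main", Endpoints: []string{"https://etcd-main:2379"}},
		Overrides: map[string]Backend{
			"v1.metrics.example.com": {Name: "standalone-metrics", Endpoints: []string{"https://etcd-metrics:2379"}},
		},
	}
	fmt.Println(r.BackendFor("v1.widgets.example.com").Name) // main
	fmt.Println(r.BackendFor("v1.metrics.example.com").Name) // standalone-metrics
}
```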
If the community agrees, we're going to integrate this API into kube-apiserver, making it easier to give aggregated API servers access to etcd.
One of the final goals is to allow installing aggregated API servers using tools such as Helm.
The non-goals of this project include allowing operators to place quotas on, and throttle, aggregated API servers' access to etcd in order to ensure Kubernetes always works as expected.
General Information about the project
Mentors: David Eads (deads2k), Stefan Schimanski (sttts).
I added this project for discussion at the next SIG-API-Machinery call (04/25).