I have 3 nodes, each running RKE on Ubuntu 18.04. Everything was, disconcertingly, far easier to set up than I was expecting. Node layout:
Node 1 - Master Node
Node 2 - All roles
Node 3 - Worker Node
So, first quick test: deploy an NGINX container using a NodePort setup to ensure all nodes can route to the pod/container. Set up with one instance, then increase that to 2 to test scaling. All works perfectly. Now I need to allow ingress into my 2 nginx containers. I set up an ingress point under the Load balancing section. I give it a hostname that I plan to add to my onsite DNS server on my domain. I point it at the nginx workload on port 80 and save it all. No problems.
To begin with, we need to fix the naming: no-one has ever deployed Kubernetes on bare-metal. You deploy Kubernetes on an Operating System, and that OS can be installed on bare-metal hardware, virtual hardware and so on (moving on).
Both of these options are fantastic and provide a solution if you need to move quickly (KaaS) or need a degree of flexibility (DIY). Given that these two examples appear to cover most use-cases, why should we care?
One other requirement that is becoming more and more common is the edge cluster: typically a small on-site cluster that handles local processing in places like stores, offices and warehouses, and sends the results back to central processing once complete. In most use-cases the infrastructure can be anything from 2-3 1U servers to a stack of Raspberry Pis, all driven by things like application needs, physical space and power.
With a bare-metal deployment we usually mean installing software onto hardware with no hypervisor present. Depending on the circumstances, this may or may not provide better performance, save money or reduce complexity.
However, if we take into consideration the size of a standard enterprise class server (TBs of RAM, tens of cores/CPUs) we can start to see that we have a huge amount of computing power constrained to a single instance of both Operating System and Kubernetes use-case (worker/control plane). If we just consider the Kubernetes use-case for control plane nodes (low memory and CPU) then bare-metal servers can immediately lead to hugely under-utilised hardware.
A highly available Kubernetes cluster requires a load-balancer to provide availability to the control plane in the event a node fails and to balance the load into the control plane. In a cloud environment an end-user clicks the [load-balancer] button and * magic * occurs, followed by a virtual IP that an end user will access to provide both HA and load-balancing to nodes underneath it.
If budget allows, then as part of the architecture we can use a pair of load balancer appliances. We will require a pair of them to provide redundancy in case an appliance fails or requires maintenance. These appliances will typically provide the capability to create an external virtual IP address that can load balance over a number of physical IP addresses.
In some circumstances the load-balancers may provide an API or capability to integrate their functionality into a Kubernetes cluster making it much easier for applications deployed within the cluster to utilise these hardware load balancers for application redundancy.
The alternative is to use a software-based load balancer, which is usually simple to deploy. However, in order to provide both load-balancing and high-availability we will have to implement additional software to sit alongside the software load-balancers. These two pieces of functionality are:
A virtual IP (VIP) provides an externally accessible IP address that can move between functioning nodes. This means that users attempting to access a service will use this VIP, which will always be exposed on a functioning node.
Load-balancing provides two pieces of functionality: it provides high-availability by ensuring that traffic is directed to a working node, and it ensures that traffic is shared between a pool of working nodes so that load is balanced. This gives a larger amount of available service capacity than a single host, and that capacity can be scaled up by adding nodes to the pool.
With both of these pieces of functionality in place we have a single virtual IP that will always direct us to a working load-balancer instance, which in turn will load-balance our access to the network service we want to access.
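The load-balancing half of this can be illustrated with a small Python sketch. It is not how any particular load-balancer is implemented; it only shows the idea of rotating through a pool while skipping nodes that are marked as down. The class name and backend names are hypothetical.

```python
class RoundRobinPool:
    """Toy round-robin pool that skips backends marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend in rotation, or None if all are down."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        return None


# Hypothetical backend names, purely for illustration.
pool = RoundRobinPool(["node01", "node02", "node03"])
pool.mark_down("node02")
print([pool.pick() for _ in range(4)])
```

Adding a node to the list increases the capacity of the pool, and marking a node down removes it from rotation without the client having to know.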
This section provides design decisions that need to be considered when deploying a highly-available Kubernetes cluster on bare-metal hardware and without cloud services. Using some of these design decisions can allow you to be both more efficient in the use of modern hardware and provide a lot of the same sorts of service that people come to expect from a cloud environment!
This seems a bit like a cheat, but given the reasonably small requirements for the control-plane components it does make good sense to run the control-plane nodes as virtualised machines (with adequate resources guaranteed). Regardless of hypervisor or VMM (virtual machine manager), typically a small amount of overhead is required for the emulation of physical hardware, along with minuscule performance overheads on I/O. However, the benefit of freeing the remaining capacity to be used for other use-cases hugely outweighs any tiny performance or virtualisation inefficiencies.
In a production environment the control-plane nodes should only be running the control-plane components. This means that anything application specific is only run on the worker nodes. This is a recommendation, and usually considered best practice, for a few key reasons:
There are a number of options available to us that would allow some workloads to be safely run next to the control-plane components. All of these would involve modifying the kubelet configuration on the nodes that will be running the Kubernetes management components, along with the manifests for the management pieces.
In order for these components to be secured by the CPU Manager we will need to modify their Spec so that they are given the Guaranteed QoS class. We can find the manifests for the control-plane components under /etc/kubernetes/manifests and with the above configuration enabled we can modify these manifests with configuration that will tie them to resources and ensure their stability.
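As a rough aid when editing those manifests, the sketch below classifies a pod from the resources stanzas of its containers. It is a simplification written for illustration, not the kubelet's implementation: the function name is hypothetical, and quantities are compared as literal strings, so "500m" and "0.5" would be treated as different here. The rule it encodes is that a pod is Guaranteed only when every container sets CPU and memory limits and its requests equal those limits.

```python
def qos_class(containers):
    """Classify a pod from its containers' resources (simplified).

    `containers` is a list of dicts shaped like a container's `resources`
    stanza: {"requests": {...}, "limits": {...}}.
    """
    any_set = False
    guaranteed = True
    for res in containers:
        limits = res.get("limits", {})
        # When a request is omitted, Kubernetes defaults it to the limit.
        requests = {**limits, **res.get("requests", {})}
        for name in ("cpu", "memory"):
            if name in limits or name in requests:
                any_set = True
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Illustrative values only; size these for your own control plane.
print(qos_class([{"requests": {"cpu": "1", "memory": "1Gi"},
                  "limits": {"cpu": "1", "memory": "1Gi"}}]))
```

Note that with the static CPU Manager policy, exclusive cores are only handed to Guaranteed pods whose CPU request is a whole number, so a value such as "1" rather than "500m" is what you would want for these components.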
With some level of protection around the control-plane components we can look into what could make sense to run on this same infrastructure. Both of the above examples should ring-fence processing capacity (CPU) and application memory. However, the control-plane can still be impacted by things like slow I/O. If something else were thrashing the same underlying storage, we could end up in a position where the control-plane components or etcd nodes fail due to high latency. A simple solution for this would be to ensure that these two use-cases use different underlying storage, so that neither can impact the other. One other area is network bandwidth: if the additional capacity is used by applications with high bandwidth requirements, it could potentially affect the control-plane components. Again, in this scenario consider additional network interfaces so that application traffic is completely segregated from the control-plane traffic.
This section is limited to the networking function of load-balancing for the Kubernetes control-plane. The load-balancing for applications and services that are running within a Kubernetes cluster can be hosted elsewhere and is usually more application focused.
In the event that hardware appliances such as F5s are present, follow the vendor documentation for deploying that particular solution. However, in the event we need to roll our own, we will discuss the architecture decisions and options in this section.
As with the discussion of stacked vs unstacked control plane nodes (etcd on the same nodes), we also have the architectural decision of co-locating the load-balancing components on the same nodes. This first architecture will utilise two systems external to the Kubernetes nodes to create an external load-balancer pair that, under most circumstances, would behave in a similar manner to a load-balancing appliance.
As mentioned in the comments on the example configuration below, on the first node (typically node 01) we need to set the state to MASTER. This means that on startup this node will be allocated the VIP first. The priority number is used during keepalived cluster elections to determine which node becomes the next MASTER; the highest priority wins.
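The "highest priority wins" rule can be pictured with a few lines of Python. This is only a model of the outcome, not of the VRRP protocol that keepalived actually speaks, and it ignores tie-breaking between equal priorities. The node names and priority values are hypothetical.

```python
def elect_master(nodes):
    """Return the name of the live node with the highest priority, or None."""
    alive = [n for n in nodes if n["alive"]]
    if not alive:
        return None
    return max(alive, key=lambda n: n["priority"])["name"]


# Hypothetical three-node layout; node01 starts as MASTER with the top priority.
nodes = [
    {"name": "node01", "priority": 100, "alive": True},
    {"name": "node02", "priority": 90, "alive": True},
    {"name": "node03", "priority": 80, "alive": True},
]
print(elect_master(nodes))   # node01 holds the VIP
nodes[0]["alive"] = False    # simulate node01 failing
print(elect_master(nodes))   # the VIP moves to node02
```

Giving each node a distinct priority keeps the failover order predictable.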
Below are the additions to the /etc/haproxy/haproxy.cfg file that will be there by default. Ensure you back up the original before modifying it, and then append the configuration below. As mentioned here, we need to remember that the frontend will expose itself on port 6443 and load-balance to the Kubernetes API servers listening on port 6444.
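To make the port relationship concrete, here is a small Python sketch that renders the kind of frontend/backend fragment described above. It is an illustration under assumptions, not the author's configuration: the section names, server names and IP addresses are all hypothetical, and only the 6443 to 6444 mapping comes from the text.

```python
def render_haproxy(api_servers, frontend_port=6443, backend_port=6444):
    """Render an HAProxy fragment that forwards TCP from the frontend port
    to each API server on the backend port.

    `api_servers` is a list of (name, ip_address) tuples.
    """
    lines = [
        "frontend kubernetes-api",
        f"    bind *:{frontend_port}",
        "    mode tcp",
        "    option tcplog",
        "    default_backend kubernetes-api-servers",
        "",
        "backend kubernetes-api-servers",
        "    mode tcp",
        "    balance roundrobin",
    ]
    for name, address in api_servers:
        # "check" enables health checking so failed servers leave the rotation.
        lines.append(f"    server {name} {address}:{backend_port} check")
    return "\n".join(lines) + "\n"


# Hypothetical node names and addresses.
print(render_haproxy([("node01", "192.168.0.41"),
                      ("node02", "192.168.0.42"),
                      ("node03", "192.168.0.43")]))
```

TCP mode is used here because the API server terminates TLS itself, so the load-balancer only needs to pass the connection through.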
If we wanted to use the flags mentioned above in the section Kubernetes resource control, then we can use kubeadm to print out all of the configuration YAML, and we can edit the sections that are identified using the kind: key.
We need to remove the advertiseAddress, as it is set to a ridiculous default (not sure why), and edit the bindPort to 6444, as this is what the API server needs to listen on in order not to conflict with the load-balancer.
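The two edits can be expressed as a small Python function over an already-parsed configuration. This is a sketch under assumptions: it takes the InitConfiguration document as a plain dict (parsing the YAML is left out), and it assumes both fields live under localAPIEndpoint, which is where recent kubeadm API versions place them. The function name and the sample values are hypothetical.

```python
import copy


def prepare_init_config(init_config, bind_port=6444):
    """Return a copy of an InitConfiguration dict with advertiseAddress
    removed and bindPort set, leaving the input untouched."""
    updated = copy.deepcopy(init_config)
    endpoint = updated.setdefault("localAPIEndpoint", {})
    endpoint.pop("advertiseAddress", None)  # let kubeadm detect the address
    endpoint["bindPort"] = bind_port        # avoid clashing with the load-balancer on 6443
    return updated


# Sample input with placeholder values, shaped like the printed defaults.
defaults = {
    "kind": "InitConfiguration",
    "localAPIEndpoint": {"advertiseAddress": "1.2.3.4", "bindPort": 6443},
}
print(prepare_init_config(defaults))
```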
To test the cluster, we can take the VIP down on a node with sudo systemctl stop keepalived and ensure that kubectl get nodes continues to act as expected. Rebooting nodes will have the same effect as a node failure. We should be able to see logs showing that keepalived is moving our VIP to working nodes and ensuring that access into the running cluster always remains.
The above guide details all of the steps required to build an HA Kubernetes cluster that has the load-balancing components co-located on the same nodes as the Kubernetes components. If we wanted to build an external or unstacked load-balancing pair of nodes then the process is very similar and covered in brief below.
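Alongside kubectl, a simple TCP probe against the VIP can make the failover visible while keepalived is being stopped on one node. The sketch below is a minimal reachability check using only the standard library; the function name is hypothetical, and the VIP you pass in is whatever address you configured.

```python
import socket


def api_reachable(host, port=6443, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling api_reachable with the VIP once a second from a loop should show, at most, a brief gap while the address moves to another node. It only proves that something is accepting connections on the port, so kubectl get nodes remains the real test.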