Let’s start our exploration with the first step of any Kubernetes cluster’s lifecycle – bootstrapping. At this stage, a cluster admin is expected to provide a number of parameters, one of which is called service-cidr (or something similar, depending on the orchestrator) and gets mapped to the service-cluster-ip-range argument of the kube-apiserver.
For the sake of simplicity we’ll assume kubeadm is used to orchestrate a cluster.
An orchestrator will suggest a default value for this range (e.g. 10.96.0.0/12) which most of the time is safe to use. As we’ll see later, this range is completely “virtual”, i.e. it does not need any coordination with the underlying network and can be re-used between clusters (one notable exception being this Calico feature). The only constraints for this value are:
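With kubeadm, this range is set via the ClusterConfiguration manifest passed to kubeadm init; a minimal fragment might look like the one below, with serviceSubnet being the field that ends up as the kube-apiserver’s service-cluster-ip-range flag:

```yaml
# ClusterConfiguration fragment for `kubeadm init --config <file>`
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  serviceSubnet: "10.96.0.0/12"  # mapped to --service-cluster-ip-range
```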
Once a Kubernetes cluster has been bootstrapped, every new Service of type ClusterIP will get a unique IP allocated from this range, for example:
$ kubectl create svc clusterip test --tcp=80 && kubectl get svc test
service/test created
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
test ClusterIP 10.96.37.70 <none> 80/TCP 0s
The first IP from the Service CIDR range is reserved and always assigned to a special kubernetes service. See this explanation for more details.
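That reserved address is simply the first usable host IP of the configured range. A quick sketch, assuming the default 10.96.0.0/12 Service CIDR:

```python
import ipaddress

# The Service CIDR configured at bootstrap time (kubeadm's default value).
service_cidr = ipaddress.ip_network("10.96.0.0/12")

# The first usable host address of this range is reserved for the
# "kubernetes" service in the "default" namespace.
kubernetes_svc_ip = next(service_cidr.hosts())
print(kubernetes_svc_ip)  # 10.96.0.1
```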
As part of its reconciliation loop, the kube-controller-manager builds an internal representation for each Service, which includes a list of all associated Endpoints. From then on, both Service and Endpoints resources co-exist, with the former being the user-facing, aggregated view of a load-balancer and the latter being the detailed, low-level set of IP and port details that get programmed in the dataplane. There are two ways to compile a list of Endpoints: automatically, based on the Service’s label selector, or manually, for Services defined without a selector.
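The selector-based path boils down to a label-matching exercise. A toy sketch, using made-up Pod data for illustration:

```python
# Hypothetical sketch of how a controller could compile Endpoints for a
# selector-based Service: collect the IPs of all running Pods whose labels
# match the Service's selector. The Pod data below is made up.
pods = [
    {"ip": "10.244.0.5", "labels": {"app": "web"}, "phase": "Running"},
    {"ip": "10.244.1.7", "labels": {"app": "web"}, "phase": "Running"},
    {"ip": "10.244.2.9", "labels": {"app": "db"},  "phase": "Running"},
]

def compile_endpoints(selector: dict, pods: list) -> list:
    """Return the IPs of running Pods matching every selector label."""
    return [
        p["ip"]
        for p in pods
        if p["phase"] == "Running"
        and all(p["labels"].get(k) == v for k, v in selector.items())
    ]

print(compile_endpoints({"app": "web"}, pods))  # ['10.244.0.5', '10.244.1.7']
```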
All Endpoints are stored in an Endpoints resource that bears the same name as its parent Service. Below is an example of how it might look for the kubernetes service:
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    endpointslice.kubernetes.io/skip-mirror: "true"
  name: kubernetes
  namespace: default
subsets:
  - addresses:
      - ip: 172.18.0.4
    ports:
      - name: https
        port: 6443
        protocol: TCP
Under the hood, Endpoints are implemented as a set of slices; this will be covered in the Optimisations section.
It is worth noting that the DNS Spec, mentioned briefly in the previous chapter, also defines the behaviour for ClusterIP-type Services. Specifically, the following 3 query types must be supported:

A/AAAA records – a Service can be resolved by its short name (metadata.name) in the same namespace or as <serviceName>.<ns>.svc.<zone> from a different namespace, with the query returning its ClusterIP.
SRV records – a named port of a Service can be resolved to its port number along with the Service’s DNS name.
PTR records – a ClusterIP can be resolved back to the Service’s DNS name.
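To make the naming concrete, here is a tiny sketch of how those fully-qualified Service names are assembled (the zone cluster.local used below is the common default, not mandated by the spec):

```python
def service_fqdn(name: str, namespace: str = "default",
                 zone: str = "cluster.local") -> str:
    """Build the fully-qualified DNS name of a ClusterIP Service."""
    return f"{name}.{namespace}.svc.{zone}"

# Same-namespace clients can use just "test"; everyone else uses the FQDN.
print(service_fqdn("test"))  # test.default.svc.cluster.local
```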
The kube-controller-manager is constantly collecting, processing and updating all Endpoints and Service resources; however, nothing is being done with this information yet. Its ultimate consumers are a set of node-local agents (controllers) that use it to program their local dataplane. Most of these node-local agents rely on the client-go library to synchronize and process updates coming from the API server, which means they all share the following behaviour:
The full state of the relevant resources is retrieved on startup (via a List operation) and observed for the remainder of their lifecycle (via a Watch operation).
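The List-then-Watch pattern can be illustrated with a toy in-memory cache; the event stream below is fabricated and merely stands in for the updates an informer would deliver:

```python
# Toy illustration of the List+Watch pattern: an initial List seeds a local
# cache, after which a stream of incremental Watch events keeps it in sync.
cache = {}

def handle_list(items: dict) -> None:
    """Seed the local cache with the full state returned by List."""
    cache.clear()
    cache.update(items)

def handle_watch_event(event_type: str, name: str, obj=None) -> None:
    """Apply a single incremental Watch event to the local cache."""
    if event_type in ("ADDED", "MODIFIED"):
        cache[name] = obj
    elif event_type == "DELETED":
        cache.pop(name, None)

handle_list({"test": "10.96.37.70", "kubernetes": "10.96.0.1"})
handle_watch_event("ADDED", "web", "10.96.12.34")
handle_watch_event("DELETED", "test")
print(sorted(cache))  # ['kubernetes', 'web']
```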