Flannel is another example of a dual CNI plugin design:
Connectivity is taken care of by the flannel binary. This binary is a metaplugin, a plugin that wraps other reference CNI plugins. In the simplest case, it generates a bridge plugin configuration and “delegates” the connectivity setup to it.
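For context, the stock kube-flannel manifest installs a CNI configuration along these lines (the file name and exact contents vary between releases, so treat this as an illustrative sketch). Everything under the delegate section is passed through to the generated bridge plugin configuration:

$ cat /etc/cni/net.d/10-flannel.conflist
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}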
Reachability is taken care of by the DaemonSet running flanneld. Here’s an approximate sequence of what happens when the daemon starts:
1. It retrieves its Node’s PodCIDR and the cluster’s ClusterCIDR. This information is saved in the /run/flannel/subnet.env file and is used by the flannel metaplugin to generate the host-local IPAM configuration.
2. It creates a VXLAN interface called flannel.1 and updates the Kubernetes Node object with its MAC address (along with its own Node IP).
This plugin assumes that daemons have a way to exchange information (e.g. the VXLAN MAC). Previously, this required a separate database (hosted etcd), which was considered a big disadvantage. The new version of the plugin uses the Kubernetes API to store that information in annotations of the Node API object, as shown below.
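Both pieces of state are easy to peek at in the lab. The values below are what you would expect for this guide’s addressing plan (worker2’s subnet.env and the control-plane Node’s annotations); the exact contents are illustrative, while the annotation keys are the ones used by kube-flannel:

$ docker exec -it k8s-guide-worker2 cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.2.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
$ kubectl get node k8s-guide-control-plane -o yaml | grep flannel.alpha
    flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"5a:11:99:ab:8c:22"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 172.18.0.3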
The fully converged IP and MAC tables will look like this:
Assuming that the lab is already set up, flannel can be enabled with the following commands:
make flannel
Check that the flannel DaemonSet has reached the READY state:
$ kubectl -n kube-system get daemonset -l app=flannel
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-flannel-ds 3 3 3 3 3 <none> 90s
Now we need to “kick” all Pods to restart and pick up the new CNI plugin:
make nuke-all-pods
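The recipe behind this make target isn’t shown here, but assuming it simply deletes all running Pods so that they get re-created with the new CNI plugin, the raw kubectl equivalent would be something like:

$ kubectl delete pods --all --all-namespaces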
Here’s how the information from the diagram can be validated (using worker2 as an example):
$ NODE=k8s-guide-worker2 make tshoot
bash-5.0# ip route get 1.1
1.1.0.0 via 10.244.2.1 dev eth0 src 10.244.2.6 uid 0
$ docker exec -it k8s-guide-worker2 ip route
default via 172.18.0.1 dev eth0
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 dev cni0 proto kernel scope link src 10.244.2.1
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2
$ docker exec -it k8s-guide-worker2 ip neigh | grep PERM
10.244.1.0 dev flannel.1 lladdr ce:0a:4f:22:a4:2a PERMANENT
10.244.0.0 dev flannel.1 lladdr 5a:11:99:ab:8c:22 PERMANENT
$ docker exec -it k8s-guide-worker2 bridge fdb show dev flannel.1
5a:11:99:ab:8c:22 dst 172.18.0.3 self permanent
ce:0a:4f:22:a4:2a dst 172.18.0.4 self permanent
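These route, ARP and FDB entries are exactly what flanneld programs for each remote Node. For illustration, here’s a rough sketch of the equivalent iproute2 commands for a single peer (the control-plane Node, reachable at 172.18.0.3), if one were to configure them by hand instead of letting flanneld do it:

# route the peer's PodCIDR over the VXLAN interface
ip route add 10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
# static ARP entry mapping the peer's gateway IP to its VXLAN MAC
ip neigh replace 10.244.0.0 lladdr 5a:11:99:ab:8c:22 dev flannel.1 nud permanent
# static FDB entry mapping the VXLAN MAC to the peer's VTEP (Node) IP
bridge fdb add 5a:11:99:ab:8c:22 dev flannel.1 dst 172.18.0.3 self permanent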
Let’s track what happens when Pod-1 tries to talk to Pod-3.
We’ll assume that the ARP and MAC tables are converged and fully populated.
1. Pod-1 wants to send a packet to 10.244.0.2. Its network stack looks up the routing table to find the NextHop IP:
$ kubectl exec -it net-tshoot-4sg7g -- ip route get 10.244.0.2
10.244.0.2 via 10.244.1.1 dev eth0 src 10.244.1.6 uid 0
2. The packet reaches the cni0 bridge in the root network namespace, where the lookup is performed again:
$ docker exec -it k8s-guide-worker ip route get 10.244.0.2
10.244.0.2 via 10.244.0.0 dev flannel.1 src 10.244.1.0 uid 0
3. With the NextHop and the outgoing interface set, the ARP table lookup returns the static entry provisioned by flanneld:
$ docker exec -it k8s-guide-worker ip neigh get 10.244.0.0 dev flannel.1
10.244.0.0 dev flannel.1 lladdr 5a:11:99:ab:8c:22 PERMANENT
4. Next, the FDB of the VXLAN interface is consulted to find out the destination VTEP IP:
$ docker exec -it k8s-guide-worker bridge fdb | grep 5a:11:99:ab:8c:22
5a:11:99:ab:8c:22 dev flannel.1 dst 172.18.0.3 self permanent
5. The packet is VXLAN-encapsulated and sent to the control-node, where flannel.1 matches the VNI and the VXLAN MAC:
$ docker exec -it k8s-guide-control-plane ip link show flannel.1
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 5a:11:99:ab:8c:22 brd ff:ff:ff:ff:ff:ff
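The VXLAN parameters involved in this matching can be displayed with the -d (details) flag; flannel’s VXLAN backend defaults to VNI 1 and UDP port 8472, although the output below is abbreviated and illustrative:

$ docker exec -it k8s-guide-control-plane ip -d link show flannel.1
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 5a:11:99:ab:8c:22 brd ff:ff:ff:ff:ff:ff
    vxlan id 1 local 172.18.0.3 dev eth0 srcport 0 0 dstport 8472 nolearning ...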
6. The packet gets decapsulated and its original destination IP is looked up in the main routing table:
$ docker exec -it k8s-guide-control-plane ip route get 10.244.0.2
10.244.0.2 dev cni0 src 10.244.0.1 uid 0
7. The ARP and bridge tables are then consulted to find the outgoing veth interface:
$ docker exec -it k8s-guide-control-plane ip neigh get 10.244.0.2 dev cni0
10.244.0.2 dev cni0 lladdr 7e:46:23:43:6f:ec REACHABLE
$ docker exec -it k8s-guide-control-plane bridge fdb get 7e:46:23:43:6f:ec br cni0
7e:46:23:43:6f:ec dev vethaabf9eb2 master cni0
8. Finally, the packet arrives in Pod-3’s network namespace, where it gets processed by the local network stack:
$ kubectl exec -it net-tshoot-rkg46 -- ip route get 10.244.0.2
local 10.244.0.2 dev lo src 10.244.0.2 uid 0
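To observe the VXLAN encapsulation described above on the wire, a capture can be taken on the Node’s uplink interface while the two Pods exchange traffic. Flannel’s VXLAN backend uses UDP port 8472 by default; this assumes tcpdump is available (or installed) inside the kind Node container:

$ docker exec -it k8s-guide-worker tcpdump -ni eth0 udp port 8472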
Similar to kindnet, flanneld sets up SNAT rules to enable egress connectivity for the Pods; the only difference is that it does this directly inside the POSTROUTING chain:
Chain POSTROUTING (policy ACCEPT 327 packets, 20536 bytes)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- * * 10.244.0.0/16 10.244.0.0/16
0 0 MASQUERADE all -- * * 10.244.0.0/16 !224.0.0.0/4 random-fully
0 0 RETURN all -- * * !10.244.0.0/16 10.244.0.0/24
0 0 MASQUERADE all -- * * !10.244.0.0/16 10.244.0.0/16 random-fully
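In iptables command form, the four rules shown above correspond to the following (reproduced here only for readability; flanneld appends them to the nat table’s POSTROUTING chain itself):

iptables -t nat -A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
iptables -t nat -A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/24 -j RETURN
iptables -t nat -A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE --random-fully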
Finally, it’s worth mentioning that flannel also supports a direct routing mode, which acts by installing static routes for hosts on the same subnet.
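In that mode, assuming the Nodes share an L2 segment (as they do in this kind-based lab), the remote PodCIDR routes point at the peer Node’s IP directly instead of going over the flannel.1 VXLAN device, e.g.:

# hypothetical route on worker2 for the control-plane's PodCIDR in direct routing mode
ip route add 10.244.0.0/24 via 172.18.0.3 dev eth0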