For various reasons, a worker node often loses its connection to the master node. This raises several questions. For example: does the master node delete the Pods running on the unreachable node? How do the Kubernetes controllers behave? Do the Pods keep running on the worker node? In short, we want to know how Kubernetes behaves when a node becomes unreachable.
Definition: in Kubernetes, a node that cannot be reached is called a partitioned node, or isolated node.
To understand this better, let's create a partitioned node and observe how the cluster behaves.
The sample cluster has one master node and three worker nodes. On it we create an Nginx Deployment with two replicas. The replicas run on different nodes: kind-worker2 and kind-worker3. Figure 1 shows the state of the sample cluster:
Figure 1: The state of the sample cluster
Creating a partitioned node
A simple way to create a partitioned node is to delete its IP address; here we do this on kind-worker2. Figure 2 shows the necessary steps:
Figure 2: Creating a partitioned node
How does Kubernetes behave?
The worker node (kind-worker2) is set to the NotReady state, but its Pods are still running. This is because the node-controller, the part of kube-controller-manager responsible for nodes, waits for pod-eviction-timeout before acting. The wait ensures that the node really is unreachable before its Pods are deleted.
pod-eviction-timeout defaults to 5 minutes and can be changed when kube-controller-manager is started.
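The waiting period described above can be pictured as a simple time comparison. The sketch below is a simplified model, not the node-controller's actual code: the helper name `pods_action` and the choice to measure time from the moment the node is marked NotReady are assumptions made for illustration. Only the 5-minute default comes from the text.

```python
from datetime import timedelta

# Default named in the article; the real value is set when
# kube-controller-manager starts.
POD_EVICTION_TIMEOUT = timedelta(minutes=5)


def pods_action(not_ready_for: timedelta,
                eviction_timeout: timedelta = POD_EVICTION_TIMEOUT) -> str:
    """Hypothetical helper: what the modelled node-controller does with
    the Pods of a node that has been NotReady for `not_ready_for`."""
    if not_ready_for < eviction_timeout:
        # The node may still come back, so the Pods are left alone.
        return "wait"
    # The timeout has elapsed: mark the Pods for termination so that
    # replacements can be created on other nodes.
    return "evict"


print(pods_action(timedelta(minutes=2)))  # wait
print(pods_action(timedelta(minutes=6)))  # evict
```

Passing a different `eviction_timeout` mimics starting kube-controller-manager with a non-default value.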
After pod-eviction-timeout has elapsed (5 minutes in this example), the node-controller marks the Pods running on the partitioned node as Terminating. The Deployment controller, another part of kube-controller-manager, then starts creating new replicas and schedules them on different nodes. In this example, a new Nginx replica is created on the kind-worker node. Figure 3 shows all the state changes in the Kubernetes system:
Figure 3: The situation on the master node
What happens to the Pods running on the partitioned worker node?
Let's log in to the partitioned worker node and see what happened. Figure 4 shows that the Pod is still running. This is because the API server cannot reach the kubelet on the partitioned node to tell it to delete the Pod. Likewise, the kubelet, cut off from the API server, takes no action on the running Pod.
Figure 4: The Pod continues running on the partitioned worker node
Once the partitioned node rejoins the cluster, the Pod can be deleted.
A lot happens behind the scenes when a node is disconnected. Here is a brief summary:
- When a node becomes unreachable, the master node sets it to the “NotReady” state.
- The master node waits for pod-eviction-timeout before taking any action. The pod-eviction-timeout parameter is set when kube-controller-manager starts and defaults to 5 minutes.
- After pod-eviction-timeout has elapsed, the master node sets the Pods on the partitioned node to the “Terminating” state and creates new instances of them on different nodes.
- These Pods continue to run on the partitioned node.
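The summary above can be condensed into a small state function that keeps the master node's record and the worker node's reality apart. This is a toy model under stated assumptions, not real Kubernetes logic: the names `Snapshot` and `snapshot`, the "Deleted" label, and the simplification that time is counted from the moment the node becomes NotReady are all hypothetical.

```python
from dataclasses import dataclass

EVICTION_TIMEOUT_S = 300  # 5 minutes, the default named in the article


@dataclass
class Snapshot:
    node_status: str           # node status reported by the master node
    pod_seen_by_master: str    # Pod state as recorded on the master node
    pod_on_node: str           # what is actually happening on the worker node
    replacement_created: bool  # whether a new replica exists on another node


def snapshot(seconds_partitioned: int, rejoined: bool = False) -> Snapshot:
    """Toy model of the summary list; hypothetical, not real Kubernetes logic."""
    timed_out = seconds_partitioned >= EVICTION_TIMEOUT_S
    if rejoined:
        # Back in the cluster: a Pod marked for termination can now be deleted.
        state = "Deleted" if timed_out else "Running"
        return Snapshot("Ready", state, state, timed_out)
    # Still cut off: the master's record changes, the node itself does not.
    master_state = "Terminating" if timed_out else "Running"
    return Snapshot("NotReady", master_state, "Running", timed_out)


# One minute in, six minutes in, and six minutes in after rejoining.
for args in [(60,), (360,), (360, True)]:
    print(args, snapshot(*args))
```

In this model, the two Pod fields diverge once the timeout has passed and only agree again after the node rejoins, which mirrors the gap between Figure 3 and Figure 4.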
Original article: https://medium.com/tailwinds-...