
Deep Anatomy of the Kubernetes Pod (3)

Contents:

Pod Scheduling

Pod Scaling Up and Down

Pod Rolling Upgrades

I. Pod Scheduling

A Pod is only a carrier for containers; Pod scheduling and automatic control are usually achieved through objects such as RC, Deployment, DaemonSet, and Job.

1. RC/Deployment: Automatic Scheduling

One of RC's main functions is to automatically deploy multiple replicas of a container application and to continuously monitor the replica count, so that the number of replicas specified by the user is always maintained in the cluster.

2. NodeSelector: Targeted Scheduling

The Scheduler on the Master is responsible for scheduling Pods, but you cannot know in advance which node a Pod will land on. By matching a Node's labels against a Pod's nodeSelector attribute, you can make the Pod be scheduled onto a specific Node.

(1) First, use the kubectl label command to attach a label to the target Node:

kubectl label nodes node-name key=value

  

For example, label cnode-2 and cnode-3 in this way.
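The screenshot of that step is not preserved here; based on the nodeSelector and nodeAffinity values used later in this article, the commands were presumably of the following form (the label key and values are reconstructed, not taken verbatim from the original):

kubectl label nodes cnode-2 name=cnode-2
kubectl label nodes cnode-3 name=cnode-3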

 

To check whether the label has been applied, use the following command:

kubectl describe nodes node-name
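As an alternative check (not shown in the original article), the labels of all nodes can be listed at once:

kubectl get nodes --show-labels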

  

 

(2) Then add a nodeSelector setting in the Pod (RC) definition:

apiVersion: v1
kind: ReplicationController
metadata:
  name: nodeselectorrc
  labels:
    name: nodeselectorrc
spec:
  replicas: 1
  template:
    metadata:
      name: nodeselectorrc
      labels:
        name: nodeselectorrc
    spec:
      containers:
      - name: nodeselectorrc
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
      nodeSelector:
        name: cnode-2
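To try the manifest out end to end, a possible sequence is shown below; the file name nodeselectorrc.yaml is only an assumption:

kubectl create -f nodeselectorrc.yaml
# The NODE column should show cnode-2
kubectl get pods -o wide -l name=nodeselectorrc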

  

 

[Note] If no node has this label, the Pod will not be scheduled.

3. NodeAffinity: Node Affinity Scheduling

Because NodeSelector only does exact matching against a Node's labels, NodeAffinity was introduced with the operators In, NotIn, Exists, DoesNotExist, Gt, and Lt for selecting Nodes, which makes scheduling more flexible. NodeAffinity also adds the following settings for expressing the affinity scheduling policy:

(1) RequiredDuringSchedulingIgnoredDuringExecution: the specified rules must be satisfied before the Pod can be scheduled onto a Node.

(2) PreferredDuringSchedulingIgnoredDuringExecution: expresses a preference for Nodes that satisfy the specified rules; the scheduler tries to place the Pod on such a Node but does not insist on it. Multiple preference rules can be given weights to define the order in which they are considered.

The NodeAffinity rules are set under the Pod's spec.affinity field. For example, a requiredDuringSchedulingIgnoredDuringExecution rule:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: name
            operator: In
            values: ["cnode-1","cnode-2"]

  

The YAML above means that only a Node whose labels contain the key name with a value equal to one of cnode-1 or cnode-2 can become the scheduling target of the Pod. Besides In, the operators NotIn, Exists, DoesNotExist, Gt, and Lt are also available.

 

PreferredDuringSchedulingIgnoredDuringExecution is used as follows:

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: name
            operator: In
            values: ["cnode-1","cnode-2"]

  

4. DaemonSet: Scheduling for Specific Scenarios

A DaemonSet manages a Pod of which exactly one replica runs on every Node in the cluster.

 

It is suitable for scenarios such as:

1. Running a daemon process for GlusterFS or Ceph storage on every Node.

2. Running a log collector, such as Fluentd or Logstash, on every Node.

3. Running a monitoring program on every Node to collect that Node's performance data, such as the Prometheus Node Exporter.

Its scheduling strategy is similar to that of RC; NodeSelector or NodeAffinity can also be used to constrain where the Pods are scheduled.

For example, start one nginx instance on every Node:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ds-nginx
  labels:
    name: ds-nginx
spec:
  selector:
    matchLabels:
      name: ds-nginx
  template:
    metadata:
      name: ds-nginx
      labels:
        name: ds-nginx
    spec:
      containers:
      - name: ds-nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
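A possible way to create the DaemonSet and confirm that exactly one Pod lands on every node (the file name ds-nginx.yaml is assumed):

kubectl create -f ds-nginx.yaml
# Expect one Pod per node in the NODE column
kubectl get pods -o wide -l name=ds-nginx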

  

 

5. Job: Batch Scheduling

A batch task can be defined and started through a Job resource object. A batch task usually starts multiple programs, in parallel or serially, to process a batch of work items; once they have all been processed, the whole batch task ends.

Depending on how the batch task is implemented, the following patterns can be distinguished:

 

1. Job Template Expansion pattern: one Job object corresponds to one work item to be processed, so there are as many independent Jobs as there are work items. This suits scenarios with few work items, each of which processes a large amount of data.

For example, define a Job template, job.yaml.txt:

apiVersion: batch/v1
kind: Job
metadata:
  name: process-item-$ITEM
  labels:
    jobgroup: jobexample
spec:
  template:
    metadata:
      name: jobexample
      labels:
        jobgroup: jobexample
    spec:
      containers:
      - name: c
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["sh","-c","echo $ITEM && sleep 5"]
      restartPolicy: Never

  

# Use the following command to generate the YAML files
for i in a b c; do cat job.yaml.txt | sed "s/\$ITEM/${i}/" > ./job-$i.yaml; done
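The generated manifests still need to be submitted to the cluster, for example:

# Create the three Jobs generated above
kubectl create -f job-a.yaml -f job-b.yaml -f job-c.yaml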

  

# Check the created Jobs
kubectl get jobs -l jobgroup=jobexample

  

2. Queue with Pod Per Work Item pattern: a task queue stores the work items and a Job acts as the consumer that processes them. In this pattern the Job starts N Pods, and each Pod corresponds to one work item.

3. Queue with Variable Pod Count pattern: a task queue stores the work items and a Job acts as the consumer, but the number of Pods the Job starts is variable.

4. Single Job with Static Work Assignment pattern: one Job produces multiple Pods, and the work items are statically assigned to them by the program.

Considering the parallelism of batch processing, Kubernetes divides Jobs into three types:

1、Non-parallel Jobs

A Job starts only one Pod. The Pod is restarted only if it fails; once the Pod ends normally, the Job ends.

2、Parallel Jobs with a fixed completion count

A parallel Job starts multiple Pods. Its .spec.completions parameter must be set to a positive number; when the number of Pods that have ended normally reaches this value, the Job ends. The separate .spec.parallelism parameter controls the degree of parallelism, i.e. how many Pods run at the same time to process work items.
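A minimal sketch of such a Job; the name, image, and the values of completions and parallelism are illustrative and not taken from the original article:

apiVersion: batch/v1
kind: Job
metadata:
  name: fixed-count-job        # illustrative name
spec:
  completions: 5               # the Job ends after 5 Pods finish successfully
  parallelism: 2               # at most 2 Pods run at the same time
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo processing one work item && sleep 5"]
      restartPolicy: Never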

3、Parallel Jobs with a work queue

A parallel Job of this type needs a separate queue in which all the work items are stored, and .spec.completions must not be set. Such a Job has the following characteristics:

(1) Each Pod can independently determine whether there are still work items in the queue and thus decide whether to exit.

(2) If one Pod ends successfully, the Job does not start a new Pod.

(3) Once one Pod has ended successfully, no other Pod should still be working on the queue; they should all be finished or in the process of exiting.

(4) If all Pods have ended and at least one of them ended successfully, the whole Job is considered to have finished successfully.

In addition, since version 1.12 Kubernetes provides TTL control for Jobs: after the Pods have finished their task, the Job and its Pods can be shut down and recycled automatically.
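A minimal sketch of that TTL mechanism, using the Job field .spec.ttlSecondsAfterFinished; the value 100 and the Job name are illustrative, and on clusters where the feature is still alpha it must additionally be enabled via the TTLAfterFinished feature gate:

apiVersion: batch/v1
kind: Job
metadata:
  name: ttl-job                     # illustrative name
spec:
  ttlSecondsAfterFinished: 100      # clean up the Job and its Pods 100s after it finishes
  template:
    spec:
      containers:
      - name: c
        image: busybox
        command: ["sh", "-c", "echo done"]
      restartPolicy: Never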

 

 

6. CronJob: Scheduled Tasks

A CronJob schedules Pods according to a configured cron expression.

(1) The CronJob schedule expression (basically the same format as Linux cron):

Minutes Hours DayOfMonth Month DayOfWeek

  

The characters allowed in each field are as follows:

◎ Minutes: may contain the characters [, - * /]; valid values are integers in the range 0–59.

◎ Hours: may contain [, - * /]; valid values are integers in the range 0–23.

◎ DayOfMonth: may contain [, - * / ? L W C]; valid values are integers in the range 1–31.

◎ Month: may contain [, - * /]; valid values are integers in the range 1–12 or JAN–DEC.

◎ DayOfWeek: may contain [, - * / ? L C #]; valid values are integers in the range 1–7 or SUN–SAT, where 1 means Sunday, 2 means Monday, and so on.

◎ *: matches any value of the field; used in the Minutes field, for example, it means the task is triggered every minute.

◎ /: triggers starting from a given offset and then at a fixed interval. For example, 5/20 in the Minutes field means the first trigger happens at minute 5 and subsequent triggers occur every 20 minutes.

◎ -: specifies a range of integers. For example, 1-4 means the integers 1, 2, 3, and 4.

◎ ,: separates a list of values. For example, 3,4,6,8 denotes those four integers.

For example, to run every minute:

*/1 * * * * 

(2) Creating a CronJob

Using a YAML file:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["/bin/sh","-c","date; echo Hello"]
          restartPolicy: OnFailure
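After creating it, the effect can be observed with standard commands; new Jobs should appear roughly once a minute:

kubectl get cronjob hello
kubectl get jobs --watch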

  

 

7. PodAffinity: Pod Affinity and Anti-Affinity Scheduling

This strategy makes scheduling decisions based on the labels of Pods already running on a node rather than on the node's own labels, so the rule has to match conditions on both the node and the Pods. The rule can be stated as: if one or more Pods matching condition Y are running in topology domain X, then the new Pod should be scheduled into that domain (or, for anti-affinity, must not be scheduled there).

Here X denotes a topology domain such as a node, a rack, or an availability zone, expressed through a built-in node label key. This key is called the topologyKey and indicates which topology domain a node belongs to. Commonly used keys include:

◎ kubernetes.io/hostname

◎ failure-domain.beta.kubernetes.io/zone

◎ failure-domain.beta.kubernetes.io/region

Unlike Nodes, Pods belong to namespaces, so condition Y is expressed as a label selector applied to one or more namespaces (or to all of them).

As with node affinity, Pod affinity and anti-affinity conditions come in two forms: requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution.

Pod affinity is defined in the podAffinity subfield under affinity in the PodSpec; Pod anti-affinity is defined in the podAntiAffinity subfield at the same level.

For example, first create a reference target Pod:

apiVersion: v1
kind: Pod
metadata:
  name: pod-flag
  labels:
    security: s1
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx

  

(1) Pod affinity scheduling

apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values: ["s1"]
        topologyKey: kubernetes.io/hostname
  containers:
  - name: nginx
    image: nginx

  

After it is created, you will find that the two Pods are running on the same node.
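The placement can be verified with a standard command by comparing the NODE column of the two Pods (the original screenshots are not preserved):

kubectl get pods -o wide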

 

(2) Pod anti-affinity scheduling

apiVersion: v1
kind: Pod
metadata:
  name: pod-antiaffinity
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values: ["s1"]
        topologyKey: kubernetes.io/hostname
  containers:
  - name: nginx
    image: nginx

  

After it is created, you will find that this Pod is not running on the same node as the reference Pod.

 

8. Pod Priority Preemption: Priority-Based Scheduling

In medium and large clusters, to make resource utilization as high as possible, workloads are given different priorities, and the total amount of resources requested by all workloads is allowed to exceed what the cluster can actually provide. When resources then become scarce, the system can release some of the less important workloads (those with the lowest priority) to ensure that the most important workloads obtain enough resources to run stably.

If preemptive scheduling occurs, a higher-priority Pod may preempt node N, and the lower-priority Pods are evicted from node N.

(1) First create a PriorityClass:

apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: high-priority
value: 10000
globalDefault: false

  

(2) Then reference the priority class in the Pod:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-priority
spec:
  containers:
  - name: nginx
    image: nginx
  priorityClassName: high-priority

  

II. Pod Scaling Up and Down

1. Scaling with kubectl scale

Scaling up or down with kubectl scale actually just changes the controller's replica count:

kubectl scale rc rc-name --replicas=<replica-count>

  

For example, scale nodeselectorrc up to 3 replicas.
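The original screenshot of this step is not preserved; based on the command pattern above, the concrete call would presumably be:

kubectl scale rc nodeselectorrc --replicas=3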

 

Scaling down works the same way: just reduce the replica count.

2. Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of Pods up or down based on CPU utilization. The HPA controller, which runs inside the kube-controller-manager service on the Master, works at an interval defined by the service's startup argument:

--horizontal-pod-autoscaler-sync-period

  

The default interval is 30s. At each interval the controller monitors the CPU utilization of the target Pods and, when the conditions are met, adjusts the number of Pod replicas in the RC or Deployment so that the user-defined average Pod CPU utilization is maintained.

An HPA can be created quickly with the kubectl autoscale command or from a YAML configuration file. Before the HPA is created, the target RC or Deployment must already exist and must define a resources.requests.cpu request value; if this value is not set, Heapster cannot collect the Pod's CPU utilization.
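A container-spec fragment illustrating such a CPU request; the 200m value is only an example:

      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: 200m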

[Note] This requires Heapster to be installed to collect the CPU utilization.

Command-line approach:

kubectl autoscale rc rc-name --min=1 --max=10 --cpu-percent=50

  

YAML approach:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-name
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: rc-name
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
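Once created, the HPA's current and target CPU utilization and its replica count can be checked with:

kubectl get hpa hpa-name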

  

Both approaches keep the replica count between a minimum of 1 and a maximum of 10, scaling up or down to maintain an average CPU utilization of 50%.

III. Pod Rolling Upgrades

Rolling upgrades are performed with the kubectl rolling-update command. The command creates a new RC and then automatically reduces the number of Pod replicas in the old RC to 0 while increasing the number of Pod replicas in the new RC from 0 to the target value, thereby completing the upgrade. The old and new RCs must be in the same Namespace.

 

1. Upgrading with a YAML file

For example, the v1 version of an nginx RC, nginx-v1.yaml:

apiVersion: v1
kind: ReplicationController
metadata:
  name: roll-v1
  labels:
    name: roll-v1
    version: v1
spec:
  replicas: 3
  selector:
    name: roll-v1
    version: v1
  template:
    metadata:
      name: roll-v1
      labels:
        name: roll-v1
        version: v1
    spec:
      containers:
      - name: roll-v1
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80

  

After it is created, the three roll-v1 Pods are running.

 

It now needs to be upgraded to the v2 version, nginx-v2.yaml:

apiVersion: v1
kind: ReplicationController
metadata:
  name: roll-v2
  labels:
    name: roll-v2
    version: v2
spec:
  replicas: 3
  selector:
    name: roll-v2
    version: v2
  template:
    metadata:
      name: roll-v2
      labels:
        name: roll-v2
        version: v2
    spec:
      containers:
      - name: roll-v2
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080

  

Use the following command to upgrade:

kubectl rolling-update roll-v1 -f nginx-v2.yaml

 

After the command runs, the v1 Pods are gradually replaced.

 

Note the following:

1. The new RC's name must not be the same as the old RC's name.

2. At least one label in the selector must differ from the old RC's, to mark it as the new RC. (In this example, in fact, all the labels are different.)

2. Upgrading directly with a command

You can also replace the container image directly from the command line:

kubectl rolling-update rc-name --image=image-name:version

  

3. Rolling back

If something goes wrong during a rolling update, you can roll back:

kubectl rolling-update rc-name --image=image-name:version --rollback

  

===============================

I am Liusy, a programmer who also likes working out.

Welcome to follow my WeChat public account [Liusy01] to discuss Java technology and fitness, get more quality material, claim advanced Java resources and the latest big-company interview questions, and level up together.

Since you are here, how about a follow?
