
Troubleshooting notes from an Ubuntu kernel upgrade


Introduction

Several Ubuntu 18.04 servers in this project run on AWS. Because of a security issue, their kernel had to be upgraded from 5.3.0 to 5.4.0.

The first upgrade was done in the test environment, which also runs Ubuntu 18.04 with kernel 5.3.0. It went well at first: the software packages were updated, then the kernel was upgraded separately. The problems appeared when the server had to be rebooted.

Problems and how they were handled

Problem 1

Unable to mount the disk

This was the first problem encountered.

Approach:

Upgrading the kernel leaves less and less space in the boot partition, which eventually prevents the system from booting. I had run into a full boot partition before, but that was in a KVM virtual machine, where it could be repaired over a VNC connection. What can be done on AWS?

Solution:

At first I detached the server's root disk and attached it to another server in the same availability zone to repair it. Because of a disk format problem, the volume never mounted. To avoid wasting more time, the only option left was to expand the root disk through snapshot recovery.

The server was restored from a snapshot, and expanding the root disk during the restore did get the server running again.

Then I tried the kernel upgrade again...

Problem 2

Database dependency error during the kernel upgrade

The details are as follows (the original post showed the error output in a screenshot):


Approach:

I really had no idea how to approach this problem. I worked on it for a long time without solving it, and would welcome guidance from anyone who has ideas.

Solution:

To get the kernel upgrade done quickly, I uninstalled MySQL and its related dependencies.

Problem 3

Failed to restart after upgrading

This was the biggest problem, and the most visible one. No error was reported during the upgrade, but the upgrade requires a reboot, and after the reboot the server could not enter the operating system.

It was past 4 a.m. and I was already confused, so I restored the server to its state before the kernel upgrade and planned to launch the snapshot the next day to reproduce the problem.


Approach:

A mount failure again? How could it be a mount failure again? It finally turned out that the automatic mount at boot had not been configured with the UUID, as the official instructions recommend; the entry used the device name instead. As a result, the system hit an error while checking the disk during boot and could not start normally.

Solution:

On a physical machine, or wherever the boot process can be controlled some other way, this can be repaired directly. But how do you repair a virtual machine? The only option was to repair the disk.

There are two ways to access the disk volume of a virtual machine:

Method 1: Use the EC2 Serial Console

(Excerpt from the AWS documentation)

If you have enabled the EC2 Serial Console for Linux, you can use it to troubleshoot supported Nitro-based instance types. The serial console helps you troubleshoot boot issues, network configuration, and SSH configuration issues. It connects to your instance without requiring a network connection. You can access the serial console using the Amazon EC2 console or the AWS Command Line Interface (AWS CLI).

Before using the serial console, grant access to it at the account level. Then create AWS Identity and Access Management (IAM) policies that grant access to your IAM users. In addition, every instance that uses the serial console must include at least one password-based user. If your instance is unreachable and you have not configured access to the serial console, follow the instructions in Method 2. For information about configuring the EC2 Serial Console for Linux, see "Configure access to the EC2 Serial Console".

Note: If you receive errors when running AWS CLI commands, make sure that you are using the most recent version of the AWS CLI.

Method 2: Attach the volume to another instance

Create a temporary rescue instance, then attach your Amazon Elastic Block Store (Amazon EBS) volume to the rescue instance. From the rescue instance, you can configure GRUB to boot the previous kernel.

**Important:** Do not perform this procedure on an instance store-backed instance. Because this recovery method requires stopping and then starting the instance, any data on the instance will be lost. For more information, see "Determine the root device type of your instance".

  • Create an EBS snapshot of the root volume. For more information, see "Create Amazon EBS snapshots".
  • Open the Amazon EC2 console.
  • Note: Make sure you are in the correct Region.
  • Choose Instances from the navigation pane, then select the impaired instance.
  • Choose Instance State, Stop Instance, and then choose Stop.
  • On the **Storage** tab, under **Block devices**, choose the Volume ID for /dev/sda1 or /dev/xvda.
  • Choose Actions, Detach Volume, and then choose Yes, Detach. Note the Availability Zone.
  • Launch a rescue EC2 instance in the same Availability Zone.
  • After the rescue instance launches, choose Volumes from the navigation pane, then select the detached root volume of the impaired instance.
  • Choose Actions, Attach Volume.
  • Choose the rescue instance ID (id-xxxxx), then set an unused device. In this example, it is /dev/sdf.
  • Use SSH to connect to the rescue instance.
  • Run the lsblk command to view the available disk devices:
lsblk
# Output:
xvda    202:0    0   20G  0 disk 
└─xvda1 202:1    0   20G  0 part /
xvdb    202:16   0  100G  0 disk 
xvdf    202:80   0   15G  0 disk 
└─xvdf1 202:81   0   15G  0 part    # root disk of the failed server
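
The console steps above can also be scripted. The following is a minimal sketch using the AWS CLI, assuming the CLI is installed and configured for the right Region. The helper names `run` and `move_root_volume` and all instance and volume IDs are hypothetical, invented for illustration. By default the script only prints the commands it would run.

```shell
# Sketch only: move a failed instance's root volume to a rescue instance.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments (all hypothetical placeholders when called):
#   $1 failed instance ID, $2 root volume ID,
#   $3 rescue instance ID, $4 device name on the rescue instance
move_root_volume() {
    # Snapshot first, so the volume can be restored if the repair goes wrong.
    run aws ec2 create-snapshot --volume-id "$2" --description pre-repair-backup
    run aws ec2 stop-instances --instance-ids "$1"
    run aws ec2 wait instance-stopped --instance-ids "$1"
    run aws ec2 detach-volume --volume-id "$2"
    run aws ec2 wait volume-available --volume-ids "$2"
    run aws ec2 attach-volume --volume-id "$2" --instance-id "$3" --device "$4"
}

# Example call with placeholder IDs:
#   move_root_volume i-0failed vol-0root i-0rescue /dev/sdf
```

Running it in dry-run mode first makes it easy to review the exact commands before setting DRY_RUN=0. The rescue instance must be in the same Availability Zone as the volume, as the steps above note.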

Check the file system type:

lsblk -f
NAME    FSTYPE LABEL           UUID                                 MOUNTPOINT
xvda
└─xvda1 ext4   cloudimg-rootfs d32458a7-7f4c-415f-9a66-b579f14fb82d /
xvdb    ext4                   eb0e325a-471c-4a99-a9be-a3ee296c2405
xvdf
└─xvdf1 ext4   cloudimg-rootfs d32458a7-7f4c-415f-9a66-b579f14fb82d

Mount the disk

sudo -i
mount  /dev/xvdf1 /mnt

Then look at the mount directory: the root disk is now mounted under /mnt.


View the configuration file

ubuntu@ip-10-0-20-27:~$  cat /etc/fstab 
LABEL=cloudimg-rootfs   /        ext4   defaults,discard        0 0
/dev/nvme0n1        /data   ext4    defaults        0       0
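
The second entry in this file names the device directly, which is the configuration that broke the boot. A check along the following lines could flag such entries before a reboot. It is a minimal sketch; the function name `device_name_entries` is hypothetical, and it only looks at the first field of each line.

```shell
# Print the first field of every fstab entry that names a /dev/ device
# directly. Entries using UUID= or LABEL=, and stable /dev/disk/by-* paths,
# are not reported.
device_name_entries() {
    awk '$1 ~ /^\/dev\// && $1 !~ /^\/dev\/disk\/by-/ { print $1 }' "$1"
}

# Example: check the failed server's fstab from the rescue instance,
# assuming its root volume is mounted under /mnt as above:
#   device_name_entries /mnt/etc/fstab
```

Any device printed by this check is a candidate for conversion to a UUID-based entry.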

The official documentation on attaching volumes says the following:

Automatically mount an attached volume after reboot

(Excerpt from the official AWS documentation)

To mount an attached EBS volume on every system reboot, add an entry for the device to the /etc/fstab file.

You can use the device name, such as /dev/xvdf, in /etc/fstab, but we recommend using the device's 128-bit universally unique identifier (UUID) instead. Device names can change, but the UUID persists throughout the life of the partition. By using the UUID, you reduce the chances that the system becomes unbootable after a hardware reconfiguration. For more information, see "Identify the EBS device".

To mount an attached volume automatically after reboot

  1. (Optional) Create a backup of your /etc/fstab file that you can use if you accidentally destroy or delete this file while editing it.

    [ec2-user ~]$ sudo cp /etc/fstab /etc/fstab.orig
    
  2. Use the blkid command to find the UUID of the device. Make a note of the UUID of the device that you want to mount after reboot. You will need it in the next step.

    For example, the following command shows that there are two devices attached to the instance, and it shows the UUID of both devices.

    [ec2-user ~]$ sudo blkid
    /dev/xvda1: LABEL="/" UUID="ca774df7-756d-4261-a3f1-76038323e572" TYPE="xfs" PARTLABEL="Linux" PARTUUID="02dcd367-e87c-4f2e-9a72-a3cf8f299c10"
    /dev/xvdf: UUID="aebf131c-6957-451e-8d34-ec978d9581ae" TYPE="xfs"
    

    For Ubuntu 18.04, use the lsblk command.

    [ec2-user ~]$ sudo lsblk -o +UUID
    
  3. Open the /etc/fstab file using any text editor, such as nano or vim.

    [ec2-user ~]$ sudo vim /etc/fstab
    
  4. Add the following entry to /etc/fstab to mount the device at the specified mount point. The fields are the UUID value returned by blkid (or lsblk for Ubuntu 18.04), the mount point, the file system, and the recommended file system mount options. For more information about the required fields, run man fstab to open the fstab manual.

    In the following example, we mount the device with UUID aebf131c-6957-451e-8d34-ec978d9581ae to mount point /data and we use the xfs file system. We also use the defaults and nofail flags. We specify 0 to prevent the file system from being dumped, and we specify 2 to indicate that it is a non-root device.

    UUID=aebf131c-6957-451e-8d34-ec978d9581ae  /data  xfs  defaults,nofail  0  2
    

    Note

    If you ever boot your instance without this volume attached (for example, after moving the volume to another instance), the nofail mount option enables the instance to boot even if there are errors mounting the volume. Debian derivatives, including Ubuntu versions earlier than 16.04, must also add the nobootwait mount option.

  5. To verify that your entry works, run the following commands to unmount the device and then mount all file systems in /etc/fstab. If there are no errors, the /etc/fstab file is OK and your file system will mount automatically after reboot.

    [ec2-user ~]$ sudo umount /data
    [ec2-user ~]$ sudo mount -a
    

    If you receive an error message, address the errors in the file.

    Warning

    Errors in the /etc/fstab file can render a system unbootable. Do not shut down a system that has errors in the /etc/fstab file.

    If you are unsure how to correct errors in /etc/fstab and you created a backup file in the first step of this procedure, you can restore from your backup file using the following command.

    [ec2-user ~]$ sudo mv /etc/fstab.orig /etc/fstab
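
Steps 2 and 4 of the procedure above can be combined into a small helper. This is a sketch under stated assumptions: the function names `device_uuid` and `fstab_entry` are hypothetical, the mount options simply mirror the documentation's example entry, and the result should still be reviewed and tested with mount -a before any reboot.

```shell
# Print only the UUID of a device. blkid usually needs root to probe it.
device_uuid() {
    blkid -s UUID -o value "$1"
}

# Print an fstab line shaped like the documentation's example:
# UUID, mount point, file system type, options, dump flag, fsck pass.
#   $1 UUID, $2 mount point, $3 file system type
fstab_entry() {
    printf 'UUID=%s  %s  %s  defaults,nofail  0  2\n' "$1" "$2" "$3"
}

# Example for the /data volume from the fstab shown earlier
# (prints the line; appending it to /etc/fstab is left to the operator):
#   fstab_entry "$(sudo blkid -s UUID -o value /dev/nvme0n1)" /data ext4
```

Printing the line rather than writing it keeps the edit to /etc/fstab a deliberate manual step, which fits the warning above about errors in that file.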
    

Check the modification date and time


All the problems are now solved. Next, continue with the kernel upgrade.

sudo apt-get install linux-image-5.4.0-1055-aws


Wait for the reboot, then check the result:
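
One way to confirm the result after the reboot is to compare the running kernel against the target version. The sketch below assumes GNU sort with the -V (version sort) option; the function names `version_ge` and `check_running_kernel` are hypothetical.

```shell
# Succeed when version $1 is greater than or equal to version $2,
# using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the running kernel has reached the target version.
check_running_kernel() {
    if version_ge "$(uname -r)" "$1"; then
        echo "kernel $(uname -r) is at or above $1"
    else
        echo "kernel $(uname -r) is still below $1"
        return 1
    fi
}

# Example, using the target from this upgrade:
#   check_running_kernel 5.4.0
```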


It worked.

Summary of problems

  • Problem 1: Updating the kernel filled up the boot partition.
  • Improvement: When updating kernel patches on Ubuntu, watch the capacity of the boot and root partitions, so the system does not fail to boot after the restart.
  • Problem 2: While updating and installing software, the package manager reported that errors were encountered while processing.
  • Improvement: Testing showed that this happens when updating or installing any software. It is not resolved yet.
  • Problem 3: The configuration for mounting the disk automatically at boot was wrong.
  • Improvement: In future, deployments must strictly follow the official AWS documentation so that this does not happen again.
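
The capacity check recommended for Problem 1 could look like the following before each kernel upgrade. This is a minimal sketch: the function names `usage_percent` and `has_headroom` are hypothetical, and the default limit of 80 percent is an arbitrary value chosen for illustration.

```shell
# Print the used-space percentage (without the % sign) of the file system
# that holds the given path, based on POSIX-format df output.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is below the limit.
#   $1 path, $2 limit in percent (defaults to 80)
has_headroom() {
    [ "$(usage_percent "$1")" -lt "${2:-80}" ]
}

# Example: warn before upgrading if /boot or / is nearly full.
#   has_headroom /boot || echo "free up space on /boot before upgrading"
#   has_headroom /     || echo "free up space on / before upgrading"
```

On images where /boot is not a separate partition, both calls report on the root file system, which is still the number that matters.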


Copyright notice
This article was written by [Cnsre operation and maintenance]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2021/10/20211002145418051p.html
