
[Paper Quick Read] Super-Resolution Series: Image Super-Resolution Based on Frequency Separation, Two Papers from ICCVW 2019 and CVPRW 2020

Contents

Frequency Separation for Real-World Super-Resolution

Abstract

Method

Guided Frequency Separation Network for Real-World Super-Resolution

Abstract

Method


 

Frequency Separation for Real-World Super-Resolution

[ICCVW 2019] [GitHub]

Abstract

Most of the recent literature on image super-resolution (SR) assumes the availability of training data in the form of paired low resolution (LR) and high resolution (HR) images or the knowledge of the downgrading operator (usually bicubic downscaling). While the proposed methods perform well on standard benchmarks, they often fail to produce convincing results in real-world settings. This is because real-world images can be subject to corruptions such as sensor noise, which are severely altered by bicubic downscaling. Therefore, the models never see a real-world image during training, which limits their generalization capabilities. Moreover, it is cumbersome to collect paired LR and HR images in the same source domain.

Problem: the degradation in synthetically downscaled low-resolution data does not match the degradation in real low-resolution data.

In short: most recent SR work assumes either paired LR/HR training data or a known degradation operator, usually bicubic downscaling. Such methods do well on standard benchmarks but often fail on real-world images, because real images carry corruptions such as sensor noise that bicubic downscaling severely alters. A model trained this way never sees a real-world image, which limits its generalization. In addition, collecting paired LR and HR images in the same source domain is cumbersome.

 

To address this problem, we propose DSGAN to introduce natural image characteristics in bicubically downscaled images. It can be trained in an unsupervised fashion on HR images, thereby generating LR images with the same characteristics as the original images. We then use the generated data to train a SR model, which greatly improves its performance on real-world images. Furthermore, we propose to separate the low and high image frequencies and treat them differently during training. Since the low frequencies are preserved by downsampling operations, we only require adversarial training to modify the high frequencies. This idea is applied to our DSGAN model as well as the SR model. We demonstrate the effectiveness of our method in several experiments through quantitative and qualitative analysis. Our solution is the winner of the AIM Challenge on Real World SR at ICCV 2019.

Solution:

1) The paper proposes DSGAN, which introduces natural image characteristics into bicubically downscaled images. It can be trained on HR images in an unsupervised fashion and then generates LR images with the same characteristics as the original images. The generated data is used to train an SR model, which greatly improves its performance on real-world images.

2) The paper also separates the low and high image frequencies and treats them differently during training. Because downsampling preserves the low frequencies, adversarial training is needed only to modify the high frequencies. This idea is applied to both the DSGAN model and the SR model.

Experimental results: quantitative and qualitative analysis confirm the effectiveness of the method. It won the AIM Challenge on Real World SR at ICCV 2019.

 

Method

Core idea:

Step 1. First downsample the HR images synthetically. The resulting LR images do not follow the same degradation process as real LR images, so the synthetic LR images must go through domain transfer.

Step 2. Train the SR network on the domain-transferred images. Because these images are closer to real degraded LR images, the trained SR network can effectively super-resolve real LR images.

The network design for each of these two steps is described below.

  • Domain transfer

This part corresponds to point 2) of the solution summary above. First, look at the domain-transfer network:

Figure 1

B denotes bicubic downscaling. The purple blocks are the high-pass and low-pass filters, the red triangles are loss functions, and the orange blocks are neural networks.

Note the following:

1. The network closely resembles half of a CycleGAN. Because it is only half, the domain transfer is less well constrained, so an additional prior has to be supplied.

2. The prior used in this paper is that synthetic LR and real LR images differ in their high-frequency components, not in their low-frequency components, because downsampling affects the high-frequency components the most.

3. The defining feature of the model is therefore high/low-frequency separation, which guides the network to transfer only the high-frequency components across domains.
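
The frequency separation described in the points above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the box (mean) filter, the kernel size of 5, and the helper names `box_low_pass` and `split_frequencies` are all assumptions chosen for brevity. The only property it relies on is that the high-frequency part is the image minus its low-pass version.

```python
import numpy as np


def box_low_pass(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Low-pass a 2-D image with a k x k mean filter (k odd, edge-padded).

    The box filter is an illustrative assumption; any low-pass filter works.
    """
    img = np.asarray(img, dtype=np.float64)
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    # Sum the k*k shifted copies of the image, then divide to get the mean.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def split_frequencies(img: np.ndarray, k: int = 5):
    """Return (low, high) such that low + high reconstructs the image."""
    img = np.asarray(img, dtype=np.float64)
    low = box_low_pass(img, k)
    high = img - low  # whatever the low-pass filter removed
    return low, high
```

In a setup like the one described, a color or content loss would compare the `low` parts, while the adversarial loss would see only the `high` parts.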

 

  • SR Network training

The SR network is trained on the domain-transferred images. Its structure is as follows:

Figure 2

Note the following points:

1. Because the domain-transferred images are closer to real degraded LR images, the trained SR network can effectively super-resolve real LR images.

2. A new trick is applied in the discriminator: it judges only whether the high-frequency components belong to the same domain. In practice, each input image is first passed through a high-pass filter, so only the high-frequency components of the two inputs are compared.

 

Guided Frequency Separation Network for Real-World Super-Resolution

[CVPRW 2020] [GitHub]

Abstract

Training image pairs are unavailable generally in real-world super-resolution. Although the LR images can be down-scaled from HR images, some real-world characteristics (such as artifacts or sensor noise) have been removed from the degraded images. Therefore, most of state-of-the-art super-resolved methods often fail in real-world scenes. In order to address aforementioned problem, we proposed an unsupervised super-resolved solution. The method can be divided into two stages: domain transformation and super-resolution. A color-guided domain mapping network was proposed to alleviate the color shift in domain transformation process. In particular, we proposed the Color Attention Residual Block (CARB) as the basic unit of the domain mapping network. The CARB which can dynamically regulate the parameters is driven by input data. Therefore, the domain mapping network can result in the powerful generalization performance. Moreover, we modified the discriminator of the super-resolution stage so that the network not only keeps the high frequency features, but also maintains the low frequency features. Finally, we constructed an EdgeLoss to improve the texture details. Experimental results show that our solution can achieve a competitive performance on NTIRE 2020 real-world super-resolution challenge.

Problem: paired training images are usually unavailable in real-world super-resolution. LR images can be obtained by downscaling HR images, but some real-world characteristics (such as artifacts or sensor noise) are removed from the degraded images. As a result, most state-of-the-art super-resolution methods often fail in real scenes.

Solution: the paper presents an unsupervised super-resolution solution divided into two stages, domain transformation and super-resolution. The specific contributions are:

1) A color-guided domain mapping network that alleviates the color shift in the domain transformation process.

2) In particular, the Color Attention Residual Block (CARB) as the basic unit of the domain mapping network. The CARB adjusts its parameters dynamically, driven by the input data, which gives the domain mapping network strong generalization performance.

3) A modified discriminator for the super-resolution stage, so that the network keeps the high-frequency features while also maintaining the low-frequency features.

4) Finally, an edge loss (EdgeLoss) to improve texture details.

Experimental results: the solution achieves competitive performance in the NTIRE 2020 real-world super-resolution challenge.

 

Method

The four contributions listed in the abstract are briefly introduced below.

  • Color-guided domain mapping network

Overall framework:

Figure 3

Note the following:

1. This network also closely resembles half of a CycleGAN and performs domain transfer. Which characteristics are transferred? The high-frequency components.

2. The paper holds that real low-resolution images z and synthetic low-resolution images x differ in their high-frequency components. Therefore, when x is mapped to ẑ, the original color characteristics (the low-frequency components) should be kept. The network then transfers only the high-frequency components, so that the high-frequency components of ẑ and z belong to the same domain.

3. To fully preserve the low-frequency characteristics of the source domain, the paper uses three measures: a color-guided generator, a low-frequency loss, and a perceptual loss.

4. The two details worth a closer look are the color-guided generator and the discriminator, both introduced below.

 

  • Color Attention Residual Block (CARB)

Color-guided generator:

Figure 4

Original text:

The top half of the network is a guided parameter network, to yield the bias (mean) and weight (variance) of CARB. The bias is the global information, so we utilize several convolutions with kernel size of 3 and three global pooling layers with kernel size of 5 to extract it. After than, the original image subtracts this global information will be fed into the sigmoid layer. The global information is used as bias, and the final output value is used as weight for CARB. For the CARB, this is a residual block. We combine spatial attention [30] and AdaIN [8] idea to enhance spatial perception. Therefore, the content and color of the original image can be maintained.

In other words, the upper half of the network is a guidance network whose outputs are the bias (mean) and weight (variance) of the CARB. The bias carries global information, so it is extracted with several convolutions with kernel size 3 and three global pooling layers with kernel size 5. The original image minus this global information is then fed into a sigmoid layer. The global information serves as the bias, and the final output serves as the weight of the CARB. The CARB itself is a residual block that combines spatial attention [30] with the AdaIN [8] idea to enhance spatial perception, so the content and color of the original image can be preserved.

The key ingredient here is AdaIN, a widely used technique. For a detailed introduction, see this blog post: https://zhuanlan.zhihu.com/p/158657861.
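
For readers unfamiliar with AdaIN, here is a minimal NumPy sketch of the standard operation: normalize each channel of the content features, then rescale and shift them with the per-channel statistics of a second input. It shows plain AdaIN only. According to the quoted text, the CARB takes its weight and bias from the guidance network rather than from a style image, and that wiring is not reproduced here. The function name `adain`, the `(C, H, W)` layout, and the `eps` value are assumptions.

```python
import numpy as np


def adain(content: np.ndarray, style: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Adaptive instance normalization for feature maps of shape (C, H, W).

    Each content channel is normalized to zero mean and unit standard
    deviation, then given the mean and standard deviation of the
    matching style channel.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    normalized = (content - c_mean) / (c_std + eps)  # eps avoids division by zero
    return s_std * normalized + s_mean
```

In a CARB-like block, `s_std` and `s_mean` would presumably be replaced by the predicted weight and bias maps, but that is an inference from the description above, not something the source spells out.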

 

  • Improved discriminator

Figure 5

There is a Gaussian high-pass filter before several convolution which kernel size of 3, to extract the high frequency information. This design allows the discriminator Gz(·) to treat only the high-frequency part for real and fake image, making the training of the whole GAN more stable and fast convergent.

For D_Z and D_Y, see Figure 3.

In other words, a Gaussian high-pass filter sits in front of the kernel-size-3 convolutions and extracts the high-frequency information. The discriminator therefore processes only the high-frequency part of the real and fake images, which makes training of the whole GAN more stable and faster to converge.
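
A Gaussian high-pass front end of this kind can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the kernel size, `sigma`, edge padding, and the names `gaussian_kernel`, `gaussian_high_pass`, and `score_high_freq` are hypothetical, and `discriminator` stands for any callable that maps an image to a score.

```python
import numpy as np


def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """1-D Gaussian kernel normalized to sum to 1 (size should be odd)."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_high_pass(img: np.ndarray, size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """High-pass a 2-D image: the image minus its Gaussian-blurred version."""
    img = np.asarray(img, dtype=np.float64)
    k = gaussian_kernel(size, sigma)
    padded = np.pad(img, size // 2, mode="edge")
    # Separable blur: filter every row, then every column.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    low = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
    return img - low


def score_high_freq(discriminator, img: np.ndarray) -> float:
    """Feed only the high-frequency part of the image to the discriminator."""
    return discriminator(gaussian_high_pass(img))


def toy_discriminator(x: np.ndarray) -> float:
    """Hypothetical stand-in for a trained discriminator network."""
    return float(np.mean(np.abs(x)))
```

With this front end, two images that differ only by a constant brightness offset receive the same score, because the offset lives entirely in the low frequencies that the filter removes.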

 

  • Edge loss to improve texture detail

Edge Loss

where F_E denotes the Canny operator, n is the batch size, z_i ∈ Ẑ is generated by G_{x→z}(·), and y_i ∈ Y.

This loss constrains the Canny edges of the real high-resolution image and of the generated high-resolution image to be consistent.
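
The idea of an edge-consistency loss can be illustrated with a small NumPy sketch. Several things here are assumptions and not taken from the paper: the Sobel gradient magnitude is swapped in for the Canny operator F_E to keep the example short, the distance is a mean absolute (L1) difference, and the names `edge_map` and `edge_loss` are hypothetical. The paper's exact formula is not reproduced in this post.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Cross-correlate a 2-D image with a 3x3 kernel (edge-padded)."""
    img = np.asarray(img, dtype=np.float64)
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out


def edge_map(img: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude, used here as a stand-in for Canny edges."""
    return np.hypot(filter3x3(img, SOBEL_X), filter3x3(img, SOBEL_Y))


def edge_loss(sr_batch, hr_batch) -> float:
    """Mean absolute difference between edge maps, averaged over the batch."""
    per_image = [
        np.mean(np.abs(edge_map(sr) - edge_map(hr)))
        for sr, hr in zip(sr_batch, hr_batch)
    ]
    return float(np.mean(per_image))
```

The loss is zero when the super-resolved and ground-truth images have identical edge maps, and it grows as edges are lost or blurred in the super-resolved output.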

 

 

 

 

 

 

 

 

 

 

 

 
