The deep learning algorithm behind ZAO

ZAO has been extremely popular recently. It has become a phenomenon-level product and attracted widespread attention. Colleagues at ATA have already done some analysis; the link is as follows:

https://www.atatech.org/articles/148375?spm=ata.13269325.0.0.27ad49fa0Vr2gG

The article above explains that ZAO is a product that evolved from deepfake algorithms, and it provides the download address for deepFake lab, the computer hardware requirements, and so on. This article starts from the lower-level algorithm instead, to get at its essence and explain how ZAO swaps faces based on GAN.

First, here is an overall flow chart of face swapping:

04d3dda6e79b254527aef740c72b6311.png

Image source: Exposing DeepFake Videos By Detecting Face Warping Artifacts

The figure above shows the general flow of a DeepFake-based face-swapping algorithm. First, a face is detected (b) in the input image (a), and facial key points are then located on the detected face (c). The key points in (c) are passed through a transformation matrix (d) to align the face. The aligned face then goes into DeepFake (GAN/CycleGAN) for face replacement. The replaced face (g) is mapped back onto the original key points by applying the inverse of the transformation matrix, and finally it is pasted into the original image and blended to produce (h) and (i).
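The alignment step and its inverse can be sketched in a few lines. This is only a minimal illustration, assuming the transformation is a 2D similarity (rotation, uniform scale, translation) fitted by least squares; the function names and the landmark coordinates are made up for the example and are not taken from ZAO or from any DeepFake code base. A 2D similarity can be written as z -> a*z + b over complex numbers, which keeps the fit short.

```python
def fit_similarity(src, dst):
    """Least-squares fit of z -> a*z + b mapping src points onto dst points.

    Points are (x, y) tuples. The complex number a carries rotation and
    uniform scale, and b carries the translation.
    """
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    n = len(zs)
    z_mean = sum(zs) / n
    w_mean = sum(ws) / n
    num = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_transform(transform, points):
    """Apply z -> a*z + b to a list of (x, y) points."""
    a, b = transform
    out = []
    for x, y in points:
        w = a * complex(x, y) + b
        out.append((w.real, w.imag))
    return out


def invert_transform(transform):
    """Inverse mapping, used to put the swapped face back in place."""
    a, b = transform
    return 1 / a, -b / a


def to_matrix(transform):
    """The same transform written as a 2x3 affine matrix."""
    a, b = transform
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]


# Hypothetical landmarks: two eyes and a mouth in the input image,
# and where a canonical "straightened" template wants them to be.
detected = [(112.0, 95.0), (168.0, 110.0), (130.0, 170.0)]
template = [(40.0, 50.0), (88.0, 50.0), (64.0, 100.0)]

align = fit_similarity(detected, template)   # image -> aligned crop
restore = invert_transform(align)            # aligned crop -> image
```

In this sketch `align` plays the role of the matrix in step (d), and `restore` is the inverse transformation used to put the replaced face back onto the original key points.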

That is the general process for replacing a face in a single image. For a short video, the video first has to be split into frames, and the face is then replaced frame by frame. During this frame-by-frame replacement there should also be a face recognition network to make sure the same person is replaced throughout. For example, if we want to replace Xiaoyanzi's face in a video, we have to check that each detected face really is Xiaoyanzi's, so that we do not replace Ziwei's face by mistake. And because the replacement is done frame by frame, the face also needs to be smoothed across neighboring frames so that it looks natural and continuous over time and the visual effect holds up.
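The two video-specific checks described here can be sketched as follows. This is a rough illustration under stated assumptions: the face recognition network is assumed to output an embedding vector per detected face, identity is checked with cosine similarity against a reference embedding, and the smoothing is a simple exponential moving average over key points. The threshold of 0.8 and the smoothing factor are arbitrary example values, not values known to be used by ZAO.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def is_target(embedding, reference, threshold=0.8):
    """Only faces close enough to the reference identity get replaced."""
    return cosine_similarity(embedding, reference) >= threshold


def smooth_landmarks(frames, alpha=0.5):
    """Exponential moving average of key points across frames.

    frames is a list with one entry per video frame; each entry is a list
    of (x, y) key points. A smaller alpha gives stronger smoothing.
    """
    smoothed = []
    prev = None
    for points in frames:
        if prev is None:
            cur = [tuple(p) for p in points]
        else:
            cur = [(alpha * x + (1 - alpha) * px, alpha * y + (1 - alpha) * py)
                   for (x, y), (px, py) in zip(points, prev)]
        smoothed.append(cur)
        prev = cur
    return smoothed
```

A per-frame loop would then skip any detected face for which `is_target` is false, and run the replacement on the smoothed key points instead of the raw ones.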

So that is the general process for face swapping in images and in video. In ZAO's case, we found that its results are better than those of the usual face-swapping algorithms, especially when the head rotates (looking down, turning around, looking up). We therefore have reason to believe that ZAO's algorithm uses 3D facial key point detection, which would make the replacement look more natural.

Now that the pipeline is clear, let's look in more detail at how the DeepFake (GAN/CycleGAN) step works. To make GAN/CycleGAN easier to understand, we again explain it with diagrams:

0b1ca9e01eb2f9160bacfc6cb2f78206.png

The figure above shows the simplest face replacement network. The input face (on the left) is encoded by a neural network into an intermediate state (usually a vector or a very small image), which then goes into a decoder that reconstructs the face (on the right). Note that the intermediate encoding effectively stores all the information about the face. In this figure no face replacement is performed yet: face A is encoded and decoded back into face A, and face B is encoded and decoded back into face B.

bc42abb58e1770d16d38c91dc9909fb3.png

Next, what happens if we take the encoding of face B and decode it with A's decoder? The decoder draws A's face, while the expression and some other details still follow the B image that was encoded. Pasted back at the position of the original face, one person's face now appears where the other's was. That is the face swap.

There is one more thing to note in the figure above. Because the encodings have to be interchangeable, the encoder must be the same for all faces. In other words, every face is encoded with one shared encoder before replacement (the single red encoder in the figure), while each different face needs its own decoder (the separate blue and green decoders in the figure). That is how the swap is achieved.
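The routing of one shared encoder and one decoder per face can be shown with a deliberately tiny toy. Everything here is an assumption made for illustration: a "face" is just two numbers, the encoder and decoders use hand-written arithmetic instead of trained neural networks, and the identity constants 10.0 and 20.0 are invented. The only point is that reconstruction and swapping differ solely in which decoder receives the shared code.

```python
def encode(face):
    """Shared encoder: maps any toy face to one identity-free code."""
    left, right = face
    return (left - right) / 2.0


def make_decoder(identity_level):
    """Build the decoder for one identity (identity_level is a made-up constant)."""
    def decode(code):
        return [identity_level + code, identity_level - code]
    return decode


# One decoder per face, all fed by the same encoder.
DECODERS = {"A": make_decoder(10.0), "B": make_decoder(20.0)}


def decode_as(face, identity):
    """Encode with the shared encoder, then decode with the chosen decoder."""
    return DECODERS[identity](encode(face))


face_b = [23.0, 17.0]                    # a toy "B" face with code 3.0
same = decode_as(face_b, "B")            # reconstruction: B in, B out
swapped = decode_as(face_b, "A")         # swap: same code, other decoder
```

In a real system the encoder and decoders would be convolutional networks trained on many images of each person, with each decoder trained only on its own person's faces.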

But if we only use the network structure above, the generated face looks rather fake, with quite obvious traces of manual replacement. CycleGAN came along to make the replacement more realistic. Again, a simple figure helps to explain the essence of CycleGAN:

e9c4ccd94335acc12046f287471343d1.png

As the figure shows, CycleGAN essentially just adds a loss between the fake face and the real face to shrink the gap between them. In addition, where the earlier network only went A-->B, CycleGAN also goes B-->A at the same time and narrows that gap as well. The whole process forms a closed loop, hence the name Cycle.

This cyclic training can significantly reduce the unrealistic look that comes from decoding face B directly with A's decoder.
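The closed loop can be made concrete with the cycle-consistency term from the CycleGAN paper: an image mapped A-->B and then back B-->A should land on itself, and likewise in the other direction, with the difference measured by an L1 distance. The sketch below computes only that term. The adversarial losses between fake and real faces are left out, images are plain lists of numbers, and the two "generators" in the usage lines are toy functions invented for the example.

```python
def l1_distance(u, v):
    """Mean absolute difference between two equally sized images."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def cycle_consistency_loss(g_ab, g_ba, samples_a, samples_b):
    """L1 cycle loss: A -> B -> A and B -> A -> B should both return the input.

    g_ab and g_ba are callables standing in for the two generators.
    """
    forward = sum(l1_distance(g_ba(g_ab(x)), x) for x in samples_a) / len(samples_a)
    backward = sum(l1_distance(g_ab(g_ba(y)), y) for y in samples_b) / len(samples_b)
    return forward + backward


# Toy generators: A -> B adds 1 to every pixel, B -> A should subtract it again.
def a_to_b(x):
    return [v + 1.0 for v in x]


def b_to_a_exact(y):
    return [v - 1.0 for v in y]


def b_to_a_sloppy(y):
    return [v - 0.5 for v in y]


faces_a = [[1.0, 2.0]]
faces_b = [[2.0, 3.0]]
closed_loop = cycle_consistency_loss(a_to_b, b_to_a_exact, faces_a, faces_b)
open_loop = cycle_consistency_loss(a_to_b, b_to_a_sloppy, faces_a, faces_b)
```

Here `closed_loop` is zero because the two toy mappings undo each other exactly, while `open_loop` is larger because the loop no longer closes. During training this term is added to the adversarial losses with a weighting factor.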

Of course, in real applications some post-processing may be needed after the face swap to make the result smoother and more natural, for example blurring the edges of the face, or applying some style transfer inside the face region to match the original face. These are also key techniques for making the replacement work in practice. Today we only discuss the algorithm behind ZAO and will not go into product-level design details.
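Blurring the edge of the face is often done by blending with a soft mask. The sketch below is one simple way to do it, assuming a rectangular face region, grayscale images stored as nested lists, and a mask that ramps linearly from 0 at the border to 1 in the interior. The function names and the ramp width are illustrative choices, not a description of ZAO's post-processing.

```python
def feather_mask(height, width, ramp):
    """Mask that is 1.0 in the interior and falls linearly to 0.0 at the border.

    ramp is the width of the transition band in pixels and must be positive.
    """
    mask = []
    for r in range(height):
        row = []
        for c in range(width):
            # Distance in pixels to the nearest border of the region.
            edge = min(r, c, height - 1 - r, width - 1 - c)
            row.append(min(1.0, edge / ramp))
        mask.append(row)
    return mask


def blend(original, swapped, mask):
    """Per-pixel mix: mask 1.0 keeps the swapped face, mask 0.0 keeps the original."""
    return [[m * s + (1.0 - m) * o for o, s, m in zip(orow, srow, mrow)]
            for orow, srow, mrow in zip(original, swapped, mask)]


original = [[0.0] * 5 for _ in range(5)]
swapped = [[100.0] * 5 for _ in range(5)]
result = blend(original, swapped, feather_mask(5, 5, ramp=2))
```

With this mask the swapped face is used fully in the center and fades into the original image toward the border, which hides the seam of the pasted region.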

Finally, as for ZAO itself: despite its overbearing user-agreement terms, from a technical point of view I think ZAO is well done and worth learning from. Whether face swapping is ethical, and whether it is a meaningful and valuable thing to do, will take time to answer.

Thank you for reading. I hope it is of some help to you.

-- River crane

This article is original content from Alibaba Cloud and may not be reproduced without permission.

Copyright notice
This article was created by [Aliyun yunqi]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2020/12/20201224130630613n.html
