With the continued growth of Internet businesses, many organizations have accumulated large amounts of online data. Making full use of this data for data analysis, feature mining, and algorithm modeling has become a key development direction for these organizations. In most industries and enterprises, however, data exists as isolated islands. Because of industry competition, privacy and security concerns, and complicated administrative procedures, data integration faces many obstacles even between departments of the same company. In practice, integrating data scattered across different places and institutions is either nearly impossible or prohibitively expensive.
Meanwhile, as big data develops further, attaching importance to data privacy and security has become a worldwide trend. This poses unprecedented challenges for artificial intelligence. How to design a machine learning framework that lets AI systems use their respective data jointly, more efficiently and more accurately, while still meeting security and regulatory requirements, is an important topic in the development of AI.
In the past two years, federated learning has emerged as a new approach to cross-team data collaboration and to breaking down “data silos”.
Federated learning is an emerging foundational AI technology. It was first proposed by Google in 2016, originally to solve the problem of updating models locally on Android mobile devices. Its design goal is to enable efficient machine learning among multiple participants or computing nodes while ensuring information security during big data exchange, protecting the privacy of terminal and personal data, and ensuring legal compliance. The machine learning algorithms usable in federated learning are not limited to neural networks; they also include other important algorithms such as random forests. Federated learning is expected to become the foundation of the next generation of collaborative AI algorithms and networks.
Against this background, the JD Zhilian Cloud federated learning platform was created. This article takes a closer look at how it works.
One. Architecture of the JD Zhilian Cloud federated learning platform
The JD Zhilian Cloud federated learning platform aims to build _a federated learning model based on distributed data sets._ During training, model information is exchanged between institutions in encrypted form. The exchange does not expose any institution's private data, and the trained model is shared among the institutions.
Not long ago, on the strength of its _scheduling and management, data processing, algorithm implementation, effectiveness and performance, and security_, the JD Zhilian Cloud federated learning platform passed the 「Big Data Product Capability Evaluation」 of the China Academy of Information and Communications Technology (CAICT) and received the certificate for basic federated learning capabilities, earning recognition from an industry authority.
The JD Zhilian Cloud federated learning platform addresses the data-island problem between governments and enterprises and fully releases the potential of AI applications, enabling multi-party joint modeling while keeping private data secure (see Figure 1).
▲ Figure 1: JD Zhilian Cloud federated learning platform ▲
Why does the JD Zhilian Cloud federated data platform have these capabilities?
The quality and quantity of data set the upper limit of what machine learning can achieve. For a model (such as a neural network) to perform better, it usually needs to be fed more data. Large amounts of data in turn consume more storage and computing power, so distributed methods are needed to give machine learning sufficient compute, storage, and sensible task scheduling. The same is true of federated learning: as Figure 2 shows, it is in essence an encrypted form of distributed machine learning.
▲ Figure 2: JD Zhilian Cloud distributed federated learning architecture ▲
The JD Zhilian Cloud federated data platform bridges the data islands between partners. It lets the parties build a virtual shared model while their data stays isolated from one another, fully releasing the potential of AI and achieving “common prosperity”.
▲ Figure 3: Federated learning scenarios ▲
As Figure 3 shows, the JD Zhilian Cloud federated data platform breaks through the data barrier between JD's own data and that of its partners. Modeling takes place in an environment where the data sets remain isolated from each other, producing a shared model empowered by JD data and enabling deeper mining of, and innovation in, application scenarios.
Two. Main capabilities of the JD Zhilian Cloud federated learning platform
1, Information encryption
The JD Zhilian Cloud federated learning platform consists of the federated learning client and the JD Zhilian Cloud gateway. The client is mainly responsible for data encryption and scientific computing, while the gateway is responsible for relaying the necessary encrypted parameters among the participants' clients.
The client is delivered to participants as an image. Developers on the participant side do not need to worry about the operating system version or the software environment required for development: they load the image directly, start the federated learning platform inside it, and can then begin federated training.
The main tasks of the JD Zhilian Cloud gateway are: _authenticating federated learning clients and passing the necessary encrypted parameters to each participant._ To protect every participant's network, the platform adopts a one-way network transmission strategy: all participants can send network requests to the JD Zhilian Cloud gateway, but the gateway cannot send network requests to any participant. Under this strategy, an enterprise only needs to open outbound (uplink) network access and can keep inbound (downlink) access closed, which effectively eases participants' concerns about network security.
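The one-way strategy described above can be pictured as a mailbox that only clients ever call. The following is a minimal in-memory sketch, not the platform's actual interface: the `Gateway` class, its `upload` and `poll` methods, and the token table are all hypothetical names introduced for illustration.

```python
from collections import defaultdict, deque


class Gateway:
    """In-memory stand-in for the cloud gateway (hypothetical API)."""

    def __init__(self, tokens):
        self._tokens = dict(tokens)            # token -> party name
        self._mailboxes = defaultdict(deque)   # party name -> queued messages

    def _authenticate(self, token):
        # System authentication: reject clients with an unknown token.
        if token not in self._tokens:
            raise PermissionError("unknown client token")
        return self._tokens[token]

    def upload(self, token, recipient, payload):
        """Called BY a client: store an encrypted payload for another party."""
        sender = self._authenticate(token)
        self._mailboxes[recipient].append((sender, payload))

    def poll(self, token):
        """Called BY a client: fetch the next message addressed to it, if any."""
        party = self._authenticate(token)
        box = self._mailboxes[party]
        return box.popleft() if box else None


gateway = Gateway({"token-a": "party_a", "token-b": "party_b"})
gateway.upload("token-a", "party_b", b"<encrypted parameters>")
print(gateway.poll("token-b"))   # party_b pulls; the gateway never pushes
print(gateway.poll("token-b"))   # mailbox is empty again
```

Because every exchange starts with a client request, a participant in this sketch never has to accept an inbound connection.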
In addition, the JD Zhilian Cloud federated learning platform supports two sample alignment methods: MD5 alignment and federated encrypted alignment. Federated encrypted alignment combines the RSA algorithm with random noise to help the two parties find their common user IDs while guaranteeing that IDs they do not share are not leaked to the other party.
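One well-known way to combine RSA with random noise for ID alignment is RSA blinding, and the sketch below illustrates that general technique; the article does not state that the platform works exactly this way. The toy key (two Mersenne primes), the SHA-256 hashing, and all function names are assumptions chosen so the example runs with the standard library only. It is not secure and is for illustration only.

```python
import hashlib
import math
import secrets

# Toy RSA key built from two Mersenne primes (demo only, NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known to party B only


def h(user_id):
    """Hash an ID to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(user_id.encode()).digest(), "big") % N


def tag(value):
    """Second hash applied to signed values before they are compared."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def blind(ids):
    """Party A: hide each hashed ID behind a random factor r^E."""
    blinded, factors = [], []
    for uid in ids:
        r = secrets.randbelow(N - 2) + 2
        while math.gcd(r, N) != 1:
            r = secrets.randbelow(N - 2) + 2
        factors.append(r)
        blinded.append(pow(r, E, N) * h(uid) % N)
    return blinded, factors


def sign_blinded(blinded):
    """Party B: sign the blinded values without learning the IDs."""
    return [pow(y, D, N) for y in blinded]


def own_tags(ids):
    """Party B: tags of its own signed IDs, safe to send to party A."""
    return {tag(pow(h(uid), D, N)) for uid in ids}


def intersect(ids, factors, signed, b_tags):
    """Party A: strip the random factor and keep IDs that B also holds."""
    common = []
    for uid, r, s in zip(ids, factors, signed):
        unblinded = s * pow(r, -1, N) % N   # equals h(uid)^D mod N
        if tag(unblinded) in b_tags:
            common.append(uid)
    return common


a_ids = ["u1", "u2", "u3", "u4"]
b_ids = ["u3", "u4", "u5"]
blinded, factors = blind(a_ids)
common = intersect(a_ids, factors, sign_blinded(blinded), own_tags(b_ids))
print(common)
```

In this sketch party B only ever sees blinded values, and party A only sees tags it cannot invert without B's private exponent, so neither side learns the other's non-shared IDs.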
2, Federated algorithms
JD Zhilian Cloud developed its own gradient information protection scheme. Every participant updates its model parameters locally, so before a participant sends out an encrypted gradient, it adds sufficient noise to it. The decrypting party therefore receives an encrypted noisy gradient from which the true gradient cannot be recovered, whereas the participant itself can _restore the true gradient by subtracting the noise and then update its own model parameters._ This design fully protects each participant's gradient information while preserving the accuracy of the model.
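The masking idea can be sketched with any additively homomorphic scheme. The example below uses a toy Paillier implementation as a stand-in; the article does not name the encryption scheme the platform uses, so Paillier, the toy key, the fixed-point `SCALE`, and all function names are assumptions made for illustration. The key is far too small to be secure.

```python
import secrets

# Toy Paillier key (demo only, NOT secure): n = p * q with two Mersenne primes.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)    # private key, held by the decrypting party
MU = pow(PHI, -1, N)
SCALE = 10**6              # fixed-point scale used to encode floats


def encrypt(x):
    """Encrypt a float with the public key N (generator g = N + 1)."""
    m = round(x * SCALE) % N
    r = secrets.randbelow(N - 2) + 2
    return (1 + N * m) * pow(r, N, N2) % N2


def decrypt(c):
    """Decrypt with the private key and decode back to a signed float."""
    m = (pow(c, PHI, N2) - 1) // N * MU % N
    if m > N // 2:          # upper half of the range encodes negatives
        m -= N
    return m / SCALE


def add(c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % N2


# Stand-in for a gradient that a participant obtained in encrypted form.
true_grad = [0.25, -1.5, 0.125]
enc_grad = [encrypt(g) for g in true_grad]

# Participant: add secret noise to every entry before sending.
noise = [(secrets.randbelow(2 * 10**9) - 10**9) / SCALE for _ in true_grad]
masked = [add(c, encrypt(n)) for c, n in zip(enc_grad, noise)]

# Decrypting party: sees only gradient + noise.
noisy = [decrypt(c) for c in masked]

# Participant: subtract its own noise to recover the true gradient.
recovered = [v - n for v, n in zip(noisy, noise)]
print(noisy)
print(recovered)
```

The decrypting party in this sketch never handles an unmasked value, yet the participant ends up with the exact gradient (up to fixed-point rounding), which is why accuracy is unaffected.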
In addition, JD Zhilian Cloud federated learning analyzed how sparse data is stored and, relying only on the pairwise addition and scalar multiplication that encrypted numbers support, implemented matrix multiplication between dense encrypted numbers and sparse data. Its cost depends only on the number of nonzero elements.
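To see why the cost tracks the number of nonzero elements, consider multiplying a dense vector of ciphertexts by a sparse plaintext matrix stored in CSR form. The sketch below is an illustration under assumptions: the `Cipher` class is a plaintext stand-in that allows only ciphertext addition and multiplication by a plaintext scalar, and the CSR layout is a common choice, not necessarily the storage format the platform uses.

```python
class Cipher:
    """Plaintext stand-in for an additively homomorphic ciphertext."""

    scalar_mults = 0   # counts the expensive homomorphic operations

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Cipher(self.value + other.value)

    def __mul__(self, scalar):
        if isinstance(scalar, Cipher):
            raise TypeError("ciphertext * ciphertext is not supported")
        Cipher.scalar_mults += 1
        return Cipher(self.value * scalar)


def to_csr(rows):
    """Convert a list of dense rows into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, v in enumerate(row):
            if v != 0:
                data.append(v)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def enc_vec_times_sparse(enc_vec, csr, n_cols):
    """Compute enc_vec^T @ X, touching only the nonzero entries of X."""
    data, indices, indptr = csr
    out = [Cipher(0) for _ in range(n_cols)]   # stands in for encryptions of 0
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            out[indices[k]] = out[indices[k]] + enc_vec[row] * data[k]
    return out


X = [[0, 2, 0],
     [1, 0, 0],
     [0, 0, 0],
     [0, 3, 4]]
enc_vec = [Cipher(v) for v in [1, 2, 3, 4]]
result = enc_vec_times_sparse(enc_vec, to_csr(X), n_cols=3)
print([c.value for c in result], Cipher.scalar_mults)
```

A dense loop over the same 4 x 3 matrix would perform 12 scalar multiplications on ciphertexts; the CSR loop performs one per nonzero entry.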
JD Zhilian Cloud federated learning also provides algorithms such as _logistic regression, XGBoost, and DNN_. It supports statistics such as Pearson, Spearman, WOE (weight of evidence), and IV (information value), and provides feature-processing tools such as outlier filling, normalization, feature bucketing, Count_Encoding, and One-Hot encoding.
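For readers unfamiliar with WOE and IV, here is a plain single-party sketch of one common convention: WOE of a bucket is the log of the ratio between its share of "good" samples and its share of "bad" samples, and IV sums the share difference times WOE over all buckets. How the platform computes these across parties is not described in the article; the function name `woe_iv`, the label convention, and the sample data are assumptions.

```python
import math
from collections import Counter


def woe_iv(buckets, labels):
    """Return ({bucket: WOE}, IV) for a bucketed feature and 0/1 labels.

    Assumes label 0 = good and label 1 = bad, and that every bucket
    contains at least one sample of each class (no smoothing is applied).
    """
    good = Counter(b for b, y in zip(buckets, labels) if y == 0)
    bad = Counter(b for b, y in zip(buckets, labels) if y == 1)
    total_good, total_bad = sum(good.values()), sum(bad.values())
    woe, iv = {}, 0.0
    for bucket in sorted(set(buckets)):
        good_share = good[bucket] / total_good
        bad_share = bad[bucket] / total_bad
        woe[bucket] = math.log(good_share / bad_share)
        iv += (good_share - bad_share) * woe[bucket]
    return woe, iv


# A feature already split into two buckets, with 0/1 labels per sample.
buckets = ["low"] * 4 + ["high"] * 4
labels = [0, 0, 0, 1, 0, 1, 1, 1]
woe, iv = woe_iv(buckets, labels)
print(woe, iv)
```

Some references define WOE with the ratio inverted (bad over good); that flips the sign of each WOE value but leaves IV unchanged.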
3, Based on the latest deep learning framework
The JD Zhilian Cloud federated learning platform does not depend on Spark, Yarn, K8s, or similar systems. The whole network is built on Google's TensorFlow 2.0 and its high-level API, tf.keras. On top of the two-tower network, users can define their own DNN structure. Compared with TensorFlow 1.x, the new version makes models easier to debug and has a clearer API, and TensorFlow 2.x is also where the framework is heading.
FATE uses TensorFlow's Sequential API when training the model, which cannot smoothly connect the computation of the bottom network with that of the interactive network: during training, the result of the bottom network's forward pass is not recorded for backpropagation. As a result, backpropagation in federated training has to run the forward pass a second time. Running the forward pass twice increases the running time, and if the network contains random numbers it is also likely to produce wrong results. The JD Zhilian Cloud federated learning platform instead uses the _Subclassing API, which is more flexible: only one forward pass is needed during training, which effectively reduces the running time and the instability caused by random numbers._
4, Online prediction
To meet different levels of security requirements, the platform supports two real-time prediction schemes: online prediction through a SaaS API, and prediction inside the client. The former is faster; the latter is more secure.
Three. Use cases
At present, the JD Zhilian Cloud federated learning platform is widely used in industries such as _retail, automotive, education, and risk control_. In the automotive industry, after two weeks of modeling and training, model performance improved by 17%, lifting both customer conversion rate and customer satisfaction ROI and helping enterprises move toward end-to-end digital and intelligent transformation.
One car brand with many offline 4S dealerships used the federated learning platform to integrate its online and offline data and to build models jointly with machine learning. The model effectively predicts which customers are likely to visit a dealership to buy a car and which car models they prefer, and scores each user's probability of visiting and model preference. Combined with SMS and phone outreach to high-potential customers, this greatly improved sales efficiency and the conversion rate for different car models.
In terms of deployment, the JD Zhilian Cloud federated learning platform can be deployed and debugged within three days and put into use within a week. It also supports visual feature analysis: without writing any code, users can run feature correlation analysis by selecting and clicking on the page.
This article was written by [Jingdong Zhilian cloud developer]. Please include a link to the original when reposting. Thank you.