
How to solve the AI data dilemma? The JD Zhilian Cloud federated learning platform offers a good solution

With the continuous development of Internet businesses, many organizations have accumulated large amounts of online data. Making full use of this data for analysis, feature mining, and algorithm modeling is a key development direction for every organization. In most industries and enterprises, however, data exists as isolated islands. Because of industry competition, privacy and security concerns, and complicated administrative procedures, data integration faces many obstacles even between departments of the same company. In practice, integrating data scattered across regions and institutions is nearly impossible, or prohibitively expensive.

Meanwhile, as big data develops further, attention to data privacy and security has become a worldwide trend. This poses unprecedented challenges for artificial intelligence. How to design a machine learning framework that lets AI systems use each party's data jointly, more efficiently and more accurately, while still meeting security and regulatory requirements, is an important topic in the development of AI.

Over the past two years, federated learning (Federated Learning) has emerged, offering a new way to work with data across teams and break down “data silos”.

Federated learning is a new foundational AI technology. It was first proposed by Google in 2016, originally to let Android device users update models locally. Its design goal is to carry out efficient machine learning among multiple participants or computing nodes while ensuring information security during big data exchange, protecting terminal data and personal privacy, and ensuring legal compliance. The machine learning algorithms usable in federated learning are not limited to neural networks; they also include other important algorithms such as random forests. Federated learning is expected to become the foundation of the next generation of collaborative AI algorithms and networks.

Against this background, the JD Zhilian Cloud federated learning platform came into being. This article takes a closer look at it.

I. Architecture of the JD Zhilian Cloud federated learning platform

The JD Zhilian Cloud federated learning platform aims to build a federated learning model based on distributed data sets. During training, model information is exchanged between institutions in encrypted form, the exchange does not expose any institution's private data, and the trained model is shared among the institutions.

Not long ago, on the strength of its scheduling management, data processing, algorithm implementation, effectiveness and performance, and security capabilities, the JD Zhilian Cloud federated learning platform passed the 「Big Data Product Capability Evaluation」 of the China Academy of Information and Communications Technology (CAICT). It was awarded the certificate for basic federated learning capability, earning recognition from an industry authority.

The JD Zhilian Cloud federated learning platform can resolve the data islands between government and enterprises, fully release the potential of AI applications, and enable multi-party joint modeling while keeping private data secure (see Figure 1).

▲ Figure 1: JD Zhilian Cloud federated learning platform ▲

Why does the JD Zhilian Cloud federated data platform have these features?

The quality and quantity of data set the upper limit of machine learning. For a model (such as a neural network) to achieve better results, it may need more input data. Large amounts of data in turn consume more storage and computing power, so distributed methods are needed to give machine learning sufficient compute, storage, and sensible task scheduling. The same holds for federated learning: as Figure 2 shows, it is in essence an encrypted, distributed machine learning technology.

▲ Figure 2: JD Zhilian Cloud distributed federated learning architecture ▲

The JD Zhilian Cloud federated data platform can bridge the data islands between partners. With each party's data kept isolated from the others, the parties build a virtual common model, fully release the potential of AI, and achieve “common prosperity”.

▲ Figure 3: Federated learning scenarios ▲

As Figure 3 shows, the JD Zhilian Cloud federated data platform can break down the data barriers between JD's own data and its partners. Modeling takes place in an environment where each party's data remains isolated, producing a common model empowered by JD data and enabling deeper mining of, and innovation in, application scenarios.

II. Main capabilities of the JD Zhilian Cloud federated learning platform

1. Information encryption

The JD Zhilian Cloud federated learning platform consists of the federated learning client and the JD Zhilian Cloud gateway. The client is mainly responsible for data encryption and scientific computing; the gateway is responsible for transmitting the necessary encrypted parameters among the participants' clients.

The client is delivered to participants as an image. Participants' developers need not care about the operating system version or the development software environment; they simply load the image. Starting the federated learning platform inside the image is enough to begin federated learning training.

The main work of the JD Zhilian Cloud gateway is to authenticate federated learning clients and to pass the necessary encrypted parameters to each participant. To protect every participant's network security, the platform adopts a one-way network transmission strategy: every participant can send network requests to the JD Zhilian Cloud gateway, but the gateway cannot send network requests to any participant. Under this strategy, an enterprise can open only uplink (outbound) network access and close downlink (inbound) access, which effectively eases some participants' concerns about network security.

In addition, the JD Zhilian Cloud federated learning platform supports two sample alignment methods: MD5 alignment and federated encrypted alignment. Federated encrypted alignment combines the RSA algorithm with random noise to help the two parties find the user IDs they have in common, while guaranteeing that IDs not held in common are not leaked to the other party.
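The two alignment styles can be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's implementation: the function names (`md5_align`, `rsa_blind_align`), the toy key size, and the use of SHA-256 for the final tokens are all hypothetical choices. The RSA variant follows the general blind-signature idea, in which party A multiplies each hashed ID by random noise before sending it, party B signs it with its private key, and A removes the noise afterwards.

```python
import hashlib
import math
import secrets

def md5_hex(user_id: str) -> str:
    """Hash one ID with MD5 (the plain hashed-alignment variant)."""
    return hashlib.md5(user_id.encode("utf-8")).hexdigest()

def md5_align(ids_a, ids_b):
    """Parties compare MD5 digests and keep the IDs whose digests match."""
    digests_b = {md5_hex(i) for i in ids_b}
    return sorted(i for i in ids_a if md5_hex(i) in digests_b)

# Toy RSA key held by party B (assumption for illustration only).
# The primes are the Mersenne primes 2^61-1 and 2^89-1: far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known to B only

def hash_to_int(user_id: str) -> int:
    """Map an ID to an integer modulo N."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N

def token(value: int) -> str:
    """Second hash applied to signed values before they are compared."""
    return hashlib.sha256(str(value).encode("ascii")).hexdigest()

def random_noise() -> int:
    """Random blinding factor that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r

def rsa_blind_align(ids_a, ids_b):
    # Party A: blind each hashed ID with fresh random noise r^E.
    noises = [random_noise() for _ in ids_a]
    blinded = [hash_to_int(i) * pow(r, E, N) % N for i, r in zip(ids_a, noises)]
    # Party B: sign A's blinded values (B cannot see the underlying IDs)
    # and publish tokens of its own signed IDs.
    signed = [pow(x, D, N) for x in blinded]
    tokens_b = {token(pow(hash_to_int(i), D, N)) for i in ids_b}
    # Party A: divide out the noise and compare tokens.
    common = []
    for user_id, s, r in zip(ids_a, signed, noises):
        if token(s * pow(r, -1, N) % N) in tokens_b:
            common.append(user_id)
    return sorted(common)
```

In this sketch, party A learns only which of its own IDs also appear on B's side, and B sees only blinded values. Plain MD5 alignment is simpler, but digests of guessable IDs can be brute-forced, which is one reason an RSA-based variant is attractive.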

2. Federated algorithms

JD Zhilian Cloud has developed its own gradient information protection. All participants update their model parameters locally. Before an encrypted gradient is sent out, the sender adds sufficient noise, so the decrypting party receives only an encrypted noisy gradient from which the true gradient cannot be recovered. The sender then restores the true gradient by subtracting the noise and updates its own model parameters. This design fully protects each party's gradient information while preserving model accuracy.
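A minimal sketch of this masking round trip is shown below. The source does not name the encryption scheme, so the sketch assumes an additively homomorphic scheme and uses a toy Paillier implementation; the key size, the fixed-point `SCALE`, and all function names are hypothetical and chosen only for illustration.

```python
import math
import secrets

# Toy Paillier key pair (assumption: an additively homomorphic scheme is used).
# The private key (LAM, MU) belongs to the decrypting party; the primes are
# Mersenne primes, far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)
SCALE = 10**6  # fixed-point scale used to encode float gradients as integers

def encode(x: float) -> int:
    return round(x * SCALE) % N

def decode(m: int) -> float:
    signed = m - N if m > N // 2 else m  # map back to a signed integer
    return signed / SCALE

def encrypt(m: int) -> int:
    r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c: int) -> int:
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add(c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % N2

def masked_gradient_round_trip(gradient):
    # Sender: holds the gradient only in encrypted form (encrypted here for the demo).
    enc_grad = [encrypt(encode(g)) for g in gradient]
    # Sender: add uniformly random noise to every entry before sending.
    noise = [secrets.randbelow(N) for _ in gradient]
    enc_noisy = [add(c, encrypt(r)) for c, r in zip(enc_grad, noise)]
    # Decrypting party: sees only gradient + noise, not the true gradient.
    noisy_plain = [decrypt(c) for c in enc_noisy]
    # Sender: subtract the noise to recover the true gradient.
    recovered = [decode((m - r) % N) for m, r in zip(noisy_plain, noise)]
    return recovered, noisy_plain
```

Because the noise is drawn uniformly modulo N, the decrypted value on its own reveals nothing about the gradient, yet subtracting the noise gives back exactly the encoded gradient, so accuracy is limited only by the fixed-point scale.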

In addition, JD Zhilian Cloud federated learning analyzes how sparse data is stored and, relying only on the ciphertext's support for pairwise addition and scalar multiplication, implements matrix multiplication between dense encrypted numbers and sparse data. The cost depends only on the number of nonzero elements.
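The idea can be sketched as follows, assuming the sparse feature matrix is plaintext in coordinate form and the dense vector is encrypted. `MockCipher` is a hypothetical stand-in for a real ciphertext type: it supports only the two operations mentioned above and counts how often they are called. The orientation of the product (transposed matrix times vector) is also an assumption.

```python
class MockCipher:
    """Stand-in for an additively homomorphic ciphertext (illustration only)."""
    ops = 0  # counts homomorphic operations performed

    def __init__(self, value):
        self.value = value

    def __add__(self, other):       # ciphertext + ciphertext
        MockCipher.ops += 1
        return MockCipher(self.value + other.value)

    def __mul__(self, scalar):      # ciphertext * plaintext scalar
        MockCipher.ops += 1
        return MockCipher(self.value * scalar)

def to_coo(dense_rows):
    """Keep only the nonzero entries as (row, col, value) triples."""
    return [(i, j, v)
            for i, row in enumerate(dense_rows)
            for j, v in enumerate(row) if v != 0]

def sparse_t_times_encrypted(entries, enc_vec, n_cols, zero):
    """Compute X^T * enc_vec, touching only the nonzero entries of X."""
    out = [zero] * n_cols
    for i, j, v in entries:
        # One scalar multiplication and one addition per nonzero element.
        out[j] = out[j] + enc_vec[i] * v
    return out
```

With four nonzero entries the loop performs eight homomorphic operations regardless of the matrix dimensions, which is the sense in which the cost tracks the nonzero count.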

JD Zhilian Cloud federated learning also provides algorithms such as logistic regression, XGBoost, and DNN. It supports feature metrics such as Pearson, Spearman, WOE (Weight of Evidence), and IV (Information Value), and provides feature processing tools such as outlier filling, normalization, feature bucketing, Count_Encoding, and One-Hot.
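As a reminder of what WOE and IV measure, here is a small single-party sketch using the common definitions: for each bucket, WOE is the log of the ratio between the bucket's share of "good" samples and its share of "bad" samples, and IV sums the share difference times WOE over all buckets. The function name and the good-over-bad sign convention are assumptions; the platform's federated computation is not described in the source and is not shown here.

```python
import math

def woe_iv(bins):
    """bins: list of (good_count, bad_count) per bucket.

    Returns (list of WOE values, total IV).
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        if good == 0 or bad == 0:
            # log(0) is undefined; merge buckets or smooth counts beforehand.
            raise ValueError("bucket with an empty class")
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        iv += (good_share - bad_share) * woe
    return woes, iv
```

For two buckets with counts (80, 20) and (20, 80), the WOE values are ln 4 and -ln 4, and the IV is 1.2 × ln 4, which indicates a strongly predictive feature.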

3. Built on the latest deep learning framework

The JD Zhilian Cloud federated learning platform does not depend on frameworks such as Spark, Yarn, or K8s. The whole network is built on Google's new TensorFlow 2.0 and its high-level API tf.keras. Building on a two-tower network, users can define their own DNN structure. Compared with TensorFlow 1.x, the new version makes models easier to debug and has a clearer API, and TensorFlow 2.x is also the direction the framework is heading.

FATE, by contrast, trains its models with TensorFlow's Sequential API, which cannot smoothly connect the computation of the bottom network with that of the interactive network: during training, the result of the bottom network's forward pass is not recorded for back propagation. As a result, back propagation requires a second forward pass. Running the forward pass twice increases running time and, if the network contains random numbers, is likely to produce wrong results. The JD Zhilian Cloud federated learning platform uses the Subclassing API instead, which is more flexible: only one forward pass is needed during training, which reduces running time and avoids the instability caused by random numbers.
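The effect of recording the forward pass can be illustrated without any framework. The sketch below is plain Python, not tf.keras and not the platform's code; `BottomLayer` is a hypothetical toy layer with a dropout-style random mask. It caches its input and mask during the single forward pass, so the backward pass uses exactly the random numbers that produced the output. Recomputing the forward pass would draw a fresh mask and yield gradients that no longer match the output that was sent onward.

```python
import random

class BottomLayer:
    """Toy layer y_k = w_k * x_k * mask_k with a random 0/1 mask (illustration only)."""

    def __init__(self, weights, keep_prob, seed):
        self.w = list(weights)
        self.keep_prob = keep_prob
        self.rng = random.Random(seed)
        self._cache = None  # filled by forward(), consumed by backward()

    def forward(self, x):
        mask = [1.0 if self.rng.random() < self.keep_prob else 0.0 for _ in x]
        # Record what backward() needs, so no second forward pass is required.
        self._cache = (list(x), mask)
        return [w * xi * m for w, xi, m in zip(self.w, x, mask)]

    def backward(self, grad_out):
        """Gradient of the loss with respect to the weights, using the cached mask."""
        x, mask = self._cache
        return [g * xi * m for g, xi, m in zip(grad_out, x, mask)]
```

Because the mask is cached, a weight receives a zero gradient exactly when its output was dropped in the forward pass. In TensorFlow 2.x the analogous guarantee comes from running the model once under a single `tf.GradientTape`.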

4. Online prediction

For different levels of security requirements, the platform supports two real-time prediction schemes: online prediction through a SaaS API, and prediction inside the client. The former is faster; the latter is more secure.

III. Scenario case

At present, the JD Zhilian Cloud federated learning platform is widely used in retail, automotive, education, risk control, and other industries. In the automotive industry, after 2 weeks of modeling and training, model performance improved significantly, by 17%, raising both customer conversion rate and customer-satisfaction ROI and helping the enterprise carry out a full-link digital and intelligent transformation.

One car brand with many offline 4S stores integrated its online and offline data through the federated learning platform and used machine learning to model jointly. The model effectively predicts the number of in-store car buyers and users' preferences among car models, and it scores each user's probability of visiting a store and their car model preference. High-potential customers are then reached by SMS and telephone, greatly improving sales efficiency and the conversion rate of different car models.

In terms of deployment, the JD Zhilian Cloud federated learning platform can be deployed and debugged within three days and put into use within a week. It also supports visual feature analysis: without writing any code, users select and click on the page to run feature correlation analysis.


Copyright notice
This article was written by [Jingdong Zhilian cloud developer]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2020/12/20201224120550852l.html
