
Evaluating design tradeoffs in visual model-based reinforcement learning

Model-free reinforcement learning has been successfully demonstrated in a range of domains, including robotics, control, game playing, and autonomous driving. These systems learn through simple trial and error, and therefore need a large number of attempts before solving a given task. In contrast, model-based reinforcement learning (MBRL) learns a model of the environment (often called a world model or dynamics model) that lets the agent predict the outcomes of potential actions, reducing the amount of environment interaction needed to solve a task.

In principle, all that is needed for planning is to predict future rewards, which can then be used to select near-optimal future actions. Nevertheless, many recent methods, such as Dreamer, PlaNet, and SimPLe, additionally use the training signal of predicting future images. But is predicting future images actually necessary, or even helpful? What benefit do visual MBRL algorithms really derive from also predicting future images? The computational and representational cost of predicting entire images is considerable, so understanding whether this is truly useful matters greatly for MBRL research.
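To make the "in principle" point concrete, here is a minimal sketch of reward-only planning via random shooting. The `dynamics_model` and `reward_model` callables are hypothetical stand-ins for a learned latent dynamics model and reward head, not code from our library; note that no image is predicted anywhere in the loop.

```python
import numpy as np

def plan_action(reward_model, dynamics_model, state,
                horizon=12, n_candidates=500, action_dim=2):
    """Pick the first action of the candidate sequence with the highest predicted return.

    `dynamics_model(state, action)` and `reward_model(state, action)` are hypothetical
    callables; a CEM planner would refine the candidate distribution iteratively
    instead of sampling it once.
    """
    # Sample random candidate action sequences (random shooting).
    candidates = np.random.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i, actions in enumerate(candidates):
        s = state
        for a in actions:
            returns[i] += reward_model(s, a)   # accumulate predicted reward
            s = dynamics_model(s, a)           # roll the model forward
    best = np.argmax(returns)
    return candidates[best, 0]                 # execute only the first action (MPC-style)
```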

In "Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning", we show that predicting future images provides a substantial benefit and is in fact a key ingredient in training successful visual MBRL agents. We developed a new open-source library, called the World Models Library, that enables us to rigorously evaluate various world model designs and determine the relative impact of image prediction on the reward returned by each.

World Models Library

The World Models Library is designed for visual MBRL training and evaluation; it makes it possible to empirically study the effect of each design decision on the final performance of an agent at scale and across multiple tasks. The library introduces a platform-agnostic visual MBRL simulation loop and APIs for seamlessly defining new world models, planners, and tasks, or for picking and choosing from the existing catalog, which includes agents (e.g., PlaNet), video models (e.g., SV2P), and a variety of DeepMind Control tasks and planners such as CEM and MPPI.

Using the library, developers can study the effect of a varying factor in MBRL (for example, model design or representation space) on agent performance across a suite of tasks. The library supports training agents from scratch or on a pre-collected set of trajectories, as well as evaluating pre-trained agents on a given task. Models, planning algorithms, and tasks can be easily mixed and matched in any desired combination, as sketched below.
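As a rough sketch of what mixing and matching looks like, the snippet below rolls out an arbitrary model/planner pair on a task and reports the mean return. The `planner.act` method and the Gym-style `env.reset()`/`env.step()` interface are hypothetical stand-ins, not the library's actual API; see the repository and colab for the real interface.

```python
def evaluate_combination(model, planner, env, episodes=10):
    """Roll out any model/planner pair on any task and report the mean return.

    `planner.act(model, obs)` and the Gym-style env methods are hypothetical
    interfaces used only for illustration.
    """
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = planner.act(model, obs)        # planner queries the world model
            obs, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return sum(returns) / episodes                  # same loop works for CEM, MPPI, any model
```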

To give users maximum flexibility, the library is built around a NumPy interface, which allows the different components to be implemented in TensorFlow, PyTorch, or JAX. Please check the colab for a quick introduction.
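To illustrate what a NumPy boundary buys, here is a sketch of a component implemented internally in JAX while accepting and returning plain NumPy arrays, so the surrounding simulation loop never needs to know which framework is used. The class and method names are assumptions made for this example, not the library's real interface.

```python
import numpy as np
import jax.numpy as jnp
from jax import jit

class JaxRewardHead:
    """Illustrative component: JAX inside, NumPy at the boundary."""

    def __init__(self, weights: np.ndarray):
        self._w = jnp.asarray(weights)
        self._forward = jit(lambda w, x: x @ w)    # trivial linear head as a placeholder

    def predict(self, features: np.ndarray) -> np.ndarray:
        out = self._forward(self._w, jnp.asarray(features))
        return np.asarray(out)                     # hand plain NumPy back to the caller
```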

The impact of image prediction

Using the World Models Library, we trained several world models with different levels of image prediction. All of these models use the same input (previously observed images) to predict an image and a reward, but they differ in what percentage of the image they predict. As the number of image pixels predicted by the agent increases, agent performance as measured by the true reward generally improves.
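One simple way to realize "predicting a percentage of the image" is to compute the reconstruction loss over a fixed random subset of pixels. The sketch below illustrates that idea; the masking scheme and channels-last frame layout are assumptions made for exposition, not necessarily the exact protocol used in the paper.

```python
import numpy as np

def masked_image_loss(pred_frames, true_frames, pixel_fraction, seed=0):
    """Mean squared reconstruction error over a fixed random subset of pixels.

    Frames are assumed channels-last, i.e. shaped (..., height, width, channels).
    pixel_fraction=0.0 degenerates to reward-only training; pixel_fraction=1.0
    is full image prediction.
    """
    h, w = true_frames.shape[-3:-1]
    rng = np.random.default_rng(seed)              # fixed seed: the same pixels every step
    mask = rng.random((h, w)) < pixel_fraction     # boolean mask over spatial positions
    if not mask.any():
        return 0.0
    diff = (pred_frames - true_frames) ** 2
    return diff[..., mask, :].mean()               # average over selected pixels and channels
```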

Interestingly, the correlation between reward prediction accuracy and agent performance is not as strong, and in some cases a more accurate reward prediction can even lead to lower agent performance. At the same time, there is a strong correlation between image reconstruction error and agent performance.

This phenomenon is directly related to exploration, i.e., when the agent attempts riskier and potentially less rewarding actions in order to collect more information about unknown options in the environment. This can be shown by testing and comparing the models in an offline setting (that is, learning policies from pre-collected datasets, as opposed to online RL, which learns policies by interacting with the environment). The offline setting ensures that there is no exploration and that all models are trained on the same data. We observed that models that fit the data better usually perform better in the offline setting, and, surprisingly, these may not be the same models that perform best when learning and exploring from scratch.
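Schematically, the two setups differ only in where the training data comes from. In the sketch below, `model.fit` and the `collect_episode` helper (which would roll out one episode with the planner, analogous to the evaluation loop shown earlier) are hypothetical.

```python
def train_offline(model, dataset, epochs=10):
    """Offline setting: fit the world model on a fixed, pre-collected dataset.
    There is no exploration, so every model sees exactly the same data."""
    for _ in range(epochs):
        for batch in dataset:
            model.fit(batch)                        # hypothetical training interface

def train_online(model, planner, env, iterations=100):
    """Online setting: the agent's own planner gathers fresh data, so model quality
    feeds back into which parts of the environment get explored."""
    replay = []
    for _ in range(iterations):
        replay.append(collect_episode(env, planner, model))  # hypothetical helper
        for episode in replay:
            model.fit(episode)
```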

Conclusion

We have empirically demonstrated that predicting images can significantly improve task performance over models that only predict the expected reward. We have also shown that the accuracy of image prediction strongly correlates with the final task performance of these models. These findings can be used for better model design and are particularly useful for any future setting where the input space is high-dimensional and collecting data is expensive.

If you would like to develop your own models and experiments, head to our repository and colab, where you will find instructions on how to reproduce this work and how to use or extend the World Models Library.


