ABSTRACT
Deep Reinforcement Learning (DRL) holds great promise for learning behaviours flexibly, but it can be hard to reproduce and can require thousands of trials, which limits its practical use on robots. At McGill’s Mobile Robotics Lab, we have recently:
- Taught a flippered robot to swim in fewer than a dozen trials
- Reported reproducibility issues that changed the community’s empirical practices
- Developed TD3, a continuous state/action DRL algorithm that achieves world-leading performance with about a dozen lines of training code
- Autonomously explored coral reefs in the turbulent littoral ocean via imitation and self-supervised learning
In this talk, I will describe the statistical and optimization insights gained from these projects. TD3 grew out of our discovery that deep actor-critic methods suffer from an overestimation bias in learned action-values, which arises from taking the gradient of a noisy estimator. For imitation learning, a similar analysis identified extrapolation error as a limiting factor in outperforming noisy experts, and led to the Batch-Constrained Q-Learning (BCQ) approach, which can do so. For model-based RL methods using Bayesian neural networks, we have analyzed sampling variance over time and increased the stability of sampling possible futures for data-efficient policy improvement. Finally, I will give some views on a more symbiotic relationship between robotics and DRL in the future.
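To make the overestimation remedy concrete, below is a minimal sketch of the clipped double Q-learning target at the heart of TD3: the minimum of two target critics replaces a single noisy value estimate, and the target action is smoothed with clipped noise. The network handles (`actor_target`, `critic1_target`, `critic2_target`) and hyperparameter values are illustrative placeholders, not the reference implementation.

```python
import torch

def td3_target(reward, next_state, done,
               actor_target, critic1_target, critic2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """Compute the TD3 critic target with target-policy smoothing and clipped double Q-learning."""
    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped Gaussian noise.
        next_action = actor_target(next_state)
        noise = (torch.randn_like(next_action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: taking the minimum of two target critics
        # counteracts the overestimation bias of a single noisy value estimate.
        q1 = critic1_target(next_state, next_action)
        q2 = critic2_target(next_state, next_action)
        target_q = reward + gamma * (1.0 - done) * torch.min(q1, q2)
    return target_q
```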