Stanford Seminar - Towards Safe and Efficient Learning in the Physical World
19 Apr 2024
Safe Bayesian Optimization
- Safe Bayesian optimization addresses the challenge of learning efficiently and safely by interacting with the real world.
- It models unknown reward and constraint functions with a stochastic process prior, such as a Gaussian process or a Bayesian neural network.
- Uncertainty estimates from these models guide exploration toward plausibly optimal regions while ensuring constraint satisfaction with high probability (a minimal sketch follows this list).
- Safe Bayesian optimization has been successfully applied in various domains, including tuning scientific instruments, industrial manufacturing tasks, and quadruped robots.
- To scale safe Bayesian optimization to richer and more complex applications, learning informative priors is crucial.
- The speaker proposes using Bayesian meta-learning to learn priors from related tasks.
- A flexible Transformer-based neural architecture predicts the score function of the stochastic process prior.
- Empirical results demonstrate the effectiveness of the proposed approach in meta-learning probabilistic models for sequential decision-making.
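To make the selection rule above concrete, here is a minimal SafeOpt-style sketch in Python. It assumes a discretized candidate set, a single safety constraint q(x) >= 0, and scikit-learn Gaussian processes; the function names and the confidence parameter `beta` are illustrative assumptions, not the speaker's implementation.

```python
# Minimal SafeOpt-style selection rule (illustrative sketch, not the talk's code).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def safe_bo_step(X_obs, y_f, y_q, candidates, beta=2.0):
    """Pick the next query point from `candidates` without violating q(x) >= 0."""
    gp_f = GaussianProcessRegressor(RBF(length_scale=0.2)).fit(X_obs, y_f)  # reward model
    gp_q = GaussianProcessRegressor(RBF(length_scale=0.2)).fit(X_obs, y_q)  # constraint model

    mu_f, sd_f = gp_f.predict(candidates, return_std=True)
    mu_q, sd_q = gp_q.predict(candidates, return_std=True)

    # Safe set: points whose constraint lower confidence bound is non-negative.
    safe = (mu_q - beta * sd_q) >= 0.0
    if not safe.any():
        raise RuntimeError("no provably safe candidate; widen the safe seed set")

    # Plausibly optimal points: safe points whose upper bound on the reward
    # beats the best certified lower bound among safe points.
    ucb_f = mu_f + beta * sd_f
    lcb_f = mu_f - beta * sd_f
    maximizers = safe & (ucb_f >= lcb_f[safe].max())

    # Among those, query where the models are most uncertain.
    uncertainty = np.where(maximizers, np.maximum(sd_f, sd_q), -np.inf)
    return candidates[np.argmax(uncertainty)]
```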
Safe Reinforcement Learning
- The speaker then turns to theoretical questions and the parametric regimes in which Bayesian optimization operates.
- They discuss the importance of safety in tasks where conservative uncertainty estimates are crucial.
- They introduce the idea of using a Gaussian process as a hyper-prior and shaping it through key hyperparameters.
- They propose a frontier-search algorithm to find hyperparameter settings that maximize informativeness while ensuring calibration.
- They demonstrate substantial acceleration in performance using meta-learning ideas in hardware experiments.
- They explore the application of ideas from Bayesian optimization to learning-based control, specifically model-based reinforcement learning.
- They introduce the concept of quantifying uncertainty in the dynamics of an unknown dynamical system using confidence sets.
- They suggest using epistemic uncertainty in the transition model for introspective planning to avoid unsafe states.
- They present an optimistic exploration protocol for model-based RL, in which the policy is optimized under the most favorable realization within the set of plausible transition models.
- They describe how the problem of propagating uncertainty through the dynamics model can be reduced to a standard approximate dynamic programming problem (see the sketch after this list).
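One way to picture that reduction, assuming a hallucinated-control style construction in the spirit of the bullets above (the talk's exact formulation is not reproduced here), is to expose the model's epistemic uncertainty as an extra control input so that an off-the-shelf planner can run on the augmented system. The names `mean_fn`, `std_fn`, and `beta` below are illustrative assumptions.

```python
# Sketch: epistemic uncertainty of a learned dynamics model exposed as an
# extra "hallucinated" control, reducing uncertainty propagation to standard
# planning on an augmented system.
import numpy as np

def hallucinated_step(state, action, eta, mean_fn, std_fn, beta=1.0):
    """One step of the augmented dynamics.

    state, action : inputs to the real system
    eta           : hallucinated control in [-1, 1]^d, chosen by the policy
    mean_fn/std_fn: posterior mean and epistemic std of the learned model
    """
    mu = mean_fn(state, action)      # model's best guess of the next state
    sigma = std_fn(state, action)    # epistemic (not aleatoric) uncertainty
    # The agent "chooses its luck": any next state inside the confidence set
    # mu +/- beta * sigma is plausible, and eta selects one of them.
    return mu + beta * sigma * np.clip(eta, -1.0, 1.0)

# Optimistic exploration optimizes reward jointly over (action, eta) with a
# standard planner; pessimistic safety checks instead minimize over eta.
```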
Optimistic Exploration
- The speaker introduces a method for exploration in reinforcement learning called optimistic exploration.
- In optimistic exploration, the agent chooses where within a set of plausible next states it wants to end up, effectively controlling its luck.
- This approach explores more efficiently than standard policy gradients, especially when action penalties are used.
- The speaker also discusses how optimistic exploration can be combined with pessimistic constraint satisfaction to ensure safety in reinforcement learning (sketched after this list).
- Experiments show that the optimistic-pessimistic algorithm outperforms other model-based and model-free algorithms in terms of task completion, constraint satisfaction, and safety during training.
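A hedged sketch of that optimistic-pessimistic combination: the policy's return is scored under the most favorable plausible model, while its safety cost is scored under the least favorable one. It assumes a finite set of plausible transition functions (e.g. posterior samples) and user-supplied reward and cost functions; this illustrates the idea rather than the evaluated algorithm's actual interface.

```python
# Illustrative optimistic-pessimistic scoring of a policy over a set of
# plausible transition models.
import numpy as np

def rollout(policy, model, reward_fn, cost_fn, s0, horizon):
    """Simulate one trajectory under `model`; return (total_reward, total_cost)."""
    s, total_r, total_c = np.asarray(s0, dtype=float), 0.0, 0.0
    for _ in range(horizon):
        a = policy(s)
        total_r += reward_fn(s, a)
        total_c += cost_fn(s, a)
        s = model(s, a)
    return total_r, total_c

def optimistic_pessimistic_score(policy, model_set, reward_fn, cost_fn,
                                 s0, horizon, cost_budget):
    results = [rollout(policy, m, reward_fn, cost_fn, s0, horizon) for m in model_set]
    v_optimistic = max(r for r, _ in results)   # best-case task reward
    c_pessimistic = max(c for _, c in results)  # worst-case safety cost
    # Train to maximize v_optimistic subject to c_pessimistic <= cost_budget,
    # e.g. via a Lagrangian penalty.
    return v_optimistic, c_pessimistic <= cost_budget
```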
Bridging the Sim-to-Real Gap
- The speaker concludes by discussing how optimistic exploration can be used to bridge the sim-to-real gap in reinforcement learning.
- They propose training reinforcement learning agents with a learned neural network prior that is regularized towards a physics simulator (a generic sketch follows this list).
- This approach outperforms uninformed neural network models and gray-box models that combine physics-informed priors with neural networks.
- The speaker argues that models should learn to know what they don't know, which is a key challenge in developing safe and efficient agents that can learn by interacting with the real world.
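The simulator-regularized prior can be pictured with a simple loss: fit the network to real transitions while penalizing disagreement with the physics simulator on broadly sampled inputs. The module interface, the mean-squared penalties, and the weight `lam` are assumptions for illustration; the talk's actual method is not reproduced here.

```python
# Generic sketch of a neural dynamics model regularized towards a simulator.
import torch

def regularized_model_loss(net, simulator, real_batch, sim_batch, lam=0.1):
    """net: torch module mapping concat(state, action) -> next state."""
    s, a, s_next = real_batch                        # real-world transitions
    pred = net(torch.cat([s, a], dim=-1))
    data_loss = torch.mean((pred - s_next) ** 2)     # fit the real data

    s_sim, a_sim = sim_batch                         # broadly sampled inputs
    with torch.no_grad():
        sim_next = simulator(s_sim, a_sim)           # physics-based prediction
    prior_pred = net(torch.cat([s_sim, a_sim], dim=-1))
    prior_loss = torch.mean((prior_pred - sim_next) ** 2)

    # Small lam keeps the network close to the simulator where real data is
    # scarce, while real transitions dominate where they are available.
    return data_loss + lam * prior_loss
```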