Stanford Seminar - Towards Safe and Efficient Learning in the Physical World

Safe Bayesian Optimization

  • Safe Bayesian optimization addresses the challenge of learning efficiently and safely by interacting with the real world.
  • It models unknown rewards and constraints with a stochastic process prior, such as a Gaussian process or a Bayesian neural network.
  • Uncertainty estimates from these models guide exploration within plausibly optimal regions while ensuring constraint satisfaction (see the sketch after this list).
  • Safe Bayesian optimization has been successfully applied in various domains, including tuning scientific instruments, industrial manufacturing tasks, and quadruped robots.
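
A minimal sketch of such a safe selection rule, assuming Gaussian process models for both the reward and the constraint over a discrete candidate set; the RBF kernel, noise level, safety threshold, and confidence multiplier `beta` below are illustrative assumptions, not details from the talk.

```python
# Safe Bayesian optimization step (sketch): pick the candidate with the highest
# optimistic reward among points whose pessimistic constraint estimate is safe.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    d = A[:, None, :] - B[None, :, :]
    return variance * np.exp(-0.5 * np.sum(d**2, axis=-1) / lengthscale**2)

def gp_posterior(X, y, Xs, noise=1e-3):
    """Gaussian process posterior mean and standard deviation at test points Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(rbf_kernel(Xs, Xs)) - np.sum(v**2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def safe_bo_step(X, f_obs, g_obs, candidates, safety_threshold=0.0, beta=2.0):
    """Next query: maximize the reward UCB over candidates whose constraint
    LCB lies above the safety threshold (i.e. plausibly safe points)."""
    mu_f, sd_f = gp_posterior(X, f_obs, candidates)   # reward model
    mu_g, sd_g = gp_posterior(X, g_obs, candidates)   # constraint model
    safe = mu_g - beta * sd_g >= safety_threshold     # pessimistic safety check
    if not np.any(safe):
        raise RuntimeError("no candidate is certifiably safe under the model")
    ucb = mu_f + beta * sd_f                          # optimistic reward estimate
    idx = np.where(safe)[0]
    return candidates[idx[np.argmax(ucb[idx])]]
```

Restricting the argmax to points whose pessimistic constraint estimate clears the threshold is what keeps every query plausibly safe; richer variants in the SafeOpt family additionally expand the safe set by querying uncertain points near its boundary.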

Learning Informative Priors

  • To scale safe Bayesian optimization to richer and more complex applications, learning informative priors is crucial.
  • The speaker proposes using Bayesian meta-learning to learn such priors from related tasks (a simplified sketch of this idea follows the list).
  • A flexible Transformer-based neural architecture predicts the score function of the stochastic process prior.
  • Empirical results demonstrate the effectiveness of the proposed approach in meta-learning probabilistic models for sequential decision-making.
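
The Transformer-based score model itself is beyond a short sketch, but the underlying "meta-learn the prior" idea can be illustrated with a much simpler stand-in: fit shared Gaussian process hyperparameters by maximizing the summed log marginal likelihood over datasets from related tasks. The function names, RBF kernel, and fixed noise level below are assumptions for illustration, not the speaker's method.

```python
# Meta-learning an informative GP prior (toy version): shared hyperparameters
# are chosen so that the prior explains all related tasks jointly.
import numpy as np
from scipy.optimize import minimize

def log_marginal_likelihood(X, y, lengthscale, variance, noise=1e-2):
    d = X[:, None, :] - X[None, :, :]
    K = variance * np.exp(-0.5 * np.sum(d**2, axis=-1) / lengthscale**2)
    K += noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * len(y) * np.log(2 * np.pi)

def meta_learn_prior(tasks):
    """tasks: list of (X, y) datasets from related problems.
    Returns GP hyperparameters that maximize the summed marginal likelihood."""
    def neg_total_lml(log_params):
        lengthscale, variance = np.exp(log_params)
        return -sum(log_marginal_likelihood(X, y, lengthscale, variance) for X, y in tasks)
    res = minimize(neg_total_lml, x0=np.log([0.5, 1.0]), method="L-BFGS-B")
    return dict(zip(["lengthscale", "variance"], np.exp(res.x)))
```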

Safe Reinforcement Learning

  • The speaker then explores theoretical questions and parameter regimes in Bayesian optimization.
  • They discuss safety-critical tasks in which conservative, well-calibrated uncertainty estimates are crucial.
  • They introduce the idea of placing a hyper-prior on the Gaussian process and shaping it through a few key hyperparameters.
  • They propose a frontier search algorithm to find hyperparameter settings that maximize informativeness while ensuring calibration.
  • They demonstrate that these meta-learning ideas substantially accelerate optimization in hardware experiments.
  • They explore the application of ideas from Bayesian optimization to learning-based control, specifically model-based reinforcement learning.
  • They introduce the concept of quantifying uncertainty in the dynamics of an unknown dynamical system using confidence sets.
  • They suggest using epistemic uncertainty in the transition model for introspective planning to avoid unsafe states.
  • They present an optimistic exploration protocol for model-based RL, in which a policy is optimized jointly with the most favorable realization within the set of plausible transition models (sketched after this list).
  • They describe a method for reducing planning under this dynamics-model uncertainty to a standard approximate dynamic programming problem.
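
One common way to make this concrete, sketched below under assumed interfaces, is "hallucinated" control: the real action is augmented with an auxiliary variable eta in [-1, 1]^d that selects a next state inside the model's epistemic confidence interval, so planning under model uncertainty becomes ordinary planning in an augmented problem. The ensemble-based uncertainty estimate, the random-shooting planner, and all names are assumptions, not the speaker's exact method.

```python
# Optimistic planning over a confidence set of dynamics models (sketch).
import numpy as np

def confidence_set(ensemble, state, action):
    """Mean and epistemic std of next-state predictions from a model ensemble."""
    preds = np.stack([m(state, action) for m in ensemble])   # (n_models, state_dim)
    return preds.mean(axis=0), preds.std(axis=0)

def optimistic_rollout(ensemble, reward_fn, state, actions, etas, beta=1.0):
    """Roll out (action, eta) pairs; eta 'chooses the luck' by picking a next
    state inside the beta-scaled confidence interval around the mean."""
    total = 0.0
    for a, eta in zip(actions, etas):
        mu, sigma = confidence_set(ensemble, state, a)
        state = mu + beta * sigma * eta          # optimistic next state
        total += reward_fn(state, a)
    return total

def plan_optimistically(ensemble, reward_fn, state, horizon=10, n_samples=256,
                        action_dim=2, state_dim=4, rng=np.random.default_rng(0)):
    """Random-shooting search over real actions and hallucinated etas."""
    best_val, best_plan = -np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1, 1, size=(horizon, action_dim))
        etas = rng.uniform(-1, 1, size=(horizon, state_dim))
        val = optimistic_rollout(ensemble, reward_fn, state, actions, etas)
        if val > best_val:
            best_val, best_plan = val, (actions, etas)
    return best_plan[0][0]   # execute the first real action of the best plan
```

Because eta enters the rollout exactly like an extra control input, any standard policy optimizer or planner can be reused unchanged on the augmented action space.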

Optimistic Exploration

  • The speaker introduces a method for exploration in reinforcement learning called optimistic exploration.
  • In optimistic exploration, the agent chooses where within a set of plausible next states it wants to end up, effectively controlling its luck.
  • This approach is more efficient than standard policy gradients, especially when action penalties are used.
  • The speaker also discusses how optimistic exploration can be combined with pessimistic constraint satisfaction to ensure safety in reinforcement learning (see the sketch after this list).
  • Experiments show that the optimistic-pessimistic algorithm outperforms other model-based and model-free algorithms in terms of task completion, constraint satisfaction, and safety during training.
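
A rough sketch of how the optimistic and pessimistic pieces can be combined: candidate plans are scored optimistically on reward but are only admissible if their worst-case cumulative cost within the confidence set stays under a budget. The corner-based worst case below is a crude stand-in for a proper pessimistic rollout; the interfaces mirror the previous sketch and are assumptions.

```python
# Optimism for reward, pessimism for constraints (sketch).
import numpy as np

def pessimistic_cost(ensemble, cost_fn, state, actions, beta=1.0):
    """Approximate worst-case cumulative cost: at each step the next state is
    pushed toward the corner of the confidence interval with the higher cost."""
    total = 0.0
    for a in actions:
        preds = np.stack([m(state, a) for m in ensemble])
        mu, sigma = preds.mean(axis=0), preds.std(axis=0)
        lo, hi = mu - beta * sigma, mu + beta * sigma
        c_lo, c_hi = cost_fn(lo, a), cost_fn(hi, a)
        state = hi if c_hi >= c_lo else lo
        total += max(c_lo, c_hi)
    return total

def select_safe_optimistic_plan(candidates, optimistic_returns, ensemble,
                                cost_fn, state, budget):
    """candidates: list of action sequences; optimistic_returns: their optimistic
    values (e.g. from the planner above). Return the plan with the highest
    optimistic return whose pessimistic cost respects the budget."""
    best_val, best_plan = -np.inf, None
    for actions, ret in zip(candidates, optimistic_returns):
        if pessimistic_cost(ensemble, cost_fn, state, actions) <= budget:
            if ret > best_val:
                best_val, best_plan = ret, actions
    return best_plan
```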

Bridging the Sim-to-Real Gap

  • The speaker concludes by discussing how optimistic exploration can be used to bridge the sim-to-real gap in reinforcement learning.
  • They propose a method for training reinforcement learning agents using a learned neural network prior that is regularized towards a physics simulator (a simplified sketch follows this list).
  • This approach outperforms uninformed neural network models and gray-box models that combine physics-informed priors with neural networks.
  • The speaker argues that models should learn to know what they don't know, which is a key challenge in developing safe and efficient agents that can learn by interacting with the real world.
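
A simplified sketch of the idea of regularizing a learned dynamics model towards a physics simulator: the training loss combines a data-fit term on real transitions with a functional penalty that pulls the model's predictions towards the simulator on sampled inputs, so the model defaults to the simulator where it has no data. The model and simulator interfaces and the weight `lam` are assumptions, not the speaker's exact objective.

```python
# Simulation-regularized dynamics-model loss (sketch).
import numpy as np

def sim_regularized_loss(model, simulator, states, actions, next_states,
                         prior_states, prior_actions, lam=0.1):
    """Data-fit term on observed real transitions plus a functional regularizer
    that keeps model predictions close to the simulator on sampled inputs."""
    pred = model(states, actions)
    data_fit = np.mean(np.sum((pred - next_states) ** 2, axis=-1))
    model_prior = model(prior_states, prior_actions)
    sim_prior = simulator(prior_states, prior_actions)
    prior_fit = np.mean(np.sum((model_prior - sim_prior) ** 2, axis=-1))
    return data_fit + lam * prior_fit
```

With a large `lam` the learned model behaves like the simulator everywhere; as real data accumulates, the data-fit term dominates in the visited region while the simulator prior still anchors predictions elsewhere.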
