240119 AA289 Annie Chen

03 Feb 2024

Reinforcement Learning for Autonomous Robots

  • Recent advances have produced autonomous robots that can perform tasks in controlled environments.
  • However, these robots often struggle to adapt to unexpected circumstances and novel scenarios during real-world deployment.
  • Reinforcement learning provides a framework for robots to adapt autonomously, but it is hard to apply directly during deployment because standard RL typically requires reward feedback, many retries with resets, and often starts learning from scratch.

Reset-Free Reinforcement Learning

  • Reset-free reinforcement learning addresses some of these challenges by training the robot both to perform the task and to undo it, so practice can continue without human intervention (see the sketch after this list).
  • Single-life reinforcement learning is introduced as a paradigm in which the agent is given prior experience and must adapt to a new scenario within a single episode, without human intervention or supervision.
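
As a rough illustration of how reset-free practice avoids manual resets, the sketch below alternates a forward task policy with a reset policy. The `env`, `forward_policy`, and `reset_policy` objects and their methods are hypothetical placeholders, not code from the talk.

```python
# Minimal sketch of a reset-free practice loop, assuming hypothetical
# `env`, `forward_policy`, and `reset_policy` objects (names are
# illustrative, not from the talk). The forward policy attempts the task
# and the reset policy undoes it, so the robot keeps practicing without
# a person resetting the scene between attempts.

def reset_free_practice(env, forward_policy, reset_policy,
                        steps_per_phase=500, num_rounds=100):
    obs = env.reset()  # a single initial reset; none afterwards
    for _ in range(num_rounds):
        # Attempt the task.
        for _ in range(steps_per_phase):
            action = forward_policy.act(obs)
            next_obs, reward, done, info = env.step(action)
            forward_policy.update(obs, action, reward, next_obs)
            obs = next_obs
        # Undo the task, returning near the initial state so the next
        # attempt can start; this replaces a human-provided reset.
        for _ in range(steps_per_phase):
            action = reset_policy.act(obs)
            next_obs, reward, done, info = env.step(action)
            reset_policy.update(obs, action, reward, next_obs)
            obs = next_obs
```

In the single-life setting, by contrast, there is no such practice phase at deployment: the robot is handed prior experience or behaviors and has exactly one episode in the new scenario to finish the task.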

Robust Autonomous Modulation (ROAM)

  • The proposed method, Robust Autonomous Modulation (ROAM), leverages the expressive power of each behavior's value function to guide behavior selection during adaptation.
  • ROAM fine-tunes the value functions of the pre-trained behaviors to correct for overestimation in out-of-distribution states.
  • ROAM's selection mechanism quickly identifies an appropriate behavior for the current situation, eliminating the need for a separate high-level controller or adaptation module (a minimal sketch of this selection loop follows the list).
  • ROAM is agnostic to how the policies and value functions of the prior behaviors are trained, and it can provide improvements in new situations with either a small or large number of pre-trained behaviors.
  • Adaptation in ROAM happens within a single episode at test time, allowing robots to adapt to a variety of situations without extensive online training.
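
A minimal sketch of the kind of value-guided behavior selection described above is shown below. It assumes each pre-trained behavior exposes a policy and a state-value estimate; the interfaces, the `finetune_values` regularizer, and the greedy arg-max selection are illustrative simplifications, not the authors' implementation.

```python
import numpy as np

def finetune_values(behaviors, prior_data, ood_penalty=1.0, lr=1e-4):
    """Fine-tune each behavior's value function with a generic conservative
    penalty so that unfamiliar (out-of-distribution) states are not
    overestimated; the exact objective used in the talk may differ."""
    for behavior in behaviors:
        for batch in prior_data.sample_batches():
            behavior.update_value(batch, ood_penalty=ood_penalty, lr=lr)

def single_life_rollout(env, behaviors, max_steps=2000):
    """Adapt within one deployment episode by re-selecting, at every step,
    the pre-trained behavior whose value function is highest at the current
    state, so no separate high-level controller is needed."""
    obs = env.reset()
    for _ in range(max_steps):
        values = np.array([b.value(obs) for b in behaviors])
        chosen = behaviors[int(np.argmax(values))]  # most promising behavior here
        obs, reward, done, info = env.step(chosen.policy(obs))
        if done:  # task finished or episode terminated
            break
    return obs
```

Because selection is re-evaluated at every step, a single rollout can stitch together pieces of different behaviors, which is how parts of several relevant behaviors can be used within one episode.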

ROAM: A Simple Algorithm for Autonomous Deployment-Time Adaptation

  • ROAM is a simple algorithm for autonomous adaptation at deployment time.
  • ROAM outperforms prior methods in both simulated and real-world experiments.
  • ROAM can adapt to novel situations within a single episode.
  • ROAM can handle dynamically changing payloads and unseen objects.
  • ROAM can leverage parts of each relevant behavior to complete a task.
  • ROAM provides a mechanism for single-life, test-time adaptation to unseen situations.