Stanford Seminar - Robot Skill Acquisition: Policy Representation and Data Generation
06 Mar 2024
Robot Perception and Manipulation
- The speaker introduces their work on robot perception and manipulation, aiming to push the boundaries of robot capabilities by enabling them to perform complex tasks.
- They describe their previous workflow, which involves designing task-specific action primitives, collecting robot data, and training policies with a few learnable parameters.
- This approach requires significant engineering effort and is not general enough to represent all possible robot actions, especially those requiring high-rate and reactive behaviors.
- The speaker proposes a new workflow based on diffusion policy, which allows robots to directly learn complex manipulation skills from human demonstration data.
- Diffusion policy addresses the challenge of modeling complex action distributions, such as action multimodality, by using an iterative denoising process.
- This approach results in precise predictions and captures multimodalities in the robot action space.
- Diffusion policy is a practical framework for learning robot behaviors as long as sufficient data is available.
- Diffusion policy outperforms existing baselines on multiple robot control benchmarks.
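The iterative denoising idea above can be illustrated with a toy sketch. The "denoiser" below is hand-coded rather than learned (it stands in for a trained network conditioned on observations), and the two action modes are an assumption chosen to show multimodality; this is a minimal illustration of the mechanism, not the actual diffusion policy implementation.

```python
import numpy as np

# Two valid actions from multimodal demonstration data (toy assumption).
MODES = np.array([-1.0, 1.0])

def denoiser(a, t):
    """Hand-coded noise estimate: the residual toward the nearest mode.
    A real diffusion policy would use a trained network here."""
    target = MODES[np.argmin(np.abs(MODES - a))]
    return a - target

def sample_action(steps=50, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    a = rng.normal()                    # start from pure Gaussian noise
    for t in range(steps):
        a = a - 0.2 * denoiser(a, t)    # iterative denoising step
    return a

rng = np.random.default_rng(0)
samples = np.array([sample_action(rng=rng) for _ in range(200)])
# Samples concentrate near -1 and +1 rather than averaging to 0:
# iterative denoising commits to one mode instead of blending them,
# which is how the approach captures action multimodality.
```

A regression-style policy trained on the same bimodal data would average the two modes and predict an invalid action near 0; the denoising sampler instead lands on one mode or the other.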
Data Collection for Robot Learning
- Collecting high-quality robot data requires careful planning and consideration of the specific task and environment.
- Three important aspects of data for robot learning are scalability, reusability, and completeness.
- Scalable data collection methods, such as self-supervised learning and internet data, often lack critical information for robot learning.
- Scaling up data collection in simulation environments is challenging due to the high setup cost for new tasks.
- A recent project, Scaling Up and Distilling Down, addresses this problem by using large models to break tasks into smaller subtasks, reducing engineering effort.
- The speaker introduces a framework for scaling up and distilling down robot experiences to learn a visual motor policy.
- The framework uses a large language model (LLM) to generate training data for various tasks in a simulated environment.
- The LLM helps break down tasks, narrow down the search space, and generate reward functions for subtasks.
- The system can self-correct mistakes and record recovery behaviors, providing valuable data for training.
- The distilled visual motor policy can be applied in the real world without relying on simulation states.
- The speaker highlights the importance of suboptimal data in training to enable robots to recover from failures.
- Challenges in scaling up real-world data for robots are discussed, including the need for an intuitive and standardized interface.
- The speaker proposes the "Grasping in the Wild" project as an example of an interface for collecting robot-complete data in various environments.
- Limitations of the "Grasping in the Wild" interface are identified, such as restricted visual coverage, fast camera motions, and latency discrepancies between data collection and robot deployment.
- The speaker discusses the limitations of using internet data for robot manipulation tasks due to low action diversity.
- They propose modifications to a GoPro camera to enable a large variety of manipulation tasks, including:
- Switching to a fish-eye lens for a wider field of view.
- Adding small mirrors for implicit stereo depth estimation.
- Adding sensors to the fingers for tracking gripper width, contact information, and implicit force measurement.
- The modified GoPro camera is compatible with different robot platforms.
- The speaker demonstrates the device on several hard manipulation tasks, including tossing, bimanual cloth folding, and dishwashing.
- The system achieves an 80% success rate for tossing, can perform bimanual cloth folding after 200 demonstrations, and handles the complex dishwashing task with a 70% success rate.
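The LLM-driven decomposition described in the bullets above (break a task into subtasks, then generate a reward function per subtask) can be sketched as follows. Everything here is hypothetical scaffolding: `query_llm` is a canned stand-in for a real language-model call, and the reward stubs assume simulator state keys (`toy_grasped`, `toy_in_box`) invented for this example; the actual system has the LLM emit reward code grounded in real simulation state.

```python
def query_llm(prompt: str) -> list[str]:
    """Hypothetical stand-in for a language-model API call that
    decomposes a task description into subtasks."""
    if "put the toys in the box" in prompt:
        return ["pick up toy", "move toy over box", "release toy into box"]
    return []

def make_reward(subtask: str):
    """Build a per-subtask reward function over simulator state.
    The state keys below are assumptions for this sketch."""
    def reward(sim_state: dict) -> float:
        if subtask == "pick up toy":
            return 1.0 if sim_state.get("toy_grasped") else 0.0
        if subtask == "release toy into box":
            return 1.0 if sim_state.get("toy_in_box") else 0.0
        return 0.0
    return reward

subtasks = query_llm("Decompose the task: put the toys in the box")
rewards = [make_reward(s) for s in subtasks]
# A sampling-based planner can now optimize each subtask reward in
# simulation, and the resulting trajectories (including recoveries
# from failures) become training data for the distilled policy.
```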
Multi-Arm Coordination and Generalization
- The speaker emphasizes the importance of considering synchronization and coordination between multiple robot arms.
- The system is able to generalize to new situations and can correct for errors.
- The speaker introduces the UMI gripper, a low-cost, portable robotic gripper that can be easily deployed in various environments.
- The speaker discusses the challenges of collecting diverse training data for robots and how the UMI gripper addresses them.
- The speaker presents a generalization experiment in which a robot trained on diverse data collected with the UMI gripper performs a rearrangement task in unseen environments and with unseen objects.
- The speaker emphasizes the importance of diverse robot action data for generalization and shows that pre-training a visual encoder on internet data is insufficient for generalization.
Challenges and Future Directions
- The speaker concludes by encouraging roboticists to leverage their unique skills and knowledge to create data for robot learning and shape the next generation of big data.
- The speaker demonstrates that, with enough data, a policy can generalize even to changes in the environment on the same hardware.
- Generalizing across different hardware platforms is still hard, but the same policy can be deployed on different robot arms with the same hand.
- Generalizing to different hands requires more involved engineering, such as training a dynamics model or a separate inverse model for each robot.
- It is possible to get the UMI gripper out in the wild to the general public to gather data, but it
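The inverse-model idea for cross-hand transfer mentioned above can be sketched on a toy system: given (state, next-state) pairs collected on the new hardware, fit a model that recovers the action, so that desired transitions produced by the old policy can be mapped into commands for the new hand. The linear dynamics below are an assumption chosen so the example stays tiny; a real inverse model would be a neural network over high-dimensional states.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy forward dynamics of the new hardware (assumed): s' = A*s + B*a
A, B = 0.9, 0.5
s = rng.normal(size=1000)
a = rng.normal(size=1000)
s_next = A * s + B * a

# Fit an inverse model a ≈ w0*s + w1*s' by least squares on the
# (state, next_state, action) data collected on the new hand.
X = np.stack([s, s_next], axis=1)
w, *_ = np.linalg.lstsq(X, a, rcond=None)

def inverse_model(state, desired_next_state):
    """Map a desired transition to the action that produces it."""
    return w[0] * state + w[1] * desired_next_state

# The command that drives s = 1.0 to s' = 0.0 should be -A/B = -1.8.
cmd = inverse_model(1.0, 0.0)
```

The old policy can then output desired next states (or end-effector motions), and the learned inverse model translates them into the new hand's action space.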