RL-100

Real-World Reinforcement Learning System

Team

1 Shanghai Qizhi Institute   2 Shanghai Jiao Tong University   3 Tsinghua University, IIIS   

4 The University of Hong Kong   5 University of North Carolina at Chapel Hill   

6 Carnegie Mellon University   7 Chinese Academy of Sciences   

* Equal contribution  

Reliable, Efficient, and Robust Real-World Robotic Manipulation Deployment

100% success across seven tasks

Soft-towel Folding
Dynamic Unscrewing
Pouring
Orange Juicing - Placing
Orange Juicing - Removal
Agile Bowling
Dynamic Push-T
Overall Juicing

Robustness to physical disturbances

Sustained counter-rotational interference
Counter-rotational interference
External pulling and lateral dragging
Dragging perturbations

Zero-shot adaptation

Changed Surface (Dynamics)
Different granular/fluid materials
Visual and physical interference objects
Different granular/fluid materials

Few-shot adaptation

Inverted pin arrangement
Modified container shape
Different towel material

Training efficiency

RL training curve for Bowling

Human vs. Robots

Robot vs. Human
Robot vs. Human teleoperation

Execution Efficiency

Takeaway:
1) Execution efficiency: CM (RL) > DDIM (RL) > DP3 (IL) > DP (IL).
2) Single action vs. action chunking: single-step control is used when a fast closed-loop reaction is required, while action chunking is preferred for coordination-heavy or high-precision tasks, where smoothing mitigates jitter and limits error compounding (see the sketch below).
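To make the two deployment modes concrete, the sketch below contrasts single-step closed-loop control with chunked execution. DummyEnv and DummyPolicy are hypothetical stand-ins rather than the RL-100 interfaces; only the control-loop structure is the point.

import numpy as np

class DummyEnv:
    """Hypothetical stand-in environment with a fixed-length episode."""
    def reset(self):
        self.t = 0
        return np.zeros(10)                       # placeholder observation
    def step(self, action):
        self.t += 1
        return np.zeros(10), self.t >= 50         # (next_obs, done)

class DummyPolicy:
    """Hypothetical stand-in policy: returns one action, or a chunk of actions."""
    def act(self, obs, chunk_size=1):
        actions = np.zeros((chunk_size, 7))       # placeholder 7-DoF actions
        return actions[0] if chunk_size == 1 else actions

def run_single_step(env, policy, horizon=200):
    """Closed-loop mode: re-query the policy after every executed action."""
    obs, done, steps = env.reset(), False, 0
    while not done and steps < horizon:
        obs, done = env.step(policy.act(obs))
        steps += 1

def run_action_chunking(env, policy, horizon=200, chunk_size=8):
    """Chunked mode: execute several actions per inference call for smoother,
    less jittery motion, at the cost of slower reaction to disturbances."""
    obs, done, steps = env.reset(), False, 0
    while not done and steps < horizon:
        for action in policy.act(obs, chunk_size=chunk_size):
            obs, done = env.step(action)
            steps += 1
            if done or steps >= horizon:
                break

run_single_step(DummyEnv(), DummyPolicy())
run_action_chunking(DummyEnv(), DummyPolicy())

The trade-off is reaction latency versus smoothness: re-querying every step reacts fastest to disturbances, while executing a chunk amortizes inference cost and limits compounding of per-step errors.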

Easter egg

Ablation

Variance clipping
ReconVIB vs. none vs. frozen encoder
CM vs. DDIM
2D vs. 3D

Takeaway:
1) Variance clipping in the stochastic DDIM sampling process is effective for stable exploration (a minimal sketch of a clipped DDIM step follows below).
2) Reconstruction is crucial for visual robotic manipulation RL, as it mitigates representational drift and improves sample efficiency.
3) CM effectively compresses the iterative denoising process without sacrificing control quality, enabling high-frequency deployment.
4) In a relatively clean scene, the 3D variant learns faster and attains a higher final success rate.
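To make takeaway 1 concrete, here is a minimal numpy sketch of one stochastic DDIM reverse step with the injected noise scale clamped. The symbol names, the sigma_max threshold, and the toy schedule values are illustrative assumptions, not the paper's exact formulation or hyperparameters.

import numpy as np

def clipped_ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev,
                      eta=1.0, sigma_max=0.1, rng=None):
    """One stochastic DDIM reverse step with the noise scale clamped to sigma_max."""
    rng = rng or np.random.default_rng(0)
    # Predict the clean sample from the noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Standard DDIM noise scale for this step ...
    sigma_t = eta * np.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)) \
                  * np.sqrt(1.0 - alpha_bar_t / alpha_bar_prev)
    # ... clamped so the exploration noise injected during sampling stays bounded.
    sigma_t = min(sigma_t, sigma_max)
    # Deterministic direction term plus the (clipped) stochastic term.
    dir_xt = np.sqrt(1.0 - alpha_bar_prev - sigma_t ** 2) * eps_pred
    return np.sqrt(alpha_bar_prev) * x0_pred + dir_xt + sigma_t * rng.standard_normal(x_t.shape)

# Toy usage with made-up noise-schedule values and a 7-dimensional action.
x_next = clipped_ddim_step(np.zeros(7), np.zeros(7), alpha_bar_t=0.5, alpha_bar_prev=0.8)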

Thought

Takeaway:
Training robots is like baking a cake: imitation learning (IL) from demonstrations forms the sponge base, offline reinforcement learning adds the rich cream layer, and online reinforcement learning crowns it all as the cherry on top.


Related Work


Acknowledgements

We would like to thank all members of the RL-100 Team and the TEA Lab at Tsinghua University for their support.

Website modified from TWIST.
© 2025 Kun Lei