RL-100

Real-World Reinforcement Learning System

From demo to duty: RL-100 can serve continuously for ~7 hours, providing reliable, real-world robot help.

Main video

Team

1 Shanghai Qizhi Institute   2 Shanghai Jiao Tong University   3 Tsinghua University, IIIS   

4 The University of Hong Kong   5 University of North Carolina at Chapel Hill   

6 Carnegie Mellon University   7 Chinese Academy of Sciences   

* Equal contribution  

Contribution List

Kun: Project lead, Core, responsible for the original idea and codebase, overall design and training of simulation and real-world experiments, and paper writing.

Huanyu: Core, optimized training infrastructure, led the unscrewing, pouring, and juicing tasks, contributed to paper writing, and implemented part of the simulation baseline.

Dongjie: Core, led the juicing task, ran most of the real-world baselines, implemented part of the simulation baseline, and contributed to paper writing.

Zhenyu: Core, responsible for the real-world robot setup, design, and data collection, and contributed to paper writing.

Lingxiao: Core, led data collection for the folding task, managed its offline training process, and explored dual-arm embodiments in the early stages.

Zhennan: Core, contributed to part of the algorithm design, improved consistency policy distillation, debugged and explored settings in simulation, and developed most of the simulation baselines.

Ziyu: Managed the Metaworld domain tasks.

Shiyu: Refined the paper writing.

Huazhe: Principal Investigator (PI), Core, responsible for project direction and guidance, and contributed to paper writing.


Reliable, Efficient, and Robust Real-World Robotic Manipulation Deployment

100% success across seven tasks

Soft-towel Folding
Dynamic Unscrewing
Pouring
Orange Juicing - Placing
Orange Juicing - Removal
Agile Bowling
Dynamic Push-T
Overall Juicing

Baselines - DP3

Soft-towel Folding
Dynamic Unscrewing
Pouring
Orange Juicing - Placing
Orange Juicing - Removal
Agile Bowling
Dynamic Push-T

Robustness to physical disturbances

Sustained counter-rotational interference (over 5 seconds)
Counter-rotational interference
External pulling and lateral dragging
Dragging perturbations

Takeaway:
Soft‑towel Folding: With disturbances injected in Stage‑1 (initial grasp) or Stage‑2 (pre‑fold), the policy retains 90% success in each case.
Dynamic Unscrewing: Up to 4 s of reverse force during twisting and critical visual alignment—100% success; stable recovery.
Dynamic Push‑T: Multiple drag‑style disturbances during pushing—100% success.
Overall: 95.0% average success across tested scenarios, indicating reliable recovery under unstructured perturbations.

Zero-shot adaptation

Changed Surface (Dynamics)
Changed Surface (Dynamics)
Different granular/fluid materials
Visual and physical interference objects
Different granular/fluid materials
Folding - unseen towel shape

Takeaway:
Dynamic Push‑T: Large friction changes—100%; added distractor shapes—80%.
Agile Bowling: Floor property changes—100%.
Pouring: Granular (nuts) → liquid (water)—90%.

Average 92.5% success across four change types without retraining.

Few-shot adaptation

Inverted pin arrangement
Modified container shape
Different towel material

Takeaway:
Soft‑towel Folding: New towel materials—100%.
Agile Bowling: Inverted pin arrangement—100%.
Pouring: New container geometry—60%.

Average 86.7% success with only 1–3 hours of additional training.

Training efficiency

RL training curve for Bowling

Takeaway:
The policy achieves consistent 100% success after approximately 200 episodes of on-policy rollouts.

Human vs. Robots

Robot vs. Human
Robot vs. Human teleoperation

Takeaway:
1) The robot achieved more successful bowling trials than the five human participants under the same number of attempts: 25 successful trials (robot) vs. 14 successful trials (humans).
2) For Push-T, the robot achieved more successful trials than an expert human and a beginner human within the same wall-clock time: 20 successful trials (robot) vs. 17 (expert) vs. 13 (beginner).

Execution Efficiency

Execution Efficiency - Folding
Execution Efficiency

Takeaway:
1) Execution Efficiency: CM (RL) > DDIM (RL) > DP3 (IL) > DP (IL)
2) Single action vs. action chunking: single-step control is used when a fast closed-loop reaction is required, while action chunking is preferred for coordination-heavy or high-precision tasks, where smoothing mitigates jitter and limits error compounding (see the sketch after this section).
* Execution efficiency is defined as the robot’s average task completion time. For fair comparison, we report this metric only on action-chunking tasks. Single-step control (DDIM/CM) operates at the same inference rate, system-capped at 30 Hz (e.g., by the L515 camera), so runtime is dominated by hardware rather than algorithmic differences.
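
To make the single-step vs. action-chunking distinction concrete, here is a minimal control-loop sketch. It is illustrative only and not the RL-100 codebase: the policy interface (predict_action, predict_action_chunk), the robot API, and the chunk length are assumptions; only the 30 Hz rate comes from the note above.

```python
import time

CONTROL_HZ = 30   # system-capped control/inference rate noted above (camera-limited)
CHUNK_LEN = 8     # hypothetical chunk length for the action-chunking mode

def run_single_step(policy, robot, steps=300):
    """Single-step mode: re-infer every tick for the fastest closed-loop reaction."""
    for _ in range(steps):
        obs = robot.get_observation()
        action = policy.predict_action(obs)   # one action per inference call
        robot.apply_action(action)
        time.sleep(1.0 / CONTROL_HZ)

def run_action_chunking(policy, robot, steps=300):
    """Chunked mode: execute a short open-loop sequence between inferences.

    Executing a chunk smooths the trajectory (less jitter) and limits error
    compounding, which suits coordination-heavy or high-precision tasks.
    """
    executed = 0
    while executed < steps:
        obs = robot.get_observation()
        chunk = policy.predict_action_chunk(obs, horizon=CHUNK_LEN)
        for action in chunk:
            robot.apply_action(action)
            time.sleep(1.0 / CONTROL_HZ)
            executed += 1
            if executed >= steps:
                break
```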

Easter egg

Easter egg

Ablation

Variance clipping
ReconVIB vs. no auxiliary vs. fixed encoder
CM vs. DDIM
2D vs. 3D
Prediction type: epsilon vs. sample

Takeaway:
1) Variance clipping in the stochastic DDIM sampling process enables stable exploration (see the sketch after this list).
2) Reconstruction is crucial for visual robotic-manipulation RL: it mitigates representational drift and improves sample efficiency.
3) CM effectively compresses the iterative denoising process without sacrificing control quality, enabling high-frequency deployment.
4) In a relatively clean scene, the 3D variant learns faster and attains a higher final success rate.
5) Epsilon prediction is more suitable for RL: its larger noise schedule aids exploration.
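
As a rough illustration of takeaway 1, the sketch below clips the noise standard deviation inside one stochastic DDIM update so the injected exploration noise stays bounded. The clip range, the schedule values, and the epsilon-prediction input are assumptions for the example, not the paper's implementation.

```python
import numpy as np

def stochastic_ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev,
                         eta=1.0, sigma_clip=(1e-3, 0.3)):
    """One stochastic DDIM update with the noise std clipped to a fixed range.

    Clipping sigma keeps the injected exploration noise bounded, which is the
    'variance clipping for stable exploration' idea in takeaway 1 (sketch only;
    the clip bounds here are illustrative assumptions).
    """
    # Standard DDIM posterior std (eta = 1 recovers DDPM-like stochasticity).
    sigma = eta * np.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar_t)
                          * (1 - alpha_bar_t / alpha_bar_prev))
    sigma = np.clip(sigma, *sigma_clip)   # <-- the variance-clipping step

    # Predicted clean sample recovered from the epsilon prediction.
    x0_pred = (x_t - np.sqrt(1 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

    # Deterministic direction plus the clipped stochastic noise.
    dir_xt = np.sqrt(np.maximum(1 - alpha_bar_prev - sigma**2, 0.0)) * eps_pred
    noise = sigma * np.random.randn(*np.shape(x_t))
    return np.sqrt(alpha_bar_prev) * x0_pred + dir_xt + noise
```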

Thought


Takeaway:
Training robots is like baking a cake: demonstration learning (IL) forms the sponge base, offline reinforcement learning adds the rich cream layer, and online reinforcement learning crowns it all as the cherry on top.
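
Taken literally, the metaphor is a three-stage recipe: IL first, then offline RL, then online RL. The sketch below is only a schematic of that ordering; the stage functions are placeholders supplied by the caller, not the released training code.

```python
from typing import Any, Callable, Sequence

Policy = Any  # placeholder policy type for this schematic

def train_three_stage(
    demos: Sequence[Any],
    imitation_pretrain: Callable[[Sequence[Any]], Policy],           # sponge base (IL)
    offline_rl_finetune: Callable[[Policy, Sequence[Any]], Policy],  # cream layer (offline RL)
    online_rl_finetune: Callable[[Policy], Policy],                  # cherry on top (online RL)
) -> Policy:
    """Schematic of the IL -> offline RL -> online RL ordering described above."""
    policy = imitation_pretrain(demos)           # learn base behavior from demonstrations
    policy = offline_rl_finetune(policy, demos)  # refine on the same data, no robot time needed
    policy = online_rl_finetune(policy)          # close the last gap with real-world rollouts
    return policy
```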


Related Work


Acknowledgements

We would like to thank all members of the RL-100 Team and The TEA Lab from Tsinghua University for their support.

Website modified from TWIST.
© 2025 Kun Lei