1 Shanghai Qizhi Institute 2 Shanghai Jiao Tong University 3 Tsinghua University, IIIS
4 The University of Hong Kong 5 University of North Carolina at Chapel Hill 6 Carnegie Mellon University 7 Chinese Academy of Sciences * Equal contribution
Takeaway:
1) Execution Efficiency: CM (RL) > DDIM (RL) > DP3 (IL) > DP (IL)
2) Single action vs. action chunking: single-step control is used when a fast closed-loop reaction is required,
while action chunking is preferred for coordination-heavy or high-precision tasks, where smoothing mitigates jitter and limits error compounding (the two execution modes are sketched below this list).
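A minimal sketch of the two execution modes referenced in 2), assuming a hypothetical `env` with `reset()`/`step()` and a `policy.predict(obs)` that returns a sequence of actions; the names and chunk length are illustrative, not the paper's interface.

```python
def rollout_single_step(env, policy, horizon=200):
    """Single-step control: re-plan after every environment step.

    Fast closed-loop reaction, at the cost of one policy inference
    call per control step.
    """
    obs = env.reset()
    for _ in range(horizon):
        action = policy.predict(obs)[0]          # take only the first action
        obs, reward, done, info = env.step(action)
        if done:
            break


def rollout_action_chunking(env, policy, horizon=200, chunk_size=8):
    """Action chunking: predict a chunk of actions, execute it open-loop.

    Fewer inference calls and smoother motion, but a slower reaction to
    disturbances that occur within a chunk.
    """
    obs = env.reset()
    t = 0
    while t < horizon:
        chunk = policy.predict(obs)[:chunk_size]  # execute the whole chunk
        for action in chunk:
            obs, reward, done, info = env.step(action)
            t += 1
            if done or t >= horizon:
                return
```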
Takeaway:
1) Variance clipping in the stochastic DDIM sampling process is effective for stable exploration (see the sketch after this list).
2) Reconstruction is crucial for visual robotic manipulation RL as it mitigates representational drift and improves sample efficiency.
3) CM effectively compresses the iterative denoising process without sacrificing control quality, enabling high-frequency deployment.
4) In a relatively clean scene, the 3D variant learns faster and attains a higher final success rate.
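The variance clipping in 1) can be sketched as a stochastic DDIM step whose exploration standard deviation is clamped before noise is injected. This is a minimal, hypothetical sketch: the model interface, schedule layout, and `std_clip` threshold are assumptions for illustration, not the paper's settings.

```python
import torch

def ddim_step_clipped(model, x_t, t, t_prev, alphas_cumprod,
                      eta=1.0, std_clip=0.3):
    """One stochastic DDIM denoising step with a clipped exploration std.

    `model(x_t, t)` predicts the noise eps_theta; `alphas_cumprod` holds
    the cumulative alpha-bar schedule. Clamping sigma_t bounds the
    injected exploration noise so RL fine-tuning stays stable.
    """
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
    eps = model(x_t, t)

    # Predicted clean sample x_0 from the current noisy sample.
    x0_pred = (x_t - torch.sqrt(1.0 - a_t) * eps) / torch.sqrt(a_t)

    # Stochastic DDIM standard deviation (eta > 0), then clip it.
    sigma = eta * torch.sqrt((1.0 - a_prev) / (1.0 - a_t)
                             * (1.0 - a_t / a_prev))
    sigma = torch.clamp(sigma, max=std_clip)

    # Deterministic direction toward x_{t_prev}, plus clipped noise.
    dir_xt = torch.sqrt(1.0 - a_prev - sigma ** 2) * eps
    noise = sigma * torch.randn_like(x_t)
    return torch.sqrt(a_prev) * x0_pred + dir_xt + noise
```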
Takeaway:
Training robots is like baking a cake: demonstration learning (IL) forms the sponge base, offline reinforcement learning adds the rich cream layer, and online reinforcement learning crowns it all as the cherry on top.
We would like to thank all members of the RL-100 Team and the TEA Lab at Tsinghua University for their support.