🤖 AI Summary
Researchers introduced Recap (RL with Experience & Corrections via Advantage‑conditioned Policies), a training pipeline that takes vision‑language‑action (VLA) robot models beyond imitation learning by combining demonstrations, expert “coaching” interventions, and reinforcement learning from on‑robot experience. They used Recap to turn π0.6 into π*0.6 and demonstrated robust, high‑throughput performance on long‑horizon real‑world tasks (making espresso drinks, folding 50 novel laundry items in a new home, and assembling 59 boxes in a factory), sustaining multi‑hour uninterrupted runs. On the hardest tasks Recap more than doubled throughput and cut failure rates by a factor of two or more, bringing VLAs to practically useful reliability and speed.
Technically, Recap addresses the compounding‑error and credit‑assignment problems that plague imitation‑only policies by (1) pretraining with offline RL, (2) fine‑tuning on demonstrations, and (3) collecting on‑robot data annotated with expert teleoperator corrections and episode rewards. A learned value function predicts task progress (e.g., negative steps‑to‑completion), and the policy is conditioned on the change in that value (the advantage), so all data, including “bad” trajectories, can be used while still signaling which actions improved outcomes. At runtime the advantage‑conditioned VLA is prompted to select high‑advantage actions, yielding policies that outperform their training data. The approach scales to large VLA models and points to a practical path for robots to learn robust, generalizable manipulation behaviors from practice.
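To make the advantage‑conditioning idea concrete, here is a minimal Python sketch of how training examples might be labeled: a value function estimates negative steps‑to‑completion, the per‑step change in value stands in for the advantage, and each example is tagged with a discrete advantage token that the policy learns to condition on. The bin edges, token names, and toy value function are illustrative assumptions, not the actual Recap / π*0.6 implementation.

```python
import numpy as np

# Hypothetical advantage bins and conditioning tokens (not from the paper).
ADV_BINS = np.array([-np.inf, -1.0, 0.0, 1.0])
ADV_TOKENS = ["<adv_low>", "<adv_mid>", "<adv_high>", "<adv_top>"]

def value(step_index, episode_length):
    """Toy value estimate: negative number of steps remaining to completion."""
    return -(episode_length - step_index)

def advantage_token(v_t, v_next):
    """Map the change in predicted value (a one-step advantage proxy)
    to a discrete conditioning token."""
    adv = v_next - v_t
    bin_idx = int(np.digitize(adv, ADV_BINS)) - 1
    return ADV_TOKENS[min(max(bin_idx, 0), len(ADV_TOKENS) - 1)]

def make_training_examples(episode):
    """Turn one (observation, action) episode -- including imperfect ones --
    into advantage-conditioned examples: (obs, adv_token) -> action."""
    T = len(episode)
    examples = []
    for t in range(T - 1):
        obs, action = episode[t]
        tok = advantage_token(value(t, T), value(t + 1, T))
        examples.append(((obs, tok), action))
    return examples

# At runtime the policy would be prompted with the best token,
# e.g. policy(obs, "<adv_top>"), steering it toward actions whose
# predicted value improvement was highest in the training data.
if __name__ == "__main__":
    toy_episode = [(f"obs_{t}", f"act_{t}") for t in range(5)]
    for (obs, tok), act in make_training_examples(toy_episode):
        print(obs, tok, "->", act)
```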