ZERO-SHOT ROBOTIC MANIPULATION WITH PROGRESSIVE IMAGE-EDITING
Title: ZERO-SHOT ROBOTIC MANIPULATION WITH PROGRESSIVE IMAGE-EDITING
DNr: Berzelius-2025-199
Project Type: LiU Berzelius
Principal Investigator: Quantao Yang <yquantao@gmail.com>
Affiliation: Kungliga Tekniska högskolan
Duration: 2025-06-04 – 2026-01-01
Classification: 10201
Keywords:
Abstract
To operate effectively in unstructured environments, generalist robots must recognize and reason about novel objects and situations, many of which are absent from their training data. We propose a method that uses an image-editing diffusion model as a high-level planner that generates intermediate subgoals for a low-level controller to execute. Specifically, we fine-tune InstructPix2Pix on a combination of human demonstration videos and robot rollouts so that it synthesizes plausible future observations (subgoals) conditioned on the robot's current view and a language instruction. In parallel, we train a low-level, goal-conditioned policy on the same robot data to reach these subgoals.
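
The planner/controller split described in the abstract can be illustrated with a minimal control-loop sketch. It is not the project's implementation: the fine-tuned image-editing model and the goal-conditioned policy are replaced by toy stand-ins, observations are short lists of floats rather than images, and all names (`HierarchicalController`, `toy_planner`, `toy_policy`, `toy_env_step`, `steps_per_subgoal`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a real system would use camera images and robot actions.
Observation = List[float]
Action = List[float]


@dataclass
class HierarchicalController:
    """Alternates between proposing a subgoal and executing toward it."""
    propose_subgoal: Callable[[Observation, str], Observation]  # high-level planner
    policy: Callable[[Observation, Observation], Action]        # goal-conditioned policy
    steps_per_subgoal: int = 5                                  # assumed fixed budget

    def run(
        self,
        env_step: Callable[[Observation, Action], Observation],
        obs: Observation,
        instruction: str,
        num_subgoals: int,
    ) -> Tuple[Observation, List[Observation]]:
        subgoals: List[Observation] = []
        for _ in range(num_subgoals):
            # Planner: predict a plausible future observation from the
            # current view and the language instruction.
            goal = self.propose_subgoal(obs, instruction)
            subgoals.append(goal)
            # Controller: act for a fixed number of steps toward that subgoal.
            for _ in range(self.steps_per_subgoal):
                action = self.policy(obs, goal)
                obs = env_step(obs, action)
        return obs, subgoals


# Toy components, used only so that the loop can be executed.
TARGET = [1.0, 1.0]


def toy_planner(obs: Observation, instruction: str) -> Observation:
    # Stand-in for the image-editing model: a subgoal halfway to a fixed target.
    return [o + 0.5 * (t - o) for o, t in zip(obs, TARGET)]


def toy_policy(obs: Observation, goal: Observation) -> Action:
    # Stand-in for the learned policy: move half of the remaining distance.
    return [0.5 * (g - o) for o, g in zip(obs, goal)]


def toy_env_step(obs: Observation, action: Action) -> Observation:
    # Stand-in for the environment: the action is a displacement.
    return [o + a for o, a in zip(obs, action)]


controller = HierarchicalController(toy_planner, toy_policy)
final_obs, subgoals = controller.run(
    toy_env_step, [0.0, 0.0], "move to the target", num_subgoals=3
)
```

In this sketch the planner is queried again after each execution phase, so every subgoal is conditioned on the most recent observation. The fixed step budget per subgoal is an assumption made for simplicity; the abstract does not specify when replanning occurs.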