ZERO-SHOT ROBOTIC MANIPULATION WITH PROGRESSIVE IMAGE-EDITING
Title: ZERO-SHOT ROBOTIC MANIPULATION WITH PROGRESSIVE IMAGE-EDITING
DNr: Berzelius-2025-199
Project Type: LiU Berzelius
Principal Investigator: Quantao Yang <yquantao@gmail.com>
Affiliation: Kungliga Tekniska högskolan
Duration: 2025-06-04 – 2026-01-01
Classification: 10201
Keywords:

Abstract

To operate effectively in unstructured environments, generalist robots must recognize and reason about novel objects and situations, many of which are absent from their training data. We propose a method that uses an image-editing diffusion model as a high-level planner that generates intermediate subgoals for a low-level controller to execute. Specifically, we fine-tune InstructPix2Pix on a combination of human demonstration videos and robot rollouts so that it synthesizes plausible future observations (subgoals) conditioned on the robot's current view and a language instruction. Concurrently, we train a low-level, goal-conditioned policy on the same robot data to reach these subgoals.
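
The planner/controller split described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the project's implementation: the names `HierarchicalController`, `propose_subgoal`, `act`, and `steps_per_subgoal` are hypothetical, the observation is a one-element list standing in for a camera image, and the toy planner and policy are simple arithmetic stand-ins for the fine-tuned image-editing model and the learned goal-conditioned policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Stand-in for a camera image; a real system would use pixel arrays.
Observation = List[float]
Action = List[float]


@dataclass
class HierarchicalController:
    """Hypothetical wrapper around a subgoal planner and a goal-conditioned policy."""
    propose_subgoal: Callable[[Observation, str], Observation]  # high-level planner
    act: Callable[[Observation, Observation], Action]           # low-level policy
    steps_per_subgoal: int = 5  # assumed fixed low-level horizon per subgoal

    def run(
        self,
        obs: Observation,
        instruction: str,
        step_env: Callable[[Observation, Action], Observation],
        num_subgoals: int,
    ) -> Tuple[Observation, List[Observation]]:
        subgoals: List[Observation] = []
        for _ in range(num_subgoals):
            # Planner: current view + language instruction -> next subgoal.
            goal = self.propose_subgoal(obs, instruction)
            subgoals.append(goal)
            # Policy: drive the current observation toward that subgoal.
            for _ in range(self.steps_per_subgoal):
                obs = step_env(obs, self.act(obs, goal))
        return obs, subgoals


def toy_planner(obs: Observation, instruction: str) -> Observation:
    # Toy stand-in: ignores the instruction and proposes a state 0.25 ahead.
    return [min(obs[0] + 0.25, 1.0)]


def toy_policy(obs: Observation, goal: Observation) -> Action:
    # Toy stand-in: move toward the goal by at most 0.125 per step.
    delta = goal[0] - obs[0]
    return [max(-0.125, min(0.125, delta))]


def toy_env(obs: Observation, action: Action) -> Observation:
    # Toy dynamics: the action is added directly to the state.
    return [obs[0] + action[0]]


controller = HierarchicalController(propose_subgoal=toy_planner, act=toy_policy)
final_obs, subgoals = controller.run([0.0], "push the block forward", toy_env, num_subgoals=4)
```

In this sketch the planner is queried again only after the low-level horizon has elapsed, so each subgoal is conditioned on the observation the policy actually reached rather than on the previous subgoal. Whether the project replans on a fixed schedule or on some other trigger is not stated in the abstract.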