Visual-RRT: Finding Paths toward Visual-Goals via Differentiable Rendering


1KAIST    2Samsung Electronics
* Equal Contribution
Motion planning toward visual goals. (a) RRT planners efficiently explore the C-space from the start configuration qstart yet require explicit goal configurations qgoal, limiting their direct use with visual goals. (b) Visual gradient-based methods minimize the rendering loss with respect to the goal image Igoal, but often struggle to reach the desired configuration. (c) Our vRRT integrates sampling-based exploration with visual-gradient exploitation, enabling the planner to efficiently discover paths toward Igoal. Gray regions in (a–c) indicate unreachable parts of the C-space. (d) Planning progression over time: while gradient-based optimization stagnates in local minima (top), our method continues to explore and successfully reaches the visual goal (bottom). In each step of our visualization, multiple robot poses represent parallel exploration and exploitation directions sampled during a single expansion step.

Visual Goal Motion Planning

Visual-goal motion planning extends classical motion planning by replacing explicit goal configurations qgoal with goal images Igoal. While traditional RRT-based planners guide tree expansion toward known configuration targets, visual-goal planning navigates the configuration space using only visual similarity to the goal image as feedback. This formulation is essential for vision-centric applications where goals are demonstrated visually but precise joint angles are unavailable.
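To make the idea concrete, here is a toy sketch of an expansion loop that mixes RRT-style random exploration with gradient exploitation of a visual loss. It is not the paper's implementation: the "rendering loss" is stood in for by the end-effector position of a hypothetical 2-link planar arm, and a finite-difference gradient stands in for backpropagation through a differentiable renderer.

```python
import math
import random

def fk(q):
    # end-effector position of a toy 2-link planar arm (link lengths 1.0);
    # this plays the role of "rendering" the configuration
    return (math.cos(q[0]) + math.cos(q[0] + q[1]),
            math.sin(q[0]) + math.sin(q[0] + q[1]))

def visual_loss(q, goal_feat):
    # stand-in for the rendering loss: squared distance between the
    # "rendered" feature and the feature extracted from the goal image
    x, y = fk(q)
    return (x - goal_feat[0]) ** 2 + (y - goal_feat[1]) ** 2

def grad(q, goal_feat, eps=1e-5):
    # finite-difference gradient; a real system would differentiate
    # through the renderer instead
    g = []
    for i in range(2):
        qp, qm = list(q), list(q)
        qp[i] += eps
        qm[i] -= eps
        g.append((visual_loss(qp, goal_feat) - visual_loss(qm, goal_feat)) / (2 * eps))
    return g

def expand(tree, goal_feat, step=0.1, lr=0.05, p_gradient=0.5):
    # one expansion: exploit the visual gradient with probability
    # p_gradient, otherwise explore toward a random configuration
    q_near = min(tree, key=lambda q: visual_loss(q, goal_feat))
    if random.random() < p_gradient:
        g = grad(q_near, goal_feat)
        q_new = [q_near[i] - lr * g[i] for i in range(2)]
    else:
        q_rand = [random.uniform(-math.pi, math.pi) for _ in range(2)]
        d = math.hypot(q_rand[0] - q_near[0], q_rand[1] - q_near[1]) or 1.0
        q_new = [q_near[i] + step * (q_rand[i] - q_near[i]) / d for i in range(2)]
    tree.append(q_new)
    return q_new

random.seed(0)
goal = fk((0.8, -0.4))   # feature "observed" in the goal image
tree = [[0.0, 0.0]]      # start configuration
for _ in range(300):
    expand(tree, goal)
best = min(visual_loss(q, goal) for q in tree)
```

The interleaving is the point: pure gradient descent can stall in local minima of the rendering loss, while the random branch keeps injecting exploration, mirroring panel (c) of the teaser figure.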

Image-Goal Motion Planning

We compare vRRT against Prof. Robot and reference RRT* solutions across three robot platforms. Each video shows the planned trajectory overlaid on the scene. vRRT successfully discovers collision-free paths that closely match RRT* solutions while operating purely from visual goals.

Franka Emika Panda


Prof. Robot
Ours
RRT*

UR5e


Prof. Robot
Ours
RRT*

Fetch


Prof. Robot
Ours
RRT*

Real-world Validation

We deploy vRRT on a physical Fetch mobile manipulator to validate sim-to-real transfer. Each video compares the robot execution (left) with the Gaussian Splatting rendering (right) of the planned trajectory. vRRT successfully plans and executes paths in real-world environments, demonstrating effective transfer to physical deployment.

Scene 1
Scene 2
Scene 3
Scene 4

vRRT with Generated Goal Images

We use an image generation model to create goal images of a Franka robot from natural language prompts, and demonstrate that vRRT successfully plans executable paths to these synthesized targets despite the domain gap.


Video-goal Motion Planning

In practical robotics applications, goals are often demonstrated through videos—for instance, a human operator recording a desired manipulation outcome from multiple angles. We validate vRRT on the Panda-3Cam-Azure dataset, which captures real Franka robot configurations and thus simulates such demonstration scenarios. Given only the video observations, without explicit joint angles, vRRT successfully recovers the demonstrated poses.
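Recovering a pose from multi-view observations amounts to minimizing a rendering loss summed over all views. The toy sketch below illustrates this under strong simplifying assumptions (none of which come from the paper): the "robot" is a hypothetical 2-link planar arm, each "view" is a 1-D linear projection of its end-effector, and finite differences replace a differentiable renderer.

```python
import math

def fk(q):
    # end-effector position of a toy 2-link planar arm (link lengths 1.0)
    return (math.cos(q[0]) + math.cos(q[0] + q[1]),
            math.sin(q[0]) + math.sin(q[0] + q[1]))

# stand-ins for per-view observations: projections of the end-effector
# along three hypothetical "camera" directions
VIEWS = [(1.0, 0.0), (0.0, 1.0), (0.7071, 0.7071)]

def observe(q):
    x, y = fk(q)
    return [dx * x + dy * y for dx, dy in VIEWS]

def multiview_loss(q, obs):
    # rendering loss summed over all views
    return sum((a - b) ** 2 for a, b in zip(observe(q), obs))

def recover_pose(obs, q0=(0.1, 0.1), lr=0.05, iters=2000, eps=1e-5):
    # plain gradient descent with finite-difference gradients; a real
    # system would backpropagate through a differentiable renderer
    q = list(q0)
    for _ in range(iters):
        g = []
        for i in range(2):
            qp, qm = list(q), list(q)
            qp[i] += eps
            qm[i] -= eps
            g.append((multiview_loss(qp, obs) - multiview_loss(qm, obs)) / (2 * eps))
        q = [q[i] - lr * g[i] for i in range(2)]
    return q

q_true = (0.9, -0.5)                 # demonstrated pose, unknown to the solver
obs = observe(q_true)                # what the "videos" provide
q_est = recover_pose(obs)            # pose recovered from observations alone
residual = multiview_loss(q_est, obs)
```

Note that the recovered configuration need not equal q_true exactly (the arm has two inverse-kinematics solutions); what matters is that the rendered observations match, which is exactly the visual-goal criterion.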

Dr. Robot
Ours

BibTeX

@inproceedings{lee2026visualrrt,
  title     = {Visual-RRT: Finding Paths toward Visual-Goals via Differentiable Rendering},
  author    = {Sebin Lee and Jumin Lee and Taeyeon Kim and Youngju Na and Woobin Im and Sungeui Yoon},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2026}
}