CLAD: Constrained Latent Action Diffusion for Vision-Language Procedure Planning


This is a companion discussion topic for the original entry at https://arxiv.org/abs/2503.06637