Vision-Language-Action (VLA) models are one type of use case that users might want to pursue with SAM 3 and LLMs.
Latent Space