Cursor's Third Era: Cloud Agents — ft. Sam Whitmore, Jonas Nelle, Cursor
Key Moments
Cursor's cloud agents: test-driven, multi-model AI coding in a VM
Key Insights
Cloud agents enable end-to-end software work by giving models access to a full VM, so they can test and run code live rather than just generate it.
The workflow rests on three pillars: the model actually tests changes, it returns a video demo of what it did, and you have full remote control (VNC) to the VM for hands-on exploration.
Video-centered demos dramatically improve alignment and onboarding, reducing underspecification and making complex changes easier to review before merging.
Slash commands (e.g., /repro, /fix) and Bugbot autofix automate testing, bug reproduction, and bug fixes, accelerating the feedback loop.
A spectrum of model strategies is emerging, from best-of-N runs to parallel agents and model swarms or councils, yielding combined results beyond what any single model produces.
Team workflows and collaboration are central: Slack becomes a development IDE, marketplace MCPs standardize capabilities, and production readiness requires scalable review and DevX tooling.
CLOUD AGENTS: END-TO-END PROGRAMMING WITH A BRAIN IN A BOX
Cursor describes cloud agents as a paradigm shift from token-by-token code generation to fully embodied, end-to-end software development inside a dedicated cloud VM. The model is onboarded with the actual development environment, runs code, tests end-to-end, and interacts with DevX tooling as any human engineer would. The vision is to render the bottlenecks of context and capability moot by providing a persistent, self-contained brain-in-a-box that can onboard, configure, and operate in a real workspace. This unlocks not just incremental edits but the ability to drive new features through autonomous, testable iterations inside a VM.
THREE PILLARS: TESTING, VIDEO DEMOS, AND REMOTE ACCESS
Cursor organizes cloud-agent work around three core pillars. First, the model actually tests its changes: starting servers, running builds, and validating end-to-end behavior rather than returning diffs alone. Second, the agent returns a video of what it built, a tangible artifact for quickly gauging whether the implementation matches intent. Third, engineers get full remote control of the VM, allowing them to inspect, modify, and experiment directly. This combination accelerates iteration and reduces reliance on large diffs for assessment.
VIDEOS AS ALIGNMENT TOOLS AND ONBOARDING SHORTCUTS
Videos have become a crucial alignment tool, especially when requests risk underspecification. A video demonstrates the exact changes and interactions, clarifying expectations before code reviews begin. While a video isn’t a substitute for reading code, it provides a practical entry point that reduces back-and-forth and sets a shared reference. The workflow often starts with watching a video to decide whether to merge or iterate, followed by live exploration via the VM to confirm the path forward.
DEVX ACCESS: FULL VM, TESTING, AND HANDS-ON INSPECTION
Beyond the video, engineers have real, hands-on access to the VM and its terminal. This enables direct experimentation, debugging, and refinement. The ability to hover UI elements, run commands, and explore the environment makes the agent’s decisions tangible and testable. Cursor emphasizes that even when videos suffice to convey progress, a live preview and control remain indispensable for final validation, giving teams confidence at merge time and enabling rapid follow-ups when needed.
FROM FRONT-END TO BACK-END: WIDE-RANGING DEMOS AND USE CASES
Demos span front-end and back-end changes, such as improved secret-management error handling and more robust end-to-end flows. For example, the agent can detect oversized secrets, open DevTools, simulate heavy input, and generate precise error messages. Such backend and frontend improvements demonstrate the model’s ability to handle real-world constraints, not just surface-level changes. The breadth of use cases—ranging from UI tweaks to backend validations—highlights the versatility of cloud agents in producing cohesive, deployable features.
ONBOARDING, AUTONOMY, AND THE GROWTH OF DEVX
A key insight is that the autonomy of models—enabled by pixels, a full VM, and on-branch onboarding—shifts the bottleneck from generation to integration and testing. Cursor notes that the shift toward autonomous onboarding and hands-on execution significantly expands what developers can accomplish. As models improve (e.g., Opus, Codex), this autonomy grows, enabling more ambitious tasks, broader tests, and more rapid moves from draft to production-ready work.
PARALLELISM, BEST-OF-N, AND MODEL SWARMS
The team explores composed model strategies, including best-of-N (running multiple models on the same prompt and picking the strongest result) and parallel agents. They also discuss model swarms or councils, where outputs from different providers are synthesized into a superior result. The takeaway is that diverse models can compensate for each other's weaknesses, offering a richer, more robust foundation for cloud-agent work. This approach enables more reliable exploration, evaluation, and iteration across complex tasks.
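As a rough illustration of the best-of-N idea, the sketch below fans one prompt out to several models in parallel and lets a judge pick the strongest candidate. This is a hypothetical sketch, not Cursor's implementation: the `best_of_n` function, the stand-in model functions, and the length-based judge are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_n(prompt, models, judge):
    # Fan the same prompt out to every model in parallel, then let the
    # judge score each candidate and keep the highest-scoring one.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        candidates = list(pool.map(lambda m: m(prompt), models))
    return max(candidates, key=judge)

# Toy stand-ins: each "model" is a function returning a string, and the
# judge simply prefers longer (more complete) answers.
models = [
    lambda p: p + " -> short fix",
    lambda p: p + " -> longer, tested fix with a demo",
]
best = best_of_n("fix the login bug", models, judge=len)
```

A swarm or council variant would replace the single `judge` with a synthesis step that merges candidates instead of discarding all but one.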
BUG FIXING, REPROS, AND TRANSCRIPTS AS DEBUG TOOLS
Cursor emphasizes practical tooling around cloud agents: slash commands like /repro and /fix streamline bug reproduction and fixes, while Bugbot autofix accelerates remediation. The team also experiments with sharing transcripts to debug, fork conversations, or allow external agents to review a past session. These features turn complex debugging into repeatable, auditable workflows, enabling faster resolution and clearer accountability across engineers and agents alike.
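The slash-command pattern described above can be sketched as a small dispatcher that routes a `/command args` message to a registered workflow. This is a hedged illustration only: the handler bodies, registry, and parsing are invented for the example and do not reflect how Cursor actually implements /repro or /fix.

```python
HANDLERS = {}

def command(name):
    """Register a function as the handler for a slash command."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

@command("/repro")
def repro(args):
    # Stand-in for a workflow that reproduces a reported bug in the VM.
    return f"reproducing bug: {args}"

@command("/fix")
def fix(args):
    # Stand-in for a workflow that attempts an automated fix.
    return f"applying fix for: {args}"

def dispatch(message):
    """Route a '/command args' message to its registered handler."""
    cmd, _, args = message.partition(" ")
    handler = HANDLERS.get(cmd)
    if handler is None:
        raise ValueError(f"unknown command: {cmd}")
    return handler(args)

result = dispatch("/repro login fails on empty password")
```

The registry approach makes it cheap to add new commands (a Bugbot autofix hook, say) without touching the dispatch logic.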
TEAMWORK, SLACK AS AN IDE, AND MARKETPLACE FOR MCPs
Collaboration is central to the cloud-agent model. Slack serves as an enlarged IDE, where issue channels, follow-ups, and cross-functional input drive agent activity. A marketplace for MCPs (Model Context Protocol servers) makes it easier to scale capabilities across teams. This ecosystem approach helps distribute expertise, standardize capabilities, and accelerate adoption in larger organizations, aligning human and AI efforts within familiar collaboration tools.
PRODUCTION READINESS, PIPELINES, AND THE ROAD AHEAD
Getting code from draft to production introduces new bottlenecks even as agents accelerate development. Cursor discusses pipelines, release strategies, and automated checks to detect regressions, with the aim of scaling up compute and creating safe, repeatable deployment patterns. The horizon includes tighter integration with tooling for production readiness, security, and performance, along with deeper exploration of memory persistence, multi-model configurations, and enhanced transparency in agent decision-making.
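A minimal sketch of the automated-check idea: run a set of named regression gates and only promote a build when all pass. The function and the check names below are illustrative assumptions, not an actual Cursor pipeline.

```python
def run_gate(checks):
    """Run each (name, check) pair; return (all_passed, failed_names)."""
    failures = [name for name, check in checks if not check()]
    return (not failures, failures)

# Illustrative checks; a real pipeline would invoke test suites, smoke
# tests against the agent's VM build, and performance budgets.
checks = [
    ("unit_tests", lambda: True),
    ("e2e_smoke", lambda: True),
    ("perf_budget", lambda: True),
]
ok, failed = run_gate(checks)
```

Reporting the failing check names, rather than a bare pass/fail bit, is what makes such a gate useful for the fast follow-up loops the episode describes.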
FUTURE DIRECTIONS: MEMORY, REPLICATION, AND ENTERPRISE SCALING
Looking ahead, Cursor envisions longer-lived VM sessions with memory persistence, more nuanced production workflows, and enterprise-grade collaboration features. Debates include how much to centralize hosting versus distributing compute, and how to balance user control with automation. The ongoing evolution also touches on model selection defaults, the role of a shared ‘brain in a box,’ and the potential expansion of cloud agents into broader organizational contexts with standardized, secure deployment and governance.
Common Questions
How do Cursor's cloud agents work?
Cursor's cloud agents give the model full control of a remote VM, enabling end-to-end testing, building, and deployment. This setup lets the model onboard itself, run code, test changes, and generate demonstration videos, which reduces underspecification and alignment problems. The system emphasizes testing, demonstration videos, and remote VM access as a three-pillar workflow.
Mentioned in This Episode
●Technical library referenced in the UI stack; praised by the speakers.
●Managed control plane for cloud-agent telemetry and logs; the DataDog MCP is used for cloud-agent diagnosis.
●Developer tools used by the agents to test secrets handling and errors; an example of a diagnostic workflow.
●Feature enabling automatic fixes suggested by Bugbot with a single click.
●Cursor colleague referenced during the discussion of team scale and enterprise adoption.
●Product used as part of cloud agents; Cursor mentions buying Autotab and repackaging it as part of the cloud-agents stack.
●Automated bug-fixing assistant integrated into Cursor workflows; Bugbot Autofix can auto-fix some issues.
●Workspace-integrated Model Context Protocol plugins for Slack; enable integration of Cursor with Slack workflows.
●Developer who started the grind-mode experiments that influenced the long-running agent workflow.
●Team member acknowledged for collaboration on the cloud-agents launch video.
●Industry observer referenced in discussing the broader adoption of cloud agents and platform shifts.
●Team member who selected the wallpaper for the video; a design detail.
●Co-host of the podcast; discusses practicalities of videos, testing, and alignment artifacts.
●Team member mentioned as part of the chapter and brand work surrounding the video.
●Co-host; referenced in the discussion of Slack threads and team workflows.