Cursor's Third Era: Cloud Agents — ft. Sam Whitmore, Jonas Nelle, Cursor
Key Moments
Cursor's cloud agents: test-driven, multi-model AI coding in a VM
Key Insights
Cloud agents enable end-to-end software work by giving models access to a full VM, so they can test and run code live rather than just generate it.
The workflow rests on three pillars: the model actually tests changes, it returns a video demo of what it did, and you have full remote control (VNC) to the VM for hands-on exploration.
Video-centered demos dramatically improve alignment and onboarding, reducing underspecification and making complex changes easier to review before merging.
Slash commands (e.g., /repro, /fix) and Bugbot autofix automate testing, bug reproduction, and bug fixes, accelerating the feedback loop.
A spectrum of model strategies is emerging, from best-of-N runs to parallel agents and model swarms or councils, yielding combined results beyond what any single model produces.
Team workflows and collaboration are central: Slack becomes a development IDE, marketplace MCPs standardize capabilities, and production readiness requires scalable review and DevX tooling.
CLOUD AGENTS: END-TO-END PROGRAMMING WITH A BRAIN IN A BOX
Cursor describes cloud agents as a paradigm shift from token-by-token code generation to fully embodied, end-to-end software development inside a dedicated cloud VM. The model is onboarded with the actual development environment, runs code, tests end-to-end, and interacts with DevX tooling as any human engineer would. The vision is to render the bottlenecks of context and capability moot by providing a persistent, self-contained brain-in-a-box that can onboard, configure, and operate in a real workspace. This unlocks not just incremental edits but the ability to drive new features through autonomous, testable iterations inside a VM.
THREE PILLARS: TESTING, VIDEO DEMOS, AND REMOTE ACCESS
Cursor organizes cloud-agent work around three core pillars. First, the model actually tests its changes: starting servers, running builds, and validating end-to-end behavior rather than returning diffs alone. Second, the agent returns a video of what it built, a tangible artifact for quickly gauging whether the implementation matches intent. Third, engineers get full remote control of the VM, allowing them to inspect, modify, and experiment directly. This combination accelerates iteration and reduces reliance on large diffs for assessment.
VIDEOS AS ALIGNMENT TOOLS AND ONBOARDING SHORTCUTS
Videos have become a crucial alignment tool, especially when requests risk underspecification. A video demonstrates the exact changes and interactions, clarifying expectations before code reviews begin. While a video isn’t a substitute for reading code, it provides a practical entry point that reduces back-and-forth and sets a shared reference. The workflow often starts with watching a video to decide whether to merge or iterate, followed by live exploration via the VM to confirm the path forward.
DEVX ACCESS: FULL VM, TESTING, AND HANDS-ON INSPECTION
Beyond the video, engineers have real, hands-on access to the VM and its terminal. This enables direct experimentation, debugging, and refinement. The ability to hover UI elements, run commands, and explore the environment makes the agent’s decisions tangible and testable. Cursor emphasizes that even when videos suffice to convey progress, a live preview and control remain indispensable for final validation, giving teams confidence at merge time and enabling rapid follow-ups when needed.
FROM FRONT-END TO BACK-END: WIDE-RANGING DEMOS AND USE CASES
Demos span front-end and back-end changes, such as improved secret-management error handling and more robust end-to-end flows. For example, the agent can detect oversized secrets, open DevTools, simulate heavy input, and generate precise error messages. Such backend and frontend improvements demonstrate the model’s ability to handle real-world constraints, not just surface-level changes. The breadth of use cases—ranging from UI tweaks to backend validations—highlights the versatility of cloud agents in producing cohesive, deployable features.
ONBOARDING, AUTONOMY, AND THE GROWTH OF DEVX
A key insight is that the autonomy of models—enabled by pixels, a full VM, and on-branch onboarding—shifts the bottleneck from generation to integration and testing. Cursor notes that the shift toward autonomous onboarding and hands-on execution significantly expands what developers can accomplish. As models improve (e.g., Opus, Codex), this autonomy grows, enabling more ambitious tasks, broader tests, and more rapid moves from draft to production-ready work.
PARALLELISM, BEST-OF-N, AND MODEL SWARMS
The team explores composed model strategies, including best-of-N (running multiple models on the same prompt and picking the strongest result) and parallel agents. They also discuss model swarms or councils, where outputs from different providers are synthesized into a superior result. The takeaway is that diverse models can compensate for each other's weaknesses, offering a richer, more robust foundation for cloud-agent work. This approach enables more reliable exploration, evaluation, and iteration across complex tasks.
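As a rough illustration of the best-of-N idea, the sketch below fans one prompt out to several models in parallel and lets a judge pick the strongest candidate. This is a hypothetical sketch, not Cursor's implementation: the `best_of_n` function, the stand-in model functions, and the length-based judge are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_n(prompt, models, judge):
    # Fan the same prompt out to every model in parallel, then let the
    # judge score each candidate and keep the highest-scoring one.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        candidates = list(pool.map(lambda m: m(prompt), models))
    return max(candidates, key=judge)

# Toy stand-ins: each "model" is a function returning a string, and the
# judge simply prefers longer (more complete) answers.
models = [
    lambda p: p + " -> short fix",
    lambda p: p + " -> longer, tested fix with a demo",
]
best = best_of_n("fix the login bug", models, judge=len)
```

A swarm or council variant would replace the single `judge` with a synthesis step that merges candidates instead of discarding all but one.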
BUG FIXING, REPROS, AND TRANSCRIPTS AS DEBUG TOOLS
Cursor emphasizes practical tooling around cloud agents: slash commands like /repro and /fix streamline bug reproduction and fixes, while Bugbot autofix accelerates remediation. The team also experiments with sharing transcripts to debug, fork conversations, or allow external agents to review a past session. These features turn complex debugging into repeatable, auditable workflows, enabling faster resolution and clearer accountability across engineers and agents alike.
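The slash-command pattern described above can be sketched as a small dispatcher that routes a `/command args` message to a registered workflow. This is a hedged illustration only: the handler bodies, registry, and parsing are invented for the example and do not reflect how Cursor actually implements /repro or /fix.

```python
HANDLERS = {}

def command(name):
    """Register a function as the handler for a slash command."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

@command("/repro")
def repro(args):
    # Stand-in for a workflow that reproduces a reported bug in the VM.
    return f"reproducing bug: {args}"

@command("/fix")
def fix(args):
    # Stand-in for a workflow that attempts an automated fix.
    return f"applying fix for: {args}"

def dispatch(message):
    """Route a '/command args' message to its registered handler."""
    cmd, _, args = message.partition(" ")
    handler = HANDLERS.get(cmd)
    if handler is None:
        raise ValueError(f"unknown command: {cmd}")
    return handler(args)

result = dispatch("/repro login fails on empty password")
```

The registry approach makes it cheap to add new commands (a Bugbot autofix hook, say) without touching the dispatch logic.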
TEAMWORK, SLACK AS AN IDE, AND MARKETPLACE FOR MCPs
Collaboration is central to the cloud-agent model. Slack serves as an enlarged IDE, where issue channels, follow-ups, and cross-functional input drive agent activity. A marketplace for MCPs (Model Context Protocol servers) makes it easier to scale capabilities across teams. This ecosystem approach helps distribute expertise, standardize capabilities, and accelerate adoption in larger organizations, aligning human and AI efforts within familiar collaboration tools.
PRODUCTION READINESS, PIPELINES, AND THE ROAD AHEAD
Getting code from draft to production introduces new bottlenecks even as agents accelerate development. Cursor discusses pipelines, release strategies, and automated checks to detect regressions, with the aim of scaling up compute and creating safe, repeatable deployment patterns. The horizon includes tighter integration with tooling for production readiness, security, and performance, along with deeper exploration of memory persistence, multi-model configurations, and enhanced transparency in agent decision-making.
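A minimal sketch of the automated-check idea: run a set of named regression gates and only promote a build when all pass. The function and the check names below are illustrative assumptions, not an actual Cursor pipeline.

```python
def run_gate(checks):
    """Run each (name, check) pair; return (all_passed, failed_names)."""
    failures = [name for name, check in checks if not check()]
    return (not failures, failures)

# Illustrative checks; a real pipeline would invoke test suites, smoke
# tests against the agent's VM build, and performance budgets.
checks = [
    ("unit_tests", lambda: True),
    ("e2e_smoke", lambda: True),
    ("perf_budget", lambda: True),
]
ok, failed = run_gate(checks)
```

Reporting the failing check names, rather than a bare pass/fail bit, is what makes such a gate useful for the fast follow-up loops the episode describes.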
FUTURE DIRECTIONS: MEMORY, REPLICATION, AND ENTERPRISE SCALING
Looking ahead, Cursor envisions longer-lived VM sessions with memory persistence, more nuanced production workflows, and enterprise-grade collaboration features. Debates include how much to centralize hosting versus distributing compute, and how to balance user control with automation. The ongoing evolution also touches on model selection defaults, the role of a shared ‘brain in a box,’ and the potential expansion of cloud agents into broader organizational contexts with standardized, secure deployment and governance.
Common Questions
How do Cursor's cloud agents work?
Cursor's cloud agents give the model full control of a remote VM, enabling end-to-end testing, building, and deployment. This setup lets the model onboard itself, run code, test changes, and generate demonstration videos, which reduces underspecification and alignment problems. The system emphasizes testing, demonstration videos, and remote VM access as a three-pillar workflow.
Mentioned in This Episode
●Technical library referenced in the UI stack; praised by the speakers.
●Managed control plane for cloud-agent telemetry and logs; the DataDog MCP is used for cloud-agent diagnosis.
●Developer tools used by the agents to test secrets handling and errors; an example of a diagnostic workflow.
●Feature enabling automatic fixes suggested by Bugbot with a single click.
●Cursor colleague referenced during the discussion of team scale and enterprise adoption.
●Product used as part of cloud agents; Cursor mentions buying Autotab and repackaging it as part of the cloud-agents stack.
●Automated bug-fixing assistant integrated into Cursor workflows; Bugbot Autofix can auto-fix some issues.
●Workspace-integrated Model Context Protocol plugins for Slack; enable integration of Cursor with Slack workflows.
●Developer who started the grind-mode experiments that influenced the long-running agent workflow.
●Team member acknowledged for collaboration on the cloud-agents launch video.
●Industry observer referenced in discussing the broader adoption of cloud agents and platform shifts.
●Team member who selected the wallpaper for the video; a design detail.
●Co-host of the podcast; discusses practicalities of videos, testing, and alignment artifacts.
●Team member mentioned as part of the chapter and brand work surrounding the video.
●Co-host; referenced in the discussion of Slack threads and team workflows.