Why Every Agent Needs a Box — Aaron Levie, Box
Key Moments
Agents need a box: Box explains enterprise data, identity, and workflow in AI agents.
Key Insights
Data in Box becomes an active enterprise knowledge base for onboarding, sales, roadmaps, and collaboration, not just a stored resource.
Identity and access controls are foundational: agent UIs, sandboxed workspaces, and clear liability boundaries are essential for safe autonomous agent work.
The shift from 'agents are you' to 'agents autonomously or semi-autonomously acting' demands re-engineered workflows and strong context engineering.
Evaluation and observability of agents (agent evals) are critical; Box uses structured tests (Apex, internal labs) to track progress across models.
Real-world enterprise adoption will require new infrastructure, professional services, and multi-year effort to rework data, processes, and governance.
INTRODUCING THE BOX-AGENT PARADIGM
Aaron Levie emphasizes a core shift: instead of building software for humans to carry out tasks, we now instruct agents that execute the work, with humans at most reviewing the result. This requires a redesigned work model where the platform adapts to the capabilities of agents, not the other way around. Box’s key insight is that enterprise data, stored with robust permissions and collaboration features, becomes a powerful resource for agents to use across contexts, from onboarding new employees to guiding sales conversations. Early adopters stand to gain compounding returns as agents scale, but the path to deployment is gradual and complex.
ENTERPRISE DATA AS A VALUABLE ASSET
The transcript frames corporate files—contracts, research, memos, marketing materials—as a living data resource once AI agents can access and reason over them. Historically, humans engaged with data in active projects and largely forgot older material. With agents, that archival data becomes a persistent source of answers and context. The challenge is to structure, govern, and securely expose this data so agents can retrieve it accurately and safely, while end users gain faster, more informed answers within established governance boundaries.
HUMAN-AGENT INTERFACES AND SANDBOXES
Levie explains that agents can operate in two modes: on behalf of a user (as an extension of the human) and as autonomous collaborators. In both modes, there is a sandboxed environment where the agent has access to a subset of tools and data. The implication is not naive automation but careful orchestration: humans collaborate with agents, oversee critical decisions, and restrict data exposure to minimize risk. Open Claw exemplifies a real-world step toward autonomous agents, reinforcing the need for well-defined boxes (workspaces) around agent activity.
IDENTITY, ACCESS CONTROLS, AND DATA GOVERNANCE
A central theme is the need for agent identities and governance around data access. The human who creates an agent bears significant liability for its actions, yet agents must be restricted from seeing data outside their scope. Box envisions an identity layer that can coordinate between human and agent access, with policies to prevent prompt injection, data leakage, and overbroad permissions. The conversation touches on partial data access, sub-workspaces, and collaboration boundaries—recognizing that effective enterprise AI requires both oversight and flexibility.
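One way to picture the scoping rule described above is as an intersection: an agent can read a folder only if its creator can read it and the agent's own scope includes it. The sketch below is illustrative only. The Agent class, the USER_FOLDERS table, and the folder names are hypothetical and do not reflect Box's actual identity layer or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    """Hypothetical agent identity, tied to the human who created it."""
    name: str
    owner: str
    scope: frozenset = field(default_factory=frozenset)  # folders the agent may touch


# Hypothetical permission table: user -> folders that user can read.
USER_FOLDERS = {
    "alice": {"sales/contracts", "sales/decks", "hr/reviews"},
}


def agent_can_read(agent: Agent, folder: str) -> bool:
    """Allow access only if BOTH the owner and the agent's scope permit it."""
    owner_folders = USER_FOLDERS.get(agent.owner, set())
    return folder in owner_folders and folder in agent.scope


# An agent created by alice but limited to a single folder.
deal_agent = Agent("deal-desk", owner="alice", scope=frozenset({"sales/contracts"}))
```

Under these assumptions the agent never sees more than its creator can, and the creator can narrow it further, which matches the liability and partial-access concerns raised in the discussion.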
CONTEXT ENGINEERING AND THE LIMITS OF MODELS
The conversation treats context engineering as a core problem: models have neither perfect search nor unlimited context windows, so the workflow must combine robust search, ranking, and contextual constraints. Token limits force engineers to design systems that extract the right context from millions of documents into a workable subset. The distinction between coding (where context is more controllable) and general knowledge work (where data is messy and diverse) highlights the need for disciplined data architecture and search strategies to make agents reliable in practice.
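The rank-then-fit idea can be sketched in a few lines. This is a toy illustration, not the approach Box uses: the word-overlap score, the whitespace token estimate, and every function name below are assumptions standing in for a real retriever and tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def score(query: str, doc: str) -> int:
    """Toy relevance score: number of distinct query words present in the document."""
    doc_words = set(doc.lower().split())
    return sum(1 for w in set(query.lower().split()) if w in doc_words)


def select_context(query: str, docs: list[str], token_budget: int) -> list[str]:
    """Rank documents by score, then greedily keep those that fit the budget."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    chosen, used = [], 0
    for doc in ranked:
        if score(query, doc) == 0:
            break  # the list is ranked, so nothing relevant remains
        cost = estimate_tokens(doc)
        if used + cost <= token_budget:
            chosen.append(doc)
            used += cost
    return chosen


docs = [
    "renewal terms for the Acme contract",
    "holiday party planning notes",
    "Acme contract pricing and renewal date",
]
context = select_context("Acme contract renewal", docs, token_budget=8)
```

With a budget of 8 only the first relevant document fits. Raising the budget to 12 admits both relevant documents, and the irrelevant one is excluded either way.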
READ-WRITE WORKFLOWS AND CONTENT CREATION CHALLENGES
A practical debate centers on read vs write tasks. Reading Box data to answer questions works differently from writing or generating documents, slide decks, or PDFs. Formatting quirks, font inconsistencies, and layout issues in generated content pose real UX challenges; agents can draft, but humans judge finish quality. Box envisions native read-write agents that operate within a sandbox workspace, writing outputs as artifacts inside Box while maintaining downstream governance and collaboration controls.
CANONICAL DATA AND THE CHALLENGE OF MISSING DOCUMENTS
The discussion highlights the problem of canonical data—like a complete, up-to-date list of all office addresses—that often doesn’t live in a single document. Agents must recognize gaps, avoid hallucinations, and understand when data is incomplete. This drives a push toward canonical sources or authoritative datasets within the enterprise, enabling agents to verify, cross-check, and prune results. The result is a more trustworthy agent that can consistently locate the right information without duplicating or fabricating it.
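A minimal sketch of the "recognize the gap" behavior, assuming a hypothetical registry of authoritative records. The registry, the topic names, and the placeholder values are all invented for illustration. The point is only that a missing canonical source yields an explicit caveat and not a guessed answer.

```python
from typing import Optional

# Hypothetical registry mapping a topic to its authoritative record.
CANONICAL_SOURCES = {
    "office_addresses": ["Office A", "Office B"],  # placeholder values
}


def lookup_canonical(topic: str) -> Optional[list]:
    """Return the authoritative record, or None when no canonical source exists."""
    return CANONICAL_SOURCES.get(topic)


def respond(topic: str) -> str:
    """Answer from the canonical record, or flag the gap instead of guessing."""
    record = lookup_canonical(topic)
    if record is None:
        return f"No canonical source for '{topic}'; any answer may be incomplete."
    return ", ".join(record)
```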
EVALS, APEX, AND INDUSTRY PROGRESS
Box describes its agent eval program, including the Apex eval and internal benchmarks that test both model capabilities and harness robustness. The results show meaningful performance leaps across model families, underscoring the importance of rigorous evaluation to separate real progress from hype. Levie emphasizes that public benchmarks are less informative than private, industry-specific evals that mirror how customers actually use agents, reinforcing the need for ongoing observability and iteration.
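The summary does not describe how the Apex eval or Box's internal benchmarks are built, so the following is only a generic sketch of a private eval harness: a fixed set of cases run against several models, each scored by exact match. The cases, the stand-in models, and the run_eval function are all hypothetical.

```python
from typing import Callable

# Hypothetical eval cases: (prompt, expected answer).
CASES = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
]


def run_eval(model: Callable[[str], str], cases) -> float:
    """Return the fraction of cases where the model's answer matches exactly."""
    passed = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return passed / len(cases)


# Stand-in "models": plain functions used only to exercise the harness.
def model_a(prompt: str) -> str:
    return {"2+2": "4"}.get(prompt, "unknown")


def model_b(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris", "3*3": "9"}.get(prompt, "unknown")


scores = {name: run_eval(fn, CASES) for name, fn in [("model_a", model_a), ("model_b", model_b)]}
```

Keeping the case set fixed while swapping models is what makes scores comparable across model families. In practice the cases would mirror real customer tasks, and the scoring would be richer than exact match.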
ENSURING SAFETY: SECURITY INCIDENTS AND OVERSIGHT
A recurring concern is the potential for security incidents as agents access enterprise data. The interviewer prompts discussion about where liability lies and how to prevent agents from inadvertently leaking sensitive information. The consensus is that a robust identity and governance framework plus sandboxing are non-negotiable. Enterprises will demand policy-driven controls, audit trails, and fail-safes to ensure agents operate within safe, approved boundaries rather than freely roaming across data silos.
OPERATIONALIZATION: A STARTUP WITHIN A COMPANY
The agent initiative is described as an internal startup within Box, with a core team focused on the agent stack and governance. This center-of-excellence style approach coordinates with broader engineering, security, and product teams, ensuring the agent layer integrates with Box’s data platform. The group’s success hinges on cross-functional collaboration, with a dedicated set of people building the enabling infrastructure, data pipelines, and evaluation tools to scale agent usage company-wide.
WRITE CAPABILITIES AND CONTENT CREATION
While read tasks are foundational, the real value lies in write capabilities—agents creating and organizing content, supporting enterprise output, and building workspace artifacts. The transcript notes the current difficulty with highly polished content like PowerPoint, yet progress is steadily advancing. Box envisions native agents that handle end-to-end content creation within Box workspaces, leveraging model capabilities while ensuring outputs remain auditable and compliant with governance standards.
FUTURE OUTLOOK: PROFESSIONAL SERVICES, COMPETITION, AND ADOPTION
Levie foresees a multi-year journey for mainstream enterprise adoption, including dedicated professional services to help organizations reengineer workflows, data schemas, and governance to be agent-ready. The market will likely see a wave of agent-focused consultancies and toolchains as firms like Box connect with financial, legal, healthcare, and public sector customers. Competition will intensify as labs and vendors push to deliver better context, safety, and ROI, while enterprises curate evaluation pipelines to choose the right mix of tools.
Common Questions
The core idea is that every agent needs a box to operate in a shared, governed workspace that protects data while enabling autonomous work and collaboration with humans. This box acts as a sandboxed memory and data store for each agent’s tasks.
Mentioned in this video
Box co-founder discussing agent-integration strategies and enterprise AI readiness.
Chroma CEO referenced as a guest and discussion partner on context and agents.
Box CTO mentioned as a core member of the agent/eval teams.
Head of AI at Box discussed in relation to evals and strategy.
Investor mentioned as part of Box's funding history (Arrington party influence).
Economist cited for the 'production function' question in closing.