Why Every Agent Needs a Box — Aaron Levie, Box

Latent Space Podcast
Mar 5, 2026

Key Moments

TL;DR

Agents need a box: Aaron Levie explains how enterprise data, identity, and workflow controls give AI agents a governed workspace to operate in.

Key Insights

1. Data in Box becomes an active enterprise knowledge base for onboarding, sales, roadmaps, and collaboration, not just a stored resource.
2. Identity and access controls are foundational: agent UIs, sandboxed workspaces, and clear liability boundaries are essential for safe autonomous agent work.
3. The shift from "agents are you" to "agents autonomously or semi-autonomously acting" demands re-engineered workflows and strong context engineering.
4. Evaluation and observability of agents (agent evals) are critical; Box uses structured tests (Apex, internal labs) to track progress across models.
5. Real-world enterprise adoption will require new infrastructure, professional services, and multi-year effort to rework data, processes, and governance.

INTRODUCING THE BOX-AGENT PARADIGM

Aaron Levie emphasizes a core shift: we no longer simply assign tasks to humans to execute; we talk to agents that do the work, with humans at best reviewing the output. This requires a redesigned work model where the platform adapts to the capabilities of agents, not the other way around. Box's key insight is that enterprise data, stored with robust permissions and collaboration features, becomes a powerful resource for agents to use across contexts—from onboarding new employees to guiding sales conversations. Early adopters stand to gain compounding returns as agents scale, but the path to deployment is gradual and complex.

ENTERPRISE DATA AS A VALUABLE ASSET

The transcript frames corporate files—contracts, research, memos, marketing materials—as a living data resource once AI agents can access and reason over them. Historically, humans engaged with data in active projects and largely forgot older material. With agents, that archival data becomes a persistent source of answers and context. The challenge is to structure, govern, and securely expose this data so agents can retrieve it accurately and safely, while end users gain faster, more informed answers within established governance boundaries.

HUMAN-AGENT INTERFACES AND SANDBOXES

Levie explains that agents can operate in two modes: on behalf of a user (as an extension of the human) and as autonomous collaborators. In both modes, there is a sandboxed environment where the agent has access to a subset of tools and data. The implication is not naive automation but careful orchestration: humans collaborate with agents, oversee critical decisions, and restrict data exposure to minimize risk. OpenClaw exemplifies a real-world step toward autonomous agents, reinforcing the need for well-defined boxes (workspaces) around agent activity.
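The sandbox idea above can be sketched as a simple tool allowlist: an agent's workspace grants a subset of tools, and anything outside that subset is refused. This is an illustrative sketch, not Box's actual API; all class and tool names are hypothetical.

```python
# Minimal sketch of a sandboxed agent workspace: the agent may only invoke
# tools its workspace explicitly grants. All names are illustrative.

class SandboxViolation(Exception):
    """Raised when an agent calls a tool outside its granted set."""

class AgentWorkspace:
    def __init__(self, allowed_tools, tools):
        self.allowed_tools = set(allowed_tools)  # tool names the agent may use
        self.tools = tools                       # name -> callable

    def invoke(self, tool_name, *args):
        # Enforce the sandbox boundary before dispatching the call.
        if tool_name not in self.allowed_tools:
            raise SandboxViolation(f"tool '{tool_name}' is outside this sandbox")
        return self.tools[tool_name](*args)

# A read-only workspace: search is granted, delete is not.
tools = {"search": lambda q: f"results for {q}",
         "delete": lambda path: f"deleted {path}"}
ws = AgentWorkspace(allowed_tools={"search"}, tools=tools)
```

The point of the sketch is that restriction lives in the workspace, not in the agent's prompt, so a misbehaving or prompt-injected agent still cannot reach ungranted tools.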

IDENTITY, ACCESS CONTROLS, AND DATA GOVERNANCE

A central theme is the need for agent identities and governance around data access. The human who creates an agent bears significant liability for its actions, yet agents must be restricted from seeing data outside their scope. Box envisions an identity layer that can coordinate between human and agent access, with policies to prevent prompt injection, data leakage, and overbroad permissions. The conversation touches on partial data access, sub-workspaces, and collaboration boundaries—recognizing that effective enterprise AI requires both oversight and flexibility.

CONTEXT ENGINEERING AND THE LIMITS OF MODELS

The chat dives into context engineering as a core problem: models lack perfect search and broad context windows, so the workflow must combine robust search, ranking, and contextual constraints. Token limits force engineers to design systems that extract the right context from millions of documents into a workable subset. The distinction between coding (where context is more controllable) and general knowledge work (where data is messy and diverse) highlights the need for disciplined data architecture and search strategies to make agents reliable in practice.
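The "extract the right context into a workable subset" step can be sketched as greedy packing: rank retrieved chunks by relevance and add them until the token budget is spent. This is a toy illustration of the idea, assuming precomputed relevance scores and token counts, not a description of Box's retrieval stack.

```python
# Greedy context packing: rank retrieved chunks by relevance score and keep
# as many as fit within the model's token budget. Names are illustrative.

def pack_context(chunks, budget_tokens):
    """chunks: list of (score, token_count, text). Returns selected texts."""
    selected, used = [], 0
    for score, tokens, text in sorted(chunks, key=lambda c: -c[0]):
        if used + tokens <= budget_tokens:  # skip chunks that would overflow
            selected.append(text)
            used += tokens
    return selected

docs = [(0.9, 400, "contract clause"), (0.7, 700, "old memo"), (0.4, 300, "faq")]
# pack_context(docs, 800) -> ["contract clause", "faq"]
```

Note the greedy choice skips the second-ranked chunk when it would blow the budget and falls through to a smaller one—a crude stand-in for the ranking-plus-constraint systems the conversation describes.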

READ-WRITE WORKFLOWS AND CONTENT CREATION CHALLENGES

A practical debate centers on read vs write tasks. Reading Box data to answer questions works differently from writing or generating documents, slide decks, or PDFs. Formatting quirks, font inconsistencies, and layout issues in generated content pose real UX challenges; agents can draft, but humans judge finish quality. Box envisions native read-write agents that operate within a sandbox workspace, writing outputs as artifacts inside Box while maintaining downstream governance and collaboration controls.

CANONICAL DATA AND THE CHALLENGE OF MISSING DOCUMENTS

The discussion highlights the problem of canonical data—like a complete, up-to-date list of all office addresses—that often doesn’t live in a single document. Agents must recognize gaps, avoid hallucinations, and understand when data is incomplete. This drives a push toward canonical sources or authoritative datasets within the enterprise, enabling agents to verify, cross-check, and prune results. The result is a more trustworthy agent that can consistently locate the right information without duplicating or fabricating it.
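The gap-recognition behavior described above can be sketched as a lookup that prefers a registered canonical source and, when none exists, reports the gap rather than synthesizing an answer. The registry and dataset names here are hypothetical.

```python
# Sketch: answer from a registered canonical source if one exists; otherwise
# surface the gap instead of guessing. Registry contents are illustrative.

CANONICAL = {"office_addresses": ["1 Main St, Redwood City"]}

def lookup(dataset):
    if dataset in CANONICAL:
        return {"status": "canonical", "data": CANONICAL[dataset]}
    # No authoritative source: signal incompleteness, never fabricate.
    return {"status": "gap", "data": None}
```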

EVALS, APEX, AND INDUSTRY PROGRESS

Box describes its agent eval program, including the Apex eval and internal benchmarks that test both model capabilities and harness robustness. The results show meaningful performance leaps across model families, underscoring the importance of rigorous evaluation to separate real progress from hype. Levie emphasizes that public benchmarks are less informative than private, industry-specific evals that mirror how customers actually use agents, reinforcing the need for ongoing observability and iteration.
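The shape of such an eval program can be sketched as a harness that runs one task set against several models and reports comparable pass rates. This is a minimal illustration of the pattern, not the Apex eval itself; the task and model stubs are invented.

```python
# Tiny eval harness: run the same task set through several model harnesses
# and report per-model pass rates. All tasks and models are stand-ins.

def run_evals(tasks, harnesses):
    """tasks: list of (input, expected). harnesses: name -> callable."""
    report = {}
    for name, run in harnesses.items():
        passed = sum(1 for x, expected in tasks if run(x) == expected)
        report[name] = passed / len(tasks)
    return report

tasks = [("2+2", "4"), ("capital of France", "Paris")]
harnesses = {
    "model_a": lambda x: {"2+2": "4"}.get(x, "?"),
    "model_b": lambda x: {"2+2": "4", "capital of France": "Paris"}.get(x, "?"),
}
# run_evals(tasks, harnesses) -> {"model_a": 0.5, "model_b": 1.0}
```

Holding the task set fixed while swapping harnesses is what makes "meaningful performance leaps across model families" measurable rather than anecdotal.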

ENSURING SAFETY: SECURITY INCIDENTS AND OVERSIGHT

A recurring concern is the potential for security incidents as agents access enterprise data. The interviewer prompts discussion about where liability lies and how to prevent agents from inadvertently leaking sensitive information. The consensus is that a robust identity and governance framework plus sandboxing are non-negotiable. Enterprises will demand policy-driven controls, audit trails, and fail-safes to ensure agents operate within safe, approved boundaries rather than roaming freely across data silos.

OPERATIONALIZATION: A STARTUP WITHIN A COMPANY

The agent initiative is described as an internal startup within Box, with a core team focused on the agent stack and governance. This center-of-excellence style approach coordinates with broader engineering, security, and product teams, ensuring the agent layer integrates with Box’s data platform. The group’s success hinges on cross-functional collaboration, with a dedicated set of people building the enabling infrastructure, data pipelines, and evaluation tools to scale agent usage company-wide.

WRITE CAPABILITIES AND CONTENT CREATION

While read tasks are foundational, the real value lies in write capabilities—agents creating and organizing content, supporting enterprise output, and building workspace artifacts. The transcript notes the current difficulty with highly polished content like PowerPoint, yet progress is steadily advancing. Box envisions native agents that handle end-to-end content creation within Box workspaces, leveraging model capabilities while ensuring outputs remain auditable and compliant with governance standards.

FUTURE OUTLOOK: PROFESSIONAL SERVICES, COMPETITION, AND ADOPTION

Levie foresees a multi-year journey for mainstream enterprise adoption, including dedicated professional services to help organizations reengineer workflows, data schemas, and governance to be agent-ready. The market will likely see a wave of agent-focused consultancies and toolchains as firms like Box connect with financial, legal, healthcare, and public sector customers. Competition will intensify as labs and vendors push to deliver better context, safety, and ROI, while enterprises curate evaluation pipelines to choose the right mix of tools.

Desk-level cheat sheet: practical takeaways from the agent-box discussion


Do This

Align your workflow to leverage agents (not expect agents to fully replace humans).
Invest in an enterprise data workspace with clear access controls and governance.
Develop explicit read/write processes for knowledge work to reduce data slop.

Avoid This

Don't assume infinite context windows solve all data retrieval problems.
Don't deploy autonomous agents without a strong oversight structure and accountability.

Common Questions

What does "every agent needs a box" mean? The core idea is that every agent needs a box: a shared, governed workspace that protects data while enabling autonomous work and collaboration with humans. This box acts as a sandboxed memory and data store for each agent's tasks.
