Why Every Agent Needs a Box — Aaron Levie, Box

Latent Space Podcast
Mar 5, 2026

Key Moments

TL;DR

Agents need a box: Aaron Levie explains how enterprise data, identity, and workflow controls give AI agents a governed workspace to operate in.

Key Insights

1. Data in Box becomes an active enterprise knowledge base for onboarding, sales, roadmaps, and collaboration, not just a stored resource.
2. Identity and access controls are foundational: agent UIs, sandboxed workspaces, and clear liability boundaries are essential for safe autonomous agent work.
3. The shift from "agents are you" to "agents autonomously or semi-autonomously acting" demands re-engineered workflows and strong context engineering.
4. Evaluation and observability of agents (agent evals) are critical; Box uses structured tests (Apex, internal labs) to track progress across models.
5. Real-world enterprise adoption will require new infrastructure, professional services, and multi-year effort to rework data, processes, and governance.

INTRODUCING THE BOX-AGENT PARADIGM

Aaron Levie emphasizes a core shift: we no longer simply assign tasks to humans to execute; we talk to agents that do the work, with humans at best reviewing the output. This requires a redesigned work model where the platform adapts to the capabilities of agents, not the other way around. Box's key insight is that enterprise data, stored with robust permissions and collaboration features, becomes a powerful resource for agents to use across contexts—from onboarding new employees to guiding sales conversations. Early adopters stand to gain compounding returns as agents scale, but the path to deployment is gradual and complex.

ENTERPRISE DATA AS A VALUABLE ASSET

The transcript frames corporate files—contracts, research, memos, marketing materials—as a living data resource once AI agents can access and reason over them. Historically, humans engaged with data in active projects and largely forgot older material. With agents, that archival data becomes a persistent source of answers and context. The challenge is to structure, govern, and securely expose this data so agents can retrieve it accurately and safely, while end users gain faster, more informed answers within established governance boundaries.

HUMAN-AGENT INTERFACES AND SANDBOXES

Levie explains that agents can operate in two modes: on behalf of a user (as an extension of the human) and as autonomous collaborators. In both modes, there is a sandboxed environment where the agent has access to a subset of tools and data. The implication is not naive automation but careful orchestration: humans collaborate with agents, oversee critical decisions, and restrict data exposure to minimize risk. OpenClaw exemplifies a real-world step toward autonomous agents, reinforcing the need for well-defined boxes (workspaces) around agent activity.
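The sandbox idea above can be sketched as a simple tool allowlist: an agent's workspace grants a subset of tools, and anything outside that subset is refused. This is an illustrative sketch, not Box's actual API; all class and tool names are hypothetical.

```python
# Minimal sketch of a sandboxed agent workspace: the agent may only invoke
# tools its workspace explicitly grants. All names are illustrative.

class SandboxViolation(Exception):
    """Raised when an agent calls a tool outside its granted set."""

class AgentWorkspace:
    def __init__(self, allowed_tools, tools):
        self.allowed_tools = set(allowed_tools)  # tool names the agent may use
        self.tools = tools                       # name -> callable

    def invoke(self, tool_name, *args):
        # Enforce the sandbox boundary before dispatching the call.
        if tool_name not in self.allowed_tools:
            raise SandboxViolation(f"tool '{tool_name}' is outside this sandbox")
        return self.tools[tool_name](*args)

# A read-only workspace: search is granted, delete is not.
tools = {"search": lambda q: f"results for {q}",
         "delete": lambda path: f"deleted {path}"}
ws = AgentWorkspace(allowed_tools={"search"}, tools=tools)
```

The point of the sketch is that restriction lives in the workspace, not in the agent's prompt, so a misbehaving or prompt-injected agent still cannot reach ungranted tools.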

IDENTITY, ACCESS CONTROLS, AND DATA GOVERNANCE

A central theme is the need for agent identities and governance around data access. The human who creates an agent bears significant liability for its actions, yet agents must be restricted from seeing data outside their scope. Box envisions an identity layer that can coordinate between human and agent access, with policies to prevent prompt injection, data leakage, and overbroad permissions. The conversation touches on partial data access, sub-workspaces, and collaboration boundaries—recognizing that effective enterprise AI requires both oversight and flexibility.

CONTEXT ENGINEERING AND THE LIMITS OF MODELS

The chat dives into context engineering as a core problem: models lack perfect search and broad context windows, so the workflow must combine robust search, ranking, and contextual constraints. Token limits force engineers to design systems that extract the right context from millions of documents into a workable subset. The distinction between coding (where context is more controllable) and general knowledge work (where data is messy and diverse) highlights the need for disciplined data architecture and search strategies to make agents reliable in practice.
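The "extract the right context into a workable subset" step can be sketched as greedy packing: rank retrieved chunks by relevance and add them until the token budget is spent. This is a toy illustration of the idea, assuming precomputed relevance scores and token counts, not a description of Box's retrieval stack.

```python
# Greedy context packing: rank retrieved chunks by relevance score and keep
# as many as fit within the model's token budget. Names are illustrative.

def pack_context(chunks, budget_tokens):
    """chunks: list of (score, token_count, text). Returns selected texts."""
    selected, used = [], 0
    for score, tokens, text in sorted(chunks, key=lambda c: -c[0]):
        if used + tokens <= budget_tokens:  # skip chunks that would overflow
            selected.append(text)
            used += tokens
    return selected

docs = [(0.9, 400, "contract clause"), (0.7, 700, "old memo"), (0.4, 300, "faq")]
# pack_context(docs, 800) -> ["contract clause", "faq"]
```

Note the greedy choice skips the second-ranked chunk when it would blow the budget and falls through to a smaller one—a crude stand-in for the ranking-plus-constraint systems the conversation describes.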

READ-WRITE WORKFLOWS AND CONTENT CREATION CHALLENGES

A practical debate centers on read vs write tasks. Reading Box data to answer questions works differently from writing or generating documents, slide decks, or PDFs. Formatting quirks, font inconsistencies, and layout issues in generated content pose real UX challenges; agents can draft, but humans judge finish quality. Box envisions native read-write agents that operate within a sandbox workspace, writing outputs as artifacts inside Box while maintaining downstream governance and collaboration controls.

CANONICAL DATA AND THE CHALLENGE OF MISSING DOCUMENTS

The discussion highlights the problem of canonical data—like a complete, up-to-date list of all office addresses—that often doesn’t live in a single document. Agents must recognize gaps, avoid hallucinations, and understand when data is incomplete. This drives a push toward canonical sources or authoritative datasets within the enterprise, enabling agents to verify, cross-check, and prune results. The result is a more trustworthy agent that can consistently locate the right information without duplicating or fabricating it.
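The gap-recognition behavior described above can be sketched as a lookup that prefers a registered canonical source and, when none exists, reports the gap rather than synthesizing an answer. The registry and dataset names here are hypothetical.

```python
# Sketch: answer from a registered canonical source if one exists; otherwise
# surface the gap instead of guessing. Registry contents are illustrative.

CANONICAL = {"office_addresses": ["1 Main St, Redwood City"]}

def lookup(dataset):
    if dataset in CANONICAL:
        return {"status": "canonical", "data": CANONICAL[dataset]}
    # No authoritative source: signal incompleteness, never fabricate.
    return {"status": "gap", "data": None}
```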

EVALS, APEX, AND INDUSTRY PROGRESS

Box describes its agent eval program, including the Apex eval and internal benchmarks that test both model capabilities and harness robustness. The results show meaningful performance leaps across model families, underscoring the importance of rigorous evaluation to separate real progress from hype. Levie emphasizes that public benchmarks are less informative than private, industry-specific evals that mirror how customers actually use agents, reinforcing the need for ongoing observability and iteration.
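The shape of such an eval program can be sketched as a harness that runs one task set against several models and reports comparable pass rates. This is a minimal illustration of the pattern, not the Apex eval itself; the task and model stubs are invented.

```python
# Tiny eval harness: run the same task set through several model harnesses
# and report per-model pass rates. All tasks and models are stand-ins.

def run_evals(tasks, harnesses):
    """tasks: list of (input, expected). harnesses: name -> callable."""
    report = {}
    for name, run in harnesses.items():
        passed = sum(1 for x, expected in tasks if run(x) == expected)
        report[name] = passed / len(tasks)
    return report

tasks = [("2+2", "4"), ("capital of France", "Paris")]
harnesses = {
    "model_a": lambda x: {"2+2": "4"}.get(x, "?"),
    "model_b": lambda x: {"2+2": "4", "capital of France": "Paris"}.get(x, "?"),
}
# run_evals(tasks, harnesses) -> {"model_a": 0.5, "model_b": 1.0}
```

Holding the task set fixed while swapping harnesses is what makes "meaningful performance leaps across model families" measurable rather than anecdotal.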

ENSURING SAFETY: SECURITY INCIDENTS AND OVERSIGHT

A recurring concern is the potential for security incidents as agents access enterprise data. The interviewer prompts discussion about where liability lies and how to prevent agents from inadvertently leaking sensitive information. The consensus is that a robust identity and governance framework plus sandboxing are non-negotiable. Enterprises will demand policy-driven controls, audit trails, and fail-safes to ensure agents operate within safe, approved boundaries rather than roaming freely across data silos.

OPERATIONALIZATION: A STARTUP WITHIN A COMPANY

The agent initiative is described as an internal startup within Box, with a core team focused on the agent stack and governance. This center-of-excellence style approach coordinates with broader engineering, security, and product teams, ensuring the agent layer integrates with Box’s data platform. The group’s success hinges on cross-functional collaboration, with a dedicated set of people building the enabling infrastructure, data pipelines, and evaluation tools to scale agent usage company-wide.

WRITE CAPABILITIES AND CONTENT CREATION

While read tasks are foundational, the real value lies in write capabilities—agents creating and organizing content, supporting enterprise output, and building workspace artifacts. The transcript notes the current difficulty with highly polished content like PowerPoint, yet progress is steadily advancing. Box envisions native agents that handle end-to-end content creation within Box workspaces, leveraging model capabilities while ensuring outputs remain auditable and compliant with governance standards.

FUTURE OUTLOOK: PROFESSIONAL SERVICES, COMPETITION, AND ADOPTION

Levie foresees a multi-year journey for mainstream enterprise adoption, including dedicated professional services to help organizations reengineer workflows, data schemas, and governance to be agent-ready. The market will likely see a wave of agent-focused consultancies and toolchains as firms like Box connect with financial, legal, healthcare, and public sector customers. Competition will intensify as labs and vendors push to deliver better context, safety, and ROI, while enterprises curate evaluation pipelines to choose the right mix of tools.

Desk-level cheat sheet: practical takeaways from the agent-box discussion


Do This

Align your workflow to leverage agents (not expect agents to fully replace humans).
Invest in an enterprise data workspace with clear access controls and governance.
Develop explicit read/write processes for knowledge work to reduce data slop.

Avoid This

Don't assume infinite context windows solve all data retrieval problems.
Don't deploy autonomous agents without a strong oversight structure and accountability.

Common Questions

What does "every agent needs a box" mean? The core idea is that every agent needs a box: a shared, governed workspace that protects data while enabling autonomous work and collaboration with humans. This box acts as a sandboxed memory and data store for each agent's tasks.
