What were the first AI agents built by Datadog?

Datadog built three core agents: Bits AI SRE, an automated SRE that debugs production problems; the Bits AI Dev agent, which writes and develops code; and a Security Analyst agent that investigates suspicious signals.

What is the 'bitter lesson' for AI agents?

Borrowing from Rich Sutton's essay "The Bitter Lesson," the lesson for agents is that general methods leveraging new off-the-shelf models will ultimately win. Constant hand-tuning of specific agents can be made obsolete by the next model release, so the lasting investment is giving capable models good tools and functions.

Why is it important for UX designers to consider agents?

As agents become first-class users of applications and tools, UX designers need to shift their focus beyond just human users. They should ensure that interfaces and APIs are accessible and intuitive for automated agents.
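One way to make a capability agent-accessible is to publish a machine-readable description of it alongside the code, so an agent can discover and call it without a UI. The sketch below is loosely modeled on the JSON-Schema-style tool descriptions used by function-calling APIs and MCP; it is not the MCP SDK, and `agent_tool`, `TOOLS`, `restart_service`, and `call_tool` are hypothetical names chosen for illustration.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Hypothetical registry: tool name -> (function, machine-readable spec).
TOOLS: dict[str, tuple[Callable[..., Any], dict[str, Any]]] = {}

# Map Python annotations to JSON-Schema-style type names (unknown types -> "string").
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def agent_tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn with a spec an agent can read, not just a human-facing UI."""
    hints = get_type_hints(fn)
    params = {
        name: {"type": _JSON_TYPES.get(hints.get(name), "string")}
        for name in inspect.signature(fn).parameters
    }
    spec = {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": params, "required": list(params)},
    }
    TOOLS[fn.__name__] = (fn, spec)
    return fn


@agent_tool
def restart_service(service: str, dry_run: bool) -> str:
    """Restart a service; with dry_run=True, only report what would happen."""
    return f"{'would restart' if dry_run else 'restarted'} {service}"


def call_tool(name: str, args: dict[str, Any]) -> Any:
    """What an agent runtime does after picking a tool from the published specs."""
    fn, _ = TOOLS[name]
    return fn(**args)
```

For example, `call_tool("restart_service", {"service": "checkout", "dry_run": True})` returns `"would restart checkout"`. Deriving the spec from type hints keeps the human-facing function and the agent-facing description from drifting apart.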

What does 'proactive over reactive' mean for AI agents?

It means agents should run in the background, triggered by events, rather than solely relying on direct human interaction or chat commands. They should operate autonomously like employees carrying out tasks without constant instruction.

How can companies ensure their AI agents are durable and reliable?

Durable-execution platforms such as Temporal, which Datadog uses, can manage failures, timeouts, and retries so that long-running agents do not lose progress. It's also crucial to sandbox agents appropriately to prevent unintended consequences.
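The core behaviors a durability layer adds around each agent step are a per-attempt timeout and retries with backoff. The following is a minimal hand-rolled sketch of those two behaviors using only `asyncio`; it is not Temporal's API, and unlike a real durable-execution engine it does not persist progress across process crashes. `run_durably` and its parameters are hypothetical.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_durably(
    step: Callable[[], Awaitable[T]],
    *,
    attempts: int = 4,
    timeout_s: float = 30.0,
    base_delay_s: float = 0.5,
) -> T:
    """Run one agent step with a per-attempt timeout and exponential backoff.

    Only timeouts and connection errors are treated as transient; any other
    exception propagates immediately. Progress is NOT persisted (sketch only).
    """
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(step(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == attempts:
                raise  # out of retries: surface the failure to the caller
            # Exponential backoff (base, 2x base, 4x base, ...) plus random jitter.
            delay = base_delay_s * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("attempts must be at least 1")
```

A flaky tool call that fails twice with `ConnectionError` and then succeeds returns normally on the third attempt; a step that hangs past `timeout_s` on every attempt ends in a timeout error that the caller can escalate to a human.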

What are the key considerations for evaluating AI agents?

Evaluation should be a continuous process: start with offline testing on representative datasets, then monitor live interactions online through observability data. It's crucial to keep the evaluation data "living" by feeding real-world usage back into training and evaluation.

What is the future of AI agents in terms of interaction and UI?

The future includes multimodal interactions (voice, vision), generative UIs that adapt on the fly, and more sophisticated human-agent and agent-agent collaboration. Agents will act more autonomously and communicate more fluently.


AI Dev 26 x SF | Diamond Bishop: The Next 100 Agents. Building the Agent Native Office

DeepLearning.AI

Education | 5 min read | 27 min video

May 22, 2026


TL;DR

Scaling AI agents from one to hundreds requires companies to prioritize agent-native interfaces and robust evaluation frameworks, not just advanced models, to avoid production failures and ensure long-term viability.

Key Insights

By 2026, Datadog observed a significant shift from demo agents to production-ready agents saving real time and effort across companies.

Companies should adopt an 'agent-native' interface mandate, modeled on the 'Bezos API mandate,' so that every team's functionality is accessible to agents, not just to human users.

A key mistake early in agent development was the lack of strong evaluation frameworks, leading to prolonged debugging and difficulty assessing true improvement.

Framework adoption for agent development has doubled in the past year as companies focus on productionizing agents, with options like the OpenAI Agents SDK, LangGraph, and Pydantic AI becoming popular.

The future of enterprise AI agents will heavily feature 'learning on the job' through reinforcement learning and human observation, necessitating robust data logging for continuous improvement.

Future agent capabilities will include longer task horizons (12 hours or more), advanced multimodal interaction (computer vision, voice), and generative UI for on-the-fly customization.

The shift from single agents to 'agent offices' necessitates new infrastructure and strategies.

The journey of building AI agents is evolving from creating one or two for specific tasks to deploying hundreds across an organization. Diamond Bishop from Datadog highlights this transition, emphasizing that scaling AI agents to power a 'next-gen enterprise' requires moving beyond individual agent development to building platforms that support diverse agent workloads safely and efficiently. This evolution is fueled by advancements in AI models that are becoming more powerful and accessible. However, the real challenge lies not in the intelligence of the models themselves, but in the infrastructure and methodologies required to manage and deploy these agents at scale. As more companies move from impressive demos to production agents that deliver tangible time and effort savings, the focus shifts to building 'agent offices' capable of handling this complexity.

Early agent development focused on core functionalities like SRE and code generation.

Datadog's initial foray into AI agents focused on automating critical functions previously handled by human teams. This included an automated AI SRE agent (Bits AI SRE) designed to debug problems automatically, a need driven by increasingly complex codebases, especially those partly generated by AI. Complementing it, the Bits AI Dev agent was built to write and develop code that fixes identified errors. A third key agent, the Security Analyst agent, investigates suspicious signals and automates initial responses to potential security issues, mirroring the investigative process of human analysts. These early agents demonstrated that AI can take on complex, time-consuming tasks in IT operations, development, and security.

Empowering agent-native interfaces and proactive operations is crucial for adoption.

A significant lesson learned is the need for an 'agent-native' approach to user experience (UX). UX design has traditionally focused on human users, but in an agent-driven future, agents themselves become first-class users of applications and APIs. Bishop advocates an agent equivalent of the 'Bezos API mandate': every team's functionality should be accessible through agent-friendly interfaces, whether MCP servers, APIs, or skills. That means not merely supporting non-browser interactions but actively designing for them.

Agents should also be proactive rather than reactive. Instead of waiting for commands, they should run in the background, triggered by events, much as human employees operate within a business. Chat interfaces are useful but should not be the primary mode of interaction; event-driven triggers are more efficient for background agents. Running in the background in turn requires durable infrastructure, such as Temporal, to handle failures and keep agents operating continuously.
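The proactive pattern can be pictured as agents subscribing to events rather than waiting for a chat message. Below is a minimal in-process sketch, assuming hypothetical event names (`monitor.alert`) and a placeholder `sre_agent`; a production system would put a durable queue or stream between producers and agents instead of direct function calls.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    kind: str  # hypothetical event type, e.g. "monitor.alert" or "security.signal"
    payload: dict = field(default_factory=dict)


class EventBus:
    """Minimal in-process pub/sub; a real deployment would use a durable queue."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Event], str]]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Callable[[Event], str]) -> None:
        self._handlers[kind].append(handler)

    def publish(self, event: Event) -> list[str]:
        # Every agent subscribed to this event kind runs; no human prompt is involved.
        return [handler(event) for handler in self._handlers[event.kind]]


def sre_agent(event: Event) -> str:
    # Placeholder for a real investigation loop (query logs, metrics, traces, ...).
    return f"investigating {event.payload['service']}: {event.payload['alert']}"


bus = EventBus()
bus.subscribe("monitor.alert", sre_agent)
```

Publishing `Event("monitor.alert", {"service": "checkout", "alert": "p99 latency > 2s"})` starts the SRE agent immediately, while events with no subscribers are simply ignored.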

Robust and continuous evaluation is critical to agent effectiveness.

One of the biggest mistakes Bishop cites from early agent development was the lack of a strong evaluation framework. Without rigorous evaluation, it is hard to tell whether an agent is actually improving, or whether added tools and tweaks help at all. He stresses a multi-stage evaluation process:

1. **Offline evals:** representative, measurable, and rerunnable datasets that test baseline performance.
2. **Online data:** observability data, clicks, and user interactions that show how the agent performs in the wild.
3. **Living evals:** real-world data continuously fed back into the offline datasets to account for drift and evolving usage patterns.

This continuous feedback loop is essential for keeping agents effective over time. Agents designed to evaluate other agents can even help automate it, which raises the classic "who watches the watchmen" question.
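Stages 1 and 3 of this process can be sketched as two small functions: a rerunnable offline scorer and a refresh step that folds reviewed production samples back into the dataset. This is a minimal sketch under stated assumptions: exact-match scoring stands in for whatever metric a real team would use, stage 2 (collecting and labeling online data) is assumed to have already produced `production_samples`, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


def run_offline_eval(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Stage 1: a measurable, rerunnable score on a fixed dataset (exact match here)."""
    if not cases:
        return 0.0
    passed = sum(agent(c.prompt).strip() == c.expected for c in cases)
    return passed / len(cases)


def refresh_living_dataset(
    cases: list[EvalCase], production_samples: list[EvalCase], max_size: int = 1000
) -> list[EvalCase]:
    """Stage 3: append unseen, already-labeled production samples to the offline set.

    When the set exceeds max_size, the oldest cases are dropped so the evals
    follow drift in how the agent is actually used.
    """
    seen = {c.prompt for c in cases}
    merged = list(cases)
    for sample in production_samples:
        if sample.prompt not in seen:
            seen.add(sample.prompt)
            merged.append(sample)
    return merged[max(0, len(merged) - max_size):]
```

Rerunning `run_offline_eval` after each refresh shows whether a new tool, prompt, or model actually moved the score, which is exactly the question that goes unanswered without an eval framework.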

Embracing framework and model agnosticism accelerates adaptation.

Given the rapid pace of model development, companies should stay framework- and model-agnostic. The 'bitter lesson' suggests that general methods leveraging new off-the-shelf models will prevail, so agents should be built around flexible tools and functions, ready to swap in better underlying models as they become available. Frameworks like the OpenAI Agents SDK, LangGraph, and Pydantic AI provide useful building blocks, but organizations should avoid locking into any single one. Using multiple models is also key, because different models excel at different tasks; teams should be able to test and switch models per use case without significant re-engineering. Maintaining memory and context across model updates is crucial for retaining learned improvements and customer insights.
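A common way to stay model-agnostic is to have agents depend on a narrow interface rather than a vendor SDK, and to keep memory in the agent rather than in any one model. The sketch below assumes a hypothetical one-method `ChatModel` protocol and uses a stand-in `EchoModel` in place of real provider clients; the model names in `MODELS` are placeholders, not real products.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Hypothetical stand-in for a provider client; any class with .complete() fits."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class Agent:
    model: ChatModel
    # Memory belongs to the agent, not the model, so it survives a model swap.
    memory: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        context = "\n".join(self.memory[-5:])  # last few notes as context
        answer = self.model.complete(f"{context}\n{task}".strip())
        self.memory.append(f"{task} -> {answer}")
        return answer


# Per-use-case routing: switching models means editing config, not agent code.
MODELS: dict[str, ChatModel] = {
    "triage": EchoModel("small-fast-model"),
    "code": EchoModel("large-code-model"),
}
```

Reassigning `agent.model = MODELS["code"]` mid-session changes which model answers while the agent's accumulated memory is still passed along as context.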

Multiplayer capabilities and agent-to-agent communication are the next frontier.

The concept of 'multiplayer' is expanding beyond human-to-human collaboration to include agent-to-agent and human-agent collaboration. This involves not just shared repositories but transparency into agent skills, tools, and actions, fostering learning and remixing of agent capabilities. A 'tools hub' or 'skills hub' can facilitate this. Human-agent collaboration goes beyond simple human-in-the-loop feedback; it includes agents sharing their actions and explanations with humans, and humans demonstrating tasks to agents, potentially leading to new RPA-like paradigms. Secure agent-to-agent communication is also vital, often managed within an enclave or cluster with restricted network access to prevent unauthorized interactions and ensure safety.
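Restricting agent-to-agent traffic inside an enclave can be reduced to a deny-by-default allowlist of sender and recipient pairs, plus a log that gives humans transparency into what agents said to each other. This is a minimal sketch; `Enclave`, `Message`, and the agent names are hypothetical, and real isolation would also be enforced at the network layer rather than only in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    body: str


class Enclave:
    """Hypothetical broker: agents may only message each other over allowlisted edges."""

    def __init__(self, allowed: set[tuple[str, str]]) -> None:
        self._allowed = allowed
        self.audit_log: list[Message] = []  # delivered messages, reviewable by humans

    def send(self, msg: Message) -> bool:
        if (msg.sender, msg.recipient) not in self._allowed:
            return False  # deny by default: unknown edges are dropped
        self.audit_log.append(msg)
        return True


enclave = Enclave({("sre-agent", "dev-agent"), ("security-agent", "sre-agent")})
```

Edges are directional here, so allowing the SRE agent to hand work to the dev agent does not automatically let the dev agent message the SRE agent back.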

Future predictions include learning on the job, synthetic environments, and multimodal interaction.

Looking ahead, expect a surge in agents that 'learn on the job' through reinforcement learning and human observation, which requires companies to start logging data now for continuous feedback. Synthetic environments will enable product-specific world modeling, letting agents interact with virtual versions of services and simulated users. Durable agents capable of long-horizon tasks (12 hours or more) will become more common, though token costs will need to be managed. Authorization for agents acting on behalf of users (e.g., Auth0) is a critical yet underdeveloped area. Multimodal capabilities, including direct computer use of applications and real-time voice communication, are on the horizon and promise higher-bandwidth interaction. Finally, generative UI will enable dynamic, on-the-fly creation of custom interfaces for dashboards and services.
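The authorization problem for agents acting on behalf of users is often framed as delegation with downscoping: the agent receives only the permissions the user both holds and explicitly delegates, for a limited time. The sketch below illustrates that principle in plain Python; it is not OAuth or Auth0's API, and `DelegatedGrant`, `delegate`, `authorize`, and the scope strings are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DelegatedGrant:
    user: str
    agent: str
    scopes: frozenset[str]
    expires_at: float  # Unix timestamp


def delegate(user: str, user_scopes: set[str], agent: str, requested: set[str],
             ttl_s: float = 900) -> DelegatedGrant:
    """Give the agent only scopes the user holds AND explicitly delegated."""
    return DelegatedGrant(user, agent, frozenset(user_scopes & requested), time.time() + ttl_s)


def authorize(grant: DelegatedGrant, scope: str, now: Optional[float] = None) -> bool:
    """Allow an action only if the grant is unexpired and includes the scope."""
    now = time.time() if now is None else now
    return now < grant.expires_at and scope in grant.scopes
```

With this design an agent never ends up with the same permissions as the human it works for by default, which is the failure mode the takeaways warn against.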


Building and Scaling AI Agents: Key Principles

Practical takeaways from this episode

Do This

Prioritize an agent-first UX: ensure agents are first-class users of your systems and APIs.

Implement agent-friendly interfaces (APIs, MCP servers) for all functionality.

Be proactive: run agents in the background, triggered by events, not just chat.

Ensure agent durability with tools that handle failures and timeouts.

Start with a robust eval strategy: offline, online, and continuous living evals.

Leverage frameworks and stay model-agnostic to adapt to evolving AI models.

Encourage multiplayer collaboration: allow visibility into agent skills, shared tools, and remixing.

Explore human-agent teaming and agent-to-agent communication.

Build custom, code-first agents; infrastructure should handle the execution.

Prepare for learning agents (RL) and synthetic environments for product simulation.

Consider durable agents for long-running tasks and robust authorization for agent actions.

Develop multimodal capabilities (voice, vision) and generative UIs for agents.

Avoid This

Don't assume UX designers only need to focus on human users; consider agents as clients.

Don't rely solely on chat as the primary modality for triggering agents.

Don't let agents run without proper sandboxing or oversight.

Don't neglect evaluation; avoid 'vibe coding' without clear metrics.

Don't reinvent the wheel: utilize existing frameworks rather than building from scratch.

Don't be overly prescriptive with frameworks and models at a company level; allow team flexibility.

Don't neglect memory and context to maintain learning across different models.

Don't assume agents can have the same permissions as humans without proper authorization handling.

Don't assume current agent capabilities are the final state; prepare for rapid evolution.

Common Questions

Datadog focuses on observability and helping companies scale their AI agents. It is developing AI agents for its own products (like SRE, Dev, and Security) and also aims to help other companies build and manage their own custom AI agent fleets effectively.

Topics

AI & Machine Learning, Technology & Innovation, Programming & Software, Future of AI, Multi-Agent Systems, AI Agent Development, Agent Infrastructure, Evaluating AI, Human-Agent Collaboration

Mentioned in this video

Software & Apps

Cortana

An AI assistant developed by Microsoft, previously used with Windows Phone.

Tinker

A fine-tuning API from Thinking Machines Lab, mentioned in the context of reinforcement learning for the enterprise.

Bits AI SRE

An AI agent developed by Datadog that acts as an automated SRE to debug problems.

Dispatch Agents.ai

An early experiment by the speaker to provide tools for agents to communicate with each other.

Bits AI Dev

An AI agent developed by Datadog focused on writing and developing code based on identified errors.

Gas Town

A project by Steve Yegge that the speaker cautioned against using in production.

Security Analyst Agent

An AI agent developed by Datadog that investigates suspicious security signals.

Pydantic AI

A framework mentioned as a good option for agent development.

Slack

A communication platform where the speaker interacts with coworkers, compared to agent communication.

Companies

Microsoft

A technology company where the speaker previously worked on Cortana.

OpenAI

Mentioned as a provider of agent frameworks.

Datadog

The speaker's company, specializing in observability for SaaS applications.

Anthropic

Mentioned as a provider of agent frameworks and models.

Thinking Labs

A company mentioned alongside Tinker in the context of enterprise RL.

Temporal

A durable-execution platform used by Datadog to make its agents resilient to failures and timeouts.

Figma

Mentioned as a past trend of 'Figma for X' pitches, compared to current agent collaboration trends.

Thinking Machines

The company behind Tinker, mentioned in the context of enterprise RL.

Apple

Mentioned in relation to a classic approach to OAuth and agent permissions.

People

Yann LeCun

Mentioned as someone who might disagree with predictions of rapidly accelerating AI progress.

Steve Yegge

Mentioned in relation to Gas Town, with a cautionary note about its production readiness.

Concepts

Bezos API Mandate

Jeff Bezos's early-2000s mandate that every Amazon team expose its data and functionality through service interfaces, now being paralleled for agent-friendly interfaces.
