AI Dev 25 x NYC | Scott Yak: Building MCP Servers That Make Agents More Effective

DeepLearning.AIDeepLearning.AI
Education3 min read28 min video
Dec 5, 2025|718 views|9|1
Save to Pod

Key Moments

TL;DR

MCP servers centralize agent tools, improving effectiveness and simplifying evaluation.

Key Insights

1

Consolidating agent tools into an MCP server simplifies development and enhances agent capabilities.

2

MCP servers act as a product, providing direct value to customers and third-party agents.

3

Agent-agnostic and tool-agnostic evaluation strategies are crucial for MCP servers due to their diverse user base.

4

Automated generation of evaluation scenarios significantly reduces development time and effort.

5

Integrating evaluations into the development cycle, visualized through LLM observability, makes the process more efficient and enjoyable.

6

MCP servers enable a self-optimization loop where evaluations inform code improvements, leading to better tool performance.

THE STRATEGIC ADVANTAGE OF MCP SERVERS

The core message emphasizes that consolidating agent tools into a Managed Connectable Platform (MCP) server can transform the evaluation process, making it a source of joy rather than a pain. Scott Yak from Datadog explains that Datadog, an observability platform, uses these servers to empower agents, moving beyond mere data visualization to actionable insights. By centralizing tools, MCP servers reduce duplicated effort for agent teams, streamline tool usage, and allow for remote accessibility, effectively turning tools into a product that directly benefits customers and third-party agents like Cursor and Cloud Code.

ARCHITECTING THE AGENTIC WORKFLOW

The typical agent workflow involves multiple steps, starting with the agent managing its context window using system prompts and tool descriptions obtained from the MCP server. User requests are added, and an LLM decides the next action, potentially calling an MCP server tool. The MCP server processes this request, interacts with back-end services, performs business logic like filtering or post-processing, and returns a response to the agent. This response is added to the agent's context, allowing for iterative loops until the task is complete, ultimately providing a result to the user. This structured interaction ensures agents can leverage backend capabilities effectively.

EVALUATION PHILOSOPHY FOR MCP SERVERS

MCP server developers face a unique evaluation challenge because they control little about the agents using their services. Therefore, Datadog adopts an agent-agnostic and tool-agnostic evaluation philosophy. This approach focuses on the final outcome rather than specific tool calls or agent behaviors. By not optimizing for any particular agent, the MCP server remains flexible and ergonomic for all users, including simpler agents. This strategy allows for the use of faster, cheaper evaluation methods, making the evaluation process itself more efficient and less burdensome.

THE POWER OF AUTOMATED EVALUATION SCENARIO GENERATION

Manually creating comprehensive evaluation scenarios for MCP servers is a daunting task, potentially requiring thousands of scenarios to cover all functionalities and edge cases across different products like logs, metrics, and traces. The video highlights a more efficient method: generating these scenarios automatically. This involves taking a natural language question, converting it into a structured query language (like a CQ query), and then transforming that back into a natural language question for which the answer is known. This process, aided by coding agents and product documentation, can yield hundreds of labeled eval scenarios from a single documentation page, vastly accelerating the evaluation setup.

VISUALIZING AND OPTIMIZING WITH OBSERVABILITY

Once evaluation scenarios are generated and run, visualizing the results is key. Datadog uses LLM observability to display evaluation outcomes on a dashboard, showing which tool calls succeeded or failed. This data is crucial for debugging and improvement. By analyzing failure patterns, developers can identify areas needing optimization, whether it's in prompts, tool descriptions, or deeper backend logic. The ability to group these evaluations by capability further highlights areas of weakness, guiding development efforts towards enhancing specific functionalities.

THE SELF-OPTIMIZATION LOOP AND BENEFITS

The integration of MCP servers, automated evaluations, and LLM observability creates a powerful self-optimization loop. Developers can analyze evaluation failures, use coding agents to suggest and implement code improvements in tool descriptions or backend logic, and then re-run evaluations to confirm the fix. This iterative process, often taking only a few minutes from code change to evaluation result, makes development cyclical and enjoyable. For agent builders, this means fewer tool call failures to manage, and for the MCP server team, it provides motivation for continuous improvement, ultimately leading to a better experience for all users.

Common Questions

An MCP server consolidates an agent's tools into a single, remote server. This simplifies tool management, reduces duplication of effort, and allows tools to serve multiple agents, turning them into a product that offers a better user experience.

Topics

Mentioned in this video

More from DeepLearningAI

View all 65 summaries

Found this useful? Build your knowledge library

Get AI-powered summaries of any YouTube video, podcast, or article in seconds. Save them to your personal pods and access them anytime.

Try Summify free