AI Dev 25 x NYC | Scott Yak: Building MCP Servers That Make Agents More Effective
Key Moments
MCP servers centralize agent tools, improving effectiveness and simplifying evaluation.
Key Insights
Consolidating agent tools into an MCP server simplifies development and enhances agent capabilities.
MCP servers act as a product, providing direct value to customers and third-party agents.
Agent-agnostic and tool-agnostic evaluation strategies are crucial for MCP servers due to their diverse user base.
Automated generation of evaluation scenarios significantly reduces development time and effort.
Integrating evaluations into the development cycle, visualized through LLM observability, makes the process more efficient and enjoyable.
MCP servers enable a self-optimization loop where evaluations inform code improvements, leading to better tool performance.
THE STRATEGIC ADVANTAGE OF MCP SERVERS
The core message emphasizes that consolidating agent tools into a Managed Connectable Platform (MCP) server can transform the evaluation process, making it a source of joy rather than a pain. Scott Yak from Datadog explains that Datadog, an observability platform, uses these servers to empower agents, moving beyond mere data visualization to actionable insights. By centralizing tools, MCP servers reduce duplicated effort for agent teams, streamline tool usage, and allow for remote accessibility, effectively turning tools into a product that directly benefits customers and third-party agents like Cursor and Cloud Code.
ARCHITECTING THE AGENTIC WORKFLOW
The typical agent workflow involves multiple steps, starting with the agent managing its context window using system prompts and tool descriptions obtained from the MCP server. User requests are added, and an LLM decides the next action, potentially calling an MCP server tool. The MCP server processes this request, interacts with back-end services, performs business logic like filtering or post-processing, and returns a response to the agent. This response is added to the agent's context, allowing for iterative loops until the task is complete, ultimately providing a result to the user. This structured interaction ensures agents can leverage backend capabilities effectively.
EVALUATION PHILOSOPHY FOR MCP SERVERS
MCP server developers face a unique evaluation challenge because they control little about the agents using their services. Therefore, Datadog adopts an agent-agnostic and tool-agnostic evaluation philosophy. This approach focuses on the final outcome rather than specific tool calls or agent behaviors. By not optimizing for any particular agent, the MCP server remains flexible and ergonomic for all users, including simpler agents. This strategy allows for the use of faster, cheaper evaluation methods, making the evaluation process itself more efficient and less burdensome.
THE POWER OF AUTOMATED EVALUATION SCENARIO GENERATION
Manually creating comprehensive evaluation scenarios for MCP servers is a daunting task, potentially requiring thousands of scenarios to cover all functionalities and edge cases across different products like logs, metrics, and traces. The video highlights a more efficient method: generating these scenarios automatically. This involves taking a natural language question, converting it into a structured query language (like a CQ query), and then transforming that back into a natural language question for which the answer is known. This process, aided by coding agents and product documentation, can yield hundreds of labeled eval scenarios from a single documentation page, vastly accelerating the evaluation setup.
VISUALIZING AND OPTIMIZING WITH OBSERVABILITY
Once evaluation scenarios are generated and run, visualizing the results is key. Datadog uses LLM observability to display evaluation outcomes on a dashboard, showing which tool calls succeeded or failed. This data is crucial for debugging and improvement. By analyzing failure patterns, developers can identify areas needing optimization, whether it's in prompts, tool descriptions, or deeper backend logic. The ability to group these evaluations by capability further highlights areas of weakness, guiding development efforts towards enhancing specific functionalities.
THE SELF-OPTIMIZATION LOOP AND BENEFITS
The integration of MCP servers, automated evaluations, and LLM observability creates a powerful self-optimization loop. Developers can analyze evaluation failures, use coding agents to suggest and implement code improvements in tool descriptions or backend logic, and then re-run evaluations to confirm the fix. This iterative process, often taking only a few minutes from code change to evaluation result, makes development cyclical and enjoyable. For agent builders, this means fewer tool call failures to manage, and for the MCP server team, it provides motivation for continuous improvement, ultimately leading to a better experience for all users.
Mentioned in This Episode
●Software & Apps
●Tools
●Companies
●Concepts
Common Questions
An MCP server consolidates an agent's tools into a single, remote server. This simplifies tool management, reduces duplication of effort, and allows tools to serve multiple agents, turning them into a product that offers a better user experience.
Topics
Mentioned in this video
Referred to as 'LM' and 'LLM', these models are used by agents to decide on actions, including making tool calls. They are also utilized in the process of generating eval scenarios and analyzing results.
A DataDog product used for instrumenting and visualizing evaluation results. It helps in analyzing failure patterns, identifying areas for improvement, and can be accessed through the MCP server.
Short for evaluations, used to assess agent performance and identify failure modes like hallucination and output formatting issues. The speaker aims to make evals a source of joy rather than a pain by using MCP servers.
A feature available in the server that allows passing a specific timestamp to simulate a point in time, making eval scenarios unambiguous and repeatable regardless of when they are run.
A tool within the DataDog MCP server that allows agents to search through logs. It's used as an example in demonstrating the agent-MCP server interaction and in creating eval scenarios.
More from DeepLearningAI
View all 65 summaries
1 minThe #1 Skill Employers Want in 2026
1 minThe truth about tech layoffs and AI..
2 minBuild and Train an LLM with JAX
1 minWhat should you learn next? #AI #deeplearning
Found this useful? Build your knowledge library
Get AI-powered summaries of any YouTube video, podcast, or article in seconds. Save them to your personal pods and access them anytime.
Try Summify free