AI Dev 25 | Panel Discussion: Building AI Applications in 2025

DeepLearning.AI
3 min read | 32 min video
Mar 27, 2025 | 2,226 views

Key Moments

TL;DR

AI experts discuss building agentic systems in 2025, focusing on infrastructure, reliability, and evaluation.

Key Insights

1. Agentic systems are the focus for 2025 and beyond, driving demand for advanced infrastructure.

2. Reliability and accurate evaluation of AI systems remain critical challenges, especially with LLM hallucinations.

3. The shift is from model releases to product and agent releases, emphasizing practical applications.

4. Open-source tools and community education are crucial for democratizing AI development.

5. Domain expertise combined with AI scaling capabilities will be increasingly valuable.

6. Benchmarking needs to evolve from component evaluation to system-level assessment.

THE ASCENDANCE OF AGENTS AND INFRASTRUCTURE NEEDS

The panel emphasized that 2025 is the year of 'agents,' driving a significant shift from AI research to practical application building. This trend necessitates a robust and scalable infrastructure, moving beyond basic physical hardware to sophisticated software layers, platforms, and orchestration. The increasing complexity of AI workloads, particularly inference driven by agentic systems, requires advanced solutions that can ensure reliability and efficiency, fundamentally changing how developers will interact with and build applications. The infrastructure must now support code generation and other agentic tasks effectively.

ADVANCING AGENT CAPABILITIES

Agent capabilities are evolving in three key areas: enhanced tool use for environment interaction, enabling complex problem-solving over extended periods (hours to days), and the integration of reinforcement learning for self-improvement. The potential for agents to operate and learn autonomously, drawing from vast diverse experiences akin to pre-training, promises significantly stronger AI systems. This evolution is likened to self-driving car autonomy levels, with current agent technology at a foundational stage but rapidly progressing towards greater independence and reliability over time.

THE ROLE OF OPEN SOURCE AND CREATING AGENT SYSTEMS

Platforms like Hugging Face are democratizing AI development by providing accessible tools, datasets, and educational resources for building agentic systems. The release of simple agent libraries and popular courses signifies a community-wide demand to learn and utilize these tools. This trend reflects a broader industry shift from focusing solely on model releases to prioritizing product and agent releases, making AI more accessible and practical for a wider audience. Startups are increasingly focused on building useful applications powered by LLMs rather than the LLMs themselves.

ADDRESSING RELIABILITY AND EVALUATION CHALLENGES

A major concern in building AI applications, particularly agentic ones, is the inherent unreliability of underlying LLMs and their tendency to hallucinate. Mitigating these issues requires a dual approach: ongoing research to improve LLM robustness, and tooling and infrastructure around agents to manage inevitable errors. This includes systems that allow for reproducibility, formal verification where possible, and strategies to minimize the impact of mistakes, such as quick rollbacks and automated debugging. Leading with evaluation and clearly defining metrics are essential for making rational deployment decisions.
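The rollback strategy described above can be sketched in a few lines. This is a minimal illustration, not anything the panel demonstrated: `run_with_rollback`, the toy `flaky_action`, and the `validate` check are all hypothetical names, standing in for an agent's working state, a step it proposes, and an evaluation gate applied before any change is committed.

```python
import copy

def run_with_rollback(state, action, validate, max_retries=2):
    """Apply an agent action to a copy of the state; commit the result only
    if it passes validation, otherwise roll back and retry."""
    for attempt in range(max_retries + 1):
        candidate = copy.deepcopy(state)   # snapshot, so a bad step cannot corrupt state
        candidate = action(candidate, attempt)
        if validate(candidate):            # lead with evaluation before committing
            return candidate, attempt
    return state, None                     # every attempt failed: keep the old state

# Toy action that only succeeds on the second attempt.
def flaky_action(s, attempt):
    s["result"] = "ok" if attempt >= 1 else "corrupted"
    return s

final, used_attempt = run_with_rollback(
    {"result": None}, flaky_action, lambda s: s["result"] == "ok"
)
```

Because each attempt works on a snapshot, a failed step leaves the original state untouched, which is the "minimize the impact of mistakes" property the panel emphasized.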

THE EVOLUTION OF BENCHMARKING AND EVALUATION

The conversation highlighted the limitations of traditional benchmarks, which often focus on isolated components rather than holistic systems. The consensus is that evaluation must shift towards benchmarking agentic systems and end-to-end applications, reflecting real-world use cases. While all benchmarks are imperfect surrogates, they provide valuable information when interpreted correctly. The development of new, ecologically valid benchmarks, especially for complex tasks like code generation, is crucial for understanding AI system failures and driving progress in the field.
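A system-level benchmark of the kind described above can be reduced to a simple harness: run whole tasks end to end and score the final outcome, rather than grading components in isolation. The sketch below is illustrative only; `evaluate_system`, the toy tasks, and the `check` function are assumptions, with `str.upper` standing in for a full agent pipeline.

```python
def evaluate_system(agent, tasks, check):
    """Score an agent on end-to-end tasks, keeping failures for error analysis."""
    failures = []
    passed = 0
    for task in tasks:
        output = agent(task["input"])          # run the whole pipeline, not one component
        if check(output, task["expected"]):    # judge the final outcome as a user would
            passed += 1
        else:
            failures.append(task["input"])     # retained to study why the system failed
    return {"pass_rate": passed / len(tasks), "failures": failures}

# Toy "agent": uppercases its input; the third task is designed to fail.
toy_tasks = [
    {"input": "hello", "expected": "HELLO"},
    {"input": "world", "expected": "WORLD"},
    {"input": "mixed", "expected": "Mixed"},
]
report = evaluate_system(str.upper, toy_tasks, lambda out, exp: out == exp)
```

Even an imperfect surrogate like this pass rate is informative when interpreted correctly, and the retained failure cases are exactly what is needed to understand where the system breaks.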

STRATEGIES FOR NAVIGATING THE RAPIDLY EVOLVING AI LANDSCAPE

To keep pace with the dynamic AI field, developers are advised to focus on continuous education and building practical experience rather than succumbing to hype. Investing in understanding core problems, data, and fundamental concepts is key. The panel also suggested looking towards future trends like robotics and AI for science, encouraging open-mindedness and a willingness to learn new frameworks. Building publicly and sharing lessons learned is also vital for gaining intuition about what works and where the field is headed.

THE INCREASING VALUE OF DOMAIN EXPERTISE AND COMMUNITY

Despite the rapid advancements in AI, deep domain expertise is becoming even more valuable. As AI enables scaling capabilities, specialized knowledge in fields like coding, medicine, or legal services allows professionals to build more impactful applications. The panel stressed the importance of translating this expertise into AI-driven solutions. Building and engaging within AI communities is essential for mutual support, learning, and collective progress in navigating the complex and exciting future of AI development.

Common Questions

What is the primary trend in AI development for 2025?

The primary trend is the move from focusing on model releases to agent and product releases. Agentic systems, capable of more autonomous actions and complex problem-solving, are becoming central to AI development.

