State-Of-The-Art Prompting For AI Agents

Y Combinator · Science & Technology · May 30, 2025 · 4 min read (32 min video)

Key Moments

TL;DR

Prompt engineering is crucial for AI agents: success hinges on detailed, structured instructions, metaprompting, and the founder's deep understanding of users.

Key Insights

1. Prompt engineering has evolved from a workaround to a critical component of effective AI agent interaction.

2. Detailed, structured prompts, including role assignment, task breakdown, and output formatting, are essential for AI agent performance.

3. Metaprompting, where prompts dynamically generate or improve themselves, is a powerful technique for enhancing AI capabilities.

4. The 'forward-deployed engineer' model, emphasizing deep user understanding and rapid software iteration, is key for founders in AI.

5. Evals (evaluations) are considered the true intellectual property, providing the context and data necessary for prompt improvement and AI success.

6. Different LLMs exhibit distinct 'personalities' and require varying levels of guidance, which affects how prompts should be constructed.

THE EVOLVING ROLE OF PROMPT ENGINEERING

Prompt engineering, initially seen as a temporary fix for interacting with large language models (LLMs), has become a fundamental aspect of AI development. It's likened to early-stage software development in 1995, where tools were nascent, and to managing human personnel, requiring clear communication for optimal decision-making. This evolution highlights the growing sophistication and necessity of precisely guiding AI systems.

DECONSTRUCTING ADVANCED PROMPTS

Effective prompts for AI agents are highly detailed and structured. They typically begin with defining the LLM's role, followed by a clear breakdown of the task, and a high-level plan with step-by-step instructions. Crucial elements include specifying desired output formats for integration with other systems and providing 'important things to keep in mind' to prevent deviation. The use of markdown-like formatting and XML-style tags further aids LLM comprehension and adherence to instructions, resembling programming more than natural language writing.
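The structure described above can be sketched as a small prompt builder. This is a minimal illustration, not a template from the talk: the role, task, plan, and tag names are invented for the example.

```python
# Hypothetical sketch of a structured agent prompt: role, task breakdown,
# high-level plan, output format, and reminders, assembled with
# markdown-style headers and XML-style tags for easier LLM parsing.

def build_agent_prompt(task: str, steps: list[str]) -> str:
    plan = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return f"""# Role
You are a customer-support agent for an online retailer.

# Task
{task}

# Plan
{plan}

# Output format
Respond with XML so a downstream parser can read it:
<response>
  <answer>...</answer>
  <confidence>high | medium | low</confidence>
</response>

# Important things to keep in mind
- Never invent order details; ask for the order ID if it is missing.
- Stay within the steps of the plan above."""

prompt = build_agent_prompt(
    "Resolve the customer's refund request.",
    ["Identify the order", "Check refund eligibility", "Draft a reply"],
)
```

The point is that the result reads more like a program than natural prose: fixed sections, explicit structure, and a machine-readable output contract.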

THE POWER OF METAPROMPTING AND DYNAMIC GENERATION

Metaprompting, where an AI can generate or refine its own prompts, is a significant advancement. Techniques like 'prompt folding' allow a prompt to dynamically create specialized versions based on previous queries or user input. This self-improvement mechanism enables prompts to become more robust and tailored, especially when dealing with complex tasks or when manual prompt writing becomes inefficient, effectively turning prompt creation into an iterative, AI-assisted process.
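One way to picture metaprompting and 'prompt folding' is below. This is a hedged sketch: `call_llm` is a placeholder for whatever model client you use, and the wording of the metaprompt is invented for illustration.

```python
# Sketch of metaprompting: wrap an existing prompt, plus a failing input,
# in a request for an expert-prompt-engineer rewrite.

def make_metaprompt(current_prompt: str, failure_example: str) -> str:
    return f"""You are an expert prompt engineer.

Below is a prompt and an input on which it produced a bad result.
Rewrite the prompt so it handles cases like this, keeping its structure.

<prompt>
{current_prompt}
</prompt>

<failing_input>
{failure_example}
</failing_input>

Return only the improved prompt."""

def fold_prompt(current_prompt: str, failure_example: str, call_llm) -> str:
    # 'Prompt folding': the model's output becomes the next prompt version,
    # so prompt creation turns into an iterative, AI-assisted loop.
    return call_llm(make_metaprompt(current_prompt, failure_example))
```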

THE 'FORWARD-DEPLOYED ENGINEER' PARADIGM

The concept of a 'forward-deployed engineer' (FDE), originating from Palantir, is highly relevant for AI startup founders. FDEs embed themselves with users, deeply understanding their workflows and challenges in order to rapidly build and iterate on software solutions. This hands-on, empathetic approach, focusing on user needs and delivering tangible value quickly through demos, allows founders to outmaneuver larger, more established companies and to build the deep domain expertise that becomes their defensible moat.

THE CRITICAL IMPORTANCE OF EVALUATIONS (EVALS)

While prompts are important, evaluations (evals) are identified as the true crown jewels of AI companies. Evals provide the necessary context and objective measures to understand why a prompt is written a certain way and how to improve it. They are derived from deep, qualitative understanding of user needs and reward functions, often requiring founders to be intimately familiar with niche domains. This deep understanding, codified into evals, is what truly differentiates successful AI products and creates lasting competitive advantage.

NAVIGATING LLM PERSONALITIES AND RELIABILITY

Different LLMs exhibit distinct 'personalities,' impacting how they respond to prompts and rubrics. Some models are more rigid and strictly adhere to instructions, while others are more flexible and can reason through exceptions, similar to human employees. A critical aspect of ensuring reliability is building 'escape hatches' into prompts, instructing the AI to ask for clarification rather than hallucinate when information is insufficient. This requires careful prompt design to manage AI behavior and prevent undesirable outputs.
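An 'escape hatch' can be as simple as a clause appended to every prompt, plus a check for it in the application. The clause wording and JSON shape below are illustrative assumptions, not taken from the talk.

```python
import json

# Sketch of an 'escape hatch': the model is told to ask for clarification
# instead of guessing, and the app detects that response to route it
# (e.g. back to the user or to a human).

ESCAPE_HATCH = """
If you do not have enough information to answer, do NOT guess.
Instead respond with exactly:
{"status": "need_clarification", "question": "<what you need to know>"}
"""

def with_escape_hatch(prompt: str) -> str:
    return prompt.rstrip() + "\n" + ESCAPE_HATCH

def needs_clarification(reply: str) -> bool:
    """Detect the escape-hatch response so the caller can handle it."""
    try:
        data = json.loads(reply)
    except ValueError:
        return False
    return isinstance(data, dict) and data.get("status") == "need_clarification"
```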

STRUCTURING AI INTERACTIONS WITH SYSTEM, DEVELOPER, AND USER PROMPTS

An emerging architectural pattern involves structuring prompts into three layers: system, developer, and user prompts. The system prompt defines the core API and company operations, while the developer prompt adds specific customer context and nuances. The user prompt, if applicable, captures direct end-user requests. This layered approach helps in creating scalable, general-purpose AI products while allowing for necessary customization without turning into a bespoke consulting service for each client.
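The three-layer pattern maps naturally onto a chat-style message list. The role names follow the common system/developer/user convention; the contents are invented for illustration.

```python
# Sketch of the layered pattern: one company-wide system prompt, a
# per-customer developer prompt, and the end user's request.

SYSTEM_PROMPT = "You are the scheduling agent for Acme Inc."  # core product behavior

def build_messages(developer_context: str, user_request: str) -> list[dict]:
    return [
        {"role": "system", "content": SYSTEM_PROMPT},        # shared across all customers
        {"role": "developer", "content": developer_context},  # customer-specific nuances
        {"role": "user", "content": user_request},            # direct end-user request
    ]

msgs = build_messages(
    "This clinic closes at 5pm and never books Fridays.",
    "Can I get an appointment Friday afternoon?",
)
```

Only the developer layer changes per customer, which is what keeps the product general-purpose rather than bespoke.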

THE ROLE OF CONTINUOUS IMPROVEMENT AND DEBUGGING

Continuous improvement, akin to the Kaizen principle, is vital in prompt engineering. Users can leverage LLMs themselves to critique and refine existing prompts. This involves feeding a prompt into the LLM and asking it to suggest improvements or identify weaknesses. Furthermore, detailed debugging information, such as thinking traces and error reports within the LLM's output, is invaluable for developers to pinpoint issues and iteratively enhance prompt effectiveness, mirroring software development's test-driven approach.
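Both loops above can be sketched in a few lines: a critique request for an existing prompt, and a response format that reserves a debug field. The field name and wording are assumptions for illustration.

```python
# Sketch 1: ask an LLM to critique and improve an existing prompt.
def critique_request(prompt: str) -> str:
    return f"""Review the prompt below as an expert prompt engineer.
List its weaknesses (ambiguity, missing edge cases, unclear output
format), then propose a revised version.

<prompt>
{prompt}
</prompt>"""

# Sketch 2: a response format with a reserved debug field, so the model
# reports what confused it, mirroring software's test-driven feedback loop.
RESPONSE_FORMAT = """Respond as JSON:
{
  "answer": "...",
  "debug_info": "anything underspecified or confusing about this task"
}"""
```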

OPTIMIZING FOR PRODUCTION AND SCALABILITY

For production environments, especially where latency is critical, a common strategy involves using larger, more capable models for prompt refinement and then distilling those refined prompts into smaller, faster models. This process ensures high performance and acceptable response times, crucial for user experience in applications like voice AI. Additionally, using LLMs to automatically extract and ingest examples from customer data can streamline the process of customizing prompts for various clients.
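The tiering strategy can be sketched as two model slots: a large model for offline prompt refinement, a small one for live traffic. Model names and the `call_llm` client are placeholders, not real identifiers.

```python
# Sketch of distillation-style tiering: refine prompts offline with a
# capable (slow) model, serve with a fast (cheap) one.

REFINEMENT_MODEL = "big-model-v1"   # slow, capable: iterates on prompts offline
SERVING_MODEL = "small-model-v1"    # fast, cheap: runs the refined prompt live

def refine_offline(prompt: str, call_llm) -> str:
    """One refinement pass with the large model (placeholder client)."""
    return call_llm(REFINEMENT_MODEL, f"Improve this prompt:\n{prompt}")

def serve(refined_prompt: str, user_input: str, call_llm) -> str:
    """Answer live traffic with the small model and the refined prompt."""
    return call_llm(SERVING_MODEL, f"{refined_prompt}\n\nInput: {user_input}")
```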

THE CHALLENGE OF GENERALIZATION VS. SPECIALIZATION

A significant challenge for vertical AI agents is balancing flexibility for special-purpose logic with the need to avoid becoming a consulting firm. The concept of 'forking and merging' prompts across customers addresses this by defining which parts of a prompt are company-wide standards versus customer-specific. This allows for scalable solutions that can adapt to diverse customer workflows and preferences without requiring entirely new prompt development for each unique client.
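'Forking and merging' can be modeled as a shared base prompt with per-customer overrides, merged section by section. The section names and contents below are invented for the example.

```python
# Sketch of fork-and-merge: company-wide sections are the base; each
# customer overrides only the sections that differ for them.

BASE_SECTIONS = {
    "role": "You are a support agent for our product.",
    "policy": "Follow the standard refund policy.",
    "format": "Reply in plain text.",
}

def merge_prompt(base: dict, customer_overrides: dict) -> str:
    sections = {**base, **customer_overrides}  # customer-specific sections win
    return "\n\n".join(f"## {name}\n{text}" for name, text in sections.items())

# A customer forks only the policy section; role and format stay shared.
acme = merge_prompt(BASE_SECTIONS, {"policy": "Acme allows 90-day refunds."})
```

Changes to the base propagate to every customer, while each fork stays small, which is what keeps this scalable rather than consulting-shaped.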

Prompt Engineering Best Practices

Practical takeaways from this episode

Do This

Define clear roles for the LLM.
Break down tasks and outline high-level plans.
Use markdown-style formatting for structure.
Provide examples to help LLMs reason about complex tasks.
Consider using XML tags for better LLM parsing.
Implement an escape hatch for the LLM if information is insufficient.
Use 'debug info' in the response format for developer feedback.
For metaprompting, give the LLM the role of an expert prompt engineer.
Leverage LLM thinking traces (like Gemini's) for debugging prompts.
Focus on user needs and codify them into specific evals.
Founders should act as forward-deployed engineers, embedding with users.
Use well-refined prompts from larger models to improve smaller, faster models (distillation).
Provide LLMs with clear rubrics for scoring, but acknowledge exceptions.
Understand the distinct personalities and steerability of different models (e.g., Claude vs. Llama).

Avoid This

Don't allow LLMs to hallucinate when information is lacking; instruct them to ask for clarification.
Don't rely solely on sales tactics; demonstrate working software quickly.
Don't treat prompts as the sole 'crown jewel'; evals are crucial for understanding and improvement.
Avoid becoming a consulting company by building generic yet flexible prompts.
Don't assume LLMs will strictly adhere to rubrics; some offer more flexibility (e.g., Gemini 2.5 Pro).

Common Questions

What is metaprompting, and why is it powerful?

Metaprompting involves using an AI model to refine or generate prompts. It's powerful because it can dynamically create better versions of prompts, handle task complexity with examples, and improve output quality, much like a skilled prompt engineer.

