Key Moments
The Ultimate Guide to Prompting - with Sander Schulhoff from LearnPrompting.org
Key Moments
Expert guide to prompt engineering, covering techniques, taxonomy, optimization, and security risks.
Key Insights
Prompt engineering involves a taxonomy of techniques categorized by problem-solving strategy.
Few-shot prompting requires careful attention to exemplar quality, ordering, and formatting.
Chain-of-thought and decomposition techniques aid complex problem-solving by breaking down reasoning.
Ensembling and self-criticism offer methods to improve prompt output reliability.
Prompt injection and jailbreaking are critical security concerns, requiring distinct approaches.
Automatic prompt engineering tools like DSPy can significantly outperform human efforts.
Multimodal prompting (video, audio) presents new challenges and complexities.
Structured output prompting and robust evaluation are crucial for reliable AI applications.
FOUNDATIONS OF PROMPT ENGINEERING
Sander Schulhoff's journey into AI began with early exposure to Java and a fascination with deep learning, leading to research in reinforcement learning and NLP. His early work on diplomacy and the Minecraft reinforcement learning competition provided practical AI experience. A pivotal moment was using GPT-3 for translation, sparking his interest in prompting. This led to the creation of LearnPrompting.org, a comprehensive resource that democratized knowledge about prompting techniques before the widespread adoption of tools like ChatGPT.
RESEARCH AND TAXONOMY OF PROMPTING
Schulhoff's significant contributions include two major papers: 'Hack a Prompt' and 'The Prompt Report'. 'Hack a Prompt' analyzed 600,000 malicious prompts, leading to a best paper award at EMLP. 'The Prompt Report' systematically reviewed thousands of papers, creating a formal taxonomy of prompting techniques. This taxonomy categorizes methods based on problem-solving strategies, such as generating thought, ensembling, self-criticism, decomposition, and zero-shot/few-shot approaches, providing a structured understanding of the field.
ANALYSIS OF PROMPTING TECHNIQUES
The discussion delved into specific techniques, questioning the efficacy of role and emotion prompting on modern models for accuracy-based tasks, citing experiments where an 'idiot' prompt outperformed a 'genius' one. Few-shot prompting advice centered on exemplar quality, ordering, format, quantity, distribution, and similarity, highlighting that exemplar ordering can dramatically impact accuracy. The exploration also touched on chain-of-thought (CoT) and its variations, noting that while models are improving, explicit CoT prompts remain valuable for ensuring consistent reasoning, especially at scale.
ADVANCED PROMPTING STRATEGIES
Decomposition techniques aim to break down problems into smaller, manageable sub-problems, akin to thought generation but focused on solving distinct parts. Ensembling involves using multiple prompts or samplings to derive a final answer, with self-consistency being a related sampling method. Self-criticism involves a model reviewing and refining its own output. While advanced techniques like 'Tree of Thoughts' and 'Skeleton of Thought' exist, simpler decomposition prompts often suffice. The conversation also touched on the complexity and cost associated with complex prompting systems.
AUTOMATIC PROMPT ENGINEERING AND SECURITY
Automatic Prompt Engineering (APE) tools, particularly DSPy, have shown remarkable efficiency, surpassing human prompt engineering in certain tasks by automating optimization. However, APE typically requires ground truth labels, making open-ended generation tasks harder to optimize. The discussion then shifted to prompt security, differentiating prompt injection (developer and user input conflict) from jailbreaking (user input only). Schulhoff advocates for 'prompt hacking' as a catch-all term due to the conflation of these terms. A notable discovered attack was 'context overflow', exploiting model limitations by creating a lengthy prompt to force a specific output.
MULTIMODAL AND STRUCTURED OUTPUT ANALYSIS
The field is rapidly expanding into multimodal prompting, with video and audio generation presenting unique challenges. Prompting video models, in particular, is noted as being significantly more complex than text due to increased degrees of freedom. Structured output prompting, while exciting, may potentially impact model creativity and accuracy, though OpenAI's new features aim to mitigate this. Evaluating AI outputs, especially on linear scales, requires careful prompt design to assign meaning to scores and avoid model biases, underscoring the importance of robust evaluation systems in AI engineering.
THE FUTURE OF PROMPT ENGINEERING AND AI DEVELOPMENT
Schulhoff emphasizes that prompt engineering is a fundamental skill for everyone, not just a specialized role, although dedicated prompt engineers are valuable for AI companies. The rise of the 'AI Engineer' or 'Generative AI Architect' signifies a move beyond pure prompting towards more integrated coding and system design. Upcoming projects include Hack a Prompt 2.0, aiming to create a dataset of real-world harms (misinformation, harassment, agentic security risks) to improve model safety tuning and benchmarking, with ambitious goals for prizes and hacker participation.
Mentioned in This Episode
●Software & Apps
●Companies
●Organizations
●Books
●Studies Cited
●Concepts
●People Referenced
Key Prompt Engineering Design Advice
Practical takeaways from this episode
Do This
Avoid This
Common Questions
Prompt injection occurs when both developer and user inputs are present in a prompt, and the user input overrides the developer's instructions. Jailbreaking, conversely, happens when only user input is involved, without any developer instructions being present.
Topics
Mentioned in this video
An earlier large language model that Sander Schulhoff used for a translation task, marking his first introduction to prompting.
Mentioned as a specialized tool for prompt engineering.
Mentioned as an example of a powerful but costly model, with pricing information provided ($5 per million input tokens).
A benchmark used to test the efficacy of role prompting, where Sander's experiments showed the 'idiot prompt' outperformed the 'genius prompt'.
An AI music generation tool that Sander has used and found impressive, especially its voice output.
A text-to-video generation model that Sander has not yet had hands-on experience with but notes the difficulty in prompting for specific animations.
A website and resource created by Sander Schulhoff to consolidate information about prompting techniques, which gained significant popularity and continues to be supported.
The platform where the Prompt Report was initially published, receiving millions of views and significant engagement from the AI community.
Associated with the Evol Instruct paper and researchers who have made advancements in training LLMs.
A video generation model that Sander found difficult to prompt for precise animations, highlighting the complexity of multimodal prompting.
Sander's go-to tool for prompt engineering over the last couple of years.
Mentioned as a specialized tool for prompt engineering.
Researchers from Google are working on video models, an area of interest for future developments in multimodal AI.
A Python library for programmatically optimizing LLM calls, recommended by Sander for its efficiency and ease of use in prompt engineering, often outperforming manual efforts.
Mentioned as a specialized tool for prompt engineering.
A smaller, more cost-effective version of GPT-4 Turbo, priced at $0.15 per million input tokens, useful for tasks where multiple calls can be cheaper than one large call.
Used as a benchmark for prompt injection attacks in the Hack prompts competition and as a tool for generating paper drafts (though cautioned against for full paper generation).
A recent model release mentioned in conjunction with papers like Orca and Evol Instruct, relevant to training Chain of Thought into models.
An AI music generation tool that Sander has used and is impressed by, particularly its voice capabilities.
A framework mentioned in the context of structured output prompting.
The underlying architecture of modern LLMs like GPT-3, mentioned in the context of role prompting potentially working better on older models than current Transformers.
A prompting technique that involves making the model output its reasoning steps, which was known when LearnPrompting.org was created and is a significant area of research.
A significant paper related to training advanced reasoning capabilities into models, mentioned alongside Evol Instruct as relevant to LLM development.
A prompting technique discussed in relation to decomposition, where reasoning is viewed as taking actions.
An ensembling technique that involves sampling multiple responses from the same prompt with high temperature to improve output reliability.
An attack technique discovered during the Hack prompts competition where a very long prompt fills the context window, causing the model to overflow and potentially execute malicious instructions.
Vulnerabilities in prompts where user input can override developer instructions, a topic central to the Hack prompts competition and subsequent research.
A prompting technique discussed in relation to training models effectively, particularly from researchers associated with WizardLM.
A hybrid teaming company mentioned as investors and collaborators in cybersecurity challenges.
Mentioned as a specialized tool for prompt engineering.
A major AI research company that sponsored the Hack prompts competition and whose models are frequently discussed in the context of prompting techniques and security.
One of the companies whose researchers contributed to the Prompt Report, indicating their involvement in advanced AI research.
Contributed researchers to the Prompt Report, highlighting their significant role in AI development and research.
Released a paper on 'Diversity Empowered Intelligence' (DEI), an agent approach that performed well on SweetBench scores, viewed as a challenge to First Order Scale AI.
The company that first discovered prompt injection, they were a significant sponsor of the Hack prompts competition.
One of the universities whose researchers contributed to the Prompt Report, underscoring the collaborative nature of the research.
One of the universities with researchers contributing to the Prompt Report, showcasing broad academic involvement.
Mentioned in the context of misinformation surrounding the Prompt Report, where an unrelated blog post attributed the paper's authorship incorrectly.
Published a paper on using a billion personas (roles) to generate synthetic data for fine-tuning models.
Contributed researchers to both the Prompt Report and Hack prompts initiatives, highlighting a long-standing relationship.
His 'Six Thinking Hats' approach is used as an analogy for multi-agent systems and Chain of Thought prompting, illustrating problem-solving from different angles.
A pioneer in adversarial AI from OpenAI, mentioned in relation to the HackaPrompt competition.
A professor at Maryland who worked with Sander Schulhoff on research related to Diplomacy, initially an NLP project.
Mentioned in relation to discussions about Diplomacy and its interesting story concerning prompting in agents.
Author of 'The Prompt Support', creator of LearnPrompting.org, and lead researcher on the 'Prompt Report' and 'Hack prompts' papers.
A student of Professor Jordan Boyd Graber who worked with Sander Schulhoff on the Diplomacy project and is now a postdoc at Princeton.
Author of the 'Tree of Thought' paper, discussed in the context of decomposition techniques in prompting.
More from Latent Space
View all 172 summaries
86 minNVIDIA's AI Engineers: Brev, Dynamo and Agent Inference at Planetary Scale and "Speed of Light"
72 minCursor's Third Era: Cloud Agents — ft. Sam Whitmore, Jonas Nelle, Cursor
77 minWhy Every Agent Needs a Box — Aaron Levie, Box
42 min⚡️ Polsia: Solo Founder Tiny Team from 0 to 1m ARR in 1 month & the future of Self-Running Companies
Found this useful? Build your knowledge library
Get AI-powered summaries of any YouTube video, podcast, or article in seconds. Save them to your personal pods and access them anytime.
Try Summify free