Why is security often the last consideration in AI projects?

Historically, platforms are developed quickly and security assessments happen late in the process. The talk emphasizes that this is not ideal and argues for building secure systems from the start by integrating testing into development workflows and IDEs.

What is the 'lethal trifecta' in agent security?

The lethal trifecta describes three risk factors: untrusted input, access to sensitive information, and an outbound communication channel for exfiltration. If an agent can access all three, it is fundamentally insecure, so teams should design to disable or constrain at least one of the three.

What is Prompu and how is it used?

Prompu is an open-source tool used to evaluate and security-test GenAI agents. It emphasizes developer-focused integration (CLI, CI/CD, PR feedback) to help catch security issues early in the development cycle.

What makes jailbreaks in AI models interesting to researchers?

Jailbreaks are varied and creative, ranging from playful prompts to social-engineering styles, and researchers track how informal inputs may bypass guardrails. The discussion highlights that jailbreaks evolve and that informal prompts can lower defenses.

How does Prompu scale red-teaming testing?

Prompu automates red-teaming by running thousands of conversations to test for data leaks, access controls, and other security issues, vastly increasing breadth and speed compared to manual testing.

What is the background of Ian and how did Prompu come about?

Ian previously led the developer platform at Discord and shifted to AI-focused security work as AI grew popular. He describes the early days of agent testing, the open-source origins of Prompu, and how the product evolved to address enterprise security concerns.

Key Moments

Why Social Engineering Now Works on Machines

a16z

Gaming3 min read29 min video

Dec 2, 2025|369 views|13|1

Save to Pod

Key Moments

TL;DR

Agents reshape security: lethal trifecta, social engineering, and dev-first testing.

Key Insights

Security must move left: test and secure agents during development, not after production.

The lethal trifecta: untrusted input, sensitive data, and outbound exfil are the core risk.

Automated red-teaming via conversational prompts enables scalable detection of weaknesses.

Jailbreaks and jailbreak-like prompts require safer prompts and data controls; no silver bullet.

Deployment varies: MCP adoption is evolving; CI/CD and IDE integration are essential.

THE RISE OF AGENTS AND SECURITY AS A FIRST-CLASS CONCERN

Agents are the next frontier in enterprise AI security. The conversation frames 2026 as the year of the agent, with large corporations moving from internal chatbots to agents that can operate across Salesforce and other critical systems. A repeating theme is that security has historically been treated as an afterthought amid rapid deployment. Prompu, which began as an open-source testing tool, targets developers—providing quick feedback, CI/CD integration, and IDE support so that secure agents can be built and tested before production.

THE LETHAL TRIFECTA: UNTRUSTED INPUT, SENSITIVE DATA, OUTBOUND EXFIL

The lethal trifecta is a mental model for agent risk: untrusted input, access to sensitive data, and an outbound channel to exfiltrate results. If an agent can take in outside data, access PII, and communicate externally, it becomes inherently insecure. The rule of two variants—often attributed to Meta or Simon Wilson— emphasizes that any two of these conditions are not enough to guarantee safety; you must design controls to constrain or separate the three. Real-world data leakage often comes via subtle paths: web scraping, document uploads, or UI rendering that leaks data through images or markdown.

SOCIAL ENGINEERING FOR MACHINES: CONVERSATIONAL ATTACKS AND AUTOMATED RED TEAMS

A core insight is that testing and attacking AI agents now resemble social engineering more than traditional vulnerability checks. Prompu uses adversarial objectives that unfold as conversations with the model, simulating a red team at scale—thousands of dialogues that probe guardrails, data access, and policy boundaries. Rather than signatures, the attacks are contextual and dynamic, tailored to business context and user roles. This scale-first approach makes it possible to reveal relentlessly hidden weaknesses quickly, critiquing both data handling and decision paths the agent might take in production.

JAILBREAKS, INJECTIONS, AND THE ETERNAL QUEST FOR SAFE PROMPTS

Jailbreaks and injections are the familiar faces of the space, but the danger is evolving beyond signatures. A jailbreak is not a single trick; it’s a creative prompt that reshapes how an agent views permission and context. Prompu researchers track a wide range of jailbreaks—from emotionally framed prompts to quirky dialects—that can peel back guardrails. The takeaway is not a silver bullet, but a reminder: safety must be baked into the model’s reasoning and the surrounding data controls, not tacked on after an incident.

ARCHITECTURE, TOOLS, AND DEPLOYMENT PATTERNS IN AGENT SECURITY

On the deployment side, there’s a spectrum from grassroots use on local machines to enterprise-grade frameworks. Many teams build agentic systems without embracing a formal MCP (model control plane) early, while others prototype in MCPs before production. Prompu’s approach centers on developer ergonomics—CLI, CI/CD integration, code analysis, and even IDE prompts—so testing becomes a natural part of code reviews. This evolution mirrors security patterns from other technologies: shift left, automate, and integrate governance into the everyday tooling developers actually use.

BACKGROUND, LESSONS, AND A VISION FOR THE FUTURE

Ian Webster’s path from building on the Discord platform to founding Prompu shapes the current narrative: agents will redefine how organizations test, secure, and deploy AI. He describes 2026 as a pivotal year for agent adoption and security maturation, with enterprise customers planning to wire agents into critical systems. The message is pragmatic: embrace safety early, implement automated adversarial testing, and leverage open-source tools to speed learning. For teams watching the space, Prompu offers practical entry points and a reality check on how fast this field is moving.

Mentioned in This Episode

●Software & Apps

●Tools

●People Referenced

Agent Security Quick Start: Do's & Don'ts

Practical takeaways from this episode

Do This

Integrate agent security checks into CI/CD early in the development cycle.

Use automated red-teaming with NL-based prompts to uncover data leakage and access-control issues.

Limit exfiltration channels and carefully protect PII and sensitive data.

Embed security feedback into PRs and IDE workflows to normalize secure agent development.

Avoid This

Don't treat security as an afterthought; avoid shipping agents without testing.

Don't rely solely on jailbreak techniques; consider broader vectors like data leakage and access control.

Common Questions

An agent is an LLM that is allowed to take actions and interact with external systems. It can be wired to APIs, data sources, and other tools, enabling it to operate in real-world scenarios. The discussion centers on testing and securing these agents before they reach production.