Why Social Engineering Now Works on Machines
Key Moments
Agents reshape security: lethal trifecta, social engineering, and dev-first testing.
Key Insights
Security must move left: test and secure agents during development, not after production.
The lethal trifecta: untrusted input, sensitive data, and outbound exfil are the core risk.
Automated red-teaming via conversational prompts enables scalable detection of weaknesses.
Jailbreaks and jailbreak-like prompts require safer prompts and data controls; no silver bullet.
Deployment varies: MCP adoption is evolving; CI/CD and IDE integration are essential.
THE RISE OF AGENTS AND SECURITY AS A FIRST-CLASS CONCERN
Agents are the next frontier in enterprise AI security. The conversation frames 2026 as the year of the agent, with large corporations moving from internal chatbots to agents that can operate across Salesforce and other critical systems. A repeating theme is that security has historically been treated as an afterthought amid rapid deployment. Prompu, which began as an open-source testing tool, targets developers—providing quick feedback, CI/CD integration, and IDE support so that secure agents can be built and tested before production.
THE LETHAL TRIFECTA: UNTRUSTED INPUT, SENSITIVE DATA, OUTBOUND EXFIL
The lethal trifecta is a mental model for agent risk: untrusted input, access to sensitive data, and an outbound channel to exfiltrate results. If an agent can take in outside data, access PII, and communicate externally, it becomes inherently insecure. The rule of two variants—often attributed to Meta or Simon Wilson— emphasizes that any two of these conditions are not enough to guarantee safety; you must design controls to constrain or separate the three. Real-world data leakage often comes via subtle paths: web scraping, document uploads, or UI rendering that leaks data through images or markdown.
SOCIAL ENGINEERING FOR MACHINES: CONVERSATIONAL ATTACKS AND AUTOMATED RED TEAMS
A core insight is that testing and attacking AI agents now resemble social engineering more than traditional vulnerability checks. Prompu uses adversarial objectives that unfold as conversations with the model, simulating a red team at scale—thousands of dialogues that probe guardrails, data access, and policy boundaries. Rather than signatures, the attacks are contextual and dynamic, tailored to business context and user roles. This scale-first approach makes it possible to reveal relentlessly hidden weaknesses quickly, critiquing both data handling and decision paths the agent might take in production.
JAILBREAKS, INJECTIONS, AND THE ETERNAL QUEST FOR SAFE PROMPTS
Jailbreaks and injections are the familiar faces of the space, but the danger is evolving beyond signatures. A jailbreak is not a single trick; it’s a creative prompt that reshapes how an agent views permission and context. Prompu researchers track a wide range of jailbreaks—from emotionally framed prompts to quirky dialects—that can peel back guardrails. The takeaway is not a silver bullet, but a reminder: safety must be baked into the model’s reasoning and the surrounding data controls, not tacked on after an incident.
ARCHITECTURE, TOOLS, AND DEPLOYMENT PATTERNS IN AGENT SECURITY
On the deployment side, there’s a spectrum from grassroots use on local machines to enterprise-grade frameworks. Many teams build agentic systems without embracing a formal MCP (model control plane) early, while others prototype in MCPs before production. Prompu’s approach centers on developer ergonomics—CLI, CI/CD integration, code analysis, and even IDE prompts—so testing becomes a natural part of code reviews. This evolution mirrors security patterns from other technologies: shift left, automate, and integrate governance into the everyday tooling developers actually use.
BACKGROUND, LESSONS, AND A VISION FOR THE FUTURE
Ian Webster’s path from building on the Discord platform to founding Prompu shapes the current narrative: agents will redefine how organizations test, secure, and deploy AI. He describes 2026 as a pivotal year for agent adoption and security maturation, with enterprise customers planning to wire agents into critical systems. The message is pragmatic: embrace safety early, implement automated adversarial testing, and leverage open-source tools to speed learning. For teams watching the space, Prompu offers practical entry points and a reality check on how fast this field is moving.
Mentioned in This Episode
●Software & Apps
●Tools
●People Referenced
Agent Security Quick Start: Do's & Don'ts
Practical takeaways from this episode
Do This
Avoid This
Common Questions
An agent is an LLM that is allowed to take actions and interact with external systems. It can be wired to APIs, data sources, and other tools, enabling it to operate in real-world scenarios. The discussion centers on testing and securing these agents before they reach production.
Topics
Mentioned in this video
Founder and CEO of Prompt Fu; expert in agent security testing
Open-source tool for evaluating and testing GenAI security; developer-friendly CLI; integrates into CI/CD
Person who coined the term 'lethal trifecta' in agent security discussions
Framework used for building agent implementations
Framework used for building agent implementations
Early system-administrator security testing tool
Co-creator of the Satan tool; security researcher
Co-creator associated with Satan; security researcher
More from a16z Deep Dives
View all 38 summaries
72 minAI Copilots Are a Dead End. Here's What Actually Works | Kavak CEO
63 minTemporal CEO on AI Agents & The Future of Software | Deep Dives with a16z
47 minBraintrust CEO on Where Engineering Actually Matters in AI
20 minHow Palantir Scaled: Why the Best Software Is Built Backwards
Found this useful? Build your knowledge library
Get AI-powered summaries of any YouTube video, podcast, or article in seconds. Save them to your personal pods and access them anytime.
Try Summify free