Why Social Engineering Now Works on Machines

a16za16z
Gaming3 min read29 min video
Dec 2, 2025|354 views|13|1
Save to Pod

Key Moments

TL;DR

Agents reshape security: lethal trifecta, social engineering, and dev-first testing.

Key Insights

1

Security must move left: test and secure agents during development, not after production.

2

The lethal trifecta: untrusted input, sensitive data, and outbound exfil are the core risk.

3

Automated red-teaming via conversational prompts enables scalable detection of weaknesses.

4

Jailbreaks and jailbreak-like prompts require safer prompts and data controls; no silver bullet.

5

Deployment varies: MCP adoption is evolving; CI/CD and IDE integration are essential.

THE RISE OF AGENTS AND SECURITY AS A FIRST-CLASS CONCERN

Agents are the next frontier in enterprise AI security. The conversation frames 2026 as the year of the agent, with large corporations moving from internal chatbots to agents that can operate across Salesforce and other critical systems. A repeating theme is that security has historically been treated as an afterthought amid rapid deployment. Prompu, which began as an open-source testing tool, targets developers—providing quick feedback, CI/CD integration, and IDE support so that secure agents can be built and tested before production.

THE LETHAL TRIFECTA: UNTRUSTED INPUT, SENSITIVE DATA, OUTBOUND EXFIL

The lethal trifecta is a mental model for agent risk: untrusted input, access to sensitive data, and an outbound channel to exfiltrate results. If an agent can take in outside data, access PII, and communicate externally, it becomes inherently insecure. The rule of two variants—often attributed to Meta or Simon Wilson— emphasizes that any two of these conditions are not enough to guarantee safety; you must design controls to constrain or separate the three. Real-world data leakage often comes via subtle paths: web scraping, document uploads, or UI rendering that leaks data through images or markdown.

SOCIAL ENGINEERING FOR MACHINES: CONVERSATIONAL ATTACKS AND AUTOMATED RED TEAMS

A core insight is that testing and attacking AI agents now resemble social engineering more than traditional vulnerability checks. Prompu uses adversarial objectives that unfold as conversations with the model, simulating a red team at scale—thousands of dialogues that probe guardrails, data access, and policy boundaries. Rather than signatures, the attacks are contextual and dynamic, tailored to business context and user roles. This scale-first approach makes it possible to reveal relentlessly hidden weaknesses quickly, critiquing both data handling and decision paths the agent might take in production.

JAILBREAKS, INJECTIONS, AND THE ETERNAL QUEST FOR SAFE PROMPTS

Jailbreaks and injections are the familiar faces of the space, but the danger is evolving beyond signatures. A jailbreak is not a single trick; it’s a creative prompt that reshapes how an agent views permission and context. Prompu researchers track a wide range of jailbreaks—from emotionally framed prompts to quirky dialects—that can peel back guardrails. The takeaway is not a silver bullet, but a reminder: safety must be baked into the model’s reasoning and the surrounding data controls, not tacked on after an incident.

ARCHITECTURE, TOOLS, AND DEPLOYMENT PATTERNS IN AGENT SECURITY

On the deployment side, there’s a spectrum from grassroots use on local machines to enterprise-grade frameworks. Many teams build agentic systems without embracing a formal MCP (model control plane) early, while others prototype in MCPs before production. Prompu’s approach centers on developer ergonomics—CLI, CI/CD integration, code analysis, and even IDE prompts—so testing becomes a natural part of code reviews. This evolution mirrors security patterns from other technologies: shift left, automate, and integrate governance into the everyday tooling developers actually use.

BACKGROUND, LESSONS, AND A VISION FOR THE FUTURE

Ian Webster’s path from building on the Discord platform to founding Prompu shapes the current narrative: agents will redefine how organizations test, secure, and deploy AI. He describes 2026 as a pivotal year for agent adoption and security maturation, with enterprise customers planning to wire agents into critical systems. The message is pragmatic: embrace safety early, implement automated adversarial testing, and leverage open-source tools to speed learning. For teams watching the space, Prompu offers practical entry points and a reality check on how fast this field is moving.

Agent Security Quick Start: Do's & Don'ts

Practical takeaways from this episode

Do This

Integrate agent security checks into CI/CD early in the development cycle.
Use automated red-teaming with NL-based prompts to uncover data leakage and access-control issues.
Limit exfiltration channels and carefully protect PII and sensitive data.
Embed security feedback into PRs and IDE workflows to normalize secure agent development.

Avoid This

Don't treat security as an afterthought; avoid shipping agents without testing.
Don't rely solely on jailbreak techniques; consider broader vectors like data leakage and access control.

Common Questions

An agent is an LLM that is allowed to take actions and interact with external systems. It can be wired to APIs, data sources, and other tools, enabling it to operate in real-world scenarios. The discussion centers on testing and securing these agents before they reach production.

Topics

Mentioned in this video

More from a16z Deep Dives

View all 38 summaries

Found this useful? Build your knowledge library

Get AI-powered summaries of any YouTube video, podcast, or article in seconds. Save them to your personal pods and access them anytime.

Try Summify free