AI Interfaces Of The Future | Design Review

Key Moments
AI interfaces are evolving beyond chat, featuring voice, agents, adaptive UIs, video generation, and visual workflows.
Key Insights
AI interfaces are shifting from static nouns (buttons, forms) to dynamic verbs (workflows, automation, suggestions).
Real-time feedback, latency visualization, and multimodal cues are crucial for natural voice AI interactions.
AI agents can autonomously perform tasks, requiring visual workflow tools like canvases for user oversight and control.
Prompt-based interfaces are improving with suggested prompts, multi-modal input, and iterative refinement capabilities.
Adaptive UIs dynamically change based on content, offering context-aware actions and shortcuts.
AI video generation balances fidelity and immediacy, using blurred previews and iterative generation to keep users engaged.
SHIFTING FROM NOUNS TO VERBS IN SOFTWARE DESIGN
Traditional software interfaces are built around static elements like text, forms, and buttons, which represent 'nouns.' However, the advent of AI introduces a new paradigm focused on dynamic actions and workflows, often referred to as 'verbs.' These include tasks like auto-completion, data gathering, and process automation. The challenge lies in developing intuitive ways to represent and interact with these 'verbs' visually on a screen, as current tools are not yet fully equipped to 'draw' these dynamic actions.
ENHANCING VOICE AI WITH MULTIMODAL FEEDBACK AND LATENCY INSIGHTS
The review of Vappy highlights the importance of multimodal cues in voice AI. Providing visual feedback when the microphone is active and when the AI is responding helps users understand the system's status, especially if audio is partially obscured. Displaying latency in milliseconds offers transparency and builds user intuition about conversational naturalness. The ability to handle interruptions and maintain a human-like conversational flow is also critical for adoption.
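The feedback patterns described above can be sketched as two small pure functions: one mapping the conversation state to a visual cue, and one labeling latency so users build intuition about conversational naturalness. This is a minimal TypeScript sketch; the state names, cue descriptions, and millisecond thresholds are illustrative assumptions, not taken from any specific product.

```typescript
// Hypothetical sketch: surfacing a voice agent's status and latency.
// State names and thresholds are assumptions for illustration.

type VoiceState = "idle" | "listening" | "thinking" | "speaking";

// Map each state to a visual cue so the user always knows who "has the floor",
// even if the audio itself is partially obscured.
function visualCue(state: VoiceState): string {
  switch (state) {
    case "idle": return "dim microphone icon";
    case "listening": return "pulsing microphone ring";
    case "thinking": return "animated ellipsis";
    case "speaking": return "waveform animation";
  }
}

// Human turn-taking gaps are short; labeling the measured latency in
// milliseconds makes the "naturalness" of a response transparent.
function latencyLabel(ms: number): string {
  if (ms <= 500) return `${ms} ms - natural`;
  if (ms <= 1200) return `${ms} ms - noticeable`;
  return `${ms} ms - awkward pause`;
}
```

Keeping these as pure functions of state and measured time means the same logic can drive an icon, a debug overlay, or a transcript annotation.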
AI AGENTS AND VISUAL WORKFLOWS FOR AUTONOMOUS TASKS
AI agents offer autonomous capabilities to interact with websites and perform complex tasks. Tools like GumLoop utilize a canvas-based interface, resembling a modern flowchart, to visualize these multi-step processes. This visual representation, with color-coded nodes for different actions, allows users to understand, control, and monitor the agent's execution, especially for non-linear decision trees, making complex automation more manageable.
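A canvas of color-coded nodes is, underneath, just a directed graph that supports branching. The sketch below models that idea in TypeScript: node kinds, a color legend, and a `trace` function that walks one path through a decision tree so a user could watch the agent's execution order. The node kinds, colors, and function names are assumptions for illustration, not GumLoop's actual API.

```typescript
// Hypothetical sketch of a canvas-style agent workflow: color-coded nodes
// connected in a graph, supporting non-linear decision trees.

type NodeKind = "input" | "scrape" | "llm" | "condition" | "output";

interface FlowNode {
  id: string;
  kind: NodeKind;
  next: string[]; // outgoing edges; condition nodes may have several
}

// Color-coding by action type, as on a modern flowchart canvas.
const color: Record<NodeKind, string> = {
  input: "blue", scrape: "orange", llm: "purple",
  condition: "yellow", output: "green",
};

// Walk the graph from `start`, taking branch index `pick` at each condition,
// and return the ordered node ids: a trace the user can monitor.
function trace(nodes: Map<string, FlowNode>, start: string, pick = 0): string[] {
  const path: string[] = [];
  let cur: FlowNode | undefined = nodes.get(start);
  while (cur) {
    path.push(cur.id);
    const nextId = cur.kind === "condition" ? cur.next[pick] : cur.next[0];
    cur = nextId ? nodes.get(nextId) : undefined;
  }
  return path;
}
```

Because the trace is explicit, oversight features like pausing at a node or replaying a branch fall out naturally.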
IMPROVING PROMPT-BASED INTERFACES WITH INTERACTIVITY AND FEEDBACK
Platforms like AnswerGrid and Polyat demonstrate improvements in prompt-based interfaces. AnswerGrid uses suggested prompts as clickable buttons to ease user input and allows for adding data columns dynamically, turning a simple query into a structured output. Polyat offers multimodal input (voice, image) and features iterative refinement for design changes, aiming to provide feedback on how well AI understood prompts and to facilitate incremental updates, reducing the need for full regeneration.
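The "add a data column" interaction can be sketched as an incremental update to a structured grid: rather than regenerating everything, each new column runs one per-row enrichment call. This is a minimal TypeScript sketch under stated assumptions; the `Grid` shape and the mock `enrich` callback (standing in for a per-row model call) are hypothetical, not AnswerGrid's implementation.

```typescript
// Hypothetical sketch: a prompt produces a structured grid, and the user
// extends it column by column without re-running the original query.

interface Grid {
  columns: string[];
  rows: Record<string, string>[];
}

// Append a column by applying an enrichment function (a stand-in for a
// per-row model call) to each existing row. The input grid is not mutated,
// so the previous state survives for undo or comparison.
function addColumn(
  grid: Grid,
  name: string,
  enrich: (row: Record<string, string>) => string
): Grid {
  return {
    columns: [...grid.columns, name],
    rows: grid.rows.map(r => ({ ...r, [name]: enrich(r) })),
  };
}
```

Returning a new grid rather than mutating in place is one way to support the iterative, incremental refinement the section describes.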
ADAPTIVE INTERFACES THAT DYNAMICALLY CHANGE CONTEXTUALLY
Adaptive AI interfaces modify their layout and options based on the user's current context, such as the content of an email. Zuni, an email app, suggests contextually relevant responses as shortcuts, adapting the available actions to the specific email. This approach moves away from static, button-heavy interfaces toward dynamic UIs that present only the most relevant tools, improving efficiency and reducing cognitive load.
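One way to implement a context-adaptive toolbar is to derive the action set from the content being viewed instead of hard-coding buttons. The TypeScript sketch below uses simple keyword rules as a stand-in for a model-driven classifier; the rules, action names, and fallback are illustrative assumptions, not Zuni's behavior.

```typescript
// Hypothetical sketch of a context-adaptive toolbar: actions are derived
// from the email's content, so only relevant shortcuts are shown.

function suggestActions(email: string): string[] {
  const actions: string[] = [];
  // Keyword rules stand in for a model-driven classifier.
  if (/\b(meet|call|schedule)\b/i.test(email)) actions.push("Propose times");
  if (/\?/.test(email)) actions.push("Draft reply");
  if (/\b(invoice|receipt|payment)\b/i.test(email)) actions.push("File to expenses");
  // Fall back to a neutral default rather than an empty toolbar.
  return actions.length ? actions : ["Archive"];
}
```

Showing two or three derived actions instead of a fixed button grid is what trims the cognitive load the section mentions.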
AI VIDEO GENERATION BALANCING FIDELITY AND IMMEDIATE FEEDBACK
Argil.ai showcases AI video creation with deepfake technology. To manage user expectations and facilitate iteration, the platform initially provides a blurry preview with synchronized audio. Only after user confirmation does it initiate the full, time-consuming video generation process. This 'fidelity vs. immediacy' trade-off allows for quicker feedback loops, enabling users to iterate on scripts and prompts efficiently before committing to the final lengthy rendering.
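The fidelity-vs-immediacy trade-off reduces to a small state machine: a cheap, blurry preview ships first, and the expensive render starts only after explicit confirmation. The TypeScript sketch below is an assumed model of that flow, with illustrative phase names rather than Argil.ai's actual pipeline.

```typescript
// Hypothetical sketch of two-phase video generation: fast low-fidelity
// preview first, slow full-fidelity render only after user confirmation.

type Phase = "draft" | "preview" | "rendering" | "done";

function nextPhase(phase: Phase, userConfirmed: boolean): Phase {
  switch (phase) {
    case "draft":
      return "preview"; // cheap blurred preview with synchronized audio
    case "preview":
      // Iterating on the script loops back to draft without paying
      // for a full render; confirming commits to the slow path.
      return userConfirmed ? "rendering" : "draft";
    case "rendering":
      return "done"; // time-consuming full-fidelity generation
    case "done":
      return "done";
  }
}
```

The key design choice is that the only edge into the expensive `rendering` phase passes through an explicit user confirmation, which keeps the iteration loop fast.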
THE FUTURE OF SOFTWARE REIMAGINED THROUGH AI-NATIVE DESIGN
The current landscape of AI interfaces represents a foundational shift, akin to the emergence of touch interfaces years ago. We are moving beyond simple chat interfaces to AI-native components across various modalities, including voice, video, and autonomous agents. This transformation necessitates reimagining existing software components and exploring new interaction models that keep users in control while harnessing the power of AI for complex tasks and creative outputs.
Common Questions
How do AI interfaces differ from traditional software interfaces?
Traditional interfaces are built primarily from 'nouns' such as text fields and buttons. AI interfaces are shifting toward 'verbs': workflows, auto-completion, and suggested actions, which require new tools to represent dynamic processes on screen.
Mentioned in this video
Vappy: A voice AI tool for developers to build, test, and deploy voice agents quickly. The review demonstrated its capabilities and limitations around feedback and interruption handling.
AnswerGrid: An AI tool for generating answers at scale, featuring a user-friendly prompt interface with suggested examples and the ability to add custom data columns to structured outputs. The review highlighted its source-citation feature for building trust.
GumLoop: An AI automation tool that visualizes agent workflows on a canvas, allowing users to design and monitor AI processes. The review emphasized its potential as a standard interface for controlling AI agents.
Polyat: An AI product designer that lets users create and iterate on designs using prompts, generate production-ready code, and make incremental changes. The review showcased its design capabilities and the challenge of signaling how well the AI understood a prompt.
Argil.ai: An AI video studio that creates production-quality videos using deepfake technology. The review walked through writing custom scripts, changing camera angles, and managing the trade-off between video fidelity and generation time.
A company offering voice AI to supercharge call operations, demonstrated with a debt collection scenario. Highlighted its ability to adapt to conversational changes and the importance of latency.