What are the technical requirements for developing a voice agent?

You need an AssemblyAI API key. For development, use Claude Code with either the Claude app or VS Code plugin. A Python backend and HTML frontend are used for authentication and UI representation.

Can the Voice Agent API handle multiple languages?

Yes, the Voice Agent API supports English, Spanish, French, German, Italian, and Portuguese, which aligns with the Universal Speech Model 3 Pro. Expansion of language support is planned.

How can I integrate custom terms or jargon into the voice agent?

You can use the 'key terms' feature within the Voice Agent API. This biases the speech-to-text model to recognize specific, potentially uncommon words relevant to your business.

What is the best way to ensure a voice agent is accurate and reliable?

For complex actions like booking an appointment after checking availability, use 'progressive tool reveal.' This involves updating the agent's configuration to reveal tools sequentially, reducing hallucinations and improving accuracy.

How can I reduce latency in my voice agent's responses?

You can adjust the 'turn detection' settings, specifically the minimum and maximum silence duration. Reducing these parameters can decrease response time, but be mindful of potentially ending turns too early.

How can I deploy and share my created voice agent?

Platforms like Railway allow for easy deployment. After preparing your project in a GitHub repository, Railway can deploy it with a single click, generating a shareable URL for your agent.

Can the voice agent provide a summary of a call after it's finished?

The Voice Agent API does not currently support built-in call summarization. However, you can use AssemblyAI's LLM Gateway product to store transcripts and generate summaries post-call.

Key Moments

Build a Voice Agent in an Hour with Claude Code | AssemblyAI Workshop

AssemblyAI

Science & Technology5 min read60 min video

Jun 10, 2026|141 views|8

Save to Pod

Want to know something specific about what's covered?

We've already dissected every moment. Ask and we will deliver (with timestamps).

Key Moments

TL;DR

Build a fully functional voice agent in under an hour using a single API and LLM, no complex coding required. The catch? You still need to feed it specific business context to make it truly useful.

Key Insights

A complete appointment booking voice agent, including a front-end UI, tool calling, and deployment, can be built in approximately one hour using the AssemblyAI Voice Agent API and Claude Code without pre-written code or custom tools.

The AssemblyAI Voice Agent API offers a vertically integrated solution, handling speech-to-text, LLM orchestration, and text-to-speech (TTS), enabling a simpler developer experience and competitive pricing at $0.45/hour.

Claude Code, when provided with the AssemblyAI Voice Agent API documentation URL, can autonomously scaffold a Python backend and HTML frontend, including secure token generation for authentication, demonstrating the power of coding agents for API integration.

Key terms can be provided to the voice agent API to bias the speech-to-text model towards recognizing specific, uncommon words or product names not found in standard dictionaries, improving accuracy for business-specific vocabulary.

To manage latency in voice agents configured with multiple APIs, simplifying tool execution and minimizing the number of tools used is more effective than solely optimizing individual API response times.

AssemblyAI plans to introduce session history in their dashboard, allowing users to review call audio, tool call events, and transcripts, which can be leveraged for performance scoring and post-processing with tools like their LLM Gateway.

Rapid voice agent development with AssemblyAI and Claude Code

This workshop demonstrates how to build a fully functional appointment booking voice agent in approximately one hour, utilizing the AssemblyAI Voice Agent API and Claude Code. The process requires no pre-written code, custom tools, or complex server setups, relying solely on the API's documentation URL and a few prompts to Claude. By the end, participants can have a working voice agent with a frontend UI, tool-calling capabilities, and a shareable deployment. The presenter, Dan Ince, a Product Manager at AssemblyAI, highlights that the voice agent API is designed to integrate directly into an application, allowing developers to build unique voice agent experiences tailored for their specific business needs, rather than embedding within existing systems.

The AssemblyAI Voice Agent API: An integrated and affordable solution

AssemblyAI positions itself as a speech model company, offering APIs for building products like meeting recorders and call center transcription tools. The Voice Agent API is presented as a unified solution that handles the entire pipeline: speech-to-text, Large Language Model (LLM) orchestration, and text-to-speech (TTS). This vertical integration allows AssemblyAI to offer competitive pricing at $0.45 per hour, significantly cheaper than many market alternatives. The API focuses on simplicity and developer-friendliness, aiming to bridge the gap for building voice agents in a more intuitive, application-centric manner, akin to the ease of use found with database platforms like Supabase.

Leveraging Claude Code for scaffolded backend and frontend development

A key aspect of the workshop is the use of Claude Code, an AI coding assistant, to rapidly scaffold the necessary backend and frontend components. By providing Claude Code with the URL to the AssemblyAI Voice Agent API documentation, it can autonomously crawl the docs, understand the API’s functionality, and generate a Python backend with a temporary token generation for secure frontend authentication, and an HTML frontend. This process eliminates the need for developers to manually write boilerplate code or delve deeply into API specifics, showcasing the effectiveness of coding agents in accelerating development and abstracting away complexity, particularly for tasks like handling audio transport over WebSockets.

Integrating tool calling and customizing the user interface

Once the basic application structure is in place, the next step involves teaching Claude Code to add tool-calling capabilities to the agent. For the appointment booking use case, this means defining a tool to create an appointment. While the workshop simulates tool results due to time constraints, it emphasizes that the API supports real-world integrations with services like Calendly or internal CRMs. Developers can implement the handling of tool call events in their code to execute actual business logic. Furthermore, the UI can be quickly customized. By prompting Claude with a simple request, such as 'Make this a car mechanic booking agent UI,' the agent's appearance and persona can be transformed, turning a generic template into a specific application like Apex Auto for MOT bookings.

Enhancing voice agent accuracy and user experience

Several features contribute to the agent's effectiveness and user experience. 'Key terms' can be provided to the voice agent API to improve the accuracy of the speech-to-text model for uncommon or business-specific words. The API supports multiple languages, including English, Spanish, French, German, Italian, and Portuguese, with plans to expand language support and develop more localized voice accents. To address potential latency issues, especially with multiple API calls, the advice is to simplify the tool execution workflow rather than solely optimizing individual API speeds. For speech input quality, enabling acoustic echo cancellation is recommended when not using headphones to prevent audio feedback loops.

Progressive tool reveal and deployment through Railway

A design pattern called 'progressive tool reveal' is introduced to manage complex multi-step processes, such as checking availability before booking. This involves dynamically updating the agent's configuration to reveal tools sequentially, reducing hallucinations and increasing accuracy by focusing the agent on a single, relevant tool at each stage. For deployment, the workshop suggests using Railway, a platform that simplifies the process. By creating a GitHub repository for the developed agent and connecting it to Railway, users can achieve a one-click deployment, resulting in a shareable, continuously running web application, suitable for personal use or even starting a business.

Future developments and performance measurement

AssemblyAI is continuously evolving its offerings. While TTS is currently integrated within the voice agent API, it is on the roadmap for public release. The company plans to add session history to the dashboard, allowing users to review call recordings, transcripts, and tool call events, which aids in performance analysis and scoring. For post-processing needs like call summarization, the LLM Gateway product is recommended. While direct support for call summarization within the voice agent API isn't immediate, it's a target for future development. Users can also leverage custom prompts and existing human-to-human call transcripts to tune their agents for optimal performance, reflecting the growing trend of AI-driven voice interactions across various platforms.

Mentioned in This Episode

●Products

●Software & Apps

●Companies

Building a Voice Agent with Claude Code: Quick Guide

Practical takeaways from this episode

Do This

Use Claude Code (app or VS Code plugin) for development.

Provide API documentation links to Claude for context.

Develop a Python backend for authentication and an HTML frontend.

Use temporary tokens for secure API key handling in the frontend.

Prompt Claude to add tools and customize the agent's function.

Leverage Claude's ability to search and scrape documentation.

Consider 'progressive tool reveal' for multi-step actions.

Adjust turn detection settings (min/max silence) for latency control.

Enable acoustic echo cancellation if not using headphones.

Deploy your agent using platforms like Railway.

Use LLM Gateway for post-processing tasks like call summaries.

Avoid This

Do not place your API key directly in the frontend code.

Avoid overly specific prompt engineering; guide Claude instead.

Do not attempt to manually code complex audio transport over websockets.

Be cautious with aggressive latency reduction, as it may reduce accuracy.

Do not rely solely on default settings for all use cases; customize as needed.

Common Questions

You'll need an AssemblyAI account for an API key and use a tool like Claude Code. Provide Claude with the Voice Agent API documentation links and instructions for the agent you want to build.

Topics

AI & Machine Learning Technology & Innovation Programming & Software Voice Agents Coding Agents AI Development Developer Tools API Integration Speech Technology Application Deployment

Mentioned in this video

Software & Apps

Claude Code

A tool or interface used with Claude for coding and development, utilized to scaffold backend and frontend applications. The speaker uses it by providing documentation links and instructions.

Voice Agent API

AssemblyAI's API for building voice agents, integrating speech-to-text, LLM, and text-to-speech capabilities. It's designed to be developer-friendly and embeddable into applications.

Cal.com

A scheduling platform mentioned as an example of a service that can be integrated with the voice agent API for booking appointments.

Superbase

Mentioned as an example of a user-friendly developer tool, drawing a parallel to the desired ease of use for the Voice Agent API.

Python

The backend language chosen for building the appointment setting voice agent, used in conjunction with Claude Code.

HTML

The frontend language used to build the user interface for the voice agent application.

VS Code

The Integrated Development Environment (IDE) used by the speaker to set up the project and write code, specifically with the Claude Code plugin.

Universal Speech Model 3 Pro

AssemblyAI's speech-to-text model that supports multiple languages and is used within the Voice Agent API.

Claude

An AI model used extensively in the workshop for coding assistance, generating code, searching documentation, and iterating on the voice agent.

LLM Gateway

A product from AssemblyAI that can be used for post-processing tasks like call summarization. It can be integrated with the voice agent by prompting Claude to use it.

Siri

Apple's virtual assistant, used as an example of an intuitive and interactive voice agent that can perform actions beyond just speaking.

Companies

AssemblyAI

A speech model company that provides APIs for building transcription models, voice agents, and other speech-related technologies. They offer a full pipeline including speech-to-text, LLM, and text-to-speech.

Aura Studio

Mentioned as the name of the studio for the appointment setting agent demo. The agent's name is David.

Twilio

A platform mentioned in the context of connecting the voice agent API to a phone call process, suggesting integration possibilities for telephony.

GitHub

A platform used for version control and hosting code repositories. It's required for deploying the agent to Railway.

Apex Auto

The name of the fictional company the voice agent is programmed to represent in the MOT booking demo.

Products

railway

A platform used for deploying applications. The workshop demonstrares deploying the built voice agent to Railway, highlighting its ease of use and free tier.

Tons app

Another name for the Portoola app, recommended as a great example of a UI-driven voice agent experience.

Media

Portoola

A companion voice agent friend app mentioned as a prime example of a well-designed, UI-driven voice agent experience.

Ask anything from this episode.

Save it, chat with it, and connect it to Claude or ChatGPT. Get cited answers from the actual content — and build your own knowledge base of every podcast and video you care about.

Get Started Free