Prompt Engineering Workshop: Universal-3 Pro

AssemblyAI
Science & Technology · 3 min read · 61 min video
Feb 19, 2026

Key Moments

TL;DR

Universal 3 Pro demo: promptable STT, multilingual, tagging, streaming.

Key Insights

1. Promptable transcription lets you steer transcripts with natural-language prompts, giving more control over style, clarity, and content.

2. Out of the box, Universal 3 Pro surpasses Universal 2 in accuracy; applying prompts further improves handling of context, disfluencies, and meaning.

3. Explicit prompts for disfluencies, hesitations, repetitions, stutters, and colloquialisms significantly shape how the transcript reflects natural speech.

4. Code-switching and language preservation are supported by targeted prompts; six languages are supported natively, plus API access to 99 languages, with Universal 4 on the roadmap.

5. Audio tagging, PII redaction, and diarization can all be driven by prompts; streaming speaker labeling is experimental but promising.

6. Practical workflows include evaluating prompts on your own data, balancing domain-specific vs. generic prompts, and understanding pricing (prompting adds a small per-hour cost).

INTRODUCTION AND CONTEXT

The session opens with introductions from Ryan, who leads AssemblyAI's customer-facing teams, and colleagues Zach and Griffin from the applied AI engineering team. They frame Universal 3 Pro as a promptable speech-to-text model that can customize transcripts via natural-language prompts. A live comparison tool is introduced to pit Universal 2 against Universal 3 Pro, using a GitLab meeting transcription as a baseline and then applying prompts to demonstrate improvements. The group emphasizes hands-on demos, live debugging, and leaving ample time for Q&A, with the event being recorded for later sharing.

BASELINE VS PROMPTING: SETTING THE SCENE

The left side of the demo shows Universal 3 Pro in its baseline form, while the right side introduces prompts to steer transcription. Early focus centers on preserving linguistic patterns, with a key realization: a generic instruction like 'disfluencies' is too vague. Through iterative prompting, they refine the instruction to spell out disfluencies as filler words, hesitations, repetitions, stutters, false starts, and colloquialisms. The result is a clearer, more context-aware transcript that better captures natural speech while remaining flexible enough to be more or less literal depending on needs.
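
The refinement described above can be sketched as plain prompt strings. The wording below paraphrases the iteration shown in the session and is purely illustrative, not the exact prompts used in the demo:

```python
# Illustrative only: these strings paraphrase the prompt iteration described
# in the session, not the demo's actual prompts.

VAGUE_PROMPT = "Transcribe disfluencies."  # too vague, per the session

# The refined version names each speech phenomenon explicitly.
DISFLUENCY_PHENOMENA = [
    "filler words",
    "hesitations",
    "repetitions",
    "stutters",
    "false starts",
    "colloquialisms",
]

def build_disfluency_prompt(phenomena: list[str]) -> str:
    """Compose an explicit prompt from a list of speech phenomena."""
    return (
        "Preserve the speaker's natural speech. Transcribe disfluencies, "
        "including " + ", ".join(phenomena) + "."
    )

REFINED_PROMPT = build_disfluency_prompt(DISFLUENCY_PHENOMENA)
```

The point of the helper is that an explicit enumeration generalizes better than a one-word instruction, which is exactly the lesson the demo draws.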

PROMPTING MECHANICS: HOW TO INSTRUCT THE MODEL

The presenters walk through constructing prompts, starting with mandatory instructions to preserve linguistic patterns and then adding an always-go-with-your-best-guess rule based on context. They show how stronger, authoritative prompts push the model to infer missing words rather than skip uncertain segments. The discussion also covers risks like overfitting prompts to a single file and the value of testing prompts against diverse datasets. Live commentary highlights how the model interprets speech patterns such as ums and uhhs as discourse signals rather than noise.
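
A minimal sketch of how such a prompt might be attached to a transcription request. The endpoint is AssemblyAI's real transcript endpoint, but the `speech_model` value and the `prompt` field name are assumptions inferred from the session, not confirmed parameter names; check the current API reference before relying on them:

```python
import json

# Sketch only: "universal_3_pro" and "prompt" are assumed names based on the
# session; consult AssemblyAI's API reference for the actual parameters.
API_URL = "https://api.assemblyai.com/v2/transcript"

def build_transcript_request(audio_url: str, prompt: str) -> dict:
    """Build the JSON body for a promptable transcription request."""
    return {
        "audio_url": audio_url,
        "speech_model": "universal_3_pro",  # assumed model identifier
        "prompt": prompt,                   # assumed open-field prompt parameter
    }

body = build_transcript_request(
    "https://example.com/meeting-recording.mp3",
    "Preserve linguistic patterns. Always transcribe with your best guess "
    "based on context; never skip uncertain segments.",
)
payload = json.dumps(body)
```

The authoritative phrasing ("always", "never skip") mirrors the session's observation that stronger prompts push the model to infer missing words rather than drop uncertain segments.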

MULTI-LANGUAGE CODE-SWITCHING AND PRESERVING LANGUAGES

A Miami corpus example demonstrates code-switching with Spanglish and the importance of preserving original languages and scripts. The team explains language support: six native languages (English, Spanish, French, German, Italian, Portuguese) plus API access to 99 languages; Universal 4 is in development for broader coverage. A key takeaway is instructing the model to keep code-switching intact rather than translating, which yields transcripts that reflect actual speaker behavior and multilingual contexts more faithfully.
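
The preservation instruction can be expressed as another prompt string; the wording below paraphrases the session's takeaway rather than quoting the demo, and the language sets simply restate the support figures mentioned above:

```python
# Illustrative prompt for code-switched audio, paraphrasing the session.
CODE_SWITCH_PROMPT = (
    "The audio contains code-switching between English and Spanish. "
    "Transcribe each utterance in the language actually spoken, keeping the "
    "original words and scripts. Do not translate mixed-language segments."
)

# Per the session: six native languages, with 99 reachable via the API.
NATIVE_LANGUAGES = {"English", "Spanish", "French", "German", "Italian", "Portuguese"}

def is_native(language: str) -> bool:
    """Check whether a language is one of the six natively supported ones."""
    return language in NATIVE_LANGUAGES
```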

AUDIO TAGGING, PII REDACTION, AND DIARIZATION

The workflow introduces audio tagging for non-speech events (coughs, laughter, noise, silence, unclear portions) as an experimental feature. A critical decision point is choosing between ‘unclear’ and ‘mask’ tagging, which interacts with organizational style guides and potential profanity handling. PII redaction remains available, with prompts guiding privacy considerations. Speaker diarization is described as experimental, with streaming speaker labeling on the roadmap. The session hints at future integration that fuses model-based speaker tags with native diarization for more robust separation across chunks.
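
The 'unclear' vs. 'mask' decision point could be captured in a small config builder. PII redaction is an established AssemblyAI feature, but the `audio_tagging` field and its two modes are assumptions based on the session's description of an experimental feature, not confirmed API parameters:

```python
# Sketch of a request fragment combining PII redaction with the experimental
# audio tagging described in the session. The "audio_tagging" field and its
# "unclear"/"mask" modes are assumed names, not confirmed API parameters.
def build_redaction_config(tag_mode: str = "unclear") -> dict:
    """Return request options for PII redaction plus audio tagging."""
    if tag_mode not in {"unclear", "mask"}:
        raise ValueError("tag_mode must be 'unclear' or 'mask'")
    return {
        "redact_pii": True,
        "redact_pii_policies": ["person_name", "phone_number"],  # example policies
        "audio_tagging": tag_mode,  # assumed field name
    }
```

Which mode to pick depends on your organizational style guide, as the session notes; 'mask' hides the content outright, while 'unclear' flags it for review.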

PRACTICAL USES, EVALUATION, AND PRICING

The panel discusses evaluating prompts on your own data, using domain-specific prompts (medical, legal, finance) or more generic prompts to adapt to unknown contexts. They distinguish between key terms prompting and open-field prompting: the features are mutually exclusive at the parameter level but can be combined by embedding key terms within a broader open prompt. A live note on streaming and pricing clarifies that prompting adds about five cents per hour on top of the base rate. The takeaway is to experiment, document results, and consult documentation for ongoing updates.
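
The per-hour figures quoted in the session (21¢ base, about 5¢ extra for prompting) make cost estimation a one-line calculation:

```python
# Worked cost estimate using the per-hour rates quoted in the session:
# 21¢/hour base for Universal 3 Pro, plus about 5¢/hour when prompting.
BASE_RATE_USD = 0.21
PROMPTING_SURCHARGE_USD = 0.05

def estimate_cost(audio_hours: float, prompted: bool) -> float:
    """Return the estimated transcription cost in USD, rounded to cents."""
    rate = BASE_RATE_USD + (PROMPTING_SURCHARGE_USD if prompted else 0.0)
    return round(audio_hours * rate, 2)

# For 100 hours of audio: 21.0 USD without prompting, 26.0 USD with it.
```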

Prompting Do's and Don'ts for Universal 3 Pro

Practical takeaways from this episode

Do This

Be explicit and authoritative in prompts (e.g., 'always transcribe with your best guess', 'prioritize medications').
Prompt the model to preserve code-switching and original languages; avoid translating mixed-language segments unless needed.
Enable and leverage audio tagging (laughter, coughs, silence) to enrich transcripts.
Consider PII redaction and 'unclear' outputs to handle sensitive data in transcripts.
Iterate prompts with the Prompt Repair Wizard and run small-scale evals before scaling.

Avoid This

Don't overfit prompts to a single file; it risks poor generalization across datasets.
Don’t rely solely on model judgments; validate with human review when necessary.
Avoid vague or soft prompts; use clear, task-relevant commands to improve accuracy.

Pricing: Universal 3 Pro with and without prompting

Data extracted from this episode

Model              Base price per hour   Prompting price per hour
Universal 3 Pro    21¢                   26¢

Common Questions

What is Universal 3 Pro?

Universal 3 Pro is a promptable speech-to-text model that can be customized with natural-language prompts to influence transcription output. The session compared baseline Universal 2 with Universal 3 Pro and demonstrated prompt-based improvements starting around 119 seconds into the video.
