Universal-3 Pro Technical Overview
Key Moments
Universal-3 Pro enables prompt-driven, customized transcripts with improved accuracy.
Key Insights
Prompting shapes transcript output, affecting style, formatting, context clues, entity accuracy, speaker attribution, and audio-event tagging.
Universal-3 Pro outperforms the prior model (Universal-2) even before prompts are applied, producing cleaner corrections and a more faithful rendering of meaning.
Verbatim vs. standard prompts produce markedly different results, adding hesitations and fillers when requested.
API support lets you specify a prompt alongside the Universal-3 Pro model to tailor transcripts to specific use cases.
Practical demonstrations show real-world differences on a sample file, illustrating how prompts influence accuracy and readability.
INTRODUCTION AND CONTEXT
Today’s overview starts with Ryan from AssemblyAI introducing Universal-3 Pro, the latest model in the company’s speech-to-text lineup. The model is notable because it accepts a text prompt alongside an audio file, enabling customized transcripts tuned to a user’s particular use case and customers. Alongside the announcement, AssemblyAI points to a prompt engineering guide that explains how prompts can shape output, including style, formatting, context cues, and speaker or event tagging. The demonstration uses a GitLab SEC growth data science staff meeting as the sample, paired with a comparison app that pits Universal-2 against Universal-3 Pro before any prompting.
NEW FEATURES AND CAPABILITIES OF UNIVERSAL-3 PRO
Universal-3 Pro extends the transcription task beyond raw words to produce outputs tailored by user prompts. The model supports a range of capabilities described in the guide: increasing or reducing disfluencies, altering style and formatting, adding context-aware clues, improving entity accuracy, and providing speaker attribution and audio-event tags. The guide also mentions code-switching, enabling cross-linguistic or domain-specific phrasing. The team emphasizes that many capabilities are still being documented and discovered, suggesting a living feature set. The core idea is to give customers control over the transcript’s tone, structure, and annotations to fit their workflows.
PROMPTING CAPABILITIES EXPLORED
To illustrate how prompting changes results, the team runs side-by-side comparisons using a single file. They show Universal-2 on the left and Universal-3 Pro on the right, initially without prompts to establish a baseline. Early observations highlight corrections to broken words, capitalization of proper nouns, and a clearer rendering of meaning, such as rephrasing a question about arrival time. The takeaway is that Universal-3 Pro produces more accurate, readable transcripts from the very first pass, even before any customized instructions are applied.
OUT-OF-THE-BOX PERFORMANCE VS PROMPTED
With the baseline established, the team tests a simple prompt and compares it to no prompt. The differences are subtle but noticeable: the prompt can improve word ordering, punctuation, and the handling of ambiguous phrases. They also point to the possibility of using more verbose prompts to drive specific behavior. The demonstration suggests that even lightweight prompts yield tangible gains in readability and correctness, indicating that prompt design matters as much as model choice.
VERBATIM PROMPTS AND DETAIL EXTRACTION
Next, they turn to a verbatim-style prompt to capture hesitations and fillers. By selecting a verbatim option, transcripts show more ums and hesitations, with the model displaying the stumbles in real time. The visualized transcript confirms that a single prompt choice can reframe how the audio is transcribed, shifting from a cleaner, summarized rendering to a detailed, verbatim record. This capability is particularly relevant for meeting minutes, legal depositions, and research notes, where exact phrasing and pauses matter.
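The cleaned-versus-verbatim difference can also be inspected programmatically. A minimal sketch, using illustrative transcript snippets (not actual Universal-3 Pro output), that diffs the two renderings and counts filler tokens:

```python
import difflib

# Illustrative snippets only; not actual model output.
standard = "So when do you think the report will be ready?"
verbatim = "So, um, when do you, uh, think the report will be ready?"

FILLERS = {"um", "uh", "er", "hmm"}

def filler_count(text: str) -> int:
    """Count filler tokens in a transcript string."""
    tokens = [t.strip(".,?!").lower() for t in text.split()]
    return sum(1 for t in tokens if t in FILLERS)

# A word-level diff shows exactly which hesitations the
# verbatim prompt preserved.
diff = list(difflib.ndiff(standard.split(), verbatim.split()))
added = [d[2:] for d in diff if d.startswith("+ ")]

print(filler_count(standard))   # 0
print(filler_count(verbatim))   # 2
print(added)
```

A check like this could help a team decide which prompt style best fits a given downstream task, for example flagging transcripts whose filler density suggests the wrong prompt was used.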
PROMPT STRATEGIES AND BEST PRACTICES
Beyond simple versus verbatim prompts, the team discusses best practices for prompt design. They warn that prompts influence context, emphasis, and even whether certain words are treated as disfluencies or essential terms. They point to the prompt engineering guide as a resource and encourage teams to experiment with different prompt styles to align transcripts with their downstream tasks. The underlying message is that prompting is a design lever that can dramatically alter the usefulness of the output.
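As a concrete starting point for that experimentation, a team might keep a small library of prompt styles to test against the same audio. The prompt texts below are illustrative, written from the behaviors described in this episode; they are not taken from AssemblyAI's guide:

```python
# Illustrative prompt styles for A/B testing transcript behavior.
# Wording is hypothetical; consult AssemblyAI's prompt engineering
# guide for recommended phrasing.
PROMPT_STYLES = {
    "clean": (
        "Produce a clean, readable transcript. Remove filler words "
        "and false starts; use standard punctuation and capitalization."
    ),
    "verbatim": (
        "Transcribe verbatim. Keep every um, uh, hesitation, "
        "repetition, and false start exactly as spoken."
    ),
    "domain": (
        "This is a data science staff meeting. Expect terms like "
        "MLOps, APAC, and GitLab; capitalize them correctly."
    ),
}

def get_prompt(style: str) -> str:
    """Look up a prompt style, failing loudly on unknown names."""
    if style not in PROMPT_STYLES:
        raise KeyError(f"Unknown prompt style: {style!r}")
    return PROMPT_STYLES[style]
```

Keeping prompts in one place like this makes it easy to rerun the same file under each style and compare the outputs side by side, as the demo does.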
USING THE API: PROMPT AND MODEL SELECTION
Operationalizing prompts happens through the API: the speech model parameter selects Universal-3 Pro, and the prompt parameter injects the instructions. The quick start guide is recommended for getting teams started, and the team invites users to experiment with prompts to understand the tradeoffs. This section makes clear that users can build custom transcription workflows, without training bespoke models, by combining model choice with prompt design.
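A minimal sketch of that workflow against AssemblyAI's transcript endpoint. The `/v2/transcript` endpoint and `speech_model` field exist in AssemblyAI's API, but the model identifier `"universal_3_pro"` and the `"prompt"` field name used below are assumptions for illustration; verify both against the quick start guide:

```python
import json
import urllib.request

API_BASE = "https://api.assemblyai.com/v2"

def build_transcript_request(audio_url, prompt=None):
    """Build the JSON body for a transcript request.

    The model identifier "universal_3_pro" and the "prompt" field
    name are assumptions; check the docs for the current names.
    """
    body = {"audio_url": audio_url, "speech_model": "universal_3_pro"}
    if prompt:
        body["prompt"] = prompt
    return body

def submit(api_key, body):
    """POST the request; the response includes a transcript id to poll."""
    req = urllib.request.Request(
        f"{API_BASE}/transcript",
        data=json.dumps(body).encode(),
        headers={
            "authorization": api_key,
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = build_transcript_request(
    "https://example.com/staff-meeting.mp3",
    prompt="Transcribe verbatim, keeping hesitations and filler words.",
)
# submit("YOUR_API_KEY", body)  # network call; uncomment with a real key
```

Because the prompt is just another request field, switching between clean, verbatim, or domain-specific output is a one-line change rather than a new integration.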
HANDS-ON RESULTS: A PRACTICAL EXAMPLE
In the practical demonstration, the SEC growth data science meeting is used to show real differences. The speakers discuss acronyms like MLOps and APAC, and the demo notes how capitalization and entity recognition improve with the prompt. They also highlight how the plain transcript initially misinterpreted a phrase, while the prompt-adjusted version restored the intended meaning. The side-by-side comparison reinforces that Universal-3 Pro benefits from both stronger out-of-the-box accuracy and targeted prompting, enabling teams to tailor outputs for compliance, analytics, or customer-facing documentation.
IMPACT ON USE CASES AND QUALITY ATTRIBUTES
For organizations, the combination of prompt-driven control and improved baseline accuracy expands the set of viable use cases. Unknown terms, domain-specific entities, and speaker attribution can be tuned via prompts, while code-switching supports multilingual or cross-domain transcription. The model’s ability to tag audio events and maintain consistency across speakers can streamline downstream NLP tasks, indexing, and search. The example demonstrates improved handling of brand and product names and a more faithful rendering of meaning, helping transcripts meet legal, technical, or customer-service expectations.
FUTURE DIRECTIONS, DOCUMENTATION, AND COMMUNITY FEEDBACK
The presentation closes with a note that the prompt engineering guide continues to evolve and that additional capabilities are under development. AssemblyAI invites users to try Universal-3 Pro via the API, provide feedback, and help shape future documentation. The takeaway is that the platform aims to be a robust, customizable transcription tool adaptable to diverse industries, with transparent documentation, ongoing improvements, and an open channel for user-driven enhancements.
Common Questions
Universal-3 Pro is AssemblyAI's speech-to-text model that supports a text prompt input to customize output. Prompts can affect style, formatting, context, entity accuracy, speaker attribution, and other transcript attributes, enabling tailored results for different use cases.