What are the different ways to transcribe audio with AssemblyAI?

AssemblyAI offers two primary methods for transcription: providing a publicly accessible URL to an audio file, or uploading the audio file directly to its platform.

Can AssemblyAI identify different speakers in an audio recording?

Yes, AssemblyAI's conversational intelligence models include speaker diarization, which labels the different speakers throughout the transcript. The feature is enabled by setting 'speaker_labels' to true when starting a transcription job.

How can LLMs be used with AssemblyAI and Postman?

AssemblyAI's LLM framework, LeMUR, can be accessed via API through Postman to perform tasks like extracting action items, answering questions about the audio content, or generating summaries from transcribed meetings.

What kind of questions can I ask AssemblyAI's LLM about my audio files?

You can ask a wide range of questions, such as identifying common topics, determining who spoke about specific issues, or understanding the main challenges and proposed solutions discussed in the audio.

What happens if an API request in Postman takes longer than 30 seconds?

If a request exceeds 30 seconds in the browser, it might error out. To handle longer requests, you can download a desktop agent that runs in the background, allowing you to continue using the browser interface for extended processing times.

Can AssemblyAI extract action items from meeting recordings?

Yes, AssemblyAI's LeMUR framework can process meeting transcripts to identify and list potential action items, which is useful for project management and follow-up.

How to use @postman to test LLMs with audio data (Transcribe and Understand)

AssemblyAI

Science & Technology | 2 min read | 21 min video

May 13, 2024 | 3,251 views

TL;DR

Use Postman to test AssemblyAI API for audio transcription and LLM insights.

Key Insights

Postman is a valuable tool for initial API testing and understanding responses before coding.

AssemblyAI API can transcribe audio/video files via URL or direct upload.

Speaker diarization can identify different speakers within an audio file.

AssemblyAI's LeMUR framework enables LLM-driven analysis of audio transcripts.

LeMUR can extract action items, answer questions, and summarize audio content.

Postman's browser version may time out on long requests; a desktop agent can help.

INTRODUCTION TO POSTMAN AND ASSEMBLYAI

This tutorial demonstrates using Postman to interact with AssemblyAI's API for audio processing and Large Language Model (LLM) applications. Postman is presented as a good tool for beginners to understand API requests and responses before writing any code. It lets you test endpoints, parameters, and the structure of returned data directly, which eases the learning curve for a new API. The video also encourages using Postman for general API exploration beyond this tutorial.

AUDIO TRANSCRIPTION VIA POSTMAN

The process begins with setting up a POST request in Postman to AssemblyAI's transcript endpoint. Users need to include their AssemblyAI API key in the Authorization header and set the content type to 'application/json'. Audio or video files can be processed by providing a publicly accessible URL in the request body. Alternatively, a local file can be uploaded directly to AssemblyAI by sending it to the upload endpoint with the content type changed to 'application/octet-stream'.
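As a rough code equivalent of the two Postman requests described above, the sketch below builds (but does not send) a transcription request and an upload request using Python's standard library. The base URL and the 'audio_url' body field are assumptions drawn from AssemblyAI's public API conventions rather than from this summary, and the function names are hypothetical; check the current API reference before relying on them.

```python
import json
import urllib.request

# Assumed base URL; not stated in this summary.
BASE_URL = "https://api.assemblyai.com/v2"


def build_transcript_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build the POST request that starts a transcription job from a public URL."""
    # "audio_url" is the assumed name of the request-body field.
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/transcript",
        data=body,
        method="POST",
        headers={"Authorization": api_key, "Content-Type": "application/json"},
    )


def build_upload_request(api_key: str, audio_bytes: bytes) -> urllib.request.Request:
    """Build the POST request that uploads raw audio bytes directly."""
    return urllib.request.Request(
        f"{BASE_URL}/upload",
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": api_key,
            "Content-Type": "application/octet-stream",
        },
    )
```

Either request could then be sent with urllib.request.urlopen(request) and the JSON response decoded with json.load. Keeping the build step separate from the send step makes the headers and body easy to inspect, much like previewing a request in Postman.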

RETRIEVING TRANSCRIPTION RESULTS

After initiating a transcription job, Postman is used to poll for the results. A GET request is made to the same transcript endpoint, appending the unique transcript ID obtained from the initial POST request. This allows users to check the status of the transcription job. Once completed, the response will include the full text transcript along with other metadata, such as the audio URL and the model used.
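The polling step can be sketched as a small loop. This is a minimal illustration, not AssemblyAI's own client: the helper name wait_for_transcript is hypothetical, the 'status' values "completed" and "error" and the 'error' field are assumptions about the response shape, and the function that performs the actual GET request is passed in so the loop can be tried without network access.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch_status: Callable[[], dict],
    interval_s: float = 3.0,
    max_attempts: int = 100,
) -> dict:
    """Call fetch_status repeatedly until the job finishes or fails.

    fetch_status should perform the GET request for one transcript ID
    and return the decoded JSON body as a dict.
    """
    for _ in range(max_attempts):
        result = fetch_status()
        status = result.get("status")  # assumed field name
        if status == "completed":
            return result
        if status == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(interval_s)  # wait before polling again
    raise TimeoutError("transcript was not ready after max_attempts polls")


# Usage with canned responses standing in for real API calls.
canned = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "Hello world."},
])
finished = wait_for_transcript(lambda: next(canned), interval_s=0)
print(finished["text"])
```

In Postman the same effect comes from re-sending the GET request by hand until the status changes.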

LEVERAGING CONVERSATIONAL INTELLIGENCE FEATURES

AssemblyAI offers various 'conversational intelligence' models beyond basic transcription. These can be enabled by setting specific parameters in the initial transcription request. For instance, setting 'speaker_labels' to true activates speaker diarization, which identifies and labels different speakers throughout the audio. The results, found in the 'utterances' section of the response, attribute text segments to specific speakers and provide word-level timestamps.
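To make the diarization flow concrete, here is a minimal sketch that builds a request body with 'speaker_labels' enabled and turns a diarized response into readable lines. The 'audio_url' field and the 'speaker' and 'text' keys inside each utterance are assumptions about the payload shape, the function names are hypothetical, and the sample response is invented for illustration.

```python
def build_diarization_payload(audio_url: str) -> dict:
    """Request body for a transcription job with speaker diarization on."""
    return {"audio_url": audio_url, "speaker_labels": True}


def format_utterances(transcript: dict) -> list[str]:
    """Render each utterance as 'Speaker X: text'."""
    lines = []
    # The list may be missing or null if diarization was not requested.
    for utterance in transcript.get("utterances") or []:
        lines.append(f"Speaker {utterance['speaker']}: {utterance['text']}")
    return lines


# Invented sample response, trimmed to the assumed fields.
sample = {
    "status": "completed",
    "utterances": [
        {"speaker": "A", "text": "Shall we start?"},
        {"speaker": "B", "text": "Yes, go ahead."},
    ],
}
for line in format_utterances(sample):
    print(line)
```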

USING LEMUR FOR LLM-POWERED ANALYSIS

AssemblyAI's LeMUR framework allows direct LLM interaction through the API for advanced analysis. Two key use cases demonstrated are extracting action items and answering specific questions from transcripts. The 'extract action items' endpoint can process one or more transcript IDs to generate a list of actionable tasks discussed in meetings. Users can specify output formats and provide context to guide LeMUR's analysis.
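A request body for the action-items call might be assembled as follows. This is a sketch under stated assumptions: the endpoint path and the field names 'transcript_ids', 'context', and 'answer_format' are taken from AssemblyAI's public LeMUR conventions as best understood, not from this summary, and build_action_items_payload is a hypothetical helper.

```python
from typing import Optional, Sequence

# Assumed endpoint; verify against the current LeMUR API reference.
ACTION_ITEMS_URL = "https://api.assemblyai.com/lemur/v3/generate/action-items"


def build_action_items_payload(
    transcript_ids: Sequence[str],
    context: Optional[str] = None,
    answer_format: Optional[str] = None,
) -> dict:
    """Assemble the JSON body for an action-item extraction request."""
    if not transcript_ids:
        raise ValueError("at least one transcript ID is required")
    payload: dict = {"transcript_ids": list(transcript_ids)}
    # Optional hints that guide the model; omitted when not provided.
    if context:
        payload["context"] = context
    if answer_format:
        payload["answer_format"] = answer_format
    return payload


payload = build_action_items_payload(
    ["example-transcript-id"],  # placeholder, not a real ID
    context="Weekly engineering sync",
    answer_format="Bullet points",
)
print(payload)
```

The payload would be sent as JSON in a POST request with the same Authorization header used for transcription.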

ADVANCED QUERYING AND RESPONSE FORMATTING

The question-answering capabilities of LeMUR are shown by sending POST requests to a dedicated endpoint. Users can input their own questions, specify desired answer formats, and choose the LLM model. Parameters like 'max_output_size' and 'temperature' control the length and creativity of the AI's response. The tutorial also notes that browser-based Postman requests exceeding 30 seconds may require a desktop agent to prevent timeouts.
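The question-answering request can be sketched the same way. The parameter names 'max_output_size' and 'temperature' come from the summary above; the 'questions' structure, the 'final_model' field, and the 0.0 to 1.0 temperature range are assumptions to verify against the LeMUR reference, and build_question_answer_payload is a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple


def build_question_answer_payload(
    transcript_ids: Sequence[str],
    questions: Sequence[Tuple[str, Optional[str]]],
    final_model: Optional[str] = None,
    max_output_size: int = 1000,
    temperature: float = 0.0,
) -> dict:
    """Assemble the JSON body for a question-answer request.

    questions is a sequence of (question, answer_format) pairs;
    answer_format may be None when no particular format is wanted.
    """
    # Assumed valid range; lower values give more conservative answers.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    question_items = []
    for question, answer_format in questions:
        item = {"question": question}
        if answer_format:
            item["answer_format"] = answer_format
        question_items.append(item)
    payload: dict = {
        "transcript_ids": list(transcript_ids),
        "questions": question_items,
        "max_output_size": max_output_size,
        "temperature": temperature,
    }
    if final_model:
        payload["final_model"] = final_model  # assumed field name
    return payload


qa_payload = build_question_answer_payload(
    ["example-transcript-id"],  # placeholder, not a real ID
    [("Who raised the main concern?", "One short sentence"),
     ("What solutions were proposed?", None)],
    max_output_size=500,
    temperature=0.2,
)
print(qa_payload)
```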

Common Questions

Postman is a great tool for testing APIs before integrating them into a codebase. It allows you to send HTTP requests, inspect responses, and experiment with different parameters without writing any code.

Topics

Postman, API Testing, Audio Transcription, Large Language Models (LLMs), Conversational Intelligence, Action Item Extraction, Question Answering, API Integration

Mentioned in this video

Software & Apps

API key

A unique key required to authenticate requests to the AssemblyAI API, obtained from the AssemblyAI dashboard.

Postman

A platform for API development and testing, used in the video to interact with AssemblyAI's API.

LeMUR

AssemblyAI's framework for large language models, used for tasks like generating summaries, extracting action items, and answering questions from audio data.

Concepts

application/octet-stream

The content type required for uploading audio files directly to the AssemblyAI API.

Action Items

Tasks or follow-up items generated from meeting discussions, extractable using AssemblyAI's LLM features.

Speaker Labels

A feature of AssemblyAI that identifies and separates different speakers within an audio file.

Companies

GitLab

A company whose meeting recordings were used as an example for transcription and analysis.
