How to Build a Podcast Summarization Web APP in Python and Streamlit

AssemblyAI | Jun 5, 2022 | 31 min video | 6,057 views
TL;DR

Build a Python Streamlit app to summarize podcasts into chapters using AssemblyAI and Listen Notes APIs.

Key Insights

1. Integrate AssemblyAI's chapterization feature with the Listen Notes API to fetch podcast episode data.
2. Develop a Python script to handle API communication, episode data retrieval, and transcript processing.
3. Use Streamlit to create an interactive web interface for podcast summarization.
4. Extract and display the podcast title, episode title, thumbnail, and chapter summaries with timestamps.
5. Implement a user-friendly interface with a sidebar for input and expandable sections for chapter details.
6. Convert millisecond timestamps from the API into human-readable HH:MM:SS format for display.
Convert millisecond timestamps from the API into human-readable HH:MM:SS format for display.

PROJECT OVERVIEW AND API INTEGRATION

This project details the construction of a web application designed to summarize podcast episodes into distinct chapters. It leverages the AssemblyAI API for its chapterization and summarization capabilities and the Listen Notes API to source podcast episodes. The application will feature a web interface built with Streamlit, allowing users to input a podcast episode ID and receive a structured summary including the podcast title, episode name, cover art, and chapter-by-chapter breakdowns with summaries and timestamps.

SETTING UP API COMMUNICATION AND DATA RETRIEVAL

The development process begins by structuring the project into a main script and a supporting script for API communication. The supporting script is updated to remove unnecessary upload functionality, as podcasts will be accessed directly via their URLs from the Listen Notes API. This involves setting up API keys for both Listen Notes and AssemblyAI, defining endpoint URLs, and creating a function `get_episode_audio_url` that takes an episode ID and returns the audio URL along with metadata such as the episode thumbnail, episode title, and podcast title.
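A minimal sketch of `get_episode_audio_url`, assuming the Listen Notes v2 episode lookup (`GET /api/v2/episodes/{id}` with an `X-ListenAPI-Key` header) and its `audio`, `thumbnail`, `title`, and `podcast.title` response fields. It uses the standard library's `urllib` rather than a third-party HTTP client and takes the API key as a parameter instead of reading a module-level constant. The `parse_episode` helper is hypothetical; it is split out so the field extraction can be checked without a network call. The order of the returned tuple is also an assumption.

```python
import json
import urllib.request

# Assumption: Listen Notes API v2 base URL for single-episode lookups.
LISTENNOTES_EPISODE_ENDPOINT = "https://listen-api.listennotes.com/api/v2/episodes"


def parse_episode(episode: dict) -> tuple[str, str, str, str]:
    """Pull the audio URL and display metadata out of an episode JSON object."""
    audio_url = episode["audio"]              # direct link AssemblyAI can fetch
    thumbnail = episode["thumbnail"]          # cover art image URL
    episode_title = episode["title"]
    podcast_title = episode["podcast"]["title"]
    return audio_url, thumbnail, podcast_title, episode_title


def get_episode_audio_url(episode_id: str, api_key: str) -> tuple[str, str, str, str]:
    """Fetch one episode and return (audio_url, thumbnail, podcast_title, episode_title)."""
    request = urllib.request.Request(
        f"{LISTENNOTES_EPISODE_ENDPOINT}/{episode_id}",
        headers={"X-ListenAPI-Key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        episode = json.load(response)
    return parse_episode(episode)
```

Because only the audio URL is handed to AssemblyAI, there is no need to download or upload the episode file itself.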

CUSTOMIZING ASSEMBLYAI FOR CHAPTERIZATION

The AssemblyAI interaction functions are modified to request the auto-chapters feature instead of sentiment analysis, and variables and parameters are renamed to reflect the use of `auto_chapters`. The functions that poll the transcription status and retrieve the finished result still talk to the same transcript endpoint; what changes is the request body, which now enables `auto_chapters`, and the fields read from the response. The polling interval is also increased from 30 to 60 seconds: long podcast episodes take a while to process, so checking less often avoids sending many redundant status requests.
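The request and polling steps can be sketched as follows, assuming AssemblyAI's v2 REST API: the audio URL is POSTed to `/v2/transcript` with `auto_chapters` set to true, and the returned transcript ID is polled until its `status` becomes `completed` or `error`. The helper names `_call`, `transcribe`, and `wait_for_completion` are hypothetical; `wait_for_completion` takes the poll function as an argument so the loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: AssemblyAI v2 transcript endpoint; auto_chapters is a request field here.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def _call(url: str, api_key: str, body: Optional[dict] = None) -> dict:
    """POST `body` as JSON when given, otherwise GET; return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,  # urllib sends a POST when data is not None
        headers={"authorization": api_key, "content-type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def transcribe(audio_url: str, api_key: str) -> str:
    """Submit the audio URL with chapterization enabled and return the transcript ID."""
    reply = _call(TRANSCRIPT_ENDPOINT, api_key, {"audio_url": audio_url, "auto_chapters": True})
    return reply["id"]


def wait_for_completion(poll: Callable[[], dict], interval_s: float = 60) -> dict:
    """Call `poll` every `interval_s` seconds until the job completes or fails."""
    while True:
        result = poll()
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(interval_s)  # still queued or processing
```

In the app, the poll function would be something like `lambda: _call(f"{TRANSCRIPT_ENDPOINT}/{transcript_id}", api_key)`, used with the default 60-second interval.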

PROCESSING AND STORING CHAPTER DATA

The `save_transcript` function is central to the data processing pipeline. It now accepts an episode ID, retrieves the audio URL and metadata using `get_episode_audio_url`, and then passes the audio URL to AssemblyAI for chapterization. Instead of saving raw transcripts, the application processes the response to extract each chapter's gist, summary, and start and end times. This structured chapter data, along with the podcast title, episode title, and thumbnail, is saved into a JSON file for easy retrieval and display.
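The chapter-extraction part of `save_transcript` might look like this, assuming the completed transcript JSON carries a `chapters` list whose items have `gist`, `summary`, `start`, and `end` keys, with start and end in milliseconds. The record layout, the `build_episode_record` and `save_episode_record` names, and a file name such as `<episode_id>_chapters.json` are illustrative assumptions, not the video's exact code.

```python
import json
from pathlib import Path


def build_episode_record(result: dict, podcast_title: str, episode_title: str, thumbnail: str) -> dict:
    """Keep only the fields the UI needs from a completed auto_chapters transcript."""
    chapters = [
        {
            "gist": chapter["gist"],        # short headline for the chapter
            "summary": chapter["summary"],  # paragraph-length summary
            "start": chapter["start"],      # milliseconds from the start of the audio
            "end": chapter["end"],
        }
        for chapter in result.get("chapters") or []
    ]
    return {
        "podcast_title": podcast_title,
        "episode_title": episode_title,
        "thumbnail": thumbnail,
        "chapters": chapters,
    }


def save_episode_record(record: dict, path: Path) -> None:
    """Write the record as pretty-printed JSON so the web app can load it later."""
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
```

Storing a dictionary rather than plain text means the display code can read fields by name instead of parsing strings.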

BUILDING THE STREAMLIT WEB INTERFACE

The Streamlit library is employed to create an intuitive web interface. The application features a title, a sidebar for user input, and a button to trigger the summarization process. When the 'Get Podcast Summary' button is clicked, the `save_transcript` function is executed. Subsequently, the application loads the generated JSON file, extracts the podcast metadata and chapter details, and displays them. This includes the podcast and episode titles, a thumbnail image, and chapter information presented within expandable sections.
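A sketch of the Streamlit page, using only `st.title`, `st.sidebar.text_input`, `st.sidebar.button`, `st.header`, `st.subheader`, `st.image`, `st.expander`, and `st.write`. It assumes a hypothetical JSON file named `<episode_id>_chapters.json` with `podcast_title`, `episode_title`, `thumbnail`, and `chapters` keys. The rendering function receives the Streamlit module as a `ui` argument so it can be tested with a stand-in object, and the call to `save_transcript` is left as a comment because its exact signature lives in the supporting script. For brevity, the expander label here shows only the gist, without the start time.

```python
import json
from pathlib import Path
from typing import Any

try:
    import streamlit as st
except ImportError:  # keeps the helpers importable (and testable) without Streamlit
    st = None


def render_episode(ui: Any, record: dict) -> None:
    """Draw titles, cover art and one expander per chapter using a Streamlit-like `ui`."""
    ui.header(record["podcast_title"])
    ui.subheader(record["episode_title"])
    ui.image(record["thumbnail"], width=200)
    for chapter in record["chapters"]:
        # Collapsed by default; clicking the gist reveals the chapter summary.
        with ui.expander(chapter["gist"]):
            ui.write(chapter["summary"])


def app(ui: Any) -> None:
    ui.title("Podcast Summaries")
    episode_id = ui.sidebar.text_input("Episode ID")
    if ui.sidebar.button("Get Podcast Summary") and episode_id:
        # Hypothetical: save_transcript(episode_id) from the supporting script runs here
        # and writes the JSON file that is loaded below.
        path = Path(f"{episode_id}_chapters.json")
        record = json.loads(path.read_text(encoding="utf-8"))
        render_episode(ui, record)


if st is not None:
    app(st)  # Streamlit re-executes this script on every interaction
```

Save it as, for example, `app.py` and start it with `streamlit run app.py`.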

ENHANCING USER EXPERIENCE WITH CHAPTER EXPANDERS

Chapter details are presented using Streamlit's expander functionality. Each chapter gets its own expander, labeled with the chapter's gist (a short headline) and its start time, converted from the API's millisecond timestamps into a human-readable HH:MM:SS string. Clicking an expander reveals the detailed summary for that chapter. This layout lets users scan the episode's structure at a glance and open only the chapters they care about.
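The millisecond-to-HH:MM:SS conversion is plain integer arithmetic: divide by 1000 to get whole seconds, then split off hours and minutes with `divmod`. The helper names `format_timestamp` and `chapter_label` are hypothetical, and the `gist - HH:MM:SS` label format is an assumption about how the expander title is assembled.

```python
def format_timestamp(ms: int) -> str:
    """Convert a millisecond offset (a chapter's start or end) to HH:MM:SS."""
    total_seconds = int(ms) // 1000  # drop the sub-second remainder
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def chapter_label(chapter: dict) -> str:
    """Expander title: the chapter gist followed by its start time."""
    return f"{chapter['gist']} - {format_timestamp(chapter['start'])}"
```

For example, a chapter starting at 754,000 ms is 754 seconds in, so its label ends with `00:12:34`.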

Podcast Summarization App Development Steps

Practical takeaways from this episode

Do This

Use AssemblyAI for chapterization and summarization.
Get podcast data from Listen Notes API using episode IDs.
Structure API communication in a separate supporting script.
Modify the `save_transcript` function to handle audio URLs from Listen Notes.
Save extracted chapter data and episode metadata to a JSON file.
Use Streamlit to build a user-friendly web interface.
Display episode title, thumbnail, and chapters with summaries.
Use expanders in Streamlit to show chapter summaries.
Format chapter start times from milliseconds to human-readable format (HH:MM:SS).

Avoid This

Do not upload audio files directly to AssemblyAI; use audio URLs.
Do not use the upload endpoint or chunk size for this project.
Do not save transcript data as plain text; use JSON for easier parsing.
Do not omit error handling for API requests and file operations.
Do not forget to install Streamlit using 'pip install streamlit'.

Common Questions

Which APIs does the application use?
The application uses AssemblyAI for advanced features like automatic chaptering and summarization, and the Listen Notes API to fetch podcast episode details and audio URLs.
