Transcribe a live phone call with Python - Flask tutorial

AssemblyAI
Science & Technology | 3 min read | 33 min video
Feb 27, 2024 | 22,552 views | 445 | 16

TL;DR

Build a real-time call transcription app with Python, Flask, Twilio, and AssemblyAI.

Key Insights

1. Integrate Python, Flask, Twilio, and AssemblyAI for real-time call transcription.
2. Use ngrok to expose the local Flask application to Twilio without a cloud deployment.
3. Configure Twilio to stream audio to a Flask WebSocket endpoint.
4. Process audio data using AssemblyAI's real-time transcription API.
5. Handle the different Twilio WebSocket message types (connected, start, media, stop).
6. Decode the base64-encoded audio from Twilio into raw μ-law bytes before streaming it to AssemblyAI.
7. Automate Twilio number configuration using the Twilio REST client and ngrok.

PROJECT OVERVIEW AND ARCHITECTURE

This tutorial demonstrates how to build a real-time phone call transcription application using Python, Flask, Twilio, and AssemblyAI. The core concept involves configuring a Twilio number to forward incoming call audio streams to a local Flask application via an ngrok tunnel. AssemblyAI's real-time transcription service then processes this audio, returning partial and final transcripts that are displayed in the console.

SETUP AND DEPENDENCY INSTALLATION

To begin, users need accounts with AssemblyAI, Twilio, and ngrok, along with Python installed. Key steps include creating a project directory, setting up a .env file for API keys and credentials (ngrok auth token, Twilio Account SID, API key, API secret, and AssemblyAI API key), and creating a .gitignore file to exclude sensitive information and virtual environments from version control.
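The credential setup described above can be checked at startup. The following is a minimal sketch, not the tutorial's own code: the environment variable names are hypothetical placeholders, so substitute whatever keys your .env file actually defines.

```python
import os

# Hypothetical variable names; use the keys defined in your own .env file.
REQUIRED_KEYS = (
    "NGROK_AUTHTOKEN",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_API_KEY_SID",
    "TWILIO_API_SECRET",
    "ASSEMBLYAI_API_KEY",
)


def load_settings(env=os.environ):
    """Return the required credentials, or raise if any are missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

If you use python-dotenv, calling `load_dotenv()` before `load_settings()` would populate `os.environ` from the .env file. Failing early on a missing key tends to be easier to debug than an authentication error in the middle of a call.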

FLASK APPLICATION SKELETON AND ENDPOINTS

A Flask application is created using `flask`, with `flask-sock` providing WebSocket support. Two main endpoints are defined: `/` (the incoming-call route), which answers Twilio's initial POST request with TwiML instructions, and `/websocket`, which receives the audio stream. The TwiML returned from `/` directs the call audio to the application's WebSocket.

TWILIO INTEGRATION AND AUDIO STREAMING

The Flask application's TwiML response is modified to include a `<Connect>` tag with a nested `<Stream>` instruction. This tells Twilio to send the incoming audio stream to a WebSocket endpoint (e.g., `wss://{request.host}/websocket`). The WebSocket endpoint in Flask receives the audio as JSON messages whose payload is base64 encoded. Decoding the payload yields raw μ-law bytes (Twilio streams 8 kHz μ-law audio), which AssemblyAI accepts when the transcriber is configured for μ-law encoding, so no further format conversion is needed.
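As an illustration of the TwiML described above, here is a small sketch that builds the `<Response><Connect><Stream>` document with the standard library. The function name `build_stream_twiml` and the default `/websocket` path are assumptions for this example; the tutorial may construct the same XML as a plain string.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(host, path="/websocket"):
    """Return TwiML that tells Twilio to stream call audio to wss://<host><path>."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Stream> carries the WebSocket URL Twilio should open for the call audio.
    ET.SubElement(connect, "Stream", url=f"wss://{host}{path}")
    return ET.tostring(response, encoding="unicode")
```

In a Flask route, the returned string would be sent back to Twilio as the response body with an XML content type, passing `request.host` as `host` so the URL follows the current ngrok tunnel.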

ASSEMBLYAI REAL-TIME TRANSCRIPTION

A separate Python file (`twilio_transcriber.py`) is created to handle the AssemblyAI integration. This file defines callback functions for the transcription session events (`on_open`, `on_data`, `on_error`, `on_close`) and a `TwilioTranscriber` class that inherits from `assemblyai.RealtimeTranscriber`. The `on_data` function differentiates between partial and final transcripts. Partial transcripts are printed with a trailing carriage return so that the line updates in place, while final transcripts end with a newline so that each one stays on its own line.
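The partial-versus-final printing behavior can be sketched independently of the SDK. This is a simplified stand-in, not the tutorial's `on_data`: the `is_final` flag is an assumption that replaces whatever check the real callback performs on the transcript object it receives.

```python
import sys


def write_transcript(text, is_final, out=sys.stdout):
    """Write a transcript so partial results overwrite each other in place."""
    if not text:
        return  # nothing recognized yet, so leave the current line untouched
    # "\r" moves the cursor back to the start of the line without advancing,
    # so the next partial overwrites this one; "\r\n" commits the line.
    out.write(text + ("\r\n" if is_final else "\r"))
    out.flush()
```

Flushing after every write matters here because a line that ends in `\r` rather than a newline may otherwise sit in the output buffer and never appear.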

CONNECTING TWILIO AUDIO TO ASSEMBLYAI

In the Flask `transcription_websocket` function, an instance of `TwilioTranscriber` is created and connected to AssemblyAI's real-time service. As audio data (media messages) arrives from Twilio, each payload is decoded from base64 into raw μ-law bytes and streamed to AssemblyAI using the transcriber's `stream` method. The transcriber then prints the transcribed text as it is generated.
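The message handling described above might look like the following sketch. It assumes Twilio's Media Streams JSON format, where each message has an `event` field and media messages carry the audio under `media.payload`. The function name `handle_twilio_message` and the `send_audio` callback are hypothetical; in the tutorial's setup the callback would be the transcriber's `stream` method.

```python
import base64
import json


def handle_twilio_message(raw_message, send_audio):
    """Process one Twilio stream message; return False once the stream stops."""
    message = json.loads(raw_message)
    event = message.get("event")
    if event == "media":
        # The payload is base64 text; decoding yields the raw audio bytes
        # (mu-law at 8 kHz, assuming Twilio's default stream format).
        send_audio(base64.b64decode(message["media"]["payload"]))
    elif event == "stop":
        return False  # the call ended, so the caller should close the session
    # "connected" and "start" carry metadata only and need no audio handling.
    return True
```

A WebSocket handler could then loop with something like `while handle_twilio_message(ws.receive(), transcriber.stream): pass` and close the transcriber once the loop exits.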

AUTOMATING TWILIO NUMBER CONFIGURATION

To streamline the setup, the tutorial shows how to automate the process of updating the Twilio number's voice webhook. By loading Twilio and ngrok credentials, and using the `twilio.rest.Client`, the application can programmatically find the correct Twilio number and update its `voice_url` to the dynamically generated ngrok tunnel URL. This eliminates the need for manual configuration in the Twilio console.
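A rough sketch of the webhook update step follows. The helper name `update_voice_webhook` is hypothetical, and `client` is assumed to be an authenticated `twilio.rest.Client`. It is passed in rather than created here so the sketch does not depend on how your credentials are loaded.

```python
def update_voice_webhook(client, phone_number, public_url):
    """Point the given Twilio number's voice webhook at public_url.

    `client` is assumed to be a twilio.rest.Client; `public_url` is the
    HTTPS address of the ngrok tunnel that forwards to the Flask app.
    """
    # Look up the number on the account, filtering by its E.164 form.
    matches = client.incoming_phone_numbers.list(phone_number=phone_number)
    if not matches:
        raise LookupError(f"No Twilio number matching {phone_number}")
    # Updating voice_url changes where Twilio sends the incoming-call request.
    return matches[0].update(voice_url=public_url)
```

Running this each time the tunnel starts keeps the webhook in step with the ngrok URL, which on free accounts typically changes between sessions.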

RUNNING AND TESTING THE APPLICATION

Before running, ensure any existing ngrok tunnels are closed. The Flask application is started with `python main.py`. Users then call their Twilio number, and the audio is streamed, transcribed by AssemblyAI, and displayed in the terminal in real time. When the call ends, the transcriber session is closed.

Real-Time Phone Call Transcription Setup

Practical takeaways from this episode

Do This

Provision a Twilio number with voice capabilities.
Install Flask, Flask-Sock, AssemblyAI SDK, python-dotenv, ngrok, and Twilio libraries.
Create a .env file for API keys and authentication tokens.
Configure the Flask app to handle POST requests from Twilio and serve a WebSocket.
Use ngrok to create a public URL for your local Flask application.
Configure the Twilio number's webhook to point to your ngrok URL.
Modify TwiML to connect to the WebSocket for audio streaming.
Implement AssemblyAI's real-time transcription service.
Decode base64 encoded audio data from Twilio before sending to AssemblyAI.
Automate Twilio number webhook configuration programmatically.
Shut down ngrok tunnels and the Flask application cleanly when you are finished.

Avoid This

Do not commit sensitive API keys and tokens to version control (use .gitignore).
Do not use default Flask debug mode in production.
Do not rely solely on manual ngrok URL copying for Twilio configuration (automate it).
Do not forget to handle different message types from Twilio (connected, start, media, stop).
Do not forward Twilio's base64-encoded payload to AssemblyAI as-is; decode it to raw μ-law bytes first.
Do not leave ngrok tunnels open unnecessarily, especially on free accounts.

Common Questions

You can transcribe phone calls in real time by using Twilio to receive calls, ngrok to expose your local server, Flask to build a web application with WebSocket support, and AssemblyAI's real-time transcription service to process the audio stream.
