What services are needed to build a real-time phone call transcription system?

You will need a service for phone calls (like Twilio), a way to expose your local development server to the internet (like ngrok), a web framework (like Flask with Flask-Sock for websockets), and a speech-to-text transcription service (like AssemblyAI).

How does Twilio send audio data to my Python application?

Twilio sends audio data to your Python application via a WebSocket connection. When a call comes in, Twilio is instructed by TwiML to stream the audio to a specified WebSocket endpoint in your Flask application.

What is ngrok used for in this tutorial?

ngrok is used to create a secure public URL that tunnels traffic from the internet directly to your local machine. This allows Twilio, running on the internet, to send incoming call data to your locally running Flask application without needing to deploy it to a server.

How does AssemblyAI handle real-time transcription?

AssemblyAI's real-time service continuously receives audio data through a WebSocket. It returns partial transcripts while someone is speaking and punctuated, formatted final transcripts once an utterance is complete.

What is the sample rate and encoding for Twilio audio?

Twilio streams call audio as 8-bit PCM mu-law sampled at 8,000 Hz. Each audio chunk is base64 encoded and delivered inside a JSON message over the WebSocket.
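As a minimal sketch of what that format implies, the snippet below decodes the payload of one hand-built "media" message and computes how much audio it holds. The example message is fabricated for illustration, and the helper names `decode_media_payload` and `duration_ms` are assumptions, not part of the tutorial's code.

```python
import base64
import json

SAMPLE_RATE_HZ = 8000   # sample rate of the Twilio audio stream
BYTES_PER_SAMPLE = 1    # mu-law stores each sample in a single byte


def decode_media_payload(message: str) -> bytes:
    """Return the raw mu-law bytes carried by one "media" message."""
    data = json.loads(message)
    return base64.b64decode(data["media"]["payload"])


def duration_ms(audio: bytes) -> float:
    """Length of a mu-law chunk in milliseconds."""
    return len(audio) * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


# Hand-built example: 160 bytes of mu-law silence (0xFF), base64 encoded
example = json.dumps({
    "event": "media",
    "media": {"payload": base64.b64encode(b"\xff" * 160).decode("ascii")},
})

chunk = decode_media_payload(example)
print(len(chunk), "bytes =", duration_ms(chunk), "ms of audio")
```

Because mu-law uses one byte per sample, the byte count of a decoded chunk maps directly to its duration at 8,000 samples per second.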

How can I automate the Twilio number configuration?

You can automate the process of setting the Twilio number's voice URL webhook by using the Twilio Python REST client library. This allows you to update the number's configuration programmatically within your Python script, avoiding manual steps in the Twilio console.

Transcribe a live phone call with Python - Flask tutorial

AssemblyAI

Science & Technology|3 min read|33 min video

Feb 27, 2024|22,771 views|448|16


TL;DR

Build a real-time call transcription app with Python, Flask, Twilio, and AssemblyAI.

Key Insights

Integrate Python, Flask, Twilio, and AssemblyAI for real-time call transcription.

Use ngrok to expose the local Flask application to Twilio without cloud deployment.

Configure Twilio to stream audio to a Flask WebSocket endpoint.

Process audio data using AssemblyAI's real-time transcription API.

Handle different Twilio WebSocket message types (connected, start, media, stop).

Decode the base64-encoded audio from Twilio into raw μ-law bytes for AssemblyAI.

Automate Twilio number configuration using the Twilio REST client and ngrok.

PROJECT OVERVIEW AND ARCHITECTURE

This tutorial demonstrates how to build a real-time phone call transcription application using Python, Flask, Twilio, and AssemblyAI. The core concept involves configuring a Twilio number to forward incoming call audio streams to a local Flask application via an ngrok tunnel. AssemblyAI's real-time transcription service then processes this audio, returning partial and final transcripts that are displayed in the console.

SETUP AND DEPENDENCY INSTALLATION

To begin, users need accounts with AssemblyAI, Twilio, and ngrok, along with Python installed. Key steps include creating a project directory, setting up a .env file for API keys and credentials (ngrok auth token, Twilio Account SID, API key, API secret, and AssemblyAI API key), and creating a .gitignore file to exclude sensitive information and virtual environments from version control.

FLASK APPLICATION SKELETON AND ENDPOINTS

A Flask application is created using `flask` and `flask-sock` for WebSocket support. Two main endpoints are defined: `/` (incoming call route) to handle Twilio's initial POST request with TwiML instructions, and `/websocket` to manage the audio stream. The application needs to respond to POST requests from Twilio and return TwiML that directs the call audio to the application's WebSocket.

TWILIO INTEGRATION AND AUDIO STREAMING

The Flask application's TwiML response is modified to include a `<Connect>` tag with a `<Stream>` instruction. This tells Twilio to send the incoming audio stream to a WebSocket endpoint (e.g., `wss://{request.host}/websocket`). The WebSocket endpoint in Flask receives this audio data as base64-encoded payloads. Each payload is decoded into raw μ-law bytes, the encoding AssemblyAI is configured to expect.
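The TwiML described above can be sketched with the standard library alone. This is an illustration under assumptions: the helper name `build_stream_twiml` and the host `example.ngrok.app` are hypothetical, and the tutorial may build the response differently (for instance as a plain string returned from the Flask route).

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(host: str, path: str = "/websocket") -> str:
    """Build a TwiML document that streams call audio to a WebSocket URL."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # The <Stream> url attribute must use the secure WebSocket scheme
    ET.SubElement(connect, "Stream", url=f"wss://{host}{path}")
    return ET.tostring(response, encoding="unicode")


# In a Flask route, host would come from request.host
twiml = build_stream_twiml("example.ngrok.app")
print(twiml)
```

Deriving the URL from the request host means the same code works whichever public hostname the tunnel happens to be using.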

ASSEMBLYAI REAL-TIME TRANSCRIPTION

A separate Python file (`twilio_transcriber.py`) is created to handle AssemblyAI integration. This file defines callback functions for WebSocket events (`on_open`, `on_data`, `on_error`, `on_close`) and a `TwilioTranscriber` class that inherits from `assemblyai.RealtimeTranscriber`. The `on_data` function differentiates between partial and final transcripts, printing them dynamically to the console. Partial transcripts are printed with a carriage return to update in place, while final transcripts appear on new lines.
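The in-place printing behavior can be illustrated without the SDK. In the sketch below, `Transcript` is a hypothetical stand-in for the transcript objects passed to `on_data`, and `render` is an assumed helper name; only the carriage-return versus newline logic is the point.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical stand-in for an SDK transcript object."""
    text: str
    is_final: bool


def render(transcript: Transcript) -> str:
    """Return the text with the line ending that matches its type."""
    if not transcript.text:
        return ""  # nothing recognized yet, print nothing
    if transcript.is_final:
        return transcript.text + "\n"  # final: keep the line and move on
    return transcript.text + "\r"      # partial: next print overwrites it


updates = [
    Transcript("hello", False),
    Transcript("hello wor", False),
    Transcript("Hello world.", True),
]
for update in updates:
    print(render(update), end="", flush=True)
```

Flushing on every print matters here, since partial updates do not end in a newline and would otherwise sit in the output buffer.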

CONNECTING TWILIO AUDIO TO ASSEMBLYAI

In the Flask `transcription_websocket` function, an instance of `TwilioTranscriber` is created and connected to AssemblyAI's real-time service. As audio data (media messages) is received from Twilio, it's decoded from base64 into raw μ-law bytes and streamed to AssemblyAI using the transcriber's `stream` method. The transcriber then handles printing the transcribed text as it's generated.
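A minimal sketch of that per-message logic follows. The function name `handle_twilio_message` is an assumption, and the `stream` argument stands in for the transcriber's `stream` method, so the example runs without any network connection; the messages are hand-built for illustration.

```python
import base64
import json
from typing import Callable


def handle_twilio_message(raw: str, stream: Callable[[bytes], None]) -> bool:
    """Process one WebSocket message; return False once the call has ended."""
    data = json.loads(raw)
    event = data.get("event")
    if event == "media":
        # Decode the base64 payload and forward the raw mu-law bytes
        stream(base64.b64decode(data["media"]["payload"]))
    elif event == "stop":
        return False
    # "connected" and "start" need no audio handling in this sketch
    return True


received = []  # stands in for the transcriber's stream() destination
messages = [
    json.dumps({"event": "connected"}),
    json.dumps({"event": "start"}),
    json.dumps({"event": "media",
                "media": {"payload": base64.b64encode(b"\xff\xff").decode("ascii")}}),
    json.dumps({"event": "stop"}),
]
for raw in messages:
    if not handle_twilio_message(raw, received.append):
        break

print(received)
```

In the real endpoint, the loop would read from the WebSocket instead of a list, and the transcriber would be closed once the function returns False.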

AUTOMATING TWILIO NUMBER CONFIGURATION

To streamline the setup, the tutorial shows how to automate the process of updating the Twilio number's voice webhook. By loading Twilio and ngrok credentials, and using the `twilio.rest.Client`, the application can programmatically find the correct Twilio number and update its `voice_url` to the dynamically generated ngrok tunnel URL. This eliminates the need for manual configuration in the Twilio console.
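The update step might look roughly like the sketch below. It assumes `client` is a `twilio.rest.Client` (or any object exposing the same `incoming_phone_numbers` interface), and the function name `set_voice_webhook` is hypothetical; the tutorial's own lookup logic may differ.

```python
def set_voice_webhook(client, phone_number: str, public_url: str) -> bool:
    """Point the matching number's voice webhook at public_url.

    Returns True if a number was updated, False if none matched.
    """
    for number in client.incoming_phone_numbers.list():
        if number.phone_number == phone_number:
            # voice_url is the address Twilio POSTs to on an incoming call
            number.update(voice_url=public_url)
            return True
    return False
```

Called with a real client, the number from the .env file, and the URL reported by the ngrok tunnel, this would replace the manual copy-and-paste step in the Twilio console.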

RUNNING AND TESTING THE APPLICATION

Before running, ensure any existing ngrok tunnels are closed. The Flask application is started with `python main.py`. Users then call their Twilio number, and the audio is streamed, transcribed by AssemblyAI, and displayed in the terminal in real time. The transcriber session closes when the call ends.


Real-Time Phone Call Transcription Setup

Practical takeaways from this episode

Do This

Provision a Twilio number with voice capabilities.

Install Flask, Flask-Sock, AssemblyAI SDK, python-dotenv, ngrok, and Twilio libraries.

Create a .env file for API keys and authentication tokens.

Configure the Flask app to handle POST requests from Twilio and serve a WebSocket.

Use ngrok to create a public URL for your local Flask application.

Configure the Twilio number's webhook to point to your ngrok URL.

Modify TwiML to connect to the WebSocket for audio streaming.

Implement AssemblyAI's real-time transcription service.

Decode base64 encoded audio data from Twilio before sending to AssemblyAI.

Automate Twilio number webhook configuration programmatically.

Ensure ngrok tunnels and the Flask application are shut down properly when you are finished.

Avoid This

Do not commit sensitive API keys and tokens to version control (use .gitignore).

Do not use default Flask debug mode in production.

Do not rely solely on manual ngrok URL copying for Twilio configuration (automate it).

Do not forget to handle different message types from Twilio (connected, start, media, stop).

Do not forward Twilio's base64 payloads to AssemblyAI as-is; decode them to raw μ-law bytes first.

Do not leave ngrok tunnels open unnecessarily, especially on free accounts.

Common Questions

You can transcribe phone calls in real time by using Twilio to receive calls, ngrok to expose your local server, Flask to build a web application with WebSocket support, and AssemblyAI's real-time transcription service to process the audio stream.

Topics

Flask Twilio WebSockets Ngrok Telephony DevOps TwiML PCM Mu-law

Mentioned in this video

Software & Apps

ngrok

A tunneling service used to expose the local Flask application to the internet for Twilio to access.

Flask-Sock

A library used to add WebSocket support to Flask applications.

Flask

A micro web framework for Python used to create the web application that handles incoming calls and websockets.

Concepts

TwiML

Twilio Markup Language, used to instruct Twilio on how to handle calls, such as saying messages or connecting to websockets.

PCM mu-law

The audio encoding format used by Twilio for transmitting audio data, compatible with AssemblyAI.

graph neural networks

A type of machine learning model discussed as an advancing area of AI.

Companies

Twilio
