Python Audio Processing Basics - How to work with audio files in Python
Learn Python audio processing: WAV files, sampling rates, plotting, mic recording, and MP3s with PyDub.
Key Insights
Understanding audio file formats like MP3 (lossy), FLAC (lossless), and WAV (uncompressed) is crucial.
Key audio parameters include channels (mono/stereo), sample width (bytes per sample), and frame rate (samples per second).
Python's built-in `wave` module simplifies loading, saving, and manipulating WAV files.
Matplotlib and NumPy can be used to visualize audio waveforms from WAV files.
PyAudio allows for recording audio from a microphone and saving it as a WAV file.
PyDub, along with FFmpeg, provides a high-level interface for working with various audio formats, including MP3s, and for audio manipulation.
INTRODUCTION TO AUDIO FILE FORMATS
The tutorial begins by differentiating three common audio file formats: MP3, FLAC, and WAV. MP3 is a popular lossy compression format, meaning some data is discarded during compression. FLAC, conversely, is a lossless compression format that preserves all original data. WAV is presented as an uncompressed format: it keeps the audio data intact but produces the largest files, and the tutorial describes it as the standard for CD-quality audio. The focus then shifts to WAV because it is easy to work with using Python's built-in `wave` module.
UNDERSTANDING WAV FILE PARAMETERS
Before diving into code, essential WAV file parameters are explained. These include the number of channels (mono or stereo), sample width (the number of bytes used to represent each audio sample), and the frame rate, also known as the sample rate or sample frequency. The frame rate indicates the number of samples taken per second, with 44,100 Hz (or 44.1 kHz) being the standard for CD quality. Other parameters include the total number of frames and the binary data within each frame, which can be converted to integer values.
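The relationship between these parameters can be sketched with a little arithmetic. The values below are hypothetical, chosen to resemble CD-style stereo audio; they are not taken from the tutorial's sample file.

```python
# Hypothetical parameter values, for illustration only.
n_channels = 2        # stereo
sample_width = 2      # bytes per sample, i.e. 16-bit audio
frame_rate = 44_100   # frames (one sample per channel) per second
n_frames = 220_500    # total number of frames in the file

# One frame holds one sample for every channel.
bytes_per_frame = n_channels * sample_width
duration_seconds = n_frames / frame_rate
data_size_bytes = n_frames * bytes_per_frame

# 16-bit WAV samples are stored as little-endian signed integers,
# so two raw bytes convert to one integer value.
raw_sample = b"\xff\x7f"
sample_value = int.from_bytes(raw_sample, byteorder="little", signed=True)

print(f"{duration_seconds:.1f} s, {data_size_bytes} bytes, sample = {sample_value}")
```

Dividing the number of frames by the frame rate is the same calculation used later to build a time axis for plotting.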
WORKING WITH THE WAVE MODULE
The tutorial demonstrates how to use Python's built-in `wave` module to load and save WAV files. Loading a WAV file involves opening it in read-binary mode (`"rb"`) and reading parameters such as the channel count, sample width, frame rate, and total number of frames. The actual audio data, returned as a bytes object, is read with `readframes()`. Saving a WAV file requires opening a new file in write-binary mode (`"wb"`), setting the same parameters as the original file, and then writing the processed or original frames.
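A minimal sketch of the load-and-save round trip, using only the standard library. The helper names `write_tone` and `copy_wav`, the file names, and the generated sine tone are assumptions made so the example has a file to work with; the tutorial uses its own sample file.

```python
import math
import os
import struct
import tempfile
import wave


def write_tone(path, seconds=0.5, frame_rate=8_000, freq=440.0):
    """Write a mono 16-bit sine tone so there is a file to load."""
    n_frames = int(seconds * frame_rate)
    frames = b"".join(
        struct.pack("<h", int(16_000 * math.sin(2 * math.pi * freq * i / frame_rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(frame_rate)
        wf.writeframes(frames)


def copy_wav(src, dst):
    """Load src, then save its frames to dst with the same parameters."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(params.nchannels)
        writer.setsampwidth(params.sampwidth)
        writer.setframerate(params.framerate)
        writer.writeframes(frames)
    return params, frames


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "tone.wav")
    dst = os.path.join(tmp, "tone_copy.wav")
    write_tone(src)
    params, frames = copy_wav(src, dst)
    with wave.open(dst, "rb") as check:
        copied_frames = check.getnframes()

print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
```

Opening the files in `with` blocks closes them automatically, which also lets the `wave` module finalize the header of the written file.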
VISUALIZING AUDIO WAVEFORMS
To visualize audio signals, the tutorial introduces the use of `matplotlib` and `numpy`. After loading the WAV file's parameters and audio frames, the raw bytes data is converted into a NumPy array, specifying the data type (e.g., `int16`). A time axis is created using `numpy.linspace` based on the audio duration and the number of samples. Finally, `matplotlib.pyplot` is used to plot the signal array against the time axis, creating a visual representation of the audio waveform with labels and a title.
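The conversion and plotting steps might look roughly like the sketch below. It assumes 16-bit samples; the function names `waveform_arrays` and `plot_waveform`, the tiny hand-made `frames` value, and the choice to plot only the first channel of multi-channel audio are illustrative assumptions, not the tutorial's exact code.

```python
import numpy as np


def waveform_arrays(frames: bytes, frame_rate: int, n_channels: int = 1):
    """Return (times, signal) for 16-bit PCM bytes (sample width of 2 assumed)."""
    signal = np.frombuffer(frames, dtype="<i2")  # little-endian int16
    if n_channels > 1:
        # Channels are interleaved frame by frame; keep only the first one.
        signal = signal.reshape(-1, n_channels)[:, 0]
    duration = len(signal) / frame_rate
    times = np.linspace(0, duration, num=len(signal), endpoint=False)
    return times, signal


def plot_waveform(times, signal):
    """Plot the signal against time with axis labels and a title."""
    import matplotlib.pyplot as plt  # third-party; imported only when plotting

    plt.figure(figsize=(12, 4))
    plt.plot(times, signal)
    plt.title("Audio signal")
    plt.xlabel("Time (s)")
    plt.ylabel("Signal value")
    plt.xlim(0, times[-1])
    plt.show()


# Tiny hand-made example: four samples at a (hypothetical) rate of 4 frames/s.
frames = np.array([0, 1000, -1000, 32767], dtype="<i2").tobytes()
times, signal = waveform_arrays(frames, frame_rate=4)
print(times, signal)
```

With a real file, `frames` would come from `readframes()` and `frame_rate` from `getframerate()`, and `plot_waveform(times, signal)` would draw the waveform.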
RECORDING AUDIO WITH PYAUDIO
Recording audio from a microphone is achieved using the `PyAudio` library, which acts as a wrapper for the PortAudio I/O library. The process involves initializing `PyAudio`, setting parameters such as audio format (`paInt16`), channels, sample rate, and chunk size (`frames_per_buffer`). A stream is then opened for input. Audio data is read in chunks over a specified duration and stored in a list. After recording, the stream is stopped and closed, and the `PyAudio` instance is terminated. The collected frames can then be saved as a WAV file using the `wave` module.
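The recording steps above can be sketched as follows. The constant values (16 kHz mono, 3,200 frames per buffer, five seconds), the output file name, and the helper names `buffers_needed` and `record_to_wav` are assumptions for illustration. Running `record_to_wav` requires PyAudio to be installed and a working microphone.

```python
import wave

CHANNELS = 1                # mono
RATE = 16_000               # sample rate in Hz
FRAMES_PER_BUFFER = 3_200   # chunk size read from the stream each time
SECONDS = 5                 # recording length


def buffers_needed(rate, frames_per_buffer, seconds):
    """Number of chunk reads required to cover the requested duration."""
    return int(rate / frames_per_buffer * seconds)


def record_to_wav(path="output.wav", seconds=SECONDS):
    """Record from the default microphone and save the result as a WAV file."""
    import pyaudio  # third-party; imported here so the helper above works without it

    p = pyaudio.PyAudio()
    sample_width = p.get_sample_size(pyaudio.paInt16)
    stream = p.open(
        format=pyaudio.paInt16,
        channels=CHANNELS,
        rate=RATE,
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
    )
    frames = []
    try:
        for _ in range(buffers_needed(RATE, FRAMES_PER_BUFFER, seconds)):
            frames.append(stream.read(FRAMES_PER_BUFFER))
    finally:
        # Release the audio device even if reading fails.
        stream.stop_stream()
        stream.close()
        p.terminate()

    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))


print(buffers_needed(RATE, FRAMES_PER_BUFFER, SECONDS))
```

The list of chunks is joined into a single bytes object before writing, since `writeframes()` expects the frames as bytes.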
HANDLING MP3 FILES WITH PYDUB
For working with formats beyond WAV, such as MP3, the `PyDub` library is recommended. It requires FFmpeg to be installed. `PyDub` offers a high-level interface for loading and manipulating audio. The tutorial shows how to load an audio file (e.g., from WAV) using `AudioSegment.from_wav` and how to convert it to MP3 using the `export` method, specifying the format. `PyDub` also enables easy audio manipulation like adjusting volume, repeating clips, and applying fade-ins or fade-outs, before exporting the final audio file.
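A rough sketch of that workflow is shown below. The function name `make_louder_mp3`, its default values, and the file paths passed to it are hypothetical. Running it requires PyDub and FFmpeg to be installed. In PyDub, adding a number to a segment applies a gain in decibels and multiplying repeats the clip; the small `db_to_amplitude_ratio` helper is an added illustration of what a decibel gain means for amplitude and is not part of PyDub.

```python
def db_to_amplitude_ratio(gain_db: float) -> float:
    """Amplitude ratio corresponding to a gain in decibels."""
    return 10 ** (gain_db / 20)


def make_louder_mp3(src_wav, dst_mp3, gain_db=6, repeats=2, fade_ms=2000):
    """Load a WAV file, manipulate it, export it as MP3, and reload the result."""
    from pydub import AudioSegment  # third-party; MP3 support relies on FFmpeg

    audio = AudioSegment.from_wav(src_wav)
    audio = audio + gain_db                           # increase volume by gain_db dB
    audio = audio * repeats                           # repeat the clip
    audio = audio.fade_in(fade_ms).fade_out(fade_ms)  # fade durations in milliseconds
    audio.export(dst_mp3, format="mp3")
    return AudioSegment.from_mp3(dst_mp3)


# A gain of +6 dB roughly doubles the amplitude.
print(round(db_to_amplitude_ratio(6), 3))
```

A call such as `make_louder_mp3("input.wav", "output.mp3")` would then produce the manipulated MP3, assuming those hypothetical files and paths.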
Common Questions
MP3 is a lossy compression format, meaning some data is lost, resulting in smaller file sizes. FLAC is a lossless compression format, preserving all original data while still compressing. WAV is an uncompressed format, offering the highest quality but the largest file sizes.
Mentioned in this video
The output WAV file created by recording microphone input using PyAudio.
An example MP3 file created by manipulating a WAV file (increasing volume, repeating, adding fade in/out) using Pydub.
A sample WAV audio file used in the demonstration, approximately five seconds long.
MP3: A popular lossy audio compression format in which some data is lost during compression.
FLAC: A lossless audio compression format that allows perfect reconstruction of the original data.
WAV: An uncompressed audio file format with the largest file sizes, described in the tutorial as the standard for CD-quality audio.
PyAudio: A popular Python library providing bindings for PortAudio, used for playing and recording audio, including microphone input.
AudioSegment: A class from the PyDub library used to represent and manipulate audio data segments.
PortAudio: A cross-platform audio I/O library for which PyAudio provides bindings, enabling audio playback and recording.
PyDub: A third-party Python library providing a simple, high-level interface for loading and manipulating audio files, including MP3s.