
Python Audio Processing Basics - How to work with audio files in Python

AssemblyAI
People & Blogs | 3 min read | 25 min video
May 26, 2022|79,054 views|1,241|30
TL;DR

Learn Python audio processing: WAV files, sampling rates, plotting, mic recording, and MP3s with PyDub.

Key Insights

1. Understanding audio file formats like MP3 (lossy), FLAC (lossless), and WAV (uncompressed) is crucial.
2. Key audio parameters include channels (mono/stereo), sample width (bytes per sample), and frame rate (samples per second).
3. Python's built-in `wave` module simplifies loading, saving, and manipulating WAV files.
4. Matplotlib and NumPy can be used to visualize audio waveforms from WAV files.
5. PyAudio allows for recording audio from a microphone and saving it as a WAV file.
6. PyDub, along with FFmpeg, provides a high-level interface for working with various audio formats, including MP3s, and for audio manipulation.

INTRODUCTION TO AUDIO FILE FORMATS

The tutorial begins by distinguishing three common audio file formats: MP3, FLAC, and WAV. MP3 is a popular lossy format, meaning some audio data is discarded during compression. FLAC is a lossless format that compresses the audio while preserving all of the original data. WAV is presented as an uncompressed format: it keeps full quality, at the level used for CD audio, but produces the largest files. The rest of the tutorial focuses on WAV because Python's built-in `wave` module handles it directly.

UNDERSTANDING WAV FILE PARAMETERS

Before diving into code, the tutorial explains the essential WAV file parameters. These are the number of channels (mono or stereo), the sample width (the number of bytes used to represent each audio sample), and the frame rate, also known as the sample rate or sampling frequency. The frame rate is the number of samples taken per second, with 44,100 Hz (44.1 kHz) being the standard for CD quality. The remaining parameters are the total number of frames and the binary data within each frame, which can be converted to integer values.
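The relationship between these parameters can be checked with a little arithmetic. The sketch below is illustrative only: the helper names `bytes_per_second` and `duration_seconds` are not from the video, and the figures assume uncompressed 16-bit PCM audio.

```python
def bytes_per_second(n_channels, sample_width, frame_rate):
    """Raw PCM data rate: each frame holds one sample per channel."""
    return n_channels * sample_width * frame_rate


def duration_seconds(n_frames, frame_rate):
    """Length of a recording, given its frame count and frame rate."""
    return n_frames / frame_rate


# CD-quality stereo: 2 channels, 2 bytes (16 bits) per sample, 44,100 frames per second.
cd_rate = bytes_per_second(2, 2, 44_100)
print(cd_rate)                            # 176400 bytes per second
print(duration_seconds(441_000, 44_100))  # 10.0 seconds
```

This is also why the sample width matters when sizing buffers: one frame occupies `n_channels * sample_width` bytes, not one byte.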

WORKING WITH THE WAVE MODULE

The tutorial demonstrates how to use Python's built-in `wave` module to load and save WAV files. Loading a WAV file involves opening it in read-binary mode and then reading parameters such as the channel count, sample width, frame rate, and total number of frames. The actual audio data, returned as a bytes object, is read with `readframes()`. Saving a WAV file requires opening a new file in write-binary mode, setting the same parameters as the original file, and then writing the processed or original frames.
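A minimal sketch of that load-and-save round trip follows. It is not the video's exact code: the helpers `write_test_tone` and `copy_wav` and the file names are made up for illustration, and a short sine tone is generated first so the example does not depend on an existing audio file.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, frame_rate=16_000, seconds=1.0, freq=440.0):
    """Create a mono 16-bit WAV so the example has something to read."""
    n_frames = int(frame_rate * seconds)
    data = b"".join(
        struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq * i / frame_rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wf.setframerate(frame_rate)
        wf.writeframes(data)


def copy_wav(src, dst):
    """Read parameters and frames from src, then write them unchanged to dst."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())  # bytes object
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(params.nchannels)
        writer.setsampwidth(params.sampwidth)
        writer.setframerate(params.framerate)
        writer.writeframes(frames)
    return params, frames


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "tone.wav")
dst = os.path.join(workdir, "tone_copy.wav")
write_test_tone(src)
params, frames = copy_wav(src, dst)
print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
print(len(frames))  # nframes * nchannels * sampwidth bytes
```

Using `with` blocks closes each file automatically, which covers the "remember to close" advice without explicit `close()` calls.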

VISUALIZING AUDIO WAVEFORMS

To visualize audio signals, the tutorial introduces `matplotlib` and `numpy`. After loading the WAV file's parameters and audio frames, the raw bytes are converted into a NumPy array with an explicit data type (e.g., `int16`, which matches a sample width of 2 bytes). A time axis is created with `numpy.linspace` from the audio duration and the number of samples. Finally, `matplotlib.pyplot` plots the signal array against the time axis, producing a labeled and titled view of the audio waveform.
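The conversion and plotting steps might look like the sketch below. It assumes mono, 16-bit little-endian samples; the function names and the synthetic 440 Hz test signal are illustrative and not taken from the video. Matplotlib is imported inside `plot_waveform` so the conversion helpers can be used on their own.

```python
import numpy as np


def frames_to_array(frames):
    """Interpret raw WAV bytes as 16-bit little-endian integers (mono assumed)."""
    return np.frombuffer(frames, dtype="<i2")


def time_axis(n_samples, frame_rate):
    """Evenly spaced time stamps from 0 to the clip duration, one per sample."""
    duration = n_samples / frame_rate
    return np.linspace(0, duration, num=n_samples)


def plot_waveform(signal, times, out_path):
    """Plot amplitude against time and save the figure to out_path."""
    import matplotlib.pyplot as plt  # third-party; only needed for plotting

    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(times, signal)
    ax.set_title("Audio signal")
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Signal amplitude")
    ax.set_xlim(0, times[-1])
    fig.savefig(out_path)
    plt.close(fig)


# Stand-in for bytes returned by readframes(): one second of a 440 Hz tone.
frame_rate = 16_000
t = np.arange(frame_rate) / frame_rate
frames = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype("<i2").tobytes()

signal = frames_to_array(frames)
times = time_axis(len(signal), frame_rate)
print(signal.shape, signal.dtype, times[-1])
```

Calling `plot_waveform(signal, times, "waveform.png")` would then write the figure to disk. For stereo files the samples are interleaved, so the array would need to be split by channel before plotting.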

RECORDING AUDIO WITH PYAUDIO

Recording audio from a microphone is achieved using the `PyAudio` library, which acts as a wrapper for the PortAudio I/O library. The process involves initializing `PyAudio`, setting parameters such as audio format (`paInt16`), channels, sample rate, and chunk size (`frames_per_buffer`). A stream is then opened for input. Audio data is read in chunks over a specified duration and stored in a list. After recording, the stream is stopped and closed, and the `PyAudio` instance is terminated. The collected frames can then be saved as a WAV file using the `wave` module.
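The recording steps described above can be sketched as follows. This assumes PyAudio is installed and a working microphone is available. The constants (16 kHz mono, 3,200 frames per buffer) and the names `num_chunks` and `record_to_wav` are illustrative choices, not necessarily the values used in the video.

```python
import wave

FRAMES_PER_BUFFER = 3200  # chunk size: frames read per call
CHANNELS = 1              # mono
RATE = 16_000             # frames per second


def num_chunks(rate, frames_per_buffer, seconds):
    """How many buffer-sized reads are needed to cover the requested duration."""
    return int(rate / frames_per_buffer * seconds)


def record_to_wav(path, seconds=5):
    """Record from the default input device and save the result as a WAV file."""
    import pyaudio  # third-party wrapper around PortAudio

    p = pyaudio.PyAudio()
    sample_width = p.get_sample_size(pyaudio.paInt16)  # 2 bytes for paInt16
    stream = p.open(
        format=pyaudio.paInt16,
        channels=CHANNELS,
        rate=RATE,
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
    )
    frames = []
    try:
        for _ in range(num_chunks(RATE, FRAMES_PER_BUFFER, seconds)):
            frames.append(stream.read(FRAMES_PER_BUFFER))
    finally:
        # Release the audio device even if reading fails.
        stream.stop_stream()
        stream.close()
        p.terminate()

    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))


print(num_chunks(RATE, FRAMES_PER_BUFFER, 5))  # 25 reads for 5 seconds
```

Calling `record_to_wav("output.wav", seconds=5)` would capture about five seconds of audio. The `try`/`finally` block is a defensive addition so the stream is always closed.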

HANDLING MP3 FILES WITH PYDUB

For working with formats beyond WAV, such as MP3, the `PyDub` library is recommended. It requires FFmpeg to be installed. `PyDub` offers a high-level interface for loading and manipulating audio. The tutorial shows how to load an audio file (e.g., from WAV) using `AudioSegment.from_wav` and how to convert it to MP3 using the `export` method, specifying the format. `PyDub` also enables easy audio manipulation like adjusting volume, repeating clips, and applying fade-ins or fade-outs, before exporting the final audio file.
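A hedged sketch of that workflow is shown below. It assumes both PyDub and FFmpeg are installed; the function name `wav_to_mp3`, its default values, and the file names in the usage note are placeholders rather than the video's own.

```python
def wav_to_mp3(src_wav, dst_mp3, gain_db=6, repeats=2, fade_ms=2000):
    """Load a WAV file, apply a few simple edits, and export it as MP3."""
    from pydub import AudioSegment  # third-party; MP3 export needs FFmpeg

    audio = AudioSegment.from_wav(src_wav)
    audio = audio + gain_db         # adding a number raises the volume by that many dB
    audio = audio * repeats         # multiplying repeats the clip
    audio = audio.fade_in(fade_ms)  # fade in over the first fade_ms milliseconds
    audio.export(dst_mp3, format="mp3")
    return audio
```

For example, `wav_to_mp3("output.wav", "mashup.mp3")` would write a louder, doubled, faded-in MP3 next to the source file. An existing MP3 can be loaded in a similar way with `AudioSegment.from_mp3`.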

Python Audio Processing Cheat Sheet

Practical takeaways from this episode

Do This

Use the built-in `wave` module for WAV files.
Understand parameters like channels, sample width, and frame rate.
Use Matplotlib and NumPy to visualize audio signals.
Install PyAudio for microphone recording.
Install FFmpeg and PyDub for handling MP3 and other formats.
Remember to close file and stream objects after use.

Avoid This

Do not ignore the sample width when calculating frame sizes.
Do not forget to install the necessary libraries (Matplotlib, NumPy, PyAudio, PyDub) and dependencies (FFmpeg).
Do not try to load MP3s directly with the `wave` module.
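The last point can be demonstrated with the standard library alone. In this sketch the "MP3" is just placeholder bytes rather than real audio, and `is_readable_wav` is a hypothetical helper. The point is only that `wave` rejects anything that does not start with a RIFF/WAVE header.

```python
import os
import tempfile
import wave


def is_readable_wav(path):
    """Return True if the wave module can open the file, False otherwise."""
    try:
        with wave.open(path, "rb"):
            return True
    except (wave.Error, EOFError):
        return False


workdir = tempfile.mkdtemp()

# Placeholder bytes standing in for an MP3 file (not real MP3 audio).
fake_mp3 = os.path.join(workdir, "clip.mp3")
with open(fake_mp3, "wb") as f:
    f.write(b"ID3" + bytes(61))

# A genuine, silent WAV file for comparison.
real_wav = os.path.join(workdir, "silence.wav")
with wave.open(real_wav, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16_000)
    wf.writeframes(bytes(3200))

print(is_readable_wav(fake_mp3))  # False
print(is_readable_wav(real_wav))  # True
```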

Common Questions

What is the difference between MP3, FLAC, and WAV?

MP3 is a lossy compression format, meaning some data is lost, resulting in smaller file sizes. FLAC is a lossless compression format, preserving all original data while still compressing. WAV is an uncompressed format, offering the highest quality but the largest file sizes.
