
Data Analysis 6: Principal Component Analysis (PCA) - Computerphile

Computerphile · Education · 4 min read · 21 min video
Jul 9, 2019 · 178,548 views
TL;DR

PCA transforms data to reveal underlying structure, ordering axes by variance for potential reduction.

Key Insights

1. PCA is a data transformation technique, not strictly data reduction, that reorients data for better separation and clustering.
2. Before PCA, data must be standardized (mean zero, standard deviation one) to prevent features with larger scales from dominating.
3. PCA finds new orthogonal axes (principal components) that maximize the variance of the data, ordering them from most to least significant.
4. The first principal component captures the most variance; subsequent components capture progressively less, allowing for ordered dimensionality reduction.
5. Dimensionality reduction can be achieved by keeping only the top principal components that cumulatively explain a desired percentage of the total variance.
6. Each principal component is a weighted sum of the original attributes, representing a new direction in the data space.

PCA AS A DATA TRANSFORMATION TECHNIQUE

Principal Component Analysis (PCA) is widely known as a data-reduction technique, but its primary function is data transformation: it reorients data to reveal underlying structure and improve separation, which can then facilitate reduction. PCA finds a "different view" of the data, potentially making it more suitable for clustering or machine learning tasks. The transformation itself does not reduce dimensions, but it orders the new axes by usefulness, enabling subsequent reduction by discarding the less significant dimensions.

THE IMPORTANCE OF DATA STANDARDIZATION

For PCA to be effective, data standardization is a crucial prerequisite. This process involves centering all attributes around a mean of zero and scaling them to a standard deviation of one. Without standardization, features with larger scales would disproportionately influence the principal components, leading the analysis to prioritize variance from these dominant features rather than true underlying data structure. Standardization ensures all attributes contribute equally to the PCA process.
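The standardization step can be sketched in a few lines of NumPy (the document names no specific tooling, so the data and library choice here are illustrative):

```python
import numpy as np

# Toy data: two attributes on very different scales.
X = np.array([[170.0, 1000.0],
              [180.0, 3000.0],
              [160.0, 2000.0],
              [175.0, 4000.0]])

# Standardize each column: subtract its mean, divide by its standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # each column now has mean ~0
print(X_std.std(axis=0))   # each column now has standard deviation 1
```

In practice, a library utility such as scikit-learn's `StandardScaler` does the same thing and remembers the fitted means and deviations so new data can be scaled consistently.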

IDENTIFYING PRINCIPAL COMPONENTS

PCA seeks new axes, or principal components (PCs), that best capture the data's spread. The first principal component (PC1) is the direction that maximizes the variance in the data. This is achieved by finding an axis where the data points projected onto it are maximally spread out, or conversely, where the sum of squared distances of points to this new axis is minimized. Subsequent principal components are found orthogonally to the previous ones, capturing the next largest amount of remaining variance.
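One standard way to compute these axes, sketched below on made-up 2-D data, is to take the eigenvectors of the covariance matrix: each eigenvector is a principal component, and its eigenvalue is exactly the variance of the data projected onto it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data so there is a clear direction of maximum spread.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])
X = X - X.mean(axis=0)  # centre the data

# Eigenvectors of the covariance matrix are the principal components;
# eigenvalues give the variance captured along each one.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # returned in ascending order
order = np.argsort(eigvals)[::-1]        # reorder: largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1 = eigvecs[:, 0]
# The variance of the projection onto PC1 equals the top eigenvalue.
print(np.var(X @ pc1, ddof=1), eigvals[0])
```

Maximizing projected variance and minimizing the sum of squared distances to the axis are two views of the same optimization, since the total variance of the centred data is fixed.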

VARIANCE EXPLANATION AND ORDERING

Each principal component is ranked by the amount of variance it explains in the dataset. PC1 accounts for the most variance, PC2 for the second most, and so on, down to the last PC which explains the least. This ordering is fundamental to PCA's utility. The cumulative variance explained by the initial principal components provides a measure of how much information is retained. For example, the first few PCs might explain a large percentage of the total variance.
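With scikit-learn (an illustrative choice; the video's own tooling is not specified here), this ordering is exposed directly as `explained_variance_ratio_`, which is always non-increasing and sums to one over all components:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Five synthetic attributes with deliberately unequal spreads.
X = rng.normal(size=(300, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

pca = PCA().fit(X)
ratios = pca.explained_variance_ratio_
print(ratios)  # ordered from largest share of variance to smallest
```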

DIMENSIONALITY REDUCTION WITH PCA

A key application of PCA is dimensionality reduction. By examining the cumulative variance explained by the principal components, one can decide on a cutoff point. For instance, if the first 10 PCs explain 95% of the total variance, one might choose to discard the remaining PCs, effectively reducing the dataset's dimensionality from its original hundreds or thousands of attributes to just 10. This retains most of the data's information while simplifying subsequent analysis.
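The cutoff decision can be automated by scanning the cumulative variance for the first component count that crosses the chosen threshold. A sketch on synthetic 50-dimensional data (the 95% threshold and the data itself are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# 50-dimensional data where most of the variance lives in a few directions.
X = rng.normal(size=(400, 50)) * np.linspace(10.0, 0.1, 50)

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 95%.
k = int(np.searchsorted(cumulative, 0.95) + 1)
X_reduced = PCA(n_components=k).fit_transform(X)
print(k, X_reduced.shape)
```

scikit-learn also accepts a fraction directly, e.g. `PCA(n_components=0.95)`, which performs this same selection internally.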

INTERPRETING PRINCIPAL COMPONENTS

Each principal component is a linear combination, or weighted sum, of the original attributes. For example, PC1 might be represented as `w1*attribute1 + w2*attribute2 + ... + wn*attributen`. The weights (`w`) indicate the contribution of each original attribute to that principal component. While the exact meaning of these components can be challenging to interpret directly, especially in high-dimensional spaces, they provide new axes that can better separate data points, as demonstrated with the music genre example.
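The "weighted sum" view can be checked numerically: in scikit-learn the weights live in `components_`, and applying them to the centred attributes reproduces the transformed scores exactly (the data below is synthetic, for illustration only):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)

# Each row of components_ holds the weights w1..wn for one PC.
w = pca.components_[0]
# PC1 score of a point = weighted sum of its (centred) attribute values.
manual_pc1 = (X - pca.mean_) @ w
print(np.allclose(manual_pc1, scores[:, 0]))  # True
```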

PRACTICAL APPLICATION WITH MUSIC DATA

The video illustrates PCA using a music dataset with numerous audio features. After standardizing the data, PCA is applied. The summary of the PCA output shows the proportion of variance explained by each component. For this dataset, PC1 explained about 11.7% of the variance, and cumulatively, the first 133 PCs explained 99% of the variance. This suggests that a significant dimensionality reduction could be achieved by retaining only these 133 components.

VISUALIZING TRANSFORMED DATA

The transformed data, projected onto the first two principal components (PC1 vs. PC2), is visualized using music genre data. While the overall plot appears somewhat jumbled, plotting specific genres like rock, electronic, and classical reveals some separation. Rock tends to appear on the right, electronic on the lower left, and classical at the top. This demonstrates that even with only two components, PCA can start teasing apart distinct groups within the data, although more components might be necessary for clearer separation.

THE ROLE OF PCA IN DATA CLEANING

PCA is part of a broader data cleaning, transformation, and reduction process. By transforming data into a new space defined by principal components, it makes the data more amenable to algorithms like clustering, classification, or regression. The ability to order dimensions by variance makes it easier to decide where to cut off and reduce dimensions, aiming to extract maximum knowledge from the data in the most efficient way possible for further analysis.
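The standardize-transform-reduce-cluster sequence described above can be chained into a single pipeline; the sketch below uses scikit-learn with two synthetic, well-separated groups (the component count and cluster count are illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Two well-separated blobs in 10-D, with attributes on wildly different scales.
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 10)),
               rng.normal(4.0, 1.0, size=(50, 10))]) * np.linspace(1, 100, 10)

# Standardize, keep the top 3 principal components, then cluster.
pipe = make_pipeline(StandardScaler(),
                     PCA(n_components=3),
                     KMeans(n_clusters=2, n_init=10, random_state=0))
labels = pipe.fit_predict(X)
print(np.bincount(labels))  # cluster sizes
```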

Principal Component Analysis (PCA) Quick Guide

Practical takeaways from this episode

Do This

Standardize your data (center around zero, standard deviation of one) before applying PCA, especially if attributes have different scales.
Understand that PCA is primarily a data transformation technique, finding new axes that maximize variance and minimize error.
Use the variance explained by each principal component to decide where to cut off for dimensionality reduction.
Consider keeping components that explain a high cumulative percentage of variance (e.g., 99%) for effective data reduction.
Visualize PCA results by plotting data points on the first few principal components to see potential data separation.
Use PCA as part of a data cleaning, transformation, and reduction pipeline before machine learning tasks like clustering or classification.

Avoid This

Do not treat PCA as solely a data reduction technique; it transforms data first.
Do not apply PCA to data with widely different scales without standardization, as this will bias the results.
Do not assume the first few principal components directly correspond to original features; they are weighted sums.
Do not neglect to examine PCA loadings if you need to understand the meaning of the new axes.
Do not stop at just two principal components if they do not adequately separate your data, especially if much variance remains.

Cumulative Variance Explained by Principal Components

Data extracted from this episode

| Principal Component | Proportion of Variance | Cumulative Proportion |
|---|---|---|
| PC1 | 11.6% | 11.6% |
| PC2 | 8% | 20% |
| PC3 | 5% | 25% |
| ... | ... | ... |
| PC133 | | 99% |

Common Questions

Is PCA a data reduction technique?

PCA is primarily a data transformation technique that finds new axes (principal components) for data, maximizing the variance along these new axes. This helps in better separating and visualizing data, and as a side effect, orders these axes by usefulness, enabling subsequent data reduction.
