Unlocking Tomorrow with Aiblogtech Today


How to Do Principal Component Analysis (PCA) with Python

Principal component analysis (PCA) is a dimensionality reduction technique that projects high-dimensional data onto a lower-dimensional space while preserving as much of the data’s variance as possible. It is widely used to speed up the training of machine learning models, to uncover structure in data, and to visualize high-dimensional data in two or three dimensions.
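To make the idea concrete, here is a minimal from-scratch sketch that assumes only NumPy. It centers the data, takes a singular value decomposition (SVD), and projects onto the directions with the largest variance. The helper name `pca_svd` and the synthetic data are illustrative assumptions, not part of scikit-learn.

```python
import numpy as np


def pca_svd(X, n_components):
    """Hypothetical helper (not a scikit-learn API): project X onto its
    top principal components and report each one's share of the variance."""
    # PCA is defined on centered data, so give every feature zero mean
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Variance along each direction is S**2 / (n_samples - 1)
    explained_var = S**2 / (X.shape[0] - 1)
    ratio = explained_var / explained_var.sum()
    # Project the centered data onto the first n_components directions
    X_proj = X_centered @ Vt[:n_components].T
    return X_proj, ratio[:n_components]


# Illustrative synthetic data: three noisy copies of one hidden signal
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent, 2 * latent, latent]) + 0.1 * rng.normal(size=(200, 3))

X_proj, ratio = pca_svd(X, n_components=2)
print(X_proj.shape)
print(ratio)
```

Because the three synthetic features are driven by one hidden signal, the first component should capture almost all of the variance. scikit-learn's `PCA` computes the same projection, possibly with some components flipped in sign.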

Here’s how to apply PCA in Python with the PCA class from scikit-learn’s sklearn.decomposition module:

Import the required libraries:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
# Import the PCA class from scikit-learn
from sklearn.decomposition import PCA

Prepare or load the dataset:

For this example, we’ll use the scikit-learn Iris dataset:

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

Standardize the data:

Because PCA is sensitive to the scale of the features, it is a good idea to standardize the data before applying it; otherwise, features measured on larger scales dominate the principal components:

# Standardize the data
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
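To see what standardization actually does, the quick check below compares StandardScaler against the manual formula: subtract each column’s mean and divide by its population standard deviation. The variable name `X_manual` is only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# StandardScaler subtracts each column's mean and divides by the column's
# population (ddof=0) standard deviation
X_std = StandardScaler().fit_transform(X)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_std, X_manual))  # True
print(X.std(axis=0).round(2))        # the raw features have different spreads
print(X_std.std(axis=0).round(2))    # after scaling, every feature has unit spread
```

Without this step, petal length, which has the widest spread of the four Iris features, would pull the first component toward itself largely because of its units.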

Apply PCA:

# Create a PCA object specifying the number of components you want to keep (in this example, 2 components)
pca = PCA(n_components=2)

# Fit PCA to the standardized data and project the data onto the 2 components
X_pca = pca.fit_transform(X_std)
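After fitting, the PCA object exposes how much variance each component keeps (`explained_variance_ratio_`) and how each original feature contributes to it (`components_`). The sketch below repeats the steps above so that it runs on its own, then prints these attributes and confirms that the transform is a projection of the centered data onto the components.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()
X_std = StandardScaler().fit_transform(data.data)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

# Share of the total variance captured by each kept component
print(pca.explained_variance_ratio_)
# Total share retained by the 2-D projection
print(pca.explained_variance_ratio_.sum())

# Each row of components_ is one component; each column is the weight
# (loading) of one original feature in that component
for i, row in enumerate(pca.components_):
    weights = ", ".join(f"{name}={w:+.2f}" for name, w in zip(data.feature_names, row))
    print(f"PC{i + 1}: {weights}")

# The transform is a projection of the centered data onto the components
print(np.allclose(X_pca, (X_std - pca.mean_) @ pca.components_.T))  # True
```

On the standardized Iris data, the first two components should account for roughly 73% and 23% of the variance, so the 2-D plot keeps about 96% of it.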

Visualize the transformed data:

# Scatter plot the transformed data with the first and second principal components
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('PCA: Iris Dataset')
# Display the figure
plt.show()

The scatter plot shows the data points projected onto the first two principal components, with each class of the Iris dataset drawn in a different color.

Steps of Principal Component Analysis (PCA)


PCA automatically sorts the components in decreasing order of explained variance. The first principal component captures the largest share of the variance in the data, the second captures the largest share of the remaining variance, and so forth. The n_components parameter sets how many principal components to keep.
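One common way to choose n_components is to keep the smallest number of components whose cumulative explained variance reaches a target such as 95%. The sketch below does this by hand with NumPy and then through scikit-learn’s built-in option: passing a float between 0 and 1 as n_components. The 95% threshold is an illustrative assumption, not a universal rule.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)

# Fit with all components to see the full, sorted variance spectrum
pca_full = PCA().fit(X_std)
ratios = pca_full.explained_variance_ratio_
cumulative = np.cumsum(ratios)

# Components come out sorted from largest to smallest explained variance
print(np.all(np.diff(ratios) <= 0))  # True

# Smallest number of components whose cumulative share reaches 95%
k = int(np.argmax(cumulative >= 0.95)) + 1
print(k, cumulative.round(3))

# scikit-learn can make the same choice directly: a float in (0, 1) keeps
# just enough components for the explained variance to exceed that share
pca_95 = PCA(n_components=0.95).fit(X_std)
print(pca_95.n_components_)
```

For the standardized Iris data, both approaches should settle on two components.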
Remember that PCA is an unsupervised technique: the transformation does not use the class labels. It is often applied as a preprocessing step before supervised machine learning algorithms, to reduce the dimensionality of the data and to filter out noise or less informative directions.
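As a preprocessing step, PCA is usually placed inside a scikit-learn Pipeline so that the scaler and the projection are refit on each training split and never see the held-out data. The sketch below assumes logistic regression as the downstream classifier and 5-fold cross-validation. Both are illustrative choices; any supervised estimator could take their place.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# StandardScaler and PCA are refit inside each training fold, and PCA never
# sees y; only the final classifier (an illustrative choice) uses the labels
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)

# Accuracy on each of the 5 held-out folds
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Keeping PCA inside the pipeline avoids leaking information from the validation folds into the learned components, which would happen if PCA were fit on the full dataset first.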


