Unlocking Tomorrow with Aiblogtech Today

How to do K-Nearest Neighbor Classification with Python?

The K-Nearest Neighbor (KNN) classification technique assigns a data point to the majority class among its k nearest neighbors in the feature space. It is a simple, intuitive approach. Rather than fitting explicit model parameters during training, it stores the entire training dataset and uses it at prediction time, which is why KNN is often called a lazy learner.
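
To make the idea concrete, here is a minimal from-scratch sketch of the prediction step in NumPy, assuming Euclidean distance and a plain (unweighted) majority vote. The function name knn_predict and the six-point toy dataset are hypothetical, chosen only for illustration; this is not how scikit-learn implements KNN internally.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, x_query, k=3):
    """Hypothetical helper: predict one point's label by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query point to every stored training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances (nearest first)
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]


# Toy training data (hypothetical): two small clusters labeled 0 and 1
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.15, 0.1]), k=3))  # 0
print(knn_predict(X_train, y_train, np.array([0.95, 1.0]), k=3))  # 1
```

There is no training step here: the "model" is simply X_train and y_train. With an even k a vote can tie, and Counter.most_common then returns whichever tied label it encountered first, which is one reason odd values of k are common for two-class problems.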

Difference between K-Nearest Neighbor Classification and K-Means Clustering

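[Figure: difference between k-nearest neighbor classification and k-means clustering]

In brief, KNN is a supervised classification method: it needs labeled training data, and k is the number of neighbors that vote on a new point's class. K-means is an unsupervised clustering method: it groups unlabeled data into k clusters, each represented by its centroid.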
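
In scikit-learn, the contrast between the two shows up in how each estimator is fit. The sketch below is illustrative only: it fits KNeighborsClassifier (which needs the labels y) and sklearn.cluster.KMeans (which sees only X) on the same synthetic data used later in this post; n_init=10 and random_state=42 for KMeans are arbitrary choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=2, n_classes=2,
                           n_clusters_per_class=1, n_redundant=0, random_state=42)

# KNN is supervised: fit() requires the labels y, and predict() returns class labels
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict(X[:5]))

# k-means is unsupervised: fit() sees only X, and k is the number of clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
# Cluster IDs are arbitrary and need not match the class labels in y
print(kmeans.labels_[:5])
```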

You can build a K-Nearest Neighbors classifier in Python with the KNeighborsClassifier class from scikit-learn's sklearn.neighbors module, as follows.
Import the required libraries:

# make_classification generates a synthetic classification dataset
from sklearn.datasets import make_classification
# train_test_split divides the data into training and test sets
from sklearn.model_selection import train_test_split
# KNeighborsClassifier implements the KNN classifier
from sklearn.neighbors import KNeighborsClassifier
# Metrics for evaluating the predictions
from sklearn.metrics import accuracy_score, classification_report

Create a dataset or load your own data:
For this example, we use make_classification to generate synthetic data:

# Generate synthetic data
X, y = make_classification(n_samples=100, n_features=2, n_classes=2, n_clusters_per_class=1, n_redundant=0, random_state=42)

From the data, create training and test sets:

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
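
If you want both sets to keep the same class balance, train_test_split also accepts a stratify argument. The sketch below is a variation on the split above (not part of the original walkthrough) that stratifies on y and prints the resulting shapes and per-class counts.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=2, n_classes=2,
                           n_clusters_per_class=1, n_redundant=0, random_state=42)

# stratify=y keeps the class proportions of y in both the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
# Number of samples per class in each split
print(np.bincount(y_train), np.bincount(y_test))
```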

Build and train the K-Nearest Neighbors classifier:

# Create a KNeighborsClassifier object with the desired number of neighbors (k)
knn_classifier = KNeighborsClassifier(n_neighbors=3)

# Train the classifier on the training data
knn_classifier.fit(X_train, y_train)
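
Because fitting a KNN model mostly amounts to storing the training data, it can help to look at which stored points actually decide a prediction. The sketch below uses the classifier's kneighbors method on the first test sample, reusing the same data, split, and k=3 as this walkthrough.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=2, n_classes=2,
                           n_clusters_per_class=1, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X_train, y_train)  # stores the training data; no weights are learned

# Find the 3 nearest training points for the first test sample
distances, indices = knn_classifier.kneighbors(X_test[:1])
print(distances)            # distances to the 3 neighbors, sorted ascending
print(y_train[indices[0]])  # labels of those neighbors; the majority label is predicted
print(knn_classifier.predict(X_test[:1]))
```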

Predict the labels of the test set:

# Use the trained classifier to make predictions on the test data
y_pred = knn_classifier.predict(X_test)
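
Besides hard labels, the classifier can report how the neighbors voted. This sketch calls predict_proba on the same setup; assuming the default uniform weights, each probability is the fraction of the 3 neighbors belonging to that class.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=2, n_classes=2,
                           n_clusters_per_class=1, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
knn_classifier = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# With uniform weights and k=3, every probability is 0, 1/3, 2/3, or 1
proba = knn_classifier.predict_proba(X_test)
print(proba[:5])
print(knn_classifier.classes_)  # class order of the probability columns
```

The predicted label is the class with the highest vote fraction, so predict and predict_proba always agree here.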

Examine the classifier’s output:

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Print the classification report
print(classification_report(y_test, y_pred))

The classification report shows the precision, recall, F1-score, and support for each class, along with the overall accuracy, so you can use it to evaluate the K-Nearest Neighbors classifier's performance on the test set.
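
To see which classes the errors fall into, a confusion matrix complements the report. The sketch below applies sklearn.metrics.confusion_matrix to the same data and split; passing labels=[0, 1] is an assumption that fixes the matrix to 2x2 even if a class were missing from the test set.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=2, n_classes=2,
                           n_clusters_per_class=1, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
y_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)

# Correct predictions lie on the diagonal, so trace / total equals the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(cm.trace() / cm.sum(), accuracy)
```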

The choice of the number of neighbors (k) can significantly affect the KNN classifier's performance. A small k makes predictions sensitive to noise in individual training points and can lead to overfitting, whereas a large k can cause underfitting by over-smoothing the decision boundary. To find a good value for k, you can use techniques such as grid search with cross-validation, similar to what we previously discussed for hyperparameter tuning.
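
As one way to apply this, here is a sketch that uses scikit-learn's GridSearchCV to choose k by 5-fold cross-validation on the training set only. The candidate range (odd values from 1 to 15) and cv=5 are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=2, n_classes=2,
                           n_clusters_per_class=1, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Odd k values avoid tied votes in a two-class problem
param_grid = {"n_neighbors": list(range(1, 16, 2))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)  # the k with the highest mean cross-validated accuracy
print(f"CV accuracy: {search.best_score_:.2f}")
# The best model is refit on all training data, then scored once on the held-out test set
print(f"Test accuracy: {search.score(X_test, y_test):.2f}")
```

Keeping the test set out of the search means the reported test accuracy was not used to pick k, so it remains an honest estimate.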

