The Naïve Bayes classifier is a simple yet powerful machine learning technique based on Bayes’ theorem. It is widely used for classification across many data types, especially in text categorization and natural language processing (NLP).
The word “Naïve” refers to the assumption that features are conditionally independent given the class label. Although this assumption rarely holds exactly in real-world data, Naïve Bayes handles high-dimensional feature spaces well and often performs surprisingly well in practice.
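To make the decision rule concrete: Naïve Bayes picks the class that maximizes P(class) × P(feature₁ | class) × … × P(featureₙ | class). Here is a minimal toy sketch of that rule; every probability below is invented purely for illustration:
# Toy illustration of the Naive Bayes decision rule:
# pick the class maximizing P(class) * product of P(word | class).
# All probabilities here are made up for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "winner": 0.20},
    "ham": {"free": 0.05, "winner": 0.01},
}
words = ["free", "winner"]
scores = {}
for label in priors:
    score = priors[label]
    for word in words:
        score *= likelihoods[label][word]  # the "naive" independence step
    scores[label] = score
print(max(scores, key=scores.get))  # prints "spam" (0.024 vs. 0.0003)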

Create a Naïve Bayes classifier in Python
To create a Naïve Bayes classifier in Python, use the sklearn.naive_bayes module as follows:
First, import the necessary libraries:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
Then, load or prepare your dataset. For this example, let’s use the scikit-learn Iris dataset:
# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target
Next, split the data into training and test sets:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Then, create and train the Naïve Bayes classifier. Because the Iris features are continuous measurements, the Gaussian variant is the appropriate choice here:
# Create a Gaussian Naive Bayes classifier
nb_classifier = GaussianNB()
# Train the classifier on the training data
nb_classifier.fit(X_train, y_train)
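Because this is the Gaussian variant, fitting simply estimates a mean and variance for each feature within each class. If you are curious, you can inspect these learned parameters (note that the var_ attribute assumes scikit-learn 1.0 or newer):
# Per-class feature means and variances learned during fit
print(nb_classifier.theta_)  # shape: (n_classes, n_features)
print(nb_classifier.var_)    # requires scikit-learn >= 1.0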
Next, make predictions on the test set:
# Use the trained classifier to make predictions
y_pred = nb_classifier.predict(X_test)
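Optionally, you can also look at the posterior probabilities behind each prediction with predict_proba:
# Posterior probability of each class for the first three test samples
print(nb_classifier.predict_proba(X_test[:3]))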
Finally, evaluate the classifier’s performance:
# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
# Print the classification report
print(classification_report(y_test, y_pred))
The output of the classification_report function can be used to evaluate the Naïve Bayes classifier’s performance on the test set. For each class, it reports precision, recall, F1-score, and support.
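To complement these per-class metrics, a confusion matrix shows exactly which classes are being mistaken for which. This is a small optional addition using scikit-learn’s confusion_matrix:
from sklearn.metrics import confusion_matrix
# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))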
In conclusion, keep in mind that Naïve Bayes relies on the assumption of feature independence, which may not hold in practice. If your data clearly violates this assumption, consider alternative classifiers such as support vector machines (SVMs), random forests, or decision trees. Even so, Naïve Bayes remains a strong baseline model and is especially valuable for text classification tasks, as the brief sketch below illustrates.
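For the text-classification use case, the Multinomial variant paired with a bag-of-words representation is the usual starting point. Here is a minimal sketch; the tiny corpus and labels are invented purely for illustration:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus for illustration only
texts = ["free prize winner", "meeting at noon", "win a free prize", "lunch meeting today"]
labels = ["spam", "ham", "spam", "ham"]

# Convert the texts to bag-of-words count vectors
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)

# Train a Multinomial Naive Bayes classifier on the counts
text_classifier = MultinomialNB()
text_classifier.fit(X_counts, labels)

print(text_classifier.predict(vectorizer.transform(["free lunch winner"])))  # -> ["spam"]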