Classification with KNN and Gaussian Naive Bayes

This page covers Iris classification with K-Nearest Neighbors and Gaussian Naive Bayes, including accuracy, confusion matrix, precision, recall, F1-score, and classification reports.

What you should be able to do

Train a KNN classifier.
Predict labels and compare them with true labels.
Read confusion matrices and classification reports.
Compare models by accuracy and class-level metrics.

Reusable patterns

Classification predicts discrete class labels.
accuracy_score(y_test, y_pred) returns the fraction of predicted labels that match the actual labels.
classification_report summarizes precision, recall, F1-score, and support.

Classification

K-Nearest Neighbors

Listing 1. Fetch the data

# fetch the data
from sklearn import datasets
dataset = datasets.load_iris()
dataset.keys()

Expected text output or note

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

Listing 2. Class labels for each sample

dataset.target # contains class labels for each element

Expected text output or note

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
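
The label array above is easier to read as per-class counts. The following is a minimal sketch, assuming NumPy is available next to scikit-learn; the variable name counts is introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets

dataset = datasets.load_iris()

# Count how many samples carry each integer label (0, 1, 2).
counts = np.bincount(dataset.target)

# Pair each class name with its sample count.
for name, count in zip(dataset.target_names, counts):
    print(f"{name}: {count}")
```

Each of the three classes should appear 50 times, which is why the array above is split into three equal runs.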

Listing 3. Class names

dataset.target_names # class names

Expected text output or note

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

Listing 4. Feature names

dataset.feature_names # feature names

Expected text output or note

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

Listing 5. Split the data

# split the data
X = dataset.data # input features (input data)
y = dataset.target # class labels (target)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.5, random_state = 42)

train_test_split returns four arrays. With test_size = 0.5, half of the 150 samples go to each split:

X_train: input data for training.
X_test: input data for testing.
y_train: correct labels for the training data.
y_test: correct labels for the test data.
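
A quick way to confirm what the split produced is to print the shape of each array. This sketch repeats the split from Listing 5 so that it runs on its own.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset = datasets.load_iris()
X, y = dataset.data, dataset.target

# Same arguments as Listing 5: half of the samples go to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# Feature arrays are 2-D (samples x features); label arrays are 1-D.
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

The split here is not stratified, so the class counts in the test set need not be equal. Passing stratify=y to train_test_split is one option if equal class proportions matter.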

Listing 6. Define the classification model

# define the classification model
from sklearn.neighbors import KNeighborsClassifier
knn_model=KNeighborsClassifier(n_neighbors = 3)
knn_model.fit(X_train, y_train)

Expected text output or note

KNeighborsClassifier(n_neighbors=3)

Listing 7. Inspect the classes learned by KNN

knn_model.classes_

Expected text output or note

array([0, 1, 2])

Listing 8. Inspect the distance metric used by KNN

knn_model.effective_metric_

Expected text output or note

'euclidean'
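
To make the role of the Euclidean metric concrete, the sketch below classifies each test sample by hand: it computes the distance to every training sample, keeps the 3 closest, and takes a majority vote. The helper knn_predict_one is hypothetical and written only for illustration; it is not how scikit-learn implements the search, and its tie-breaking may differ from KNeighborsClassifier.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.5, random_state=42
)

def knn_predict_one(x, X_ref, y_ref, k=3):
    """Hypothetical helper: majority vote among the k nearest training samples."""
    # Euclidean distance from x to every reference sample.
    distances = np.sqrt(((X_ref - x) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count votes per class; argmax picks the lowest label on a tie.
    votes = np.bincount(y_ref[nearest])
    return int(np.argmax(votes))

manual_pred = np.array([knn_predict_one(x, X_train, y_train) for x in X_test])
manual_accuracy = float((manual_pred == y_test).mean())
print(manual_accuracy)
```

The result should be close to the score reported by the fitted model, with small differences possible where distances or votes tie.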

Listing 9. Model score

# model score
knn_model.score(X_test, y_test)

Expected text output or note

0.9733333333333334

Listing 10. Predict classes for the test data

# predict classes for the test data
y_pred_KNN = knn_model.predict(X_test)

Listing 11. Display the predicted labels

# display the array of predicted labels
y_pred_KNN

Expected text output or note

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0, 1, 1, 2, 1, 2, 1, 2, 1, 0, 2, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0,
       1, 2, 0, 1, 2, 0, 2, 2, 1])

Now we have:

y_test = actual classes.
y_pred_KNN = predicted classes.

Listing 12. Accuracy with accuracy_score

# accuracy computed from the predictions; matches knn_model.score in Listing 9
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred_KNN)

Expected text output or note

0.9733333333333334

Listing 13. Example for one flower

# example for one flower
class_names={0:'setosa', 1:'versicolor', 2:'virginica'}
print("For the first flower in the test set, the predicted class is {}, and the actual class is {}.".format(class_names[y_pred_KNN[0]], class_names[y_test[0]]))

Expected text output or note

For the first flower in the test set, the predicted class is versicolor, and the actual class is versicolor.

Confusion matrix

Listing 14. Compute the confusion matrix

from sklearn.metrics import confusion_matrix
confusion_matrix(y_test,y_pred_KNN) # actual values first, then predicted values; rows are actual classes, columns are predicted classes

Expected text output or note

array([[29,  0,  0],
       [ 0, 23,  0],
       [ 0,  2, 21]])

Listing 15. Display the KNN confusion matrix

from sklearn import metrics
import matplotlib.pyplot as plt
confusion_matrix_values = metrics.confusion_matrix(y_test, y_pred_KNN)

cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_values, display_labels = dataset.target_names)
fig, ax = plt.subplots(figsize = (10, 10))
cm_display.plot(ax = ax)
plt.show()

Expected text output or note

<Figure size 1000x1000 with 2 Axes>

[visual output omitted; run the code to display the image or chart]

Listing 16. Precision for each class

# precision for each class
from sklearn.metrics import precision_score
precision_score(y_test, y_pred_KNN, average = None)

Expected text output or note

array([1.  , 0.92, 1.  ])

Listing 17. Micro-averaged precision for the entire model

# micro-averaged precision pools all classes; for single-label multiclass data it equals accuracy
print(precision_score(y_test, y_pred_KNN, average = 'micro'))

Expected text output or note

0.9733333333333334
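
Micro averaging is only one way to reduce per-class precision to a single number. The sketch below compares the 'micro', 'macro', and 'weighted' options of precision_score. So that it runs without retraining, it rebuilds label arrays that match the confusion matrix from Listing 14; the names y_true_demo and y_pred_demo are illustrative stand-ins, not the real test arrays.

```python
from sklearn.metrics import precision_score

# Labels reconstructed from the Listing 14 confusion matrix:
# 29 setosa and 23 versicolor all correct; 2 virginica predicted as versicolor.
y_true_demo = [0] * 29 + [1] * 23 + [2] * 23
y_pred_demo = [0] * 29 + [1] * 23 + [1] * 2 + [2] * 21

# micro: pool every prediction, then compute precision once.
micro = precision_score(y_true_demo, y_pred_demo, average="micro")
# macro: unweighted mean of the per-class precisions.
macro = precision_score(y_true_demo, y_pred_demo, average="macro")
# weighted: mean of the per-class precisions weighted by class support.
weighted = precision_score(y_true_demo, y_pred_demo, average="weighted")

print(micro, macro, weighted)
```

For this matrix the micro and macro values happen to coincide at 73/75, while the weighted value is slightly higher. The macro and weighted values correspond to the "macro avg" and "weighted avg" rows of a classification report.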

Listing 18. Recall for each class

# recall for each class
from sklearn.metrics import recall_score
recall_score(y_test, y_pred_KNN, average = None)

Expected text output or note

array([1.        , 1.        , 0.91304348])

Listing 19. F1-score

# F1-score
from sklearn.metrics import f1_score
f1_score(y_test, y_pred_KNN, average = None)

Expected text output or note

array([1.        , 0.95833333, 0.95454545])
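
The per-class values in Listings 16, 18, and 19 can be reproduced directly from the confusion matrix in Listing 14, which shows where each number comes from. This is a minimal NumPy sketch; the matrix is copied from the output above.

```python
import numpy as np

# Confusion matrix from Listing 14: rows are actual classes, columns are predicted classes.
cm = np.array([[29, 0, 0],
               [0, 23, 0],
               [0, 2, 21]])

# Correct predictions for each class sit on the diagonal.
true_positives = np.diag(cm)

# Precision: correct predictions divided by everything predicted as that class (column sums).
precision = true_positives / cm.sum(axis=0)
# Recall: correct predictions divided by everything that actually is that class (row sums).
recall = true_positives / cm.sum(axis=1)
# F1-score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
# Accuracy: all correct predictions divided by all samples.
accuracy = true_positives.sum() / cm.sum()

print(precision)
print(recall)
print(f1)
print(accuracy)
```

The two virginica samples predicted as versicolor lower the precision of versicolor (23 of 25 predictions correct) and the recall of virginica (21 of 23 samples found).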

Listing 20. Classification report

# classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_KNN, target_names = dataset.target_names))

Expected text output or note

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        29
  versicolor       0.92      1.00      0.96        23
   virginica       1.00      0.91      0.95        23

    accuracy                           0.97        75
   macro avg       0.97      0.97      0.97        75
weighted avg       0.98      0.97      0.97        75

Listing 21. Try different numbers of neighbors and then choose the optimal model

# try different numbers of neighbors and then choose the optimal model
accuracies = []
for i in range(1, 21):
  knn_model = KNeighborsClassifier(n_neighbors = i)
  knn_model.fit(X_train, y_train)
  accuracies.append(knn_model.score(X_test, y_test))
accuracies

Expected text output or note

[0.9733333333333334,
 0.96,
 0.9733333333333334,
 0.9333333333333333,
 0.9466666666666667,
 0.9466666666666667,
 0.9466666666666667,
 0.9466666666666667,
 0.96,
 0.9466666666666667,
 0.9466666666666667,
 0.9466666666666667,
 0.96,
 0.9466666666666667,
 0.9733333333333334,
 0.96,
 0.96,
 0.9466666666666667,
 0.96,
 0.9466666666666667]

Listing 22. Accuracy graph by number of neighbors

# accuracy graph by number of neighbors
import matplotlib.pyplot as plt
plt.plot(range(1, 21), accuracies)

Expected text output or note

[<matplotlib.lines.Line2D at 0x79e54a5a3140>]

<Figure size 640x480 with 1 Axes>

[visual output omitted; run the code to display the image or chart]
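
Choosing the number of neighbors by test-set accuracy, as in Listing 21, means the test set also influences model selection. One common alternative, sketched here as an assumption rather than as the method used on this page, is to pick the value with cross-validation on the training data and use the test set only once at the end. The names k_values, cv_means, best_k, and final_model are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.5, random_state=42
)

k_values = list(range(1, 21))
cv_means = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation on the training half only; the test half is untouched.
    scores = cross_val_score(model, X_train, y_train, cv=5)
    cv_means.append(scores.mean())

# Pick the k with the highest mean cross-validated accuracy (first one on ties).
best_k = k_values[int(np.argmax(cv_means))]

# Refit on the full training half and evaluate once on the held-out test half.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(best_k, test_accuracy)
```

With only 75 training samples the cross-validated scores are noisy, so several values of k may perform about equally well.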

Practice task. GaussianNB

Build a model on the Iris dataset; the data can stay the same as before.
Calculate accuracy and show the classification report.
Which model achieved better results?

Listing 23. Train Gaussian Naive Bayes and compare accuracy

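from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
print(nb_model.score(X_test, y_test))
# knn_model is the last model fitted in Listing 21 (n_neighbors = 20), not the n_neighbors = 3 model from Listing 6
print(knn_model.score(X_test, y_test))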

Expected text output or note

0.9866666666666667
0.9466666666666667

Listing 24. Predict test labels with Gaussian Naive Bayes

y_pred_NB = nb_model.predict(X_test)

Listing 25. Print the Gaussian Naive Bayes classification report

print(classification_report(y_test, y_pred_NB, target_names = dataset.target_names))

Expected text output or note

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        29
  versicolor       0.96      1.00      0.98        23
   virginica       1.00      0.96      0.98        23

    accuracy                           0.99        75
   macro avg       0.99      0.99      0.99        75
weighted avg       0.99      0.99      0.99        75
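
Gaussian Naive Bayes models each feature within each class as a normal distribution and combines these with the class priors. The sketch below inspects a fitted model to show what it learned. It retrains the model so that it runs on its own; the variable name probabilities is illustrative.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.5, random_state=42
)

nb_model = GaussianNB().fit(X_train, y_train)

# theta_ holds the per-class mean of every feature: one row per class, one column per feature.
print(nb_model.theta_.shape)

# class_prior_ holds the class frequencies observed in the training data.
print(nb_model.class_prior_)

# predict_proba returns one probability per class; each row sums to 1.
probabilities = nb_model.predict_proba(X_test[:3])
print(probabilities.round(3))
```

Unlike KNN, which only returns a vote among neighbors, this model gives a probability for every class, which can help when judging how confident a prediction is.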

