Classification with KNN and Gaussian Naive Bayes
This page covers Iris classification with K-Nearest Neighbors and Gaussian Naive Bayes, including accuracy, confusion matrix, precision, recall, F1-score, and classification reports.
What you should be able to do
- Train a KNN classifier.
- Predict labels and compare them with true labels.
- Read confusion matrices and classification reports.
- Compare models by accuracy and class-level metrics.
Reusable patterns
- Classification predicts discrete class labels.
- accuracy_score(y_test, y_pred) returns the fraction of predicted labels that match the actual labels.
- classification_report summarizes precision, recall, F1-score, and support.
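To see what these summaries contain, the sketch below derives per-class precision, recall, and F1-score, plus overall accuracy, from a confusion matrix. It uses scikit-learn's convention: rows are actual classes and columns are predicted classes. The matrix is the KNN result reported in the confusion-matrix section, typed in as a literal. `metrics_from_confusion` is a hypothetical helper, and it assumes every class is predicted at least once, so no column total is zero.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Per-class precision, recall, F1, and overall accuracy from a confusion matrix.

    Assumes rows = actual classes, columns = predicted classes,
    and that no column sums to zero (hypothetical helper, for illustration).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                    # correctly predicted samples per class
    precision = tp / cm.sum(axis=0)     # TP / (TP + FP): divide by column totals
    recall = tp / cm.sum(axis=1)        # TP / (TP + FN): divide by row totals
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = tp.sum() / cm.sum()      # all correct predictions / all samples
    return precision, recall, f1, accuracy

# KNN (k = 3) confusion matrix; class order: setosa, versicolor, virginica
cm = [[29, 0, 0],
      [0, 23, 0],
      [0, 2, 21]]
precision, recall, f1, accuracy = metrics_from_confusion(cm)
print(np.round(precision, 2))  # per-class precision: about 1.00, 0.92, 1.00
print(np.round(recall, 2))     # per-class recall: about 1.00, 1.00, 0.91
print(round(accuracy, 4))      # 0.9733
```

The versicolor column total (25) includes the two misclassified virginica flowers, which is why versicolor precision drops to 0.92 while its recall stays at 1.00.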
Classification
K-Nearest Neighbors
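K-Nearest Neighbors does not learn parameters during fitting. It stores the training data and labels each new sample by majority vote among the k training samples closest to it. The NumPy sketch below illustrates that rule. It is not scikit-learn's implementation, and it makes three assumptions: distances are Euclidean (the default this page reports for KNeighborsClassifier), labels are non-negative integers, and vote ties go to the smallest label. `knn_predict` is a hypothetical name.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=3):
    """Predict labels for X_new by majority vote of the k nearest training samples.

    Hypothetical helper: Euclidean distance, non-negative integer labels,
    ties broken toward the smallest label.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    # Pairwise Euclidean distances, shape (n_new, n_train)
    dists = np.sqrt(((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    # Indices of the k closest training samples for each new sample
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for row in nearest:
        votes = np.bincount(y_train[row], minlength=y_train.max() + 1)
        preds.append(np.argmax(votes))  # argmax returns the smallest label on ties
    return np.array(preds)

# Toy example: two well-separated clusters in 2-D
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_toy, y_toy, [[0.5, 0.5], [5.5, 5.5]], k=3))  # [0 1]
```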
Listing 1. Fetch the data
# fetch the data
from sklearn import datasets
dataset = datasets.load_iris()
dataset.keys()
Expected text output or note
dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])
Listing 2. Class labels for each sample
dataset.target # class label (0, 1, or 2) for each sample
Expected text output or note
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
Listing 3. Class names
dataset.target_names # class names
Expected text output or note
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
Listing 4. Feature names
dataset.feature_names # feature names
Expected text output or note
['sepal length (cm)',
'sepal width (cm)',
'petal length (cm)',
'petal width (cm)']
Listing 5. Split the data
# split the data
X = dataset.data # input features (input data)
y = dataset.target # class labels (target)
from sklearn.model_selection import train_test_split
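X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.5, random_state = 42)
The function returns four arrays. With test_size = 0.5, half of the 150 samples (75) are held out for testing, and random_state = 42 makes the shuffle reproducible: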
- X_train: input data for training.
- X_test: input data for testing.
- y_train: correct labels for the training data.
- y_test: correct labels for the test data.
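A quick way to confirm what the split produced is to check the array shapes and count how many samples of each class ended up in the test set. The sketch reloads the data so it runs on its own. The second split is an alternative and not what this page uses: passing stratify = y to train_test_split keeps the three classes equally represented in both halves.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Same split as above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
print(X_train.shape, X_test.shape)   # (75, 4) (75, 4)
print(np.bincount(y_test))           # number of test samples per class

# Alternative (not used on this page): stratified split keeps the 50/50/50 balance
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y)
print(np.bincount(ys_test))          # 25 samples of each class
```

Without stratification, the per-class test counts depend on the random shuffle. That is why the class supports in the reports later on this page are not equal.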
Listing 6. Define the classification model
# define the classification model
from sklearn.neighbors import KNeighborsClassifier
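# k = 3: each test sample gets the majority label of its 3 nearest training samples
knn_model = KNeighborsClassifier(n_neighbors = 3)
knn_model.fit(X_train, y_train)
Expected text output or note
KNeighborsClassifier(n_neighbors=3)
Listing 7. Class labels learned by the model
knn_model.classes_
Expected text output or note
array([0, 1, 2])
Listing 8. Distance metric used by the model
knn_model.effective_metric_ # default metric='minkowski' with p=2, i.e. Euclidean distance
Expected text output or note
'euclidean'
Listing 9. Model score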
# model score: mean accuracy on the test set
knn_model.score(X_test, y_test)
Expected text output or note
0.9733333333333334
Listing 10. Predict classes for the test data
# predict classes for the test data
y_pred_KNN = knn_model.predict(X_test)
Listing 11. Display the predicted labels
# display the array of predicted labels
y_pred_KNN
Expected text output or note
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
0, 1, 1, 2, 1, 2, 1, 2, 1, 0, 2, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0,
1, 2, 0, 1, 2, 0, 2, 2, 1])
Now we have two arrays of the same length, aligned sample by sample:
- y_test = actual classes.
- y_pred_KNN = predicted classes.
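Because the actual and predicted arrays are aligned element by element, accuracy is simply the share of positions where they agree. The sketch uses small hypothetical arrays instead of the notebook variables, so it runs on its own. It shows that the manual calculation and accuracy_score agree.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical actual and predicted labels (not from the Iris split)
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 1, 2, 1, 0])

manual = np.mean(y_true == y_pred)   # 5 of 6 positions match
print(manual)
print(accuracy_score(y_true, y_pred))  # same value
```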
Listing 12. Accuracy with accuracy_score
# accuracy: fraction of test samples whose predicted label matches the actual label (same value as knn_model.score)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred_KNN)
Expected text output or note
0.9733333333333334
Listing 13. Example for one flower
# example for one flower: map class indices to class names
klasa={0:'setosa', 1:'versicolor', 2:'virginica'}
print("For the first flower in the test set, the predicted class is {}, and the actual class is {}.".format(klasa[y_pred_KNN[0]], klasa[y_test[0]]))
Expected text output or note
For the first flower in the test set, the predicted class is versicolor, and the actual class is versicolor.
Confusion matrix
Listing 14. Actual values first, then predicted values
from sklearn.metrics import confusion_matrix
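confusion_matrix(y_test, y_pred_KNN) # rows = actual classes, columns = predicted classes
Expected text output or note
array([[29,  0,  0],
       [ 0, 23,  0],
       [ 0,  2, 21]])
All setosa and versicolor test flowers are classified correctly, and two virginica flowers are predicted as versicolor.
Listing 15. Plot the confusion matrix for the KNN model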
from sklearn import metrics
import matplotlib.pyplot as plt
confusion_matrix_values = metrics.confusion_matrix(y_test, y_pred_KNN)
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix_values, display_labels = dataset.target_names)
fig, ax = plt.subplots(figsize = (10, 10))
cm_display.plot(ax = ax)
plt.show()
Expected text output or note
[Figure: KNN confusion matrix labeled setosa, versicolor, virginica; visual output omitted, run the code to display it]
Listing 16. Precision for each class
# precision for each class
from sklearn.metrics import precision_score
precision_score(y_test, y_pred_KNN, average = None) # one value per class: setosa, versicolor, virginica
Expected text output or note
array([1.  , 0.92, 1.  ])
Listing 17. Precision for the entire model
# precision for the entire model (micro average: counts pooled over all classes)
print(precision_score(y_test, y_pred_KNN, average = 'micro'))
Expected text output or note
0.9733333333333334
For single-label multiclass data, micro-averaged precision equals accuracy, which is why this value matches the model score.
Listing 18. Recall for each class
# recall for each class
from sklearn.metrics import recall_score
recall_score(y_test, y_pred_KNN, average = None) # virginica: 21 of 23 found, since two were predicted as versicolor
Expected text output or note
array([1.        , 1.        , 0.91304348])
Listing 19. F1-score
# F1-score
from sklearn.metrics import f1_score
f1_score(y_test, y_pred_KNN, average = None) # harmonic mean of precision and recall, per class
Expected text output or note
array([1.        , 0.95833333, 0.95454545])
Listing 20. Classification report
# classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_KNN, target_names = dataset.target_names))
Expected text output or note
              precision    recall  f1-score   support
      setosa       1.00      1.00      1.00        29
  versicolor       0.92      1.00      0.96        23
   virginica       1.00      0.91      0.95        23
    accuracy                           0.97        75
   macro avg       0.97      0.97      0.97        75
weighted avg       0.98      0.97      0.97        75
Listing 21. Try different numbers of neighbors and then choose the optimal model
# try different numbers of neighbors and then choose the optimal model
accuracies = []
for i in range(1, 21):
    knn_model = KNeighborsClassifier(n_neighbors = i)
    knn_model.fit(X_train, y_train)
    accuracies.append(knn_model.score(X_test, y_test))
accuracies
Expected text output or note
[0.9733333333333334,
0.96,
0.9733333333333334,
0.9333333333333333,
0.9466666666666667,
0.9466666666666667,
0.9466666666666667,
0.9466666666666667,
0.96,
0.9466666666666667,
0.9466666666666667,
0.9466666666666667,
0.96,
0.9466666666666667,
0.9733333333333334,
0.96,
0.96,
0.9466666666666667,
0.96,
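0.9466666666666667]
The highest test accuracy (0.973) is reached at k = 1, 3, and 15. Two caveats apply. First, the loop reassigns knn_model, so afterwards it holds the k = 20 model. Second, because k is chosen by looking at test-set accuracy, that accuracy is an optimistic estimate of performance on new data. Cross-validation on the training set avoids this.
Listing 22. Accuracy graph by number of neighbors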
# accuracy graph by number of neighbors
import matplotlib.pyplot as plt
plt.plot(range(1, 21), accuracies, marker = 'o')
plt.xlabel('Number of neighbors (k)')
plt.ylabel('Test accuracy')
plt.xticks(range(1, 21))
plt.show()
Expected text output or note
[Figure: test accuracy versus number of neighbors k = 1 to 20; visual output omitted, run the code to display it]
Practice task. GaussianNB
- Build a Gaussian Naive Bayes model on the Iris dataset, reusing the same train/test split as before.
- Calculate accuracy and show the classification report.
- Which model achieved better results?
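Before using scikit-learn's GaussianNB, it can help to see the decision rule it applies. GaussianNB assumes that, within each class, every feature independently follows a normal distribution. It estimates a prior, a mean, and a variance for each class and feature, then picks the class with the highest log prior plus summed per-feature log-likelihood. The sketch below implements that rule with hypothetical helpers (`gnb_fit`, `gnb_predict`). It mirrors GaussianNB's default variance floor, which adds var_smoothing times the largest feature variance. It then checks how often its predictions agree with the library's. Treat it as an illustration, not a replacement for the library class.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def gnb_fit(X, y, var_smoothing=1e-9):
    """Estimate class priors and per-class feature means/variances (hypothetical helper)."""
    classes = np.unique(y)
    eps = var_smoothing * X.var(axis=0).max()  # small variance floor for numerical stability
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, variances, priors

def gnb_predict(model, X):
    """Pick the class maximizing log prior + sum of Gaussian log-likelihoods (hypothetical helper)."""
    classes, means, variances, priors = model
    diff = X[:, None, :] - means[None, :, :]  # shape (n_samples, n_classes, n_features)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :])
    scores = np.log(priors)[None, :] + log_lik.sum(axis=2)  # shape (n_samples, n_classes)
    return classes[np.argmax(scores, axis=1)]

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

y_pred_manual = gnb_predict(gnb_fit(X_train, y_train), X_test)
y_pred_sklearn = GaussianNB().fit(X_train, y_train).predict(X_test)
print((y_pred_manual == y_pred_sklearn).mean())  # share of matching predictions
```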
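Listing 23. Train GaussianNB and compare its accuracy with KNN (k = 3)
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
print(nb_model.score(X_test, y_test)) # GaussianNB
# knn_model was reassigned in the loop of Listing 21 (it is now k = 20), so use the stored k = 3 predictions
print(accuracy_score(y_test, y_pred_KNN)) # KNN, k = 3
Expected text output or note
0.9866666666666667
0.9733333333333334
Listing 24. Predict classes with GaussianNB
y_pred_NB = nb_model.predict(X_test)
Listing 25. GaussianNB classification report
print(classification_report(y_test, y_pred_NB, target_names = dataset.target_names))
Expected text output or note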
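              precision    recall  f1-score   support
      setosa       1.00      1.00      1.00        29
  versicolor       0.96      1.00      0.98        23
   virginica       1.00      0.96      0.98        23
    accuracy                           0.99        75
   macro avg       0.99      0.99      0.99        75
weighted avg       0.99      0.99      0.99        75
On this split, GaussianNB misclassifies one of the 75 test flowers (a virginica predicted as versicolor). KNN with k = 3 misclassifies two, so GaussianNB scores slightly higher. With a test set this small, the gap amounts to a single flower.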
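Both comparisons on this page, the choice of k and KNN versus GaussianNB, rely on one 75-sample test set, where a single flower moves accuracy by about 0.013. A more robust alternative is k-fold cross-validation on the training half only. The sketch below uses GridSearchCV to pick k and cross_val_score to compare the two models on the same folds, and it touches the test set once at the end. The fold count (5) and the k range (1 to 20) are illustrative choices, not settings taken from this page.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Choose k with 5-fold cross-validation on the training half only
search = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': list(range(1, 21))}, cv=5)
search.fit(X_train, y_train)
best_k = search.best_params_['n_neighbors']
print('best k:', best_k, 'mean CV accuracy:', round(search.best_score_, 3))

# Compare tuned KNN and GaussianNB; cv=5 gives the same stratified folds for both
for name, model in [('KNN', KNeighborsClassifier(n_neighbors=best_k)), ('GaussianNB', GaussianNB())]:
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(name, 'CV accuracy: %.3f +/- %.3f' % (scores.mean(), scores.std()))

# Evaluate on the test set once, after all choices are made
print('KNN test accuracy:', search.score(X_test, y_test))
```

Reporting the spread across folds (the +/- value) makes it easier to judge whether a difference between models is larger than the noise from the data split.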