KNN (K Nearest Neighbors) and KNeighborsClassifier — What it is, how it works, and a practical example

Hello everyone, welcome to another one of my articles! Today I would like to talk a bit about KNN, or K Nearest Neighbors. We will see how it works and how to use it. Shall we?

All of the code used as the basis for this article is available on GitHub, and you can access it by clicking here.

What will you see in this article?

  • What KNN is and how it works;
  • What KNeighborsClassifier is and how it works;
  • Why and when to use KNeighborsClassifier and KNN;
  • A practical KNN example with KNeighborsClassifier.
What KNN is and how it works

KNN, or K Nearest Neighbors, is a machine learning algorithm that uses the similarity between data points to perform classification (supervised learning) or clustering (unsupervised learning).

With KNN we can take a set of data and, from it, draw out patterns that classify or group our samples.

  • But how exactly does it work?

Let's think about its name first: K Nearest Neighbors.

The concept of a neighborhood rests on the idea that those close to us tend to be more like us.

From this notion, what KNN (very generically) does is form neighborhoods in our dataset; as we pass new samples to the model, it tells us which "neighborhood" each sample fits best!

Shall we look at an example?

[Figure: three clusters (blue, red, and orange), each outlined by a gray circle]

Note that in this example we have 3 different groups (or clusters): blue, red, and orange. Each of them represents a "neighborhood", with a "border" delimited by the gray circle around it.

This is the basis of KNN: grouping data by proximity. From there, the specific algorithms do the job of classifying or clustering.
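Under the hood, the core operation is simply "measure distances, then vote among the closest points". Here is a minimal from-scratch sketch of that idea, using Euclidean distance on toy points invented for illustration:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # distance from x to every training point (Euclidean)
    dists = np.linalg.norm(X_train - x, axis=1)
    # indices of the k nearest training points
    nearest = np.argsort(dists)[:k]
    # majority vote among those neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# toy "neighborhoods": two blue points near the origin, two red ones further away
X = np.array([[0.0, 0.0], [0.5, 0.5], [5.0, 5.0], [5.5, 5.0]])
y = np.array(["blue", "blue", "red", "red"])

print(knn_predict(X, y, np.array([0.2, 0.3])))  # -> blue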

In today's article I want to focus on KNeighborsClassifier.

KNeighborsClassifier is a supervised learning algorithm that classifies a sample based on its neighbors. How so? Let's look at one more example:

[Figure: a green sample X plotted among blue and red points, with the neighborhoods for N=3 and N=7 outlined]

Suppose we have a sample X (in this case, the green dot). With it placed in the plane, we choose a number N of neighbors to consider, and from those neighbors we can say whether our sample is classified as blue or red.

Note that if we set the number N (also called "K") equal to 3, we are defining how many neighbors will be used to decide which class our sample most resembles.

In the example above, with N=3 we look for the 3 data points closest to our test sample (in green) and find 2 blue samples and only 1 red one, so we classify sample X as blue, since most of its neighbors belong to that class.

On the other hand, if we look at the 7 closest points (N=7) we have a different situation: 4 red samples and 3 blue ones. The classification of sample X is now different from before: it is classified as red!

What I want to get across here is: changing the number K of neighbors can change the classification, i.e., it is a sensitive parameter and should be chosen with care!

Note: when I use the expression "closest data", you can also read it as "nearest neighbors".
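Here is a minimal sketch of that sensitivity using scikit-learn, with hypothetical 1-D points arranged to mirror the counts from the figure (2 blue vs. 1 red at K=3, then 3 blue vs. 4 red at K=7):

from sklearn.neighbors import KNeighborsClassifier

# hypothetical feature values, chosen so that K=3 and K=7 vote differently
X_train = [[1], [2], [4],        # blue samples
           [3], [5], [6], [7]]   # red samples
y_train = ["blue"] * 3 + ["red"] * 4

x_test = [[0]]  # our sample "X"

for k in (3, 7):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, model.predict(x_test))  # 3 -> blue, 7 -> red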

Why and when to use KNeighborsClassifier and KNN

When the data is not too scattered and has few outliers, KNeighborsClassifier shines.

KNN, in general, is a family of algorithms that differ from one another. If we have numerical data and a small number of features (columns), KNeighborsClassifier tends to behave better.

KNN as a whole is used more often for grouping tasks.

In the future I will bring an article about that and the use of KMeans, but today I will limit myself to showing the basis of KNN; this will serve us later!

A practical KNN example with KNeighborsClassifier

Remember, once again, that all the code used in this article is available on GitHub; if you have any doubts, you can access it here and look at the code in more depth.

The dataset used is the ESRB game rating dataset, i.e., the age rating given to each game. You can access it here.

We start by importing the libraries we will use…

import pandas as pd

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_validate

Normally we would begin by splitting our data into training and test sets; in this case, the dataset was already provided split into training and testing, so I will skip this step.
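For reference, loading the pre-split data looks something like this (the file and target-column names below are illustrative, not necessarily the dataset's exact ones):

import pandas as pd

# hypothetical file names -- adjust them to the actual dataset files
train = pd.read_csv("esrb_train.csv")
test = pd.read_csv("esrb_test.csv")

# the target is the ESRB rating column; everything else is a feature
X_train, y_train = train.drop(columns=["esrb_rating"]), train["esrb_rating"]
X_test, y_test = test.drop(columns=["esrb_rating"]), test["esrb_rating"]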

Let’s create our model and train it with our data:

# instantiating the model:
model = KNeighborsClassifier(n_neighbors=3)

# training the model:
model.fit(X_train, y_train)

I started with the number of neighbors N equal to 3 just as a first test (n_neighbors=3).

Note that there is no real "training" happening here: KNN is a lazy learner, so fit essentially just stores the training samples, and the work of comparing neighbors happens when we make predictions.

It's time to see how the model fared…

# making predictions with the created model:
pred = model.predict(X_test)

# measuring model accuracy:
accuracy = accuracy_score(y_test, pred)
print("Accuracy: {}".format(round(accuracy * 100, 4)))
print(classification_report(y_test, pred))
[Output: accuracy (~80%) and classification report for n_neighbors=3]

We can see that we got fairly good accuracy (80%), despite low scores on some classes. Let's try changing N to a value that can improve our performance; for that, I'll use GridSearchCV to find the best value:

# searching for the best K between 1 and 60:
k_list = list(range(1, 61))
k_values = dict(n_neighbors=k_list)

grid = GridSearchCV(model, k_values, cv=6, scoring='accuracy')

# fitting the search on the full dataset (train + test combined):
grid.fit(pd.concat([X_train, X_test]), pd.concat([y_train, y_test]))
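Once the search finishes, the best value of K and its mean cross-validated accuracy can be read straight off the fitted object:

# best K found by the search and its mean cross-validated accuracy:
print(grid.best_params_)  # {'n_neighbors': 13}
print(grid.best_score_)   # ~0.818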

As a result, we got N equal to 13; that is, with n_neighbors=13 we obtained the best accuracy, around 81.8%, slightly higher than before. When we test this, we get the following:
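The test itself is just a retrain with the new value; a quick sketch:

# retraining with the best K found and evaluating on the test set again:
best_model = KNeighborsClassifier(n_neighbors=13)
best_model.fit(X_train, y_train)

pred = best_model.predict(X_test)
print(classification_report(y_test, pred))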

[Output: classification report for n_neighbors=13]

In this new case we obtained more balanced results from our model. Some preprocessing of the dataset could perhaps improve the model's overall accuracy, but that is not the purpose of today's article.

I have already covered that topic in my article "How to deal with Unbalanced Classes in Machine Learning (Precision, Recall, Oversampling and Undersampling)".

Conclusion

In today's article we looked at a bit of KNN (the basis for many "neighborhood" algorithms) and took a closer look at KNeighborsClassifier. We saw how it works on the inside, understood a little about the concept of "neighbors", and, above all, put what we learned into practice in a real classification project!

That's it for today. In the future I'll bring a kind of "part 2" of this article, discussing KMeans. Until next time!


Thank you for reading this article to the end! Please consider checking out my other articles and connecting with me on social media.
