The k-Nearest Neighbors (KNN) algorithm is a simple yet powerful non-parametric method used for classification and regression. One of the critical parameters in KNN is the value of k, the number of nearest neighbors to consider when making a prediction. In this article, we’ll explore how to change the value of k in KNN using R.

Why change the value of k?

Changing the value of k in KNN affects the model’s performance by shifting the bias-variance trade-off. A low k (e.g., k = 1) results in low bias and high variance, making the model sensitive to noise and outliers and potentially leading to overfitting. Conversely, a high k reduces variance and increases bias, which might cause underfitting. Finding the optimal k through techniques like cross-validation gives better generalization on unseen data. Additionally, k has a modest effect on computational cost: the distances from each test point to the training points must be computed regardless of k, but a higher k means more neighbors to select and tally.

Before we dive into changing the value of k, let’s set up our environment and load the necessary libraries.

# Load necessary libraries
library(class)
library(caret)

Now we will go step by step through changing the value of k in KNN using the R programming language.

Step 1: Creating a Sample Dataset

For demonstration purposes, we’ll use the famous Iris dataset. This dataset contains measurements of iris flowers from three different species.
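The listing for this step is not shown, so the following is only a minimal sketch of how the data could be prepared. The seed value, the 80/20 stratified split with caret's createDataPartition, and the names train_data, test_data, train_labels and test_labels are all assumptions made for illustration.

```r
# Load necessary libraries
library(class)
library(caret)

# iris ships with R: 150 rows, 4 numeric measurements, 1 Species factor
data(iris)

# Assumption: a fixed seed so the random split is repeatable
set.seed(123)

# Assumption: stratified 80/20 split, so each species keeps the same share
train_index <- createDataPartition(iris$Species, p = 0.8, list = FALSE)

# Columns 1 to 4 are the features; Species is the label
train_data   <- iris[train_index, 1:4]
test_data    <- iris[-train_index, 1:4]
train_labels <- iris$Species[train_index]
test_labels  <- iris$Species[-train_index]

# Inspect the class balance of both parts
print(table(train_labels))
print(table(test_labels))
```

A stratified split is used here because the test set then contains the same number of flowers from each species, which keeps per-class statistics comparable.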
Step 2: Implementing KNN with Different Values of k

The knn function from the class package allows us to implement KNN easily. We can change the value of k by simply modifying the k parameter.
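As a rough sketch of what such a call can look like, the example below fits KNN with one value of k. The split (seed 123, 80/20, stratified), the variable names and the starting value k = 3 are assumptions chosen for illustration, not values taken from the article.

```r
library(class)
library(caret)

data(iris)
set.seed(123)  # assumption: fixed seed for a repeatable split

# Assumption: stratified 80/20 split on Species
train_index  <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_data   <- iris[train_index, 1:4]
test_data    <- iris[-train_index, 1:4]
train_labels <- iris$Species[train_index]

# Assumption: k = 3 as a starting point; edit this one value to change k
k_value <- 3

# knn() classifies each test row by majority vote among its k nearest training rows
knn_pred <- knn(train = train_data, test = test_data, cl = train_labels, k = k_value)

print(head(knn_pred))
```

Keeping k in a single variable such as k_value means later experiments only need to touch one line.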
Step 3: Evaluating the Model

To evaluate the performance of our KNN model, we can use a confusion matrix.
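One way to do this is with caret's confusionMatrix function, sketched below. The seed, the 80/20 stratified split, the variable names and k = 3 are assumptions for illustration. The exact figures depend on the random split, so they may differ from the output printed in the article.

```r
library(class)
library(caret)

data(iris)
set.seed(123)  # assumption: fixed seed for a repeatable split

# Assumption: stratified 80/20 split on Species
train_index  <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_data   <- iris[train_index, 1:4]
test_data    <- iris[-train_index, 1:4]
train_labels <- iris$Species[train_index]
test_labels  <- iris$Species[-train_index]

# Assumption: k = 3
knn_pred <- knn(train = train_data, test = test_data, cl = train_labels, k = 3)

# Cross-tabulate predictions against the true labels and compute summary statistics
cm <- confusionMatrix(data = knn_pred, reference = test_labels)
print(cm)

# Overall accuracy can also be pulled out as a single number
accuracy <- unname(cm$overall["Accuracy"])
print(accuracy)
```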
Output:

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         10          0         0
  versicolor      0         10         1
  virginica       0          0         9

Overall Statistics

               Accuracy : 0.9667
                 95% CI : (0.8278, 0.9992)
    No Information Rate : 0.3333
    P-Value [Acc > NIR] : 2.963e-13

                  Kappa : 0.95

 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            1.0000           0.9000
Specificity                 1.0000            0.9500           1.0000
Pos Pred Value              1.0000            0.9091           1.0000
Neg Pred Value              1.0000            1.0000           0.9524
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3333           0.3000
Detection Prevalence        0.3333            0.3667           0.3000
Balanced Accuracy           1.0000            0.9750           0.9500

Step 4: Changing the Value of k

We can change the value of k to see how it affects the model’s performance. Let’s try k = 5 and k = 7.
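A minimal sketch of this experiment follows. Only the k argument differs between the two knn calls; the seed, the 80/20 stratified split and the variable names are assumptions for illustration, and the resulting numbers may not match the output printed in the article.

```r
library(class)
library(caret)

data(iris)
set.seed(123)  # assumption: fixed seed for a repeatable split

# Assumption: stratified 80/20 split on Species
train_index  <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_data   <- iris[train_index, 1:4]
test_data    <- iris[-train_index, 1:4]
train_labels <- iris$Species[train_index]
test_labels  <- iris$Species[-train_index]

# k = 5: each prediction is a vote among the 5 nearest training rows
knn_pred_5 <- knn(train = train_data, test = test_data, cl = train_labels, k = 5)
cm_5 <- confusionMatrix(data = knn_pred_5, reference = test_labels)

# k = 7: same data, only k changes
knn_pred_7 <- knn(train = train_data, test = test_data, cl = train_labels, k = 7)
cm_7 <- confusionMatrix(data = knn_pred_7, reference = test_labels)

print(cm_5)
print(cm_7)
```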
Output:

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         10          0         0
  versicolor      0         10         0
  virginica       0          0        10

Overall Statistics

               Accuracy : 1
                 95% CI : (0.8843, 1)
    No Information Rate : 0.3333
    P-Value [Acc > NIR] : 4.857e-15

                  Kappa : 1

 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            1.0000           1.0000
Specificity                 1.0000            1.0000           1.0000
Pos Pred Value              1.0000            1.0000           1.0000
Neg Pred Value              1.0000            1.0000           1.0000
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3333           0.3333
Detection Prevalence        0.3333            0.3333           0.3333
Balanced Accuracy           1.0000            1.0000           1.0000

Here the accuracy rises from 0.9667 to 1 when we change the value of k. On a test set of 30 flowers, that is a difference of a single observation (29 versus 30 correct).

Advanced Approach to Change the Value of k in KNN Using R

We can automate the process of evaluating KNN with different values of k by creating a loop.
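The loop could be sketched as follows, using lapply to build a named list with one confusion matrix per value of k. The candidate values in k_values, the seed, the 80/20 stratified split and all variable names are assumptions for illustration.

```r
library(class)
library(caret)

data(iris)
set.seed(123)  # assumption: fixed seed for a repeatable split

# Assumption: stratified 80/20 split on Species
train_index  <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_data   <- iris[train_index, 1:4]
test_data    <- iris[-train_index, 1:4]
train_labels <- iris$Species[train_index]
test_labels  <- iris$Species[-train_index]

# Assumption: odd values of k, which reduces the chance of tied votes
k_values <- c(1, 3, 5, 7, 9)

# Fit and evaluate one model per k; each list element is a confusionMatrix object
results <- lapply(k_values, function(k) {
  pred <- knn(train = train_data, test = test_data, cl = train_labels, k = k)
  confusionMatrix(data = pred, reference = test_labels)
})

# Name the elements so they print as $`k = 1`, $`k = 3`, and so on
names(results) <- paste("k =", k_values)

print(results)
```

Storing the full confusionMatrix objects, rather than only the accuracy, keeps the per-class statistics available for later inspection, for example via results[["k = 5"]]$byClass.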
Output:

$`k = 1`
Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         10          0         0
  versicolor      0          9         2
  virginica       0          1         8

Overall Statistics

               Accuracy : 0.9
                 95% CI : (0.7347, 0.9789)
    No Information Rate : 0.3333
    P-Value [Acc > NIR] : 1.665e-10

                  Kappa : 0.85

 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.9000           0.8000
Specificity                 1.0000            0.9000           0.9500
Pos Pred Value              1.0000            0.8182           0.8889
Neg Pred Value              1.0000            0.9474           0.9048
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3000           0.2667
Detection Prevalence        0.3333            0.3667           0.3000
Balanced Accuracy           1.0000            0.9000           0.8750

....................................................

Visualizing the Results

To better understand the impact of different k values on model performance, we can visualize the accuracy.
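A simple way to draw such a chart with base R graphics is sketched below. The range of k values, the seed, the 80/20 stratified split, the variable names and the plot labels are all assumptions for illustration.

```r
library(class)
library(caret)

data(iris)
set.seed(123)  # assumption: fixed seed for a repeatable split

# Assumption: stratified 80/20 split on Species
train_index  <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_data   <- iris[train_index, 1:4]
test_data    <- iris[-train_index, 1:4]
train_labels <- iris$Species[train_index]
test_labels  <- iris$Species[-train_index]

# Assumption: odd k from 1 to 15
k_values <- seq(1, 15, by = 2)

# Test accuracy for each k: the share of test rows predicted correctly
accuracies <- sapply(k_values, function(k) {
  pred <- knn(train = train_data, test = test_data, cl = train_labels, k = k)
  mean(pred == test_labels)
})

# Line chart with a point at every evaluated k
plot(k_values, accuracies,
     type = "b", pch = 19,
     xlab = "k (number of neighbors)",
     ylab = "Test accuracy",
     main = "KNN accuracy for different values of k")

# Smallest k that reaches the highest test accuracy
best_k <- k_values[which.max(accuracies)]
print(best_k)
```

With only 30 test rows, each misclassified flower moves accuracy by about 0.033, so small differences between neighboring k values on this chart should be read cautiously.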
Output:

[Figure: Change the Value of k in KNN Using R]

Conclusion

Changing the value of k in the KNN algorithm can significantly impact the model’s performance. By experimenting with different k values, we can identify the optimal k that provides the best accuracy for our specific dataset. Using the class and caret packages in R, it’s straightforward to implement and evaluate KNN models with various k values.
Referred: https://www.geeksforgeeks.org