What Does the cl Parameter in the knn Function in R Mean?

The knn function in R implements the k-Nearest Neighbors (k-NN) algorithm, a simple and intuitive method for classification. The function is part of the class package, which provides functions for classification. Among its parameters, cl plays a central role because it supplies the class labels of the training data. This article explains what the cl parameter means in the knn function and how it is used in practice.

Understanding the `knn` Function

The knn function is used to classify a set of test data points based on their proximity to a set of training data points. The basic syntax of the knn function is as follows:

knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)

Where:

train: A matrix or data frame of training set cases.
test: A matrix or data frame of test set cases.
cl: A factor of true classifications of the training set.
k: The number of nearest neighbors to consider (default is 1).
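l: The minimum vote required for a definite decision (default is 0). If the winning class receives fewer votes than this, the result is "doubt", which is returned as NA.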
prob: If TRUE, the proportion of votes for the winning class is returned as an attribute.
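use.all: Controls the handling of distance ties. If TRUE (the default), all training points whose distance equals that of the k-th nearest neighbor are included; if FALSE, a random selection of the tied points is used so that exactly k neighbors vote.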
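
As a rough illustration of these parameters, the sketch below calls knn on a tiny made-up data set (the matrices train and test and the labels "a" and "b" are assumptions for this example, not part of the article's Iris walkthrough). It sets prob = TRUE and reads the vote proportions back from the "prob" attribute.

```r
library(class)

# Hypothetical toy data: two well-separated clusters in two dimensions
train <- matrix(c(0, 0,
                  0, 1,
                  1, 0,
                  5, 5,
                  5, 6,
                  6, 5), ncol = 2, byrow = TRUE)

# One label per training row, in the same order as the rows of train
cl <- factor(c("a", "a", "a", "b", "b", "b"))

# Two test points, one in the middle of each cluster
test <- matrix(c(0.5, 0.5,
                 5.5, 5.5), ncol = 2, byrow = TRUE)

# prob = TRUE attaches the winning vote proportion as the "prob" attribute
pred <- knn(train, test, cl, k = 3, prob = TRUE)

print(pred)
print(attr(pred, "prob"))
```

With clusters this far apart, each test point should receive the label of the cluster it sits in, and all three of its neighbors should agree, so the vote proportion is 1 for both points.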

Role of the `cl` Parameter

The cl parameter is essential in the knn function because it provides the true classifications (labels) for the training data. These labels are used to determine the class of each test data point by majority vote among its nearest neighbors. The main points about the cl parameter are:

Training Labels: The cl parameter must be a factor vector containing the class labels for the training data points. The length of cl must be equal to the number of rows in the train data set.
Classification: During the classification process, the knn function calculates the distance between each test data point and all the training data points. It then identifies the k nearest neighbors.
Voting: The class of each test data point is determined by the majority vote among its k nearest neighbors. The class labels provided in cl are used for this voting process.
Output: The knn function returns a factor vector of predicted class labels for the test data points.
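
The four points above can be sketched by hand for a single test point. The helper knn_one below is hypothetical and is not how the class package implements knn (which is compiled code and breaks vote ties at random); it only shows where cl enters the calculation, assuming Euclidean distance and a made-up toy data set.

```r
# Hypothetical helper, not part of the class package: classify one test point
# by Euclidean distance and majority vote among the k nearest training rows.
knn_one <- function(train, test_point, cl, k = 3) {
  train <- as.matrix(train)
  # cl must be a factor with one label per training row
  stopifnot(is.factor(cl), length(cl) == nrow(train))
  # Euclidean distance from the test point to every training row
  dists <- sqrt(rowSums(sweep(train, 2, as.numeric(test_point))^2))
  # Row indices of the k closest training points
  nearest <- order(dists)[seq_len(k)]
  # Count the labels of those neighbors; cl supplies the labels
  votes <- table(cl[nearest])
  # Return the label with the most votes (the first one on a tie)
  names(votes)[which.max(votes)]
}

# Made-up training data: three "red" points and three "blue" points
train <- matrix(c(1, 1,
                  1, 2,
                  2, 1,
                  8, 8,
                  8, 9,
                  9, 8), ncol = 2, byrow = TRUE)
cl <- factor(c("red", "red", "red", "blue", "blue", "blue"))

print(knn_one(train, c(1.5, 1.5), cl, k = 3))
print(knn_one(train, c(8.5, 8.5), cl, k = 3))
```

The only place the labels appear is the line cl[nearest]: the distances decide which training rows vote, and cl decides what each of those rows votes for.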

Let’s illustrate the use of the cl parameter with a practical example in R, using the famous Iris dataset.

Step 1: Load Necessary Libraries and Data

First, load the required libraries and prepare the data.

# Load necessary libraries
# The class package ships with standard R installations; install it only if it is missing
# install.packages("class")
library(class)
data(iris)

# Prepare the data
set.seed(123)  # For reproducibility
index <- sample(1:nrow(iris), 0.7 * nrow(iris))
train_data <- iris[index, -5]  # Training data (excluding labels)
test_data <- iris[-index, -5]  # Test data (excluding labels)
train_labels <- iris[index, 5]  # Training labels
test_labels <- iris[-index, 5]  # Test labels
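
Because knn pairs each element of cl with the training row in the same position, it can be worth confirming that the labels and the training rows line up before calling it. The following is a minimal sketch of such a check, repeating the split from above so that it runs on its own; the specific checks are a suggestion, not something knn requires you to write.

```r
data(iris)

# Same split as in the article
set.seed(123)
index <- sample(1:nrow(iris), 0.7 * nrow(iris))
train_data <- iris[index, -5]
train_labels <- iris[index, 5]

# cl should be a factor with exactly one label per training row
stopifnot(is.factor(train_labels))
stopifnot(length(train_labels) == nrow(train_data))

# Class balance of the training labels
print(table(train_labels))
```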

Step 2: Apply the `knn` Function

Use the knn function to classify the test data points based on the training data.

# Apply the knn function
k <- 3  # Number of nearest neighbors
predicted_labels <- knn(train = train_data, test = test_data, cl = train_labels, k = k)

# Print the predicted labels
print(predicted_labels)

Output:

 [1] setosa     setosa     setosa     setosa     setosa     setosa     setosa    
 [8] setosa     setosa     setosa     setosa     setosa     setosa     setosa    
[15] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[22] versicolor versicolor versicolor versicolor versicolor versicolor virginica 
[29] versicolor versicolor versicolor versicolor virginica  virginica  virginica 
[36] virginica  virginica  virginica  virginica  virginica  virginica  virginica 
[43] virginica  virginica  virginica 
Levels: setosa versicolor virginica

Step 3: Evaluate the Model

Compare the predicted labels with the true labels to evaluate the performance of the model.

# Evaluate the model
confusion_matrix <- table(Predicted = predicted_labels, Actual = test_labels)
print(confusion_matrix)

# Calculate accuracy
accuracy <- sum(predicted_labels == test_labels) / length(test_labels)
print(paste("Accuracy:", round(accuracy * 100, 2), "%"))

Output:

            Actual
Predicted    setosa versicolor virginica
  setosa         14          0         0
  versicolor      0         17         0
  virginica       0          1        13

[1] "Accuracy: 97.78 %"

The confusion matrix shows that 44 of the 45 test points were classified correctly. The single error is a versicolor flower predicted as virginica, which gives the reported accuracy of 97.78%.
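
The same confusion matrix can also be broken down per class. The sketch below re-enters the counts from the output above by hand (so it runs without repeating the split) and computes overall accuracy and per-class recall; the variable names cm and recall are chosen for this example only.

```r
# Counts copied from the confusion matrix printed above
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(14,  0,  0,
                0, 17,  0,
                0,  1, 13),
             nrow = 3, byrow = TRUE,
             dimnames = list(Predicted = classes, Actual = classes))

# Overall accuracy: correct predictions lie on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

# Per-class recall: correct predictions divided by the actual count of each class
recall <- diag(cm) / colSums(cm)

print(round(accuracy, 4))
print(round(recall, 4))
```

Here setosa and virginica are recovered completely, while versicolor has a recall of 17 out of 18 because one of its flowers was assigned to virginica.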

Conclusion

The cl parameter in the knn function in R is crucial as it provides the true class labels for the training data. These labels are used during the classification process to determine the class of the test data points based on their nearest neighbors. By understanding and correctly using the cl parameter, you can effectively apply the k-NN algorithm to various classification tasks and achieve reliable results.

Referred: https://www.geeksforgeeks.org
