Confusion Matrix from rpart

The rpart package allows us to create classification and regression trees, which can be used for a variety of predictive modeling tasks. A crucial step in evaluating the performance of these models is understanding their confusion matrix. This article will walk you through the process of building a classification model using rpart and interpreting its confusion matrix.

Introduction to rpart

The name rpart stands for Recursive Partitioning and Regression Trees. The package provides functions to build classification and regression trees, is easy to use, and integrates well with other R packages for model evaluation and visualization.

The steps below build a classification tree with rpart in R and then compute and interpret its confusion matrix.

Step 1: Installing and Loading rpart

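Before we begin, make sure the rpart package is installed, along with caret (used for the data split and the confusion matrix) and rpart.plot (used to draw the tree). If any of them is missing, install it first, then load all three:

# Install once; skip any package that is already installed
install.packages(c("rpart", "caret", "rpart.plot"))

library(rpart)
library(caret)      # createDataPartition() and confusionMatrix()
library(rpart.plot) # rpart.plot() for drawing the tree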

Step 2: Building a Classification Model

Let’s use the famous iris dataset for this example. We’ll build a model to classify the species of iris based on the sepal and petal measurements.

# Load the dataset
data(iris)

# Set seed for reproducibility
set.seed(123)

# Split the data into training and testing sets
trainIndex <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
trainData <- iris[trainIndex, ]
testData <- iris[-trainIndex, ]

# Build the model
rpartModel <- rpart(Species ~ ., data = trainData, method = "class")

# Print the model
print(rpartModel)

Output:

n= 105 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 105 70 setosa (0.33333333 0.33333333 0.33333333)  
  2) Petal.Length< 2.6 35  0 setosa (1.00000000 0.00000000 0.00000000) *
  3) Petal.Length>=2.6 70 35 versicolor (0.00000000 0.50000000 0.50000000)  
    6) Petal.Width< 1.65 36  2 versicolor (0.00000000 0.94444444 0.05555556) *
    7) Petal.Width>=1.65 34  1 virginica (0.00000000 0.02941176 0.97058824) *
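
The loss column in the printout counts the training observations that each node misclassifies, so the terminal nodes give the training error directly: (0 + 2 + 1) / 105, or about 2.9%. The sketch below reads the same quantity from a fitted object. It assumes the frame component of an rpart object, where var is "<leaf>" for terminal nodes and dev holds the misclassified count for a classification tree with default loss and weights. To run without caret it refits on the full iris data, so its numbers will differ from the 105-row model above.

```r
library(rpart)

# Illustrative fit on all 150 rows (not the article's 70% training split)
fit <- rpart(Species ~ ., data = iris, method = "class")

# Terminal nodes are marked "<leaf>" in the frame
leaves <- fit$frame[fit$frame$var == "<leaf>", ]

# With default loss and weights, dev is the number misclassified in each node
train_error <- sum(leaves$dev) / fit$frame$n[1]

# Cross-check against predictions made on the same training data
check_error <- mean(predict(fit, iris, type = "class") != iris$Species)
print(c(from_frame = train_error, from_predictions = check_error))
```

This is training error only; the confusion matrix on held-out data is the better guide to how the tree generalizes.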

Step 3: Visualizing the Tree

Visualizing the decision tree helps in understanding how the model makes predictions. We can use the rpart.plot package for this.

rpart.plot(rpartModel, type = 4, extra = 101)

Output:

[Figure: plot of the fitted rpart decision tree]

Step 4: Making Predictions

With the model built, the next step is to make predictions on the test data.

# Predict on the test data
predictions <- predict(rpartModel, testData, type = "class")
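
Besides hard class labels, predict() on an rpart classification tree can return class probabilities with type = "prob". The sketch below shows both forms side by side. To stay self-contained it fits on the full iris data rather than the article's train/test split, and the names fit, probs, and labels are illustrative only.

```r
library(rpart)

# Illustrative fit on the full data set (no train/test split here)
fit <- rpart(Species ~ ., data = iris, method = "class")

# Matrix with one row per observation and one column per class
probs <- predict(fit, iris, type = "prob")
print(head(probs, 3))

# Factor of predicted classes, the form used for the confusion matrix
labels <- predict(fit, iris, type = "class")
print(head(labels, 3))

# Each predicted label is the class with the highest probability in its row
top_class <- colnames(probs)[apply(probs, 1, which.max)]
print(all(top_class == as.character(labels)))
```

The probabilities are the class proportions in the terminal node an observation falls into, so a small tree produces only a handful of distinct probability rows.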

Step 5: Creating a Confusion Matrix

The confusion matrix is a table used to evaluate the performance of a classification model. It shows the actual versus predicted classifications and helps in calculating various performance metrics like accuracy, precision, recall, and F1 score.
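
To make the idea concrete, here is a minimal sketch that cross-tabulates predicted and actual labels with base R's table(). The two-class cat/dog labels are made up for illustration and are not part of the iris example.

```r
# Hypothetical labels for six observations
actual    <- factor(c("cat", "cat", "dog", "dog", "dog", "cat"),
                    levels = c("cat", "dog"))
predicted <- factor(c("cat", "dog", "dog", "dog", "cat", "cat"),
                    levels = c("cat", "dog"))

# Rows are predictions and columns are actual classes
cm <- table(Prediction = predicted, Reference = actual)
print(cm)

# Correct predictions sit on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)  # 4 of 6 correct
```

Off-diagonal cells show which classes get confused with which, detail that a single accuracy figure hides.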

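We can create a confusion matrix using the confusionMatrix() function from the caret package. It takes the predicted classes as its first argument and the actual (reference) classes as its second.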

# Generate the confusion matrix
confusionMatrix(predictions, testData$Species)

Output:

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         14         2
  virginica       0          1        13

Overall Statistics
                                         
               Accuracy : 0.9333         
                 95% CI : (0.8173, 0.986)
    No Information Rate : 0.3333         
    P-Value [Acc > NIR] : < 2.2e-16      
                                         
                  Kappa : 0.9            
                                         
 Mcnemar's Test P-Value : NA             

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.9333           0.8667
Specificity                 1.0000            0.9333           0.9667
Pos Pred Value              1.0000            0.8750           0.9286
Neg Pred Value              1.0000            0.9655           0.9355
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3111           0.2889
Detection Prevalence        0.3333            0.3556           0.3111
Balanced Accuracy           1.0000            0.9333           0.9167

The confusion matrix provides several important metrics:

Accuracy: The proportion of correct predictions among all cases examined. Here 42 of the 45 test flowers are classified correctly, an accuracy of approximately 93.33%.
Sensitivity (Recall): The proportion of actual positives that are correctly identified by the model. For instance, the sensitivity for the setosa class is 100%, meaning all setosa instances were correctly classified, while virginica reaches 86.67% (13 of 15).
Specificity: The proportion of actual negatives that are correctly identified by the model. For versicolor, 28 of the 30 flowers from other species were correctly not labeled versicolor, a specificity of 93.33%.
Kappa: A measure of agreement between predicted and actual classes, corrected for the agreement expected by chance. A value of 0.9 indicates almost perfect agreement.
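
As a check on these definitions, the following sketch recomputes the statistics by hand from the counts in the confusion matrix printed above. It uses only base R. The variable names are illustrative, and the kappa formula is the standard Cohen's kappa, (observed - expected) / (1 - expected).

```r
classes <- c("setosa", "versicolor", "virginica")

# Counts copied from the caret output: rows = Prediction, columns = Reference
cm <- matrix(c(15,  0,  0,
                0, 14,  2,
                0,  1, 13),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)
accuracy <- sum(diag(cm)) / n                    # 42 / 45

# Sensitivity: true positives / all actual members of the class
sensitivity <- diag(cm) / colSums(cm)

# Specificity: true negatives / all actual non-members of the class
true_neg <- n - rowSums(cm) - colSums(cm) + diag(cm)
specificity <- true_neg / (n - colSums(cm))

# Cohen's kappa: accuracy corrected for chance agreement
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_hat <- (accuracy - expected) / (1 - expected)

print(round(accuracy, 4))     # 0.9333
print(round(sensitivity, 4))  # 1.0000 0.9333 0.8667
print(round(specificity, 4))  # 1.0000 0.9333 0.9667
print(round(kappa_hat, 4))    # 0.9
```

These values match the Accuracy, Sensitivity, Specificity, and Kappa lines in the caret output.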

Conclusion

The rpart package in R is a powerful tool for building classification and regression trees. Evaluating the model using a confusion matrix provides deep insights into the model’s performance, helping to understand its strengths and weaknesses. By following the steps outlined in this article, you can build, visualize, and evaluate your own classification models using rpart.
