Calculating Precision and Recall for Multiclass Classification Using Confusion Matrix

Multiclass classification is a common problem in machine learning where a model is required to predict one of several predefined categories. Evaluating the performance of such models can be complex, especially when dealing with imbalanced datasets. Two essential metrics for evaluating multiclass classification models are precision and recall. In this article, we will delve into the details of how to calculate precision and recall using a confusion matrix for multiclass classification.

Understanding the Confusion Matrix

A confusion matrix is a table used to describe the performance of a classification model. It compares the actual target values with those predicted by the model. For a multiclass classification problem, the confusion matrix is a square matrix where the number of rows and columns equals the number of classes.

Each cell in the matrix represents the count of instances for a specific combination of actual and predicted classes. The diagonal elements represent the counts of correctly classified instances (True Positives), while off-diagonal elements represent misclassifications.
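
To make the structure concrete, here is a minimal sketch (the labels and counts are illustrative and unrelated to the worked example later in this article) that tallies a 3-class confusion matrix directly from actual and predicted labels:

Python

import numpy as np

# Illustrative labels for a 3-class problem (classes 0, 1, 2)
y_true = [0, 1, 2, 2, 0, 1, 2, 0]
y_pred = [0, 2, 2, 2, 0, 1, 1, 0]

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)

# Rows index the actual class, columns index the predicted class
for actual, predicted in zip(y_true, y_pred):
    cm[actual, predicted] += 1

print(cm)
# Diagonal entries cm[i, i] count correct predictions for class i;
# off-diagonal entries count misclassifications.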

Calculating Precision and Recall

Precision and recall are calculated for each class individually. Let’s define these metrics:

  • Precision for a class is the ratio of true positives to the sum of true positives and false positives. It measures the accuracy of the positive predictions.
  • Recall for a class is the ratio of true positives to the sum of true positives and false negatives. It measures the ability of the classifier to find all positive instances.

Formulas

Precision: [Tex]\text{Precision} = \frac{TP}{FP + TP}[/Tex]

Recall: [Tex]\text{Recall} = \frac{TP}{FN + TP}[/Tex]

Where:

  • TP (True Positives) is the count of correctly predicted instances for the class.
  • FP (False Positives) is the count of instances incorrectly predicted as the class.
  • FN (False Negatives) is the count of instances that belong to the class but were predicted as another class.
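
As a quick sanity check of these formulas, a small helper like the one below (the function name per_class_scores is simply an illustrative choice) computes both metrics from the three counts:

Python

def per_class_scores(tp, fp, fn):
    # Precision: fraction of instances predicted as this class that truly belong to it
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: fraction of instances of this class that the model actually found
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Using the Class 1 counts from the example below: TP = 100, FP = 35, FN = 30
print(per_class_scores(100, 35, 30))  # (0.7407..., 0.7692...) -> 0.74 and 0.77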

Example: Calculating Precision and Recall from a Confusion Matrix

Let’s consider a multiclass classification problem with four classes. The confusion matrix is as follows:

                  Predicted Class 1   Predicted Class 2   Predicted Class 3   Predicted Class 4
Actual Class 1          100                  15                  10                   5
Actual Class 2           20                  80                   5                  10
Actual Class 3           10                   5                  90                   5
Actual Class 4            5                  10                   5                  80

To calculate the precision and recall for each class, we need to compute the true positives, false positives, and false negatives for each class.

Class 1

  • True Positives (TP): 100 (correctly predicted instances of Class 1)
  • False Positives (FP): 20 + 10 + 5 = 35 (instances of other classes incorrectly predicted as Class 1)
  • False Negatives (FN): 15 + 10 + 5 = 30 (missed instances of Class 1)

[Tex]Precision_1 = \frac{TP_1}{FP_1+TP_1} = \frac{100}{35+100} =0.74[/Tex]

[Tex]Recall_1 = \frac{TP_1}{FN_1+ TP_1}= \frac{100}{30+100} =0.77[/Tex]

Class 2

  • True Positives (TP): 80 (correctly predicted instances of Class 2)
  • False Positives (FP): 15 + 5 + 10 = 30 (instances of other classes incorrectly predicted as Class 2)
  • False Negatives (FN): 20 + 5 + 10 = 35 (missed instances of Class 2)

[Tex]Precision_2=\frac{TP_2}{FP_2+TP_2}= \frac{80}{30+80} =0.73[/Tex]

[Tex]Recall_2= \frac{TP_2}{FN_2+ TP_2}= \frac{80}{35+80} =0.70[/Tex]

Class 3

  • True Positives (TP): 90 (correctly predicted instances of Class 3)
  • False Positives (FP): 10 + 5 + 5 = 20 (instances of other classes incorrectly predicted as Class 3)
  • False Negatives (FN): 10 + 5 + 5 = 20 (missed instances of Class 3)

[Tex]Precision_3= \frac{TP_3}{FP_3+ TP_3}= \frac{90}{20+90} =0.82[/Tex]

[Tex]Recall_3= \frac{TP_3}{FN_3+TP_3} = \frac{90}{20+90} =0.82[/Tex]

Class 4

  • True Positives (TP): 80 (correctly predicted instances of Class 4)
  • False Positives (FP): 5 + 10 + 5 = 20 (instances of other classes incorrectly predicted as Class 4)
  • False Negatives (FN): 5 + 10 + 5 = 20 (missed instances of Class 4)

[Tex]Precision_4= \frac{TP_4}{FP_4+ TP_4}= \frac{80}{20+80} =0.80[/Tex]

[Tex]Recall_4= \frac{TP_4}{FN_4+TP_4}= \frac{80}{20 +80} =0.80[/Tex]
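
The four hand calculations above can also be reproduced programmatically. The following NumPy sketch (variable names are illustrative) derives TP, FP, and FN for every class from the confusion matrix and prints the per-class precision and recall:

Python

import numpy as np

# Confusion matrix from the example: rows are actual classes, columns are predicted classes
cm = np.array([
    [100, 15, 10,  5],
    [ 20, 80,  5, 10],
    [ 10,  5, 90,  5],
    [  5, 10,  5, 80],
])

tp = np.diag(cm)             # correctly classified counts per class
fp = cm.sum(axis=0) - tp     # column sums minus the diagonal
fn = cm.sum(axis=1) - tp     # row sums minus the diagonal

precision = tp / (tp + fp)
recall = tp / (tp + fn)

for i, (p, r) in enumerate(zip(precision, recall), start=1):
    print(f"Class {i}: precision = {p:.2f}, recall = {r:.2f}")
# Class 1: precision = 0.74, recall = 0.77
# Class 2: precision = 0.73, recall = 0.70
# Class 3: precision = 0.82, recall = 0.82
# Class 4: precision = 0.80, recall = 0.80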

Different Methods for Calculating Precision and Recall

1. Micro-Averaging

Micro-averaging computes precision and recall globally across all classes: the true positives, false positives, and false negatives are summed over every class, and the metrics are then calculated from these pooled counts. In single-label multiclass classification, every misclassified instance is simultaneously a false positive for one class and a false negative for another, so micro-averaged precision, micro-averaged recall, and overall accuracy all take the same value.

  • Micro-Averaged Precision: [Tex] \text{Precision}_{\text{micro}} = \frac{\sum_{i=1}^{N} \text{TP}_i}{\sum_{i=1}^{N} (\text{TP}_i + \text{FP}_i)} [/Tex]
  • Micro-Averaged Recall: [Tex] \text{Recall}_{\text{micro}} = \frac{\sum_{i=1}^{N} \text{TP}_i}{\sum_{i=1}^{N} (\text{TP}_i + \text{FN}_i)} [/Tex]

2. Macro-Averaging

Macro-averaging calculates the precision and recall for each class individually and then takes the arithmetic mean across all classes. This method gives equal weight to each class, regardless of the number of instances.

  • Macro-Averaged Precision: [Tex] \text{Precision}_{\text{macro}} = \frac{1}{N} \sum_{i=1}^{N} \frac{\text{TP}_i}{\text{TP}_i + \text{FP}_i} [/Tex]
  • Macro-Averaged Recall: [Tex] \text{Recall}_{\text{macro}} = \frac{1}{N} \sum_{i=1}^{N} \frac{\text{TP}_i}{\text{TP}_i + \text{FN}_i} [/Tex]

3. Weighted Averaging

Weighted averaging is similar to macro-averaging but assigns different weights to each class based on the number of instances in that class.

  • Weighted Averaged Precision: [Tex] \text{Precision}_{\text{weighted}} = \sum_{i=1}^{N} \frac{n_i}{n} \cdot \frac{\text{TP}_i}{\text{TP}_i + \text{FP}_i} [/Tex]
  • Weighted Averaged Recall: [Tex] \text{Recall}_{\text{weighted}} = \sum_{i=1}^{N} \frac{n_i}{n} \cdot \frac{\text{TP}_i}{\text{TP}_i + \text{FN}_i} [/Tex]

where [Tex]n_i[/Tex] is the number of instances in class i and n is the total number of instances.
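
For completeness, here is a minimal NumPy sketch (reusing the confusion matrix from the worked example; variable names are illustrative) that computes all three averages from the per-class counts:

Python

import numpy as np

# Confusion matrix from the worked example (rows = actual, columns = predicted)
cm = np.array([
    [100, 15, 10,  5],
    [ 20, 80,  5, 10],
    [ 10,  5, 90,  5],
    [  5, 10,  5, 80],
])

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp
precision = tp / (tp + fp)
recall = tp / (tp + fn)
support = cm.sum(axis=1)     # n_i: number of actual instances in each class

# Micro-averaging: pool the counts across classes, then take the ratio
micro_precision = tp.sum() / (tp.sum() + fp.sum())
micro_recall = tp.sum() / (tp.sum() + fn.sum())   # equals micro precision here

# Macro-averaging: unweighted mean of the per-class scores
macro_precision = precision.mean()
macro_recall = recall.mean()

# Weighted averaging: per-class scores weighted by class support
weighted_precision = np.average(precision, weights=support)
weighted_recall = np.average(recall, weights=support)

print("Micro:   ", micro_precision, micro_recall)
print("Macro:   ", macro_precision, macro_recall)
print("Weighted:", weighted_precision, weighted_recall)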

Calculating Precision and Recall in Python

In practice, you can use libraries like scikit-learn in Python to calculate these metrics efficiently. Here is a sample code snippet:

Python

from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Ground-truth and predicted labels for a 3-class problem (classes 0, 1, 2)
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0, 2, 2]

# Confusion matrix: rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)

# Per-class precision and recall (average=None returns one score per class)
precision = precision_score(y_true, y_pred, average=None)
recall = recall_score(y_true, y_pred, average=None)
print("Precision per class:", precision)
print("Recall per class:", recall)

# Macro- and micro-averaged precision and recall
macro_precision = precision_score(y_true, y_pred, average='macro')
macro_recall = recall_score(y_true, y_pred, average='macro')
micro_precision = precision_score(y_true, y_pred, average='micro')
micro_recall = recall_score(y_true, y_pred, average='micro')

print("Macro Precision:", macro_precision)
print("Macro Recall:", macro_recall)
print("Micro Precision:", micro_precision)
print("Micro Recall:", micro_recall)

Output:

Confusion Matrix:
 [[2 1 0]
 [0 2 1]
 [1 0 3]]
Precision per class: [0.66666667 0.66666667 0.75      ]
Recall per class: [0.66666667 0.66666667 0.75      ]
Macro Precision: 0.6944444444444443
Macro Recall: 0.6944444444444443
Micro Precision: 0.7
Micro Recall: 0.7
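
scikit-learn also exposes the weighted average directly: passing average='weighted' to precision_score and recall_score weights each class by its support (its number of true instances). Continuing with the same y_true and y_pred as in the snippet above:

Python

weighted_precision = precision_score(y_true, y_pred, average='weighted')
weighted_recall = recall_score(y_true, y_pred, average='weighted')
print("Weighted Precision:", weighted_precision)  # 0.7 for this data
print("Weighted Recall:", weighted_recall)        # 0.7 for this data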

Conclusion

Precision and recall are essential metrics for evaluating the performance of a multiclass classification model. Using a confusion matrix, you can calculate these metrics for each class and then aggregate them with micro, macro, or weighted averaging to get a comprehensive view of your model's performance. Understanding and applying these metrics will help you fine-tune your model and improve its performance in real-world applications.



