How to create a Naive Bayes classifier in R for numerical and categorical variables

Naive Bayes classifiers are simple yet powerful probabilistic classifiers based on Bayes’ theorem. They are particularly useful for large datasets and have applications in various domains, including text classification, spam detection, and medical diagnosis. This article will guide you through the process of creating a Naive Bayes classifier in R that can handle both numerical and categorical variables.

Understanding Naive Bayes

Naive Bayes classifiers assume that the features (predictors) are conditionally independent given the class label. Despite this “naive” assumption, they often perform surprisingly well in practice. The key idea is to calculate the posterior probability for each class and then select the class with the highest probability.
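Under the conditional-independence assumption, the posterior for a class C given features x₁, …, xₙ can be written as (the standard Naive Bayes formulation):

```latex
P(C \mid x_1, \ldots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The classifier evaluates this expression for every class and predicts the class with the largest value.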

Below is a step-by-step guide to creating a Naive Bayes classifier for numerical and categorical variables in the R programming language.

Step 1: Install and Load Required Packages

First, ensure that you have the e1071 package installed, as it provides an implementation of the Naive Bayes classifier.

R
install.packages("e1071")
library(e1071)

Step 2: Prepare Your Data

For demonstration purposes, we’ll use a sample dataset. Here’s an example dataset that includes both numerical and categorical variables:

R
# Sample data
data <- data.frame(
  age = c(25, 30, 45, 35, 50, 23, 37, 61, 22, 42),
  income = c('high', 'high', 'medium', 'low', 'low', 'medium', 'medium', 'high', 'low', 'medium'),
  student = c('no', 'no', 'no', 'yes', 'no', 'yes', 'yes', 'no', 'yes', 'yes'),
  credit_rating = c('fair', 'excellent', 'fair', 'fair', 'excellent', 'excellent', 'fair', 'fair', 'excellent', 'fair'),
  buys_computer = c('no', 'no', 'yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes')
)
head(data)
# Convert categorical variables to factors
data$income <- as.factor(data$income)
data$student <- as.factor(data$student)
data$credit_rating <- as.factor(data$credit_rating)
data$buys_computer <- as.factor(data$buys_computer)

Output:

  age income student credit_rating buys_computer
1  25   high      no          fair            no
2  30   high      no     excellent            no
3  45 medium      no          fair           yes
4  35    low     yes          fair           yes
5  50    low      no     excellent           yes
6  23 medium     yes     excellent            no

Step 3: Split the Data into Training and Testing Sets

Splitting the data helps in evaluating the performance of the model. We’ll use 70% of the data for training and 30% for testing.

R
set.seed(123)  # For reproducibility
train_index <- sample(1:nrow(data), 0.7 * nrow(data))
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
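With only 10 rows, a plain random split can leave one class underrepresented in the training set. An alternative is a stratified split that preserves the class proportions; a sketch using `createDataPartition` (this assumes the caret package is installed, and reuses `data` from Step 2):

```r
# Optional: stratified 70/30 split that preserves class proportions
# (assumes the caret package is installed; `data` comes from Step 2)
library(caret)
set.seed(123)
idx <- createDataPartition(data$buys_computer, p = 0.7, list = FALSE)
train_data <- data[idx, ]
test_data  <- data[-idx, ]
table(train_data$buys_computer)  # both classes represented
```

For the rest of this article we stick with the simple `sample()` split shown above.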

Step 4: Train the Naive Bayes Model

Use the naiveBayes function from the e1071 package to train the model.

R
model <- naiveBayes(buys_computer ~ ., data = train_data)
print(model)

Output:

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace)

A-priori probabilities:
Y
       no       yes
0.5714286 0.4285714

Conditional probabilities:
     age
Y         [,1]     [,2]
  no  34.75000 17.74589
  yes 36.33333 12.50333

     income
Y          high       low    medium
  no  0.7500000 0.0000000 0.2500000
  yes 0.0000000 0.3333333 0.6666667

     student
Y            no       yes
  no  0.7500000 0.2500000
  yes 0.3333333 0.6666667

     credit_rating
Y     excellent      fair
  no  0.5000000 0.5000000
  yes 0.3333333 0.6666667
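Notice that some conditional-probability cells are exactly zero (for example, P(income = low | no) = 0). A zero cell forces the whole product for that class to zero whenever a test row has that level. The `laplace` argument of `naiveBayes` adds pseudo-counts to the categorical tables to avoid this; a sketch, assuming `train_data` from Step 3:

```r
# Add-one (Laplace) smoothing for the categorical tables;
# numeric predictors such as age are unaffected.
model_smooth <- naiveBayes(buys_computer ~ ., data = train_data, laplace = 1)
model_smooth$tables$income  # no zero cells any more
```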

Step 5: Make Predictions

Use the trained model to make predictions on the test data.

R
predictions <- predict(model, test_data)
print(predictions)

Output:

[1] yes yes yes
Levels: no yes
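By default `predict` returns hard class labels. Passing `type = "raw"` returns the posterior probability of each class instead, which is useful for inspecting how confident the model is or for applying a custom decision threshold. This assumes `model` and `test_data` from the earlier steps:

```r
# Posterior class probabilities instead of hard labels
probs <- predict(model, test_data, type = "raw")
print(probs)    # one row per test case, one column per class ("no", "yes")
rowSums(probs)  # each row sums to 1
```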

Step 6: Evaluate the Model

Evaluate the performance of the model by comparing the predictions with the actual class labels.

R
# Confusion matrix
confusion_matrix <- table(predictions, test_data$buys_computer)
print(confusion_matrix)

# Calculate accuracy
accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
print(paste("Accuracy:", round(accuracy, 2)))

Output:

predictions no yes
        no   0   0
        yes  0   3

[1] "Accuracy: 1"
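Accuracy alone can be misleading on a test set this small (all three test rows happen to belong to the "yes" class). Per-class metrics such as precision and recall can be read off the same confusion matrix; a sketch assuming `confusion_matrix` from the code above, treating "yes" as the positive class:

```r
# Precision and recall for the "yes" class
# (rows of confusion_matrix are predictions, columns are actual labels)
tp <- confusion_matrix["yes", "yes"]
fp <- confusion_matrix["yes", "no"]
fn <- confusion_matrix["no", "yes"]
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
print(paste("Precision:", round(precision, 2), "Recall:", round(recall, 2)))
```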

The naiveBayes function in the e1071 package can handle both numerical and categorical variables. Numerical variables are assumed to follow a Gaussian (normal) distribution, while categorical variables are handled by calculating the frequency of each category given the class label.
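For a numeric predictor, the fitted model stores the per-class mean and standard deviation (the two columns of the `age` table in the output of Step 4), and likelihoods are evaluated with the normal density. This can be checked by hand with `dnorm`, assuming `model` from Step 4:

```r
# Stored Gaussian parameters for age: column 1 = mean, column 2 = sd
params <- model$tables$age
# Likelihood of observing age = 40 under each class
dnorm(40, mean = params["no", 1],  sd = params["no", 2])
dnorm(40, mean = params["yes", 1], sd = params["yes", 2])
```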

Conclusion

Creating a Naive Bayes classifier in R to handle both numerical and categorical variables involves:

  • Installing and loading the e1071 package.
  • Preparing your data by converting categorical variables to factors.
  • Splitting the data into training and testing sets.
  • Training the model using the naiveBayes function.
  • Making predictions and evaluating the model’s performance.

Naive Bayes classifiers are robust and efficient, making them a great choice for various classification tasks. By following the steps outlined in this article, you can implement a Naive Bayes classifier in R for datasets with mixed types of variables.




Referred: https://www.geeksforgeeks.org