CatBoost is an acronym for "Categorical Boosting" and is designed to perform well on both classification and regression tasks. One of CatBoost's primary advantages is its ability to handle categorical variables without any manual encoding. To cope with the difficulties raised by categorical features, such as high cardinality, it combines ordered target statistics with a technique known as Ordered Boosting. This lets CatBoost handle categorical data automatically, saving the user time and effort.

CatBoost's basic idea is its effective and efficient handling of categorical features. Rather than requiring one-hot or label encoding up front, it generates a numerical representation of each categorical variable from target statistics computed over random permutations of the training data. This preserves the category information, avoids target leakage, and still lets the model use the powerful gradient-boosting machinery.

What is CatBoost?

CatBoost, developed by Yandex, is a go-to solution for efficient machine learning on classification and regression tasks. With its Ordered Boosting algorithm, CatBoost improves predictions by harnessing ensembles of decision trees. In this article, you'll explore the workings of the CatBoost algorithm.

Key features related to CatBoost:

The key features related to CatBoost are as follows:

- Native support for categorical features, with no manual encoding required.
- Ordered Boosting, which reduces the prediction shift (target leakage) that plain gradient boosting can suffer from.
- Symmetric (oblivious) decision trees, which make training and prediction fast.
- Sensible default hyperparameters, so good results are often possible with little tuning.
- GPU training support and built-in tools such as feature importance.
Workings of CatBoost

CatBoost is a powerful gradient-boosting technique designed for machine learning tasks, particularly those involving structured (tabular) input. It builds on gradient boosting, an ensemble learning method. The algorithm starts with an initial guess, often the mean of the target variable, and then gradually constructs an ensemble of decision trees, each tree aiming to reduce the errors (residuals) left by the previous trees.

One of CatBoost's key strengths is its effective handling of categorical features. It processes categorical data directly using ordered target statistics and "ordered boosting": each category is encoded from target statistics computed over random permutations of the training data, so an example's encoding never uses its own label. This avoids target leakage and leads to faster training and improved model performance.

To prevent overfitting, CatBoost incorporates regularization techniques. These introduce penalties or constraints during training to discourage the model from becoming too complex and fitting the training data too closely, helping it generalize and remain robust on unseen data.

The algorithm iteratively constructs the ensemble of trees by minimizing the loss function using gradient descent. At each iteration, it calculates the negative gradient of the loss function with respect to the current predictions and fits a new tree to that negative gradient; the learning rate determines the step size taken during gradient descent. The process is repeated until a predetermined number of trees has been added or a convergence criterion has been met. When making predictions, CatBoost combines the predictions from all the trees in the ensemble, and this aggregation yields highly accurate and reliable models.
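The boosting loop described above (start from the mean, fit each new tree to the residuals, step by the learning rate) can be sketched for squared loss, where the negative gradient is simply the residual. This is an illustrative sketch using scikit-learn regression trees on synthetic data, not CatBoost's actual implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (synthetic, for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Initial guess: the mean of the target, as described above
pred = np.full_like(y, y.mean())
learning_rate = 0.1

for _ in range(100):
    # For squared loss, the negative gradient is just the residual
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    # Step toward the new tree's predictions, scaled by the learning rate
    pred += learning_rate * tree.predict(X)

print("Train MSE:", np.mean((y - pred) ** 2))
```

Each iteration shrinks the residuals a little; CatBoost follows the same scheme but adds ordered boosting and symmetric trees on top.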
Mathematically, CatBoost can be represented as follows. Given a training dataset with N samples and M features, where each sample is denoted as (x_i, y_i), with x_i a vector of M features and y_i the corresponding target variable, CatBoost aims to learn a function F(x) that predicts the target variable y:

F(x_i) = F_0(x_i) + Σ_{m=1}^{T} f_m(x_i)

where F_0 is the initial guess (for example, the mean of the target), f_m is the m-th decision tree in the ensemble, and T is the number of trees.
The equation states that the overall prediction F(x_i) is obtained by summing the initial guess F_0(x_i) with the predictions f_m(x_i) of each tree. This summation is performed over all trees (m) for each training sample (i).

Getting started with CatBoost

Install the package:

!pip install catboost

Step 1: Importing the Necessary Libraries

Before we begin coding, we must first import the appropriate libraries. We'll use the pandas package for data manipulation and the catboost library for the algorithm implementation.
Step 2: Loading the Dataset

Dataset link: Titanic Dataset
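The loading cell is also missing, and the exact dataset URL is not preserved in the text, so the path below is a placeholder for your local copy of the Titanic CSV. The in-memory sample (with a few assumed Titanic columns) exists only so the sketch runs standalone:

```python
import io
import pandas as pd

# In practice, point this at your copy of the Titanic CSV:
# df = pd.read_csv("titanic.csv")

# Small in-memory stand-in with Titanic-like columns, for illustration only
sample = io.StringIO(
    "Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked\n"
    "0,3,male,22,1,0,7.25,S\n"
    "1,1,female,38,1,0,71.28,C\n"
    "1,3,female,26,0,0,7.92,S\n"
)
df = pd.read_csv(sample)
print(df.head())
```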
Step 3: Preprocessing the Dataset

Next we preprocess the dataset: missing values are handled, categorical variables are converted to numeric representations, and the data is divided into training and testing sets.
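A sketch of those three preprocessing steps, using a toy frame standing in for the Titanic data (column names assumed from the dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the Titanic data
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, None, 54.0, 27.0, None],
    "Fare": [7.25, 71.28, 7.92, 51.86, 21.0, 8.05],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# Handle missing values: fill Age with the median
df["Age"] = df["Age"].fillna(df["Age"].median())

# Convert the categorical Sex column to a numeric representation
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Split into features/target and then training/testing sets
X = df.drop("Survived", axis=1)
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
```

Note that CatBoost could also take the raw `Sex` strings via its `cat_features` argument; the manual mapping here mirrors the article's generic preprocessing description.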
Step 4: Setting Up and Training the CatBoost Model

We now initialize the CatBoostClassifier and define the training hyperparameters: the number of iterations, the learning rate, and the tree depth. Finally, the model is fitted to the training data.
Step 5: Assessing the Model's Performance

Once trained, the model can be evaluated on the testing data. To understand the model's precision, recall, and F1-score, we compute the accuracy score and print a classification report.
Output:

98:	learn: 0.3625223	total: 257ms	remaining: 2.59ms
99:	learn: 0.3621516	total: 259ms	remaining: 0us
Accuracy: 0.7988826815642458
Classification Report:
              precision    recall  f1-score   support

           0       0.79      0.89      0.84       105
           1       0.81      0.68      0.74        74

    accuracy                           0.80       179
   macro avg       0.80      0.78      0.79       179
weighted avg       0.80      0.80      0.80       179

Step 6: Feature Importance with CatBoost

CatBoost includes a built-in feature importance method for determining the contribution of each feature to the model. A bar plot can be used to show the feature importance scores.
Output:

[Figure: feature importance bar plot generated by matplotlib]

Conclusion

To summarize, CatBoost is a powerful and user-friendly gradient-boosting library that is appropriate for a wide range of applications. Whether you're a newcomer searching for a simple approach to machine learning or an experienced practitioner looking for top-tier performance, CatBoost is a useful tool to have in your toolbox. However, as with any tool, its success depends on the specific problem and dataset, so it's always a good idea to experiment with it and compare it against other techniques.
Referred: https://www.geeksforgeeks.org