CatBoost Grid Search and Random Search

Finding the best model and configuring it for optimal performance can be difficult in machine learning. Methods such as Grid Search and Random Search help with this. This article explains both techniques with a focus on CatBoost, a powerful gradient-boosting library. Whatever your level of experience as a data scientist, building accurate and efficient machine learning models requires a grasp of these fundamental search techniques.

CatBoost

CatBoost is a machine-learning library that provides a fast and accurate implementation of gradient boosting with built-in support for categorical data. Gradient boosting combines several weak learners (such as decision trees) into a strong learner by iteratively fitting each new learner to the residual errors of the previous ones. CatBoost can handle categorical features automatically, without manual encoding or preprocessing. When using gradient boosting, it can be difficult to find good values for hyperparameters such as the learning rate, the depth of the trees, and the number of iterations. These hyperparameters affect the model’s accuracy, training time, and memory requirements, so it is important to tune them methodically.
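
To make the idea of fitting new learners to residual errors concrete, here is a minimal sketch of gradient boosting for squared-error regression on a single feature, using one-split decision stumps as weak learners. It is a toy illustration only and does not reflect CatBoost's actual implementation; the function names (`fit_stump`, `boost`) and the sample data are made up for this example.

```python
# Toy gradient boosting for squared-error regression on one feature.
# Illustrative sketch only; CatBoost's real algorithm is far more elaborate.

def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


def boost(xs, ys, iterations=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(iterations):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, predictions


# Made-up data with a jump between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

base, stumps, predictions = boost(xs, ys)
baseline_mse = mean_squared_error(ys, [base] * len(ys))
boosted_mse = mean_squared_error(ys, predictions)
print(f"MSE of the mean-only model: {baseline_mse:.4f}")
print(f"MSE after boosting:         {boosted_mse:.4f}")
```

The `iterations` and `learning_rate` arguments in this sketch play the same role as the CatBoost hyperparameters of the same name: more iterations and a larger learning rate fit the training data more closely, which is why they need tuning.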

Grid Search and Random Search

Grid search and random search are two common strategies for finding a good set of hyperparameters for a machine learning model. Both automate the hyperparameter tuning procedure, which saves time and effort.

Grid Search

As the name implies, grid search entails defining a grid of hyperparameter values to search through. It is a methodical but computationally costly strategy since it thoroughly tests all potential combinations of hyperparameters inside the specified grid.

Key steps of Grid Search:

Define a grid of hyperparameter values to explore.
Train and evaluate the model for each combination of hyperparameters.
Select the combination that performs the best.
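
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a library implementation: `toy_score` is a hypothetical stand-in for "train the model and return a validation score", and the grid values are chosen only for demonstration.

```python
from itertools import product


def toy_score(params):
    """Hypothetical stand-in for training a model and scoring it.

    Higher is better; the peak is at learning_rate=0.1, depth=4.
    """
    return (-(params["learning_rate"] - 0.1) ** 2
            - 0.01 * (params["depth"] - 4) ** 2)


# Step 1: define a grid of hyperparameter values to explore.
grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "depth": [2, 4, 6],
}

# Step 2: evaluate every combination in the grid.
names = list(grid)
candidates = [dict(zip(names, values)) for values in product(*grid.values())]
scored = [(toy_score(params), params) for params in candidates]

# Step 3: select the combination that performs best.
best_score, best_params = max(scored, key=lambda pair: pair[0])
print(f"Evaluated {len(candidates)} combinations")
print("Best parameters:", best_params)
```

Because every combination is evaluated, the number of candidates is the product of the list lengths (here 3 x 3 = 9).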

Random Search

By contrast, Random Search draws hyperparameter values at random from predefined distributions. For a fixed budget of model fits, this is often more efficient than Grid Search at finding useful hyperparameters, especially when only a few of the hyperparameters strongly affect performance.

Key steps of Random Search:

Specify the hyperparameter distributions (e.g., uniform or log-uniform) to sample from.
Randomly sample combinations of hyperparameters.
Train and evaluate the model for each sampled combination.
Select the combination that performs the best.
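
As a rough sketch of these steps, the following plain-Python example samples the learning rate from a log-uniform distribution and the depth from a uniform integer range. The ranges, the trial budget, and the `toy_score` function (a hypothetical stand-in for training and scoring a model) are assumptions made for illustration.

```python
import math
import random


def toy_score(params):
    """Hypothetical stand-in for training a model and scoring it."""
    return (-(params["learning_rate"] - 0.1) ** 2
            - 0.01 * (params["depth"] - 4) ** 2)


def sample_params(rng):
    """Step 1: draw one combination from the chosen distributions."""
    # Log-uniform over [0.001, 1.0]: uniform in log space, then exponentiate.
    log_lr = rng.uniform(math.log(0.001), math.log(1.0))
    return {
        "learning_rate": math.exp(log_lr),
        "depth": rng.randint(2, 8),  # uniform integer, both ends inclusive
    }


rng = random.Random(42)  # fixed seed so the run is reproducible
n_iter = 20              # trial budget, independent of the search space size

# Step 2: randomly sample combinations of hyperparameters.
trials = [sample_params(rng) for _ in range(n_iter)]

# Steps 3 and 4: score each sampled combination and keep the best one.
best_params = max(trials, key=toy_score)
print("Best parameters:", best_params)
print("Best score:", toy_score(best_params))
```

Unlike the grid approach, the number of evaluations is set directly by `n_iter`, so adding another hyperparameter does not increase the cost.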

Differences between Grid Search and Random Search

Aspect	Grid Search	Random Search
Search Strategy	Systematic: Tries all combinations in the grid	Random: Samples combinations from predefined distributions
Computational Cost	High (exponential with the number of hyperparameters)	Lower (linear with the number of iterations)
Suitable for	Smaller parameter spaces	Larger and complex parameter spaces
Exploration	Limited exploration of parameter space	Broader exploration of parameter space
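
The cost row of the table can be checked with simple arithmetic. The sketch below counts model fits under k-fold cross-validation; the helper names are made up for this example, `small_grid` mirrors the grid used later in this article, and `large_grid` is a hypothetical space with six hyperparameters of five values each.

```python
from math import prod


def grid_search_fits(grid, cv_folds):
    """Fits needed by an exhaustive grid search (excluding the final refit)."""
    return prod(len(values) for values in grid.values()) * cv_folds


def random_search_fits(n_iter, cv_folds):
    """Fits needed by a random search with a fixed budget of n_iter trials."""
    return n_iter * cv_folds


small_grid = {
    "iterations": [100, 200],
    "learning_rate": [0.01, 0.1],
    "depth": [3, 6],
}
# Hypothetical larger space: 6 hyperparameters with 5 candidate values each.
large_grid = {name: list(range(5)) for name in "abcdef"}

print(grid_search_fits(small_grid, cv_folds=5))   # 2 * 2 * 2 * 5 = 40
print(grid_search_fits(large_grid, cv_folds=5))   # 5**6 * 5 = 78125
print(random_search_fits(n_iter=50, cv_folds=5))  # 50 * 5 = 250
```

The grid cost multiplies with every added hyperparameter, while the random search cost depends only on the chosen number of iterations.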

Using Grid Search and Random Search with CatBoost

We’ll use a simple classification task and the well-known “Breast Cancer Wisconsin (Diagnostic)” dataset to show the differences between Grid Search and Random Search for hyperparameter tuning in CatBoost.

We’ll do hyperparameter tuning using both Grid Search and Random Search, weighing the benefits and drawbacks of each method. We will also assess the models and touch briefly on deployment issues.

Step 1: Import Necessary Libraries and Load the Dataset

Python

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import make_scorer, f1_score
 
# Load the Breast Cancer dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)
 
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 2: Perform Grid Search for Hyperparameter Tuning

Python

# Initialize the CatBoost classifier
model = CatBoostClassifier(loss_function='Logloss', random_state=42)
 
# Define the parameter grid for Grid Search
param_grid = {
    'iterations': [100, 200],
    'learning_rate': [0.01, 0.1],
    'depth': [3, 6]
}
 
# Use 'f1_weighted' as the scoring metric for Grid Search
scorer = make_scorer(f1_score, average='weighted')
grid_search = GridSearchCV(model, param_grid, cv=5, scoring=scorer, n_jobs=-1)
grid_search.fit(X_train, y_train)
 
# Print the best hyperparameters for Grid Search
print("Grid Search - Best Hyperparameters:", grid_search.best_params_)

Output:

170:    learn: 0.0093395    total: 432ms    remaining: 73.3ms
171:    learn: 0.0093290    total: 434ms    remaining: 70.7ms
172:    learn: 0.0092868    total: 437ms    remaining: 68.2ms
173:    learn: 0.0091540    total: 439ms    remaining: 65.6ms
174:    learn: 0.0090309    total: 441ms    remaining: 63ms
175:    learn: 0.0090234    total: 444ms    remaining: 60.5ms
176:    learn: 0.0089493    total: 446ms    remaining: 58ms
177:    learn: 0.0089408    total: 448ms    remaining: 55.4ms
178:    learn: 0.0089310    total: 451ms    remaining: 52.9ms
179:    learn: 0.0087964    total: 453ms    remaining: 50.4ms
180:    learn: 0.0087965    total: 456ms    remaining: 47.9ms
181:    learn: 0.0087429    total: 458ms    remaining: 45.3ms
182:    learn: 0.0087357    total: 461ms    remaining: 42.8ms
183:    learn: 0.0087141    total: 463ms    remaining: 40.3ms
184:    learn: 0.0085773    total: 465ms    remaining: 37.7ms
185:    learn: 0.0085404    total: 468ms    remaining: 35.2ms
186:    learn: 0.0084845    total: 470ms    remaining: 32.7ms
187:    learn: 0.0082686    total: 473ms    remaining: 30.2ms
188:    learn: 0.0082414    total: 475ms    remaining: 27.7ms
189:    learn: 0.0081177    total: 478ms    remaining: 25.1ms
190:    learn: 0.0080564    total: 480ms    remaining: 22.6ms
191:    learn: 0.0078925    total: 483ms    remaining: 20.1ms
192:    learn: 0.0078671    total: 485ms    remaining: 17.6ms
193:    learn: 0.0078671    total: 487ms    remaining: 15.1ms
194:    learn: 0.0078669    total: 489ms    remaining: 12.5ms
195:    learn: 0.0076908    total: 492ms    remaining: 10ms
196:    learn: 0.0075966    total: 494ms    remaining: 7.52ms
197:    learn: 0.0075902    total: 496ms    remaining: 5.01ms
198:    learn: 0.0075845    total: 499ms    remaining: 2.5ms
199:    learn: 0.0075563    total: 501ms    remaining: 0us
Grid Search - Best Hyperparameters: {'depth': 3, 'iterations': 200, 'learning_rate': 0.1}
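
Beyond `best_params_`, a fitted search object also exposes `cv_results_`, which holds the cross-validated score of every combination. The sketch below shows one way to rank the candidates. To keep it quick and self-contained it uses scikit-learn's `GradientBoostingClassifier` as a stand-in estimator with a deliberately small, made-up grid; this is an assumption for illustration, and the same inspection should apply to the `grid_search` object fitted above. The built-in scoring string `"f1_weighted"` is used in place of the `make_scorer` call.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Stand-in estimator and a small illustrative grid (not the article's CatBoost grid).
stand_in = GradientBoostingClassifier(random_state=42)
stand_in_grid = {
    "n_estimators": [20, 40],
    "learning_rate": [0.01, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(stand_in, stand_in_grid, cv=5, scoring="f1_weighted")
search.fit(X_train, y_train)

# One row per combination, sorted so the best-ranked candidates come first.
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
ranked = results[columns].sort_values("rank_test_score")
print(ranked.to_string(index=False))
```

Looking at `mean_test_score` together with `std_test_score` helps judge whether the winning combination is clearly better than the runners-up or only marginally so.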

Step 3: Perform Random Search for Hyperparameter Tuning

Python

# Initialize the CatBoost classifier for Random Search
model = CatBoostClassifier(loss_function='Logloss', random_state=42)
 
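# Define the parameter distributions for Random Search.
# Note: these lists give only 2 x 2 x 2 = 8 combinations. Because n_iter=10
# exceeds 8, scikit-learn warns and evaluates all 8, so this run covers the
# same candidates as Grid Search. Use longer lists or scipy.stats
# distributions to obtain a genuinely random search.
param_dist = {
    'iterations': [100, 200],
    'learning_rate': [0.01, 0.1],
    'depth': [3, 6]
}
 
# Perform Randomized Search with cross-validation
random_search = RandomizedSearchCV(model, param_distributions=param_dist, n_iter=10, cv=5, scoring=scorer, n_jobs=-1, random_state=42)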
random_search.fit(X_train, y_train)
 
# Print the best hyperparameters for Random Search
print("Random Search - Best Hyperparameters:", random_search.best_params_)

Output:

170:    learn: 0.0093395    total: 414ms    remaining: 70.3ms
171:    learn: 0.0093290    total: 417ms    remaining: 67.8ms
172:    learn: 0.0092868    total: 419ms    remaining: 65.4ms
173:    learn: 0.0091540    total: 421ms    remaining: 63ms
174:    learn: 0.0090309    total: 424ms    remaining: 60.5ms
175:    learn: 0.0090234    total: 426ms    remaining: 58.1ms
176:    learn: 0.0089493    total: 428ms    remaining: 55.6ms
177:    learn: 0.0089408    total: 431ms    remaining: 53.2ms
178:    learn: 0.0089310    total: 433ms    remaining: 50.8ms
179:    learn: 0.0087964    total: 435ms    remaining: 48.3ms
180:    learn: 0.0087965    total: 437ms    remaining: 45.9ms
181:    learn: 0.0087429    total: 440ms    remaining: 43.5ms
182:    learn: 0.0087357    total: 442ms    remaining: 41.1ms
183:    learn: 0.0087141    total: 444ms    remaining: 38.6ms
184:    learn: 0.0085773    total: 446ms    remaining: 36.2ms
185:    learn: 0.0085404    total: 449ms    remaining: 33.8ms
186:    learn: 0.0084845    total: 451ms    remaining: 31.4ms
187:    learn: 0.0082686    total: 453ms    remaining: 28.9ms
188:    learn: 0.0082414    total: 456ms    remaining: 26.5ms
189:    learn: 0.0081177    total: 458ms    remaining: 24.1ms
190:    learn: 0.0080564    total: 460ms    remaining: 21.7ms
191:    learn: 0.0078925    total: 463ms    remaining: 19.3ms
192:    learn: 0.0078671    total: 465ms    remaining: 16.9ms
193:    learn: 0.0078671    total: 467ms    remaining: 14.4ms
194:    learn: 0.0078669    total: 469ms    remaining: 12ms
195:    learn: 0.0076908    total: 472ms    remaining: 9.62ms
196:    learn: 0.0075966    total: 474ms    remaining: 7.22ms
197:    learn: 0.0075902    total: 476ms    remaining: 4.81ms
198:    learn: 0.0075845    total: 478ms    remaining: 2.4ms
199:    learn: 0.0075563    total: 481ms    remaining: 0us
Random Search - Best Hyperparameters: {'learning_rate': 0.1, 'iterations': 200, 'depth': 3}
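
Random Search becomes more useful when the hyperparameters are drawn from distributions rather than short lists. The following sketch uses `scipy.stats` distributions with scikit-learn's `ParameterSampler` to preview what would be sampled, without training any model. The specific ranges are assumptions chosen for illustration, not recommended settings.

```python
from scipy.stats import loguniform, randint
from sklearn.model_selection import ParameterSampler

# Illustrative ranges (assumptions, not tuned recommendations).
param_distributions = {
    "iterations": randint(100, 501),         # integers from 100 to 500
    "learning_rate": loguniform(1e-3, 3e-1),  # log-uniform over [0.001, 0.3]
    "depth": randint(3, 9),                   # integers from 3 to 8
}

# Draw 10 candidate combinations with a fixed seed for reproducibility.
sampled = list(ParameterSampler(param_distributions, n_iter=10, random_state=42))
for params in sampled:
    print(params)
```

A dictionary like `param_distributions` can be passed to `RandomizedSearchCV` through its `param_distributions` argument in the same way as `param_dist` above.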

Step 4: Model Evaluation

Now, we can evaluate the models using the best hyperparameters found by each method.

Python

# Evaluate the Grid Search model
grid_search_model = grid_search.best_estimator_
y_pred_grid = grid_search_model.predict(X_test)
 
# Print classification report for the Grid Search model
print("Grid Search - Classification Report:")
print(classification_report(y_test, y_pred_grid))

Output:

Grid Search - Classification Report:
              precision    recall  f1-score   support
           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71
    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
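
The "macro avg" and "weighted avg" rows follow from the per-class rows, and the weighted average is the same kind of quantity used as the tuning metric. As a quick check, the sketch below recomputes both F1 averages from the rounded per-class values in the report above; because the inputs are already rounded, the results are approximate.

```python
# Per-class F1 scores and supports copied from the report above (rounded).
per_class = {
    0: {"f1": 0.96, "support": 43},
    1: {"f1": 0.98, "support": 71},
}

total_support = sum(row["support"] for row in per_class.values())

# Macro average: plain mean over classes, ignoring class sizes.
macro_f1 = sum(row["f1"] for row in per_class.values()) / len(per_class)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(
    row["f1"] * row["support"] for row in per_class.values()
) / total_support

print(f"total support:   {total_support}")
print(f"macro avg F1:    {macro_f1:.2f}")
print(f"weighted avg F1: {weighted_f1:.2f}")
```

Both values round to 0.97, matching the report. The two averages diverge more noticeably when classes are heavily imbalanced and per-class scores differ.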

Now we will check the classification report for the Random Search Model.

Python

# Evaluate the Random Search model
random_search_model = random_search.best_estimator_
y_pred_random = random_search_model.predict(X_test)
 
# Print classification report for the Random Search model
print("Random Search - Classification Report:")
print(classification_report(y_test, y_pred_random))

Output:

Random Search - Classification Report:
              precision    recall  f1-score   support
           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71
    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114

Step 5: Deployment

Deployment considerations depend on your specific project requirements. A common first step is to save the best-tuned model to a file and load it later to make predictions on new data. The following code saves the best model obtained after hyperparameter tuning and reloads it:

Python

import joblib
 
# Deploy the best Grid Search model (you can also deploy the best Random Search model similarly)
best_model = grid_search.best_estimator_
 
# Save the best model to a file using joblib
model_filename = "best_catboost_model.joblib"
joblib.dump(best_model, model_filename)
 
# Later, to load and use the saved model for predictions:
loaded_model = joblib.load(model_filename)
 
# Make predictions using the loaded model
y_pred_loaded = loaded_model.predict(X_test)
 
# Print classification report for the loaded model
print("Loaded Model - Classification Report:")
print(classification_report(y_test, y_pred_loaded))

Output:

Loaded Model - Classification Report:
              precision    recall  f1-score   support
           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71
    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114

Conclusion

In this article, we explored Grid Search and Random Search for hyperparameter tuning in CatBoost models. We applied both methods to a real dataset and evaluated the tuned models. We also covered model deployment, showing how to save the best-tuned CatBoost model for use in real-world applications. Mastering these techniques can help you improve the accuracy and effectiveness of your CatBoost models across a variety of classification problems.

Referred: https://www.geeksforgeeks.org
