The performance of Support Vector Machines (SVMs) is heavily dependent on hyperparameters such as the regularization parameter (C) and the kernel parameters (for example, gamma for the RBF kernel). Genetic Algorithms (GAs) leverage evolutionary principles to search for good hyperparameter values.
This article explores the use of Genetic Algorithms for tuning SVM parameters, discussing their implementation and advantages.
Hyperparameters of Support Vector Machines (SVMs)
Support Vector Machines (SVMs) are supervised learning models for classification and regression tasks. They work by finding the hyperplane that best separates the data into different classes, maximizing the margin between them.
Key hyperparameters for SVMs include:
- C (Regularization Parameter): Controls the trade-off between maximizing the margin and minimizing the training error; larger values penalize misclassifications more heavily and can lead to overfitting.
- Kernel Parameters: Parameters specific to the chosen kernel function, such as gamma for the RBF kernel. The short sketch after this list illustrates how sensitive an SVM's cross-validation score is to these settings.
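Before bringing in the GA, the sketch below manually scores a few (C, gamma) pairs with cross-validation on the same digits dataset used later in this article. The specific values are arbitrary, hypothetical choices; the point is simply that the score can swing noticeably as these hyperparameters change, which is exactly the search the GA will automate.

# A minimal sketch (not the GA itself): manually scoring a few (C, gamma) pairs
# with 5-fold cross-validation to see how sensitive the SVM is to these settings.
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = datasets.load_digits(return_X_y=True)

for C, gamma in [(1.0, 0.001), (10.0, 0.001), (1.0, 0.1)]:
    score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
    print(f"C={C}, gamma={gamma}: mean CV accuracy = {score:.3f}")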
Using a GA to Tune SVM Hyperparameters
For SVMs, the hyperparameters (C and gamma) are encoded as a chromosome. Each gene in the chromosome represents one hyperparameter.
The fitness function evaluates the performance of the SVM model with a given set of hyperparameters, typically using cross-validation to measure accuracy or another relevant metric.
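Concretely, a chromosome here is just a flat list of numbers that the fitness function decodes back into (C, gamma) before scoring. The sketch below is one minimal way to write such a fitness function; the example chromosome [2.5, 0.01] is an arbitrary illustration, not a recommended setting.

# Conceptual sketch of the chromosome -> fitness mapping (example values are arbitrary).
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = datasets.load_digits(return_X_y=True)

def fitness(chromosome):
    C, gamma = chromosome                 # gene 0 -> C, gene 1 -> gamma
    model = SVC(C=C, gamma=gamma)
    return cross_val_score(model, X, y, cv=5).mean()

print(fitness([2.5, 0.01]))               # score one candidate hyperparameter set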
GA Workflow for Hyperparameter Tuning
- Initialization: Generate an initial population of candidate hyperparameter sets.
- Selection: Choose parent solutions based on their fitness scores.
- Crossover: Combine parent solutions to produce offspring with traits from both parents.
- Mutation: Introduce random variations to offspring to maintain diversity.
- Evaluation: Assess the fitness of the new solutions.
- Iteration: Repeat the process for multiple generations until convergence or a stopping criterion is met.
Pseudocode
Initialize population with random hyperparameter sets
Evaluate fitness of each individual in the population
while termination criteria not met do:
    Select parents based on fitness
    Apply crossover to produce offspring
    Apply mutation to offspring
    Evaluate fitness of offspring
    Select individuals for the next generation
end while
Return the best hyperparameter set

Optimizing SVM Hyperparameters with Genetic Algorithms
Step 1: Install Necessary Packages
This step installs the required Python packages deap and scikit-learn using pip. These packages provide the genetic algorithm toolkit and the machine learning functionality, respectively.
pip install deap scikit-learn

Step 2: Import Libraries
In this step, we import the libraries needed for the genetic algorithm and the machine learning workflow: random for random number generation, numpy for numerical operations, and datasets, cross_val_score, and SVC from sklearn for loading the dataset, running cross-validation, and building the SVM classifier. The deap library provides the building blocks for the genetic algorithm.
import random
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from deap import base, creator, tools, algorithms

Step 3: Load Dataset
Here, we load the digits dataset from scikit-learn. This dataset is a collection of handwritten digit images and is a good example for demonstrating machine learning classifiers. We then separate the data into features (X) and target labels (y).
# Load dataset
data = datasets.load_digits()
X = data.data
y = data.target

Step 4: Define Evaluation Function with Error Handling
We define a function evaluate that assesses the performance of an individual in the genetic algorithm. An individual is a list of hyperparameters for the SVM classifier (C and gamma). Both values are clipped to a minimum of 0.1 to avoid invalid parameter values. The SVM classifier is scored with 5-fold cross-validation and the mean score is returned as the fitness (the trailing comma makes it a tuple, which DEAP expects). If an error occurs during evaluation, a poor score is assigned instead.
# Define evaluation function with error handling
def evaluate(individual):
    C = max(0.1, individual[0])
    gamma = max(0.1, individual[1])
    try:
        clf = SVC(C=C, gamma=gamma)
        score = cross_val_score(clf, X, y, cv=5).mean()
    except Exception:
        score = -1  # Assign a poor score if the parameters cause an error
    return score,

Step 5: Set Up the Genetic Algorithm Toolbox
This step sets up the DEAP toolbox. We define a fitness to be maximized and the structure of an individual (a list with a fitness attribute). We then register functions for creating attributes (random floats between 0.1 and 10), individuals (two such attributes), and populations (lists of individuals). The genetic operators for crossover, mutation, selection, and evaluation are also registered.
# Genetic Algorithm setup
toolbox = base.Toolbox()
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)
toolbox.register("attr_float", random.uniform, 0.1, 10) toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_float, 2) toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxBlend, alpha=0.5) toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=1, indpb=0.2) toolbox.register("select", tools.selTournament, tournsize=3) toolbox.register("evaluate", evaluate) Step 6: Define Main Function to Run Genetic AlgorithmWe define the main function that initializes the random seed for reproducibility and creates the initial population. We set up statistics to be recorded during the genetic algorithm run, including average, standard deviation, minimum, and maximum fitness. The genetic algorithm is then executed using eaSimple , which runs the algorithm for a specified number of generations with given crossover and mutation probabilities.
# Genetic Algorithm execution
def main():
    random.seed(42)
    # Create initial population
    population = toolbox.population(n=50)
    # Define statistics to be recorded
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("avg", np.mean)
    stats.register("std", np.std)
    stats.register("min", np.min)
    stats.register("max", np.max)
    # Run genetic algorithm
    population, logbook = algorithms.eaSimple(population, toolbox, cxpb=0.5, mutpb=0.2,
                                              ngen=40, stats=stats, verbose=True)
    return population, logbook

Step 7: Execute the Main Function and Output Results
In the final step, we execute the main function and retrieve the best individual from the final population with tools.selBest. We then recover the best hyperparameters (C and gamma, clipped the same way as in the evaluation function) and print the best individual along with its fitness score and hyperparameters.
if __name__ == "__main__":
    population, logbook = main()
    # Get the best individual
    best_individual = tools.selBest(population, 1)[0]
    best_C = max(0.1, best_individual[0])
    best_gamma = max(0.1, best_individual[1])
print(f"Best individual: {best_individual}") print(f"Best fitness: {best_individual.fitness.values[0]}") print(f"Best hyperparameters: C={best_C}, gamma={best_gamma}") Complete Code
Python
# Install necessary packages
!pip install deap scikit-learn
import random
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
from deap import base, creator, tools, algorithms
# Load dataset
data = datasets.load_digits()
X = data.data
y = data.target
# Define evaluation function with error handling
def evaluate(individual):
    C = max(0.1, individual[0])
    gamma = max(0.1, individual[1])
    try:
        clf = SVC(C=C, gamma=gamma)
        score = cross_val_score(clf, X, y, cv=5).mean()
    except Exception:
        score = -1  # Assign a poor score if the parameters cause an error
    return score,
# Genetic Algorithm setup
toolbox = base.Toolbox()
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)
toolbox.register("attr_float", random.uniform, 0.1, 10)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_float, 2)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxBlend, alpha=0.5)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=1, indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("evaluate", evaluate)
# Genetic Algorithm execution
def main():
    random.seed(42)
    # Create initial population
    population = toolbox.population(n=50)
    # Define statistics to be recorded
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("avg", np.mean)
    stats.register("std", np.std)
    stats.register("min", np.min)
    stats.register("max", np.max)
    # Run genetic algorithm
    population, logbook = algorithms.eaSimple(population, toolbox, cxpb=0.5, mutpb=0.2,
                                              ngen=40, stats=stats, verbose=True)
    return population, logbook
if __name__ == "__main__":
    population, logbook = main()
    # Get the best individual
    best_individual = tools.selBest(population, 1)[0]
    best_C = max(0.1, best_individual[0])
    best_gamma = max(0.1, best_individual[1])
    print(f"Best individual: {best_individual}")
    print(f"Best fitness: {best_individual.fitness.values[0]}")
    print(f"Best hyperparameters: C={best_C}, gamma={best_gamma}")

    # Train and test the final model on a held-out split, using the clipped hyperparameters
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    final_model = SVC(C=best_C, gamma=best_gamma)
    final_model.fit(X_train, y_train)
    print(f"Test accuracy: {final_model.score(X_test, y_test)}")
Output:
gen  nevals  avg       std          min       max
0    50      0.107677  0.00987644   0.101281  0.139164
1    33      0.113994  0.0121825    0.101281  0.140279
2    30      0.124992  0.0120814    0.101838  0.153649
3    32      0.132813  0.00799749   0.110752  0.150864
4    32      0.134774  0.0107546    0.101838  0.154206
5    29      0.139799  0.00769207   0.107409  0.154206
6    23      0.142953  0.0092506    0.101838  0.155877
7    24      0.147688  0.00616172   0.136379  0.155877
8    20      0.151276  0.00503346   0.135822  0.155877
9    34      0.151811  0.00685508   0.121894  0.155877
10   28      0.152724  0.00622722   0.134708  0.155877
11   21      0.154084  0.0080811    0.101281  0.155877
12   31      0.155454  0.00280674   0.135822  0.155877
13   38      0.154763  0.00440217   0.135822  0.155877
14   28      0.155309  0.00301966   0.135822  0.155877
15   27      0.154017  0.00824709   0.101838  0.155877
16   25      0.154072  0.00547047   0.135822  0.155877
17   25      0.155811  0.000467967  0.152535  0.155877
18   33      0.15288   0.00929871   0.101838  0.155877
19   22      0.154752  0.00447696   0.135822  0.155877
20   31      0.15454   0.00458452   0.135822  0.155877
21   40      0.154373  0.00513775   0.135822  0.155877
22   29      0.155265  0.003148     0.135822  0.155877
23   30      0.154396  0.00801579   0.101838  0.155877
24   32      0.152813  0.00940053   0.101838  0.155877
25   32      0.153627  0.00872855   0.101838  0.155877
26   33      0.154295  0.00536714   0.135822  0.155877
27   27      0.155476  0.0028078    0.135822  0.155877
28   30      0.154641  0.00471836   0.135822  0.155877
29   32      0.154396  0.00801579   0.101838  0.155877
30   35      0.154072  0.00531391   0.135822  0.155877
31   31      0.154173  0.00517646   0.135822  0.155877
32   24      0.154429  0.0061524    0.119109  0.155877
33   30      0.153404  0.0107497    0.101838  0.155877
34   32      0.155298  0.00304926   0.135822  0.155877
35   36      0.154674  0.00476297   0.135822  0.155877
36   35      0.154507  0.00768577   0.101838  0.155877
37   21      0.155877  2.77556e-17  0.155877  0.155877
38   27      0.154184  0.00576914   0.131365  0.155877
39   36      0.154474  0.0055802    0.126351  0.155877
40   26      0.155153  0.0034742    0.135822  0.155877

Best individual: [-6.96604485823403, 1.256273035647874]
Best fitness: 0.15587743732590528
Best hyperparameters: C=0.1, gamma=1.256273035647874

The output above shows the progress and results of the genetic algorithm over 40 generations. Here’s a detailed explanation of the key parts:
Generational Statistics
The table shows statistics for each generation:
- gen: The generation number.
- nevals: The number of individuals evaluated in that generation.
- avg: The average fitness value of the population in that generation.
- std: The standard deviation of the fitness values, indicating the variability within the population.
- min: The minimum fitness value in the population.
- max: The maximum fitness value in the population.
This table helps track the genetic algorithm’s progress, showing how the fitness of the population improves (or not) over generations.
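If you prefer to inspect this progression programmatically rather than read the verbose printout, the logbook returned by main() can be queried. The sketch below is one way to pull out and plot the per-generation statistics; it assumes the logbook object from the run above is available and that matplotlib is installed.

# Minimal sketch: extract per-generation statistics from the DEAP logbook and plot them.
import matplotlib.pyplot as plt

gens = logbook.select("gen")        # generation numbers
max_fit = logbook.select("max")     # best cross-validation score in each generation
avg_fit = logbook.select("avg")     # average cross-validation score in each generation

plt.plot(gens, max_fit, label="max fitness")
plt.plot(gens, avg_fit, label="avg fitness")
plt.xlabel("Generation")
plt.ylabel("Cross-validation score")
plt.legend()
plt.show()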
The best individual found by the genetic algorithm is represented as:
Best individual: [-6.96604485823403, 1.256273035647874]
This individual corresponds to the hyperparameters:
- C = 0.1 (the raw gene value of -6.97 is clipped to the minimum allowed value)
- gamma = 1.256273035647874
The fitness value of the best individual is:
Best fitness: 0.15587743732590528
This value represents the highest mean cross-validation score achieved by the SVM classifier with the best hyperparameters during the genetic algorithm run.
Advantages of Using GA for Hyperparameter Tuning
- Efficient Exploration of the Search Space: GAs focus on promising regions, reducing the time needed to find good hyperparameters.
- Ability to Escape Local Optima: GAs’ stochastic nature helps them avoid being trapped in suboptimal solutions.
- Scalability to Complex Models: GAs are effective even with large, complex hyperparameter spaces.
- Balancing Exploration and Exploitation: GAs maintain diversity while refining good solutions.
Conclusion
Hyperparameter tuning is essential for optimizing machine learning models, and Genetic Algorithms offer an efficient and effective solution. GAs provide a balance between exploration and exploitation, making them suitable for complex hyperparameter spaces. While they come with computational challenges, their advantages often outweigh the drawbacks. As machine learning continues to evolve, GAs will likely play an increasingly important role in hyperparameter optimization.