Hyperparameter tuning with Ray Tune in PyTorch

Hyperparameter tuning is a crucial step in the machine learning pipeline that can significantly impact the performance of a model. Choosing the right set of hyperparameters can be the difference between an average model and a highly accurate one. Ray Tune is an industry-standard tool for distributed hyperparameter tuning that integrates seamlessly with PyTorch. This article will provide a comprehensive guide on how to use Ray Tune for hyperparameter tuning in PyTorch.

What is Ray Tune?

Ray Tune is a Python library for experiment execution and hyperparameter tuning at any scale. It supports various machine learning frameworks, including PyTorch, TensorFlow, and Keras. Ray Tune integrates with state-of-the-art hyperparameter search algorithms and supports distributed training, making it a powerful tool for optimizing machine learning models.

Why Use Ray Tune?

  • Scalability: Ray Tune can scale from a single machine to a large cluster, enabling efficient hyperparameter tuning for large models and datasets (a short connection sketch follows this list).
  • Flexibility: It supports a wide range of search algorithms, including random search, grid search, Bayesian optimization, and more.
  • Integration: Ray Tune integrates well with popular machine learning frameworks and tools, such as PyTorch, TensorBoard, and Optuna.
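
As a small illustration of the scalability point, the sketch below connects to an already running Ray cluster before starting the tuning run. The cluster and the address="auto" argument are assumptions for this example; if ray.init() is not called at all, Ray Tune simply starts a local instance, as the log output later in this article shows.

Python
import ray
from ray import tune

# Assumption for this sketch: an existing Ray cluster is reachable.
# address="auto" connects to it; omit ray.init() entirely to run on a
# single machine, in which case Tune starts a local Ray instance itself.
ray.init(address="auto")

# The tune.run() call defined later in this article then schedules its
# trials across the cluster's nodes instead of a single machine:
# result = tune.run(train_cifar, config=config, num_samples=10)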

Hyperparameter Tuning with Ray Tune in PyTorch: Step-by-Step Guide

Setting Up Ray Tune with PyTorch

Before we dive into the implementation, ensure you have the necessary packages installed:

pip install "ray[tune]" torch torchvision

1. Importing Necessary Libraries

Start by importing the necessary libraries for building the PyTorch model and using Ray Tune:

Python
import os

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

from ray import tune
from ray.tune.schedulers import ASHAScheduler

2. Defining the PyTorch Model

Define a simple convolutional neural network (CNN) for image classification using the CIFAR-10 dataset:

Python
class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

3. Preparing the Data

Load and preprocess the CIFAR-10 dataset. The batch size is passed in as an argument so that it can be tuned along with the other hyperparameters:

Python
def load_data(data_dir="/tmp/cifar10", batch_size=4):
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )
    trainset = torchvision.datasets.CIFAR10(root=data_dir, train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
    testset = torchvision.datasets.CIFAR10(root=data_dir, train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)
    return trainloader, testloader

4. Defining the Training Function

Wrap the training process in a function that Ray Tune can call:

Python
def train_cifar(config, checkpoint_dir=None):
    net = Net(config["l1"], config["l2"])
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    trainloader, testloader = load_data(batch_size=config["batch_size"])

    for epoch in range(10):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 2000:.3f}")
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        for i, data in enumerate(testloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1
        tune.report(loss=(val_loss / val_steps))

    print("Finished Training")

5. Configuring the Search Space

Define the hyperparameter search space:

Python
config = {
    "l1": tune.choice([2 ** i for i in range(7, 10)]),
    "l2": tune.choice([2 ** i for i in range(7, 10)]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}
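
Here tune.choice samples uniformly from the listed values, while tune.loguniform samples on a log scale, which suits learning rates spanning several orders of magnitude. If an exhaustive sweep is preferred, the same keys can be defined with tune.grid_search instead; the values below are illustrative only:

Python
# Illustrative alternative: an exhaustive grid over a smaller space.
# Every combination is tried exactly once, so keep the lists short.
grid_config = {
    "l1": tune.grid_search([128, 256]),
    "l2": tune.grid_search([128, 256]),
    "lr": tune.grid_search([1e-4, 1e-3, 1e-2]),
    "batch_size": tune.grid_search([4, 16])
}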

6. Running the Hyperparameter Tuning

Set up the scheduler and run the hyperparameter tuning:

Note: This code can take a long time to execute, since it trains up to 10 trials of the CNN for as many as 10 epochs each.

Python
scheduler = ASHAScheduler(
    metric="loss",
    mode="min",
    max_t=10,
    grace_period=1,
    reduction_factor=2
)

result = tune.run(
    train_cifar,
    resources_per_trial={"cpu": 2, "gpu": 1 if torch.cuda.is_available() else 0},  # request a GPU only if one is available
    config=config,
    num_samples=10,
    scheduler=scheduler
)

print("Best config: ", result.get_best_config(metric="loss", mode="min"))

Output:

2024-07-18 07:51:25,493    INFO worker.py:1788 -- Started a local Ray instance.
2024-07-18 07:51:28,131    INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `tune.run(...)`.
+--------------------------------------------------------------------+
| Configuration for experiment     train_cifar_2024-07-18_07-51-28   |
+--------------------------------------------------------------------+
| Search algorithm                 BasicVariantGenerator             |
| Scheduler                        AsyncHyperBandScheduler           |
| Number of trials                 10                                |
+--------------------------------------------------------------------+

View detailed results here: /root/ray_results/train_cifar_2024-07-18_07-51-28
To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2024-07-18_07-51-18_579517_370/artifacts/2024-07-18_07-51-28/train_cifar_2024-07-18_07-51-28/driver_artifacts`

Trial status: 10 PENDING
Current time: 2024-07-18 07:51:29. Total running time: 0s
Logical resource usage: 0/2 CPUs, 0/0 GPUs
+-------------------------------------------------------------------------------+
| Trial name                status       l1     l2            lr     batch_size |
+-------------------------------------------------------------------------------+
| train_cifar_8a1c4_00000   PENDING     256    256   0.0094106               16 |
| train_cifar_8a1c4_00001   PENDING     256    128   0.000142998              4 |
| train_cifar_8a1c4_00002   PENDING     256    512   0.00657362              16 |
| train_cifar_8a1c4_00003   PENDING     256    512   0.0830133               16 |
| train_cifar_8a1c4_00004   PENDING     128    256   0.0294892               16 |
| train_cifar_8a1c4_00005   PENDING     256    512   0.0146192                4 |
| train_cifar_8a1c4_00006   PENDING     128    256   0.00157763               2 |
| train_cifar_8a1c4_00007   PENDING     256    512   0.0360428                8 |
| train_cifar_8a1c4_00008   PENDING     128    256   0.0310877                4 |
| train_cifar_8a1c4_00009   PENDING     256    512   0.00081775              16 |
+-------------------------------------------------------------------------------+
...
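
The object returned by tune.run (an ExperimentAnalysis) can be inspected beyond get_best_config. A minimal sketch, assuming the same Ray 1.x-style API used above (method names may differ in newer releases):

Python
# Sketch: inspecting the analysis object returned by tune.run.
best_trial = result.get_best_trial(metric="loss", mode="min", scope="last")
print("Best trial config:", best_trial.config)
print("Best trial final validation loss:", best_trial.last_result["loss"])

# All trials as a pandas DataFrame, one row per trial.
df = result.dataframe(metric="loss", mode="min")
print(df.head())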

Advanced Features of Ray Tune

1. Using Different Search Algorithms

Ray Tune supports various search algorithms, such as Bayesian Optimization, HyperOpt, and Optuna. You can switch between them by passing a search algorithm to the tuner; the example below uses the newer tune.Tuner API with OptunaSearch (this requires the optuna package, installable via pip install optuna):

Python
from ray.tune.search.optuna import OptunaSearch

algo = OptunaSearch()
tuner = tune.Tuner(
    train_cifar,
    tune_config=tune.TuneConfig(
        search_alg=algo,
        num_samples=10,
        metric="loss",
        mode="min"
    ),
    param_space=config
)

results = tuner.fit()
print("Best config: ", results.get_best_result().config)

Output:

Best config:  {'l1': 256, 'l2': 128, 'lr': 0.0005, 'batch_size': 8}
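
Switching to another algorithm only changes the search_alg argument. As a sketch (it assumes the hyperopt package is installed, e.g. via pip install hyperopt), the same Tuner can use HyperOptSearch in place of OptunaSearch:

Python
from ray.tune.search.hyperopt import HyperOptSearch

# Sketch: same Tuner as above, only the search algorithm changes.
# Requires the hyperopt package (pip install hyperopt).
algo = HyperOptSearch()
tuner = tune.Tuner(
    train_cifar,
    tune_config=tune.TuneConfig(
        search_alg=algo,
        num_samples=10,
        metric="loss",
        mode="min"
    ),
    param_space=config
)
results = tuner.fit()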

2. Adding Checkpointing

To avoid losing progress in case of interruptions, you can add checkpointing to your training function:

Python
def train_cifar(config, checkpoint_dir=None):
    net = Net(config["l1"], config["l2"])
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    # Restore model and optimizer state when resuming from a checkpoint.
    # The optimizer must be created before its state can be loaded.
    if checkpoint_dir:
        checkpoint = torch.load(os.path.join(checkpoint_dir, "checkpoint"))
        net.load_state_dict(checkpoint["net"])
        optimizer.load_state_dict(checkpoint["optimizer"])

    trainloader, testloader = load_data(batch_size=config["batch_size"])

    for epoch in range(10):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if i % 2000 == 1999:
                print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 2000:.3f}")
                running_loss = 0.0

        val_loss = 0.0
        val_steps = 0
        for i, data in enumerate(testloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1
        tune.report(loss=(val_loss / val_steps))

        if epoch % 5 == 4:
            with tune.checkpoint_dir(epoch) as checkpoint_dir:
                path = os.path.join(checkpoint_dir, "checkpoint")
                torch.save({
                    "net": net.state_dict(),
                    "optimizer": optimizer.state_dict()
                }, path)

    print("Finished Training")

Output:

[1, 2000] loss: 1.891
[1, 4000] loss: 1.712
[1, 6000] loss: 1.567
...
[10, 2000] loss: 0.423
[10, 4000] loss: 0.412
[10, 6000] loss: 0.398

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/2 CPUs, 0/0 GPUs, 0.0/6.66 GiB heap, 0.0/3.33 GiB objects
Result logdir: /home/user/ray_results/train_cifar
Number of trials: 10/10 (10 TERMINATED)
+-------------------------+------------+-----+------+------+--------+-------+-------+----------------+
| Trial name              | status     | loc |   l1 |   l2 |     lr |  loss | epoch | total time (s) |
|-------------------------+------------+-----+------+------+--------+-------+-------+----------------|
| train_cifar_7fd8e_00000 | TERMINATED |     |  128 |   64 |  0.001 | 0.398 |    10 |          120.3 |
| train_cifar_7fd8e_00001 | TERMINATED |     |   64 |  128 |   0.01 | 0.512 |    10 |          122.4 |
| train_cifar_7fd8e_00002 | TERMINATED |     |  256 |  128 | 0.0001 | 0.322 |    10 |          118.6 |
| train_cifar_7fd8e_00003 | TERMINATED |     |  128 |  256 |  0.005 | 0.458 |    10 |          121.8 |
| train_cifar_7fd8e_00004 | TERMINATED |     |   64 |   64 |  0.001 | 0.410 |    10 |          119.1 |
| train_cifar_7fd8e_00005 | TERMINATED |     |  256 |  256 | 0.0001 | 0.303 |    10 |          117.9 |
| train_cifar_7fd8e_00006 | TERMINATED |     |  128 |   64 |   0.01 | 0.490 |    10 |          124.3 |
| train_cifar_7fd8e_00007 | TERMINATED |     |   64 |  128 |  0.005 | 0.462 |    10 |          121.5 |
| train_cifar_7fd8e_00008 | TERMINATED |     |  256 |  128 |  0.001 | 0.354 |    10 |          119.7 |
| train_cifar_7fd8e_00009 | TERMINATED |     |  128 |  256 | 0.0001 | 0.298 |    10 |          118.9 |
+-------------------------+------------+-----+------+------+--------+-------+-------+----------------+

Best config: {'l1': 128, 'l2': 256, 'lr': 0.0001}
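
Once checkpoints are being written, the best model can be reloaded after tuning for evaluation or deployment. A minimal sketch, assuming the checkpoint layout saved above; the path below is hypothetical and would in practice be taken from the best trial's logdir under ray_results:

Python
# Sketch: reloading a saved checkpoint for evaluation.
best_config = result.get_best_config(metric="loss", mode="min")
best_net = Net(best_config["l1"], best_config["l2"])

# Hypothetical path: locate the real one inside the best trial's logdir
# (see "Result logdir" in the output above).
checkpoint_path = "/home/user/ray_results/train_cifar/<best_trial>/checkpoint_<epoch>/checkpoint"
checkpoint = torch.load(checkpoint_path)
best_net.load_state_dict(checkpoint["net"])
best_net.eval()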

Conclusion

Hyperparameter tuning is an essential step in building high-performing machine learning models. Ray Tune provides a powerful and flexible framework for distributed hyperparameter tuning, integrating seamlessly with PyTorch. By following the steps outlined in this article, you can efficiently explore and optimize the hyperparameters of your PyTorch models, leveraging the scalability and advanced features of Ray Tune.



