Internal covariate shift is a major challenge encountered while training deep learning models, and Batch Normalization was introduced to address it. In this article, we cover the fundamentals of Batch Normalization, explain why it is needed, and show how to apply it in TensorFlow and PyTorch.

What is Batch Normalization?

Batch Normalization was introduced by Sergey Ioffe and Christian Szegedy in 2015 to mitigate the internal covariate shift problem in neural networks. The normalization process involves calculating the mean and variance of each feature in a mini-batch and then scaling and shifting the features using these statistics. This keeps the input to each layer in roughly the same distribution, regardless of changes in the distribution of earlier layers' outputs. Consequently, Batch Normalization helps stabilize the training process, enabling higher learning rates and faster convergence.
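To make this computation concrete, the following minimal NumPy sketch normalizes a single feature of a toy mini-batch and then applies the scale and shift; the sample values and the choices of gamma, beta, and epsilon are purely illustrative.

```python
import numpy as np

# Toy mini-batch of one feature (values chosen only for illustration).
x = np.array([2.0, 4.0, 6.0, 8.0])

# Compute the mini-batch mean and variance.
mean = x.mean()   # 5.0
var = x.var()     # 5.0

# Normalize to zero mean and unit variance; epsilon guards against division by zero.
eps = 1e-5
x_hat = (x - mean) / np.sqrt(var + eps)

# Scale and shift with the learnable parameters gamma and beta
# (initialized to 1 and 0 here, which leaves the normalized values unchanged).
gamma, beta = 1.0, 0.0
y = gamma * x_hat + beta
print(y)   # approximately [-1.342, -0.447, 0.447, 1.342]
```

A batch normalization layer performs this computation for every feature during training, with gamma and beta updated by gradient descent and running statistics kept for use at inference time.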
Need for Batch Normalization

Batch Normalization extends the concept of normalization from the input layer to the activations of each hidden layer throughout the neural network. By normalizing the activations of each layer, Batch Normalization helps alleviate the internal covariate shift problem, which can hinder the convergence of the network during training.

The inputs to each hidden layer are the activations from the previous layer. Normalizing these activations ensures that the network is consistently presented with inputs that have a similar distribution, regardless of the training stage. This stability in the distribution of inputs allows for smoother and more efficient training. With Batch Normalization applied to the hidden layers, the gradients propagated during backpropagation are less likely to vanish or explode, leading to more stable training dynamics. This ultimately facilitates faster convergence and better performance of the neural network on the given task.

Fundamentals of Batch Normalization

In this section, we discuss the steps taken to perform batch normalization.

Step 1: Compute the Mean and Variance of Mini-Batches

For a mini-batch of activations $x_1, x_2, \ldots, x_m$, the mean $\mu_B$ and variance $\sigma_B^2$ of the mini-batch are computed:

$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2$

Step 2: Normalization

Each activation $x_i$ is normalized using the computed mean and variance of the mini-batch. The normalization subtracts the mean $\mu_B$ from each activation and divides by the square root of the variance $\sigma_B^2$, so that the normalized activations have zero mean and unit variance. A small constant $\epsilon$ is added to the denominator for numerical stability, in particular to prevent division by zero.

$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$

Step 3: Scale and Shift the Normalized Activations

The normalized activations $\hat{x}_i$ are then scaled by a learnable parameter $\gamma$ and shifted by another learnable parameter $\beta$. These parameters allow the model to learn the optimal scaling and shifting of the normalized activations, giving the network additional flexibility.

$y_i = \gamma \hat{x}_i + \beta$

Batch Normalization in TensorFlow

In TensorFlow's Keras API, a batch normalization layer is added with tf.keras.layers.BatchNormalization(), which normalizes the activations of the previous layer. A sketch of a simple model using it is given after the PyTorch description below.

Batch Normalization in PyTorch

In PyTorch, batch normalization is added by defining a subclass of nn.Module and placing an nn.BatchNorm1d layer after a fully connected layer to normalize its activations. nn.BatchNorm1d is used when the input data is one-dimensional; for two-dimensional feature maps, as in convolutional neural networks, nn.BatchNorm2d is used instead. Sketches of both setups follow.
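First, a minimal sketch of such a Keras model; the input dimension, layer sizes, activation functions, and optimizer are illustrative assumptions rather than requirements of the technique.

```python
import tensorflow as tf

# A small fully connected model with a batch normalization layer
# normalizing the activations of the first dense layer.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),             # 20 input features (illustrative)
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),           # normalizes the previous layer's activations
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # binary classification head (illustrative)
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```

A corresponding PyTorch sketch is shown below; again, the layer sizes and the single-output setup are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """A small fully connected network with batch normalization applied
    after the first fully connected layer (sizes are illustrative)."""

    def __init__(self, in_features=20, hidden=64, out_features=1):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.bn1 = nn.BatchNorm1d(hidden)   # normalizes the activations of fc1
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x):
        x = torch.relu(self.bn1(self.fc1(x)))
        return self.fc2(x)

# Example forward pass with a random mini-batch of 8 samples.
model = SimpleNet()
out = model(torch.randn(8, 20))
print(out.shape)   # torch.Size([8, 1])
```

Note that BatchNormalization in Keras and BatchNorm1d in PyTorch both maintain running estimates of the mean and variance, which are used in place of mini-batch statistics when the model is in inference or evaluation mode.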
Benefits of Batch Normalization

- Stabilizes training by keeping the distribution of each layer's inputs roughly constant.
- Enables higher learning rates and faster convergence.
- Reduces the risk of vanishing or exploding gradients during backpropagation.
- The learnable scale ($\gamma$) and shift ($\beta$) parameters preserve the network's representational flexibility.

Conclusion

Batch Normalization is a powerful technique for stabilizing the training of deep neural networks. By normalizing the inputs of each layer, it addresses issues like vanishing gradients and accelerates convergence. In both Keras and PyTorch, integrating Batch Normalization into your models is simple and can lead to significant improvements in performance. As you explore deep learning further, consider using Batch Normalization to improve the training of your neural networks.