Internal covariate shift is a major challenge encountered while training deep learning models, and Batch Normalization was introduced to address it. In this article, we cover the fundamentals of Batch Normalization, explain why it is needed, and show how to apply it in TensorFlow and PyTorch.

What is Batch Normalization?

Batch Normalization was introduced by Sergey Ioffe and Christian Szegedy in 2015 to mitigate the internal covariate shift problem in neural networks. The normalization process involves calculating the mean and variance of each feature in a mini-batch and then scaling and shifting the features using these statistics. This keeps the input to each layer in roughly the same distribution, regardless of changes in the distribution of earlier layers' outputs. Consequently, Batch Normalization helps stabilize the training process, enabling higher learning rates and faster convergence.
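To make this computation concrete, the following minimal NumPy sketch normalizes a single feature of a toy mini-batch and then applies the scale and shift; the sample values and the choices of gamma, beta, and epsilon are purely illustrative.

```python
import numpy as np

# Toy mini-batch of one feature (values chosen only for illustration).
x = np.array([2.0, 4.0, 6.0, 8.0])

# Compute the mini-batch mean and variance.
mean = x.mean()   # 5.0
var = x.var()     # 5.0

# Normalize to zero mean and unit variance; epsilon guards against division by zero.
eps = 1e-5
x_hat = (x - mean) / np.sqrt(var + eps)

# Scale and shift with the learnable parameters gamma and beta
# (initialized to 1 and 0 here, which leaves the normalized values unchanged).
gamma, beta = 1.0, 0.0
y = gamma * x_hat + beta
print(y)   # approximately [-1.342, -0.447, 0.447, 1.342]
```

A batch normalization layer performs this computation for every feature during training, with gamma and beta updated by gradient descent and running statistics kept for use at inference time.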
Need for Batch Normalization

Batch Normalization extends the concept of normalization from the input layer to the activations of each hidden layer throughout the neural network. By normalizing the activations of each layer, Batch Normalization helps alleviate the internal covariate shift problem, which can hinder the convergence of the network during training.

The inputs to each hidden layer are the activations from the previous layer. Normalizing these activations ensures that the network is consistently presented with inputs that have a similar distribution, regardless of the training stage. This stability in the distribution of inputs allows for smoother and more efficient training. With Batch Normalization applied to the hidden layers, the gradients propagated during backpropagation are less likely to vanish or explode, leading to more stable training dynamics. This ultimately facilitates faster convergence and better performance of the neural network on the given task.

Fundamentals of Batch Normalization

In this section, we discuss the steps taken to perform batch normalization.

Step 1: Compute the Mean and Variance of Mini-Batches

For a mini-batch of activations $x_1, x_2, \ldots, x_m$, the mean $\mu_B$ and variance $\sigma_B^2$ of the mini-batch are computed:

$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2$

Step 2: Normalization

Each activation $x_i$ is normalized using the computed mean and variance of the mini-batch. The normalization subtracts the mean $\mu_B$ from each activation and divides by the square root of the variance $\sigma_B^2$, so that the normalized activations have zero mean and unit variance. A small constant $\epsilon$ is added to the denominator for numerical stability, in particular to prevent division by zero.

$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$

Step 3: Scale and Shift the Normalized Activations

The normalized activations $\hat{x}_i$ are then scaled by a learnable parameter $\gamma$ and shifted by another learnable parameter $\beta$. These parameters allow the model to learn the optimal scaling and shifting of the normalized activations, giving the network additional flexibility.

$y_i = \gamma \hat{x}_i + \beta$

Batch Normalization in TensorFlow

In TensorFlow's Keras API, a batch normalization layer is added with tf.keras.layers.BatchNormalization(), which normalizes the activations of the previous layer. A sketch of a simple model using it is given after the PyTorch description below.

Batch Normalization in PyTorch

In PyTorch, batch normalization is added by defining a subclass of nn.Module and placing an nn.BatchNorm1d layer after a fully connected layer to normalize its activations. nn.BatchNorm1d is used when the input data is one-dimensional; for two-dimensional feature maps, as in convolutional neural networks, nn.BatchNorm2d is used instead. Sketches of both setups follow.
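First, a minimal sketch of such a Keras model; the input dimension, layer sizes, activation functions, and optimizer are illustrative assumptions rather than requirements of the technique.

```python
import tensorflow as tf

# A small fully connected model with a batch normalization layer
# normalizing the activations of the first dense layer.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),             # 20 input features (illustrative)
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),           # normalizes the previous layer's activations
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # binary classification head (illustrative)
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```

A corresponding PyTorch sketch is shown below; again, the layer sizes and the single-output setup are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """A small fully connected network with batch normalization applied
    after the first fully connected layer (sizes are illustrative)."""

    def __init__(self, in_features=20, hidden=64, out_features=1):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.bn1 = nn.BatchNorm1d(hidden)   # normalizes the activations of fc1
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x):
        x = torch.relu(self.bn1(self.fc1(x)))
        return self.fc2(x)

# Example forward pass with a random mini-batch of 8 samples.
model = SimpleNet()
out = model(torch.randn(8, 20))
print(out.shape)   # torch.Size([8, 1])
```

Note that BatchNormalization in Keras and BatchNorm1d in PyTorch both maintain running estimates of the mean and variance, which are used in place of mini-batch statistics when the model is in inference or evaluation mode.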
Benefits of Batch Normalization

- Stabilizes training by keeping the distribution of each layer's inputs roughly constant.
- Enables higher learning rates and faster convergence.
- Reduces the risk of vanishing or exploding gradients during backpropagation.
- The learnable scale ($\gamma$) and shift ($\beta$) parameters preserve the network's representational flexibility.

Conclusion

Batch Normalization is a powerful technique for stabilizing the training of deep neural networks. By normalizing the inputs of each layer, it addresses issues like vanishing gradients and accelerates convergence. In both Keras and PyTorch, integrating Batch Normalization into your models is simple and can lead to significant improvements in performance. As you explore deep learning further, consider using Batch Normalization to improve the training of your neural networks.