Long Short Term Memory Networks using PyTorch

Long Short-Term Memory Networks (LSTMs) are used for sequential data analysis and offer solutions to the challenges of learning long-term dependencies. In this article, we explore how LSTMs work and how to build and train LSTM models in PyTorch.

Long Short-Term Memory Networks (LSTMs)

LSTMs are a form of recurrent neural network architecture designed specifically to address the difficulty conventional RNNs have in learning and remembering long-term relationships in sequential data. To overcome these drawbacks, LSTMs introduce the idea of a "cell." This cell has an intricate structure that allows it to selectively remember or forget specific information. The efficacy of LSTMs relies on their ability to update, forget, and retain information using a set of specialized gates.

The LSTM cell consists of the following components:

  • Cells: The memory units of LSTMs.
  • Forget Gate: Decides which information from the previous cell state to discard.
  • Input Gate: Determines which new information from the current input to remember.
  • Output Gate: Controls what information from the current cell state to expose as the output.

With these gates, LSTMs can effectively learn long-term dependencies within sequential data.
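
To make the gating mechanism concrete, the short sketch below runs a single step of PyTorch's nn.LSTMCell; the batch size and feature sizes here are arbitrary illustrative values, not part of the original example.

import torch
import torch.nn as nn

# A single LSTM cell: maps input features to a hidden state of size hidden_size
input_size, hidden_size, batch_size = 8, 16, 4   # illustrative sizes only
cell = nn.LSTMCell(input_size, hidden_size)

# One time step of input plus the previous hidden state and cell state
x_t = torch.randn(batch_size, input_size)
h_prev = torch.zeros(batch_size, hidden_size)
c_prev = torch.zeros(batch_size, hidden_size)

# Internally the cell applies the forget, input, and output gates,
# then returns the updated hidden state and cell state.
h_next, c_next = cell(x_t, (h_prev, c_prev))
print(h_next.shape, c_next.shape)  # torch.Size([4, 16]) torch.Size([4, 16])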

Implementing Long Short Term Memory using PyTorch

To implement LSTMs using PyTorch, we will follow the steps discussed below:

Step 1: Install Necessary Libraries

For this implementation, we need the PyTorch library, which we can install using the following command:

pip install torch torchvision
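
To confirm the installation, you can import torch and print its version:

import torch
print(torch.__version__)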

Step 2: Defining LSTM Model

For defining the LSTM model, we will define an LSTMModel class, which inherits from nn.Module in PyTorch. It includes an LSTM layer followed by a fully connected layer (linear layer) for the final output.

The forward method defines the forward pass of the model, where the input sequence x is passed through the LSTM layer, and the final hidden state is passed through the fully connected layer to produce the output.

The initial hidden state and cell state are initialized as zeros, and the gradients are detached to prevent backpropagation through time.

import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim

        # LSTM layer
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)

        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Detach the states to prevent backpropagation through time
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))

        # Pass the last time step's output to the fully connected layer
        out = self.fc(out[:, -1, :])
        return out

Step 3: Model Training

To train the LSTM model, you will typically use a loss function like Mean Squared Error (MSE) for regression tasks or Cross-Entropy Loss for classification, along with an optimizer like Adam:

model = LSTMModel(input_dim=1, hidden_dim=100, layer_dim=1, output_dim=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training loop (num_epochs, trainX and trainY are defined in the complete example below)
for epoch in range(num_epochs):
    outputs = model(trainX)
    optimizer.zero_grad()
    loss = criterion(outputs, trainY)
    loss.backward()
    optimizer.step()

    print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

Complete Implementation: LSTM on Sequential Data using PyTorch

For this implementation, we will be following these steps:

Step 1: Import Libraries and Data Preparation

In this step, we import the necessary libraries, generate synthetic sine wave data, and create sequences for training the LSTM model. The data is generated using np.sin(t), where t is a linspace from 0 to 100 with 1000 points. The function create_sequences(data, seq_length) builds input-output pairs for training: each input sequence of length seq_length is paired with the value that immediately follows it. Finally, the input sequences (X) and output values (y) are converted into PyTorch tensors using torch.tensor, preparing the data for training the neural network.

Python
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(0)
torch.manual_seed(0)

# Generate synthetic sine wave data
t = np.linspace(0, 100, 1000)
data = np.sin(t)

# Function to create sequences
def create_sequences(data, seq_length):
    xs = []
    ys = []
    for i in range(len(data)-seq_length):
        x = data[i:(i+seq_length)]
        y = data[i+seq_length]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

seq_length = 10
X, y = create_sequences(data, seq_length)

# Convert data to PyTorch tensors
trainX = torch.tensor(X[:, :, None], dtype=torch.float32)
trainY = torch.tensor(y[:, None], dtype=torch.float32)
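
As an optional sanity check, you can print the tensor shapes; with 1000 data points and seq_length = 10, create_sequences produces 990 input sequences of length 10 with a single feature each:

print(trainX.shape)  # torch.Size([990, 10, 1])
print(trainY.shape)  # torch.Size([990, 1])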


Step 2: Define LSTM Model

Next, we define the LSTM model using PyTorch, reusing the LSTMModel class described above.

Python
class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
        out = self.fc(out[:, -1, :])  # Selecting the last output
        return out


Step 3: Model Training

After defining the model, we train it to predict the next value in the synthetic sine wave sequence. We initialize the model, the loss function (Mean Squared Error), and the optimizer (Adam), then iterate through a specified number of epochs. During each epoch, the model's output is computed, the loss is calculated, gradients are backpropagated, and the optimizer updates the model parameters. The loss is printed every 10 epochs to monitor training progress.

Python
model = LSTMModel(input_dim=1, hidden_dim=100, layer_dim=1, output_dim=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    outputs = model(trainX)
    optimizer.zero_grad()
    loss = criterion(outputs, trainY)
    loss.backward()
    optimizer.step()

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Output:

Epoch [10/100], Loss: 0.0682
Epoch [20/100], Loss: 0.0219
Epoch [30/100], Loss: 0.0046
Epoch [40/100], Loss: 0.0027
Epoch [50/100], Loss: 0.0010
Epoch [60/100], Loss: 0.0003
Epoch [70/100], Loss: 0.0001
Epoch [80/100], Loss: 0.0000
Epoch [90/100], Loss: 0.0000
Epoch [100/100], Loss: 0.0000

Step 4: Testing and Visualization

In this step, we switch the model to evaluation mode, generate predictions for the training sequences, and plot them against the original sine wave to visually check how closely the model tracks the data.

Python
import matplotlib.pyplot as plt

# Predicted outputs
model.eval()
predicted = model(trainX).detach().numpy()

# Adjusting the original data and prediction for plotting
# The prediction corresponds to the point just after each sequence
original = data[seq_length:]  # Original data from the end of the first sequence
time_steps = np.arange(seq_length, len(data))  # Corresponding time steps

plt.figure(figsize=(12, 6))
plt.plot(time_steps, original, label='Original Data')
plt.plot(time_steps, predicted, label='Predicted Data', linestyle='--')
plt.title('LSTM Model Predictions vs. Original Data')
plt.xlabel('Time Step')
plt.ylabel('Value')
plt.legend()
plt.show()

Output:


[Plot: LSTM Model Predictions vs. Original Data]


Conclusion

LSTMs are capable of handling a variety of sequence prediction problems. Using PyTorch's flexible framework, you can build, train, and deploy LSTM models.
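
As a minimal sketch of that last point, a trained model's parameters can be saved with torch.save and restored into a freshly constructed LSTMModel for later use; the file name lstm_sine.pt below is purely illustrative.

# Save only the learned parameters (file name is illustrative)
torch.save(model.state_dict(), 'lstm_sine.pt')

# Recreate the architecture and load the weights back
loaded_model = LSTMModel(input_dim=1, hidden_dim=100, layer_dim=1, output_dim=1)
loaded_model.load_state_dict(torch.load('lstm_sine.pt'))
loaded_model.eval()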



