Classifying Sonar Data: Rocks vs. Mines Using Machine Learning

The objective of this project is to classify sonar data to differentiate between rocks and mines using machine learning techniques. Sonar data, collected through sound waves, is processed to detect underwater objects. Machine learning models can analyze this data to predict whether an object is a rock or a mine.

Classification Project: Differentiating Between Rocks and Mines

Dataset Description

The dataset used in this project is sonar.all-data.csv, which contains sonar signals data collected to distinguish between rocks and mines.

Dataset Link – SonarData

  • This data is essential for training machine learning models that can predict whether a sonar reading corresponds to a rock or a mine.
  • The dataset consists of 60 numerical feature columns followed by a class label column.
  • Each feature represents a measurement from the sonar signals, and the class label indicates whether the reading corresponds to a rock (R) or a mine (M).

General Approach

  1. Exploratory Data Analysis (EDA): Understand the structure and characteristics of the dataset.
  2. Data Preparation: Prepare the dataset for machine learning models.
  3. Model Development: Train and evaluate machine learning models to classify sonar data.
  4. Model Evaluation: Check the performance of the trained models.

Let’s implement this project step by step and classify sonar readings as rocks or mines.

Step 1: Exploratory Data Analysis (EDA)

Understand the structure and characteristics of the dataset.

Import Libraries and Dataset

  • We begin by importing necessary Python libraries such as NumPy, Pandas, and Matplotlib for data manipulation and visualization.
  • Next, we load the sonar dataset into a Pandas DataFrame from a CSV file.
Python
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Load the dataset
data_path = "C:\\Users\\Tonmoy\\Downloads\\Dataset\\sonar.all-data.csv"
data = pd.read_csv(data_path, header=None)
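If you don’t have a local copy, the same file is distributed by the UCI Machine Learning Repository as the Connectionist Bench (Sonar) dataset. A minimal sketch, assuming the long-standing UCI URL below is still live:

Python
# Assumption: the dataset is still hosted at this UCI Machine Learning
# Repository path; pandas can read the CSV straight from the URL.
uci_url = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "undocumented/connectionist-bench/sonar/sonar.all-data"
)
data = pd.read_csv(uci_url, header=None)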

Check the Preview of Data

Display the first few rows of the dataset to understand its structure.

Python
print("First few rows of the dataset:")
print(data.head())

Output:

First few rows of the dataset:
       0       1       2       3       4       5       6       7       8   ...      52      53      54      55      56      57      58      59  60
0  0.0200  0.0371  0.0428  0.0207  0.0954  0.0986  0.1539  0.1601  0.3109  ...  0.0065  0.0159  0.0072  0.0167  0.0180  0.0084  0.0090  0.0032   R
1  0.0453  0.0523  0.0843  0.0689  0.1183  0.2583  0.2156  0.3481  0.3337  ...  0.0089  0.0048  0.0094  0.0191  0.0140  0.0049  0.0052  0.0044   R
2  0.0262  0.0582  0.1099  0.1083  0.0974  0.2280  0.2431  0.3771  0.5598  ...  0.0166  0.0095  0.0180  0.0244  0.0316  0.0164  0.0095  0.0078   R
3  0.0100  0.0171  0.0623  0.0205  0.0205  0.0368  0.1098  0.1276  0.0598  ...  0.0036  0.0150  0.0085  0.0073  0.0050  0.0044  0.0040  0.0117   R
4  0.0762  0.0666  0.0481  0.0394  0.0590  0.0649  0.1209  0.2467  0.3564  ...  0.0054  0.0105  0.0110  0.0015  0.0072  0.0048  0.0107  0.0094   R

[5 rows x 61 columns]

Check the Dataset Shape

Check the number of rows and columns to get an overview of the dataset’s size.

Python
print("Shape of the dataset:", data.shape)

Output:

Shape of the dataset: (208, 61)

View the Statistical Summary

Use describe() to obtain statistical metrics like mean, standard deviation, and percentiles.

Python
print(data.describe())

Output:

               0           1           2           3           4           5   ...          54          55          56          57          58          59
count  208.000000  208.000000  208.000000  208.000000  208.000000  208.000000  ...  208.000000  208.000000  208.000000  208.000000  208.000000  208.000000
mean     0.029164    0.038437    0.043832    0.053892    0.075202    0.104570  ...    0.009290    0.008222    0.007820    0.007949    0.007941    0.006507
std      0.022991    0.032960    0.038428    0.046528    0.055552    0.059105  ...    0.007088    0.005736    0.005785    0.006470    0.006181    0.005031
min      0.001500    0.000600    0.001500    0.005800    0.006700    0.010200  ...    0.000600    0.000400    0.000300    0.000300    0.000100    0.000600
25%      0.013350    0.016450    0.018950    0.024375    0.038050    0.067025  ...    0.004150    0.004400    0.003700    0.003600    0.003675    0.003100
50%      0.022800    0.030800    0.034300    0.044050    0.062500    0.092150  ...    0.007500    0.006850    0.005950    0.005800    0.006400    0.005300
75%      0.035550    0.047950    0.057950    0.064500    0.100275    0.134125  ...    0.012100    0.010575    0.010425    0.010350    0.010325    0.008525
max      0.137100    0.233900    0.305900    0.426400    0.401000    0.382300  ...    0.044700    0.039400    0.035500    0.044000    0.036400    0.043900
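Before preparing the data, it is also worth checking how balanced the two classes are, since a skewed split would affect both training and evaluation. A quick sketch, using column 60 as the label column described above:

Python
# Count how many readings belong to each class (R = rock, M = mine)
print(data[60].value_counts())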

Step 2: Data Preparation

Prepare the dataset for machine learning models.

Separate Features and Target

The last column is the target variable (rock or mine), and the rest are features.

Python
X = data.iloc[:, :-1]  # Features
y = data.iloc[:, -1]   # Target

Encode Labels

Convert categorical labels (‘R’ for rock and ‘M’ for mine) into numerical format.

Python
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
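LabelEncoder assigns integer codes to the class names in sorted order, so ‘M’ (mine) becomes 0 and ‘R’ (rock) becomes 1. You can confirm the mapping explicitly:

Python
# classes_ lists the original labels in the order of their encoded values
print(le.classes_)  # ['M' 'R'], i.e. M -> 0, R -> 1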

Split Data

Divide the dataset into training and testing sets to evaluate model performance.

Python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
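With only 208 rows, a purely random split can leave the two classes unevenly represented in the test set. An optional variant that preserves the class ratio in both splits (note that this selects different test rows, so the accuracies reported below would shift slightly):

Python
# stratify=y keeps the R/M proportions the same in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)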

Step 3: Model Development

Train and evaluate machine learning models to classify sonar data.

k-Nearest Neighbors (kNN)

  • Initialize Variables: Define a range of neighbor values to test and prepare to store accuracy results.
Python
from sklearn.neighbors import KNeighborsClassifier
neighbors = np.arange(1, 14)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

Train kNN Model

Fit kNN models with different neighbor values and record accuracy.

Python
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)
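The plot in the next step gives a visual answer, but the best k can also be read off numerically:

Python
# The index of the highest test accuracy gives the best neighbor count
best_k = neighbors[np.argmax(test_accuracy)]
print("Best k:", best_k, "with test accuracy:", test_accuracy.max())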

Plot Results

Visualize accuracy for different neighbor values to select the best k.

Python
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training Accuracy')
plt.xlabel('Number of Neighbors')
plt.ylabel('Accuracy')
plt.title('k-NN Varying Number of Neighbors')
plt.legend()
plt.show()

Output:

Figure: k-NN training and testing accuracy for k = 1 to 13 neighbors.

Final kNN Model

Train the final kNN model with the best-performing number of neighbors from the accuracy plot (here k = 2) and make predictions.

Python
knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)

Logistic Regression

Fit a logistic regression model to the training data.

Python
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred_logistic = model.predict(X_test)
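Logistic regression is sensitive to feature scale, which is why StandardScaler was imported at the start. A sketch of the same model refit on standardized features (an optional variant, not part of the original pipeline; the max_iter value is an assumption to avoid convergence warnings):

Python
# Standardize features to zero mean and unit variance, then refit
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=1000)
model_scaled.fit(X_train_scaled, y_train)
print("Scaled LR accuracy:", model_scaled.score(X_test_scaled, y_test))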

Principal Component Analysis (PCA)

Reduce the feature dimensions using PCA and fit a Logistic Regression model to the reduced features.

Python
# Initialize PCA and reduce the number of components
pca = PCA(n_components=10)  # Adjust the number of components as needed
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# Train Logistic Regression model on PCA-transformed data
model_pca = LogisticRegression()
model_pca.fit(X_train_pca, y_train)

# Make predictions
y_pred_pca = model_pca.predict(X_test_pca)
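To judge whether 10 components are enough, check how much of the original variance they retain:

Python
# Fraction of the total variance captured by the 10 retained components
print("Explained variance ratio:", pca.explained_variance_ratio_.sum())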

Support Vector Machines (SVM)

Python
# Train SVM model on original features
svm = SVC(kernel='linear')  # You can use different kernels like 'rbf' as well
svm.fit(X_train, y_train)

# Make predictions
y_pred_svm = svm.predict(X_test)
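As the comment above notes, other kernels can be tried. A sketch of an RBF-kernel SVM on standardized features (the svm_rbf name and hyperparameters are illustrative assumptions; RBF kernels are scale-sensitive, so the features are standardized first):

Python
# An RBF kernel can fit non-linear boundaries, but needs scaled inputs
scaler = StandardScaler()
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_rbf.fit(scaler.fit_transform(X_train), y_train)
y_pred_rbf = svm_rbf.predict(scaler.transform(X_test))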

Step 4: Model Evaluation

Check the performance of the trained models.

Evaluate kNN

Compute the accuracy of the kNN model on the test set.

Python
from sklearn.metrics import accuracy_score, confusion_matrix
print("kNN Accuracy:", accuracy_score(y_test, y_pred_knn))

Output:

kNN Accuracy: 0.8809523809523809

Confusion Matrix

Display the confusion matrix to understand prediction results.

Python
print("kNN Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_knn))

Output:

kNN Confusion Matrix:
[[25  1]
 [ 4 12]]

Evaluate Logistic Regression

Compute the accuracy of the logistic regression model.

Python
print("Logistic Regression Accuracy:", accuracy_score(y_test, y_pred_logistic))

Output:

Logistic Regression Accuracy: 0.7857142857142857

Confusion Matrix

Show the confusion matrix for logistic regression results.

Python
print("Logistic Regression Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_logistic))

Output:

Logistic Regression Confusion Matrix:
[[19  7]
 [ 2 14]]

Evaluate the PCA-based model

Python
# Evaluate the PCA-based model
print("PCA + Logistic Regression Accuracy:", accuracy_score(y_test, y_pred_pca))
print("PCA + Logistic Regression Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_pca))

Output:

PCA + Logistic Regression Accuracy: 0.7619047619047619
PCA + Logistic Regression Confusion Matrix:
[[18  8]
 [ 2 14]]

Evaluate the SVM model

Python
# Evaluate the SVM model
print("SVM Accuracy:", accuracy_score(y_test, y_pred_svm))
print("SVM Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_svm))

Output:

SVM Accuracy: 0.8571428571428571
SVM Confusion Matrix:
[[22  4]
 [ 2 14]]
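classification_report was imported at the start but never used above; it summarizes per-class precision, recall, and F1-score in one call. A sketch applied to the best-performing model, kNN:

Python
# Per-class precision, recall, and F1 for the kNN predictions
print(classification_report(y_test, y_pred_knn, target_names=le.classes_))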

Conclusion

In this project, the k-Nearest Neighbors (kNN) algorithm achieved the highest test accuracy at about 88%, ahead of SVM (about 86%), Logistic Regression (about 79%), and PCA + Logistic Regression (about 76%). On this dataset, kNN is therefore the most suitable of the four models for classifying sonar readings as rocks or mines.




Reference: https://www.geeksforgeeks.org

