E-commerce Product Recommendations using CatBoost

In the dynamic world of e-commerce, personalized product recommendations stand as a cornerstone strategy to enhance user experience and boost sales. The integration of sophisticated machine learning algorithms has transformed how businesses predict and cater to individual customer preferences. One such powerful tool is CatBoost, a high-performance, open-source library developed by Yandex.

This article explores how CatBoost can be used for e-commerce product recommendations, covering its advantages, an implementation walkthrough, and further applications.

Table of Contents

E-commerce Product Recommendations
Benefits of CatBoost in E-commerce
Implementing E-commerce Product Recommendation using CatBoost
Applications of CatBoost in E-commerce
Conclusion

E-commerce Product Recommendations

E-commerce product recommendation is a feature widely used in online retail to suggest products to customers based on factors such as their browsing history, purchase behavior, product preferences, and the actions of similar users. It is central to personalizing the shopping experience and increasing customer engagement and sales.
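To make "the actions of similar users" concrete, here is a minimal sketch of a co-purchase count, the idea behind "customers who bought this also bought" lists. It is not part of the CatBoost workflow below. The baskets and the helper name `also_bought` are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical baskets: each set holds the products on one invoice.
baskets = [
    {"mug", "teapot", "coaster"},
    {"mug", "teapot"},
    {"teapot", "coaster"},
    {"lantern", "candle"},
]

# Count how often each ordered pair of distinct products shares a basket.
co_counts = defaultdict(Counter)
for basket in baskets:
    for a, b in permutations(sorted(basket), 2):
        co_counts[a][b] += 1


def also_bought(product, top_n=2):
    """Return up to top_n products most often bought together with `product`."""
    return [item for item, _ in co_counts[product].most_common(top_n)]


print(also_bought("mug"))  # teapot (2 shared baskets) ranks ahead of coaster (1)
```

A count like this needs no model at all; a learned model such as CatBoost becomes useful once customer and product attributes should influence the ranking as well.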

Benefits of CatBoost in E-commerce

Handling Categorical Data: CatBoost handles categorical features natively. Many machine learning models need categorical columns converted to numbers (for example by one-hot encoding) before training. CatBoost instead accepts them directly through the cat_features argument and encodes them internally with ordered target statistics, which reduces preprocessing work for columns such as product codes and countries.
Improved Recommendation Accuracy: CatBoost is a gradient boosting library built on decision trees, with models for classification, regression, and ranking. Trained on historical data, user behavior, and item attributes, it can predict which products a customer is likely to be interested in, which improves the relevance of recommendations.
Speed and Scalability: CatBoost supports multi-threaded CPU and GPU training and offers fast prediction, which matters for e-commerce platforms with millions of users and items, where recommendations must be generated quickly.
Reduced Overfitting: CatBoost includes mechanisms to limit overfitting, such as ordered boosting and early stopping against a validation set, so that models perform well on unseen data as well as on the training data. This robustness helps keep product suggestions reliable over time.
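The idea behind CatBoost's categorical encoding can be sketched in a few lines. The function below is a simplified illustration, not CatBoost's actual implementation (which, among other things, works over random permutations of the rows). Each row's category is replaced by a smoothed average of the targets seen in earlier rows only, so a row never uses its own label. The names `ordered_target_stats`, `prior`, and `prior_weight` are chosen here for illustration.

```python
from collections import defaultdict


def ordered_target_stats(categories, targets, prior=0.5, prior_weight=1.0):
    """Encode each category using only the targets of EARLIER rows.

    encoded_i = (sum of earlier targets in the same category + prior_weight * prior)
                / (number of earlier rows in the same category + prior_weight)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = []
    for cat, y in zip(categories, targets):
        encoded.append((sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight))
        # Update the running statistics only after encoding the current row.
        sums[cat] += y
        counts[cat] += 1
    return encoded


countries = ["UK", "UK", "FR", "UK", "FR"]
bought = [1, 0, 1, 1, 0]
print(ordered_target_stats(countries, bought))
# → [0.5, 0.75, 0.5, 0.5, 0.75]
```

The first occurrence of every category receives the prior, and later occurrences drift toward that category's observed purchase rate.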

Implementing E-commerce Product Recommendation using CatBoost

This step-by-step tutorial shows how to use CatBoost in an e-commerce recommendation workflow: we build recency, frequency, and monetary (RFM) features per customer, train a classifier on them, and use its prediction to decide whether to recommend. We will also use ipywidgets to create an interactive GUI.

Step 1: Import Libraries

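Make sure the required libraries are installed first (openpyxl is needed by pandas to read the .xlsx file, and scikit-learn provides the train/test split):

pip install pandas numpy scikit-learn catboost matplotlib seaborn ipywidgets openpyxl

Then import the following libraries.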

Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets
from IPython.display import display

from catboost import CatBoostClassifier, Pool
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

Step 2: Load the Dataset

We’ll use the “Online Retail” dataset for this implementation.

Python

import pandas as pd

# Load the dataset
url = "https://media.geeksforgeeks.org/wp-content/uploads/20240624164726/Online-Retail.xlsx"
df = pd.read_excel(url)

# Display the first few rows of the dataset
df.head()

Output:

    InvoiceNo    StockCode    Description    Quantity    InvoiceDate    UnitPrice    CustomerID    Country
0    536365    85123A    WHITE HANGING HEART T-LIGHT HOLDER    6    2010-12-01 08:26:00    2.55    17850.0    United Kingdom
1    536365    71053    WHITE METAL LANTERN    6    2010-12-01 08:26:00    3.39    17850.0    United Kingdom
2    536365    84406B    CREAM CUPID HEARTS COAT HANGER    8    2010-12-01 08:26:00    2.75    17850.0    United Kingdom
3    536365    84029G    KNITTED UNION FLAG HOT WATER BOTTLE    6    2010-12-01 08:26:00    3.39    17850.0    United Kingdom
4    536365    84029E    RED WOOLLY HOTTIE WHITE HEART.    6    2010-12-01 08:26:00    3.39    17850.0    United Kingdom

Step 3: Data Preprocessing

Before we can train a model, we need to clean and preprocess the data.

Python

# Drop rows with missing values
df.dropna(inplace=True)

# Convert InvoiceDate to datetime
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])

# Create a new column for total price
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']

# Filter out negative quantities
df = df[df['Quantity'] > 0]

# Display the cleaned dataset
df.head()

Output:

InvoiceNo    StockCode    Description    Quantity    InvoiceDate    UnitPrice    CustomerID    Country    TotalPrice
0    536365    85123A    WHITE HANGING HEART T-LIGHT HOLDER    6    2010-12-01 08:26:00    2.55    17850.0    United Kingdom    15.30
1    536365    71053    WHITE METAL LANTERN    6    2010-12-01 08:26:00    3.39    17850.0    United Kingdom    20.34
2    536365    84406B    CREAM CUPID HEARTS COAT HANGER    8    2010-12-01 08:26:00    2.75    17850.0    United Kingdom    22.00
3    536365    84029G    KNITTED UNION FLAG HOT WATER BOTTLE    6    2010-12-01 08:26:00    3.39    17850.0    United Kingdom    20.34
4    536365    84029E    RED WOOLLY HOTTIE WHITE HEART.    6    2010-12-01 08:26:00    3.39    17850.0    United Kingdom    20.34
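To see what these cleaning steps remove, the sketch below applies them to a four-row toy frame. The values are made up. The row with a negative quantity and an invoice number starting with "C" is assumed here to represent a return or cancellation.

```python
import pandas as pd

# Toy rows shaped like the Online Retail columns (values are made up).
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "C536367", "536368"],
    "Quantity": [6, 2, -1, 3],
    "UnitPrice": [2.55, 1.85, 4.25, 3.39],
    "CustomerID": [17850.0, None, 14527.0, 13047.0],
})

# How many values are missing per column before dropping anything?
missing_before = toy.isna().sum()

cleaned = (
    toy.dropna(subset=["CustomerID"])   # rows without a customer cannot feed per-customer features
       .query("Quantity > 0")           # drops the assumed return / cancellation row
       .assign(TotalPrice=lambda d: d["Quantity"] * d["UnitPrice"])
)

print(missing_before["CustomerID"], len(cleaned))
# → 1 2
```

Checking `isna().sum()` before `dropna` is a cheap way to learn how much data a blanket drop will discard.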

Step 4: Feature Engineering

We’ll create recency, frequency, and monetary (RFM) features for each customer.

Python

import numpy as np

# Define reference date as one day after the last invoice date
reference_date = df['InvoiceDate'].max() + pd.Timedelta(days=1)

# Group by CustomerID to calculate RFM
rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (reference_date - x.max()).days,  # Recency
    'InvoiceNo': 'nunique',  # Frequency
    'TotalPrice': 'sum'  # Monetary
})

# Rename columns
rfm.columns = ['Recency', 'Frequency', 'Monetary']

# Display the RFM dataframe
rfm.head()

Output:

    Recency    Frequency    Monetary
CustomerID            
12346.0    326    1    77183.60
12347.0    2    7    4310.00
12348.0    75    4    1797.24
12349.0    19    1    1757.55
12350.0    310    1    334.40
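The RFM calculation is easy to verify by hand on a tiny example. The sketch below uses made-up transactions for two customers and pandas named aggregation, which sets the column names in the same step instead of renaming them afterwards.

```python
import pandas as pd

# Made-up transactions: customer 1 has two invoices, customer 2 has one.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2011-01-01", "2011-01-01", "2011-01-08", "2011-01-05"]),
    "TotalPrice": [10.0, 5.0, 20.0, 7.5],
})

# One day after the last invoice, as in the step above (2011-01-09 here).
reference_date = toy["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm_toy = toy.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda s: (reference_date - s.max()).days),  # days since last purchase
    Frequency=("InvoiceNo", "nunique"),                                  # number of distinct invoices
    Monetary=("TotalPrice", "sum"),                                      # total spend
)
print(rfm_toy)
```

Customer 1 last bought on 2011-01-08, one day before the reference date, across two invoices totalling 35.0. Customer 2 has a recency of 4 days, one invoice, and a total of 7.5.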

Step 5: Train-Test Split

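Split the data into training and testing sets. Note that the target below is derived from Monetary, which is also one of the input features, so the model can learn the label almost perfectly. Treat this as a demonstration of the CatBoost workflow rather than a realistic prediction task.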

Python

from sklearn.model_selection import train_test_split

# Split the data
X = rfm[['Recency', 'Frequency', 'Monetary']]
y = np.where(rfm['Monetary'] > rfm['Monetary'].median(), 1, 0)  # Target: 1 if above median, else 0

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the splits
X_train.shape, X_test.shape, y_train.shape, y_test.shape

Output:

((3471, 3), (868, 3), (3471,), (868,))
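One way to avoid deriving the label from an input feature is a time-based split: compute features from transactions before a cutoff date and define the label from what happens afterwards. The sketch below is an illustration on made-up data, not the approach used in this tutorial. The cutoff date and the `BoughtAgain` label are assumptions.

```python
import pandas as pd

# Made-up transactions for three customers.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 1, 3],
    "InvoiceDate": pd.to_datetime([
        "2011-01-05", "2011-02-10", "2011-01-20", "2011-02-01",
        "2011-02-15", "2011-03-12", "2011-03-20",
    ]),
    "TotalPrice": [20.0, 15.0, 8.0, 30.0, 12.0, 25.0, 40.0],
})

cutoff = pd.Timestamp("2011-03-01")  # hypothetical split date
past = tx[tx["InvoiceDate"] < cutoff]
future = tx[tx["InvoiceDate"] >= cutoff]

# Features come only from the period before the cutoff.
features = past.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda s: (cutoff - s.max()).days),
    Frequency=("InvoiceDate", "count"),  # number of purchase rows before the cutoff
    Monetary=("TotalPrice", "sum"),
)

# Label: did the customer buy anything on or after the cutoff?
features["BoughtAgain"] = features.index.isin(future["CustomerID"]).astype(int)
print(features)
```

With this setup the label carries information the features cannot contain, so a high test score would reflect genuine predictive signal.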

Step 6: Train the CatBoost Model

Now, we’ll train a CatBoost classifier on the training data.

Python

from catboost import CatBoostClassifier

# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6, verbose=False)

# Fit the model
model.fit(X_train, y_train)

# Evaluate the model
accuracy = model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.2f}')

Output:

Accuracy: 0.99
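Accuracy is the fraction of correct predictions, so it always lies between 0 and 1. A single number also hides which kind of mistake the model makes. The sketch below uses accuracy_score and confusion_matrix, both imported in Step 1, on made-up labels that stand in for y_test and model.predict(X_test).

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels and predictions, standing in for y_test and model.predict(X_test).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

acc = accuracy_score(y_true, y_pred)   # fraction of correct predictions, always in [0, 1]
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

print(f"Accuracy: {acc:.2f}")  # → Accuracy: 0.67
print(cm)                      # [[2 1]
                               #  [1 2]]
```

In the matrix, the off-diagonal entries are the errors: one customer wrongly flagged as class 1 and one class 1 customer missed.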

Step 7: Visualize Feature Importance

CatBoost provides feature importance out of the box, which helps understand which features are contributing the most to the predictions.

Python

import matplotlib.pyplot as plt
import seaborn as sns

# Get feature importance
feature_importance = model.get_feature_importance()
features = X.columns

# Create a dataframe for visualization
importance_df = pd.DataFrame({'Feature': features, 'Importance': feature_importance})

# Plot feature importance
plt.figure(figsize=(10, 6))
sns.barplot(x='Importance', y='Feature', data=importance_df)
plt.title('Feature Importance')
plt.show()

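Output: a horizontal bar chart titled “Feature Importance”, with Importance on the x-axis and the features Recency, Frequency, and Monetary on the y-axis.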

Step 8: Interactive GUI with ipywidgets

Finally, we’ll create an interactive GUI where users can input recency, frequency, and monetary values to get a recommendation.

Python

import ipywidgets as widgets
from IPython.display import display

# Define the widget inputs
recency_input = widgets.IntSlider(min=0, max=365, step=1, description='Recency')
frequency_input = widgets.IntSlider(min=0, max=100, step=1, description='Frequency')
monetary_input = widgets.FloatSlider(min=0, max=10000, step=0.01, description='Monetary')

# Define the output area
output = widgets.Output()

# Define the function to make predictions
def make_prediction(recency, frequency, monetary):
    data = pd.DataFrame({'Recency': [recency], 'Frequency': [frequency], 'Monetary': [monetary]})
    prediction = model.predict(data)[0]
    return "Recommend" if prediction == 1 else "Do not recommend"

# Define the update function
def update_prediction(change):
    with output:
        output.clear_output()
        prediction = make_prediction(recency_input.value, frequency_input.value, monetary_input.value)
        print(f'Recommendation: {prediction}')

# Attach the update function to the widget inputs
recency_input.observe(update_prediction, names='value')
frequency_input.observe(update_prediction, names='value')
monetary_input.observe(update_prediction, names='value')

# Display the widgets and output
display(recency_input, frequency_input, monetary_input, output)

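Output: three sliders (Recency, Frequency, Monetary) followed by a line reading “Recommendation: Recommend” or “Recommendation: Do not recommend”, which updates whenever a slider value changes.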

Applications of CatBoost in E-commerce

Personalized Product Recommendations: CatBoost analyzes user interactions and preferences to suggest products that are uniquely tailored to each customer’s taste and purchasing history.
Dynamic Pricing: CatBoost can also be used to adjust product prices dynamically based on demand, availability, customer preferences, and other external factors.
Inventory Management: Predictive models powered by CatBoost can forecast demand for products, helping businesses manage inventory more effectively to avoid overstocking or stockouts.
Customer Segmentation: By classifying customers into distinct groups based on their behavior and preferences, CatBoost enables more targeted marketing and service strategies.

Conclusion

CatBoost is reshaping the landscape of e-commerce by providing powerful, efficient, and accurate product recommendations. Its ability to handle complex, categorical datasets with speed and precision makes it an excellent tool for any e-commerce business looking to enhance its recommendation systems. As businesses continue to embrace machine learning technologies, tools like CatBoost will play a pivotal role in driving innovation and delivering personalized shopping experiences.

Referred: https://www.geeksforgeeks.org
