E-commerce Product Recommendations Using CatBoost

In the dynamic world of e-commerce, personalized product recommendations stand as a cornerstone strategy to enhance user experience and boost sales. The integration of sophisticated machine learning algorithms has transformed how businesses predict and cater to individual customer preferences. One such powerful tool is CatBoost, a high-performance, open-source library developed by Yandex.

This article explores how CatBoost can support e-commerce product recommendations, covering its advantages, how it works, and a step-by-step implementation.

E-commerce Product Recommendations

E-commerce product recommendation is a feature commonly used in online retail to suggest products to customers based on factors such as their browsing history, purchase behavior, product preferences, and the actions of similar users. This technique is central to personalizing the shopping experience and increasing customer engagement and sales.
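To make the idea of using other customers' purchases concrete, here is a minimal sketch of a "customers who bought this also bought" lookup. It is not part of the CatBoost workflow in this article; the baskets, item names, and the also_bought helper are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy baskets: each inner list is one customer's order.
baskets = [
    ["lantern", "candle", "holder"],
    ["lantern", "candle"],
    ["candle", "holder"],
    ["lantern", "hanger"],
]

# Count how often each ordered pair of distinct items shares a basket.
co_counts = defaultdict(Counter)
for basket in baskets:
    # sorted(set(...)) removes duplicates and keeps the iteration order stable.
    for item, other in permutations(sorted(set(basket)), 2):
        co_counts[item][other] += 1


def also_bought(item, k=2):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co_counts[item].most_common(k)]


print(also_bought("lantern"))
```

In this toy data, "candle" appears with "lantern" in two baskets, so it is ranked first. A learned model such as CatBoost can go beyond raw co-occurrence counts by also using customer and item attributes.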

Benefits of CatBoost in E-commerce

  1. Handling Categorical Data: CatBoost works with categorical data directly. Many machine learning models need categorical columns converted to numbers first, for example with one-hot or label encoding. CatBoost accepts the categorical columns as they are and encodes them internally, which preserves the information in high-cardinality fields such as product or country codes and reduces preprocessing overhead.
  2. Improved Recommendation Accuracy: CatBoost is a gradient boosting library built on decision trees that supports both regression and classification. Trained on historical data, user behavior, and item attributes, it can estimate which products a customer is likely to be interested in, which makes the recommendations more relevant.
  3. Speed and Scalability: CatBoost supports multi-threaded CPU and GPU training and produces models that score quickly, which matters for e-commerce platforms with millions of users and items. Recommendations can therefore be generated fast enough to keep the system responsive.
  4. Reduced Overfitting: CatBoost includes built-in safeguards against overfitting, such as ordered boosting and an overfitting detector that stops training when a validation metric stops improving. These help the model perform well on unseen data as well as on the training data, which keeps product suggestions reliable over time.

Implementing E-commerce Product Recommendation using CatBoost

This step-by-step tutorial shows how to use CatBoost to build a simple e-commerce product recommendation system. We will also use ipywidgets to create an interactive GUI.

Step 1: Import Libraries

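Make sure you have installed the required libraries first (scikit-learn is needed for the train-test split, and openpyxl lets pandas read the .xlsx file):

pip install pandas numpy scikit-learn catboost matplotlib seaborn ipywidgets openpyxl

Then import the following libraries.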

Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets
from IPython.display import display

from catboost import CatBoostClassifier, Pool
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

Step 2: Load the Dataset

We’ll use the “Online Retail” dataset for this implementation.

Python
import pandas as pd

# Load the dataset
url = "https://media.geeksforgeeks.org/wp-content/uploads/20240624164726/Online-Retail.xlsx"
df = pd.read_excel(url)

# Display the first few rows of the dataset
df.head()

Output:

   InvoiceNo  StockCode  Description                          Quantity  InvoiceDate          UnitPrice  CustomerID  Country
0  536365     85123A     WHITE HANGING HEART T-LIGHT HOLDER   6         2010-12-01 08:26:00  2.55       17850.0     United Kingdom
1  536365     71053      WHITE METAL LANTERN                  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
2  536365     84406B     CREAM CUPID HEARTS COAT HANGER       8         2010-12-01 08:26:00  2.75       17850.0     United Kingdom
3  536365     84029G     KNITTED UNION FLAG HOT WATER BOTTLE  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
4  536365     84029E     RED WOOLLY HOTTIE WHITE HEART.       6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom

Step 3: Data Preprocessing

Before we can train a model, we need to clean and preprocess the data.

Python
# Drop rows with missing values
df.dropna(inplace=True)

# Convert InvoiceDate to datetime
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])

# Create a new column for total price
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']

# Filter out negative quantities
df = df[df['Quantity'] > 0]

# Display the cleaned dataset
df.head()

Output:

   InvoiceNo  StockCode  Description                          Quantity  InvoiceDate          UnitPrice  CustomerID  Country         TotalPrice
0  536365     85123A     WHITE HANGING HEART T-LIGHT HOLDER   6         2010-12-01 08:26:00  2.55       17850.0     United Kingdom  15.30
1  536365     71053      WHITE METAL LANTERN                  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom  20.34
2  536365     84406B     CREAM CUPID HEARTS COAT HANGER       8         2010-12-01 08:26:00  2.75       17850.0     United Kingdom  22.00
3  536365     84029G     KNITTED UNION FLAG HOT WATER BOTTLE  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom  20.34
4  536365     84029E     RED WOOLLY HOTTIE WHITE HEART.       6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom  20.34

Step 4: Feature Engineering

We’ll create recency, frequency, and monetary value (RFM) features for each customer: the number of days since the last purchase, the number of distinct invoices, and the total amount spent.

Python
import numpy as np

# Define reference date as one day after the last invoice date
reference_date = df['InvoiceDate'].max() + pd.Timedelta(days=1)

# Group by CustomerID to calculate RFM
rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (reference_date - x.max()).days,  # Recency
    'InvoiceNo': 'nunique',  # Frequency
    'TotalPrice': 'sum'  # Monetary
})

# Rename columns
rfm.columns = ['Recency', 'Frequency', 'Monetary']

# Display the RFM dataframe
rfm.head()

Output:

CustomerID  Recency  Frequency  Monetary
12346.0     326      1          77183.60
12347.0     2        7          4310.00
12348.0     75       4          1797.24
12349.0     19       1          1757.55
12350.0     310      1          334.40
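To see exactly what each RFM value means, the same aggregation can be run on a few hand-made rows. The toy transactions here are hypothetical and are not taken from the Online Retail file; the sketch also uses pandas named aggregation, which sets the column names in one step instead of renaming afterwards.

```python
import pandas as pd

# Hypothetical toy transactions (not taken from the Online Retail file).
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-01", "2011-01-01", "2011-01-10", "2011-01-05"]),
    "TotalPrice": [10.0, 5.0, 20.0, 7.5],
})

# One day after the last invoice, as in the step above: 2011-01-11.
toy_reference_date = toy["InvoiceDate"].max() + pd.Timedelta(days=1)

toy_rfm = toy.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda s: (toy_reference_date - s.max()).days),
    Frequency=("InvoiceNo", "nunique"),   # distinct invoices, not rows
    Monetary=("TotalPrice", "sum"),
)
print(toy_rfm)
```

Customer 1 last bought on 2011-01-10, so Recency is 1 day; the three rows belong to two distinct invoices, so Frequency is 2; and Monetary is 10.0 + 5.0 + 20.0 = 35.0.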

Step 5: Train-Test Split

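Split the data into training and testing sets. The target here is whether a customer’s total spend is above the median. Because it is computed from Monetary, which is also one of the input features, the classifier can learn it almost trivially, so treat this step as a demonstration of the workflow rather than a realistic recommendation target.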

Python
from sklearn.model_selection import train_test_split

# Split the data
X = rfm[['Recency', 'Frequency', 'Monetary']]
y = np.where(rfm['Monetary'] > rfm['Monetary'].median(), 1, 0)  # Target: 1 if above median, else 0

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the splits
X_train.shape, X_test.shape, y_train.shape, y_test.shape

Output:

((3471, 3), (868, 3), (3471,), (868,))
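A common alternative, sketched here under stated assumptions, is to take the features and the label from different time windows so that the label cannot be read off a feature. The toy transactions, the cutoff date, and the names X_hist and y_future are all hypothetical; this is not the setup used in the rest of the tutorial.

```python
import pandas as pd

# Hypothetical toy transactions; the cutoff date is an arbitrary assumption.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 1],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C2", "A3"],
    "InvoiceDate": pd.to_datetime([
        "2011-01-05", "2011-02-10", "2011-01-20",
        "2011-02-01", "2011-02-25", "2011-03-15"]),
    "TotalPrice": [10.0, 20.0, 5.0, 8.0, 12.0, 30.0],
})
cutoff = pd.Timestamp("2011-03-01")

history = tx[tx["InvoiceDate"] < cutoff]    # used for features only
future = tx[tx["InvoiceDate"] >= cutoff]    # used for the label only

# RFM features computed from purchases before the cutoff.
X_hist = history.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda s: (cutoff - s.max()).days),
    Frequency=("InvoiceNo", "nunique"),
    Monetary=("TotalPrice", "sum"),
)

# Label: 1 if the customer bought again on or after the cutoff, else 0.
y_future = X_hist.index.isin(future["CustomerID"]).astype(int)
print(X_hist.assign(BoughtAgain=y_future))
```

Only customer 1 has a purchase after the cutoff, so the labels are 1, 0, 0 for customers 1, 2, and 3. On real data, X_hist and y_future could then be passed to train_test_split in the same way as X and y.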

Step 6: Train the CatBoost Model

Now, we’ll train a CatBoost classifier on the training data.

Python
from catboost import CatBoostClassifier

# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6, verbose=False)

# Fit the model
model.fit(X_train, y_train)

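# Evaluate the model: score() returns mean accuracy, a fraction between 0 and 1
accuracy = model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.2f}')

Output:

A single line of the form Accuracy: 0.xx. The value is the fraction of test customers classified correctly, so it always lies between 0 and 1; the exact figure depends on the run and library versions.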

Step 7: Visualize Feature Importance

CatBoost provides feature importance out of the box, which helps understand which features are contributing the most to the predictions.

Python
import matplotlib.pyplot as plt
import seaborn as sns

# Get feature importance
feature_importance = model.get_feature_importance()
features = X.columns

# Create a dataframe for visualization
importance_df = pd.DataFrame({'Feature': features, 'Importance': feature_importance})

# Plot feature importance
plt.figure(figsize=(10, 6))
sns.barplot(x='Importance', y='Feature', data=importance_df)
plt.title('Feature Importance')
plt.show()

Output:

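[Figure: horizontal bar chart titled "Feature Importance" showing the importance of Recency, Frequency, and Monetary]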

Step 8: Interactive GUI with ipywidgets

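Finally, we’ll create an interactive GUI where users can enter recency, frequency, and monetary values and see the model’s prediction. A predicted class of 1 (above-median spend) is displayed as "Recommend", and a predicted class of 0 as "Do not recommend".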

Python
import ipywidgets as widgets
from IPython.display import display

# Define the widget inputs
recency_input = widgets.IntSlider(min=0, max=365, step=1, description='Recency')
frequency_input = widgets.IntSlider(min=0, max=100, step=1, description='Frequency')
monetary_input = widgets.FloatSlider(min=0, max=10000, step=0.01, description='Monetary')

# Define the output area
output = widgets.Output()

# Define the function to make predictions
def make_prediction(recency, frequency, monetary):
    data = pd.DataFrame({'Recency': [recency], 'Frequency': [frequency], 'Monetary': [monetary]})
    prediction = model.predict(data)[0]
    return "Recommend" if prediction == 1 else "Do not recommend"

# Define the update function
def update_prediction(change):
    with output:
        output.clear_output()
        prediction = make_prediction(recency_input.value, frequency_input.value, monetary_input.value)
        print(f'Recommendation: {prediction}')

# Attach the update function to the widget inputs
recency_input.observe(update_prediction, names='value')
frequency_input.observe(update_prediction, names='value')
monetary_input.observe(update_prediction, names='value')

# Display the widgets and output
display(recency_input, frequency_input, monetary_input, output)

Output:

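[Figure: three sliders (Recency, Frequency, Monetary) with the recommendation text displayed beneath them]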

Applications of CatBoost in E-commerce

  • Personalized Product Recommendations: CatBoost analyzes user interactions and preferences to suggest products that are uniquely tailored to each customer’s taste and purchasing history.
  • Dynamic Pricing: CatBoost can also be used to adjust product prices dynamically based on demand, availability, customer preferences, and other external factors.
  • Inventory Management: Predictive models powered by CatBoost can forecast demand for products, helping businesses manage inventory more effectively to avoid overstocking or stockouts.
  • Customer Segmentation: By classifying customers into distinct groups based on their behavior and preferences, CatBoost enables more targeted marketing and service strategies.

Conclusion

CatBoost is reshaping the landscape of e-commerce by providing powerful, efficient, and accurate product recommendations. Its ability to handle complex, categorical datasets with speed and precision makes it an excellent tool for any e-commerce business looking to enhance its recommendation systems. As businesses continue to embrace machine learning technologies, tools like CatBoost will play a pivotal role in driving innovation and delivering personalized shopping experiences.




Referred: https://www.geeksforgeeks.org

