In the dynamic world of e-commerce, personalized product recommendations are a cornerstone strategy for enhancing the user experience and boosting sales. Machine learning has transformed how businesses predict and cater to individual customer preferences. One such tool is CatBoost, a high-performance, open-source gradient boosting library developed by Yandex.
This article explores how CatBoost can power e-commerce product recommendations, explains its advantages and mechanisms, and walks through an implementation.
E-commerce Product Recommendations
E-commerce product recommendation is a feature commonly used in online retail to suggest products to customers based on various factors, including their browsing history, purchase behavior, product preferences, and the similar actions of other users. This technique is pivotal in personalizing the shopping experience and increasing customer engagement and sales.
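A simple co-purchase count makes the "similar actions of other users" signal concrete, in the spirit of "customers who bought X also bought Y". The sketch below uses plain pandas on a hypothetical mini transaction log. It is not CatBoost and not the approach used later in this article, and the `also_bought` helper is illustrative only.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical mini transaction log in the same shape as the Online Retail data
orders = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A2", "A3", "A3", "A3"],
    "StockCode": ["MUG", "TEA", "MUG", "TEA", "MUG", "SPOON", "TEA"],
})

# Count how often each pair of products appears on the same invoice
pair_counts = Counter()
for _, items in orders.groupby("InvoiceNo")["StockCode"]:
    for a, b in combinations(sorted(set(items)), 2):
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1  # store both directions for easy lookup

def also_bought(product, k=2):
    """Return up to k products most often bought together with `product` (ties broken by name)."""
    scores = {b: n for (a, b), n in pair_counts.items() if a == product}
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]

print(also_bought("MUG"))  # ['TEA', 'SPOON']
```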
Benefits of CatBoost in E-commerce
- Handling Categorical Data: CatBoost works directly with categorical features such as country, product category, or stock code. Most machine learning models need extensive preprocessing, such as one-hot encoding, to turn categories into numbers. CatBoost instead encodes categorical columns internally using ordered target statistics. This preserves the information in the data and reduces preprocessing overhead.
- Improved Recommendation Accuracy: CatBoost is a gradient boosting library built on decision trees and supports both regression and classification. It can be trained on historical data, user behavior, and item attributes to predict which products a customer is likely to be interested in, which improves the relevance of recommendations.
- Speed and Scalability: CatBoost builds symmetric (oblivious) trees, which makes prediction fast, and it supports multi-threaded CPU and GPU training. This matters for e-commerce platforms with millions of users and items, where recommendations must be generated quickly to keep the system responsive.
- Reduced Overfitting: CatBoost uses ordered boosting and provides an overfitting detector (early stopping on a validation set). Together these help models perform well on unseen data, not just on the training data, which keeps product suggestions accurate and reliable over time.
Implementing E-commerce Product Recommendation using CatBoost
This step-by-step tutorial builds a simple recommendation signal with CatBoost. We compute recency, frequency, and monetary (RFM) features for each customer, train a classifier that flags high-value customers, and wrap the model in an interactive GUI built with ipywidgets.
Step 1: Import Libraries
First, make sure the required libraries are installed (openpyxl is needed by pandas to read .xlsx files):
pip install pandas openpyxl catboost scikit-learn matplotlib seaborn ipywidgets
Then import the following libraries.
Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets
from IPython.display import display
from catboost import CatBoostClassifier, Pool
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
Step 2: Load the Dataset
We’ll use the “Online Retail” dataset for this implementation.
Python
import pandas as pd
# Load the dataset
url = "https://media.geeksforgeeks.org/wp-content/uploads/20240624164726/Online-Retail.xlsx"
df = pd.read_excel(url)
# Display the first few rows of the dataset
df.head()
Output:
   InvoiceNo  StockCode  Description                          Quantity  InvoiceDate          UnitPrice  CustomerID  Country
0  536365     85123A     WHITE HANGING HEART T-LIGHT HOLDER   6         2010-12-01 08:26:00  2.55       17850.0     United Kingdom
1  536365     71053      WHITE METAL LANTERN                  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
2  536365     84406B     CREAM CUPID HEARTS COAT HANGER       8         2010-12-01 08:26:00  2.75       17850.0     United Kingdom
3  536365     84029G     KNITTED UNION FLAG HOT WATER BOTTLE  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
4  536365     84029E     RED WOOLLY HOTTIE WHITE HEART.       6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
Step 3: Data Preprocessing
Before we can train a model, we need to clean and preprocess the data.
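In the Online Retail data, cancelled orders show up as rows whose InvoiceNo starts with "C" and whose Quantity is negative. Some rows also lack a CustomerID. The toy frame below uses hypothetical values, including a made-up cancellation invoice, to show which rows the cleaning steps remove.

```python
import pandas as pd

# Hypothetical rows mimicking the Online Retail columns
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536381"],
    "Quantity": [6, -1, 2],
    "UnitPrice": [2.55, 27.50, 1.25],
    "CustomerID": [17850.0, 14527.0, None],
})

clean = raw.dropna()                         # drop the row without a CustomerID
clean = clean[clean["Quantity"] > 0].copy()  # drop the cancellation (negative quantity)
clean["TotalPrice"] = clean["Quantity"] * clean["UnitPrice"]
print(clean)
```

Only the first row survives, with a TotalPrice of 6 × 2.55 = 15.30.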
Python
# Drop rows with missing values
df.dropna(inplace=True)
# Convert InvoiceDate to datetime
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
# Create a new column for total price
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']
# Filter out negative quantities
df = df[df['Quantity'] > 0]
# Display the cleaned dataset
df.head()
Output:
   InvoiceNo  StockCode  Description                          Quantity  InvoiceDate          UnitPrice  CustomerID  Country         TotalPrice
0  536365     85123A     WHITE HANGING HEART T-LIGHT HOLDER   6         2010-12-01 08:26:00  2.55       17850.0     United Kingdom  15.30
1  536365     71053      WHITE METAL LANTERN                  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom  20.34
2  536365     84406B     CREAM CUPID HEARTS COAT HANGER       8         2010-12-01 08:26:00  2.75       17850.0     United Kingdom  22.00
3  536365     84029G     KNITTED UNION FLAG HOT WATER BOTTLE  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom  20.34
4  536365     84029E     RED WOOLLY HOTTIE WHITE HEART.       6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom  20.34
Step 4: Feature Engineering
We’ll create recency, frequency, and monetary value (RFM) features for each customer. These are the number of days since the customer's last purchase, the number of distinct invoices, and their total spend.
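Here is the same RFM aggregation on a hypothetical four-row log for two customers, so each number can be checked by hand. It uses pandas named aggregation instead of the dict-plus-rename pattern, which gives the same result.

```python
import pandas as pd

# Hypothetical transactions for two customers
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A", "A", "B", "C"],
    "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-01", "2011-12-05", "2011-11-01"]),
    "TotalPrice": [10.0, 5.0, 20.0, 100.0],
})

# One day after the last invoice, i.e. 2011-12-06
reference_date = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (reference_date - d.max()).days),  # days since last purchase
    Frequency=("InvoiceNo", "nunique"),  # number of distinct invoices
    Monetary=("TotalPrice", "sum"),      # total spend
)
print(rfm)
```

Customer 1 gets Recency 1, Frequency 2, Monetary 35.0. Customer 2 gets Recency 35, Frequency 1, Monetary 100.0.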
Python
import numpy as np
# Define reference date as one day after the last invoice date
reference_date = df['InvoiceDate'].max() + pd.Timedelta(days=1)
# Group by CustomerID to calculate RFM
rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (reference_date - x.max()).days,  # Recency
    'InvoiceNo': 'nunique',  # Frequency
    'TotalPrice': 'sum'  # Monetary
})
# Rename columns
rfm.columns = ['Recency', 'Frequency', 'Monetary']
# Display the RFM dataframe
rfm.head()
Output:
            Recency  Frequency  Monetary
CustomerID
12346.0         326          1  77183.60
12347.0           2          7   4310.00
12348.0          75          4   1797.24
12349.0          19          1   1757.55
12350.0         310          1    334.40
Step 5: Train-Test Split
Split the data into training and testing sets.
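The target in this step is 1 when a customer's Monetary value is above the median, and Monetary is also one of the input features. A single threshold on that feature therefore reproduces the label exactly, which is worth keeping in mind when reading the model's accuracy. The check below uses hypothetical RFM values.

```python
import numpy as np
import pandas as pd

# Hypothetical RFM table (values chosen for illustration only)
rfm = pd.DataFrame({
    "Recency": [300, 5, 60, 20, 250, 40],
    "Frequency": [1, 8, 3, 2, 1, 4],
    "Monetary": [150.0, 4200.0, 900.0, 1200.0, 80.0, 2500.0],
})

# Same target construction as the tutorial: 1 if spend is above the median
y = np.where(rfm["Monetary"] > rfm["Monetary"].median(), 1, 0)

# A one-line rule on the Monetary feature matches the target exactly
rule = (rfm["Monetary"] > rfm["Monetary"].median()).astype(int).to_numpy()
print("rule accuracy:", (rule == y).mean())  # 1.0 by construction
```

A target that does not depend directly on the features, for example whether a customer purchases again in a later time window, would give a more honest measure of predictive skill.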
Python
from sklearn.model_selection import train_test_split
# Split the data
X = rfm[['Recency', 'Frequency', 'Monetary']]
y = np.where(rfm['Monetary'] > rfm['Monetary'].median(), 1, 0) # Target: 1 if above median, else 0
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Display the shapes of the splits
X_train.shape, X_test.shape, y_train.shape, y_test.shape
Output:
((3471, 3), (868, 3), (3471,), (868,))
Step 6: Train the CatBoost Model
Now, we’ll train a CatBoost classifier on the training data.
Python
from catboost import CatBoostClassifier
# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6, verbose=False)
# Fit the model
model.fit(X_train, y_train)
# Evaluate the model
accuracy = model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.2f}')
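The printed value is the share of test customers classified correctly, so it lies between 0 and 1. Expect it to be close to 1.0 here. The label is simply whether Monetary exceeds its median, and Monetary is itself an input feature, so the model only has to learn one threshold.
Step 7: Visualize Feature Importance
CatBoost provides feature importance out of the box, which helps us understand which features contribute most to the predictions.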
Python
import matplotlib.pyplot as plt
import seaborn as sns
# Get feature importance
feature_importance = model.get_feature_importance()
features = X.columns
# Create a dataframe for visualization
importance_df = pd.DataFrame({'Feature': features, 'Importance': feature_importance})
# Plot feature importance
plt.figure(figsize=(10, 6))
sns.barplot(x='Importance', y='Feature', data=importance_df)
plt.title('Feature Importance')
plt.show()
Output:
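[Figure: horizontal bar chart titled "Feature Importance" comparing Recency, Frequency, and Monetary]
Step 8: Build an Interactive GUI
Finally, we’ll create an interactive GUI where users can set recency, frequency, and monetary values and get a recommendation.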
Python
import ipywidgets as widgets
from IPython.display import display
# Define the widget inputs
recency_input = widgets.IntSlider(min=0, max=365, step=1, description='Recency')
frequency_input = widgets.IntSlider(min=0, max=100, step=1, description='Frequency')
monetary_input = widgets.FloatSlider(min=0, max=10000, step=0.01, description='Monetary')
# Define the output area
output = widgets.Output()
# Define the function to make predictions
def make_prediction(recency, frequency, monetary):
    # Build a one-row frame with the same columns the model was trained on
    data = pd.DataFrame({'Recency': [recency], 'Frequency': [frequency], 'Monetary': [monetary]})
    prediction = model.predict(data)[0]
    return "Recommend" if prediction == 1 else "Do not recommend"
# Define the update function
def update_prediction(change):
    with output:
        output.clear_output()
        prediction = make_prediction(recency_input.value, frequency_input.value, monetary_input.value)
        print(f'Recommendation: {prediction}')
# Attach the update function to the widget inputs
recency_input.observe(update_prediction, names='value')
frequency_input.observe(update_prediction, names='value')
monetary_input.observe(update_prediction, names='value')
# Show a prediction for the initial slider values
update_prediction(None)
# Display the widgets and output
display(recency_input, frequency_input, monetary_input, output)
Applications of CatBoost in E-commerce
- Personalized Product Recommendations: CatBoost analyzes user interactions and preferences to suggest products that are uniquely tailored to each customer’s taste and purchasing history.
- Dynamic Pricing: CatBoost can also be used to adjust product prices dynamically based on demand, availability, customer preferences, and other external factors.
- Inventory Management: Predictive models powered by CatBoost can forecast demand for products, helping businesses manage inventory more effectively to avoid overstocking or stockouts.
- Customer Segmentation: By classifying customers into distinct groups based on their behavior and preferences, CatBoost enables more targeted marketing and service strategies.
Conclusion
CatBoost is a strong fit for e-commerce recommendation tasks. It handles categorical features natively, trains and predicts quickly, and includes safeguards against overfitting. This tutorial used only numeric RFM features to flag high-value customers, but the same workflow extends to richer user and product data with categorical attributes. As businesses continue to adopt machine learning, tools like CatBoost will play a growing role in driving innovation and delivering personalized shopping experiences.