A recommender system is an information filtering system that provides personalized recommendations to users based on their preferences, interests, and past behavior. Recommender systems come in several forms: content-based, collaborative filtering, and hybrid.

Content-based systems recommend items whose characteristics closely match those of items the user has previously shown interest in. Collaborative filtering systems recommend items based on the preferences of other users with similar interests. Hybrid systems combine both approaches.

We will implement a recommender with collaborative filtering. Collaborative filtering makes predictions (filtering) about a user's interests by compiling preference or taste data from many users (collaborating). The essential premise is that if two users A and B share the same opinion on one subject, A is more likely to share B's opinion on a different subject x than the opinion of a randomly selected user.

Recommender System using PySpark

Spark MLlib implements collaborative filtering using Alternating Least Squares (ALS). The MLlib implementation is configured through parameters such as the rank of the factor matrices, the number of iterations, and the regularization strength.
In this tutorial, we will use a book review dataset.

Step 1: Import the necessary libraries and functions and set up the Spark session
Output:

SparkSession - in-memory
SparkContext
Spark UI
Version v3.3.1
Master local[*]
AppName Recommender

Step 2: Reading the data from the dataset
Output:

+-------+-------+------+
|book_id|user_id|rating|
+-------+-------+------+
|      1|    314|     5|
|      1|    439|     3|
|      1|    588|     5|
|      1|   1169|     4|
|      1|   1185|     4|
+-------+-------+------+
only showing top 5 rows

Describe the dataset
Output:

+-------+-----------------+------------------+------------------+
|summary|          book_id|           user_id|            rating|
+-------+-----------------+------------------+------------------+
|  count|           981756|            981756|            981756|
|   mean|4943.275635697668|25616.759933221696|3.8565335989797873|
| stddev|2873.207414896143|15228.338825882149|0.9839408559619973|
|    min|                1|                 1|                 1|
|    max|            10000|             53424|                 5|
+-------+-----------------+------------------+------------------+

Step 3: Splitting the data into training and testing sets
Step 4: Import the Alternating Least Squares (ALS) method and apply it
Step 5: Predictions
Output:

+-------+-------+------+----------+
|book_id|user_id|rating|prediction|
+-------+-------+------+----------+
|      2|   6342|     3| 4.8064413|
|      1|  17984|     5| 4.9681554|
|      1|  38475|     4| 4.4078903|
|      2|   6630|     5|  4.344222|
|      1|  32055|     4|  3.990228|
|      1|  33697|     4| 3.7945805|
|      1|  18313|     5|  4.533183|
|      1|   5461|     3| 3.8614116|
|      1|  47800|     5|  4.914357|
|      2|  10751|     3|  4.160536|
|      1|  16377|     4|  5.304298|
|      1|  45493|     5|  3.998557|
|      2|  10509|     2| 1.8626969|
|      1|  33890|     3| 3.6022692|
|      1|  37284|     5| 4.8147345|
|      1|   1185|     4| 3.7463336|
|      1|  44397|     5| 5.0251017|
|      1|  46977|     4| 4.0746284|
|      1|  10944|     5|  4.343548|
|      2|   8167|     2|  3.705464|
+-------+-------+------+----------+
only showing top 20 rows

Evaluations
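Output:

Root-mean-square error = nan

An RMSE of nan typically means the test split contains users or books that never appeared in the training split. ALS cannot score those rows and, with Spark's default coldStartStrategy of "nan", returns NaN for them, which propagates into the metric. Setting coldStartStrategy="drop" on the ALS estimator excludes such rows so that the RMSE is a finite number.

Step 6: Recommendations

Now we will recommend books to a single user, user1 (say, user_id 5461), using our trained model.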
Output:

+-------+-------+
|book_id|user_id|
+-------+-------+
|      1|   5461|
|     11|   5461|
|     19|   5461|
|     46|   5461|
|     60|   5461|
|     66|   5461|
|     93|   5461|
|    111|   5461|
|    121|   5461|
|    172|   5461|
|    194|   5461|
|    212|   5461|
|    222|   5461|
|    245|   5461|
|    264|   5461|
|    281|   5461|
|    301|   5461|
|    354|   5461|
|    388|   5461|
|    454|   5461|
+-------+-------+
only showing top 20 rows
Output:

+-------+-------+----------+
|book_id|user_id|prediction|
+-------+-------+----------+
|     19|   5461| 5.3429904|
|     11|   5461|  4.830688|
|     66|   5461|  4.804107|
|    245|   5461|  4.705879|
|    388|   5461| 4.6276107|
|   1161|   5461|  4.612251|
|     60|   5461| 4.5895457|
|   1402|   5461|    4.5184|
|   1088|   5461|  4.454755|
|   5152|   5461|  4.415825|
|    121|   5461| 4.3423634|
|     93|   5461| 4.3357944|
|   1796|   5461|   4.30891|
|    172|   5461| 4.2679276|
|    454|   5461|  4.245925|
|   1211|   5461| 4.2431927|
|    731|   5461| 4.1873074|
|   1094|   5461| 4.1829815|
|    222|   5461|  4.182873|
|    264|   5461| 4.1469045|
+-------+-------+----------+
only showing top 20 rows

The output above lists the predicted rating of each book ID for the user with user_id 5461, sorted from highest to lowest.

Step 7: Stop the Spark session
Referred: https://www.geeksforgeeks.org