Model-Based Recommender System

Model-based recommendation builds a predictive model from the user-item interactions (and, optionally, other relevant features) in the dataset. Unlike memory-based collaborative filtering, which computes recommendations directly from the stored user-item interactions, a model-based approach first learns a model of user preferences and then recommends items according to that model's predictions.
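To make "learning a model of user preferences" concrete, here is a minimal sketch of a biased matrix factorization trained with stochastic gradient descent. Its prediction rule, global mean plus user bias plus item bias plus a dot product of latent factors, has the same form as the rule documented for Surprise's SVD, which is used later in this notebook. The toy ratings, the learning rate, the regularization strength, and all variable names are assumptions chosen for illustration. This is not Surprise's actual implementation.

```python
import numpy as np

# Toy (user, item, rating) triples; made-up values, not taken from goodbooks-10k.
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, n_factors = 3, 3, 2

rng = np.random.default_rng(0)
mu = np.mean([r for _, _, r in toy_ratings])    # global mean rating
b_u = np.zeros(n_users)                         # user biases
b_i = np.zeros(n_items)                         # item biases
P = rng.normal(0, 0.1, (n_users, n_factors))    # user latent factors
Q = rng.normal(0, 0.1, (n_items, n_factors))    # item latent factors

def predict(u, i):
    # Estimated rating: mean + user bias + item bias + factor interaction.
    return mu + b_u[u] + b_i[i] + P[u] @ Q[i]

def train_rmse():
    # Root mean squared error on the toy training ratings.
    sq_errors = [(r - predict(u, i)) ** 2 for u, i, r in toy_ratings]
    return float(np.sqrt(np.mean(sq_errors)))

rmse_before = train_rmse()

lr, reg = 0.05, 0.02    # assumed learning rate and L2 regularization
for _ in range(200):
    for u, i, r in toy_ratings:
        err = r - predict(u, i)
        b_u[u] += lr * (err - reg * b_u[u])
        b_i[i] += lr * (err - reg * b_i[i])
        p_old = P[u].copy()    # use the pre-update user factors for the item step
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * p_old - reg * Q[i])

rmse_after = train_rmse()
print(rmse_before, rmse_after)
```

Once trained, `predict(u, i)` can score any user-item pair, including pairs with no observed rating, which is what makes ranking unread books possible.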

In [4]:
import sys
print(sys.version)
print(sys.executable)

import pandas as pd
import numpy as np
import scipy
import surprise

%load_ext watermark
%watermark -iv
3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:08:16) [MSC v.1943 64 bit (AMD64)]
C:\Users\Sumedha\.conda\envs\py312\python.exe
The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
surprise: 1.1.4
pandas  : 2.2.3
numpy   : 1.26.4
sys     : 3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:08:16) [MSC v.1943 64 bit (AMD64)]
scipy   : 1.15.3

In [2]:
if 'google.colab' in sys.modules:
    !wget -O books.csv "https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/books.csv"
    !wget -O ratings.csv "https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/ratings.csv"

    !pip uninstall numpy -y
    !pip install numpy==1.19.5
    
    !pip uninstall scikit-surprise -y
    !pip install scikit-surprise
In [3]:
from surprise import Reader, SVD, Dataset, accuracy
from surprise.model_selection import GridSearchCV, train_test_split, cross_validate

Data

In [5]:
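from pathlib import Path
# On Colab the CSV files were downloaded into the working directory above;
# locally they are read from the repository's data folder.
if 'google.colab' in sys.modules:
    path_data = Path('.')
else:
    path_data = Path.home() / 'github/Recommender_System/data/goodbooks_10k'
# Load only a small sample so that the notebook runs quickly.
books = pd.read_csv(path_data / 'books.csv').head(1000)
ratings = pd.read_csv(path_data / 'ratings.csv').head(5000)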

print(books.shape)
print(ratings.shape)

display(books.head(2))
display(ratings.head(2))
(1000, 23)
(5000, 3)
book_id goodreads_book_id best_book_id work_id books_count isbn isbn13 authors original_publication_year original_title ... ratings_count work_ratings_count work_text_reviews_count ratings_1 ratings_2 ratings_3 ratings_4 ratings_5 image_url small_image_url
0 1 2767052 2767052 2792775 272 439023483 9.780439e+12 Suzanne Collins 2008.0 The Hunger Games ... 4780653 4942365 155254 66715 127936 560092 1481305 2706317 https://images.gr-assets.com/books/1447303603m... https://images.gr-assets.com/books/1447303603s...
1 2 3 3 4640799 491 439554934 9.780440e+12 J.K. Rowling, Mary GrandPré 1997.0 Harry Potter and the Philosopher's Stone ... 4602479 4800065 75867 75504 101676 455024 1156318 3011543 https://images.gr-assets.com/books/1474154022m... https://images.gr-assets.com/books/1474154022s...

2 rows × 23 columns

user_id book_id rating
0 1 258 5
1 2 4081 4
In [6]:
print(books.shape)
print(books.columns)
books.head(2)
(1000, 23)
Index(['book_id', 'goodreads_book_id', 'best_book_id', 'work_id',
       'books_count', 'isbn', 'isbn13', 'authors', 'original_publication_year',
       'original_title', 'title', 'language_code', 'average_rating',
       'ratings_count', 'work_ratings_count', 'work_text_reviews_count',
       'ratings_1', 'ratings_2', 'ratings_3', 'ratings_4', 'ratings_5',
       'image_url', 'small_image_url'],
      dtype='object')
Out[6]:
book_id goodreads_book_id best_book_id work_id books_count isbn isbn13 authors original_publication_year original_title ... ratings_count work_ratings_count work_text_reviews_count ratings_1 ratings_2 ratings_3 ratings_4 ratings_5 image_url small_image_url
0 1 2767052 2767052 2792775 272 439023483 9.780439e+12 Suzanne Collins 2008.0 The Hunger Games ... 4780653 4942365 155254 66715 127936 560092 1481305 2706317 https://images.gr-assets.com/books/1447303603m... https://images.gr-assets.com/books/1447303603s...
1 2 3 3 4640799 491 439554934 9.780440e+12 J.K. Rowling, Mary GrandPré 1997.0 Harry Potter and the Philosopher's Stone ... 4602479 4800065 75867 75504 101676 455024 1156318 3011543 https://images.gr-assets.com/books/1474154022m... https://images.gr-assets.com/books/1474154022s...

2 rows × 23 columns

In [7]:
books_cols = ['book_id', 'authors', 'original_publication_year', 'title', 'average_rating']
books2 = books[books_cols]
books2.head(2)
Out[7]:
book_id authors original_publication_year title average_rating
0 1 Suzanne Collins 2008.0 The Hunger Games (The Hunger Games, #1) 4.34
1 2 J.K. Rowling, Mary GrandPré 1997.0 Harry Potter and the Sorcerer's Stone (Harry P... 4.44
In [8]:
print(ratings.shape)
print(ratings.columns)
ratings.head(2)
(5000, 3)
Index(['user_id', 'book_id', 'rating'], dtype='object')
Out[8]:
user_id book_id rating
0 1 258 5
1 2 4081 4
In [9]:
df = pd.merge(books, ratings, on="book_id", how="inner")
print(df.shape)
df.head(2)
(3031, 25)
Out[9]:
book_id goodreads_book_id best_book_id work_id books_count isbn isbn13 authors original_publication_year original_title ... work_text_reviews_count ratings_1 ratings_2 ratings_3 ratings_4 ratings_5 image_url small_image_url user_id rating
0 2 3 3 4640799 491 439554934 9.780440e+12 J.K. Rowling, Mary GrandPré 1997.0 Harry Potter and the Philosopher's Stone ... 75867 75504 101676 455024 1156318 3011543 https://images.gr-assets.com/books/1474154022m... https://images.gr-assets.com/books/1474154022s... 4 5
1 2 3 3 4640799 491 439554934 9.780440e+12 J.K. Rowling, Mary GrandPré 1997.0 Harry Potter and the Philosopher's Stone ... 75867 75504 101676 455024 1156318 3011543 https://images.gr-assets.com/books/1474154022m... https://images.gr-assets.com/books/1474154022s... 15 4

2 rows × 25 columns
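The inner merge above keeps 3,031 of the 5,000 sampled ratings. Presumably this is because only the first 1,000 rows of books.csv are loaded, so ratings for other books (for example book_id 4081 in the ratings preview) have no match and are dropped. The sketch below reproduces the effect on made-up frames. The names `books_small` and `ratings_small` are hypothetical, and the `indicator=True` option of `pd.merge` is used to count the unmatched ratings.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
books_small = pd.DataFrame({"book_id": [1, 2], "title": ["A", "B"]})
ratings_small = pd.DataFrame({
    "user_id": [10, 10, 11],
    "book_id": [1, 3, 2],      # book 3 is missing from books_small
    "rating": [5, 4, 3],
})

# Inner join: ratings without a matching book disappear silently.
inner = pd.merge(books_small, ratings_small, on="book_id", how="inner")

# Left join from the ratings side with an indicator column exposes the loss.
check = pd.merge(ratings_small, books_small, on="book_id",
                 how="left", indicator=True)
n_dropped = int((check["_merge"] == "left_only").sum())

print(len(ratings_small), len(inner), n_dropped)
```

Running the same check on the notebook's `ratings` and `books` frames would show how many ratings the truncated book sample discards before the model is trained.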

Model Based Recommender Engine

In [10]:
user_id = df["user_id"].iloc[0]
user_id
Out[10]:
4
In [11]:
sample_df = df[df["user_id"]==user_id]
print(sample_df.shape)
sample_df.head(2)
(83, 25)
Out[11]:
book_id goodreads_book_id best_book_id work_id books_count isbn isbn13 authors original_publication_year original_title ... work_text_reviews_count ratings_1 ratings_2 ratings_3 ratings_4 ratings_5 image_url small_image_url user_id rating
0 2 3 3 4640799 491 439554934 9.780440e+12 J.K. Rowling, Mary GrandPré 1997.0 Harry Potter and the Philosopher's Stone ... 75867 75504 101676 455024 1156318 3011543 https://images.gr-assets.com/books/1474154022m... https://images.gr-assets.com/books/1474154022s... 4 5
55 4 2657 2657 3275794 487 61120081 9.780061e+12 Harper Lee 1960.0 To Kill a Mockingbird ... 72586 60427 117415 446835 1001952 1714267 https://images.gr-assets.com/books/1361975680m... https://images.gr-assets.com/books/1361975680s... 4 4

2 rows × 25 columns

In [12]:
# Tell Surprise's Reader that ratings lie on a scale from 1 to 5
reader = Reader(rating_scale=(1, 5))
In [13]:
data = Dataset.load_from_df(df[['user_id','book_id','rating']], reader)
In [14]:
trainset, testset = train_test_split(data,random_state=42, test_size=.25)
In [15]:
svd_model = SVD(random_state=42)
svd_model.fit(trainset)
Out[15]:
<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1e40b7f2fc0>
In [16]:
predictions = svd_model.test(testset)
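The test-set predictions are computed here but not scored in the notebook. A minimal sketch of scoring them follows. It assumes each prediction exposes the true rating as `r_ui` and the estimate as `est`, which are the field names of Surprise's prediction tuples. To stay runnable without Surprise, a stand-in `Prediction` namedtuple with made-up values is used; the helpers `rmse` and `mae` are written here for illustration.

```python
from collections import namedtuple
from math import sqrt

# Stand-in mirroring the field names of Surprise's prediction tuples.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

def rmse(predictions):
    # Root mean squared error between true and estimated ratings.
    sq_errors = [(p.r_ui - p.est) ** 2 for p in predictions]
    return sqrt(sum(sq_errors) / len(sq_errors))

def mae(predictions):
    # Mean absolute error between true and estimated ratings.
    return sum(abs(p.r_ui - p.est) for p in predictions) / len(predictions)

# Made-up predictions for illustration only.
toy_predictions = [
    Prediction(4, 2, 5.0, 4.0, {}),
    Prediction(4, 4, 3.0, 4.0, {}),
    Prediction(15, 2, 4.0, 4.0, {}),
]

toy_rmse = rmse(toy_predictions)
toy_mae = mae(toy_predictions)
print(toy_rmse, toy_mae)
```

In the notebook itself, calling `accuracy.rmse(predictions)` from the already imported `surprise.accuracy` module should report the same quantity for the real test set.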
In [17]:
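# Books that the sample user has not rated yet (filtering on rows of other users would also keep books this user already rated)
read_books = set(df.loc[df["user_id"] == user_id, "book_id"])
didnt_read = [b for b in df["book_id"].unique().tolist() if b not in read_books]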
In [18]:
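def suggest(df, user_id, sug):
    # Candidate books: every book in df that this user has not rated yet.
    read_books = set(df.loc[df["user_id"] == user_id, "book_id"])
    didnt_read = [b for b in df["book_id"].unique().tolist() if b not in read_books]
    # Predict a rating for each candidate with the fitted global svd_model.
    temp_dict = {}
    for i in didnt_read:
        temp_dict[i] = svd_model.predict(uid=user_id, iid=i).est
    # Keep the `sug` highest predicted ratings and attach the book titles.
    suggestions = (pd.DataFrame(temp_dict.items(), columns=["book_id", "possible_rate"])
                   .sort_values(by="possible_rate", ascending=False)
                   .head(sug))
    merged = pd.merge(suggestions, books[["book_id", "title"]], how="inner", on="book_id")
    return merged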
In [19]:
# For the sample user, the model suggests 5 books with predicted ratings between roughly 4.3 and 4.5.
suggest(df, user_id, 5)
Out[19]:
book_id possible_rate title
0 80 4.524551 The Little Prince
1 21 4.398465 Harry Potter and the Order of the Phoenix (Har...
2 184 4.360772 Matilda
3 101 4.330692 Me Talk Pretty One Day
4 70 4.311622 Ender's Game (Ender's Saga, #1)