01: Connect to Kaggle to download data

  • download kaggle.json from kaggle
In [104]:
# Install Kaggle API
!pip install -q kaggle
In [105]:
from google.colab import files
# files.upload()
In [106]:
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
mv: cannot stat 'kaggle.json': No such file or directory
In [107]:
!kaggle datasets download -d zygmunt/goodbooks-10k --unzip
usage: kaggle [-h] [-v] [-W]
              {competitions,c,datasets,d,kernels,k,models,m,files,f,config}
              ...
kaggle: error: argument command: invalid choice: 'ratingss' (choose from 'competitions', 'c', 'datasets', 'd', 'kernels', 'k', 'models', 'm', 'files', 'f', 'config')
In [108]:
!ls
books.csv      ratings.csv	    sample_book.xml  tags.csv
book_tags.csv  regression_model.h5  sample_data      to_read.csv
In [109]:
# we use books.csv and ratings.csv
In [110]:
!cat ratings.csv | head -n 4
book_id,user_id,rating
1,314,5
1,439,3
1,588,5
In [111]:
!cat books.csv | head -n 4
id,book_id,best_book_id,work_id,books_count,isbn,isbn13,authors,original_publication_year,original_title,title,language_code,average_rating,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
1,2767052,2767052,2792775,272,439023483,9.78043902348e+12,Suzanne Collins,2008.0,The Hunger Games,"The Hunger Games (The Hunger Games, #1)",eng,4.34,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m/2767052.jpg,https://images.gr-assets.com/books/1447303603s/2767052.jpg
2,3,3,4640799,491,439554934,9.78043955493e+12,"J.K. Rowling, Mary GrandPré",1997.0,Harry Potter and the Philosopher's Stone,"Harry Potter and the Sorcerer's Stone (Harry Potter, #1)",eng,4.44,4602479,4800065,75867,75504,101676,455024,1156318,3011543,https://images.gr-assets.com/books/1474154022m/3.jpg,https://images.gr-assets.com/books/1474154022s/3.jpg
3,41865,41865,3212258,226,316015849,9.78031601584e+12,Stephenie Meyer,2005.0,Twilight,"Twilight (Twilight, #1)",en-US,3.57,3866839,3916824,95009,456191,436802,793319,875073,1355439,https://images.gr-assets.com/books/1361039443m/41865.jpg,https://images.gr-assets.com/books/1361039443s/41865.jpg
In [112]:
!cat tags.csv | head -n 4
tag_id,tag_name
0,-
1,--1-
2,--10-
In [113]:
!cat book_tags.csv | head -n 4
goodreads_book_id,tag_id,count
1,30574,167697
1,11305,37174
1,11557,34173
In [113]:

  1. Data Loading and Preparation
In [114]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import warnings

from tensorflow.keras.layers import Input, Embedding, Flatten, Dot, Dense, Concatenate
from tensorflow.keras.models import Model
from keras.models import load_model
import tensorflow as tf
# warnings.filterwarnings('ignore')
In [115]:
ratings = pd.read_csv('ratings.csv')
books = pd.read_csv('books.csv')
In [116]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(ratings, test_size=0.2, random_state=42)
In [117]:
n_users = len(ratings.user_id.unique())
n_books = len(ratings.book_id.unique())

02: Model Architecture

  • Creates two input layers, one for book IDs and one for user IDs.

  • shape=[1] means each input is a single number (either book ID or user ID).

In [118]:
book_input = Input(shape=[1], name="Book-Input")
user_input = Input(shape=[1], name="User-Input")

Embedding Layers

  • Embedding Layers: Convert integer IDs into dense vectors of fixed size.

  • Why +1? This is because embedding layers require the input dimension to be at least max_id + 1 since IDs typically start at 1 (not 0). It ensures we have enough embedding slots for all possible IDs.

  • 5: The second parameter is the embedding dimension (5 in this case), meaning each book/user will be represented by a 5-dimensional vector.

In [119]:
book_embedding = Embedding(n_books+1, 5, name="Book-Embedding")(book_input)
user_embedding = Embedding(n_users+1, 5, name="User-Embedding")(user_input)

Flatten Layers

Flatten: Converts the 2D embedding output (batch_size, 1, 5) to 1D (batch_size, 5) for the dot product operation.

In [120]:
book_vec = Flatten(name="Flatten-Books")(book_embedding)
user_vec = Flatten(name="Flatten-Users")(user_embedding)

Dot Product

  • Dot Product: Computes the similarity between user and book embeddings.

  • axes=1 means we're doing a dot product along the embedding dimension (5).

In [121]:
prod = Dot(name="Dot-Product", axes=1)([book_vec, user_vec])

Model Compilation

Creates the model with two inputs and one output (the predicted rating).

Uses Adam optimizer and mean squared error loss (common for regression tasks).

In [122]:
model = Model([user_input, book_input], prod)
model.compile('adam', 'mean_squared_error')

03: Training and Saving

Checks if a trained model exists, loads it if yes, otherwise trains a new one.

Saves the trained model to disk for future use.

In [123]:
if os.path.exists('regression_model.h5'):
    model = load_model('regression_model.h5')
else:
    history = model.fit([train.user_id, train.book_id], train.rating,
                       batch_size=64, epochs=5, verbose=1)
    model.save('regression_model.h5')
WARNING:absl:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.

4: Making Recommendations

Gets all unique book IDs

Creates an array with the same user ID repeated for all books

Converts to tensors (required by TensorFlow)

Gets predictions for all books

Sorts predictions to get top recommendations

In [133]:
def make_recommendations(user_id=1, model=model, n_recommendations=5):
    # 1. Get all unique book IDs from the dataset
    book_data = np.array(list(set(dataset.book_id)))

    # 2. Create array with the target user_id repeated for each book
    user = np.array([user_id for _ in range(len(book_data))])

    # 3. Convert to TensorFlow tensors (required for model input)
    user_tensor = tf.convert_to_tensor(user, dtype=tf.int32)
    book_tensor = tf.convert_to_tensor(book_data, dtype=tf.int32)

    # 4. Get predicted ratings for all books
    predictions = model.predict([user_tensor, book_tensor], batch_size=128)
    predictions = np.array([a[0] for a in predictions])  # Flatten predictions

    # 5. Sort books by predicted rating (highest first)
    recommended_book_ids = (-predictions).argsort()[:n_recommendations]

    # 6. Get book details from books.csv
    recommended_books = books[books['id'].isin(recommended_book_ids)]

    return recommended_books
In [129]:
ratings.head(2)
Out[129]:
book_id user_id rating
0 1 314 5
1 1 439 3
In [130]:
books.head(2)
Out[130]:
id book_id best_book_id work_id books_count isbn isbn13 authors original_publication_year original_title ... ratings_count work_ratings_count work_text_reviews_count ratings_1 ratings_2 ratings_3 ratings_4 ratings_5 image_url small_image_url
0 1 2767052 2767052 2792775 272 439023483 9.780439e+12 Suzanne Collins 2008.0 The Hunger Games ... 4780653 4942365 155254 66715 127936 560092 1481305 2706317 https://images.gr-assets.com/books/1447303603m... https://images.gr-assets.com/books/1447303603s...
1 2 3 3 4640799 491 439554934 9.780440e+12 J.K. Rowling, Mary GrandPré 1997.0 Harry Potter and the Philosopher's Stone ... 4602479 4800065 75867 75504 101676 455024 1156318 3011543 https://images.gr-assets.com/books/1474154022m... https://images.gr-assets.com/books/1474154022s...

2 rows × 23 columns

In [128]:
# Example usage
recommended_books = make_recommendations(user_id=1)
print(recommended_books)
79/79 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
[8945 7638 8881 7946 4777]
In [134]:
user_id = ratings.user_id.iloc[0]
# Get recommendations for user 314
user_314_recs = make_recommendations(user_id=user_id)

# Display results nicely
print(f"\nTop 5 Recommendations for User 314:")
print(user_314_recs[['id', 'title', 'authors']].to_string(index=False))
79/79 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step

Top 5 Recommendations for User 314:
  id                                            title                                   authors
7638                                   The Algebraist                             Iain M. Banks
8335                                           Ripper Isabel Allende, Oliver Brock, Frank Wynne
8881                      Benediction (Plainsong, #3)                                Kent Haruf
8945          The Next Accident (Quincy & Rainie, #3)                              Lisa Gardner
9079 Like a Fire Burning (The Work and the Glory, #2)                            Gerald N. Lund

Q: explain why we need adam and mse, for this dot product code? cant we just do like 'np.dot'

Great question! Let me explain why we use Adam optimizer and Mean Squared Error (MSE) in this recommendation system, and why we can't just use np.dot directly.

Why Adam Optimizer?

  1. Adaptive Learning Rates:

    • Adam automatically adjusts learning rates for each parameter
    • In our embedding layers (user and book vectors), different dimensions may need different update rates
    • Classic SGD would use the same learning rate for all parameters
  2. Momentum-like Behavior:

    • Adam combines the benefits of two other optimizers (AdaGrad and RMSProp)
    • Helps navigate the loss landscape more efficiently than plain SGD
  3. Well-suited for Embeddings:

    • Embedding layers often benefit from adaptive optimization
    • Different users/books may need different update magnitudes
  4. Default Choice:

    • Adam works well out-of-the-box for most deep learning tasks
    • Requires less tuning than SGD with momentum

Why Mean Squared Error (MSE)?

  1. Regression Task:

    • We're predicting ratings (continuous values), not classifications
    • MSE is the standard loss function for regression problems
  2. Punishes Large Errors:

    • MSE squares the errors, so bad predictions are penalized more heavily
    • Important for recommendation quality - we really don't want to recommend books the user would hate
  3. Aligns with Rating Prediction:

    • When we say "this user would rate this book 4.5 stars", we want to minimize how wrong we are
    • Directly matches our objective

Why Not Just Use np.dot?

  1. Learning vs Calculation:

    • np.dot would just calculate a static dot product
    • Our goal is to LEARN good embeddings that make the dot product predict ratings well
    • The magic happens in the embedding layers that are trained via backpropagation
  2. End-to-End Training:

    • The Dot layer in Keras is differentiable
    • Allows gradients to flow back through the network to update embeddings
    • np.dot wouldn't connect properly to the training process
  3. Model Abstraction:

    • Keras layers manage all the underlying tensor operations
    • Handles batching, GPU acceleration, etc. automatically
    • Would have to reimplement all this manually with np.dot

The Big Picture

What makes this system work isn't the dot product itself, but the trained embeddings:

  1. The model learns:

    • 5-dimensional user preferences (what genres/themes they like)
    • 5-dimensional book characteristics (what genres/themes they contain)
  2. During training:

    • Adam optimizer adjusts these embeddings
    • MSE measures how well their dot products match actual ratings
    • Over time, similar users and books cluster in embedding space
  3. For prediction:

    • High dot product = good match between user preferences and book characteristics
    • The dot product is just the mathematical operation that combines them

Alternative Approach

You could theoretically do something like:

# Initialize random embeddings
user_embeddings = np.random.rand(n_users+1, 5)
book_embeddings = np.random.rand(n_books+1, 5)

# Manual dot product prediction
def predict(user_id, book_id):
    return np.dot(user_embeddings[user_id], book_embeddings[book_id])

# Manual training loop
for epoch in range(epochs):
    for user_id, book_id, rating in ratings:
        prediction = predict(user_id, book_id)
        error = rating - prediction
        # Manually update embeddings using some learning rate
        user_embeddings[user_id] += lr * error * book_embeddings[book_id]
        book_embeddings[book_id] += lr * error * user_embeddings[user_id]

But this would be:

  1. Much slower (no GPU acceleration)
  2. Harder to implement correctly
  3. Less flexible for model changes
  4. Missing many optimizations that Adam provides

The Keras implementation gives us all this functionality in just a few lines of clean, maintainable code.

This recommendation system is a Collaborative Filtering model, specifically a neural network-based implementation of collaborative filtering. Here's why:

Key Characteristics of This System

  1. Pure Interaction Data:

    • Only uses user-book ratings (interaction data)
    • Doesn't use any content features about books (like genre, author) or users (like demographics)
  2. Embedding-Based Approach:

    • Learns latent factors (embeddings) for users and books
    • These embeddings represent learned patterns from the rating data
  3. Dot Product Architecture:

    • Predicts ratings based on similarity between user and book embeddings
    • Classic collaborative filtering approach

Why Not Content-Based?

A content-based system would:

  1. Use features about the items (book content, metadata)
  2. Create user profiles based on the content of items they've liked
  3. Recommend items similar to what the user has liked based on content features

This system does none of those things - it only looks at who liked what, not why they might have liked it.

Collaborative Filtering Variant

This is specifically a neural collaborative filtering system with these properties:

Characteristic This System Traditional CF
Uses embeddings Sometimes
Neural network
Matrix factorization Neural version Classic SVD
Handles cold start

How It Works

  1. User Embedding: Represents user's preferences in latent space

    • Learned from their rating history
    • Similar users end up close in embedding space
  2. Book Embedding: Represents book characteristics in same latent space

    • Learned from how users have rated it
    • Similar books end up close together
  3. Recommendation: Finds books whose embeddings are close to the user's embedding

    • Measured by dot product (cosine similarity)

Advantages Over Traditional CF

  1. Non-Linear Relationships: Neural nets can learn more complex patterns
  2. Flexibility: Easy to extend (e.g., add more layers)
  3. Feature Combination: Can later add content features if needed

When This Works Best

  • When you have lots of interaction data
  • When you want to discover latent relationships
  • When item content is hard to quantify

The system would become a hybrid model if you added content features (like book genres or descriptions) to the neural network. But in its current form, it's pure collaborative filtering.

In [125]: