01: Connect to Kaggle to download data¶
- Download kaggle.json from your Kaggle account settings (Create New API Token) and upload it to this notebook
# Install Kaggle API
!pip install -q kaggle
from google.colab import files
# files.upload()  # uncomment on first run to upload kaggle.json
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d zygmunt/goodbooks-10k --unzip
!ls
# we use books.csv and ratings.csv
!cat ratings.csv | head -n 4
!cat books.csv | head -n 4
!cat tags.csv | head -n 4
!cat book_tags.csv | head -n 4
- Data Loading and Preparation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import warnings
from tensorflow.keras.layers import Input, Embedding, Flatten, Dot, Dense, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.models import load_model
import tensorflow as tf
# warnings.filterwarnings('ignore')
ratings = pd.read_csv('ratings.csv')
books = pd.read_csv('books.csv')
from sklearn.model_selection import train_test_split
train, test = train_test_split(ratings, test_size=0.2, random_state=42)
n_users = len(ratings.user_id.unique())
n_books = len(ratings.book_id.unique())
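The +1 sizing used for the embedding layers below assumes the IDs are contiguous and start at 1; a quick check on the loaded data before building the model:
# If IDs run 1..N with no gaps, the max ID equals the number of unique IDs,
# and n_users+1 / n_books+1 are safe input dimensions for the embeddings
print(ratings.user_id.min(), ratings.user_id.max(), n_users)
print(ratings.book_id.min(), ratings.book_id.max(), n_books)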
02: Model Architecture¶
Creates two input layers, one for book IDs and one for user IDs.
shape=[1] means each input is a single number (either book ID or user ID).
book_input = Input(shape=[1], name="Book-Input")
user_input = Input(shape=[1], name="User-Input")
Embedding Layers¶
Embedding Layers: Convert integer IDs into dense vectors of fixed size.
Why +1? An Embedding layer's input dimension must be at least max_id + 1, because valid indices run from 0 up to input_dim - 1. Since the IDs here start at 1 and go up to n_books/n_users, we need n + 1 rows so the largest ID still has an embedding slot.
5: The second parameter is the embedding dimension (5 in this case), meaning each book/user will be represented by a 5-dimensional vector.
book_embedding = Embedding(n_books+1, 5, name="Book-Embedding")(book_input)
user_embedding = Embedding(n_users+1, 5, name="User-Embedding")(user_input)
Flatten Layers
Flatten: Converts the 2D embedding output (batch_size, 1, 5) to 1D (batch_size, 5) for the dot product operation.
book_vec = Flatten(name="Flatten-Books")(book_embedding)
user_vec = Flatten(name="Flatten-Users")(user_embedding)
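A quick sanity check of the shapes described above (assuming the layers just defined; these print as symbolic Keras tensor shapes):
# Before flattening: one 5-dimensional vector per input ID, plus a length-1 axis
print(book_embedding.shape)  # (None, 1, 5)
print(book_vec.shape)        # (None, 5)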
Dot Product
Dot Product: Computes the similarity between user and book embeddings.
axes=1 means the dot product is taken along the embedding dimension (size 5).
prod = Dot(name="Dot-Product", axes=1)([book_vec, user_vec])
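What Dot(axes=1) computes can be illustrated with plain NumPy: multiply the two vectors element-wise and sum over the embedding dimension, giving one score per row of the batch (toy numbers below):
# Toy example: two "user" vectors and two "book" vectors, 5 dimensions each
u = np.array([[1., 0., 2., 0., 1.],
              [0., 1., 1., 1., 0.]])
b = np.array([[2., 1., 0., 0., 3.],
              [1., 1., 0., 2., 0.]])
# Dot along axis 1: element-wise product summed over the 5 embedding dims
print((u * b).sum(axis=1))  # [5. 3.] -- one predicted score per user/book pair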
Model Compilation
Creates the model with two inputs and one output (the predicted rating).
Uses Adam optimizer and mean squared error loss (common for regression tasks).
model = Model([user_input, book_input], prod)
model.compile('adam', 'mean_squared_error')
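A quick way to confirm the wiring matches the description above:
# Inspect the architecture: two (None, 1) inputs, two (None, 1, 5) embeddings,
# two (None, 5) flattened vectors, and a single (None, 1) dot-product output
model.summary()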
03: Training and Saving¶
Checks if a trained model exists, loads it if yes, otherwise trains a new one.
Saves the trained model to disk for future use.
if os.path.exists('regression_model.h5'):
    model = load_model('regression_model.h5')
else:
    history = model.fit([train.user_id, train.book_id], train.rating,
                        batch_size=64, epochs=5, verbose=1)
    model.save('regression_model.h5')
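The 20% test split created earlier is never used above; a minimal sketch of evaluating on it (same input order as in Model([user_input, book_input], prod)):
# Mean squared error on the held-out ratings (the model was compiled with MSE
# as its only metric, so evaluate() returns a single scalar loss)
test_mse = model.evaluate([test.user_id, test.book_id], test.rating, verbose=0)
print(f"Test MSE: {test_mse:.4f}")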
04: Making Recommendations¶
- Gets all unique book IDs
- Creates an array with the same user ID repeated for all books
- Converts to tensors (required by TensorFlow)
- Gets predictions for all books
- Sorts predictions to get the top recommendations
def make_recommendations(user_id=1, model=model, n_recommendations=5):
    # 1. Get all unique book IDs from the ratings data
    book_data = np.array(list(set(ratings.book_id)))
    # 2. Create array with the target user_id repeated for each book
    user = np.array([user_id for _ in range(len(book_data))])
    # 3. Convert to TensorFlow tensors (required for model input)
    user_tensor = tf.convert_to_tensor(user, dtype=tf.int32)
    book_tensor = tf.convert_to_tensor(book_data, dtype=tf.int32)
    # 4. Get predicted ratings for all books
    predictions = model.predict([user_tensor, book_tensor], batch_size=128)
    predictions = np.array([a[0] for a in predictions])  # Flatten predictions
    # 5. Sort books by predicted rating (highest first) and map the top
    #    positions back to actual book IDs
    top_indices = (-predictions).argsort()[:n_recommendations]
    recommended_book_ids = book_data[top_indices]
    # 6. Get book details from books.csv
    recommended_books = books[books['id'].isin(recommended_book_ids)]
    return recommended_books
ratings.head(2)
books.head(2)
# Example usage
recommended_books = make_recommendations(user_id=1)
print(recommended_books)
user_id = ratings.user_id.iloc[0]
# Get recommendations for user 314 (the first user in ratings.csv)
user_314_recs = make_recommendations(user_id=user_id)
# Display results nicely
print(f"\nTop 5 Recommendations for User {user_id}:")
print(user_314_recs[['id', 'title', 'authors']].to_string(index=False))
Q: Explain why we need Adam and MSE for this dot-product code? Can't we just do something like np.dot?¶
Great question! Let me explain why we use the Adam optimizer and Mean Squared Error (MSE) in this recommendation system, and why we can't just use np.dot directly.
Why Adam Optimizer?¶
Adaptive Learning Rates:
- Adam automatically adjusts learning rates for each parameter
- In our embedding layers (user and book vectors), different dimensions may need different update rates
- Classic SGD would use the same learning rate for all parameters
Momentum-like Behavior:
- Adam combines the benefits of two other optimizers (AdaGrad and RMSProp)
- Helps navigate the loss landscape more efficiently than plain SGD
Well-suited for Embeddings:
- Embedding layers often benefit from adaptive optimization
- Different users/books may need different update magnitudes
Default Choice:
- Adam works well out-of-the-box for most deep learning tasks
- Requires less tuning than SGD with momentum
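The 'adam' string passed to model.compile above uses the Keras default configuration; the same compile call written with an explicit optimizer object, so the learning rate is visible and tunable, looks like this:
from tensorflow.keras.optimizers import Adam
# Equivalent to model.compile('adam', 'mean_squared_error'), with the default
# learning rate (0.001) written out so it can be adjusted
model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')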
Why Mean Squared Error (MSE)?¶
Regression Task:
- We're predicting ratings (continuous values), not classifications
- MSE is the standard loss function for regression problems
Punishes Large Errors:
- MSE squares the errors, so bad predictions are penalized more heavily
- Important for recommendation quality - we really don't want to recommend books the user would hate
Aligns with Rating Prediction:
- When we say "this user would rate this book 4.5 stars", we want to minimize how wrong we are
- Directly matches our objective
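For concreteness, MSE is just the mean of the squared differences between true and predicted ratings; a tiny worked example with made-up numbers:
y_true = np.array([4.0, 5.0, 3.0])   # actual ratings
y_pred = np.array([3.5, 4.0, 3.0])   # model predictions
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 1.0 + 0.0) / 3 ≈ 0.4167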
Why Not Just Use np.dot?¶
Learning vs Calculation:
- np.dot would just calculate a static dot product
- Our goal is to LEARN good embeddings that make the dot product predict ratings well
- The magic happens in the embedding layers that are trained via backpropagation
End-to-End Training:
- The Dot layer in Keras is differentiable
- Allows gradients to flow back through the network to update embeddings
- np.dot wouldn't connect properly to the training process
Model Abstraction:
- Keras layers manage all the underlying tensor operations
- Handles batching, GPU acceleration, etc. automatically
- Would have to reimplement all this manually with np.dot
The Big Picture¶
What makes this system work isn't the dot product itself, but the trained embeddings:
The model learns:
- 5-dimensional user preferences (what genres/themes they like)
- 5-dimensional book characteristics (what genres/themes they contain)
During training:
- Adam optimizer adjusts these embeddings
- MSE measures how well their dot products match actual ratings
- Over time, similar users and books cluster in embedding space
For prediction:
- High dot product = good match between user preferences and book characteristics
- The dot product is just the mathematical operation that combines them
Alternative Approach¶
You could theoretically do something like:
# Initialize random embeddings
user_embeddings = np.random.rand(n_users + 1, 5)
book_embeddings = np.random.rand(n_books + 1, 5)

# Manual dot product prediction
def predict(user_id, book_id):
    return np.dot(user_embeddings[user_id], book_embeddings[book_id])

# Manual training loop (plain per-sample updates with a fixed learning rate)
lr = 0.01
for epoch in range(5):
    for user_id, book_id, rating in zip(ratings.user_id, ratings.book_id, ratings.rating):
        prediction = predict(user_id, book_id)
        error = rating - prediction
        # Update both embeddings, using the user vector from before this step's update
        user_vec = user_embeddings[user_id].copy()
        user_embeddings[user_id] += lr * error * book_embeddings[book_id]
        book_embeddings[book_id] += lr * error * user_vec
But this would be:
- Much slower (no GPU acceleration)
- Harder to implement correctly
- Less flexible for model changes
- Missing many optimizations that Adam provides
The Keras implementation gives us all this functionality in just a few lines of clean, maintainable code.
This recommendation system is a Collaborative Filtering model, specifically a neural network-based implementation of collaborative filtering. Here's why:
Key Characteristics of This System¶
Pure Interaction Data:
- Only uses user-book ratings (interaction data)
- Doesn't use any content features about books (like genre, author) or users (like demographics)
Embedding-Based Approach:
- Learns latent factors (embeddings) for users and books
- These embeddings represent learned patterns from the rating data
Dot Product Architecture:
- Predicts ratings based on similarity between user and book embeddings
- Classic collaborative filtering approach
Why Not Content-Based?¶
A content-based system would:
- Use features about the items (book content, metadata)
- Create user profiles based on the content of items they've liked
- Recommend items similar to what the user has liked based on content features
This system does none of those things - it only looks at who liked what, not why they might have liked it.
Collaborative Filtering Variant¶
This is specifically a neural collaborative filtering system with these properties:
Characteristic | This System | Traditional CF
---|---|---
Uses embeddings | ✓ | Sometimes
Neural network | ✓ | ✗
Matrix factorization | Neural version | Classic SVD
Handles cold start | ✗ | ✗
How It Works¶
User Embedding: Represents user's preferences in latent space
- Learned from their rating history
- Similar users end up close in embedding space
Book Embedding: Represents book characteristics in same latent space
- Learned from how users have rated it
- Similar books end up close together
Recommendation: Finds books whose embeddings are close to the user's embedding
- Measured by the dot product (like cosine similarity, but without normalizing the vectors)
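To make the "close in embedding space" claim concrete, here is a sketch that pulls the learned book vectors out of the trained model (layer name as defined above) and ranks books by cosine similarity:
from numpy.linalg import norm
# The learned book-embedding matrix, one row per book ID
book_matrix = model.get_layer("Book-Embedding").get_weights()[0]  # (n_books+1, 5)

def most_similar_books(book_id, top_n=5):
    # Cosine similarity between one book's vector and every book vector
    v = book_matrix[book_id]
    sims = book_matrix @ v / (norm(book_matrix, axis=1) * norm(v) + 1e-9)
    sims[0] = -np.inf        # row 0 is unused (book IDs start at 1)
    sims[book_id] = -np.inf  # exclude the query book itself
    return (-sims).argsort()[:top_n]  # row index doubles as book ID here

print(most_similar_books(1))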
Advantages Over Traditional CF¶
- Non-Linear Relationships: Neural nets can learn more complex patterns
- Flexibility: Easy to extend (e.g., add more layers)
- Feature Combination: Can later add content features if needed
When This Works Best¶
- When you have lots of interaction data
- When you want to discover latent relationships
- When item content is hard to quantify
The system would become a hybrid model if you added content features (like book genres or descriptions) to the neural network. But in its current form, it's pure collaborative filtering.
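For illustration, that hybrid direction could reuse the Concatenate and Dense layers already imported at the top; the extra genre input and layer sizes below are made up for the sketch:
# Hypothetical extension: concatenate the two embedding vectors with a
# content-feature vector (e.g. a one-hot genre encoding) and learn a
# non-linear scoring head instead of a plain dot product
n_genres = 10  # made-up feature size for this sketch
genre_input = Input(shape=[n_genres], name="Genre-Input")

concat = Concatenate()([user_vec, book_vec, genre_input])
hidden = Dense(32, activation="relu")(concat)
rating_output = Dense(1)(hidden)

hybrid_model = Model([user_input, book_input, genre_input], rating_output)
hybrid_model.compile('adam', 'mean_squared_error')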