Personal Recommendations with one single Matrix Multiplication
Personal Recommendations with one single Matrix Multiplication¶
This blog post shows a simple example of creating a simple but effective personal recommendation engine using the method described in the paper Embarrassingly Shallow Autoencoders for Sparse Data from Harald Steck.
The challenge of meaningful automated recommendations¶
Finding good recommendations for your customers in an automated way is not an easy task. No matter whether you try to sell movies, songs or fashion, the idea is always the same: finding meaningful products which match your customers' demands. While a "human recommender" may simply ask the customer for their preferences and, based on their own experience, will pick a selection of suitable products, a computer lacks all that intuition.
Therefore, different strategies have been developed in order to help computers gain a deeper understanding of the customer-product relationship.
The method described here today uses an autoencoder approach: given a vector of products a customer bought, it will try to infer relevant products by filling in the empty elements in the product vector.
Autoencoders¶
The task of an autoencoder is straightforward: given some input vector $x$, the output $y$ of the autoencoder should be identical to $x$. This seemingly trivial task is made more complicated, however, by adding restrictions on the internal processing steps of the autoencoder.
A typical autoencoder might be built like a neural network, consisting of an input layer, a hidden layer, and an output layer. Here the restriction would be that the size of the hidden layer is smaller than that of the input layer, and therefore the autoencoder is forced to learn a more efficient representation of the data it is trained on.
Shallow Autoencoders¶
In our case, the autoencoder is actually much more simple, as it only consists of a single matrix $B \in \mathbb{R} ^{\left|I\right| x \left|I\right|}$, i.e. this matrix has as many rows/columns as there are unique items our customers might buy.
Given a customer-item matrix $X$, matrix multiplication with $B$ should hopefully return a very similar matrix.
In order to find meaningful parameters for $B$, we define a quadratic loss:
$$ \min_B ||X - XB ||^2 + \lambda ||B||^2 $$Additionally, a regularization term is added to the loss in order keep the learned parameters small. What we are still lacking is a restriction which forces our autoencoder to learn something meaningful (in the current form the trivial solution would be simply an identity matrix which would not give us any new insight). The authors have the simple but ingenious idea to add a constraint that all diagonal elements of $B$ must be zero. What this means is that our autoencoder now has to learn whether item $i$ is present in the current basket while only given information about the presence or absence of all items except $i$.
The detailed solution procedure involves minimizing the loss function using Lagrangian multipliers to also consider the additional constraints. It can be found in the paper, so I will not discuss this further. Instead, let's write some Python code to see how well it actually works on some data!
Making some Movie Recommendations¶
Let's start by downloading some data of movie ratings.
In order to be able to use the code you need to install the following python libraries
pip install numpy pandas scipy tqdm
import numpy as np
import pandas as pd
from scipy.sparse import lil_matrix
from scipy import linalg
from tqdm import tqdm_notebook as tqdm
If the next three lines don't work for you, you can also just download the file using your browser and unzip it manually.
movielens_url = "http://files.grouplens.org/datasets/movielens/ml-20m.zip"
!wget {movielens_url}
!unzip ml-20m.zip
Next, we need to read the movie list and the list of ratings into pandas dataframes:
movies = pd.read_csv("ml-20m/movies.csv")
ratings = pd.read_csv("ml-20m/ratings.csv")
Next, we write a helper function which creates a sparse customer item matrix from the ratings dataframe
# create a mapping between the movie/user id and
# its index in the user item matrix
def create_mapping(unique_vals):
mapping = {}
for i, id_ in enumerate(unique_vals):
mapping[id_] = i
return mapping
def create_user_item_matrix(ratings, user_mapping, item_mapping):
ui_mat = lil_matrix((max(user_mapping.values()) + 1, max(item_mapping.values()) + 1))
for tup in tqdm(ratings.itertuples(), total=ratings.shape[0]):
ui_mat[user_mapping[tup.userId], item_mapping[tup.movieId]] = 1
return ui_mat.tocsr()
Creating the matrix should take a few minutes
user_mapping = create_mapping(ratings.userId.unique())
item_mapping = create_mapping(movies.movieId.unique())
ui_mat = create_user_item_matrix(ratings, user_mapping, item_mapping)
The next few lines are the actual optimization procedure. If we ignore the print lines, the whole algorithm needs only 6 lines of code!
def create_autoencoder(ui_mat, lambda_):
print("Create G")
G = (ui_mat.T @ ui_mat).toarray()
diag_indices = np.diag_indices_from(G)
G[diag_indices] += lambda_
print("Invert G")
# In order to save some memory, we overwrite our G matrix when
# calculating the inverse
B = linalg.inv(G, overwrite_a=True)
B /= (-np.diag(B))
B[diag_indices] = 0
return B
The whole procedure will take a few minutes. On my laptop with 8gb ram memory was sufficient. If you run into troubles, try and change the type of ui_mat from np.float32 to np.float16 and see if that helps
autoenc = create_autoencoder((ui_mat > 0).astype(np.float32), 500)
In order to be able to user our small recommendation engine, we write another helper function which, given a movie id or list of movie ids returns movie ids of similar movies
def recommend(movie_ids, item_mapping, N=10, filter_already_liked=False):
reverse_mapping = {v: k for k, v in item_mapping.items()}
if isinstance(movie_ids, int):
movie_ids = [movie_ids]
movie_ids = [item_mapping[id_] for id_ in movie_ids]
movievec = np.zeros(autoenc.shape[0], dtype=np.float32)
scores = np.zeros_like(movievec)
movievec[movie_ids] = 1
scores = movievec @ autoenc
#np.dot(movievec, autoenc, out=scores)
if filter_already_liked:
scores[movie_ids] = -100
indices = np.argsort(-scores)[:N]
return [reverse_mapping[idx] for idx in indices]
Let's see what kind of movies there are
movies.sample(20, random_state=3)
Let's see which movies are similar to High School Musical with id 46062
ids = recommend(46062, item_mapping)
movies.query("movieId in @ids")
This looks like some reasonable recommendations.