---
title: Prod2Vec
keywords: fastai
sidebar: home_sidebar
summary: "Implementation of the Prod2Vec model."
description: "Implementation of the Prod2Vec model."
nb_path: "nbs/models/models.prod2vec.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

class `Prod2Vec`

`Prod2Vec(min_count=2, negative=5, size=100, window=5, decay_alpha=0.9)`

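Implementation of the Prod2Vec skip-gram model from Grbovic, Mihajlo, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Jaikit Savla, Varun Bhagwan, and Doug Sharp. "E-commerce in your inbox: Product recommendations at scale." In *Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, pp. 1809-1818. ACM, 2015.

Prod2Vec treats each customer's purchase sequence as a "sentence" and each product as a "word", and learns product embeddings with a word2vec-style skip-gram model, so that products that are often bought together end up with similar vectors.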
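To make the skip-gram idea concrete, here is a minimal sketch of how (target, context) training pairs can be generated from a single purchase sequence, using up to `window` products on each side of the target. The function name `skipgram_pairs` and the product IDs are hypothetical, and the sketch is not taken from the `Prod2Vec` source. It also uses a fixed window, whereas word2vec implementations typically shrink the window at random for each target.

```python
from typing import List, Sequence, Tuple

def skipgram_pairs(sequence: Sequence[str], window: int = 5) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for products at most `window` positions apart."""
    pairs = []
    for i, target in enumerate(sequence):
        # Context = up to `window` products on each side of the target
        start = max(0, i - window)
        stop = min(len(sequence), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((target, sequence[j]))
    return pairs

# One hypothetical customer's purchase sequence, treated as a "sentence"
basket = ["P1", "P2", "P3", "P4"]
print(skipgram_pairs(basket, window=1))
# [('P1', 'P2'), ('P2', 'P1'), ('P2', 'P3'), ('P3', 'P2'), ('P3', 'P4'), ('P4', 'P3')]
```

The skip-gram model is then trained to predict each context product from its target product, which is what pulls frequently co-purchased products together in the embedding space.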

{% endraw %} {% raw %}
{% endraw %} {% raw %}
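```python
import pandas as pd

# Download the UCI "Online Retail" transactions dataset
!wget -q --show-progress https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx

# pandas needs the openpyxl package to read .xlsx files
df = pd.read_excel('Online Retail.xlsx')
```

```
Online Retail.xlsx  100%[===================>]  22.62M  10.7MB/s    in 2.1s
```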
{% endraw %} {% raw %}
```python
import random
from tqdm.notebook import tqdm

# Drop rows with missing values (such as rows without a CustomerID)
df.dropna(inplace=True)

# Treat product codes as strings so they can be used as "words"
df['StockCode'] = df['StockCode'].astype(str)

# List of unique customer IDs
customers = df["CustomerID"].unique().tolist()

# Shuffle the customer IDs
random.shuffle(customers)

# Use 90% of the customers for training
customers_train = customers[:round(0.9 * len(customers))]

# Split the transactions by customer into training and validation sets
train_df = df[df['CustomerID'].isin(customers_train)]
validation_df = df[~df['CustomerID'].isin(customers_train)]

# Purchase history (sequence of StockCodes) of each training customer
purchases_train = []
for i in tqdm(customers_train):
    temp = train_df[train_df["CustomerID"] == i]["StockCode"].tolist()
    purchases_train.append(temp)

# Purchase history of each validation customer
purchases_val = []
for i in tqdm(validation_df['CustomerID'].unique()):
    temp = validation_df[validation_df["CustomerID"] == i]["StockCode"].tolist()
    purchases_val.append(temp)
```
{% endraw %} {% raw %}
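The loops above filter the whole DataFrame once per customer. As a sketch of an alternative, the split and the per-customer sequences can be built in a single pass over `(CustomerID, StockCode)` rows, with a seeded shuffle so the split is reproducible. The helper names `split_customers` and `build_sequences` and the toy IDs are hypothetical; in pandas, `train_df.groupby('CustomerID')['StockCode'].apply(list)` is the usual equivalent of the grouping step.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

def split_customers(customer_ids: Iterable[Hashable], train_frac: float = 0.9,
                    seed: int = 0) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Shuffle the unique customer IDs with a fixed seed and split them."""
    ids = list(dict.fromkeys(customer_ids))  # unique IDs, first-seen order
    random.Random(seed).shuffle(ids)
    cut = round(train_frac * len(ids))
    return set(ids[:cut]), set(ids[cut:])

def build_sequences(rows: Iterable[Tuple[Hashable, str]],
                    customers: Set[Hashable]) -> Dict[Hashable, List[str]]:
    """Group (customer_id, stock_code) rows into per-customer sequences in one pass."""
    sequences: Dict[Hashable, List[str]] = defaultdict(list)
    for customer_id, stock_code in rows:
        if customer_id in customers:
            sequences[customer_id].append(stock_code)  # keeps the original row order
    return dict(sequences)

# Hypothetical toy transactions: (CustomerID, StockCode)
rows = [("C1", "P1"), ("C2", "P2"), ("C1", "P3"), ("C3", "P1"), ("C2", "P4")]
train_ids, val_ids = split_customers([c for c, _ in rows], train_frac=0.67, seed=42)

purchases_train = list(build_sequences(rows, train_ids).values())
purchases_val = list(build_sequences(rows, val_ids).values())
print(purchases_train, purchases_val)
```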
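```python
# Train Prod2Vec on the training customers' purchase sequences
# (window=10 widens the context window from the default of 5)
model = Prod2Vec(window=10, negative=5, size=100, min_count=2)
model.fit(purchases_train)
```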
{% endraw %} {% raw %}
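Assuming `Prod2Vec` follows standard word2vec skip-gram training with negative sampling (the page does not show its internals), the `negative` argument would be the number of randomly drawn "negative" products contrasted with each observed (target, context) pair. The sketch below shows one SGD step on that loss for a single pair, plus word2vec's usual choice of drawing negatives from the unigram distribution raised to the 0.75 power. All names are hypothetical and vectors are plain Python lists.

```python
import math
import random
from typing import Dict, List

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def sample_negatives(counts: Dict[str, int], k: int, rng: random.Random) -> List[str]:
    """Draw k negative products with probability proportional to count ** 0.75."""
    items = list(counts)
    weights = [counts[item] ** 0.75 for item in items]
    return rng.choices(items, weights=weights, k=k)

def sgns_loss(center, context, negatives) -> float:
    """-log sigma(center . context) - sum over negatives of log sigma(-center . neg)."""
    loss = -math.log(sigmoid(dot(center, context)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(center, neg)))
    return loss

def sgns_step(center, context, negatives, lr: float = 0.025) -> None:
    """One SGD step on sgns_loss; all vectors are updated in place."""
    grad_center = [0.0] * len(center)
    # Positive pair: the gradient pushes sigma(center . context) towards 1
    g = sigmoid(dot(center, context)) - 1.0
    for k in range(len(center)):
        grad_center[k] += g * context[k]
        context[k] -= lr * g * center[k]
    # Negative pairs: the gradient pushes sigma(center . neg) towards 0
    for neg in negatives:
        g = sigmoid(dot(center, neg))
        for k in range(len(center)):
            grad_center[k] += g * neg[k]
            neg[k] -= lr * g * center[k]
    # Update the target vector last, using gradients computed from the old values
    for k in range(len(center)):
        center[k] -= lr * grad_center[k]

rng = random.Random(0)
dim = 8

def rand_vec() -> List[float]:
    return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

center, context = rand_vec(), rand_vec()
negatives = [rand_vec() for _ in range(5)]  # negative=5
before = sgns_loss(center, context, negatives)
sgns_step(center, context, negatives, lr=0.05)
after = sgns_loss(center, context, negatives)
print(before, after)  # with a small learning rate the loss decreases after one step

print(sample_negatives({"P1": 10, "P2": 3, "P3": 1}, k=5, rng=rng))
```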
```python
# Product code and description for every product in the training data,
# keeping the last description seen for each StockCode
# (a new DataFrame, so no SettingWithCopyWarning needs to be silenced)
products = train_df[["StockCode", "Description"]].drop_duplicates(subset='StockCode', keep='last')

# Map each StockCode to a list holding its description
products_dict = products.groupby('StockCode')['Description'].apply(list).to_dict()
```
{% endraw %} {% raw %}
```python
# Recommend products for a profile made of one randomly sampled product
random_sample = products.sample(1).values
recommendations = [[products_dict[a][0], b] for a, b in model.recommend(user_profile=random_sample[:, 0])]

print(random_sample[:, 1])  # description of the sampled product
print(' ')
for rec in recommendations: print(rec)  # [description, score]
```

```
['SET OF 3 BABUSHKA STACKING TINS']

['EDWARDIAN HEART PHOTO FRAME', 0.3702189028263092]
['SET OF 6 VINTAGE NOTELETS KIT', 0.34610092639923096]
['FRENCH STYLE STORAGE JAR JAM', 0.3301945626735687]
['BAG 500g SWIRLY MARBLES', 0.3177795708179474]
['SPOTTY BUNTING', 0.30998745560646057]
```
{% endraw %} {% raw %}
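The scores printed for the single-product profile are consistent with cosine similarity between product vectors, the measure gensim's `most_similar` uses; whether `Prod2Vec.recommend` ranks this way is an assumption. For a one-product profile such a ranking reduces to a nearest-neighbor search, sketched here with hypothetical names and toy 2-D vectors.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(query: str, vectors: Dict[str, Vector], topn: int = 5) -> List[Tuple[str, float]]:
    """Rank all other products by cosine similarity to `query`."""
    q = vectors[query]
    scored = [(item, cosine(q, vec)) for item, vec in vectors.items() if item != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Hypothetical toy embeddings
vectors = {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0], "P4": [-1.0, 0.0]}
print(most_similar("P1", vectors, topn=2))  # P2 first, then P3
```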
```python
# Recommend products for a profile made of five randomly sampled products
random_sample = products.sample(5).values
recommendations = [[products_dict[a][0], b] for a, b in model.recommend(user_profile=random_sample[:, 0])]

print(random_sample[:, 1])  # descriptions of the sampled products
print(' ')
for rec in recommendations: print(rec)  # [description, score]
```

```
['SET OF 5 LUCKY CAT MAGNETS ' 'HEARTS GIFT TAPE'
 'PAINTED YELLOW WOODEN DAISY' 'COLOURFUL FLOWER FRUIT BOWL'
 'TUSCAN VILLA BIRD FEEDER']

['PAINTED YELLOW WOODEN DAISY', 0.498282253742218]
['HEARTS GIFT TAPE', 0.4829496443271637]
['TUSCAN VILLA BIRD FEEDER', 0.34984883666038513]
['STRAWBERRY RAFFIA FOOD COVER', 0.3352939486503601]
['IVORY PAPER CUP CAKE CASES ', 0.3215782642364502]
```
{% endraw %}
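In the five-product example, three of the top recommendations (PAINTED YELLOW WOODEN DAISY, HEARTS GIFT TAPE and TUSCAN VILLA BIRD FEEDER) are products already in the profile, so the implementation apparently does not filter them out. Below is a minimal sketch of profile-based scoring that averages the vectors of the profile products and, optionally, excludes them from the results. The averaging strategy, the names, and the skipping of codes without a vector (for example, products that a `min_count` cutoff might drop) are assumptions, not the `Prod2Vec` internals.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

def _cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend_for_profile(profile: Sequence[str], vectors: Dict[str, Vector],
                          topn: int = 5, exclude_profile: bool = True) -> List[Tuple[str, float]]:
    """Score every product against the mean vector of the profile products."""
    known = [p for p in profile if p in vectors]  # skip codes without a vector
    if not known:
        return []
    dim = len(vectors[known[0]])
    mean = [sum(vectors[p][k] for p in known) / len(known) for k in range(dim)]
    # Optionally drop products the customer already has in the profile
    seen = set(profile) if exclude_profile else set()
    scored = [(item, _cosine(mean, vec)) for item, vec in vectors.items() if item not in seen]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Hypothetical toy embeddings
vectors = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [0.7, 0.7],
           "P4": [-1.0, 0.0], "P5": [0.6, 0.8]}
profile = ["P1", "P2", "P9"]  # "P9" has no vector and is ignored
print(recommend_for_profile(profile, vectors, topn=2))  # P3, then P5
```

Excluding the profile means the returned list only contains new products; if the model should also be able to re-recommend past purchases (for example, consumables), pass `exclude_profile=False`.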