---
title: Replay Agents
keywords: fastai
sidebar: home_sidebar
summary: "Replay Agents."
description: "Replay Agents."
nb_path: "nbs/rl/agents/rl.agents.replay_agents.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

class ABTestReplayer[source]

ABTestReplayer(n_visits, n_test_visits, reward_history, item_col_name, visitor_col_name, reward_col_name, n_iterations=1)

A class to provide functionality for simulating the replayer method on an A/B test.
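The class source is linked above. As a rough sketch of the policy the description implies, an A/B test replayer spends the first `n_test_visits` visits splitting traffic uniformly across items, then commits to the empirical winner for the remaining visits. The `select_item`/`record_result` interface below is hypothetical, chosen only to illustrate the idea, and is not the class's actual internals.

```python
import random

class ABTestPolicy:
    """Minimal sketch: explore uniformly for n_test_visits,
    then always serve the item with the best observed reward rate."""
    def __init__(self, items, n_test_visits):
        self.items = list(items)
        self.n_test_visits = n_test_visits
        self.visit = 0
        self.trials = {i: 0 for i in self.items}
        self.rewards = {i: 0 for i in self.items}

    def select_item(self):
        if self.visit < self.n_test_visits:
            return random.choice(self.items)  # test phase: uniform traffic split
        # post-test phase: commit to the empirical winner
        return max(self.items,
                   key=lambda i: self.rewards[i] / max(self.trials[i], 1))

    def record_result(self, item, reward):
        self.visit += 1
        self.trials[item] += 1
        self.rewards[item] += reward
```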

{% endraw %} {% raw %}
{% endraw %} {% raw %}

class EpsilonGreedyReplayer[source]

EpsilonGreedyReplayer(epsilon, n_visits, reward_history, item_col_name, visitor_col_name, reward_col_name, n_iterations=1)

A class to provide functionality for simulating the replayer method on an epsilon-greedy bandit algorithm.
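Epsilon-greedy is the simplest exploration/exploitation trade-off: with probability `epsilon` the agent recommends a random item (explore), otherwise it recommends the item with the highest observed mean reward (exploit). A minimal sketch, using the same hypothetical `select_item`/`record_result` interface as above:

```python
import random

class EpsilonGreedyPolicy:
    """Minimal sketch: random item with probability epsilon,
    otherwise the item with the best mean reward so far."""
    def __init__(self, items, epsilon=0.05):
        self.items = list(items)
        self.epsilon = epsilon
        self.trials = {i: 0 for i in self.items}
        self.mean_reward = {i: 0.0 for i in self.items}

    def select_item(self):
        if random.random() < self.epsilon:
            return random.choice(self.items)              # explore
        return max(self.items, key=self.mean_reward.get)  # exploit

    def record_result(self, item, reward):
        self.trials[item] += 1
        # incremental running-mean update
        self.mean_reward[item] += (reward - self.mean_reward[item]) / self.trials[item]
```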

{% endraw %} {% raw %}
{% endraw %} {% raw %}

class ThompsonSamplingReplayer[source]

ThompsonSamplingReplayer(n_visits, reward_history, item_col_name, visitor_col_name, reward_col_name, n_iterations=1)

A class to provide functionality for simulating the replayer method on a Thompson Sampling bandit algorithm.
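With binary rewards, Thompson Sampling is typically implemented with a Beta-Bernoulli model: each item keeps a Beta(α, β) posterior over its reward probability, one value is sampled from every posterior, and the item with the highest sample is served. A minimal sketch under that assumption (the class's actual update rule may differ):

```python
import numpy as np

class ThompsonSamplingPolicy:
    """Minimal sketch: Beta(alpha, beta) posterior per item;
    sample from each posterior and serve the argmax."""
    def __init__(self, items):
        self.alpha = {i: 1.0 for i in items}  # 1 + observed successes
        self.beta = {i: 1.0 for i in items}   # 1 + observed failures

    def select_item(self):
        samples = {i: np.random.beta(self.alpha[i], self.beta[i])
                   for i in self.alpha}
        return max(samples, key=samples.get)

    def record_result(self, item, reward):
        # reward is assumed to be 0 or 1
        self.alpha[item] += reward
        self.beta[item] += 1 - reward
```

Because an item with little data has a wide posterior, it still gets sampled occasionally, which is what gives Thompson Sampling its built-in exploration.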

{% endraw %} {% raw %}
{% endraw %} {% raw %}

class ReplaySimulator[source]

ReplaySimulator(n_visits, reward_history, item_col_name, visitor_col_name, reward_col_name, n_iterations=1, random_seed=1)

A class to provide base functionality for simulating the replayer method for online algorithms.
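The replayer method evaluates an online policy against a static log: logged events are streamed one at a time, the policy proposes an item, and an event only counts as a simulated visit when the proposal matches the item actually shown in the log; unmatched events are discarded. A minimal sketch of that loop, assuming the hypothetical policy interface from the sketches above and a pandas DataFrame log (the real class very likely differs in detail):

```python
def replay_evaluate(policy, reward_history, item_col, reward_col, n_visits):
    """Minimal sketch of the replayer loop."""
    matched = 0
    cumulative_reward = 0
    results = []
    for _, row in reward_history.iterrows():
        if matched >= n_visits:
            break
        recommended = policy.select_item()
        if recommended != row[item_col]:
            continue  # no match: discard the logged event
        reward = row[reward_col]
        policy.record_result(recommended, reward)
        matched += 1
        cumulative_reward += reward
        results.append({'visit': matched - 1,
                        'fraction_relevant': cumulative_reward / matched})
    return results
```

The `visit` and `fraction_relevant` fields mirror the columns used by the plotting code at the end of this page.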

{% endraw %} {% raw %}
{% endraw %} {% raw %}

class UCBSamplingReplayer[source]

UCBSamplingReplayer(ucb_c, n_visits, reward_history, item_col_name, visitor_col_name, reward_col_name, n_iterations=1)

A class to provide functionality for simulating the replayer method on an Upper Confidence Bound (UCB) bandit algorithm.
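UCB1 scores each item by its observed mean reward plus a confidence bonus `c * sqrt(ln t / n_i)`, where `t` is the total number of plays and `n_i` is the number of times item `i` was played; the bonus shrinks as an item accumulates data, so under-explored items keep getting tried. A minimal sketch, again with the hypothetical interface used above:

```python
import math

class UCB1Policy:
    """Minimal sketch of UCB1: mean reward plus a confidence bonus."""
    def __init__(self, items, ucb_c=2.0):
        self.items = list(items)
        self.c = ucb_c
        self.t = 0
        self.trials = {i: 0 for i in self.items}
        self.mean_reward = {i: 0.0 for i in self.items}

    def select_item(self):
        self.t += 1
        for i in self.items:
            if self.trials[i] == 0:
                return i  # play every untried item once first
        return max(self.items,
                   key=lambda i: self.mean_reward[i]
                   + self.c * math.sqrt(math.log(self.t) / self.trials[i]))

    def record_result(self, item, reward):
        self.trials[item] += 1
        self.mean_reward[item] += (reward - self.mean_reward[item]) / self.trials[item]
```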

{% endraw %} {% raw %}
{% endraw %} {% raw %}
# !pip install --upgrade --force-reinstall --no-deps kaggle
# !mkdir ~/.kaggle
# !cp /content/drive/MyDrive/kaggle.json ~/.kaggle/
# !chmod 600 ~/.kaggle/kaggle.json

!kaggle datasets download -d saurav9786/amazon-product-reviews
!unzip amazon-product-reviews.zip
Downloading amazon-product-reviews.zip to /content/recohut
 85% 93.0M/109M [00:00<00:00, 117MB/s]
100% 109M/109M [00:00<00:00, 115MB/s] 
Archive:  amazon-product-reviews.zip
  inflating: ratings_Electronics (1).csv  
{% endraw %} {% raw %}
import pandas as pd

# The ratings file has no header row, so supply column names explicitly
header_list = ["User_ID", "Product_ID", "Rating", "Time_Stamp"]
rating_df = pd.read_csv('ratings_Electronics (1).csv', names=header_list)

# Binarize the 1-5 star ratings: only ratings strictly above 4 (i.e. 5 stars) earn a reward of 1
reward_threshold = 4
rating_df['reward'] = rating_df.eval('Rating > @reward_threshold').astype(int)

n_visits = 500        # number of matched visits to simulate per iteration
n_iterations = 1      # number of independent simulation runs
n_test_visits = 100   # test-phase visits for the A/B replayer

# Restrict the replay log to the first 1,000 events to keep runtimes manageable
reward_history = rating_df[:1000]
item_col_name = 'Product_ID'
visitor_col_name = 'User_ID'
reward_col_name = 'reward'
{% endraw %} {% raw %}
print("A/B Test Simulations...starts...!!!")

ab_results = ABTestReplayer(n_visits, n_test_visits, reward_history,
                            item_col_name, visitor_col_name, reward_col_name,
                            n_iterations=n_iterations).simulator()

ab_results_df = pd.DataFrame(ab_results)
ab_results_df.to_csv('ab_results_df.csv')
A/B Test Simulations...starts...!!!
100%|██████████| 1/1 [01:09<00:00, 69.08s/it]
{% endraw %} {% raw %}
print("Epsilon - Greedy Simulations...starts...!!!")

epsilon = 0.05
epsilon_results = EpsilonGreedyReplayer(epsilon, n_visits, reward_history,
                                        item_col_name, visitor_col_name, reward_col_name,
                                        n_iterations=n_iterations).simulator()

epsilon_results_df = pd.DataFrame(epsilon_results)
epsilon_results_df.to_csv('epsilon_results_df.csv')
Epsilon - Greedy Simulations...starts...!!!
100%|██████████| 1/1 [00:02<00:00,  2.76s/it]
{% endraw %} {% raw %}
print("Thompson Sampling Simulations...starts...!!!")

thompson_results = ThompsonSamplingReplayer(n_visits, reward_history,
                                            item_col_name, visitor_col_name, reward_col_name,
                                            n_iterations=n_iterations).simulator()

thompson_results_df = pd.DataFrame(thompson_results)
thompson_results_df.to_csv('thompson_results_df.csv')
Thompson Sampling Simulations...starts...!!!
100%|██████████| 1/1 [15:27<00:00, 927.84s/it]
{% endraw %} {% raw %}
print("Upper Confidence Bounds Simulations...starts...!!!")

ucb = 2
ucb_results = UCBSamplingReplayer(ucb, n_visits, reward_history,
                                  item_col_name, visitor_col_name, reward_col_name,
                                  n_iterations=n_iterations).simulator()

ucb_results_df = pd.DataFrame(ucb_results)
ucb_results_df.to_csv('ucb_results_df.csv')
Upper Confidence Bounds Simulations...starts...!!!
100%|██████████| 1/1 [15:58<00:00, 958.75s/it]
{% endraw %} {% raw %}
ucb_results_df = pd.read_csv('ucb_results_df.csv').drop('Unnamed: 0', axis=1)
thompson_results_df = pd.read_csv('thompson_results_df.csv').drop('Unnamed: 0', axis=1)
epsilon_results_df = pd.read_csv('epsilon_results_df.csv').drop('Unnamed: 0', axis=1)
ab_results_df = pd.read_csv('ab_results_df.csv').drop('Unnamed: 0', axis=1)

# Group each results DataFrame by visit, averaging across iterations
ucb_avg_results_df = ucb_results_df.groupby('visit', as_index=False).mean()
thompson_avg_results_df = thompson_results_df.groupby('visit', as_index=False).mean()
epsilon_avg_results_df = epsilon_results_df.groupby('visit', as_index=False).mean()
ab_avg_results_df = ab_results_df.groupby('visit', as_index=False).mean()

# using a color-blind friendly palette with 10 colors
color_blind_palette_10 = ['#cfcfcf', '#ffbc79', '#a2c8ec', '#898989', '#c85200',
                          '#5f9ed1', '#595959', '#ababab', '#ff800e', '#006ba4']

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12,10))

for (avg_results_df, style) in [(ucb_avg_results_df, 'r-'),
                                (thompson_avg_results_df, 'g--'),
                                (epsilon_avg_results_df, 'b-'),
                                (ab_avg_results_df, 'y--')]:
    
    ax.plot(avg_results_df.visit, avg_results_df.fraction_relevant, style, linewidth=3.5)


ax.set_title('Percentage of Liked Recommendations')
ax.set_xlabel('Recommendation #')
ax.set_ylabel('% of Recs Clicked')

#ax.set_xticks(range(0,22000,5000))
#ax.set_ylim(0.2, 0.6)
#ax.set_yticks(np.arange(0.2, 0.7, 0.1))

#rescale the y-axis tick labels to show them as a percentage
ax.set_yticklabels((ax.get_yticks()*100).astype(int))

ax.legend(['UCB',
           'Thompson Sampling',
           r'$\epsilon$-Greedy',
           'A/B Test'
          ],
          loc='lower right'
         )

plt.tight_layout()
plt.show()
{% endraw %}

From the plot above it is clear that the Thompson Sampling multi-armed bandit outperforms A/B testing. With only a small number of samples the A/B test performs better than the other algorithms, but as the number of samples grows, Thompson Sampling performs better and better.