---
title: DDPG
keywords: fastai
sidebar: home_sidebar
summary: "An implementation of DDPG, Deep Deterministic Policy Gradient."
description: "An implementation of DDPG, Deep Deterministic Policy Gradient."
nb_path: "nbs/rl/agents/rl.agents.ddpg.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

class DDPGAgent

DDPGAgent(config, noise:OUNoise, group2members_dict:dict, verbose=False)

DDPG (Deep Deterministic Policy Gradient) Agent

{% endraw %} {% raw %}
{% endraw %} {% raw %}
import torch

class Config(object):
    tau = 1e-3
    gamma = 0.9
    embedding_size = 32
    item_num = 5
    user_num = 5
    actor_hidden_sizes = (128, 64)
    critic_hidden_sizes = (32, 16)
    batch_size = 64
    embedding_weight_decay = 1e-6
    actor_weight_decay = 1e-6
    critic_weight_decay = 1e-6
    embedding_learning_rate = 1e-4
    actor_learning_rate = 1e-4
    critic_learning_rate = 1e-4
    device = torch.device("cpu")
    history_length = 5
    buffer_size = 100
    state_size = history_length + 1
    action_size = 1
    embedded_state_size = state_size * embedding_size
    embedded_action_size = action_size * embedding_size
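
The derived sizes at the end of `Config` set the input and output widths of the networks. The following is a minimal sketch of that arithmetic in plain Python, using values copied from `Config`. The helper `layer_shapes` is hypothetical and only illustrates a plain MLP layout. The critic input width assumes the embedded state and embedded action are concatenated, which is the usual DDPG arrangement but is not confirmed by the code on this page. The resulting shapes agree with the `in_features`/`out_features` values in the module printout for the agent further down this page.

```python
# Values copied from the Config class.
embedding_size = 32
history_length = 5
action_size = 1
actor_hidden_sizes = (128, 64)
critic_hidden_sizes = (32, 16)

# Derived sizes, computed the same way as in Config.
state_size = history_length + 1                        # 6
embedded_state_size = state_size * embedding_size      # 192
embedded_action_size = action_size * embedding_size    # 32


def layer_shapes(in_features, hidden_sizes, out_features):
    """Hypothetical helper: (in, out) pairs for a plain MLP with these widths."""
    widths = [in_features, *hidden_sizes, out_features]
    return list(zip(widths[:-1], widths[1:]))


# Actor: embedded state in, embedded action out.
actor_shapes = layer_shapes(embedded_state_size, actor_hidden_sizes, embedded_action_size)

# Critic: ASSUMED to take the embedded state and action concatenated, scalar Q-value out.
critic_shapes = layer_shapes(
    embedded_state_size + embedded_action_size, critic_hidden_sizes, 1
)

print(actor_shapes)   # [(192, 128), (128, 64), (64, 32)]
print(critic_shapes)  # [(224, 32), (32, 16), (16, 1)]
```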
{% endraw %} {% raw %}
config = Config()
{% endraw %} {% raw %}
noise = OUNoise(embedded_action_size = 32,
                ou_mu = 0.0,
                ou_theta = 0.15,
                ou_sigma = 0.2,
                ou_epsilon = 1.0,
)

group2members_dict = {'0':[1,2,3], '1':[1,4,5]}

agent = DDPGAgent(config=config, noise=noise, group2members_dict=group2members_dict, verbose=True)

GroupEmbedding(
  (user_embedding): Embedding(6, 32)
  (item_embedding): Embedding(6, 32)
  (user_attention): Sequential(
    (0): Linear(in_features=32, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=1, bias=True)
  )
  (user_softmax): Softmax(dim=-1)
)
Actor(
  (net): Sequential(
    (0): Linear(in_features=192, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=64, bias=True)
    (3): ReLU()
    (4): Linear(in_features=64, out_features=32, bias=True)
  )
)
Critic(
  (net): Sequential(
    (0): Linear(in_features=224, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=16, bias=True)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=1, bias=True)
  )
)
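
The `OUNoise` arguments used when building the agent (`ou_mu`, `ou_theta`, `ou_sigma`, `ou_epsilon`) match the usual parameters of an Ornstein-Uhlenbeck process, which DDPG implementations commonly use for exploration noise. The implementation of `OUNoise` is not shown on this page, so the following is only a sketch of a discretized OU process in plain Python. The class `SimpleOUNoise` is a hypothetical stand-in, and treating `epsilon` as a scale factor on the returned sample is an assumption.

```python
import random


class SimpleOUNoise:
    """Hypothetical stand-in for OUNoise; not the library's implementation."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, epsilon=1.0, seed=None):
        self.size = size
        self.mu = mu            # long-run mean the process reverts to
        self.theta = theta      # strength of the pull toward mu
        self.sigma = sigma      # scale of the random perturbation
        self.epsilon = epsilon  # ASSUMED: overall scale applied to each sample
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean."""
        self.state = [self.mu] * self.size

    def sample(self):
        """Advance one step: x <- x + theta * (mu - x) + sigma * N(0, 1)."""
        self.state = [
            x + self.theta * (self.mu - x) + self.sigma * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return [self.epsilon * x for x in self.state]


ou = SimpleOUNoise(size=32, mu=0.0, theta=0.15, sigma=0.2, epsilon=1.0, seed=0)
first = ou.sample()
second = ou.sample()  # correlated with `first`, unlike independent Gaussian noise
```

Because each sample starts from the previous state, consecutive noise vectors are correlated. Calling `reset()` at the start of an episode returns the process to `mu`.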
{% endraw %} {% raw %}

class DDPG

DDPG(actor, actor_optim, critic, critic_optim, tau=0.001, gamma=0.99, policy_delay=1, item_embeds=None, device=device(type='cpu')) :: Module

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and their parameters will be converted too when you call to(), etc.

training (bool): Whether this module is in training or evaluation mode.
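
The `DDPG` signature takes `tau` and `gamma`, which in DDPG conventionally denote the soft-update coefficient for the target networks and the discount factor. The methods of this class are not shown on this page, so the following is a minimal sketch under those conventional meanings. The functions `soft_update` and `td_target` are hypothetical and operate on plain Python floats rather than tensors.

```python
def soft_update(target_params, source_params, tau=0.001):
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target_params, source_params)]


def td_target(reward, next_q, done, gamma=0.99):
    """One-step bootstrapped target: r + gamma * Q'(s', a') for non-terminal steps."""
    return reward + gamma * (1.0 - float(done)) * next_q


# With tau = 0.5 the target moves halfway toward the source parameters.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5)
print(new_target)  # [0.5, 1.0]

# A terminal transition drops the bootstrapped term.
print(td_target(1.0, 2.0, done=False, gamma=0.5))  # 2.0
print(td_target(1.0, 2.0, done=True, gamma=0.5))   # 1.0
```

A small `tau` such as the default `0.001` makes the target networks track the learned networks slowly, which is the usual way DDPG keeps the critic's regression targets stable.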

{% endraw %} {% raw %}
{% endraw %}