---
title: Incendio
keywords: fastai
sidebar: home_sidebar
summary: "The basics for building and training models are contained in this module."
---
%load_ext autoreload
%autoreload 2
%matplotlib inline
# Used in notebook but not needed in package.
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

from htools import assert_raises

At training time, we typically want to put the model and the current mini-batch on the GPU. A GPU isn't always available (e.g. when developing locally on a CPU), so we define a variable that automatically selects the right device.
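A minimal sketch of how such a variable can be defined using the standard PyTorch idiom (the package's actual definition may differ slightly):

import torch

# Use the GPU when available, otherwise fall back to the CPU.
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')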

DEVICE
device(type='cpu')

class Trainer[source]

Trainer(net, ds_train, ds_val, dl_train, dl_val, criterion, mode:('binary', 'multiclass', 'regression'), out_dir, bucket=None, optim_type='Adam', eps=0.001, last_act=None, threshold=0.5, metrics=None, callbacks=None, device=device(type='cpu')) :: LoggerMixin

Mixin class that configures and returns a logger.

Examples

class Foo(LoggerMixin):

    def __init__(self, a, log_file):
        self.a = a
        self.log_file = log_file
        self.logger = self.get_logger(log_file)

    def walk(self, location):
        self.logger.info(f'walk received argument {location}')
        return f'walking to {location}'
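A quick usage sketch of the example class above (the log file name and arguments here are arbitrary):

foo = Foo(a=3, log_file='train.log')
foo.walk('store')    # Logs "walk received argument store" and returns 'walking to store'.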

BaseModel allows models to freeze/unfreeze layers and provides several methods for weight diagnostics. It should not be instantiated directly; instead, use it as a parent class for your model. Like all PyTorch models, its children will still need to call super().__init__() and implement a forward() method.

class BaseModel[source]

BaseModel() :: Module

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing you to nest them in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call .to(), etc.
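The freeze/unfreeze demos below boil down to toggling requires_grad on parameters. A minimal sketch of that general mechanism (for illustration only; not necessarily Incendio's exact implementation):

def freeze(model):
    # A frozen parameter simply stops receiving gradients.
    for param in model.parameters():
        param.requires_grad = False

def unfreeze(model, n_layers):
    # Re-enable gradients for the last n_layers parameter tensors.
    params = list(model.parameters())
    for param in params[len(params) - n_layers:]:
        param.requires_grad = True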

class SimpleModel(BaseModel):
    
    def __init__(self, dim):
        super().__init__()  
        self.fc1 = nn.Linear(dim, 2)
        self.fc2 = nn.Linear(2, 1)
        
    def forward(self, x):
        x = F.leaky_relu(self.fc1(x))
        return self.fc2(x)
class GroupedModel(BaseModel):
    
    def __init__(self, dim):
        super().__init__()  
        g1 = nn.Sequential(
             nn.Linear(dim, 8),
             nn.LeakyReLU(),
             nn.Linear(8, 4),
             nn.LeakyReLU()
        )
        g2 = nn.Linear(4, 1)
        self.groups = nn.ModuleList([g1, g2])
        
    def forward(self, x):
        for group in self.groups:
            x = group(x)
        return x
snet = SimpleModel(2)
snet.freeze()
for n in range(5):
    snet.unfreeze(n_layers=n)
    unfrozen = [x[1] for x in snet.trainable()]
    print('Unfrozen', unfrozen)
    assert sum(unfrozen) == n
    assert not any(unfrozen[:-n])
Unfrozen [False, False, False, False]
Unfrozen [False, False, False, True]
Unfrozen [False, False, True, True]
Unfrozen [False, True, True, True]
Unfrozen [True, True, True, True]
snet.freeze()
with assert_raises(AttributeError) as ar:
    for n in range(3):
        snet.unfreeze(n_groups=n)
As expected, got AttributeError('SimpleModel' object has no attribute 'groups').
gnet = GroupedModel(DIM)
gnet.freeze()
n_unfrozen = [0, 2, 6]
for n, nu in zip(range(3), n_unfrozen):
    gnet.unfreeze(n_groups=n)
    unfrozen = [x[1] for x in gnet.trainable()]
    print('Unfrozen', unfrozen)
    assert sum(unfrozen) == nu
Unfrozen [False, False, False, False, False, False]
Unfrozen [False, False, False, False, True, True]
Unfrozen [True, True, True, True, True, True]
gnet.freeze()
for n in range(7):
    gnet.unfreeze(n_layers=n)
    unfrozen = [x[1] for x in gnet.trainable()]
    print('Unfrozen', unfrozen)
    assert sum(unfrozen) == n
    assert not any(unfrozen[:-n])
Unfrozen [False, False, False, False, False, False]
Unfrozen [False, False, False, False, False, True]
Unfrozen [False, False, False, False, True, True]
Unfrozen [False, False, False, True, True, True]
Unfrozen [False, False, True, True, True, True]
Unfrozen [False, True, True, True, True, True]
Unfrozen [True, True, True, True, True, True]

Optimizers

Optimizers like Adam or RMSProp can contain multiple "parameter groups", each with its own learning rate. (Other hyperparameters can vary between groups as well, but we ignore that for now.) The functions below let us get a new optimizer or update an existing one. They make it easy to use differential learning rates, but that is not required: the same LR can also be used for every parameter group.
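For reference, this is what parameter groups look like in plain PyTorch; the two-layer model here is just a placeholder:

import torch.nn as nn
from torch.optim import Adam

body, head = nn.Linear(10, 4), nn.Linear(4, 1)
# Each dict becomes one parameter group with its own learning rate.
optim = Adam([{'params': body.parameters(), 'lr': 1e-3},
              {'params': head.parameters(), 'lr': 3e-3}])
print([group['lr'] for group in optim.param_groups])    # [0.001, 0.003]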

variable_lr_optimizer[source]

variable_lr_optimizer(model, lr=0.003, lr_mult=1.0, optimizer='Adam', eps=0.001, **kwargs)

Get an optimizer that uses different learning rates for different layer groups. Additional keyword arguments can be used to alter momentum and/or weight decay, for example, but for the sake of simplicity these values will be the same across layer groups.

Parameters

model: nn.Module
    A model object. If you intend to use differential learning rates, the model must have an attribute groups containing a ModuleList of layer groups in the form of Sequential objects. The number of layer groups must match the number of learning rates passed in.
lr: float or Iterable[float]
    A number or list of numbers containing the learning rates to use for each layer group. There should generally be one LR for each layer group in the model. If fewer LRs are provided, lr_mult will be used to compute additional LRs. See update_optimizer for details.
optimizer: torch optimizer
    The torch optimizer to be created (Adam by default).
eps: float
    Hyperparameter used by the optimizer. The default of 1e-8 can lead to exploding gradients, so we typically override this.

Examples

optim = variable_lr_optimizer(model, lr=[3e-3, 3e-2, 1e-1])

update_optimizer[source]

update_optimizer(optim, lrs, lr_mult=1.0)

Pass in 1 or more learning rates, 1 for each layer group, and update the optimizer accordingly. The optimizer is updated in place so nothing is returned.

Parameters

optim: torch.optim
    Optimizer object.
lrs: float or Iterable[float]
    One or more learning rates. If using multiple values, the earlier values will usually be smaller and the later values larger. This can be achieved by passing in a list of LRs that is the same length as the number of layer groups in the optimizer, or by passing in a single LR along with a value for lr_mult.
lr_mult: float
    If you pass in fewer LRs than layer groups, lr_mult will be used to compute additional learning rates from the one that was passed in.

Returns

None

Examples

If optim has 3 layer groups, this will result in LRs of [3e-5, 3e-4, 3e-3], in that order:

update_optimizer(optim, lrs=3e-3, lr_mult=0.1)

Again, optim has 3 layer groups. We leave the default lr_mult of 1.0, so each LR will be 3e-3:

update_optimizer(optim, lrs=3e-3)

Again, optim has 3 layer groups. Three LRs are passed in, so lr_mult is unused:

update_optimizer(optim, lrs=[1e-3, 1e-3, 3e-3])
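The scaling implied by the first example can be sketched as follows (an illustration consistent with the documented behavior, not the package's exact code):

def spread_lrs(lr, lr_mult, n_groups):
    # The last group gets lr itself; earlier groups get geometrically smaller values.
    return [lr * lr_mult ** (n_groups - 1 - i) for i in range(n_groups)]

spread_lrs(3e-3, 0.1, 3)    # ≈ [3e-05, 3e-04, 3e-03]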

optim = variable_lr_optimizer(snet, 2e-3)
print(optim)

with assert_raises(ValueError) as ar:
    optim = variable_lr_optimizer(snet, [3e-3, 1e-1])
    optim
As expected, got ValueError(Received more learning rates than layer groups.).
update_optimizer(optim, 1e-3, 0.5)
assert len(optim.param_groups) == 1
assert optim.param_groups[0]['lr'] == 1e-3
lrs = [1e-3, 3e-3]
optim = variable_lr_optimizer(gnet, lrs)
print(optim)
assert [group['lr'] for group in optim.param_groups] == lrs

update_optimizer(optim, 2e-3, lr_mult=1/3)
print([group['lr'] for group in optim.param_groups])
assert np.isclose(optim.param_groups[1]['lr'], optim.param_groups[0]['lr'] * 3)
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 0.001
    lr: 0.001
    weight_decay: 0

Parameter Group 1
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 0.001
    lr: 0.003
    weight_decay: 0
)
[0.0006666666666666666, 0.002]
optim = variable_lr_optimizer(gnet, 1e-3, lr_mult=0.5)
print([group['lr'] for group in optim.param_groups])
assert np.isclose(optim.param_groups[1]['lr'], optim.param_groups[0]['lr'] * 2)
[0.0005, 0.001]