thelper.train package
Trainer package.
This package contains classes specialized for training models on various tasks.
Submodules
thelper.train.base module
Model training/evaluation base interface module.
This module contains the interface required to train and/or evaluate a model based on different tasks. The trainers based on this interface are instantiated in launched sessions based on configuration dictionaries.
-
class
thelper.train.base.
Trainer
(session_name, save_dir, model, task, loaders, config, ckptdata=None)
Bases: object
Abstract trainer interface that defines basic session i/o and setup operations.
This interface defines the general behavior of a training session which includes configuration parsing, tensorboard setup, metrics and goal setup, and loss/optimizer setup. It also provides utilities for uploading models and tensors on specific devices, and for saving the state of a session. This interface should be specialized for every task by implementing the train_epoch and eval_epoch functions in a derived class. See thelper.train.classif.ImageClassifTrainer for an example.
The main parameters that will be parsed by this interface from a configuration dictionary are the following:
epochs (mandatory if training): number of epochs to train for; one epoch is one iteration over all mini-batches.
optimization (mandatory if training): sub-dictionary containing types and extra parameters required for instantiating the loss, optimizer, and scheduler objects. See the code of each related loading function for more information on special parameters.
save_freq (optional, default=1): checkpoint save frequency (will save every epoch multiple of given number).
save_raw (optional, default=True): specifies whether to save raw types or thelper objects in checkpoints.
use_tbx (optional, default=False): defines whether to use tensorboardX writers for logging or not.
device (optional): specifies which device to train/evaluate the model on (default=all available).
metrics: list of metrics to instantiate and update during training/evaluation; see related loading function for more information.
monitor: specifies the name of the metric that should be monitored on the validation set for model improvement.
Example configuration file:
# ...
"trainer": {
    # type of trainer to instantiate (linked to task type)
    "type": "thelper.train.ImageClassifTrainer",
    # train for 40 epochs
    "epochs": 40,
    # save every 5 epochs
    "save_freq": 5,
    # monitor validation accuracy and save best model based on that
    "monitor": "accuracy",
    # optimization parameters block
    "optimization": {
        # all types & params below provided by PyTorch
        "loss": {
            "type": "torch.nn.CrossEntropyLoss"
        },
        "optimizer": {
            "type": "torch.optim.SGD",
            "params": {
                "lr": 0.1,
                "momentum": 0.9,
                "weight_decay": 1e-06,
                "nesterov": true
            }
        },
        "scheduler": {
            "type": "torch.optim.lr_scheduler.StepLR",
            "params": {
                "step_size": 10,
                "gamma": 0.1
            }
        }
    },
    # in this example, we use two consumers in total
    # (one metric for monitoring, and one for logging)
    "metrics": {
        "accuracy": {
            "type": "thelper.optim.Accuracy"
        },
        "fullreport": {
            "type": "thelper.train.ClassifReport"
        }
    }
}
# ...
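The parameters and defaults listed above can be illustrated with a minimal sketch. The names `parse_trainer_config` and `should_save` are hypothetical helpers written for this example and are not part of thelper; the actual parsing code may differ, and counting completed epochs from 1 is an assumption of the sketch.

```python
def parse_trainer_config(trainer_config, training=True):
    """Hypothetical helper: extracts the documented trainer parameters with their defaults."""
    if training:
        # 'epochs' and 'optimization' are documented as mandatory when training
        for key in ("epochs", "optimization"):
            if key not in trainer_config:
                raise KeyError(f"missing mandatory trainer parameter '{key}'")
    return {
        "epochs": trainer_config.get("epochs"),
        "optimization": trainer_config.get("optimization"),
        "save_freq": int(trainer_config.get("save_freq", 1)),
        "save_raw": bool(trainer_config.get("save_raw", True)),
        "use_tbx": bool(trainer_config.get("use_tbx", False)),
        "monitor": trainer_config.get("monitor"),
    }


def should_save(epoch, save_freq):
    """Hypothetical helper: True when 'epoch' is a multiple of 'save_freq'."""
    return epoch % save_freq == 0


parsed = parse_trainer_config(
    {"epochs": 40, "save_freq": 5, "optimization": {}, "monitor": "accuracy"}
)
# assumption: completed epochs are counted from 1 when deciding whether to save
saved_epochs = [
    epoch for epoch in range(1, parsed["epochs"] + 1)
    if should_save(epoch, parsed["save_freq"])
]
```

With the example configuration, this sketch would write a checkpoint after every fifth epoch, and the optional flags fall back to their documented defaults.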
- Variables
logger – used to output debug/warning/error messages to session log.
name – name of the session, used for printing and creating log folders.
epochs – number of epochs to train the model for.
optimization_config – dictionary of optim-related parameters, parsed at training time.
save_freq – frequency of checkpoint saves while training (i.e. save every X epochs).
save_raw – specifies whether to save raw types or thelper objects in checkpoints.
checkpoint_dir – session checkpoint output directory (located within ‘save_dir’).
use_tbx – defines whether to use tensorboardX writers for logging or not.
model – model to train; will be uploaded to target device(s) at runtime.
config – full configuration dictionary of the session; will be incorporated into all saved checkpoints.
devices – list of (cuda) device IDs to upload the model/tensors to; can be empty if only the CPU is available.
monitor – name of the training/validation metric that should be monitored for model improvement.
TODO: move static utils to their related modules
-
__init__
(session_name, save_dir, model, task, loaders, config, ckptdata=None)
Receives the trainer configuration dictionary, parses it, and sets up the session.
-
eval
()
Starts the evaluation process.
This function will evaluate the model using the test data (or the validation data, if no test data is available), and return the results. Note that the code related to the forwarding of samples inside the model itself is implemented in a derived class via thelper.train.base.Trainer.eval_epoch().
-
abstract
eval_epoch
(model, epoch, dev, loader, metrics)
Evaluates the model using the provided objects.
- Parameters
model – the model to evaluate that is already uploaded to the target device(s).
epoch – the epoch index we are training for (0-based).
dev – the target device that tensors should be uploaded to.
loader – the data loader used to get transformed valid/test samples.
metrics – the dictionary of metrics/consumers to update every iteration.
-
train
()
Starts the training process.
This function will train the model until the required number of epochs is reached, and then evaluate it on the test data. The setup of loggers and tensorboard writers is done here, as is model improvement tracking via monitored metrics. However, the code related to loss computation and back propagation is implemented in a derived class via thelper.train.base.Trainer.train_epoch().
-
abstract
train_epoch
(model, epoch, dev, loss, optimizer, loader, metrics)
Trains the model for a single epoch using the provided objects.
- Parameters
model – the model to train that is already uploaded to the target device(s).
epoch – the epoch index we are training for (0-based).
dev – the target device that tensors should be uploaded to.
loss – the loss function used to evaluate model fidelity.
optimizer – the optimizer used for back propagation.
loader – the data loader used to get transformed training samples.
metrics – the dictionary of metrics/consumers to update every iteration.
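To make the division of work between the base interface and a derived class more concrete, here is a minimal sketch of a specialization. `TrainerSketch`, `PairTrainerSketch`, and `RecordingConsumer` are hypothetical stand-ins written for this example and do not come from thelper. The sketch assumes that every sample is an (input, target) pair, that the loss and optimizer objects follow the PyTorch conventions (`backward()`, `zero_grad()`, `step()`), and it omits the upload of tensors to `dev`.

```python
from abc import ABC, abstractmethod


class TrainerSketch(ABC):
    """Hypothetical stand-in that mirrors only the two abstract methods."""

    @abstractmethod
    def train_epoch(self, model, epoch, dev, loss, optimizer, loader, metrics):
        raise NotImplementedError

    @abstractmethod
    def eval_epoch(self, model, epoch, dev, loader, metrics):
        raise NotImplementedError


class PairTrainerSketch(TrainerSketch):
    """Hypothetical derived trainer; assumes every sample is an (input, target) pair."""

    @staticmethod
    def _update(metrics, sample, pred, loss, iter_idx, max_iters, epoch):
        # every consumer receives the same iteration data
        for metric in metrics.values():
            metric.update(task=None, input=sample[0], pred=pred, target=sample[1],
                          sample=sample, loss=loss, iter_idx=iter_idx,
                          max_iters=max_iters, epoch_idx=epoch, max_epochs=None)

    def train_epoch(self, model, epoch, dev, loss, optimizer, loader, metrics):
        max_iters = len(loader)
        for iter_idx, sample in enumerate(loader):
            optimizer.zero_grad()            # clear gradients of the previous iteration
            pred = model(sample[0])          # forward pass (device upload omitted here)
            iter_loss = loss(pred, sample[1])
            iter_loss.backward()             # back propagation
            optimizer.step()                 # parameter update
            self._update(metrics, sample, pred, iter_loss, iter_idx, max_iters, epoch)

    def eval_epoch(self, model, epoch, dev, loader, metrics):
        max_iters = len(loader)
        for iter_idx, sample in enumerate(loader):
            pred = model(sample[0])          # forward pass only, no parameter update
            self._update(metrics, sample, pred, None, iter_idx, max_iters, epoch)


class RecordingConsumer:
    """Hypothetical consumer that stores the predictions it receives."""

    def __init__(self):
        self.preds = []

    def update(self, pred, **kwargs):
        self.preds.append(pred)


# toy usage: a 'model' that doubles its input, evaluated on two samples
recorder = RecordingConsumer()
PairTrainerSketch().eval_epoch(lambda x: 2 * x, 0, None, [(1, 2), (3, 6)], {"rec": recorder})
```

The base class cannot be instantiated until both methods are implemented, which is the behavior expected from an abstract interface.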
thelper.train.classif module
Classification trainer/evaluator implementation module.
-
class
thelper.train.classif.
ImageClassifTrainer
(session_name, save_dir, model, task, loaders, config, ckptdata=None)
Bases: thelper.train.base.Trainer
Trainer interface specialized for image classification.
This class implements the abstract functions of thelper.train.base.Trainer required to train/evaluate a model for image classification or recognition. It also provides a utility function that fetches i/o packets (images, class labels) from a sample and converts them into tensors for forwarding and loss estimation.
-
__init__
(session_name, save_dir, model, task, loaders, config, ckptdata=None)
Receives session parameters, parses image/label keys from task object, and sets up metrics.
-
eval_epoch
(model, epoch, dev, loader, metrics)
Evaluates the model using the provided objects.
- Parameters
model – the model to evaluate that is already uploaded to the target device(s).
epoch – the epoch index we are training for (0-based).
dev – the target device that tensors should be uploaded to.
loader – the data loader used to get transformed valid/test samples.
metrics – the dictionary of metrics/consumers to update every iteration.
-
train_epoch
(model, epoch, dev, loss, optimizer, loader, metrics)
Trains the model for a single epoch using the provided objects.
- Parameters
model – the model to train that is already uploaded to the target device(s).
epoch – the epoch index we are training for (0-based).
dev – the target device that tensors should be uploaded to.
loss – the loss function used to evaluate model fidelity.
optimizer – the optimizer used for back propagation.
loader – the data loader used to get transformed training samples.
metrics – the dictionary of metrics/consumers to update every iteration.
-
thelper.train.detect module
Object detection trainer/evaluator implementation module.
-
class
thelper.train.detect.
ObjDetectTrainer
(session_name, save_dir, model, task, loaders, config, ckptdata=None)
Bases: thelper.train.base.Trainer
Trainer interface specialized for object detection.
This class implements the abstract functions of thelper.train.base.Trainer required to train/evaluate a model for object detection (i.e. 2D bounding box regression). It also provides a utility function that fetches i/o packets (input images, bounding boxes) from a sample and converts them into tensors for forwarding and loss estimation.
-
__init__
(session_name, save_dir, model, task, loaders, config, ckptdata=None)
Receives session parameters, parses tensor/target keys from task object, and sets up metrics.
-
eval_epoch
(model, epoch, dev, loader, metrics)
Evaluates the model using the provided objects.
- Parameters
model – the model to evaluate that is already uploaded to the target device(s).
epoch – the epoch index we are training for (0-based).
dev – the target device that tensors should be uploaded to.
loader – the data loader used to get transformed valid/test samples.
metrics – the dictionary of metrics/consumers to update every iteration.
-
train_epoch
(model, epoch, dev, loss, optimizer, loader, metrics)
Trains the model for a single epoch using the provided objects.
- Parameters
model – the model to train that is already uploaded to the target device(s).
epoch – the epoch index we are training for (0-based).
dev – the target device that tensors should be uploaded to.
loss – the loss function used to evaluate model fidelity.
optimizer – the optimizer used for back propagation.
loader – the data loader used to get transformed training samples.
metrics – the dictionary of metrics/consumers to update every iteration.
-
thelper.train.regr module
Regression trainer/evaluator implementation module.
-
class
thelper.train.regr.
RegressionTrainer
(session_name, save_dir, model, task, loaders, config, ckptdata=None)
Bases: thelper.train.base.Trainer
Trainer interface specialized for generic (n-dim) regression.
This class implements the abstract functions of thelper.train.base.Trainer required to train/evaluate a model for generic regression (i.e. n-dim target value prediction). It also provides a utility function that fetches i/o packets (input tensors, target values) from a sample and converts them into tensors for forwarding and loss estimation.
-
__init__
(session_name, save_dir, model, task, loaders, config, ckptdata=None)
Receives session parameters, parses tensor/target keys from task object, and sets up metrics.
-
eval_epoch
(model, epoch, dev, loader, metrics)
Evaluates the model using the provided objects.
- Parameters
model – the model to evaluate that is already uploaded to the target device(s).
epoch – the epoch index we are training for (0-based).
dev – the target device that tensors should be uploaded to.
loader – the data loader used to get transformed valid/test samples.
metrics – the dictionary of metrics/consumers to update every iteration.
-
train_epoch
(model, epoch, dev, loss, optimizer, loader, metrics)
Trains the model for a single epoch using the provided objects.
- Parameters
model – the model to train that is already uploaded to the target device(s).
epoch – the epoch index we are training for (0-based).
dev – the target device that tensors should be uploaded to.
loss – the loss function used to evaluate model fidelity.
optimizer – the optimizer used for back propagation.
loader – the data loader used to get transformed training samples.
metrics – the dictionary of metrics/consumers to update every iteration.
-
thelper.train.segm module
Segmentation trainer/evaluator implementation module.
-
class
thelper.train.segm.
ImageSegmTrainer
(session_name, save_dir, model, task, loaders, config, ckptdata=None)
Bases: thelper.train.base.Trainer
Trainer interface specialized for image segmentation.
This class implements the abstract functions of thelper.train.base.Trainer required to train/evaluate a model for image segmentation (i.e. pixel-level classification/labeling). It also provides a utility function that fetches i/o packets (images, class labels) from a sample and converts them into tensors for forwarding and loss estimation.
-
__init__
(session_name, save_dir, model, task, loaders, config, ckptdata=None)
Receives session parameters, parses image/label keys from task object, and sets up metrics.
-
eval_epoch
(model, epoch, dev, loader, metrics)
Evaluates the model using the provided objects.
- Parameters
model – the model to evaluate that is already uploaded to the target device(s).
epoch – the epoch index we are training for (0-based).
dev – the target device that tensors should be uploaded to.
loader – the data loader used to get transformed valid/test samples.
metrics – the dictionary of metrics/consumers to update every iteration.
-
train_epoch
(model, epoch, dev, loss, optimizer, loader, metrics)
Trains the model for a single epoch using the provided objects.
- Parameters
model – the model to train that is already uploaded to the target device(s).
epoch – the epoch index we are training for (0-based).
dev – the target device that tensors should be uploaded to.
loss – the loss function used to evaluate model fidelity.
optimizer – the optimizer used for back propagation.
loader – the data loader used to get transformed training samples.
metrics – the dictionary of metrics/consumers to update every iteration.
-
thelper.train.utils module
Training/evaluation utilities module.
This module contains utilities and tools used to instantiate training sessions. It also contains
the prediction consumer interface used by metrics and loggers to receive iteration data during
training. See thelper.optim.metrics
for more information on metrics.
-
class
thelper.train.utils.
ClassifLogger
(top_k=1, conf_threshold=None, class_names=None, target_name=None, viz_count=0, report_count=None, log_keys=None, force_softmax=True)
Bases: thelper.train.utils.PredictionConsumer
Classification output logger.
This class provides a simple logging interface for accumulating and saving the predictions of a classifier. By default, all predictions will be logged. However, a confidence threshold can be set to focus on “hard” samples if necessary. It also optionally offers tensorboardX-compatible output images that can be saved locally or posted to tensorboard for browser-based visualization.
Usage example inside a session configuration file:
# ...
# lists all metrics to instantiate as a dictionary
"metrics": {
    # ...
    # this is the name of the example consumer; it is used for lookup/printing only
    "logger": {
        # this type is used to instantiate the classification logger object
        "type": "thelper.train.utils.ClassifLogger",
        "params": {
            # log the three 'best' predictions for each sample
            "top_k": 3,
            # keep updating a set of 10 samples for visualization via tensorboardX
            "viz_count": 10
        }
    },
    # ...
}
# ...
- Variables
top_k – number of ‘best’ predictions to keep for each sample (along with the gt label).
conf_threshold – threshold used to eliminate all but the most uncertain predictions.
class_names – holds the list of class label names provided by the dataset parser. If it is not provided when the constructor is called, it will be set by the trainer at runtime.
target_name – name of the targeted label (may be ‘None’ if all classes are used).
target_idx – index of the targeted label (may be ‘None’ if all classes are used).
viz_count – number of tensorboardX images to generate and update at each epoch.
report_count – number of samples to print in reports (use ‘None’ if all samples must be printed).
log_keys – list of metadata field keys to copy from samples into the log for each prediction.
force_softmax – specifies whether a softmax operation should be applied to the prediction scores obtained from the trainer.
score – array used to store prediction scores for logging.
true – array used to store groundtruth labels for logging.
meta – array used to store metadata pulled from samples for logging.
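The roles of top_k, force_softmax, and conf_threshold listed above can be illustrated with a small pure-Python sketch. All three functions are hypothetical and written for this example only. In particular, treating a sample as "hard" when the confidence of its groundtruth class falls below the threshold is an assumption; the logger's actual filtering rule may differ.

```python
import math


def softmax(scores):
    """Converts raw scores into probabilities (shifted by the max for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def top_k_predictions(scores, class_names, top_k=1, force_softmax=True):
    """Hypothetical helper: returns the top_k (class name, score) pairs, best first."""
    probs = softmax(scores) if force_softmax else list(scores)
    ranked = sorted(range(len(probs)), key=lambda idx: probs[idx], reverse=True)
    return [(class_names[idx], probs[idx]) for idx in ranked[:top_k]]


def is_hard_sample(scores, gt_idx, conf_threshold):
    """Hypothetical filter: keeps samples whose groundtruth confidence is below the threshold."""
    if conf_threshold is None:
        return True  # no threshold: every prediction is logged
    return softmax(scores)[gt_idx] < conf_threshold


# toy usage on the raw scores of a single sample
best = top_k_predictions([1.0, 3.0, 2.0], ["cat", "dog", "bird"], top_k=2)
```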
-
__init__
(top_k=1, conf_threshold=None, class_names=None, target_name=None, viz_count=0, report_count=None, log_keys=None, force_softmax=True)
Receives the logging parameters & the optional class label names used to decorate the log.
-
render
()
Returns an image of predicted outputs as a numpy-compatible RGBA image drawn by pyplot.
-
report
()
Returns the logged metadata of predicted samples.
The returned object is a print-friendly CSV string that can be consumed directly by tensorboardX. Note that this string might be very long if the dataset is large (i.e. it will contain one line per sample).
-
set_class_names
(class_names)
Sets the class label names that must be predicted by the model.
-
update
(task, input, pred, target, sample, loss, iter_idx, max_iters, epoch_idx, max_epochs, **kwargs)
Receives the latest predictions and target values from the training session.
The exact signature of this function should match the one of the callbacks defined in thelper.train.base.Trainer and specified by thelper.typedefs.IterCallbackParams.
-
class
thelper.train.utils.
ClassifReport
(class_names=None, sample_weight=None, digits=4)
Bases: thelper.train.utils.PredictionConsumer
Classification report interface.
This class provides a simple interface to sklearn.metrics.classification_report so that all count-based metrics can be reported at once under a string-based representation.
Usage example inside a session configuration file:
# ...
# lists all metrics to instantiate as a dictionary
"metrics": {
    # ...
    # this is the name of the example consumer; it is used for lookup/printing only
    "classifreport": {
        # this type is used to instantiate the classification report object
        "type": "thelper.train.utils.ClassifReport",
        # we do not need to provide any parameters to the constructor, defaults are fine
        "params": {}
    },
    # ...
}
# ...
- Variables
gen_report – report generator function, called at evaluation time to generate the output string.
class_names – holds the list of class label names provided by the dataset parser. If it is not provided when the constructor is called, it will be set by the trainer at runtime.
pred – queue used to store the top-1 (best) predicted class indices at each iteration.
gt – queue used to store the groundtruth class indices at each iteration.
-
__init__
(class_names=None, sample_weight=None, digits=4)
Receives the optional class names and arguments passed to the report generator function.
- Parameters
class_names – holds the list of class label names provided by the dataset parser. If it is not provided when the constructor is called, it will be set by the trainer at runtime.
sample_weight – sample weights, forwarded to sklearn.metrics.classification_report.
digits – metrics output digit count, forwarded to sklearn.metrics.classification_report.
-
set_class_names
(class_names)
Sets the class label names that must be predicted by the model.
-
update
(task, input, pred, target, sample, loss, iter_idx, max_iters, epoch_idx, max_epochs, **kwargs)
Receives the latest predictions and target values from the training session.
The exact signature of this function should match the one of the callbacks defined in thelper.train.base.Trainer and specified by thelper.typedefs.IterCallbackParams.
-
class
thelper.train.utils.
ConfusionMatrix
(class_names=None, draw_normalized=True)
Bases: thelper.train.utils.PredictionConsumer
Confusion matrix report interface.
This class provides a simple interface to sklearn.metrics.confusion_matrix so that a full confusion matrix can be easily reported under a string-based representation. It also offers a tensorboardX-compatible output image that can be saved locally or posted to tensorboard for browser-based visualization.
Usage example inside a session configuration file:
# ...
# lists all metrics to instantiate as a dictionary
"metrics": {
    # ...
    # this is the name of the example consumer; it is used for lookup/printing only
    "confmat": {
        # this type is used to instantiate the confusion matrix report object
        "type": "thelper.train.utils.ConfusionMatrix",
        # we do not need to provide any parameters to the constructor, defaults are fine
        "params": {}
    },
    # ...
}
# ...
- Variables
matrix – report generator function, called at evaluation time to generate the output string.
class_names – holds the list of class label names provided by the dataset parser. If it is not provided when the constructor is called, it will be set by the trainer at runtime.
draw_normalized – defines whether rendered confusion matrices should be normalized or not.
pred – queue used to store the top-1 (best) predicted class indices at each iteration.
gt – queue used to store the groundtruth class indices at each iteration.
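As an illustration of how the pred and gt queues described above could feed a confusion matrix, here is a minimal pure-Python sketch. It is a stand-in for the sklearn.metrics.confusion_matrix call that the class wraps, not the class's actual code. The helper names are hypothetical, and row-wise normalization is only one plausible reading of draw_normalized.

```python
def top1(scores):
    """Returns the index of the highest score (the top-1 predicted class)."""
    return max(range(len(scores)), key=lambda idx: scores[idx])


def count_confusions(gt, pred, class_count):
    """Counts samples per (groundtruth, prediction) pair; rows index the groundtruth."""
    matrix = [[0] * class_count for _ in range(class_count)]
    for gt_idx, pred_idx in zip(gt, pred):
        matrix[gt_idx][pred_idx] += 1
    return matrix


def normalize_rows(matrix):
    """Divides each row by its sum so that entries become per-class rates."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# accumulate top-1 predictions and groundtruth indices over two toy batches
gt_queue, pred_queue = [], []
batches = [([[0.9, 0.1], [0.2, 0.8]], [0, 1]), ([[0.6, 0.4]], [1])]
for batch_scores, batch_gt in batches:
    pred_queue.extend(top1(scores) for scores in batch_scores)
    gt_queue.extend(batch_gt)

counts = count_confusions(gt_queue, pred_queue, class_count=2)
rates = normalize_rows(counts)
```

Keeping the raw counts and normalizing only at rendering time lets the same accumulated data serve both the string report and the image.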
-
__init__
(class_names=None, draw_normalized=True)
Receives the optional class label names used to decorate the output string.
- Parameters
class_names – holds the list of class label names provided by the dataset parser. If it is not provided when the constructor is called, it will be set by the trainer at runtime.
draw_normalized – defines whether rendered confusion matrices should be normalized or not.
-
set_class_names
(class_names)
Sets the class label names that must be predicted by the model.
-
update
(task, input, pred, target, sample, loss, iter_idx, max_iters, epoch_idx, max_epochs, **kwargs)
Receives the latest predictions and target values from the training session.
The exact signature of this function should match the one of the callbacks defined in thelper.train.base.Trainer and specified by thelper.typedefs.IterCallbackParams.
-
class
thelper.train.utils.
PredictionCallback
(callback_func, callback_kwargs=None)
Bases: thelper.train.utils.PredictionConsumer
Callback function wrapper compatible with the consumer interface.
This interface is used to wrap user-defined callbacks so that they can be included in the list of prediction consumers given to trainer implementations. The callbacks must always be compatible with the list of arguments defined by thelper.typedefs.IterCallbackParams, but may also receive extra arguments defined in advance and passed to the constructor of this class.
- Variables
callback_func – user-defined function to call on every update from the trainer.
callback_kwargs – user-defined extra arguments to provide to the callback function.
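The wrapping described above can be sketched as follows. `CallbackWrapperSketch` and `record_loss` are hypothetical names used for illustration and are not the library's implementation; the positional argument list simply follows the update signature documented in this module.

```python
class CallbackWrapperSketch:
    """Hypothetical wrapper exposing a user callback through an 'update' method."""

    def __init__(self, callback_func, callback_kwargs=None):
        if not callable(callback_func):
            raise TypeError("callback_func must be callable")
        self.callback_func = callback_func
        self.callback_kwargs = callback_kwargs if callback_kwargs is not None else {}

    def update(self, task, input, pred, target, sample, loss,
               iter_idx, max_iters, epoch_idx, max_epochs, **kwargs):
        # extra arguments defined in advance are merged with those given at call time
        extra = {**self.callback_kwargs, **kwargs}
        return self.callback_func(task, input, pred, target, sample, loss,
                                  iter_idx, max_iters, epoch_idx, max_epochs, **extra)


losses = []


def record_loss(task, input, pred, target, sample, loss,
                iter_idx, max_iters, epoch_idx, max_epochs, prefix="", **kwargs):
    """Example user callback: stores a formatted loss entry for each iteration."""
    losses.append(f"{prefix}{epoch_idx}/{iter_idx}: {loss:.2f}")


consumer = CallbackWrapperSketch(record_loss, callback_kwargs={"prefix": "train "})
consumer.update(None, None, None, None, None, 0.5, 3, 10, 1, 40)
```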
-
class
thelper.train.utils.
PredictionConsumer
Bases: abc.ABC
Abstract model prediction consumer class.
This interface defines basic functions required so that thelper.train.base.Trainer can figure out how to instantiate and update a model prediction consumer. The most notable class derived from this interface is thelper.optim.metrics.Metric, which is used to monitor the improvement of a model during a training session. Other prediction consumers defined in thelper.train.utils will instead log predictions to local files, create graphs, etc.
-
reset
()
Resets the internal state of the consumer.
May be called for example by the trainer between two evaluation epochs. The default implementation does nothing, and if a reset behavior is needed, it should be implemented by the derived class.
-
abstract
update
(task, input, pred, target, sample, loss, iter_idx, max_iters, epoch_idx, max_epochs, **kwargs)
Receives the latest prediction and groundtruth tensors from the training session.
The data given here will be “consumed” internally, but it should NOT be modified. For example, a classification accuracy metric would accumulate the number of correct predictions in comparison to groundtruth labels, while a plotting logger would add new corresponding dots to a curve.
Remember that input, prediction, and target tensors received here will all have a batch dimension!
The exact signature of this function should match the one of the callbacks defined in thelper.train.base.Trainer and specified by thelper.typedefs.IterCallbackParams.
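The accuracy example mentioned above can be sketched as a small consumer. `ConsumerSketch` and `AccuracySketch` are hypothetical classes written for this illustration, and the `accuracy` method name is an assumption. Predictions are plain nested lists here (one score list per sample) in place of tensors, which keeps the batch dimension visible.

```python
from abc import ABC, abstractmethod


class ConsumerSketch(ABC):
    """Hypothetical stand-in for the consumer interface described above."""

    def reset(self):
        """Does nothing by default; derived classes override it when needed."""

    @abstractmethod
    def update(self, task, input, pred, target, sample, loss,
               iter_idx, max_iters, epoch_idx, max_epochs, **kwargs):
        raise NotImplementedError


class AccuracySketch(ConsumerSketch):
    """Accumulates the number of correct top-1 predictions across batches."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, task, input, pred, target, sample, loss,
               iter_idx, max_iters, epoch_idx, max_epochs, **kwargs):
        # 'pred' holds one score list per sample: the first dimension is the batch;
        # the received data is only read, never modified
        for scores, gt_idx in zip(pred, target):
            best_idx = max(range(len(scores)), key=lambda idx: scores[idx])
            self.correct += int(best_idx == gt_idx)
            self.total += 1

    def accuracy(self):
        return self.correct / self.total if self.total else 0.0


metric = AccuracySketch()
metric.update(None, None, [[0.1, 0.9], [0.8, 0.2]], [1, 1], None, None, 0, 2, 0, 1)
metric.update(None, None, [[0.3, 0.7]], [1], None, None, 1, 2, 0, 1)
accuracy = metric.accuracy()
```

Calling `reset` between evaluation epochs would clear the counters so that each epoch is scored independently.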
-
-
thelper.train.utils.
create_consumers
(config)
Instantiates and returns the prediction consumers defined in the configuration dictionary.
All arguments are expected to be handed in through the configuration via a dictionary named ‘params’.
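The type/params convention used by the configuration examples in this module can be sketched with a small factory. `import_type` and `create_consumers_sketch` are hypothetical names, and the real function may resolve types and validate parameters differently. Standard-library classes stand in for consumer types so that the sketch runs without thelper installed.

```python
import importlib


def import_type(fullname):
    """Resolves a dotted name such as 'package.module.ClassName' into the object it names."""
    module_name, _, attr_name = fullname.rpartition(".")
    return getattr(importlib.import_module(module_name), attr_name)


def create_consumers_sketch(config):
    """Hypothetical factory: maps each consumer name to an instance built from 'type' and 'params'."""
    consumers = {}
    for name, consumer_config in config.items():
        consumer_type = import_type(consumer_config["type"])
        # constructor arguments are handed in through the 'params' dictionary
        params = consumer_config.get("params", {})
        consumers[name] = consumer_type(**params)
    return consumers


# standard-library types are used here purely as placeholders for consumer classes
consumers = create_consumers_sketch({
    "ratio": {"type": "fractions.Fraction", "params": {"numerator": 3, "denominator": 4}},
    "counts": {"type": "collections.Counter"},
})
```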
-
thelper.train.utils.
create_trainer
(session_name, save_dir, config, model, task, loaders, ckptdata=None)
Instantiates the trainer object based on the type contained in the config dictionary.
The trainer type is expected to be in the configuration dictionary’s trainer field, under the type key. For more information on the configuration, refer to thelper.train.base.Trainer. The instantiated type must be compatible with the constructor signature of thelper.train.base.Trainer. The object’s constructor will be given the full config dictionary and the checkpoint data for resuming the session (if available).
If the trainer type is missing, it will be automatically deduced based on the task object.
- Parameters
session_name – name of the training session used for printing and to create internal tensorboardX directories.
save_dir – path to the session directory where logs and checkpoints will be saved.
config – full configuration dictionary that will be parsed for trainer parameters and saved in checkpoints.
model – model to train/evaluate; should be compatible with thelper.nn.utils.Module.
task – global task interface defining the type of model and training goal for the session.
loaders – a tuple containing the training/validation/test data loaders (a loader can be None if empty).
ckptdata – raw checkpoint to parse data from when resuming a session (if None, will start from scratch).
- Returns
The fully-constructed trainer object, ready to begin model training/evaluation.