ULMFineTuner(learn, name_fmt='cls_stage_{}')
Fine-tune a language model using the ULMFiT procedure (gradual unfreezing).
I noticed the built-in `fine_tune` method does not unfreeze one layer group
at a time as the paper describes; I'm not sure whether they found that to be
better practice or whether it's just simpler for an automated method.
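The gradual-unfreezing loop this class implements can be sketched as below. The `Learner` stand-in is a hypothetical mock (in practice this would be a fastai Learner with `freeze_to`, `fit_one_cycle`, and `save` methods), and `stage_lrs` is an illustrative parameter name, not part of the actual API:

```python
class Learner:
    """Toy stand-in that tracks which parameter groups are trainable."""
    def __init__(self, n_groups=4):
        self.n_groups = n_groups
        self.trainable = [False] * n_groups
        self.saved = []

    def freeze_to(self, n):
        # Negative n unfreezes the last |n| parameter groups, as in fastai.
        self.trainable = [i >= self.n_groups + n for i in range(self.n_groups)]

    def fit_one_cycle(self, epochs, lr):
        pass  # training would happen here

    def save(self, name):
        self.saved.append(name)


def ulm_fine_tune(learn, stage_lrs, name_fmt='cls_stage_{}'):
    """Unfreeze one more layer group per stage, train, and checkpoint."""
    for stage, lr in enumerate(stage_lrs, start=1):
        learn.freeze_to(-stage)
        learn.fit_one_cycle(1, lr)
        learn.save(name_fmt.format(stage))


learn = Learner(n_groups=4)
ulm_fine_tune(learn, stage_lrs=[1e-2, 5e-3, 1e-3])
# After three stages, the last three of four groups are trainable and
# three checkpoints named via name_fmt have been saved.
```

Each stage checkpoints under `name_fmt` so a run can be resumed from any stage.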
Originally, part of the reason for building this was also to decrease the
batch size at each stage, since unfreezing uses more memory for stored
gradients. However, I decided I'd rather not account for a changing batch
size when selecting each stage's learning rate (we could run `lr_find`
before each stage, but I opted for the simpler approach).