Core functions

Core functionality for preparing sequential data for PyTorch and fastai models

5. Dataloaders Creation

A Datasets object combines all implemented components at the item level.


pad_sequence

 pad_sequence (batch, sorting=False)

Collate function that pads sequences of different lengths to a common length. Pass it as before_batch when creating the dataloaders. It is still quite slow.
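The padding idea can be sketched in plain Python. This is only an illustration, not the tsfast implementation: `pad_batch_sketch` and `pad_to_longest` are hypothetical names, nested lists stand in for tensors, and the zero pad value and the longest-first meaning of `sorting` are assumptions.

```python
from typing import List, Sequence, Tuple

Step = List[float]   # one time step: one value per channel
Seq = List[Step]     # one sequence: a list of time steps


def pad_to_longest(seq: Seq, target_len: int, pad_value: float = 0.0) -> Seq:
    """Append constant-valued time steps until the sequence has target_len steps."""
    n_channels = len(seq[0])
    padding = [[pad_value] * n_channels for _ in range(target_len - len(seq))]
    return seq + padding


def pad_batch_sketch(batch: Sequence[Tuple[Seq, Seq]], sorting: bool = False):
    """Pad every (input, output) pair to the longest sequence in the batch."""
    samples = list(batch)
    if sorting:
        # Assumption: sorting means longest sequence first.
        samples.sort(key=lambda s: len(s[0]), reverse=True)
    max_x = max(len(x) for x, _ in samples)
    max_y = max(len(y) for _, y in samples)
    return [(pad_to_longest(x, max_x), pad_to_longest(y, max_y)) for x, y in samples]


batch = [
    ([[1.0, 2.0]] * 3, [[0.5]] * 3),   # 3 time steps, 2 input channels
    ([[3.0, 4.0]] * 5, [[0.7]] * 5),   # 5 time steps, 2 input channels
]
padded = pad_batch_sketch(batch, sorting=True)
lengths = [len(x) for x, _ in padded]
```

After padding, every sample in the batch has the same number of time steps, so the batch can be stacked into a single array of shape (batch, time, channels).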

5.1 Low-Level with Transforms

from nbdev.config import get_config
from tsfast.data.core import CreateDict, ValidClmContains,DfHDFCreateWindows
project_root = get_config().config_file.parent
f_path = project_root / 'test_data/WienerHammerstein'
hdf_files = get_files(f_path,extensions='.hdf5',recurse=True)
tfm_src = CreateDict([ValidClmContains(['valid']),DfHDFCreateWindows(win_sz=100+1,stp_sz=10,clm='u')])
src_dicts = tfm_src(hdf_files)

tfms=[  [HDF2Sequence(['u','y']),SeqSlice(l_slc=1),toTensorSequencesInput],
        [HDF2Sequence(['y']),SeqSlice(r_slc=-1),toTensorSequencesOutput]]
splits = PercentageSplitter()([x['path'] for x in src_dicts])
dsrc = Datasets(src_dicts,tfms=tfms,splits=splits)
# %%timeit
# dsrc[0]
db = dsrc.dataloaders(bs=128,after_batch=[SeqNoiseInjection(std=[1.1,0.01]),Normalize(axes=[0,1])],before_batch=pad_sequence)
db.one_batch()[0].shape
torch.Size([128, 100, 2])
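The sequence length of 100 in the batch shape follows from the window size of 100+1 combined with the two slices. A minimal sketch of that arithmetic, using a plain list instead of an HDF5 file: `window_starts` is a hypothetical helper, the 1000-sample signal is made up, and it is assumed that only full windows are kept and that `l_slc` and `r_slc` act like the left and right bounds of a Python slice.

```python
def window_starts(seq_len: int, win_sz: int, stp_sz: int) -> list:
    """Start indices of all full windows of length win_sz, spaced stp_sz apart."""
    return list(range(0, seq_len - win_sz + 1, stp_sz))


signal = list(range(1000))        # hypothetical recording with 1000 samples
win_sz, stp_sz = 100 + 1, 10      # same values as in the example above

starts = window_starts(len(signal), win_sz, stp_sz)
window = signal[starts[0]:starts[0] + win_sz]   # one window of 101 samples

x_part = window[1:]     # analogous to SeqSlice(l_slc=1): drop the first sample
y_part = window[:-1]    # analogous to SeqSlice(r_slc=-1): drop the last sample
```

Both slices contain 100 samples and are offset from each other by one time step, which is why a window of 101 samples is requested.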

5.2 Mid-Level with Datablock API


SequenceBlock

 SequenceBlock (seq_extract, padding=False)

A basic wrapper that links default transforms for the data block API

seq = DataBlock(blocks=(SequenceBlock.from_hdf(['u','y'],TensorSequencesInput,padding=True,cached=None),
                        SequenceBlock.from_hdf(['y'],TensorSequencesOutput,cached=None)),
                get_items=tfm_src,
                splitter=ApplyToDict(ParentSplitter()))
dls = seq.dataloaders(hdf_files)
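The splitter in this example works on the dictionaries produced by tfm_src. The sketch below shows one way such a split could work, assuming that the split is decided by the name of each file's parent folder and that the path is stored under the 'path' key, as in the low-level example. `parent_split_sketch` and the file paths are hypothetical, and the actual behavior of ParentSplitter and ApplyToDict may differ.

```python
from pathlib import PurePosixPath


def parent_split_sketch(items, train_name="train", valid_name="valid", key="path"):
    """Return (train_indices, valid_indices) based on each item's parent folder name."""
    parents = [PurePosixPath(item[key]).parent.name for item in items]
    train_idx = [i for i, name in enumerate(parents) if name == train_name]
    valid_idx = [i for i, name in enumerate(parents) if name == valid_name]
    return train_idx, valid_idx


# Hypothetical source dictionaries, one per window
items = [
    {"path": "data/train/a.hdf5"},
    {"path": "data/valid/b.hdf5"},
    {"path": "data/train/c.hdf5"},
    {"path": "data/test/d.hdf5"},   # neither train nor valid, so it is skipped
]
train_idx, valid_idx = parent_split_sketch(items)
```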

ScalarBlock

 ScalarBlock (scl_extract)

A basic wrapper that links default transforms for the data block API


ScalarNormalize

 ScalarNormalize (mean=None, std=None, axes=(0,))

A transform with a __repr__ that shows its attrs
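The signature suggests a standardization with per-feature statistics taken along axis 0. The following is a sketch under that assumption, not the library code: `fit_scalar_stats` and `normalize_scalars` are hypothetical names, rows of plain lists stand in for a batch of scalar features, and the population standard deviation and the small `eps` are choices made for this illustration.

```python
from statistics import fmean, pstdev


def fit_scalar_stats(rows):
    """Compute per-feature mean and standard deviation along axis 0 (over rows)."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds


def normalize_scalars(rows, means, stds, eps=1e-7):
    """Standardize each feature: (value - mean) / (std + eps)."""
    return [
        [(v - m) / (s + eps) for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


rows = [[1.0, 10.0], [3.0, 30.0]]     # two samples, two scalar features
means, stds = fit_scalar_stats(rows)
normed = normalize_scalars(rows, means, stds)
```

When `mean` and `std` are left as None, statistics like these would presumably be estimated from the data; passing them explicitly would fix the scaling in advance.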