Data Generators

DataGenerator

class sconce.data_generators.DataGenerator(data_loader)

A thin wrapper around a DataLoader that automatically yields tuples of torch.Tensor objects, which live either on the CPU or on a CUDA device. Unlike a DataLoader, which stops after one pass over its data, a DataGenerator iterates endlessly.

Like the underlying DataLoader, a DataGenerator’s __next__ method produces two values per batch, which we refer to as the inputs and the targets.

Parameters: data_loader (DataLoader) – the DataLoader to wrap.
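The endless iteration can be pictured with the pure-Python sketch below. It is not sconce's implementation: `EndlessBatches` is a hypothetical stand-in that wraps any re-iterable source of `(inputs, targets)` pairs (a DataLoader is one such source) and starts a new pass whenever the current one runs out. Its `reset()` mirrors the documented method; device placement is left out.

```python
class EndlessBatches:
    """Hypothetical sketch: cycle forever over a re-iterable of (inputs, targets)."""

    def __init__(self, loader):
        self.loader = loader  # e.g. a DataLoader; any re-iterable works here
        self.reset()

    def reset(self):
        # Start a fresh pass over the loader from its first batch.
        self._iterator = iter(self.loader)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            inputs, targets = next(self._iterator)
        except StopIteration:
            # Loader exhausted: begin a new pass instead of stopping.
            self.reset()
            inputs, targets = next(self._iterator)
        return inputs, targets


batches = [([1, 2], [0, 1]), ([3, 4], [1, 0])]
gen = EndlessBatches(batches)
print([next(gen)[0] for _ in range(3)])  # [[1, 2], [3, 4], [1, 2]]
```

After the second batch the wrapped list is exhausted, so the third call silently starts over at the first batch.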
batch_size

the wrapped data_loader’s batch_size

cuda(device=None)

Put the inputs and targets (yielded by this DataGenerator) on the specified device.

Parameters: device (int or bool or dict) – if int or bool, sets the behavior for both inputs and targets. To set them individually, pass a dictionary with keys {‘inputs’, ‘targets’} instead. See torch.Tensor.cuda() for details.

Example

>>> g = DataGenerator.from_dataset(dataset, batch_size=100)
>>> g.cuda()
>>> g.next()
(Tensor containing:
 [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing:
 [torch.cuda.LongTensor of size 100 (GPU 0)])
>>> g.cuda(False)
>>> g.next()
(Tensor containing:
 [torch.FloatTensor of size 100x1x28x28],
 Tensor containing:
 [torch.LongTensor of size 100])
>>> g.cuda(device={'inputs':0, 'targets':1})
>>> g.next()
(Tensor containing:
 [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing:
 [torch.cuda.LongTensor of size 100 (GPU 1)])
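The `device` argument either applies one setting to both tensors or, as a dict, sets each one individually. The helper below (`expand_device` is hypothetical, not part of sconce) sketches only that expansion step; it does not move any tensors. Treating a key left out of the dict as None is an assumption of this sketch.

```python
def expand_device(device=None):
    """Hypothetical: map a `device` argument to per-key settings."""
    keys = ("inputs", "targets")
    if isinstance(device, dict):
        unknown = set(device) - set(keys)
        if unknown:
            raise KeyError(f"unexpected keys: {sorted(unknown)}")
        # Assumption: a key left out of the dict gets None.
        return {key: device.get(key) for key in keys}
    # A single int, bool, or None applies to both inputs and targets.
    return {key: device for key in keys}


print(expand_device(0))                            # {'inputs': 0, 'targets': 0}
print(expand_device(False))                        # {'inputs': False, 'targets': False}
print(expand_device({"inputs": 0, "targets": 1}))  # {'inputs': 0, 'targets': 1}
```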
dataset

the wrapped data_loader’s Dataset

classmethod from_dataset(dataset, split=None, **kwargs)

Create a DataGenerator from an instantiated dataset.

Parameters:
  • dataset (Dataset) – the pytorch dataset.
  • split (float, optional) – if not None, the fraction of the dataset to place in the first of two DataGenerators; the remaining data goes into the second. Both DataGenerators are returned.
  • **kwargs – passed directly to the DataLoader constructor.
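The `split` fraction decides how many samples go to each of the two returned generators. The sketch below shows that bookkeeping on plain index lists; `split_indices` is hypothetical, and both the rounding rule (`int(n * split)`) and the optional seeded shuffle are assumptions, not necessarily what sconce does.

```python
import random


def split_indices(num_samples, split, shuffle=True, seed=0):
    """Hypothetical: partition range(num_samples) into two disjoint index lists."""
    if not 0.0 < split < 1.0:
        raise ValueError("split must be strictly between 0 and 1")
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)  # deterministic for a given seed
    cut = int(num_samples * split)  # assumption: round down
    return indices[:cut], indices[cut:]


first, second = split_indices(10, 0.8)
print(len(first), len(second))  # 8 2
```

Each index list could then back a `torch.utils.data.Subset` wrapped in its own DataLoader, which is one way to end up with two generators over disjoint data.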
num_samples

the length of the wrapped data_loader’s Dataset

reset()

Start iterating through the data_loader from the beginning.

ImageDataGenerator (deprecated)

class sconce.data_generators.ImageDataGenerator(*args, **kwargs)

Warning

This class has been deprecated in favor of SingleClassImageDataGenerator and will be removed soon. It continues to work for now, but please update your code to use SingleClassImageDataGenerator.
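A deprecated alias that "continues to work for now" is commonly built as a thin subclass that emits a warning and then defers to its replacement. The sketch below illustrates that general pattern only; `OldGenerator` and `NewGenerator` are hypothetical stand-ins, and sconce's actual mechanism and warning category may differ.

```python
import warnings


class NewGenerator:
    """Stand-in for the replacement class (hypothetical)."""

    def __init__(self, data_loader):
        self.data_loader = data_loader


class OldGenerator(NewGenerator):
    """Hypothetical deprecated alias: warns, then behaves like NewGenerator."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "OldGenerator is deprecated; use NewGenerator instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        super().__init__(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    g = OldGenerator(data_loader=[])
print(caught[0].category.__name__)  # DeprecationWarning
```

Forwarding `*args, **kwargs` unchanged is what lets existing call sites keep working while the warning nudges them toward the new name.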

SingleClassImageDataGenerator

class sconce.data_generators.SingleClassImageDataGenerator(data_loader)

Bases: sconce.data_generators.base.DataGenerator, sconce.data_generators.image_mixin.ImageMixin

An ImageDataGenerator class for use when each image belongs to exactly one class.

New in 0.10.0

batch_size

the wrapped data_loader’s batch_size

cuda(device=None)

Put the inputs and targets (yielded by this DataGenerator) on the specified device.

Parameters: device (int or bool or dict) – if int or bool, sets the behavior for both inputs and targets. To set them individually, pass a dictionary with keys {‘inputs’, ‘targets’} instead. See torch.Tensor.cuda() for details.

Example

>>> g = DataGenerator.from_dataset(dataset, batch_size=100)
>>> g.cuda()
>>> g.next()
(Tensor containing:
 [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing:
 [torch.cuda.LongTensor of size 100 (GPU 0)])
>>> g.cuda(False)
>>> g.next()
(Tensor containing:
 [torch.FloatTensor of size 100x1x28x28],
 Tensor containing:
 [torch.LongTensor of size 100])
>>> g.cuda(device={'inputs':0, 'targets':1})
>>> g.next()
(Tensor containing:
 [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing:
 [torch.cuda.LongTensor of size 100 (GPU 1)])
dataset

the wrapped data_loader’s Dataset

classmethod from_dataset(dataset, split=None, **kwargs)

Create a DataGenerator from an instantiated dataset.

Parameters:
  • dataset (Dataset) – the pytorch dataset.
  • split (float, optional) – if not None, the fraction of the dataset to place in the first of two DataGenerators; the remaining data goes into the second. Both DataGenerators are returned.
  • **kwargs – passed directly to the DataLoader constructor.
classmethod from_image_folder(root, loader_kwargs=None, **dataset_kwargs)

Create a DataGenerator from a folder of images. See torchvision.datasets.ImageFolder.

Parameters:
  • root (path) – the root directory path.
  • loader_kwargs (dict) – keyword args provided to the DataLoader constructor.
  • **dataset_kwargs – keyword args provided to the torchvision.datasets.ImageFolder constructor.
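torchvision's ImageFolder expects one subdirectory per class under `root` and labels the classes by their sorted directory names. The standard-library sketch below reproduces only that labeling convention so the expected layout is clear; `class_to_index` is a hypothetical helper, and the real dataset additionally filters files by image extension and loads the images.

```python
import os
import tempfile


def class_to_index(root):
    """Map each subdirectory of `root` (one per class) to an integer label,
    following the sorted-directory-name convention of ImageFolder."""
    classes = sorted(
        name for name in os.listdir(root) if os.path.isdir(os.path.join(root, name))
    )
    return {name: index for index, name in enumerate(classes)}


# Build a throwaway layout: root/<class_name>/<image file>
with tempfile.TemporaryDirectory() as root:
    for name in ("dog", "cat"):
        os.makedirs(os.path.join(root, name))
        open(os.path.join(root, name, "001.png"), "wb").close()
    mapping = class_to_index(root)

print(mapping)  # {'cat': 0, 'dog': 1}
```

With that layout, a call along the lines of `SingleClassImageDataGenerator.from_image_folder(root, loader_kwargs={'batch_size': 32}, transform=ToTensor())` would, per the parameters above, forward `transform` to ImageFolder and `batch_size` to the DataLoader.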
classmethod from_torchvision(batch_size=500, data_location=None, dataset_class=torchvision.datasets.MNIST, fraction=1.0, num_workers=0, pin_memory=True, shuffle=True, train=True, transform=ToTensor())

Create a DataGenerator from a torchvision dataset class.

Parameters:
  • batch_size (int) – how many samples each yielded batch of inputs and targets contains. See DataLoader for details.
  • data_location (path) – where the downloaded dataset should be stored. If None, a system-dependent temporary location is used.
  • dataset_class (class) – a torchvision dataset class that supports the constructor arguments {‘root’, ‘train’, ‘download’, ‘transform’}. For example, MNIST, FashionMNIST, CIFAR10, or CIFAR100.
  • fraction (float) – the fraction of the original dataset’s data to use, in the range (0.0, 1.0].
  • num_workers (int) – how many subprocesses to use for data loading. See DataLoader for details.
  • pin_memory (bool) – if True, the data loader will copy tensors into CUDA pinned memory before returning them. See DataLoader for details.
  • shuffle (bool) – set to True to have the data reshuffled at every epoch. See DataLoader for details.
  • train (bool) – if True, creates dataset from training set, otherwise creates from test set.
  • transform (callable) – a function/transform that takes in a PIL image and returns a transformed version.
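The `fraction` parameter only makes sense in the half-open range (0.0, 1.0]. The hypothetical helper below sketches how such a value could be validated and turned into a sample count; rounding down while keeping at least one sample is an assumption of this sketch, and sconce may instead select a random subset of a different size.

```python
import math


def fraction_size(num_samples, fraction):
    """Hypothetical: how many samples a `fraction` in (0.0, 1.0] keeps."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0.0, 1.0]")
    # Assumption: round down, but always keep at least one sample.
    return max(1, math.floor(num_samples * fraction))


print(fraction_size(60000, 1.0))  # 60000
print(fraction_size(60000, 0.5))  # 30000
```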
get_class_df()

Return a pandas dataframe that contains the classes in the dataset.

get_image_size_df()

Return a pandas dataframe that contains the image sizes in the dataset (before transforms).

num_channels

The number of image channels, based on looking at the first image in the dataset.

num_samples

the length of the wrapped data_loader’s Dataset

plot_class_summary(**kwargs)

Generate a bar chart showing how many images there are in each class.
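The statistic behind such a bar chart is a per-class tally of targets. The sketch below computes it with `collections.Counter` over a hypothetical list of `(image, class_index)` pairs; `class_counts` is an illustration of the numbers being plotted, not sconce's code.

```python
from collections import Counter


def class_counts(samples, class_names):
    """Hypothetical: count images per class from (image, class_index) pairs."""
    counts = Counter(target for _, target in samples)
    # Report every class, including ones with zero images.
    return {name: counts.get(index, 0) for index, name in enumerate(class_names)}


samples = [("img0", 0), ("img1", 1), ("img2", 1), ("img3", 1)]
print(class_counts(samples, ["cat", "dog", "bird"]))  # {'cat': 1, 'dog': 3, 'bird': 0}
```

Listing classes with zero images explicitly is what makes missing or under-represented classes visible in a summary plot.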

plot_image_size_summary()

Generate a scatter plot showing the sizes of the images in the dataset.

plot_transforms(index, num_samples=5, num_cols=5, figure_width=15, image_height=3, return_fig=False)

Plot the same image from this DataGenerator multiple times to see how the transforms affect it.

Parameters:
  • index (int) – the index of the image to plot.
  • num_samples (int, optional) – the number of times to plot the image (one original plus num_samples - 1 transformed variations).
  • num_cols (int) – the number of columns in the plot grid.
  • figure_width (float) – the width of the whole figure, in matplotlib inches.
  • image_height (float) – the height of a single image, in matplotlib inches.
  • return_fig (bool) – if True, return the generated matplotlib figure.

New in 0.10.3
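The layout parameters of `plot_transforms` combine simply if, as assumed here, the images fill the grid row by row and the figure height scales with the number of rows. The hypothetical helper `transform_grid` below computes the row count and a `(width, height)` pair of the kind matplotlib accepts as `figsize`; the defaults match the documented signature, but the formula is an assumption, not sconce's code.

```python
import math


def transform_grid(num_samples=5, num_cols=5, figure_width=15, image_height=3):
    """Hypothetical: grid shape and figure size for a plot_transforms-style grid."""
    num_rows = math.ceil(num_samples / num_cols)
    # Assumption: total figure height grows with the number of rows.
    figsize = (figure_width, num_rows * image_height)
    return num_rows, figsize


print(transform_grid())                            # (1, (15, 3))
print(transform_grid(num_samples=12, num_cols=5))  # (3, (15, 9))
```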

reset()

Start iterating through the data_loader from the beginning.

MultiClassImageDataGenerator

class sconce.data_generators.MultiClassImageDataGenerator(data_loader)

Bases: sconce.data_generators.base.DataGenerator, sconce.data_generators.image_mixin.ImageMixin

An ImageDataGenerator class for use when each image may belong to more than one class.

New in 0.10.0
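When an image can belong to several classes, a common target encoding is a multi-hot vector with one slot per class, set to 1 for every class the image belongs to. The sketch below builds such vectors in plain Python; whether MultiClassImageDataGenerator encodes its targets exactly this way is an assumption, and `multi_hot` is a hypothetical helper.

```python
def multi_hot(label_indices, num_classes):
    """Hypothetical: encode a collection of class indices as a 0/1 vector."""
    vector = [0] * num_classes
    for index in label_indices:
        if not 0 <= index < num_classes:
            raise IndexError(f"class index {index} out of range")
        vector[index] = 1
    return vector


print(multi_hot({0, 2}, 4))  # [1, 0, 1, 0]
print(multi_hot(set(), 4))   # [0, 0, 0, 0]
```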

batch_size

the wrapped data_loader’s batch_size

cuda(device=None)

Put the inputs and targets (yielded by this DataGenerator) on the specified device.

Parameters: device (int or bool or dict) – if int or bool, sets the behavior for both inputs and targets. To set them individually, pass a dictionary with keys {‘inputs’, ‘targets’} instead. See torch.Tensor.cuda() for details.

Example

>>> g = DataGenerator.from_dataset(dataset, batch_size=100)
>>> g.cuda()
>>> g.next()
(Tensor containing:
 [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing:
 [torch.cuda.LongTensor of size 100 (GPU 0)])
>>> g.cuda(False)
>>> g.next()
(Tensor containing:
 [torch.FloatTensor of size 100x1x28x28],
 Tensor containing:
 [torch.LongTensor of size 100])
>>> g.cuda(device={'inputs':0, 'targets':1})
>>> g.next()
(Tensor containing:
 [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing:
 [torch.cuda.LongTensor of size 100 (GPU 1)])
dataset

the wrapped data_loader’s Dataset

classmethod from_dataset(dataset, split=None, **kwargs)

Create a DataGenerator from an instantiated dataset.

Parameters:
  • dataset (Dataset) – the pytorch dataset.
  • split (float, optional) – if not None, the fraction of the dataset to place in the first of two DataGenerators; the remaining data goes into the second. Both DataGenerators are returned.
  • **kwargs – passed directly to the DataLoader constructor.
get_class_df()

Return a pandas dataframe that contains the classes in the dataset.

get_image_size_df()

Return a pandas dataframe that contains the image sizes in the dataset (before transforms).

num_channels

The number of image channels, based on looking at the first image in the dataset.

num_samples

the length of the wrapped data_loader’s Dataset

plot_class_summary(**kwargs)

Generate a bar chart showing how many images there are in each class.

plot_image_size_summary()

Generate a scatter plot showing the sizes of the images in the dataset.

plot_transforms(index, num_samples=5, num_cols=5, figure_width=15, image_height=3, return_fig=False)

Plot the same image from this DataGenerator multiple times to see how the transforms affect it.

Parameters:
  • index (int) – the index of the image to plot.
  • num_samples (int, optional) – the number of times to plot the image (one original plus num_samples - 1 transformed variations).
  • num_cols (int) – the number of columns in the plot grid.
  • figure_width (float) – the width of the whole figure, in matplotlib inches.
  • image_height (float) – the height of a single image, in matplotlib inches.
  • return_fig (bool) – if True, return the generated matplotlib figure.

New in 0.10.3

reset()

Start iterating through the data_loader from the beginning.