Data Feeds¶
DataFeed¶
class sconce.data_feeds.DataFeed(data_loader)

A thin wrapper around a DataLoader that automatically yields tuples of torch.Tensor (living on the CPU or on a CUDA device). A DataFeed will iterate endlessly. Like the underlying DataLoader, a DataFeed's __next__ method yields two values, which we refer to as the inputs and the targets.

Parameters:
- data_loader (DataLoader) – the wrapped data_loader.

batch_size
The wrapped data_loader's batch_size.
cuda(device=None)

Put the inputs and targets (yielded by this DataFeed) on the specified device.

Parameters:
- device (int or bool or dict) – if int or bool, sets the behavior for both inputs and targets. To set them individually, pass a dictionary with keys {'inputs', 'targets'} instead. See torch.Tensor.cuda() for details.

Example:

>>> g = DataFeed.from_dataset(dataset, batch_size=100)
>>> g.cuda()
>>> g.next()
(Tensor containing: [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing: [torch.cuda.LongTensor of size 100 (GPU 0)])
>>> g.cuda(False)
>>> g.next()
(Tensor containing: [torch.FloatTensor of size 100x1x28x28],
 Tensor containing: [torch.LongTensor of size 100])
>>> g.cuda(device={'inputs': 0, 'targets': 1})
>>> g.next()
(Tensor containing: [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing: [torch.cuda.LongTensor of size 100 (GPU 1)])
classmethod from_dataset(dataset, split=None, **kwargs)

Create a DataFeed from an instantiated dataset.

Parameters:
- dataset (Dataset) – the pytorch dataset.
- split (float, optional) – if not None, it specifies the fraction of the dataset that should be placed into the first of two data_feeds. The remaining data is used for the second data_feed. Both data_feeds will be returned.
- **kwargs – passed directly to the DataLoader constructor.
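A brief usage sketch (the MNIST dataset and path are illustrative, not part of the API):

>>> from torchvision import datasets, transforms
>>> from sconce.data_feeds import DataFeed
>>> dataset = datasets.MNIST('/tmp/mnist', download=True, transform=transforms.ToTensor())
>>> feed = DataFeed.from_dataset(dataset, batch_size=100)
>>> # With split, 80% of the samples back the first feed and 20% the second:
>>> train_feed, val_feed = DataFeed.from_dataset(dataset, split=0.8, batch_size=100)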
split(split_factor, validation_transform=None, **kwargs)

Create a training and validation DataFeed from this one.

Parameters:
- split_factor (float) – [0.0, 1.0] the fraction of the dataset that should be put into the new training feed.
- validation_transform (callable) – override the existing validation transform with this.
- **kwargs – passed directly to the DataLoader constructor.

Returns: training_feed, validation_feed
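Continuing the sketch above, splitting an existing feed (the split factor and validation transform are illustrative):

>>> from torchvision import transforms
>>> # 90% of the data backs the training feed; the rest backs the validation feed,
>>> # which gets a plain ToTensor() transform instead of any training augmentation.
>>> training_feed, validation_feed = feed.split(0.9, validation_transform=transforms.ToTensor())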
ImageFeed¶
SingleClassImageFeed¶
class sconce.data_feeds.SingleClassImageFeed(data_loader)

Bases: sconce.data_feeds.image.ImageFeed

An ImageFeed class for use when each image belongs to exactly one class.

batch_size
The wrapped data_loader's batch_size.
cuda(device=None)

Put the inputs and targets (yielded by this DataFeed) on the specified device.

Parameters:
- device (int or bool or dict) – if int or bool, sets the behavior for both inputs and targets. To set them individually, pass a dictionary with keys {'inputs', 'targets'} instead. See torch.Tensor.cuda() for details.

Example:

>>> g = DataFeed.from_dataset(dataset, batch_size=100)
>>> g.cuda()
>>> g.next()
(Tensor containing: [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing: [torch.cuda.LongTensor of size 100 (GPU 0)])
>>> g.cuda(False)
>>> g.next()
(Tensor containing: [torch.FloatTensor of size 100x1x28x28],
 Tensor containing: [torch.LongTensor of size 100])
>>> g.cuda(device={'inputs': 0, 'targets': 1})
>>> g.next()
(Tensor containing: [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing: [torch.cuda.LongTensor of size 100 (GPU 1)])
classmethod from_dataset(dataset, split=None, **kwargs)

Create a DataFeed from an instantiated dataset.

Parameters:
- dataset (Dataset) – the pytorch dataset.
- split (float, optional) – if not None, it specifies the fraction of the dataset that should be placed into the first of two data_feeds. The remaining data is used for the second data_feed. Both data_feeds will be returned.
- **kwargs – passed directly to the DataLoader constructor.
classmethod from_image_folder(root, loader_kwargs=None, **dataset_kwargs)

Create a DataFeed from a folder of images. See torchvision.datasets.ImageFolder.

Parameters:
- root (path) – the root directory path.
- loader_kwargs (dict) – keyword args provided to the DataLoader constructor.
- **dataset_kwargs – keyword args provided to the torchvision.datasets.ImageFolder constructor.
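A sketch, assuming a directory laid out the way torchvision.datasets.ImageFolder expects (one sub-directory per class); the path, loader options, and transform are illustrative:

>>> from torchvision import transforms
>>> from sconce.data_feeds import SingleClassImageFeed
>>> # dataset_kwargs (here, transform) go to ImageFolder; loader_kwargs go to the DataLoader.
>>> feed = SingleClassImageFeed.from_image_folder(
...     root='/data/my_images',
...     loader_kwargs={'batch_size': 32, 'shuffle': True},
...     transform=transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()]))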
classmethod from_torchvision(batch_size=500, data_location=None, dataset_class=torchvision.datasets.MNIST, fraction=1.0, num_workers=0, pin_memory=True, shuffle=True, train=True, transform=ToTensor())

Create a DataFeed from a torchvision dataset class.

Parameters:
- batch_size (int) – how large the yielded inputs and targets should be. See DataLoader for details.
- data_location (path) – where the downloaded dataset should be stored. If None, a system-dependent temporary location will be used.
- dataset_class (class) – a torchvision dataset class that supports the constructor arguments {'root', 'train', 'download', 'transform'}. For example, MNIST, FashionMNIST, CIFAR10, or CIFAR100.
- fraction (float) – (0.0, 1.0] how much of the original dataset's data to use.
- num_workers (int) – how many subprocesses to use for data loading. See DataLoader for details.
- pin_memory (bool) – if True, the data loader will copy tensors into CUDA pinned memory before returning them. See DataLoader for details.
- shuffle (bool) – set to True to have the data reshuffled at every epoch. See DataLoader for details.
- train (bool) – if True, creates the dataset from the training set, otherwise from the test set.
- transform (callable) – a function/transform that takes in a PIL image and returns a transformed version.
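A sketch using FashionMNIST in place of the default MNIST (the batch size and storage location are illustrative):

>>> from torchvision import datasets
>>> from sconce.data_feeds import SingleClassImageFeed
>>> train_feed = SingleClassImageFeed.from_torchvision(
...     dataset_class=datasets.FashionMNIST,
...     batch_size=128,
...     data_location='/tmp/fashion-mnist')
>>> test_feed = SingleClassImageFeed.from_torchvision(
...     dataset_class=datasets.FashionMNIST,
...     batch_size=128,
...     data_location='/tmp/fashion-mnist',
...     train=False,
...     shuffle=False)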
get_class_df()

Return a pandas dataframe that contains the classes in the dataset.

get_image_size_df()

Return a pandas dataframe that contains the image sizes in the dataset (before transforms).

num_channels

The number of image channels, based on looking at the first image in the dataset.

plot_class_summary(**kwargs)

Generate a bar chart showing how many images of each class there are.

plot_image_size_summary()

Generate a scatter plot showing the sizes of the images in the dataset.
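The inspection helpers above can be combined to sanity-check a feed; a sketch, continuing with a SingleClassImageFeed such as the one created earlier (exact dataframe columns are not spelled out here):

>>> class_df = feed.get_class_df()        # pandas DataFrame describing the classes
>>> size_df = feed.get_image_size_df()    # pandas DataFrame of image sizes (before transforms)
>>> n_channels = feed.num_channels        # inferred from the first image in the dataset
>>> feed.plot_class_summary()             # bar chart of images per class
>>> feed.plot_image_size_summary()        # scatter plot of image sizes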
plot_transforms(index, num_samples=5, num_cols=5, figure_width=15, image_height=3, return_fig=False)

Plot the same image from this DataFeed multiple times to see how the transforms affect it.

Parameters:
- index (int) – the index of the image to plot.
- num_samples (int, optional) – the number of times to plot the image (1 original, n - 1 transformed variations).
- num_cols (int) – the number of columns in the plot grid, one image per column.
- figure_width (float) – the width, in matplotlib inches, of the whole figure.
- image_height (float) – the height, in matplotlib inches, of a single image.
- return_fig (bool) – return the generated matplotlib figure or not.
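For example, to eyeball the augmentation pipeline on a single sample (the index and counts are arbitrary):

>>> fig = feed.plot_transforms(index=0, num_samples=8, num_cols=4, return_fig=True)
>>> fig.savefig('transform_check.png')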
reset()

Start iterating through the data_loader from the beginning.

split(split_factor, validation_transform=None, **kwargs)

Create a training and validation DataFeed from this one.

Parameters:
- split_factor (float) – [0.0, 1.0] the fraction of the dataset that should be put into the new training feed.
- validation_transform (callable) – override the existing validation transform with this.
- **kwargs – passed directly to the DataLoader constructor.

Returns: training_feed, validation_feed
MultiClassImageFeed¶
class sconce.data_feeds.MultiClassImageFeed(data_loader)

Bases: sconce.data_feeds.image.ImageFeed

An ImageFeed class for use when each image may belong to more than one class.

batch_size
The wrapped data_loader's batch_size.
cuda(device=None)

Put the inputs and targets (yielded by this DataFeed) on the specified device.

Parameters:
- device (int or bool or dict) – if int or bool, sets the behavior for both inputs and targets. To set them individually, pass a dictionary with keys {'inputs', 'targets'} instead. See torch.Tensor.cuda() for details.

Example:

>>> g = DataFeed.from_dataset(dataset, batch_size=100)
>>> g.cuda()
>>> g.next()
(Tensor containing: [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing: [torch.cuda.LongTensor of size 100 (GPU 0)])
>>> g.cuda(False)
>>> g.next()
(Tensor containing: [torch.FloatTensor of size 100x1x28x28],
 Tensor containing: [torch.LongTensor of size 100])
>>> g.cuda(device={'inputs': 0, 'targets': 1})
>>> g.next()
(Tensor containing: [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing: [torch.cuda.LongTensor of size 100 (GPU 1)])
classmethod from_dataset(dataset, split=None, **kwargs)

Create a DataFeed from an instantiated dataset.

Parameters:
- dataset (Dataset) – the pytorch dataset.
- split (float, optional) – if not None, it specifies the fraction of the dataset that should be placed into the first of two data_feeds. The remaining data is used for the second data_feed. Both data_feeds will be returned.
- **kwargs – passed directly to the DataLoader constructor.
get_class_df()

Return a pandas dataframe that contains the classes in the dataset.

get_image_size_df()

Return a pandas dataframe that contains the image sizes in the dataset (before transforms).

num_channels

The number of image channels, based on looking at the first image in the dataset.

plot_class_summary(**kwargs)

Generate a bar chart showing how many images of each class there are.

plot_image_size_summary()

Generate a scatter plot showing the sizes of the images in the dataset.
plot_transforms(index, num_samples=5, num_cols=5, figure_width=15, image_height=3, return_fig=False)

Plot the same image from this DataFeed multiple times to see how the transforms affect it.

Parameters:
- index (int) – the index of the image to plot.
- num_samples (int, optional) – the number of times to plot the image (1 original, n - 1 transformed variations).
- num_cols (int) – the number of columns in the plot grid, one image per column.
- figure_width (float) – the width, in matplotlib inches, of the whole figure.
- image_height (float) – the height, in matplotlib inches, of a single image.
- return_fig (bool) – return the generated matplotlib figure or not.
reset()

Start iterating through the data_loader from the beginning.

split(split_factor, validation_transform=None, **kwargs)

Create a training and validation DataFeed from this one.

Parameters:
- split_factor (float) – [0.0, 1.0] the fraction of the dataset that should be put into the new training feed.
- validation_transform (callable) – override the existing validation transform with this.
- **kwargs – passed directly to the DataLoader constructor.

Returns: training_feed, validation_feed