Stability	experimental
Portability	non-portable
Safe Haskell	None
Language	Haskell2010

Numeric.Dataloader

Description

A Dataloader extends a Dataset and is intended primarily for compute-intensive, batch-loading interfaces. When used with ImageFolder representations of Datasets, it shuffles the order in which files are loaded and uses the async library for concurrency where possible.

Concurrent loading happens mainly in batchStream. stream exists to provide a unified API for training that is not batch-oriented.

Synopsis

data Dataloader a b = Dataloader
  { batchSize :: Int
  , shuffle :: Maybe (Vector Int)
  , dataset :: Dataset a
  , transform :: a -> b
  }
uniformIxline :: Dataset a -> GenIO -> IO (Vector Int)
stream :: (MonadThrow io, MonadIO io) => Dataloader a b -> Stream (Of b) io ()
batchStream :: (MonadThrow io, MonadIO io, NFData b) => Dataloader a b -> Stream (Of [b]) io ()

Documentation

data Dataloader a b

Options for data-loading functions.

Constructors

Dataloader

Fields

batchSize :: Int
Batch size used with batchStream.
shuffle :: Maybe (Vector Int)
Optional shuffle order (forces the dataset to be loaded in memory if it wasn't already).
dataset :: Dataset a
Dataset associated with the dataloader.
transform :: a -> b
Transformation associated with the dataloader, which will be run in parallel. If you are using an ImageFolder, this is where you would transform image file paths into images (or another compute-optimized form). This is also where any static normalization should be performed.
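As an illustration of the transform field and static normalization, the sketch below mirrors the record with hypothetical stand-ins: a plain list replaces Dataset a, a list of indices replaces Vector Int, and each dataset element is a raw pixel row of Int values from 0 to 255. The Loader type, the normalize and pixelLoader helpers, and the mean and standard deviation of 0.5 are all made up for illustration; they are not part of the library.

```haskell
-- Hypothetical stand-in for Dataloader: a list replaces `Dataset a`
-- and a list of indices replaces `Vector Int`.
data Loader a b = Loader
  { batchSize :: Int
  , shuffle   :: Maybe [Int]
  , dataset   :: [a]
  , transform :: a -> b
  }

-- Static normalization: scale bytes to [0, 1], then standardize
-- with a fixed (illustrative) mean and standard deviation.
normalize :: Double -> Double -> [Int] -> [Double]
normalize mean std = map (\px -> (fromIntegral px / 255 - mean) / std)

-- A loader whose transform maps raw pixel rows to normalized rows.
pixelLoader :: [[Int]] -> Loader [Int] [Double]
pixelLoader rows = Loader
  { batchSize = 32
  , shuffle   = Nothing
  , dataset   = rows
  , transform = normalize 0.5 0.5
  }
```

Because the constants are fixed ahead of time, the transform is a pure per-element function, which is what makes it safe to run in parallel.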

uniformIxline :: Dataset a -> GenIO -> IO (Vector Int)

Generate a uniformly random index line from a dataset and a generator.
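Assuming the index line is a random ordering of the dataset's indices (it is used as the shuffle order), one way to build it is the original Fisher-Yates procedure: repeatedly pick a uniformly random position among the remaining indices and emit that index. The sketch below is not the library's implementation. It replaces GenIO with a hypothetical toy linear congruential generator (Seed, nextInt), whose modulo step adds a small bias that a real generator would avoid; indexLine is likewise a hypothetical name. For real use, mwc-random's System.Random.MWC.Distributions module provides a uniformPermutation function.

```haskell
-- Toy generator state (hypothetical; stands in for mwc-random's GenIO).
newtype Seed = Seed Int

-- Draw an Int in [0, bound) using a linear congruential step.
-- Int arithmetic wraps on overflow in GHC; assumes a 64-bit Int.
nextInt :: Int -> Seed -> (Int, Seed)
nextInt bound (Seed s) = (r, Seed s')
  where
    s' = 6364136223846793005 * s + 1442695040888963407
    -- Discard the low bits, which cycle quickly in an LCG.
    r = (s' `div` 65536) `mod` bound

-- Random ordering of [0 .. n-1]: pick a position among the remaining
-- indices, emit the index found there, and continue with the rest.
indexLine :: Int -> Seed -> ([Int], Seed)
indexLine n = go [0 .. n - 1]
  where
    go [] seed = ([], seed)
    go xs seed =
      case splitAt i xs of
        (before, x : after) ->
          let (rest, seed'') = go (before ++ after) seed'
          in (x : rest, seed'')
        (_, []) -> ([], seed') -- unreachable: 0 <= i < length xs
      where
        (i, seed') = nextInt (length xs) seed
```

In GHCi, fst (indexLine 5 (Seed 7)) yields some ordering of [0,1,2,3,4]. The list-based removal is O(n^2); it is meant to show the procedure, not to be fast.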

stream :: (MonadThrow io, MonadIO io) => Dataloader a b -> Stream (Of b) io ()

Stream a dataset in-memory, applying a transformation function.
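The per-element behavior of stream can be pictured with a lazy list standing in for Stream (Of b) io () from the streaming package. This is a simplified sketch under stated assumptions: a list replaces Dataset a, the optional shuffle order is a list of indices into it, and the hypothetical streamSketch looks each index up and applies transform. The real function runs in a monad and may throw, which this pure version ignores.

```haskell
-- Hypothetical stand-in for Dataloader (lists replace Dataset and Vector).
data Loader a b = Loader
  { batchSize :: Int
  , shuffle   :: Maybe [Int]
  , dataset   :: [a]
  , transform :: a -> b
  }

-- Visit elements in shuffle order (or dataset order when there is no
-- shuffle) and apply the transform to each one, lazily.
streamSketch :: Loader a b -> [b]
streamSketch loader = map (transform loader) (ordered loader)

-- Reorder the dataset by the shuffle indices, if any.
-- List indexing is O(n); a vector would make each lookup O(1).
ordered :: Loader a b -> [a]
ordered loader = case shuffle loader of
  Nothing  -> dataset loader
  Just ixs -> map (dataset loader !!) ixs
```

For example, a loader over "abc" with shuffle order Just [2, 0, 1] and fromEnum as its transform yields the codes for 'c', 'a', 'b' in that order.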

batchStream :: (MonadThrow io, MonadIO io, NFData b) => Dataloader a b -> Stream (Of [b]) io ()

Stream batches of a dataset, processing each element concurrently.
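A rough model of the batched, concurrent behavior: split the elements into chunks of batchSize, evaluate the transform for every element of a chunk on its own thread, and emit the chunk once all of its threads finish. This sketch differs from the library in several labeled ways: it uses only base (forkIO and MVar) rather than the async library, returns a list of batches instead of a Stream (Of [b]) io (), and uses evaluate, which forces results only to weak head normal form, whereas the NFData b constraint in the real signature points to full evaluation. chunksOf, tryEval, concurrentMap and batchesSketch are hypothetical helpers.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, evaluate, throwIO, try)

-- Split a list into chunks of at most n elements (n must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Evaluate a value to WHNF, capturing any exception it throws.
tryEval :: b -> IO (Either SomeException b)
tryEval = try . evaluate

-- Run f on every element in its own thread, wait for all results in
-- input order, and re-throw the first failure in that order.
concurrentMap :: (a -> b) -> [a] -> IO [b]
concurrentMap f xs = do
  vars <- mapM spawn xs
  results <- mapM takeMVar vars
  mapM (either throwIO pure) results
  where
    spawn x = do
      var <- newEmptyMVar
      _ <- forkIO (tryEval (f x) >>= putMVar var)
      pure var

-- Batch the input, transforming each batch's elements concurrently.
batchesSketch :: Int -> (a -> b) -> [a] -> IO [[b]]
batchesSketch size f = mapM (concurrentMap f) . chunksOf size
```

For instance, batchesSketch 2 (* 10) [1 .. 5] returns [[10,20],[30,40],[50]]. Unlike a stream, this version builds every batch before returning; a streaming version would yield each batch as soon as it is ready.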

NOTE: Compile with -threaded -rtsopts to load data concurrently in memory, and run with +RTS -N to spread the work across multiple cores.
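To confirm that a program was built and launched as the note describes, you can query the runtime from main. rtsSupportsBoundThreads is True only when the program was linked with -threaded, and getNumCapabilities reports how many capabilities the RTS uses to run Haskell code in parallel. That count stays at 1 unless the program is started with an option such as +RTS -N, which -rtsopts permits on the command line. The helper name runtimeReport is hypothetical.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- Describe whether the RTS is threaded and how many capabilities it has.
runtimeReport :: IO String
runtimeReport = do
  caps <- getNumCapabilities
  pure $ "threaded RTS: " ++ show rtsSupportsBoundThreads
      ++ ", capabilities: " ++ show caps
```

Built with ghc -threaded -rtsopts and started with +RTS -N4, the report should show True and 4 capabilities; without -threaded it shows False.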