datasets-0.4.0: Classical data sets for statistics and machine learning

Stability: experimental
Portability: non-portable
Safe Haskell: None
Language: Haskell2010

Numeric.Dataloader

Description

A Dataloader is an extension of a Dataset and is primarily intended for compute-intensive, batch loading interfaces. When used with ImageFolder representations of Datasets, it shuffles the order of files to be loaded and leverages the async library when possible.

Concurrent loading takes place mainly in batchStream. stream exists to provide a unified API for training that is not batch-oriented.

Documentation

data Dataloader a b

Options for data loading functions.

Constructors

Dataloader 

Fields

  • batchSize :: Int

    Batch size used with batchStream.

  • shuffle :: Maybe (Vector Int)

    Optional shuffle order (forces the dataset to be loaded in memory if it wasn't already).

  • dataset :: Dataset a

    Dataset associated with the dataloader.

  • transform :: a -> b

    Transformation associated with the dataloader which will be run in parallel. If using an ImageFolder, this is where you would transform image filepaths to an image (or other compute-optimized form). Additionally, this is where you should perform any static normalization.
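
As an illustration of how the fields fit together, here is a minimal sketch that builds a Dataloader from any Dataset and a per-element function. The helper name mkLoader and the batch size of 32 are hypothetical choices for this example, and the sketch assumes that the Dataset type is importable from Numeric.Datasets.

```haskell
module LoaderSketch where

import Numeric.Dataloader (Dataloader(..))
import Numeric.Datasets (Dataset)  -- assumption: Dataset is exported here

-- | Hypothetical helper: wrap a dataset and a transformation in a
-- Dataloader, with no shuffling and an arbitrary batch size.
mkLoader :: Dataset a -> (a -> b) -> Dataloader a b
mkLoader ds f = Dataloader
  { batchSize = 32       -- only consulted by batchStream
  , shuffle   = Nothing  -- keep the dataset's own order
  , dataset   = ds
  , transform = f        -- e.g. decode a file path, then normalize
  }
```

Because transform is an ordinary function a -> b, static normalization can be composed into it, for example mkLoader ds (normalize . decode) for hypothetical functions decode and normalize.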

uniformIxline :: Dataset a -> GenIO -> IO (Vector Int)

Generate a uniformly random index line from a dataset and a generator.
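
One plausible use of the index line is to fill the shuffle field of an existing Dataloader. The sketch below assumes the mwc-random package for createSystemRandom; the helper name withShuffle is hypothetical.

```haskell
module ShuffleSketch where

import Numeric.Dataloader (Dataloader(..), uniformIxline)
import System.Random.MWC (createSystemRandom)

-- | Hypothetical helper: draw a random index line for the loader's
-- own dataset and store it as the shuffle order.
withShuffle :: Dataloader a b -> IO (Dataloader a b)
withShuffle dl = do
  gen <- createSystemRandom              -- system-seeded GenIO
  ixl <- uniformIxline (dataset dl) gen  -- indices over the dataset
  pure dl { shuffle = Just ixl }
```

As the field documentation for shuffle notes, setting a shuffle order forces the dataset to be loaded in memory.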

stream :: (MonadThrow io, MonadIO io) => Dataloader a b -> Stream (Of b) io ()

Stream a dataset in-memory, applying a transformation function.
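
A stream can be consumed with the usual combinators from the streaming package. This sketch, which assumes only that the transformed elements have a Show instance, prints each element in turn; the function name printAll is hypothetical.

```haskell
module StreamSketch where

import Numeric.Dataloader (Dataloader, stream)
import qualified Streaming.Prelude as S

-- | Hypothetical consumer: print every transformed element,
-- one at a time, running the stream in IO.
printAll :: Show b => Dataloader a b -> IO ()
printAll dl = S.print (stream dl)
```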

batchStream :: (MonadThrow io, MonadIO io, NFData b) => Dataloader a b -> Stream (Of [b]) io ()

Stream batches of a dataset, concurrently processing each element.

NOTE: Compile with -threaded -rtsopts (and run with +RTS -N to use multiple cores) to concurrently load data in-memory.
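
A minimal sketch of consuming batches, assuming the streaming package for the consumer side. The function name batchLengths is hypothetical; it collects the size of each batch, which can be a quick check that batchSize is being honored. The GHC invocation in the comment is one way to enable the threaded runtime, with Main.hs as a placeholder file name.

```haskell
module BatchSketch where

-- Build and run with the threaded runtime, for example:
--   ghc -threaded -rtsopts Main.hs
--   ./Main +RTS -N

import Control.DeepSeq (NFData)
import Numeric.Dataloader (Dataloader, batchStream)
import qualified Streaming.Prelude as S

-- | Hypothetical consumer: run the batch stream in IO and return
-- the number of elements in each batch.
batchLengths :: NFData b => Dataloader a b -> IO [Int]
batchLengths dl = S.toList_ (S.map length (batchStream dl))
```

The NFData constraint comes from the signature of batchStream, so the transformed type b needs an NFData instance.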