mnist-idx-conduit-0.4.0.0: conduit utilities for MNIST IDX files
Safe Haskell: None
Language: Haskell2010

Data.IDX.Conduit

Description

Streaming (de)serialization and encode-decode functions for the IDX format used in the MNIST handwritten digit recognition dataset [1].

Both sparse and dense decoders are provided. In either case, the decoded values keep the range of the raw data (one unsigned byte per pixel, i.e. 0-255).

Links

[1] http://yann.lecun.com/exdb/mnist/

Source

Labels

sourceIdxLabels

Arguments

:: MonadResource m 
=> (ByteString -> Either e o)

parser for the labels, where the bytestring buffer contains exactly one unsigned byte

-> FilePath

filepath of uncompressed IDX labels file

-> Maybe Int

optional maximum number of entries to retrieve

-> ConduitT () (Either e o) m r 

Outputs the labels corresponding to the data

mnistLabels :: ByteString -> Either String Int

Parser for the labels; it can be passed as the first argument to sourceIdxLabels.
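As an illustration of the parser shape expected by sourceIdxLabels, here is a minimal sketch of a custom label parser. The name parseDigit and its error messages are hypothetical and not part of the library; the only assumption taken from the documentation is that the buffer holds exactly one unsigned byte.

```haskell
import qualified Data.ByteString as BS

-- | Hypothetical label parser (not part of the library): accepts a
-- one-byte buffer holding a digit label in the range 0-9.
parseDigit :: BS.ByteString -> Either String Int
parseDigit bs = case BS.unpack bs of
  [w] | w <= 9    -> Right (fromIntegral w)
      | otherwise -> Left ("label out of range: " ++ show w)
  ws -> Left ("expected exactly 1 byte, got " ++ show (length ws))
```

A function of this type should be usable wherever mnistLabels is, with e fixed to String and o fixed to Int.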

Data

Dense

sourceIdx

Arguments

:: MonadResource m 
=> FilePath

filepath of uncompressed IDX data file

-> Maybe Int

optional maximum number of entries to retrieve

-> ConduitT () (Vector Word8) m () 

Outputs dense data buffers in the 0-255 range

In the case of the MNIST dataset, 0 corresponds to the background of the image.
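Since the dense buffers carry raw bytes, downstream code often rescales them. The following is a minimal sketch of such a rescaling step, using plain lists instead of vectors to stay dependency-free; toUnit and normalize are hypothetical helper names, not library functions.

```haskell
import Data.Word (Word8)

-- | Rescale one raw pixel (0-255) to the unit interval [0, 1].
-- With MNIST, 0 (background) maps to 0.0 and 255 maps to 1.0.
toUnit :: Word8 -> Double
toUnit w = fromIntegral w / 255

-- | Rescale a whole image, modelled here as a plain list for simplicity.
normalize :: [Word8] -> [Double]
normalize = map toUnit
```

The same mapping could be applied to each vector produced by sourceIdx, using the map function of whichever vector module the library exports.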

Sparse

sourceIdxSparse

Arguments

:: MonadResource m 
=> FilePath

filepath of uncompressed IDX data file

-> Maybe Int

optional maximum number of entries to retrieve

-> ConduitT () (Sparse Word8) m () 

Outputs sparse data buffers (i.e. without zero components)

This incurs at least one additional data copy of each vector, but the resulting vectors take up less space.
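To make the sparse representation concrete, the sketch below drops the zero entries of a dense buffer and keeps each remaining value together with its linear index. It uses plain lists and a hypothetical function name (sparsify); the library's own implementation may differ.

```haskell
import Data.Word (Word8)

-- | Keep only the nonzero entries of a dense buffer, each paired with
-- its linear (0-based) index. Illustrative only; not the library's code.
sparsify :: [Word8] -> [(Int, Word8)]
sparsify xs = [ (i, x) | (i, x) <- zip [0 ..] xs, x /= 0 ]
```

For example, sparsify [0, 0, 5, 0, 9] keeps two of the five entries, which is where the space saving comes from on mostly-background images.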

Sink

Data

Dense

sinkIdx

Arguments

:: (MonadResource m, Foldable t) 
=> FilePath

file to write

-> Int

number of data items that will be written

-> t Word32

data dimension sizes

-> ConduitT (Vector Word8) Void m () 

Warning: this currently writes an incomplete header, which causes the decoder to split the data items at the wrong length. Do not use until https://github.com/ocramz/mnist-idx-conduit/issues/1 is resolved.

Write a dataset to disk

Contents are written as unsigned bytes, so make sure the input data fits in 8 bits without loss.
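One way to guard against silent wrap-around before data reaches the sink is a saturating conversion. This is a minimal sketch with a hypothetical helper name (clampToWord8); it is not provided by the library.

```haskell
import Data.Word (Word8)

-- | Saturating conversion: values below 0 become 0 and values above 255
-- become 255, instead of wrapping around modulo 256.
clampToWord8 :: Int -> Word8
clampToWord8 = fromIntegral . max 0 . min 255
```

By contrast, a bare fromIntegral wraps: fromIntegral (300 :: Int) :: Word8 evaluates to 44, whereas clampToWord8 300 evaluates to 255.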

Sparse

sinkIdxSparse

Arguments

:: (Foldable t, MonadResource m) 
=> FilePath

file to write

-> Int

number of data items that will be written

-> t Word32

data dimension sizes

-> ConduitT (Sparse Word8) Void m () 

Warning: this currently writes an incomplete header, which causes the decoder to split the data items at the wrong length. Do not use until https://github.com/ocramz/mnist-idx-conduit/issues/1 is resolved.

Write a sparse dataset to disk

Contents are written as unsigned bytes, so make sure the input data fits in 8 bits without loss.

Types

data Sparse a

Sparse buffer (containing only nonzero entries)

Instances

(Unbox a, Eq a) => Eq (Sparse a)

Defined in Data.IDX.Conduit

Methods

(==) :: Sparse a -> Sparse a -> Bool

(/=) :: Sparse a -> Sparse a -> Bool

(Show a, Unbox a) => Show (Sparse a)

Defined in Data.IDX.Conduit

Methods

showsPrec :: Int -> Sparse a -> ShowS

show :: Sparse a -> String

showList :: [Sparse a] -> ShowS

sBufSize :: Sparse a -> Int

total number of entries in the dense buffer, i.e. including the zeros

sNzComponents :: Sparse a -> Vector (Int, a)

nonzero components, together with the linear index into their dense counterpart
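The two accessors together carry enough information to rebuild the dense buffer. The sketch below shows the idea on plain lists; densify is a hypothetical helper, and it assumes the indices are 0-based linear positions, as the description above suggests.

```haskell
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- | Rebuild a dense buffer from its total size and its nonzero
-- (index, value) pairs; every position not listed is filled with 0.
-- Uses a linear lookup per position, which is fine for a sketch but
-- not tuned for large buffers.
densify :: Int -> [(Int, Word8)] -> [Word8]
densify n nz = [ fromMaybe 0 (lookup i nz) | i <- [0 .. n - 1] ]
```

With a value s of type Sparse Word8, the arguments would correspond to sBufSize s and the contents of sNzComponents s converted to a list.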

Debug

readHeader

Arguments

:: FilePath

path of IDX file

-> IO (IDXMagic, Int32, Vector Int32)

"magic number", number of data items, list of dimension sizes of each data item

Decode the header of an IDX data file and print out its contents
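For orientation, the IDX header as described on the MNIST page [1] starts with a 4-byte magic number (two zero bytes, a data type code where 0x08 means unsigned byte, and the number of dimensions), followed by one big-endian 32-bit size per dimension. The sketch below decodes that layout from an in-memory buffer. It is independent of readHeader: the names be32 and decodeHeader are hypothetical, and unlike readHeader it returns all dimension sizes in one list, so the number of data items appears as the first element.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)

-- | Combine bytes, most significant first, into a 32-bit word.
be32 :: [Word8] -> Word32
be32 = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- | Decode (type code, dimension sizes) from the start of an IDX buffer.
-- Hypothetical helper, not the library's 'readHeader'.
decodeHeader :: BS.ByteString -> Either String (Word8, [Word32])
decodeHeader bs = case BS.unpack (BS.take 4 bs) of
  [0, 0, ty, nd]
    | length dims == n && all ((== 4) . length) dims -> Right (ty, map be32 dims)
    | otherwise -> Left "truncated dimension list"
    where
      n    = fromIntegral nd
      dims = take n (chunks (BS.unpack (BS.drop 4 bs)))
  _ -> Left "bad magic number"
  where
    -- Split a byte list into groups of four (the last may be shorter).
    chunks [] = []
    chunks ws = take 4 ws : chunks (drop 4 ws)
```

Applied to the first 16 bytes of an image file with 60000 items of 28 x 28 pixels, this would yield the type code 8 and the sizes 60000, 28 and 28.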