reservoir-0.2.0.0: Unweighted reservoir sampling

Safe HaskellSafe
LanguageHaskell2010

System.Random.Reservoir

Description

Unweighted reservoir algorithm suitable for sampling from data streams of large or unknown size.

Synopsis

Documentation

data Res g a Source #

Wrapper for the state kept by algorithm R. Keeps the sample as an IntMap, the number of elements seen, and the random seed.

Constructors

Res 

Fields

  • resSample :: !(IntMap a)

    The current sample. Should usually not be modified by hand.

  • numSeen :: !Int

    The number of elements seen. Should almost never be modified by hand.

  • resSeed :: !g

    The random seed for the sample

Instances

Functor (Res g) Source # 

Methods

fmap :: (a -> b) -> Res g a -> Res g b #

(<$) :: a -> Res g b -> Res g a #

(Show g, Show a) => Show (Res g a) Source # 

Methods

showsPrec :: Int -> Res g a -> ShowS #

show :: Res g a -> String #

showList :: [Res g a] -> ShowS #

getSample :: Res g a -> [a] Source #

Extracts the sample as a list.

emptyRes :: RandomGen g => g -> Res g a Source #

Creates a Res with nothing in the sample and with the counter at zero.

reservoirSample Source #

Arguments

:: RandomGen g 
=> Int

The maximum number of elements to be in the sample.

-> a

The next element to be considered

-> Res g a

The current wrapped sample

-> Res g a 

Jeffrey Vitter's "Algorithm R". Given the elements in the existing sample and a new element, every element has a equal probability of being selected. Intended to be partially applied on the sample size and used with fold-like functions to sample large streams of data.