reservoir-0.1.0.1: Reservoir sampling algorithms

Safe HaskellNone
LanguageHaskell2010

System.Random.Reservoir

Contents

Description

  • Module : System.Random.Reservoir
  • Description : Reservoir Samplers
  • Copyright : (c) Mark Hay, 2018
  • License : BSD3
  • Maintainer : mah6@williams.edu
  • Stability : experimental
  • Reservoir algorithms suitable for sampling from data streams of large or unknown size.

Synopsis

Unweighted Sampling

data Res g a Source #

Wrapper for the state kept by algorithm R. Keeps the sample as an IntMap, the number of elements seen, and the random seed.

Constructors

Res 

Fields

  • resSample :: !(IntMap a)

    The current sample. Should usually not be modified by hand.

  • numSeen :: !Int

    The number of elements seen. Should almost never be modified by hand.

  • resSeed :: !g

    The random seed for the sample

Instances

Functor (Res g) Source # 

Methods

fmap :: (a -> b) -> Res g a -> Res g b #

(<$) :: a -> Res g b -> Res g a #

(Show g, Show a) => Show (Res g a) Source # 

Methods

showsPrec :: Int -> Res g a -> ShowS #

show :: Res g a -> String #

showList :: [Res g a] -> ShowS #

getSample :: Res g a -> [a] Source #

Extracts the sample as a list.

emptyRes :: RandomGen g => g -> Res g a Source #

Creates a Res with nothing in the sample and with the counter at zero.

emptyRes' :: Word64 -> Res PureMT a Source #

A version of emptyRes specialized to System.Random.Mersenne.Pure64.

reservoirSample Source #

Arguments

:: RandomGen g 
=> Int

The maximum number of elements to be in the sample.

-> a

The next element to be considered

-> Res g a

The current wrapped sample

-> Res g a 

Jeffrey Vitter's "Algorithm R". Given the elements in the existing sample and a new element, every element has a equal probability of being selected. Intended to be partially applied on the sample size and used with fold-like functions to sample large streams of data.