Safe Haskell | None |
---|---|
Language | Haskell98 |
Flexible modeling and sampling of random variables.
The central abstraction in this library is the concept of a random variable. It is not fully formalized in the standard measure-theoretic language, but rather is informally defined as a "thing you can get random values out of". Different random variables may have different types of values they can return or the same types but different probabilities for each value they can return. The random values you get out of them are traditionally called "random variates".
Most imperative-language random number libraries are all about obtaining and manipulating random variates. This one is about defining, manipulating and sampling random variables. Computationally, the distinction is small and mostly just a matter of perspective, but from a program design perspective it provides both a powerfully composable abstraction and a very useful separation of concerns.
Abstract random variables as implemented by RVar
are composable. They can
be defined in a monadic / "imperative" style that amounts to manipulating
variates, but with strict type-level isolation. Concrete random variables
are also provided, but they do not compose as generically. The Distribution
type class allows concrete random variables to "forget" their concreteness
so that they can be composed. For examples of both, see the documentation
for RVar
and Distribution
, as well as the code for any of the concrete
distributions such as Uniform
, Gamma
, etc.
Both abstract and concrete random variables can be sampled (despite the types GHCi may list for the functions) by the functions in Data.Random.Sample.
Random variable sampling is done with regard to a generic basis of primitive
random variables defined in Data.Random.Internal.Primitives. This basis
is very low-level and the actual set of primitives is still fairly experimental,
which is why it is in the "Internal" sub-heirarchy. User-defined variables
should use the existing high-level variables such as Uniform
and Normal
rather than these basis variables. Data.Random.Source defines classes for
entropy sources that provide implementations of these primitive variables.
Several implementations are available in the Data.Random.Source.* modules.
- type RVar = RVarT Identity
- data RVarT m a :: (* -> *) -> * -> *
- runRVar :: RandomSource m s => RVar a -> s -> m a
- runRVarT :: (Lift n m, RandomSource m s) => RVarT n a -> s -> m a
- runRVarTWith :: RandomSource m s => (forall t. n t -> m t) -> RVarT n a -> s -> m a
- class Distribution d t where
- class Distribution d t => CDF d t where
- class Distribution d t => PDF d t where
- class Sampleable d m t where
- sample :: (Sampleable d m t, MonadRandom m) => d t -> m t
- sampleState :: (Sampleable d (State s) t, MonadRandom (State s)) => d t -> s -> (t, s)
- sampleStateT :: (Sampleable d (StateT s m) t, MonadRandom (StateT s m)) => d t -> s -> m (t, s)
- data Uniform t = Uniform !t !t
- uniform :: Distribution Uniform a => a -> a -> RVar a
- uniformT :: Distribution Uniform a => a -> a -> RVarT m a
- data StdUniform t = StdUniform
- stdUniform :: Distribution StdUniform a => RVar a
- stdUniformT :: Distribution StdUniform a => RVarT m a
- data Normal a
- normal :: Distribution Normal a => a -> a -> RVar a
- stdNormal :: Distribution Normal a => RVar a
- normalT :: Distribution Normal a => a -> a -> RVarT m a
- stdNormalT :: Distribution Normal a => RVarT m a
- data Gamma a = Gamma a a
- gamma :: Distribution Gamma a => a -> a -> RVar a
- gammaT :: Distribution Gamma a => a -> a -> RVarT m a
- class Monad m => MonadRandom m
- class Monad m => RandomSource m s
- data StdRandom :: * = StdRandom
- randomElement :: [a] -> RVar a
- shuffle :: [a] -> RVar [a]
- shuffleN :: Int -> [a] -> RVar [a]
- shuffleNofM :: Int -> Int -> [a] -> RVar [a]
Random variables
Abstract (RVar
)
An opaque type modeling a "random variable" - a value
which depends on the outcome of some random event. RVar
s
can be conveniently defined by an imperative-looking style:
normalPair = do u <- stdUniform t <- stdUniform let r = sqrt (-2 * log u) theta = (2 * pi) * t x = r * cos theta y = r * sin theta return (x,y)
OR by a more applicative style:
logNormal = exp <$> stdNormal
Once defined (in any style), there are several ways to sample RVar
s:
- In a monad, using a
RandomSource
:
runRVar (uniform 1 100) DevRandom :: IO Int
- In a monad, using a
MonadRandom
instance:
sampleRVar (uniform 1 100) :: State PureMT Int
- As a pure function transforming a functional RNG:
sampleState (uniform 1 100) :: StdGen -> (Int, StdGen)
(where sampleState = runState . sampleRVar
)
data RVarT m a :: (* -> *) -> * -> * #
A random variable with access to operations in an underlying monad. Useful examples include any form of state for implementing random processes with hysteresis, or writer monads for implementing tracing of complicated algorithms.
For example, a simple random walk can be implemented as an RVarT
IO
value:
rwalkIO :: IO (RVarT IO Double) rwalkIO d = do lastVal <- newIORef 0 let x = do prev <- lift (readIORef lastVal) change <- rvarT StdNormal let new = prev + change lift (writeIORef lastVal new) return new return x
To run the random walk it must first be initialized, after which it can be sampled as usual:
do rw <- rwalkIO x <- sampleRVarT rw y <- sampleRVarT rw ...
The same random-walk process as above can be implemented using MTL types
as follows (using import Control.Monad.Trans as MTL
):
rwalkState :: RVarT (State Double) Double rwalkState = do prev <- MTL.lift get change <- rvarT StdNormal let new = prev + change MTL.lift (put new) return new
Invocation is straightforward (although a bit noisy) if you're used to MTL:
rwalk :: Int -> Double -> StdGen -> ([Double], StdGen) rwalk count start gen = flip evalState start . flip runStateT gen . sampleRVarTWith MTL.lift $ replicateM count rwalkState
runRVar :: RandomSource m s => RVar a -> s -> m a #
"Run" an RVar
- samples the random variable from the provided
source of entropy.
runRVarT :: (Lift n m, RandomSource m s) => RVarT n a -> s -> m a Source #
Like runRVarTWith
, but using an implicit lifting (provided by the
Lift
class)
runRVarTWith :: RandomSource m s => (forall t. n t -> m t) -> RVarT n a -> s -> m a #
"Runs" an RVarT
, sampling the random variable it defines.
The first argument lifts the base monad into the sampling monad. This operation must obey the "monad transformer" laws:
lift . return = return lift (x >>= f) = (lift x) >>= (lift . f)
One example of a useful non-standard lifting would be one that takes
State s
to another monad with a different state representation (such as
IO
with the state mapped to an IORef
):
embedState :: (Monad m) => m s -> (s -> m ()) -> State s a -> m a embedState get put = \m -> do s <- get (res,s) <- return (runState m s) put s return res
The ability to lift is very important - without it, every RVar
would have
to either be given access to the full capability of the monad in which it
will eventually be sampled (which, incidentally, would also have to be
monomorphic so you couldn't sample one RVar
in more than one monad)
or functions manipulating RVar
s would have to use higher-ranked
types to enforce the same kind of isolation and polymorphism.
Concrete (Distribution
)
class Distribution d t where Source #
A Distribution
is a data representation of a random variable's probability
structure. For example, in Data.Random.Distribution.Normal, the Normal
distribution is defined as:
data Normal a = StdNormal | Normal a a
Where the two parameters of the Normal
data constructor are the mean and
standard deviation of the random variable, respectively. To make use of
the Normal
type, one can convert it to an rvar
and manipulate it or
sample it directly:
x <- sample (rvar (Normal 10 2)) x <- sample (Normal 10 2)
A Distribution
is typically more transparent than an RVar
but less composable (precisely because of that transparency). There are
several practical uses for types implementing Distribution
:
- Typically, a
Distribution
will expose several parameters of a standard mathematical model of a probability distribution, such as mean and std deviation for the normal distribution. Thus, they can be manipulated analytically using mathematical insights about the distributions they represent. For example, a collection of bernoulli variables could be simplified into a (hopefully) smaller collection of binomial variables. - Because they are generally just containers for parameters, they can be easily serialized to persistent storage or read from user-supplied configurations (eg, initialization data for a simulation).
- If a type additionally implements the
CDF
subclass, which extendsDistribution
with a cumulative density function, an arbitrary random variablex
can be tested against the distribution by testingfmap (cdf dist) x
for uniformity.
On the other hand, most Distribution
s will not be closed under all the
same operations as RVar
(which, being a monad, has a fully turing-complete
internal computational model). The sum of two uniformly-distributed
variables, for example, is not uniformly distributed. To support general
composition, the Distribution
class defines a function rvar
to
construct the more-abstract and more-composable RVar
representation
of a random variable.
class Distribution d t => CDF d t where Source #
cdf :: d t -> t -> Double Source #
Return the cumulative distribution function of this distribution.
That is, a function taking x :: t
to the probability that the next
sample will return a value less than or equal to x, according to some
order or partial order (not necessarily an obvious one).
In the case where t
is an instance of Ord, cdf
should correspond
to the CDF with respect to that order.
In other cases, cdf
is only required to satisfy the following law:
fmap (cdf d) (rvar d)
must be uniformly distributed over (0,1). Inclusion of either endpoint is optional,
though the preferred range is (0,1].
Note that this definition requires that cdf
for a product type
should _not_ be a joint CDF as commonly defined, as that definition
violates both conditions.
Instead, it should be a univariate CDF over the product type. That is,
it should represent the CDF with respect to the lexicographic order
of the product.
The present specification is probably only really useful for testing conformance of a variable to its target distribution, and I am open to suggestions for more-useful specifications (especially with regard to the interaction with product types).
class Distribution d t => PDF d t where Source #
Sampling random variables
class Sampleable d m t where Source #
A typeclass allowing Distribution
s and RVar
s to be sampled. Both may
also be sampled via runRVar
or runRVarT
, but I find it psychologically
pleasing to be able to sample both using this function, as they are two
separate abstractions for one base concept: a random variable.
sampleFrom :: RandomSource m s => s -> d t -> m t Source #
Directly sample from a distribution or random variable, using the given source of entropy.
Distribution d t => Sampleable d m t Source # | |
Lift m n => Sampleable (RVarT m) n t Source # | |
sample :: (Sampleable d m t, MonadRandom m) => d t -> m t Source #
Sample a random variable using the default source of entropy for the monad in which the sampling occurs.
sampleState :: (Sampleable d (State s) t, MonadRandom (State s)) => d t -> s -> (t, s) Source #
Sample a random variable in a "functional" style. Typical instantiations
of s
are System.Random.StdGen
or System.Random.Mersenne.Pure64.PureMT
.
sampleStateT :: (Sampleable d (StateT s m) t, MonadRandom (StateT s m)) => d t -> s -> m (t, s) Source #
Sample a random variable in a "semi-functional" style. Typical instantiations
of s
are System.Random.StdGen
or System.Random.Mersenne.Pure64.PureMT
.
A few very common distributions
A definition of a uniform distribution over the type t
. See also uniform
.
Uniform !t !t | A uniform distribution defined by a lower and upper range bound.
For |
data StdUniform t Source #
A name for the "standard" uniform distribution over the type t
,
if one exists. See also stdUniform
.
For Integral
and Enum
types that are also Bounded
, this is
the uniform distribution over the full range of the type.
For un-Bounded
Integral
types this is not defined.
For Fractional
types this is a random variable in the range [0,1)
(that is, 0 to 1 including 0 but not including 1).
stdUniform :: Distribution StdUniform a => RVar a Source #
Get a "standard" uniformly distributed variable.
For integral types, this means uniformly distributed over the full range
of the type (there is no support for Integer
). For fractional
types, this means uniformly distributed on the interval [0,1).
stdUniformT :: Distribution StdUniform a => RVarT m a Source #
Get a "standard" uniformly distributed process.
For integral types, this means uniformly distributed over the full range
of the type (there is no support for Integer
). For fractional
types, this means uniformly distributed on the interval [0,1).
A specification of a normal distribution over the type a
.
normal :: Distribution Normal a => a -> a -> RVar a Source #
normal m s
is a random variable with distribution
.Normal
m s
normalT :: Distribution Normal a => a -> a -> RVarT m a Source #
normalT m s
is a random process with distribution
.Normal
m s
stdNormalT :: Distribution Normal a => RVarT m a Source #
stdNormalT
is a normal process with distribution StdNormal
.
Gamma a a |
(Real a, Distribution Gamma a) => CDF Gamma a Source # | |
(Floating a, Ord a, Distribution Normal a, Distribution StdUniform a) => Distribution Gamma a Source # | |
Entropy Sources
class Monad m => MonadRandom m #
A typeclass for monads with a chosen source of entropy. For example,
RVar
is such a monad - the source from which it is (eventually) sampled
is the only source from which a random variable is permitted to draw, so
when directly requesting entropy for a random variable these functions
are used.
Minimum implementation is either the internal getRandomPrim
or all
other functions. Additionally, this class's interface is subject to
extension at any time, so it is very, very strongly recommended that
the monadRandom
Template Haskell function be used to implement this
function rather than directly implementing it. That function takes care
of choosing default implementations for any missing functions; as long as
at least one function is implemented, it will derive sensible
implementations of all others.
To use monadRandom
, just wrap your instance declaration as follows (and
enable the TemplateHaskell and GADTs language extensions):
$(monadRandom [d| instance MonadRandom FooM where getRandomDouble = return pi getRandomWord16 = return 4 {- etc... -} |])
MonadRandom (RVarT n) | |
class Monad m => RandomSource m s #
A source of entropy which can be used in the given monad.
See also MonadRandom
.
Minimum implementation is either the internal getRandomPrimFrom
or all
other functions. Additionally, this class's interface is subject to
extension at any time, so it is very, very strongly recommended that
the randomSource
Template Haskell function be used to implement this
function rather than directly implementing it. That function takes care
of choosing default implementations for any missing functions; as long as
at least one function is implemented, it will derive sensible
implementations of all others.
To use randomSource
, just wrap your instance declaration as follows (and
enable the TemplateHaskell, MultiParamTypeClasses and GADTs language
extensions, as well as any others required by your instances, such as
FlexibleInstances):
$(randomSource [d| instance RandomSource FooM Bar where {- at least one RandomSource function... -} |])
MonadRandom m => RandomSource m StdRandom | |
Monad m => RandomSource m (GetPrim m) | |
A token representing the "standard" entropy source in a MonadRandom
monad. Its sole purpose is to make the following true (when the types check):
runRVar x StdRandom === sampleRVar
MonadRandom m => RandomSource m StdRandom | |
Useful list-based operations
randomElement :: [a] -> RVar a Source #
A random variable returning an arbitrary element of the given list.
Every element has equal probability of being chosen. Because it is a
pure RVar
it has no memory - that is, it "draws with replacement."
shuffle :: [a] -> RVar [a] Source #
A random variable that returns the given list in an arbitrary shuffled order. Every ordering of the list has equal probability.
shuffleN :: Int -> [a] -> RVar [a] Source #
A random variable that shuffles a list of a known length (or a list prefix of the specified length). Useful for shuffling large lists when the length is known in advance. Avoids needing to traverse the list to discover its length. Each ordering has equal probability.