| Copyright | Patrick Steele 2015 |
|---|---|
| License | MIT (see the LICENSE file) |
| Maintainer | prs233@cornell.edu |
| Safe Haskell | None |
| Language | Haskell98 |
Algorithms and data structures for expressing and solving Markov decision processes (MDPs).
See the following for references on the algorithms implemented, along with general terminology.
- "Dynamic Programmand and Optimal Control, Vol. II", by Dimitri P. Bertsekas, Athena Scientific, Belmont, Massachusetts.
- "Stochastic Dynamic Programming and the Control of Queueing Systems", by Linn I. Sennott, A Wiley- Interscience Publication, New York.
The module Algorithms.MDP.Examples contains implementations of several example problems from these texts.
To actually solve an MDP, use (for example) the valueIteration function from the Algorithms.MDP.ValueIteration module; a small end-to-end sketch appears below, after the description of the MDP type.
- data MDP a b t = MDP {}
- mkDiscountedMDP :: Eq b => [a] -> [b] -> Transitions a b t -> Costs a b t -> ActionSet a b -> t -> MDP a b t
- mkUndiscountedMDP :: (Eq b, Num t) => [a] -> [b] -> Transitions a b t -> Costs a b t -> ActionSet a b -> MDP a b t
- type Transitions a b t = b -> a -> a -> t
- type Costs a b t = b -> a -> t
- type ActionSet a b = a -> [b]
- type CF a b t = Vector (a, b, t)
- data CFBounds a b t = CFBounds {}
- cost :: Eq a => a -> CF a b t -> t
- action :: Eq a => a -> CF a b t -> b
- optimalityGap :: Num t => CFBounds a b t -> t
- verifyStochastic :: (Ord t, Num t) => MDP a b t -> t -> Either (MDPError a b t) ()
- data MDPError a b t = MDPError {_negativeProbability :: [(b, a, a, t)], _notOneProbability :: [(b, a, t)]}
Markov decision processes
A Markov decision process.
An MDP consists of a state space, an action space, state- and action-dependent costs, and state- and action-dependent transition probabilities. The goal is to compute a policy -- a mapping from states to actions -- which minimizes the total discounted cost of the problem, assuming a given discount factor in the range (0, 1].
Here the type variable a represents the type of the states, b represents the type of the actions, and t represents the numeric type used in computations. Generally choosing t to be a Double is fine, although there is no reason a higher-precision type cannot be used.
This type should not be constructed directly; use the mkDiscountedMDP or mkUndiscountedMDP constructors instead.
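As a concrete illustration, here is a minimal sketch of building a small discounted MDP. The Light and Act types, the trans, costs, and acts definitions, and the problem data are all invented for this example; the valueIteration signature mentioned in the final comment is an assumption, not something documented on this page.

```haskell
import Algorithms.MDP

-- Two states and two actions, invented for illustration.
data Light = Green | Red deriving (Eq, Show)
data Act   = Stay | Switch deriving (Eq, Show)

-- Transitions a b t = b -> a -> a -> t: an action, then two states.
trans :: Transitions Light Act Double
trans Stay   s t = if s == t then 1.0 else 0.0  -- Stay keeps the state
trans Switch s t = if s == t then 0.1 else 0.9  -- Switch usually flips it

-- Costs a b t = b -> a -> t: a cost for each action/state pair.
costs :: Costs Light Act Double
costs Stay   Red = 1.0  -- waiting at a red light is costly
costs Stay   _   = 0.0
costs Switch _   = 0.5  -- flat cost for attempting a switch

-- ActionSet a b = a -> [b]: both actions are legal in every state.
acts :: ActionSet Light Act
acts _ = [Stay, Switch]

toy :: MDP Light Act Double
toy = mkDiscountedMDP [Green, Red] [Stay, Switch] trans costs acts 0.9

-- Assuming valueIteration :: MDP a b t -> [CF a b t] (not documented
-- on this page), an approximate solution could then be taken as, e.g.,
-- valueIteration toy !! 1000.
```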
mkDiscountedMDP Source

:: Eq b | |
=> [a] | The state space |
-> [b] | The action space |
-> Transitions a b t | The transition probabilities |
-> Costs a b t | The action- and state-dependent costs |
-> ActionSet a b | The state-dependent available actions |
-> t | The discount factor |
-> MDP a b t | The resulting discounted MDP |

Creates a discounted MDP.
mkUndiscountedMDP Source

:: (Eq b, Num t) | |
=> [a] | The state space |
-> [b] | The action space |
-> Transitions a b t | The transition probabilities |
-> Costs a b t | The action- and state-dependent costs |
-> ActionSet a b | The state-dependent available actions |
-> MDP a b t | The resulting undiscounted MDP |

Creates an undiscounted MDP.
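Reusing the toy problem pieces from the sketch above, the undiscounted variant simply omits the discount factor. (The Num t constraint suggests the constructor fixes the discount internally, presumably at 1, though this page does not say so.)

```haskell
-- An undiscounted version of the toy MDP defined earlier.
toyAvg :: MDP Light Act Double
toyAvg = mkUndiscountedMDP [Green, Red] [Stay, Switch] trans costs acts
```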
Types
type Transitions a b t = b -> a -> a -> t Source
A type representing an action- and state-dependent probability vector.
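The argument order is the action first, then two states; which state argument is the source and which is the destination is not spelled out on this page, so the hypothetical sketch below assumes (action, source, destination).

```haskell
-- A one-dimensional walk: under 'L' we always step left; under 'R'
-- we step right or stay with equal probability.
walk :: Transitions Int Char Double
walk 'L' s t = if t == s - 1 then 1.0 else 0.0
walk 'R' s t = if t == s + 1 || t == s then 0.5 else 0.0
walk _   _ _ = 0.0
```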
type CF a b t = Vector (a, b, t) Source
A cost function is a vector containing (state, action, cost) triples. Each triple describes the cost of taking the action in that state.
data CFBounds a b t = CFBounds {} Source

A cost function with error bounds. The cost in a (state, action, cost) triple is guaranteed to be in the range [cost + lb, cost + ub].
Utility functions
cost :: Eq a => a -> CF a b t -> t Source
Get the cost associated with a state.
This function is only defined over the state values passed in to the original MDP.
action :: Eq a => a -> CF a b t -> b Source
Get the action associated with a state.
This function is only defined over the state values passed in to the original MDP.
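A usage sketch combining the two accessors, using the toy example's types from above; cf stands for a cost function produced by one of the solvers:

```haskell
-- Report the (approximate) optimal cost and action at the Red state.
-- Both calls are partial: they are defined only for states that were
-- in the MDP's state space.
reportRed :: CF Light Act Double -> String
reportRed cf =
  "cost at Red = " ++ show (cost Red cf) ++
  ", action at Red = " ++ show (action Red cf)
```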
optimalityGap :: Num t => CFBounds a b t -> t Source
Compute the optimality gap associated with a CFBounds.
This error is absolute, not relative.
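A typical stopping rule built on optimalityGap, assuming some solver produces a stream of successively tighter CFBounds values (the stream itself is hypothetical here):

```haskell
-- Take the first iterate whose absolute optimality gap is at most eps.
-- Partial: diverges if the stream never reaches the tolerance.
solveToTolerance :: (Ord t, Num t) => t -> [CFBounds a b t] -> CFBounds a b t
solveToTolerance eps = head . dropWhile (\cfb -> optimalityGap cfb > eps)
```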
Validation
verifyStochastic :: (Ord t, Num t) => MDP a b t -> t -> Either (MDPError a b t) () Source
Verifies that the MDP is stochastic, returning the non-stochastic (action, state) pairs if it is not.

An (action, state) pair is not stochastic if any transition out of the state under that action occurs with negative probability, or if the total probability of all possible transitions out of the state under that action is not 1 (to within the given tolerance).
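A usage sketch, checking the toy MDP from earlier with a small tolerance before solving it; the field accessors come from the MDPError record below:

```haskell
checkToy :: IO ()
checkToy =
  case verifyStochastic toy 1e-9 of
    Right () -> putStrLn "toy is stochastic (within tolerance)"
    Left err -> do
      putStrLn ("negative probabilities: " ++ show (_negativeProbability err))
      putStrLn ("rows not summing to one: " ++ show (_notOneProbability err))
```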
data MDPError a b t Source

An error describing the ways in which an MDP can be poorly defined.

An MDP can be poorly defined by having negative transition probabilities, or by having the total transition probability associated with a state and a legal action not equal to one.

MDPError
- _negativeProbability :: [(b, a, a, t)]
- _notOneProbability :: [(b, a, t)]