mdp-0.1.0.0: Tools for solving Markov Decision Processes.

Safe Haskell: None
Language: Haskell98

Algorithms.MDP.Examples


Description

This module shows how to solve several example problems using this library.


A discounted problem

We consider the problem defined in Algorithms.MDP.Examples.Ex_3_1; this example comes from Bertsekas, p. 22.

We solve this problem using standard value iteration. Having constructed the MDP, we pass it to the valueIteration function.

import Algorithms.MDP.Examples.Ex_3_1
import Algorithms.MDP.ValueIteration

iterations :: [CF State Control Double]
iterations = valueIteration mdp

The iterates returned contain estimates of the cost of being in each state. To see the estimated cost of state A over the first 10 iterations, we could write

estimates :: [Double]
estimates = map (cost A) (take 10 iterations)
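
Because valueIteration produces an infinite list of iterates, it is natural to consume it lazily. As a minimal sketch (not part of the library's API), we could iterate until two successive estimates agree to within a tolerance:

-- Sketch: walk the estimates until two successive values agree to
-- within the given tolerance, then return the later one.
converge :: Double -> [Double] -> Double
converge tol (x:y:rest)
  | abs (y - x) < tol = y
  | otherwise         = converge tol (y : rest)
converge _ xs = last xs  -- only reachable on a finite list

approxCost :: Double
approxCost = converge 1e-6 (map (cost A) iterations)

Note that this bounds the change between successive iterates, not the true error; the next section shows how to obtain genuine error bounds.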

A discounted problem with error bounds

We consider the same example as above, but this time we use relative value iteration to compute error bounds on the costs. This will allow us to use fewer iterations to obtain an accurate cost estimate.

Since we have already defined the problem, we do this via the relativeValueIteration function.

import Algorithms.MDP.Examples.Ex_3_1
import Algorithms.MDP.ValueIteration

iterations :: [CFBounds State Control Double]
iterations = relativeValueIteration mdp

The iterates returned contain estimates of the cost of being in each state, along with associated error bounds. To see the cost of state A over the first 10 iterations, adjusted for the error bounds, we could write

estimate :: State -> CFBounds State Control Double -> (Double, Double)
estimate state (CFBounds cf lb ub) = (z + lb, z + ub)
  where
    z = cost state cf

estimates :: [(Double, Double)]
estimates = map (estimate A) (take 10 iterations)

Note that the lower and upper bounds returned in the first iteration are always +/-Infinity, so it can be useful to consider only the tail of the iterates.
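
For example, a tolerance-based stopping rule could discard iterates until the gap between the bounds is small enough. The following is a minimal sketch (not part of the library), assuming the bounds eventually tighten below the chosen tolerance:

-- Sketch: the width of the error interval for a single iterate.
gap :: CFBounds State Control Double -> Double
gap (CFBounds _ lb ub) = ub - lb

-- Sketch: the first estimate whose bounds differ by less than 1e-6.
goodEstimate :: (Double, Double)
goodEstimate = estimate A (head (dropWhile ((> 1e-6) . gap) iterations))

Since the first iterate has an infinite gap, it is skipped automatically.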

An average cost problem

We consider the problem defined in Algorithms.MDP.Examples.Ex_3_2; this example comes from Bertsekas, p. 210.

Here we are interested in computing the long-run average cost of an undiscounted MDP. For this we use the undiscountedRelativeValueIteration function.

import Algorithms.MDP.Examples.Ex_3_2
import Algorithms.MDP.ValueIteration

iterations :: [CFBounds State Control Double]
iterations = undiscountedRelativeValueIteration mdp

We can compute cost estimates in a similar fashion. Here the state argument is unnecessary, because the bounds directly bracket the long-run average cost:

estimate :: CFBounds State Control Double -> (Double, Double)
estimate (CFBounds _ lb ub) = (lb, ub)

estimates :: [(Double, Double)]
estimates = map estimate (take 10 iterations)

It is important to note that in this problem the cost function returned in each CFBounds object is not to be interpreted as a vector of costs, but rather as a differential cost vector; the estimates above, however, retain the same interpretation.
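
For example, once the bounds have tightened, their midpoint serves as a point estimate of the long-run average cost. A minimal sketch, reusing the CFBounds fields from above:

-- Sketch: the midpoint of the bounds as a point estimate of the
-- long-run average cost.
avgCost :: CFBounds State Control Double -> Double
avgCost (CFBounds _ lb ub) = (lb + ub) / 2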

A continuous-time undiscounted problem

We now consider a family of problems described by Sennott, p. 248.

Here we are interested in first converting a CTMDP to an MDP via uniformization, and then computing the long-run average cost of the optimal policy.

To begin, we construct one of the scenarios provided (each scenario is just an instance of the problem with certain parameters). We then convert the scenario to an MDP using the uniformize function.

import Algorithms.MDP.Examples.MM1
import Algorithms.MDP.CTMDP
import Algorithms.MDP.ValueIteration

scenario :: CTMDP State Action Double
scenario = mkInstance scenario1

mdp :: MDP State Action Double
mdp = uniformize scenario

As above, we can use the undiscountedRelativeValueIteration function to compute cost estimates.

iterations :: [CFBounds State Action Double]
iterations = undiscountedRelativeValueIteration mdp

estimate :: CFBounds State Action Double -> (Double, Double)
estimate (CFBounds _ lb ub) = (lb, ub)

estimates :: [(Double, Double)]
estimates = map estimate (take 10 iterations)
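
Putting the pieces together, the following sketch (with an arbitrary tolerance of 1e-6) skips iterates until the bounds have converged and reports the midpoint as the long-run average cost of the queueing system:

-- Sketch: the first midpoint whose bounds differ by less than 1e-6.
averageCost :: Double
averageCost = (lb + ub) / 2
  where
    CFBounds _ lb ub = head (dropWhile tooWide iterations)
    tooWide (CFBounds _ lb' ub') = ub' - lb' > 1e-6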