Copyright | (c) Tim Watson 2012 - 2013 |
---|---|
License | BSD3 (see the file LICENSE) |
Maintainer | Tim Watson <watson.timothy@gmail.com> |
Stability | experimental |
Portability | non-portable (requires concurrency) |
Safe Haskell | None |
Language | Haskell98 |
This module implements a process which supervises a set of other processes, referred to as its children. These child processes can be either workers (i.e., processes that do something useful in your application) or other supervisors. In this way, supervisors may be used to build a hierarchical process structure called a supervision tree, which provides a convenient structure for building fault tolerant software.
Unless otherwise stated, all client functions in this module will cause the calling process to exit unless the specified supervisor process can be resolved.
- Supervision Principles
A supervisor is responsible for starting, stopping and monitoring its child processes so as to keep them alive by restarting them when necessary.
The supervisor's children are defined as a list of child specifications (see ChildSpec). When a supervisor is started, its children are started in left-to-right (insertion order) according to this list. When a supervisor stops (or exits for any reason), it will stop all its children before exiting. Child specs can be added to the supervisor after it has started, either on the left or right of the existing list of child specs.
- Restart Strategies
Supervisors are initialised with a RestartStrategy
, which describes how
the supervisor should respond to a child that exits and should be restarted
(see below for the rules governing child restart eligibility). Each restart
strategy comprises a RestartMode
and RestartLimit
, which govern how
the restart should be handled, and the point at which the supervisor
should give up and stop itself respectively.
With the exception of the RestartOne
strategy, which indicates that the
supervisor will restart only the one individual failing child, each
strategy describes a way to select the set of children that should be
restarted if any child fails. The RestartAll
strategy, as its name
suggests, selects all children, whilst the RestartLeft
and RestartRight
strategies select all children to the left or right of the failed child,
in insertion (i.e., startup) order.
Note that a branch restart will only occur if the child that exited is
meant to be restarted. Since Temporary
children are never restarted and
Transient
children are not restarted if they exit normally, in both these
circumstances we leave the remaining supervised children alone. Otherwise,
the failing child is always included in the branch to be restarted.
For a hypothetical set of children a
through d
, the following pseudocode
demonstrates how the restart strategies work.
let children = [a..d] let failure = c restartsFor RestartOne children failure = [c] restartsFor RestartAll children failure = [a,b,c,d] restartsFor RestartLeft children failure = [a,b,c] restartsFor RestartRight children failure = [c,d]
- Branch Restarts
We refer to a restart (strategy) that involves a set of children as a
branch restart from now on. The behaviour of branch restarts can be further
refined by the RestartMode
with which a RestartStrategy
is parameterised.
The RestartEach
mode treats each child sequentially, first stopping the
respective child process and then restarting it. Each child is stopped and
started fully before moving on to the next, as the following imaginary
example demonstrates for children [a,b,c]
:
stop a start a stop b start b stop c start c
By contrast, RestartInOrder
will first run through the selected list of
children, stopping them. Then, once all the children have been stopped, it
will make a second pass, to handle (re)starting them. No child is started
until all children have been stopped, as the following imaginary example
demonstrates:
stop a stop b stop c start a start b start c
Both the previous examples have shown children being stopped and started
from left to right, but that is up to the user. The RestartMode
data
type's constructors take a RestartOrder
, which determines whether the
selected children will be processed from LeftToRight
or RightToLeft
.
Sometimes it is desireable to stop children in one order and start them
in the opposite. This is typically the case when children are in some
way dependent on one another, such that restarting them in the wrong order
might cause the system to misbehave. For this scenarios, there is another
RestartMode
that will shut children down in the given order, but then
restarts them in the reverse. Using RestartRevOrder
mode, if we have
children [a,b,c]
such that b
depends on a
and c
on b
, we can stop
them in the reverse of their startup order, but restart them the other way
around like so:
RestartRevOrder RightToLeft
The effect will be thus:
stop c stop b stop a start a start b start c
- Restart Intensity Limits
If a child process repeatedly crashes during (or shortly after) starting,
it is possible for the supervisor to get stuck in an endless loop of
restarts. In order prevent this, each restart strategy is parameterised
with a RestartLimit
that caps the number of restarts allowed within a
specific time period. If the supervisor exceeds this limit, it will stop,
stopping all its children (in left-to-right order) and exit with the
reason ExitOther ReachedMaxRestartIntensity
.
The MaxRestarts
type is a positive integer, and together with a specified
TimeInterval
forms the RestartLimit
to which the supervisor will adhere.
Since a great many children can be restarted in close succession when
a branch restart occurs (as a result of RestartAll
, RestartLeft
or
RestartRight
being triggered), the supervisor will track the operation
as a single restart attempt, since otherwise it would likely exceed its
maximum restart intensity too quickly.
- Child Restart and Stop Policies
When the supervisor detects that a child has died, the RestartPolicy
configured in the child specification is used to determin what to do. If
the this is set to Permanent
, then the child is always restarted.
If it is Temporary
, then the child is never restarted and the child
specification is removed from the supervisor. A Transient
child will
be restarted only if it exits abnormally, otherwise it is left
inactive (but its specification is left in place). Finally, an Intrinsic
child is treated like a Transient
one, except that if this kind of child
exits normally, then the supervisor will also exit normally.
When the supervisor does stop a child process, the ChildStopPolicy
provided with the ChildSpec
determines how the supervisor should go
about doing so. If this is StopImmediately, then the child will
be killed without further notice, which means the child will not have
an opportunity to clean up any internal state and/or release any held
resources. If the policy is StopTimeout delay
however, the child
will be sent an exit signal instead, i.e., the supervisor will cause
the child to exit via exit childPid ExitShutdown
, and then will wait
until the given delay
for the child to exit normally. If this does not
happen within the given delay, the supervisor will revert to the more
aggressive StopImmediately policy and try again. Any errors that
occur during a timed-out shutdown will be logged, however exit reasons
resulting from StopImmediately are ignored.
- Creating Child Specs
The ToChildStart
typeclass simplifies the process of defining a ChildStart
providing two default instances from which a ChildStart
datum can be
generated. The first, takes a Closure (Process ())
, where the enclosed
action (in the Process
monad) is the actual (long running) code that we
wish to supervise. In the case of a managed process, this is usually the
server loop, constructed by evaluating some variant of ManagedProcess.serve
.
The second instance supports returning a handle which can contain extra data about the child process - usually this is a newtype wrapper used by clients to communicate with the process.
When the supervisor spawns its child processes, they should be linked to their
parent (i.e., the supervisor), such that even if the supervisor is killed
abruptly by an asynchronous exception, the children will still be taken down
with it, though somewhat less ceremoniously in that case. This behaviour is
injected by the supervisor for any ChildStart built on Closure (Process ())
automatically, but the handle based approach requires that the Closure
responsible for spawning does the linking itself.
Finally, we provide a simple shortcut to staticClosure
, for consumers
who've manually registered with the remote table and don't with to use
tempate haskell (e.g. users of the Explicit closures API).
- Supervision Trees & Supervisor Shutdown
To create a supervision tree, one simply adds supervisors below one another
as children, setting the childType
field of their ChildSpec
to
Supervisor
instead of Worker
. Supervision tree can be arbitrarilly
deep, and it is for this reason that we recommend giving a Supervisor
child
an arbitrary length of time to stop, by setting the delay to Infinity
or a very large TimeInterval
.
- data ChildSpec = ChildSpec {}
- type ChildKey = String
- data ChildType
- = Worker
- | Supervisor
- data ChildStopPolicy
- data ChildStart
- = RunClosure !(Closure (Process ()))
- | CreateHandle !(Closure (SupervisorPid -> Process (ChildPid, Message)))
- data RegisteredName
- data RestartPolicy
- data ChildRef
- isRunning :: ChildRef -> Bool
- isRestarting :: ChildRef -> Bool
- type Child = (ChildRef, ChildSpec)
- type StaticLabel = String
- type SupervisorPid = ProcessId
- type ChildPid = ProcessId
- class ToChildStart a where
- start :: RestartStrategy -> ShutdownMode -> [ChildSpec] -> Process SupervisorPid
- run :: RestartStrategy -> ShutdownMode -> [ChildSpec] -> Process ()
- data MaxRestarts
- maxRestarts :: Int -> MaxRestarts
- data RestartLimit = RestartLimit {
- maxR :: !MaxRestarts
- maxT :: !TimeInterval
- limit :: MaxRestarts -> TimeInterval -> RestartLimit
- defaultLimits :: RestartLimit
- data RestartMode
- = RestartEach {
- order :: !RestartOrder
- | RestartInOrder {
- order :: !RestartOrder
- | RestartRevOrder {
- order :: !RestartOrder
- = RestartEach {
- data RestartOrder
- data RestartStrategy
- = RestartOne { }
- | RestartAll {
- intensity :: !RestartLimit
- mode :: !RestartMode
- | RestartLeft {
- intensity :: !RestartLimit
- mode :: !RestartMode
- | RestartRight {
- intensity :: !RestartLimit
- mode :: !RestartMode
- data ShutdownMode
- restartOne :: RestartStrategy
- restartAll :: RestartStrategy
- restartLeft :: RestartStrategy
- restartRight :: RestartStrategy
- addChild :: Addressable a => a -> ChildSpec -> Process AddChildResult
- data AddChildResult
- data StartChildResult
- startChild :: Addressable a => a -> ChildKey -> Process StartChildResult
- startNewChild :: Addressable a => a -> ChildSpec -> Process AddChildResult
- stopChild :: Addressable a => a -> ChildKey -> Process StopChildResult
- data StopChildResult
- deleteChild :: Addressable a => a -> ChildKey -> Process DeleteChildResult
- data DeleteChildResult
- restartChild :: Addressable a => a -> ChildKey -> Process RestartChildResult
- data RestartChildResult
- shutdown :: Resolvable a => a -> Process ()
- shutdownAndWait :: Resolvable a => a -> Process ()
- lookupChild :: Addressable a => a -> ChildKey -> Process (Maybe (ChildRef, ChildSpec))
- listChildren :: Addressable a => a -> Process [Child]
- data SupervisorStats = SupervisorStats {
- _children :: Int
- _supervisors :: Int
- _workers :: Int
- _running :: Int
- _activeSupervisors :: Int
- _activeWorkers :: Int
- totalRestarts :: Int
- statistics :: Addressable a => a -> Process SupervisorStats
- getRestartIntensity :: RestartStrategy -> RestartLimit
- definedChildren :: SupervisorStats -> Int
- definedWorkers :: SupervisorStats -> Int
- definedSupervisors :: SupervisorStats -> Int
- runningChildren :: SupervisorStats -> Int
- runningWorkers :: SupervisorStats -> Int
- runningSupervisors :: SupervisorStats -> Int
- data StartFailure
- data ChildInitFailure
Defining and Running a Supervisor
Specification for a child process. The child must be uniquely identified
by it's childKey
within the supervisor. The supervisor will start the child
itself, therefore childRun
should contain the child process' implementation
e.g., if the child is a long running server, this would be the server loop,
as with e.g., ManagedProces.start
.
ChildSpec | |
|
Specifies whether the child is another supervisor, or a worker.
data ChildStopPolicy Source #
Governs how the supervisor will instruct child processes to stop.
data ChildStart Source #
Defines the way in which a child process is to be started.
RunClosure !(Closure (Process ())) | |
CreateHandle !(Closure (SupervisorPid -> Process (ChildPid, Message))) |
data RegisteredName Source #
Represents a registered name, for registration locally using the
register
primitive, or via a Closure (ChildPid -> Process ())
such that
registration can be performed using alternative process registries.
data RestartPolicy Source #
Describes when a stopped child process should be restarted.
A reference to a (possibly running) child.
ChildRunning !ChildPid | a reference to the (currently running) child |
ChildRunningExtra !ChildPid !Message | also a currently running child, with extra child info |
ChildRestarting !ChildPid | a reference to the old (previous) child (now restarting) |
ChildStopped | indicates the child is not currently running |
ChildStartIgnored | a non-temporary child exited with |
type StaticLabel = String Source #
Static labels (in the remote table) are strings.
type SupervisorPid = ProcessId Source #
The ProcessId of a supervisor.
class ToChildStart a where Source #
A type that can be converted to a ChildStart
.
toChildStart :: a -> Process ChildStart Source #
ToChildStart (Closure (SupervisorPid -> Process (ChildPid, Message))) Source # | |
ToChildStart (Closure (Process ())) Source # | |
ToChildStart (Static (Process ())) Source # | |
start :: RestartStrategy -> ShutdownMode -> [ChildSpec] -> Process SupervisorPid Source #
Start a supervisor (process), running the supplied children and restart strategy.
start = spawnLocal . run
run :: RestartStrategy -> ShutdownMode -> [ChildSpec] -> Process () Source #
Run the supplied children using the provided restart strategy.
Limits and Defaults
data MaxRestarts Source #
The maximum number of restarts a supervisor will tollerate, created by evaluating "maxRestarts".
maxRestarts :: Int -> MaxRestarts Source #
Smart constructor for MaxRestarts
. The maximum restart count must be a
positive integer, otherwise you will see error "MaxR must be >= 0"
.
data RestartLimit Source #
A compulsary limit on the number of restarts that a supervisor will
tolerate before it stops all child processes and then itself.
If > MaxRestarts
occur within the specified TimeInterval
, the child
will be stopped. This prevents the supervisor from entering an infinite loop
of child process stops and restarts.
RestartLimit | |
|
limit :: MaxRestarts -> TimeInterval -> RestartLimit Source #
Smart constructor for RestartLimit.
defaultLimits :: RestartLimit Source #
Default RestartLimit of MaxR 1
within Seconds 1
.
data RestartMode Source #
Instructs a supervisor on how to restart its children.
RestartEach | stop then start each child sequentially, i.e., |
| |
RestartInOrder | stop all children first, then restart them sequentially |
| |
RestartRevOrder | stop all children in the given order, but start them in reverse |
|
data RestartOrder Source #
Specifies the order in which a supervisor should apply restarts.
data RestartStrategy Source #
Strategy used by a supervisor to handle child restarts, whether due to unexpected child failure or explicit restart requests from a client.
Some terminology: We refer to child processes managed by the same supervisor
as siblings. When restarting a child process, the RestartNone
policy
indicates that sibling processes should be left alone, whilst the RestartAll
policy will cause all children to be restarted (in the same order they were
started).
The other two restart strategies refer to prior and subsequent siblings, which describe's those children's configured position in insertion order in the child specs. These latter modes allow one to control the order in which siblings are restarted, and to exclude some siblings from restarting, without having to resort to grouping them using a child supervisor.
RestartOne | restart only the failed child process |
RestartAll | also restart all siblings |
| |
RestartLeft | restart prior siblings (i.e., prior start order) |
| |
RestartRight | restart subsequent siblings (i.e., subsequent start order) |
|
data ShutdownMode Source #
Instructs a supervisor on how to instruct its children to stop running when the supervisor itself is shutting down.
restartOne :: RestartStrategy Source #
Provides a default RestartStrategy
for RestartOne
.
> restartOne = RestartOne defaultLimits
restartAll :: RestartStrategy Source #
Provides a default RestartStrategy
for RestartAll
.
> restartOne = RestartAll defaultLimits (RestartEach LeftToRight)
restartLeft :: RestartStrategy Source #
Provides a default RestartStrategy
for RestartLeft
.
> restartOne = RestartLeft defaultLimits (RestartEach LeftToRight)
restartRight :: RestartStrategy Source #
Provides a default RestartStrategy
for RestartRight
.
> restartOne = RestartRight defaultLimits (RestartEach LeftToRight)
Adding and Removing Children
addChild :: Addressable a => a -> ChildSpec -> Process AddChildResult Source #
Add a new child.
data AddChildResult Source #
The result of an addChild
request.
ChildAdded !ChildRef | The child was added correctly |
ChildFailedToStart !StartFailure | The child failed to start |
data StartChildResult Source #
The result of a startChild
request.
ChildStartOk !ChildRef | The child started successfully |
ChildStartFailed !StartFailure | The child failed to start |
ChildStartUnknownId | The child key was not recognised by the supervisor |
startChild :: Addressable a => a -> ChildKey -> Process StartChildResult Source #
startNewChild :: Addressable a => a -> ChildSpec -> Process AddChildResult Source #
Atomically add and start a new child spec. Will fail if a child with the given key is already present.
stopChild :: Addressable a => a -> ChildKey -> Process StopChildResult Source #
Stop a running child.
data StopChildResult Source #
The result of a stopChild
request.
StopChildOk | The child was stopped successfully |
StopChildUnknownId | The child key was not recognised by the supervisor |
deleteChild :: Addressable a => a -> ChildKey -> Process DeleteChildResult Source #
Delete a supervised child. The child must already be stopped (see
stopChild
).
data DeleteChildResult Source #
The result of a call to removeChild
.
ChildDeleted | the child specification was successfully removed |
ChildNotFound | the child specification was not found |
ChildNotStopped !ChildRef | the child was not removed, as it was not stopped. |
restartChild :: Addressable a => a -> ChildKey -> Process RestartChildResult Source #
Forcibly restart a running child.
data RestartChildResult Source #
The result of a restartChild
request.
ChildRestartOk !ChildRef | The child restarted successfully |
ChildRestartFailed !StartFailure | The child failed to restart |
ChildRestartUnknownId | The child key was not recognised by the supervisor |
Normative Shutdown
shutdown :: Resolvable a => a -> Process () Source #
Gracefully stop/shutdown a running supervisor. Returns immediately if the address cannot be resolved.
shutdownAndWait :: Resolvable a => a -> Process () Source #
As shutdown
, but waits until the supervisor process has exited, at which
point the caller can be sure that all children have also stopped. Returns
immediately if the address cannot be resolved.
Queries and Statistics
lookupChild :: Addressable a => a -> ChildKey -> Process (Maybe (ChildRef, ChildSpec)) Source #
Lookup a possibly supervised child, given its ChildKey
.
listChildren :: Addressable a => a -> Process [Child] Source #
List all know (i.e., configured) children.
data SupervisorStats Source #
Statistics about a running supervisor
SupervisorStats | |
|
statistics :: Addressable a => a -> Process SupervisorStats Source #
Obtain statistics about a running supervisor.
getRestartIntensity :: RestartStrategy -> RestartLimit Source #
The RestartLimit for a given RestartStrategy
definedChildren :: SupervisorStats -> Int Source #
How many child specs are defined for this supervisor
definedWorkers :: SupervisorStats -> Int Source #
How many child specs define a worker (non-supervisor)
definedSupervisors :: SupervisorStats -> Int Source #
How many child specs define a supervisor?
runningChildren :: SupervisorStats -> Int Source #
How many running child processes.
runningWorkers :: SupervisorStats -> Int Source #
How many worker (non-supervisor) child processes are running.
runningSupervisors :: SupervisorStats -> Int Source #
How many supervisor child processes are running
Additional (Misc) Types
data StartFailure Source #
Provides failure information when (re-)start failure is indicated.
StartFailureDuplicateChild !ChildRef | a child with this |
StartFailureAlreadyRunning !ChildRef | the child is already up and running |
StartFailureBadClosure !StaticLabel | a closure cannot be resolved |
StartFailureDied !DiedReason | a child died (almost) immediately on starting |
data ChildInitFailure Source #
A child process failure during init will be reported using this datum
ChildInitFailure !String | The init failed with the corresponding message |
ChildInitIgnore | The child told the supervisor to ignore its startup procedure |