Copyright | (c) Tim Watson 2012 - 2013 |
---|---|
License | BSD3 (see the file LICENSE) |
Maintainer | Tim Watson <watson.timothy@gmail.com> |
Stability | experimental |
Portability | non-portable (requires concurrency) |
Safe Haskell | None |
Language | Haskell98 |
This module implements a process which supervises a set of other processes, referred to as its children. These child processes can be either workers (i.e., processes that do something useful in your application) or other supervisors. In this way, supervisors may be used to build a hierarchical process structure called a supervision tree, which provides a convenient structure for building fault tolerant software.
Unless otherwise stated, all functions in this module will cause the calling process to exit unless the specified supervisor process exists.
- Supervision Principles
A supervisor is responsible for starting, stopping and monitoring its child processes so as to keep them alive by restarting them when necessary.
The supervisors children are defined as a list of child specifications
(see ChildSpec
). When a supervisor is started, its children are started
in left-to-right (insertion order) according to this list. When a supervisor
stops (or exits for any reason), it will terminate its children in reverse
(i.e., from right-to-left of insertion) order. Child specs can be added to
the supervisor after it has started, either on the left or right of the
existing list of children.
When the supervisor spawns its child processes, they are always linked to their parent (i.e., the supervisor), therefore even if the supervisor is terminated abruptly by an asynchronous exception, the children will still be taken down with it, though somewhat less ceremoniously in that case.
- Restart Strategies
Supervisors are initialised with a RestartStrategy
, which describes how
the supervisor should respond to a child that exits and should be restarted
(see below for the rules governing child restart eligibility). Each restart
strategy comprises a RestartMode
and RestartLimit
, which govern how
the restart should be handled, and the point at which the supervisor
should give up and terminate itself respectively.
With the exception of the RestartOne
strategy, which indicates that the
supervisor will restart only the one individual failing child, each
strategy describes a way to select the set of children that should be
restarted if any child fails. The RestartAll
strategy, as its name
suggests, selects all children, whilst the RestartLeft
and RestartRight
strategies select all children to the left or right of the failed child,
in insertion (i.e., startup) order.
Note that a branch restart will only occur if the child that exited is
meant to be restarted. Since Temporary
children are never restarted and
Transient
children are not restarted if they exit normally, in both these
circumstances we leave the remaining supervised children alone. Otherwise,
the failing child is always included in the branch to be restarted.
For a hypothetical set of children a
through d
, the following pseudocode
demonstrates how the restart strategies work.
let children = [a..d] let failure = c restartsFor RestartOne children failure = [c] restartsFor RestartAll children failure = [a,b,c,d] restartsFor RestartLeft children failure = [a,b,c] restartsFor RestartRight children failure = [c,d]
- Branch Restarts
We refer to a restart (strategy) that involves a set of children as a
branch restart from now on. The behaviour of branch restarts can be further
refined by the RestartMode
with which a RestartStrategy
is parameterised.
The RestartEach
mode treats each child sequentially, first stopping the
respective child process and then restarting it. Each child is stopped and
started fully before moving on to the next, as the following imaginary
example demonstrates for children [a,b,c]
:
stop a start a stop b start b stop c start c
By contrast, RestartInOrder
will first run through the selected list of
children, stopping them. Then, once all the children have been stopped, it
will make a second pass, to handle (re)starting them. No child is started
until all children have been stopped, as the following imaginary example
demonstrates:
stop a stop b stop c start a start b start c
Both the previous examples have shown children being stopped and started
from left to right, but that is up to the user. The RestartMode
data
type's constructors take a RestartOrder
, which determines whether the
selected children will be processed from LeftToRight
or RightToLeft
.
Sometimes it is desireable to stop children in one order and start them
in the opposite. This is typically the case when children are in some
way dependent on one another, such that restarting them in the wrong order
might cause the system to misbehave. For this scenarios, there is another
RestartMode
that will shut children down in the given order, but then
restarts them in the reverse. Using RestartRevOrder
mode, if we have
children [a,b,c]
such that b
depends on a
and c
on b
, we can stop
them in the reverse of their startup order, but restart them the other way
around like so:
RestartRevOrder RightToLeft
The effect will be thus:
stop c stop b stop a start a start b start c
- Restart Intensity Limits
If a child process repeatedly crashes during (or shortly after) starting,
it is possible for the supervisor to get stuck in an endless loop of
restarts. In order prevent this, each restart strategy is parameterised
with a RestartLimit
that caps the number of restarts allowed within a
specific time period. If the supervisor exceeds this limit, it will stop,
terminating all its children (in left-to-right order) and exit with the
reason ExitOther ReachedMaxRestartIntensity
.
The MaxRestarts
type is a positive integer, and together with a specified
TimeInterval
forms the RestartLimit
to which the supervisor will adhere.
Since a great many children can be restarted in close succession when
a branch restart occurs (as a result of RestartAll
, RestartLeft
or
RestartRight
being triggered), the supervisor will track the operation
as a single restart attempt, since otherwise it would likely exceed its
maximum restart intensity too quickly.
- Child Restart and Termination Policies
When the supervisor detects that a child has died, the RestartPolicy
configured in the child specification is used to determin what to do. If
the this is set to Permanent
, then the child is always restarted.
If it is Temporary
, then the child is never restarted and the child
specification is removed from the supervisor. A Transient
child will
be restarted only if it terminates abnormally, otherwise it is left
inactive (but its specification is left in place). Finally, an Intrinsic
child is treated like a Transient
one, except that if this kind of child
exits normally, then the supervisor will also exit normally.
When the supervisor does terminate a child, the ChildTerminationPolicy
provided with the ChildSpec
determines how the supervisor should go
about doing so. If this is TerminateImmediately
, then the child will
be killed without further notice, which means the child will not have
an opportunity to clean up any internal state and/or release any held
resources. If the policy is TerminateTimeout delay
however, the child
will be sent an exit signal instead, i.e., the supervisor will cause
the child to exit via exit childPid ExitShutdown
, and then will wait
until the given delay
for the child to exit normally. If this does not
happen within the given delay, the supervisor will revert to the more
aggressive TerminateImmediately
policy and try again. Any errors that
occur during a timed-out shutdown will be logged, however exit reasons
resulting from TerminateImmediately
are ignored.
- Creating Child Specs
The ToChildStart
typeclass simplifies the process of defining a ChildStart
providing three default instances from which a ChildStart
datum can be
generated. The first, takes a Closure (Process ())
, where the enclosed
action (in the Process
monad) is the actual (long running) code that we
wish to supervise. In the case of a managed process, this is usually the
server loop, constructed by evaluating some variant of ManagedProcess.serve
.
The other two instances provide a means for starting children without having
to provide a Closure
. Both instances wrap the supplied Process
action in
some necessary boilerplate code, which handles spawning a new process and
communicating its ProcessId
to the supervisor. The instance for
Addressable a => SupervisorPid -> Process a
is special however, since this
API is intended for uses where the typical interactions with a process take
place via an opaque handle, for which an instance of the Addressable
typeclass is provided. This latter approach requires the expression which is
responsible for yielding the Addressable
handle to handling linking the
target process with the supervisor, since we have delegated responsibility
for spawning the new process and cannot perform the link oepration ourselves.
- Supervision Trees & Supervisor Termination
To create a supervision tree, one simply adds supervisors below one another
as children, setting the childType
field of their ChildSpec
to
Supervisor
instead of Worker
. Supervision tree can be arbitrarilly
deep, and it is for this reason that we recommend giving a Supervisor
child
an arbitrary length of time to stop, by setting the delay to Infinity
or a very large TimeInterval
.
- data ChildSpec = ChildSpec {}
- type ChildKey = String
- data ChildType
- = Worker
- | Supervisor
- data ChildTerminationPolicy
- data ChildStart
- = RunClosure !(Closure (Process ()))
- | CreateHandle !(Closure (SupervisorPid -> Process (ChildPid, Message)))
- | StarterProcess !StarterPid
- data RegisteredName
- data RestartPolicy
- data ChildRef
- isRunning :: ChildRef -> Bool
- isRestarting :: ChildRef -> Bool
- type Child = (ChildRef, ChildSpec)
- type StaticLabel = String
- type SupervisorPid = ProcessId
- type ChildPid = ProcessId
- type StarterPid = ProcessId
- class ToChildStart a where
- toChildStart :: a -> Process ChildStart
- start :: RestartStrategy -> ShutdownMode -> [ChildSpec] -> Process SupervisorPid
- run :: RestartStrategy -> ShutdownMode -> [ChildSpec] -> Process ()
- data MaxRestarts
- maxRestarts :: Int -> MaxRestarts
- data RestartLimit = RestartLimit {
- maxR :: !MaxRestarts
- maxT :: !TimeInterval
- limit :: MaxRestarts -> TimeInterval -> RestartLimit
- defaultLimits :: RestartLimit
- data RestartMode
- = RestartEach {
- order :: !RestartOrder
- | RestartInOrder {
- order :: !RestartOrder
- | RestartRevOrder {
- order :: !RestartOrder
- = RestartEach {
- data RestartOrder
- data RestartStrategy
- = RestartOne { }
- | RestartAll {
- intensity :: !RestartLimit
- mode :: !RestartMode
- | RestartLeft {
- intensity :: !RestartLimit
- mode :: !RestartMode
- | RestartRight {
- intensity :: !RestartLimit
- mode :: !RestartMode
- data ShutdownMode
- restartOne :: RestartStrategy
- restartAll :: RestartStrategy
- restartLeft :: RestartStrategy
- restartRight :: RestartStrategy
- addChild :: Addressable a => a -> ChildSpec -> Process AddChildResult
- data AddChildResult
- data StartChildResult
- startChild :: Addressable a => a -> ChildKey -> Process StartChildResult
- startNewChild :: Addressable a => a -> ChildSpec -> Process AddChildResult
- terminateChild :: Addressable a => a -> ChildKey -> Process TerminateChildResult
- data TerminateChildResult
- deleteChild :: Addressable a => a -> ChildKey -> Process DeleteChildResult
- data DeleteChildResult
- restartChild :: Addressable a => a -> ChildKey -> Process RestartChildResult
- data RestartChildResult
- shutdown :: Resolvable a => a -> Process ()
- shutdownAndWait :: Resolvable a => a -> Process ()
- lookupChild :: Addressable a => a -> ChildKey -> Process (Maybe (ChildRef, ChildSpec))
- listChildren :: Addressable a => a -> Process [Child]
- data SupervisorStats = SupervisorStats {
- _children :: Int
- _supervisors :: Int
- _workers :: Int
- _running :: Int
- _activeSupervisors :: Int
- _activeWorkers :: Int
- totalRestarts :: Int
- statistics :: Addressable a => a -> Process SupervisorStats
- data StartFailure
- data ChildInitFailure
Defining and Running a Supervisor
Specification for a child process. The child must be uniquely identified
by it's childKey
within the supervisor. The supervisor will start the child
itself, therefore childRun
should contain the child process' implementation
e.g., if the child is a long running server, this would be the server loop,
as with e.g., ManagedProces.start
.
ChildSpec | |
|
Specifies whether the child is another supervisor, or a worker.
data ChildStart Source
RunClosure !(Closure (Process ())) | |
CreateHandle !(Closure (SupervisorPid -> Process (ChildPid, Message))) | |
StarterProcess !StarterPid |
data RegisteredName Source
data RestartPolicy Source
Describes when a terminated child process should be restarted.
A reference to a (possibly running) child.
ChildRunning !ChildPid | a reference to the (currently running) child |
ChildRunningExtra !ChildPid !Message | also a currently running child, with extra child info |
ChildRestarting !ChildPid | a reference to the old (previous) child (now restarting) |
ChildStopped | indicates the child is not currently running |
ChildStartIgnored | a non-temporary child exited with |
isRestarting :: ChildRef -> Bool Source
type StaticLabel = String Source
Static labels (in the remote table) are strings.
type SupervisorPid = ProcessId Source
type StarterPid = ProcessId Source
class ToChildStart a where Source
A type that can be converted to a ChildStart
.
toChildStart :: a -> Process ChildStart Source
ToChildStart (Process ()) Source | |
ToChildStart (Closure (SupervisorPid -> Process (ChildPid, Message))) Source | |
ToChildStart (Closure (Process ())) Source | |
Resolvable a => ToChildStart (SupervisorPid -> Process a) Source |
start :: RestartStrategy -> ShutdownMode -> [ChildSpec] -> Process SupervisorPid Source
Start a supervisor (process), running the supplied children and restart strategy.
start = spawnLocal . run
run :: RestartStrategy -> ShutdownMode -> [ChildSpec] -> Process () Source
Run the supplied children using the provided restart strategy.
Limits and Defaults
data MaxRestarts Source
maxRestarts :: Int -> MaxRestarts Source
Smart constructor for MaxRestarts
. The maximum
restart count must be a positive integer.
data RestartLimit Source
A compulsary limit on the number of restarts that a supervisor will
tolerate before it terminates all child processes and then itself.
If > MaxRestarts
occur within the specified TimeInterval
, termination
will occur. This prevents the supervisor from entering an infinite loop of
child process terminations and restarts.
RestartLimit | |
|
limit :: MaxRestarts -> TimeInterval -> RestartLimit Source
data RestartMode Source
RestartEach | stop then start each child sequentially, i.e., |
| |
RestartInOrder | stop all children first, then restart them sequentially |
| |
RestartRevOrder | stop all children in the given order, but start them in reverse |
|
data RestartOrder Source
data RestartStrategy Source
Strategy used by a supervisor to handle child restarts, whether due to unexpected child failure or explicit restart requests from a client.
Some terminology: We refer to child processes managed by the same supervisor
as siblings. When restarting a child process, the RestartNone
policy
indicates that sibling processes should be left alone, whilst the RestartAll
policy will cause all children to be restarted (in the same order they were
started).
The other two restart strategies refer to prior and subsequent siblings, which describe's those children's configured position (i.e., insertion order). These latter modes allow one to control the order in which siblings are restarted, and to exclude some siblings from the restart without having to resort to grouping them using a child supervisor.
RestartOne | restart only the failed child process |
RestartAll | also restart all siblings |
| |
RestartLeft | restart prior siblings (i.e., prior start order) |
| |
RestartRight | restart subsequent siblings (i.e., subsequent start order) |
|
data ShutdownMode Source
restartOne :: RestartStrategy Source
Provides a default RestartStrategy
for RestartOne
.
> restartOne = RestartOne defaultLimits
restartAll :: RestartStrategy Source
Provides a default RestartStrategy
for RestartAll
.
> restartOne = RestartAll defaultLimits (RestartEach LeftToRight)
restartLeft :: RestartStrategy Source
Provides a default RestartStrategy
for RestartLeft
.
> restartOne = RestartLeft defaultLimits (RestartEach LeftToRight)
restartRight :: RestartStrategy Source
Provides a default RestartStrategy
for RestartRight
.
> restartOne = RestartRight defaultLimits (RestartEach LeftToRight)
Adding and Removing Children
addChild :: Addressable a => a -> ChildSpec -> Process AddChildResult Source
Add a new child.
data AddChildResult Source
data StartChildResult Source
startChild :: Addressable a => a -> ChildKey -> Process StartChildResult Source
startNewChild :: Addressable a => a -> ChildSpec -> Process AddChildResult Source
Atomically add and start a new child spec. Will fail if a child with the given key is already present.
terminateChild :: Addressable a => a -> ChildKey -> Process TerminateChildResult Source
Terminate a running child.
deleteChild :: Addressable a => a -> ChildKey -> Process DeleteChildResult Source
Delete a supervised child. The child must already be stopped (see
terminateChild
).
data DeleteChildResult Source
The result of a call to removeChild
.
ChildDeleted | the child specification was successfully removed |
ChildNotFound | the child specification was not found |
ChildNotStopped !ChildRef | the child was not removed, as it was not stopped. |
restartChild :: Addressable a => a -> ChildKey -> Process RestartChildResult Source
Forcibly restart a running child.
data RestartChildResult Source
Normative Shutdown
shutdown :: Resolvable a => a -> Process () Source
Gracefully terminate a running supervisor. Returns immediately if the address cannot be resolved.
shutdownAndWait :: Resolvable a => a -> Process () Source
As shutdown
, but waits until the supervisor process has exited, at which
point the caller can be sure that all children have also stopped. Returns
immediately if the address cannot be resolved.
Queries and Statistics
lookupChild :: Addressable a => a -> ChildKey -> Process (Maybe (ChildRef, ChildSpec)) Source
Lookup a possibly supervised child, given its ChildKey
.
listChildren :: Addressable a => a -> Process [Child] Source
List all know (i.e., configured) children.
data SupervisorStats Source
SupervisorStats | |
|
statistics :: Addressable a => a -> Process SupervisorStats Source
Obtain statistics about a running supervisor.
Additional (Misc) Types
data StartFailure Source
Provides failure information when (re-)start failure is indicated.
StartFailureDuplicateChild !ChildRef | a child with this |
StartFailureAlreadyRunning !ChildRef | the child is already up and running |
StartFailureBadClosure !StaticLabel | a closure cannot be resolved |
StartFailureDied !DiedReason | a child died (almost) immediately on starting |