distributed-process-supervisor-0.1.3.1: Supervisors for The Cloud Haskell Application Platform

Copyright(c) Tim Watson 2012 - 2013
LicenseBSD3 (see the file LICENSE)
MaintainerTim Watson <watson.timothy@gmail.com>
Stabilityexperimental
Portabilitynon-portable (requires concurrency)
Safe HaskellNone
LanguageHaskell98

Control.Distributed.Process.Supervisor

Contents

Description

This module implements a process which supervises a set of other processes, referred to as its children. These child processes can be either workers (i.e., processes that do something useful in your application) or other supervisors. In this way, supervisors may be used to build a hierarchical process structure called a supervision tree, which provides a convenient structure for building fault tolerant software.

Unless otherwise stated, all functions in this module will cause the calling process to exit unless the specified supervisor process exists.

Supervision Principles

A supervisor is responsible for starting, stopping and monitoring its child processes so as to keep them alive by restarting them when necessary.

The supervisors children are defined as a list of child specifications (see ChildSpec). When a supervisor is started, its children are started in left-to-right (insertion order) according to this list. When a supervisor stops (or exits for any reason), it will terminate its children in reverse (i.e., from right-to-left of insertion) order. Child specs can be added to the supervisor after it has started, either on the left or right of the existing list of children.

When the supervisor spawns its child processes, they are always linked to their parent (i.e., the supervisor), therefore even if the supervisor is terminated abruptly by an asynchronous exception, the children will still be taken down with it, though somewhat less ceremoniously in that case.

Restart Strategies

Supervisors are initialised with a RestartStrategy, which describes how the supervisor should respond to a child that exits and should be restarted (see below for the rules governing child restart eligibility). Each restart strategy comprises a RestartMode and RestartLimit, which govern how the restart should be handled, and the point at which the supervisor should give up and terminate itself respectively.

With the exception of the RestartOne strategy, which indicates that the supervisor will restart only the one individual failing child, each strategy describes a way to select the set of children that should be restarted if any child fails. The RestartAll strategy, as its name suggests, selects all children, whilst the RestartLeft and RestartRight strategies select all children to the left or right of the failed child, in insertion (i.e., startup) order.

Note that a branch restart will only occur if the child that exited is meant to be restarted. Since Temporary children are never restarted and Transient children are not restarted if they exit normally, in both these circumstances we leave the remaining supervised children alone. Otherwise, the failing child is always included in the branch to be restarted.

For a hypothetical set of children a through d, the following pseudocode demonstrates how the restart strategies work.

let children = [a..d]
let failure = c
restartsFor RestartOne   children failure = [c]
restartsFor RestartAll   children failure = [a,b,c,d]
restartsFor RestartLeft  children failure = [a,b,c]
restartsFor RestartRight children failure = [c,d]
Branch Restarts

We refer to a restart (strategy) that involves a set of children as a branch restart from now on. The behaviour of branch restarts can be further refined by the RestartMode with which a RestartStrategy is parameterised. The RestartEach mode treats each child sequentially, first stopping the respective child process and then restarting it. Each child is stopped and started fully before moving on to the next, as the following imaginary example demonstrates for children [a,b,c]:

stop  a
start a
stop  b
start b
stop  c
start c

By contrast, RestartInOrder will first run through the selected list of children, stopping them. Then, once all the children have been stopped, it will make a second pass, to handle (re)starting them. No child is started until all children have been stopped, as the following imaginary example demonstrates:

stop  a
stop  b
stop  c
start a
start b
start c

Both the previous examples have shown children being stopped and started from left to right, but that is up to the user. The RestartMode data type's constructors take a RestartOrder, which determines whether the selected children will be processed from LeftToRight or RightToLeft.

Sometimes it is desireable to stop children in one order and start them in the opposite. This is typically the case when children are in some way dependent on one another, such that restarting them in the wrong order might cause the system to misbehave. For this scenarios, there is another RestartMode that will shut children down in the given order, but then restarts them in the reverse. Using RestartRevOrder mode, if we have children [a,b,c] such that b depends on a and c on b, we can stop them in the reverse of their startup order, but restart them the other way around like so:

RestartRevOrder RightToLeft

The effect will be thus:

stop  c
stop  b
stop  a
start a
start b
start c
Restart Intensity Limits

If a child process repeatedly crashes during (or shortly after) starting, it is possible for the supervisor to get stuck in an endless loop of restarts. In order prevent this, each restart strategy is parameterised with a RestartLimit that caps the number of restarts allowed within a specific time period. If the supervisor exceeds this limit, it will stop, terminating all its children (in left-to-right order) and exit with the reason ExitOther ReachedMaxRestartIntensity.

The MaxRestarts type is a positive integer, and together with a specified TimeInterval forms the RestartLimit to which the supervisor will adhere. Since a great many children can be restarted in close succession when a branch restart occurs (as a result of RestartAll, RestartLeft or RestartRight being triggered), the supervisor will track the operation as a single restart attempt, since otherwise it would likely exceed its maximum restart intensity too quickly.

Child Restart and Termination Policies

When the supervisor detects that a child has died, the RestartPolicy configured in the child specification is used to determin what to do. If the this is set to Permanent, then the child is always restarted. If it is Temporary, then the child is never restarted and the child specification is removed from the supervisor. A Transient child will be restarted only if it terminates abnormally, otherwise it is left inactive (but its specification is left in place). Finally, an Intrinsic child is treated like a Transient one, except that if this kind of child exits normally, then the supervisor will also exit normally.

When the supervisor does terminate a child, the ChildTerminationPolicy provided with the ChildSpec determines how the supervisor should go about doing so. If this is TerminateImmediately, then the child will be killed without further notice, which means the child will not have an opportunity to clean up any internal state and/or release any held resources. If the policy is TerminateTimeout delay however, the child will be sent an exit signal instead, i.e., the supervisor will cause the child to exit via exit childPid ExitShutdown, and then will wait until the given delay for the child to exit normally. If this does not happen within the given delay, the supervisor will revert to the more aggressive TerminateImmediately policy and try again. Any errors that occur during a timed-out shutdown will be logged, however exit reasons resulting from TerminateImmediately are ignored.

Creating Child Specs

The ToChildStart typeclass simplifies the process of defining a ChildStart providing three default instances from which a ChildStart datum can be generated. The first, takes a Closure (Process ()), where the enclosed action (in the Process monad) is the actual (long running) code that we wish to supervise. In the case of a managed process, this is usually the server loop, constructed by evaluating some variant of ManagedProcess.serve.

The other two instances provide a means for starting children without having to provide a Closure. Both instances wrap the supplied Process action in some necessary boilerplate code, which handles spawning a new process and communicating its ProcessId to the supervisor. The instance for Addressable a => SupervisorPid -> Process a is special however, since this API is intended for uses where the typical interactions with a process take place via an opaque handle, for which an instance of the Addressable typeclass is provided. This latter approach requires the expression which is responsible for yielding the Addressable handle to handling linking the target process with the supervisor, since we have delegated responsibility for spawning the new process and cannot perform the link oepration ourselves.

Supervision Trees & Supervisor Termination

To create a supervision tree, one simply adds supervisors below one another as children, setting the childType field of their ChildSpec to Supervisor instead of Worker. Supervision tree can be arbitrarilly deep, and it is for this reason that we recommend giving a Supervisor child an arbitrary length of time to stop, by setting the delay to Infinity or a very large TimeInterval.

Synopsis

Defining and Running a Supervisor

data ChildSpec Source

Specification for a child process. The child must be uniquely identified by it's childKey within the supervisor. The supervisor will start the child itself, therefore childRun should contain the child process' implementation e.g., if the child is a long running server, this would be the server loop, as with e.g., ManagedProces.start.

type ChildKey = String Source

Identifies a child process by name.

data ChildType Source

Specifies whether the child is another supervisor, or a worker.

Constructors

Worker 
Supervisor 

data RestartPolicy Source

Describes when a terminated child process should be restarted.

Constructors

Permanent

a permanent child will always be restarted

Temporary

a temporary child will never be restarted

Transient

A transient child will be restarted only if it terminates abnormally

Intrinsic

as Transient, but if the child exits normally, the supervisor also exits normally

data ChildRef Source

A reference to a (possibly running) child.

Constructors

ChildRunning !ChildPid

a reference to the (currently running) child

ChildRunningExtra !ChildPid !Message

also a currently running child, with extra child info

ChildRestarting !ChildPid

a reference to the old (previous) child (now restarting)

ChildStopped

indicates the child is not currently running

ChildStartIgnored

a non-temporary child exited with ChildInitIgnore

type StaticLabel = String Source

Static labels (in the remote table) are strings.

start :: RestartStrategy -> ShutdownMode -> [ChildSpec] -> Process SupervisorPid Source

Start a supervisor (process), running the supplied children and restart strategy.

start = spawnLocal . run

run :: RestartStrategy -> ShutdownMode -> [ChildSpec] -> Process () Source

Run the supplied children using the provided restart strategy.

Limits and Defaults

maxRestarts :: Int -> MaxRestarts Source

Smart constructor for MaxRestarts. The maximum restart count must be a positive integer.

data RestartLimit Source

A compulsary limit on the number of restarts that a supervisor will tolerate before it terminates all child processes and then itself. If > MaxRestarts occur within the specified TimeInterval, termination will occur. This prevents the supervisor from entering an infinite loop of child process terminations and restarts.

Constructors

RestartLimit 

data RestartMode Source

Constructors

RestartEach

stop then start each child sequentially, i.e., foldlM stopThenStart children

Fields

order :: !RestartOrder
 
RestartInOrder

stop all children first, then restart them sequentially

Fields

order :: !RestartOrder
 
RestartRevOrder

stop all children in the given order, but start them in reverse

Fields

order :: !RestartOrder
 

data RestartStrategy Source

Strategy used by a supervisor to handle child restarts, whether due to unexpected child failure or explicit restart requests from a client.

Some terminology: We refer to child processes managed by the same supervisor as siblings. When restarting a child process, the RestartNone policy indicates that sibling processes should be left alone, whilst the RestartAll policy will cause all children to be restarted (in the same order they were started).

The other two restart strategies refer to prior and subsequent siblings, which describe's those children's configured position (i.e., insertion order). These latter modes allow one to control the order in which siblings are restarted, and to exclude some siblings from the restart without having to resort to grouping them using a child supervisor.

Constructors

RestartOne

restart only the failed child process

RestartAll

also restart all siblings

RestartLeft

restart prior siblings (i.e., prior start order)

RestartRight

restart subsequent siblings (i.e., subsequent start order)

restartOne :: RestartStrategy Source

Provides a default RestartStrategy for RestartOne. > restartOne = RestartOne defaultLimits

restartAll :: RestartStrategy Source

Provides a default RestartStrategy for RestartAll. > restartOne = RestartAll defaultLimits (RestartEach LeftToRight)

restartLeft :: RestartStrategy Source

Provides a default RestartStrategy for RestartLeft. > restartOne = RestartLeft defaultLimits (RestartEach LeftToRight)

restartRight :: RestartStrategy Source

Provides a default RestartStrategy for RestartRight. > restartOne = RestartRight defaultLimits (RestartEach LeftToRight)

Adding and Removing Children

addChild :: Addressable a => a -> ChildSpec -> Process AddChildResult Source

Add a new child.

startChild :: Addressable a => a -> ChildKey -> Process StartChildResult Source

Start an existing (configured) child. The ChildSpec must already be present (see addChild), otherwise the operation will fail.

startNewChild :: Addressable a => a -> ChildSpec -> Process AddChildResult Source

Atomically add and start a new child spec. Will fail if a child with the given key is already present.

terminateChild :: Addressable a => a -> ChildKey -> Process TerminateChildResult Source

Terminate a running child.

deleteChild :: Addressable a => a -> ChildKey -> Process DeleteChildResult Source

Delete a supervised child. The child must already be stopped (see terminateChild).

data DeleteChildResult Source

The result of a call to removeChild.

Constructors

ChildDeleted

the child specification was successfully removed

ChildNotFound

the child specification was not found

ChildNotStopped !ChildRef

the child was not removed, as it was not stopped.

restartChild :: Addressable a => a -> ChildKey -> Process RestartChildResult Source

Forcibly restart a running child.

Normative Shutdown

shutdown :: Resolvable a => a -> Process () Source

Gracefully terminate a running supervisor. Returns immediately if the address cannot be resolved.

shutdownAndWait :: Resolvable a => a -> Process () Source

As shutdown, but waits until the supervisor process has exited, at which point the caller can be sure that all children have also stopped. Returns immediately if the address cannot be resolved.

Queries and Statistics

lookupChild :: Addressable a => a -> ChildKey -> Process (Maybe (ChildRef, ChildSpec)) Source

Lookup a possibly supervised child, given its ChildKey.

listChildren :: Addressable a => a -> Process [Child] Source

List all know (i.e., configured) children.

statistics :: Addressable a => a -> Process SupervisorStats Source

Obtain statistics about a running supervisor.

Additional (Misc) Types

data StartFailure Source

Provides failure information when (re-)start failure is indicated.

Constructors

StartFailureDuplicateChild !ChildRef

a child with this ChildKey already exists

StartFailureAlreadyRunning !ChildRef

the child is already up and running

StartFailureBadClosure !StaticLabel

a closure cannot be resolved

StartFailureDied !DiedReason

a child died (almost) immediately on starting