genprog-0.1.0.2: Genetic programming library

Portabilitynon-portable
Stabilityexperimental
MaintainerJan Snajder <jan.snajder@fer.hr>
Safe HaskellNone

GenProg

Contents

Description

The Genetic Programming Library.

Genetic programming is an evolutionary optimization technique inspired by biological evolution. It is similar to genetic algorithms except that the individual solutions are programs (or, more generally, expressions) representing a solution to a given problem. A genetic program is represented as an abstract syntax tree and associated with a custom-defined fitness value indicating the quality of the solution. Starting from a randomly generated initial population of genetic programs, the genetic operators of selection, crossover, and (occasionally) mutation are used to evolve programs of increasingly better quality.

Standard reference is: John Koza. Genetic programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.

In GenProg, a genetic program is represented by a value of an algebraic datatype. To use a datatype as a genetic program, it suffices to define it as an instance of the GenProg typeclass. A custom datatype can be made an instance of the GenProg typeclass, provided it is an instance of the Data typeclass (see GenProg.GenExpr.Data).

An example of how to use this library is given below.

Synopsis

Genetic programs

class (Eq e, GenExpr e, MonadRandom m) => GenProg m e | e -> m whereSource

A typeclass defining a genetic program interface. Datatypes e that are to be used as genetic programs must be instances of the GenExpr typeclass and must implement this interface.

Methods

terminal :: m eSource

Generates a random terminal T.

nonterminal :: m eSource

Generates a random nonterminal (functional) node F(T,...,T) whose arguments are again terminals (this condition is not verified).

Expressions

generateFullExpr :: GenProg m e => Int -> m eSource

Generates a random expression fully expanded to the specified depth.

generateGrownExpr :: GenProg m e => Int -> m eSource

Generates a random expression of limited depth. The maximum depth of the resulting expression may be less than the specified depth limit, and paths may be of different length.

depth :: GenExpr e => e -> IntSource

The depth of an expression. Equals 1 for single-node expressions.

nodes :: GenExpr e => e -> IntSource

Number of nodes an expression has.

Individuals

data Ind e Source

A genetically programmed individual, representing a basic unit of evolution. (Basically a wrapper around a genetically programmable expression.)

Instances

Eq e => Eq (Ind e) 
Eq e => Ord (Ind e) 
Show e => Show (Ind e) 

unInd :: Ind e -> eSource

Returns the expression wrapped by an individual.

mkInd :: GenProg m e => Fitness e -> e -> Ind eSource

Wraps an expression into an individual.

aFitness :: Ind e -> DoubleSource

Adjusted fitness of an individual. Adjusted fitness equals 1/(1+s), where s is the standardized fitness as computed by fitness. To reduce computational costs, this value is computed only once and then cached.

sFitness :: Ind e -> DoubleSource

Standardized fitness of an individual as computed by fitness

Population

data Pop e Source

A population of individuals. (Basically a wrapper around a list of individuals.)

Instances

Eq e => Eq (Pop e) 
Show e => Show (Pop e) 

unPop :: Pop e -> [Ind e]Source

Unwraps a population.

mkPop :: [Ind e] -> Pop eSource

Wraps a list of individuals into a population.

generatePop :: GenProg m e => EvolParams m e -> m (Pop e)Source

Generate population of given size and given depth limit using ramped half-and-half method (Koza, 1992): for each depth value from 0 to the initial depth limit iDepth, 50% of individuals are generated using generateFullExpr and 50% are generated using generateGrownExpr. Afterwards, duplicates are removed, thus the size of the resulting population may actually be less than the specified size.

replenishPop :: GenProg m e => EvolParams m e -> Pop e -> m (Pop e)Source

Replenishes a population up to popSize by randomly generating new individuals.

mergePop :: GenProg m e => EvolParams m e -> Pop e -> Pop e -> Pop eSource

Merges two populations by taking popSize best-fitted individuals from the union of the two populations.

best :: Pop e -> Ind eSource

Population's best-fitted individual.

avgFitness :: Pop e -> DoubleSource

Population's average standardized fitness.

avgDepth :: GenProg m e => Pop e -> DoubleSource

Average depth of expressions in the population.

avgNodes :: GenProg m e => Pop e -> DoubleSource

Average number of expression nodes in the population.

Genetic operators

The following functions are not meant to be used directly. They are exposed for debugging purposes.

crossoverInd :: GenProg m e => EvolParams m e -> Ind e -> Ind e -> m (Ind e, Ind e)Source

Crossover operation of two individuals, resulting in two offsprings. Crossover is performed by choosing at random two nodes in each expressions, and then by exchanging the subexpressions rooted at these nodes between the two individuals. The probability that an internal (functional) node is chosen as crossover point is set by the ciProb parameter in EvolParams, whereas the probability that an external (terminal) node is chosen equals 1-ciProb. Among internal and external nodes, nodes are chosen uniformly at random. If the depth of a created offspring exceeds the depth limit cDepth specified by evolution parameters EvolParams, that offspring is discarded and a parent is reproduced (i.e., copied as-is).

mutateInd :: GenProg m e => EvolParams m e -> Ind e -> m (Ind e)Source

Mutates an individual by applying the mutation function mutate to a randomly selected node. The probability that an internal (functional) node is chosen for muration is set by the miProb parameter in EvolParams, whereas the probability that an external (terminal) node is chosen equals 1-miProb. Among internal and external nodes, nodes are chosen uniformly at random. If the depth of the mutated expression exceeds the depth limit cDepth specified by evolution parameters EvolParams, the individual is left unaltered.

crossoverPop :: GenProg m e => EvolParams m e -> Pop e -> m (Ind e, Ind e)Source

Applies crossover to two randomly chosen individuals from a population. The probability of an individual being chosen as parent is fitness-proportionate (individuals with better fitness have better chanches of being chosen for crossover).

mutatePop :: GenProg m e => EvolParams m e -> Pop e -> m (Pop e)Source

Applies mutation operation to individuals from a population. The probability of mutating each individual is determined by mProb parameter from EvalParams.

Evolution state

data EvolState e Source

The state of the evolution.

Constructors

EvolState 

Fields

pop :: Pop e

Current population.

iter :: Int

Iteration (current generation number).

cachedBest :: Ind e

Best individual evolved so far.

Instances

Eq e => Eq (EvolState e) 
Show e => Show (EvolState e) 

Control parameters

type Fitness e = e -> DoubleSource

Standardized fitness. It takes on values from 0 (best fitness) to +infinity (worst fitness).

type Mutate m e = e -> m eSource

A function to mutate a chosen expression node.

defaultMutation :: GenProg m e => EvolParams m e -> Mutate m eSource

Default mutation. Replaces a node, irrespective of its value, with a randomly generated subexpression whose depth is limited to iDepth.

type Terminate e = EvolState e -> BoolSource

Termination predicate.

tSuccess :: (e -> Bool) -> Terminate eSource

Termination predicate: terminate if any individual satisfies the specified predicate.

tFitness :: GenProg m e => Double -> Terminate eSource

Termination predicate: terminate if best individual's standardized fitness is greater than or equal to the specified value.

tGeneration :: Int -> Terminate eSource

Termination predicate: terminate after running for the specified number of iterations.

data EvolParams m e Source

Parameters governing the evolution.

Default evolution parameters, as used in (Koza, 1992), are defined by defaultEvolParams and indicated below. At least the fitness function fitness should be overriden.

Constructors

EvolParams 

Fields

popSize :: Int

Population size (number of individuals). Default is 500.

iDepth :: Int

Depth of expressions in initial population. Default is 6.

cDepth :: Int

Maximum depth of expressions created during the evolution. Default is 17.

cProb :: Double

Probability of crossover. Default is 0.9. If crossover is not chosen, an individual is simply reproduced (copied as-is) into the next generation.

ciProb :: Double

Probability that an internal (functional) node is chosen as a crossover point. Default is 0.9. If an internal node is not chosen, an external (terminal) node is chosen.

mProb :: Double

Probability that an individual gets mutated. Default is 0 (no mutation).

miProb :: Double

Probability that an internal (functional) node is chosen for mutation. Default is 0.1.

fitness :: Fitness e

Standardized fitness function. Default value is undefined (must be overriden).

mutate :: Mutate m e

Mutation function. Defines how to change a randomly chosen node. Default is defaultMutation defaultEvolParams (replacement of a chosen node with a randomly generated subexpression).

elitists :: Int

Elitist factor: number of best-fitted individuals that are preserved from each generation (reproduced as-is into next evolution state). Default is 0.

terminate :: Terminate e

Termination predicate. Default is 50 (terminate after 50 generations).

Evolution

evolve :: GenProg m e => EvolParams m e -> m (EvolState e)Source

Creates an initial population and evolves it until termination predicate is satisfied, returning the last evolution state.

evolveFrom :: GenProg m e => EvolParams m e -> Pop e -> m (EvolState e)Source

Evolves a given initial population until termination predicate is satisfied, returning the last evolution state. If the size of the initial population is less than popSize, the population will be replenished (see replenishPop).

evolveTrace :: GenProg m e => EvolParams m e -> m [EvolState e]Source

Creates an initial population and runs evolution until termination predicate is satisfied. Returns a list of successive evolution states.

evolveTraceFrom :: GenProg m e => EvolParams m e -> Pop e -> m [EvolState e]Source

Runs evolution on a given initial population until termination predicate is satisfied and returns a list of successive evolution states. If the size of the initial population is less than popSize, the population will be replenished (see replenishPop).

Example

This is a simple, worked through example of how to use the GenProg library. Given a target number n, out aim is to evolve an arithmetic expression that evaluates to n. For example, given 13 as the target number, one possible solution is (3 * 5) - 2. The constants allowed to appear in the expression are restricted to integers from 1 to 9. The allowed operations are +, -, *, and integer division without remainder.

We begin by defining the datatype for the genetically programed expression:

-- The following language extensions need to be enabled:
-- DeriveDataTypeable, FlexibleInstances, MultiParamTypeClasses

import GenProg
import Data.Generics
import Control.Monad
import Control.Monad.Random

data E = Plus E E
       | Minus E E
       | Times E E
       | Div E E
       | Const Int
       deriving (Typeable,Data,Eq,Show)

In order to evolve arithmetic expressions, we need to be able to compute their values. To this end we define

eval :: E -> Maybe Int
eval (Const c)     = Just c
eval (Plus e1 e2)  = liftM2 (+) (eval e1) (eval e2)
eval (Minus e1 e2) = liftM2 (-) (eval e1) (eval e2)
eval (Times e1 e2) = liftM2 (*) (eval e1) (eval e2)
eval (Div e1 e2) | ok        = liftM2 div x1 x2
                 | otherwise = Nothing
  where (x1,x2) = (eval e1,eval e2)
        ok = x2 /= Just 0 && liftM2 mod x1 x2 == Just 0

Dividing by zero and dividing with a remainder are not allowed and in such cases we return Nothing.

Because we have made E an instance of the Data typeclass, it can be readily used as a genetically programmable expression. Next step is to make E an instance of the GenProg typeclass:

instance GenProg (Rand StdGen) E where
  terminal    = Const liftM getRandomR (1,9)
  nonterminal = do
    r <- getRandomR (0,3)
    [liftM2 Plus terminal terminal,
     liftM2 Minus terminal terminal,
     liftM2 Times terminal terminal,
     liftM2 Div terminal terminal] !! r

Thus, a random terminal node contains one of the constants from 1 to 9. A nonterminal node can be one of the four arithmetic operations, each with terminal nodes as arguments. Note that computations are run within the standard random generator monad (Rand StdGen).

The fitness function evaluates the accurateness of the arithmetic expression with respect to the target number. If the value of the expression is far off from the target number n, the standardized fitness should be high. Moreover, we would like to keep the expression as simple as possible. To this end, we include a parsimony factor that is proportional to the number of nodes an expression has. We define the overall standardized fitness as

myFitness :: Int -> E -> Double
myFitness n e = error + size
  where error = realToFrac $ maybe maxBound (abs . (n-)) (eval e)
        size  = (realToFrac $ nodes e) / 100

The number of nodes is divided by a factor of 100 to make it less important than the numeric accuracy of the expression.

We now have everything in place to get the evolution going. We will use default evolution parameters and choose 12345 as the target number:

>>> let params = defaultEvolParams { fitness = myFitness 12345 }

Let us first create a random number generator:

>>> let g = mkStdGen 0

We are doing this because we want our results to be reproducible, and because we want to be able to compare the results of different evolution runs. Normally, you would use getStdGen to get a random generator with random seed.

To run the evolution and get the best evolved individual, we type

>>> let i = cachedBest $ evalRand (evolve params) g

To check out its standardized fitness, we type

>>> sFitness i
39.61

Let us see how the actual expression looks like:

>>> unInd i
Times (Minus (Minus (Minus (Plus (Const 4) (Const 4)) (Plus (Const 6) 
(Const 7))) (Minus (Minus (Const 5) (Const 9)) (Plus (Minus (Const 5) 
(Const 9)) (Minus (Const 4) (Const 4))))) (Plus (Times (Plus (Const 5) 
(Const 1)) (Const 6)) (Times (Plus (Const 9) (Const 3)) (Minus (Const 1) 
(Const 8))))) (Div (Times (Plus (Plus (Const 3) (Const 5)) (Times (Const 4) 
(Const 7))) (Plus (Const 4) (Const 4))) (Minus (Minus (Plus (Const 2) 
(Const 8)) (Plus (Const 6) (Const 7))) (Plus (Minus (Const 5) (Const 9)) 
(Minus (Const 4) (Const 4)))))

The number of nodes is

>>> nodes $ unInd i
61

Let us see to what number the expression evaluates:

>>> eval $ unInd i
Just 12384

So in this run we didn't get a perfect match, but we were close. Let us see if we can do better.

When doing genetic programming, it is always a good idea to experiment a bit with the parameters. There are no parameters that work best for any given problem. You can learn a lot about how parameters influence the evolution by analysing how the evolution progresses in time. This can be accomplised by evolving an evolution trace:

>>> let trace = evalRand (evolveTrace params) g

We can now analyse how the standardized fitness of the best individual improves during the evolution:

>>> map (sFitness . best . pop) trace
[9591.35,2343.59,1935.59,2343.59,903.51,903.45,585.59,585.59,327.45,225.41,
225.41,135.43,57.49,39.61,39.61,39.61,39.61,39.61,57.43,57.47,57.43,57.45,
57.33,57.43,57.43,57.45,57.43,57.43,57.35,57.35,57.43,57.27,57.33,57.33,57.43,
57.29,57.33,57.41,57.29,57.43,57.33,57.35,57.35,57.33,57.39,57.39,57.39,57.33,
57.37,57.37]

We see that at some point the fitness decreases and then increases again. This indicates that the best fitted individual was lost by evolving from one generation to the other. We can prevent this by employing the elitist strategy. Let us see what happens if we preserve a best fitted individual in each generation:

>>> let trace = evalRand (evolveTrace params {elitists = 1}) g
>>> map (sFitness . best . pop) trace
[9591.35,2343.59,711.61,711.61,711.61,711.61,57.55,57.53,57.39,57.39,57.39,
57.39,57.37,57.37,57.37,57.37,57.37,57.37,57.37,57.37,57.35,57.35,57.35,
57.35,57.35,57.35,57.35,57.35,57.35,57.35,57.33,57.33,57.33,57.33,57.33,
57.33,57.33,57.33,57.33,25.31,25.31,25.31,25.31,25.31,25.31,25.296,25.296,
25.296,25.296,25.296]

This gives us better fitness, but still not an exact match:

>>> let i = cachedBest $ last trace
>>> eval $ unInd i
Just 12320

In the previous evolution run fitness converged relatively fast, but then remained stuck. To stir up things a little, let us allow for some mutation. Setting mutation probability to 5%, while retaining the elitist strategy, we get

>>> let trace = evalRand (evolveTrace params {elitists = 1, mProb = 0.05}) g
>>> map (sFitness . best . pop) trace
[9591.35,9591.35,9591.35,9591.35,9591.35,9591.35,9159.35,8403.23,7239.11,
6087.15,6087.15,1479.13,819.21,60.13,51.19,5.19,5.19,5.19,5.19,5.19,1.23,
1.23,1.23,1.23,1.23,1.23,1.21,1.21,1.21,1.21,0.23998,0.23998,0.23998,0.23998,
0.23998,0.23998,0.23998,0.23998,0.23998,0.23998,0.23998,0.23998,0.23998,
0.23998,0.23998,0.23998,0.23998,0.23998,0.23998,0.23998]

This time we've got a perfect match:

>>> let i = cachedBest $ last trace
>>> eval $ unInd i
Just 12345

while at the same time the expression is rather compact:

>>> unInd i
Plus (Times (Const 4) (Plus (Const 9) (Const 4))) (Plus (Plus (Times 
(Plus (Const 4) (Const 3)) (Times (Times (Const 3) (Const 9)) (Times 
(Const 5) (Plus (Const 9) (Const 4))))) (Const 3)) (Const 5))
>>> nodes $ unInd i
23