hsgsom-0.2.0: An implementation of the GSOM clustering algorithm.

Portability: non-portable (requires STM)
Stability: experimental
Maintainer: gnn.github@gmail.com

Data.Datamining.Clustering.Gsom.Phase

Description

The GSOM Algorithm can be split up in multiple sequentially executed Phases. Each of these phases makes a certain number of passes over the inputs. While doing so each phase modifies a given Lattice according to a certain set of specified parameters. This module contains the definition of the Phase type, a few default instances and the functions needed to run a single phase or to run a sequence of Phases.
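The following sketch shows how these pieces fit together. It only uses names documented below; the imports of the sibling modules providing Lattice and Inputs are assumptions about the package layout, and the initial Lattice is taken as a parameter since its construction belongs elsewhere.

  import Data.Datamining.Clustering.Gsom.Phase
  -- The next two imports are assumptions about where Lattice and
  -- Inputs are defined; adjust them to the actual package layout.
  import Data.Datamining.Clustering.Gsom.Lattice (Lattice)
  import Data.Datamining.Clustering.Gsom.Input (Inputs)

  -- Run the three default phases one after another, feeding the
  -- lattice produced by each phase to the next one. This is what
  -- run defaults does in a single call (see below).
  trainWithDefaults :: Lattice -> Inputs -> IO Lattice
  trainWithDefaults lattice inputs = do
    l1 <- phase defaultFirst  lattice inputs
    l2 <- phase defaultSecond l1      inputs
    phase defaultThird l2 inputs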

Synopsis

Documentation

data Phase

This datatype encapsulates all the parameters needed to run one phase of the GSOM algorithm.

Constructors

Phase 

Fields

passes :: Int

The number of passes to be made over the input vectors. Since each step of the phase consumes one input vector, the overall number of steps in the phase will be:

  • steps = passes * length inputs

neighbourhoodSize :: Int

The initial size of the neighbourhood affected by weight adaption. This will decrease linearly while the phase is executed.

learningRate :: LearningRate

The function used to calculate the learning rate in each of the phase's steps. During each step, learningRate currentStep maxSteps is called with the number of the current step (starting from zero) as the first argument and the total number of steps the phase will take as the second argument; the result is the learning rate in effect for that step.

kernel :: Kernel

The kernel function. It is used in conjunction with the learning rate to scale weight adaption. kernel currentNeighbourhoodSize gridDistance takes the neighbourhood size in effect during the current step and a node's grid distance from the winning node. The neighbourhood size will be a real number due to its linear decrease.

grow :: Bool

The grow flag determines whether this is a growing phase or not. If this is False then no new nodes will be grown.

spreadFactor :: Double

The spread factor is used to calculate the growth threshold according to the formula:

  • growthThreshold = -d * ln(spreadFactor)

where d is the input dimension.
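Putting the fields together, a custom growing phase could be assembled roughly like this (the numeric values are arbitrary and purely illustrative; Data.Datamining.Clustering.Gsom.Phase is assumed to be imported):

  -- Sketch of a custom growing phase; all values are arbitrary.
  customGrowingPhase :: Phase
  customGrowingPhase = Phase
    { passes            = 3
    , neighbourhoodSize = 4
    , learningRate      = Linear 0.1
    , kernel            = Gaussian
    , grow              = True
    , spreadFactor      = 0.05
    }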

The three default phases of the GSOM algorithm. They all use the bubble kernel and a linear learning rate decrease.

defaultFirst :: Phase

The default first phase is the only growing phase. It makes 5 passes over the input, uses an initial learning rate of 0.1 and a starting neighbourhood size of 3. The spreadFactor is set to 0.1.

defaultSecond :: Phase

The default for the second phase is a smoothing phase making 50 passes over the input vectors with a learning rate of 0.05 and an initial neighbourhood size of 2. Since there is no node growth the spreadFactor is ignored and thus set to 0.

defaultThird :: Phase

The default for the third and last phase is a smoothing phase making 50 passes over the input vectors with a learning rate of 0.01 and an initial neighbourhood size of 1. Since there is no node growth the spreadFactor is ignored and thus set to 0.

defaults :: Phases

This is the list of the three default phases of the GSOM algorithm.
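Given the descriptions above, defaults is presumably equivalent to the following (assuming Phases is a list of Phase values):

  defaults :: Phases
  defaults = [defaultFirst, defaultSecond, defaultThird]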

growthThreshold :: Phase -> Int -> Double

Calculates the growth threshold from the given phase's spread factor and the given input dimension, as explained in the documentation for Phase.
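A minimal sketch, assuming the growth threshold from the original GSOM papers, GT = -d * ln(spreadFactor); the library's exact formula may differ:

  -- Sketch only: standard GSOM growth threshold for input dimension d.
  growthThresholdSketch :: Phase -> Int -> Double
  growthThresholdSketch p d = negate (fromIntegral d) * log (spreadFactor p)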

phase :: Phase -> Lattice -> Inputs -> IO Lattice

phase parameters lattice inputs updates the given lattice by executing one phase of the GSOM algorithm with the given inputs and parameters.

run :: Phases -> Lattice -> Inputs -> IO Lattice

Since a complete run of the GSOM algorithm means running a number of Phases, this is usually the main function used. run phases lattice inputs runs the GSOM algorithm by executing the given phases in order, each one making its passes over inputs and feeding the Lattice it produces to the next phase as an argument. The initial Lattice, lattice, may be constructed with the newRandom and newCentered functions.
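A usage sketch, assuming Phases is a plain list of Phase values: run a slightly customised phase sequence in which the growing phase makes more passes than the default.

  -- Sketch: give the growing phase 10 passes instead of the default 5.
  runCustom :: Lattice -> Inputs -> IO Lattice
  runCustom lattice inputs =
    run [defaultFirst { passes = 10 }, defaultSecond, defaultThird]
        lattice inputs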

data Kernel

Constructors

Bubble

The bubble kernel is essentially the identity, i.e. it has no effect.

Gaussian

Let s be the neighbourhood size currently in effect and d the grid distance of the current node from the winning node. This kernel then calculates a factor to scale weight adaption using the following formula:

  • gaussian s d = exp(-d^2/(2*s^2))

kernelFunction :: Kernel -> Double -> Int -> Double

Returns the kernel function associated with the given kernel.
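A sketch of what the two kernels compute, based on the descriptions above; the bubble kernel is taken to be the constant factor 1, and the actual implementation may differ in details:

  kernelSketch :: Kernel -> Double -> Int -> Double
  kernelSketch Bubble   _ _ = 1
  kernelSketch Gaussian s d = exp (negate (fromIntegral d ^ 2) / (2 * s ^ 2))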

data LearningRate

Constructors

Linear Double

The linear learning rate reduction function. Given the initial learning rate lr, it uses the following formula, where step is the current step the phase is in and steps is the overall number of steps the phase will take:

  • linear lr step steps = lr * (1-step/steps)

InverseAge Double

The inverse time learning rate reduction function. Given an initial learning rate of lr, a maximum number of steps of steps and the current step number being step, the formula is:

  • inverseAge lr step steps = lr * steps / (steps + 100 * step)

adaption :: LearningRate -> Int -> Int -> Double

Returns the learning rate adaption function associated with the given type of learning rate.
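A sketch of the two schedules, following the formulas given above; the actual adaption function may differ in details:

  adaptionSketch :: LearningRate -> Int -> Int -> Double
  adaptionSketch (Linear lr) step steps =
    lr * (1 - fromIntegral step / fromIntegral steps)
  adaptionSketch (InverseAge lr) step steps =
    lr * fromIntegral steps / (fromIntegral steps + 100 * fromIntegral step)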