This library provides definitions and algorithms for various graphical models such as mixture models, Kalman filters, and restricted Boltzmann machines, as well as algorithms for fitting them, e.g. expectation maximization and contrastive divergence minimization. Underlying all of these models is a generalized linear object known as a `Harmonium`, which I will briefly introduce in what follows.
The core definition of this library is a `Manifold` of joint distributions I call an `AffineHarmonium`

```haskell
newtype AffineHarmonium f y x z w = AffineHarmonium (Affine f y z x, w)
```

which is a product `Manifold` composed of a `Manifold` of likelihood functions `Affine f y z x`, and a `Manifold` of distributions `w` that partially define the space of priors. `AffineHarmonium`s provide a bit more flexibility than what I call a `Harmonium`

```haskell
type Harmonium f z w = AffineHarmonium f z w z w
```

which is a simpler object. Nevertheless, from a theoretical point of view, an `AffineHarmonium` is a special case of a `Harmonium`, and we may think of them more or less equivalently.
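For example, a mixture model is simply a `Harmonium` whose latent variable is categorical. As a rough sketch of how such a model can be expressed (the `Mixture` synonym below is my own illustration, and the use of `Tensor` and `Categorical` here is an assumption rather than the library's exact definition):

```haskell
-- A mixture model over observations z with k+1 components: a Harmonium
-- with a dense Tensor interaction and a Categorical latent variable.
-- (Hypothetical synonym for illustration only.)
type Mixture z k = Harmonium Tensor z (Categorical k)
```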
A `Harmonium` is a model over observable variables and latent variables, and represents a sort of generalized linear joint distribution over the two of them. The theory of `Harmonium`s is summarized well by this paper and this paper. Although `Harmonium`s might seem like little-studied and esoteric objects, various well-known models, such as mixture models and restricted Boltzmann machines, are in fact `Harmonium`s, and various other models, such as factor analysis, Kalman filters, or hidden Markov models, can be expressed in terms of them.
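Concretely, the generalized linear structure means the joint density takes an exponential family form in which the sufficient statistics of the observable and latent variables interact through a bilinear term. In my notation (not the library's), with sufficient statistics $s_z$ and $s_x$ and natural parameters $\theta_z$, $\Theta_{zx}$, and $\theta_x$:

$$ p(z, x) \propto \exp\left( \theta_z \cdot s_z(z) + s_z(z) \cdot \Theta_{zx} \cdot s_x(x) + \theta_x \cdot s_x(x) \right) $$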
All of the aforementioned models can be fit with expectation-maximization (EM), and EM can be expressed in an entirely general manner for `Harmonium`s. Firstly, the expectation step of a `Harmonium` is implemented by
```haskell
expectationStep
    :: ( ExponentialFamily z, Map Natural f x y, Bilinear f y x
       , Translation z y, Translation w x, LegendreExponentialFamily w )
    => Sample z -- ^ Model samples
    -> Natural # AffineHarmonium f y x z w -- ^ Harmonium
    -> Mean # AffineHarmonium f y x z w -- ^ Harmonium expected sufficient statistics
expectationStep zs hrm =
    let mzs = sufficientStatistic <$> zs -- observable sufficient statistics
        mys = anchor <$> mzs -- their projections onto the interaction space
        pstr = fst . split $ transposeHarmonium hrm -- the posterior affine map
        mws = transition <$> pstr >$> mys -- latent predictions, in Mean coordinates
        mxs = anchor <$> mws -- their projections onto the interaction space
        myx = (>$<) mys mxs -- the average interaction (outer product) statistics
     in joinHarmonium (average mzs) myx $ average mws
```
In summary, what we do is

- take some observations,
- compute their `sufficientStatistic`s,
- map these statistics into the predictions of the latent variables,
- transition these latent predictions from `Natural` coordinates to `Mean` coordinates,
- and assemble the results into the `Mean` sufficient statistics of the joint distribution (see the formula after this list).
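In equation form, the expectation step computes the average of the expected sufficient statistics of the joint distribution under the posterior over the latent variables. Writing $s$ for the joint sufficient statistic and $\theta$ for the current `Natural` parameters (my notation, not the library's):

$$ \hat{\eta} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{p(x \mid z_i; \theta)}\left[ s(z_i, x) \right] $$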
The maximization step then consists simply of mapping the whole joint distribution from `Mean` back to `Natural` coordinates, such that `expectationMaximization` may be expressed as
```haskell
expectationMaximization
    :: ( DuallyFlatExponentialFamily (AffineHarmonium f y x z w)
       , ExponentialFamily z, Map Natural f x y, Bilinear f y x
       , Translation z y, Translation w x, LegendreExponentialFamily w )
    => Sample z
    -> Natural # AffineHarmonium f y x z w
    -> Natural # AffineHarmonium f y x z w
expectationMaximization zs hrm = transition $ expectationStep zs hrm
```
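In practice, fitting a model then amounts to iterating this function. A minimal sketch, where `zs` and `hrm0` are hypothetical names for a sample and an initial model of appropriate types:

```haskell
-- Run 100 iterations of EM starting from the initial harmonium hrm0;
-- zs and hrm0 are assumed to be defined elsewhere.
emSteps = take 100 $ iterate (expectationMaximization zs) hrm0
```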
As such, for a wide variety of models, we may reduce implementing expectation-maximization to instantiating the class requirements of the `expectationMaximization` function. This is rarely trivial, but in some sense it is much more straightforward and well-defined than deriving EM algorithms from scratch.
For in-depth tutorials, visit my blog.