NLP.Probability.ConditionalDistribution

Contents

Conditional Distributions

Synopsis

type CondObserved event context = SmoothTrie (SubMap context) (Sub context) (Counts event)

type CondDistribution event context = context -> Distribution event

condObservation :: (Context context, Event event) => event -> context -> CondObserved event context

condObservations :: (Context context, Event event) => event -> context -> Count -> CondObserved event context

condObservationCounts :: (Context context, Event event) => context -> Counts event -> CondObserved event context

class Map (SubMap a) (Sub a) => Context a where
    type Sub a
    type SubMap a :: * -> * -> *
    decompose :: a -> [Sub a]

estimateGeneralLinear :: (Event event, Context context) => Weighting -> CondObserved event context -> DebugDist event context

type Weighting = forall a. [Maybe (Observed a)] -> [Double]

wittenBell :: Int -> Weighting

simpleLinear :: [Double] -> Weighting

type DebugDist event context = context -> event -> [(Double, Double)]

mkDist :: DebugDist event context -> CondDistribution event context

Conditional Distributions

Say we want to estimate a conditional distribution from a very large set of observed data. Naively, we could collect all the data and estimate one large table, but that table would have few or no counts for many plausible future observations.

In practice, we use smoothing to supplement rare contexts with data from similar, more frequently seen contexts, for instance by falling back on bigram probabilities when the trigram observations are too sparse. Most of these smoothing techniques are special cases of general linear interpolation, which chooses the weight of each level of smoothing based on the sparsity of the current context.
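As a rough illustration of linear interpolation across smoothing levels, the sketch below mixes a sparse trigram-level estimate with a denser bigram-level one. It does not use this module's API: the names LevelCounts, mle, confidence and interpolate are hypothetical, and the weight formula total / (total + k * distinct) is a Witten-Bell-style choice assumed for the example, not necessarily the one wittenBell implements.

```haskell
import qualified Data.Map as M

-- Counts for one smoothing level: event -> number of times seen.
-- (Hypothetical type for this sketch, not the module's Counts.)
type LevelCounts = M.Map String Double

-- Maximum-likelihood estimate at a single level; 0 if the level is empty.
mle :: LevelCounts -> String -> Double
mle counts ev
  | total == 0 = 0
  | otherwise  = M.findWithDefault 0 ev counts / total
  where total = sum (M.elems counts)

-- Witten-Bell-style confidence in a level (an assumption of this sketch):
-- many observations spread over few distinct events give a weight near 1.
confidence :: Double -> LevelCounts -> Double
confidence k counts
  | total == 0 = 0
  | otherwise  = total / (total + k * distinct)
  where total    = sum (M.elems counts)
        distinct = fromIntegral (M.size counts)

-- Interpolate from the most specific level to the most general one.
-- Each level claims its confidence share; the last level takes the rest.
interpolate :: Double -> [LevelCounts] -> String -> Double
interpolate _ []     _  = 0
interpolate _ [c]    ev = mle c ev
interpolate k (c:cs) ev = lam * mle c ev + (1 - lam) * interpolate k cs ev
  where lam = confidence k c

-- A sparse trigram-level context and a denser bigram-level context.
trigramLevel, bigramLevel :: LevelCounts
trigramLevel = M.fromList [("cat", 1)]
bigramLevel  = M.fromList [("cat", 3), ("dog", 1)]

main :: IO ()
main = do
  print (interpolate 1 [trigramLevel, bigramLevel] "cat")  -- 0.875
  print (interpolate 1 [trigramLevel, bigramLevel] "dog")  -- 0.125
```

Note that "dog" was never seen in the trigram-level context, yet it still receives probability mass from the bigram level, and the two estimates sum to one.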

In this module, we implement this process so that count collection is separated from the smoothing model, using a trie. The user specifies a Context instance that relates the full conditional context to a sequence of sub-contexts (Sub context); these characterize the levels of smoothing and label the transitions in the trie. We also give a small set of smoothing techniques to combine these levels.
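To make the trie idea concrete, here is a minimal, self-contained sketch. It is not the module's SmoothTrie: the Trie and Counts types and the functions decomposeHistory, insertPath, observe and lookupPath are all hypothetical, and storing counts at every node along the path is an assumption of this example. The context is a two-word history, decomposed most recent word first, so that each step down the trie conditions on one more word.

```haskell
import qualified Data.Map as M

-- Event counts that add up when merged (hypothetical, for this sketch).
newtype Counts = Counts (M.Map String Int) deriving (Eq, Show)

instance Semigroup Counts where
  Counts a <> Counts b = Counts (M.unionWith (+) a b)

instance Monoid Counts where
  mempty = Counts M.empty

-- A trie node: the counts seen at this level, plus children keyed by the
-- next sub-context.
data Trie k v = Trie { here :: v, children :: M.Map k (Trie k v) }

emptyTrie :: Monoid v => Trie k v
emptyTrie = Trie mempty M.empty

-- Plays the role of decompose: a two-word history becomes a path,
-- most recent word first.
decomposeHistory :: (String, String) -> [String]
decomposeHistory (w1, w2) = [w2, w1]

-- Add a value at every node along the path, creating nodes as needed.
insertPath :: (Ord k, Monoid v) => [k] -> v -> Trie k v -> Trie k v
insertPath []     v (Trie h cs) = Trie (h <> v) cs
insertPath (k:ks) v (Trie h cs) = Trie (h <> v) (M.insert k child cs)
  where child = insertPath ks v (M.findWithDefault emptyTrie k cs)

-- Record one observation of an event in a context.
observe :: String -> (String, String) -> Trie String Counts -> Trie String Counts
observe ev ctx = insertPath (decomposeHistory ctx) (Counts (M.singleton ev 1))

-- Values along the path, most general level first; the list stops early
-- when the path leaves the trie (an unseen context).
lookupPath :: Ord k => [k] -> Trie k v -> [v]
lookupPath []     t = [here t]
lookupPath (k:ks) t = here t : maybe [] (lookupPath ks) (M.lookup k (children t))

trie :: Trie String Counts
trie = foldr (uncurry observe) emptyTrie
  [ ("dog", ("the", "big")), ("cat", ("a", "big")), ("dog", ("the", "big")) ]

main :: IO ()
main = do
  -- Fully seen context: unigram, bigram and trigram levels are all available.
  print (length (lookupPath (decomposeHistory ("the", "big")) trie))
  -- Unseen trigram context: only the two more general levels remain.
  print (length (lookupPath (decomposeHistory ("my", "big")) trie))
```

A smoothing model can then be applied to the list of per-level counts returned by the lookup, without knowing how the counts were collected.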

This work is based on Chapter 6 of "Foundations of Statistical Natural Language Processing" by Chris Manning and Hinrich Schütze.

type CondObserved event context = SmoothTrie (SubMap context) (Sub context) (Counts event)