maxent-learner-hw-0.2.0: Hayes and Wilson's maxent learning algorithm for phonotactic grammars.

Copyright© 2016-2017 George Steel and Peter Jurgec
LicenseGPL-2+
Maintainergeorge.steel@gmail.com
Safe HaskellNone
LanguageHaskell2010

Text.PhonotacticLearner.DFST

Contents

Description

Implementations of deterministic finite state transducers containing both a polymorphic DFST functor and a typeclass for fast specialized types (with several implementations provided). Input alphabets are assumed to be finite rectangles inside Ix types. Trandsuction functions are provided for both Monoid and Semiring output types. DFSTs may be created directly, generated from globs, or from smaller DFSTs using the product construction.

Optimized C functions for common output types (Sum Int, Multicount, Expectation Vec, and Expectation Double) are included which use the PackedDFA typeclass to convert to and from the generic type. The specialized types additionally support the following operations.

  • Packing strings into a compact format (PackedText)
  • Transducing packed strings into integer counts and Multicounts
  • Summing over all paths of expectation transducers
  • Applying maxent weights to vector counts to get expectations

Synopsis

Documentation

fnArray :: (Ix i, IArray a e) => (i, i) -> (i -> e) -> a i e Source #

Create an array by caching a function over a rectangle. Depending on the array type used, this can be used to memoise or precompute.

xbd :: (a, a) -> (b, b) -> ((a, b), (a, b)) Source #

Turn a pair of interval tuples into an interval of pairs. Used to compute array bounds for a cartesian product.

Polymorphic DFSTs

data DFST q sigma k Source #

Polymorphic type for deterministic finite state transducers. For an efficient implementation, the set of states and input characters are both limited to be Ix rectangles and their product is the array bounds. This type is a functor over its output type (note that transduction is only possible into a Monoid or Semiring).

Constructors

DFST 

Fields

Instances

(Ix q, Ix sigma) => Functor (DFST q sigma) Source # 

Methods

fmap :: (a -> b) -> DFST q sigma a -> DFST q sigma b #

(<$) :: a -> DFST q sigma b -> DFST q sigma a #

(Show k, Show sigma, Show q, Ix sigma, Ix q) => Show (DFST q sigma k) Source # 

Methods

showsPrec :: Int -> DFST q sigma k -> ShowS #

show :: DFST q sigma k -> String #

showList :: [DFST q sigma k] -> ShowS #

(NFData q, NFData sigma, NFData k) => NFData (DFST q sigma k) Source # 

Methods

rnf :: DFST q sigma k -> () #

stateBounds :: (Ix q, Ix sigma) => DFST q sigma w -> (q, q) Source #

bounds for state labels

segBounds :: (Ix q, Ix sigma) => DFST q sigma k -> (sigma, sigma) Source #

boounds for accepted segments (characters)

transition :: (Ix q, Ix sigma) => DFST q sigma k -> q -> sigma -> (q, k) Source #

advance by one state and get weight output

transduceM :: (Ix q, Ix sigma, Monoid k) => DFST q sigma k -> [sigma] -> k Source #

Transduce a string of segments where and output the product of the weights (as a Monoid).

transduceR :: (Ix q, Ix sigma, Semiring k) => DFST q sigma k -> [sigma] -> k Source #

Transduce a string of segments where and output the product of the weights (as a Ring).

Specialized DFSTs

class PackedDFA pd k | pd -> k where Source #

Typeclass for converting speaiclized DFSTs to and from polymorphic ones. This is used by several optimized versions for various output types that can be manupulated by fast C functions.

Minimal complete definition

numStates, psegBounds, unpackDFA, packDFA

Methods

numStates :: Ix sigma => pd sigma -> Int Source #

Number fo states in DFA

psegBounds :: Ix sigma => pd sigma -> (sigma, sigma) Source #

Bounds of input rectangle

unpackDFA :: Ix sigma => pd sigma -> DFST Int sigma k Source #

umpack

packDFA :: forall sigma. Ix sigma => Int16 -> Int16 -> (sigma, sigma) -> ((Int16, sigma) -> Int16) -> ((Int16, sigma) -> k) -> (Int16 -> k) -> pd sigma Source #

Pack a DFA from its component functions. Use pruneAndPack to convert a polymorphic DFST.

Instances

PackedDFA MulticountDFST Multicount Source # 

Methods

numStates :: Ix sigma => MulticountDFST sigma -> Int Source #

psegBounds :: Ix sigma => MulticountDFST sigma -> (sigma, sigma) Source #

unpackDFA :: Ix sigma => MulticountDFST sigma -> DFST Int sigma Multicount Source #

packDFA :: Ix sigma => Int16 -> Int16 -> (sigma, sigma) -> ((Int16, sigma) -> Int16) -> ((Int16, sigma) -> Multicount) -> (Int16 -> Multicount) -> MulticountDFST sigma Source #

PackedDFA ExpDoubleDFST (Expectation Double) Source # 

Methods

numStates :: Ix sigma => ExpDoubleDFST sigma -> Int Source #

psegBounds :: Ix sigma => ExpDoubleDFST sigma -> (sigma, sigma) Source #

unpackDFA :: Ix sigma => ExpDoubleDFST sigma -> DFST Int sigma (Expectation Double) Source #

packDFA :: Ix sigma => Int16 -> Int16 -> (sigma, sigma) -> ((Int16, sigma) -> Int16) -> ((Int16, sigma) -> Expectation Double) -> (Int16 -> Expectation Double) -> ExpDoubleDFST sigma Source #

PackedDFA ExpVecDFST (Expectation Vec) Source # 

Methods

numStates :: Ix sigma => ExpVecDFST sigma -> Int Source #

psegBounds :: Ix sigma => ExpVecDFST sigma -> (sigma, sigma) Source #

unpackDFA :: Ix sigma => ExpVecDFST sigma -> DFST Int sigma (Expectation Vec) Source #

packDFA :: Ix sigma => Int16 -> Int16 -> (sigma, sigma) -> ((Int16, sigma) -> Int16) -> ((Int16, sigma) -> Expectation Vec) -> (Int16 -> Expectation Vec) -> ExpVecDFST sigma Source #

PackedDFA ShortDFST (Sum Int) Source # 

Methods

numStates :: Ix sigma => ShortDFST sigma -> Int Source #

psegBounds :: Ix sigma => ShortDFST sigma -> (sigma, sigma) Source #

unpackDFA :: Ix sigma => ShortDFST sigma -> DFST Int sigma (Sum Int) Source #

packDFA :: Ix sigma => Int16 -> Int16 -> (sigma, sigma) -> ((Int16, sigma) -> Int16) -> ((Int16, sigma) -> Sum Int) -> (Int16 -> Sum Int) -> ShortDFST sigma Source #

pruneUnreachable :: forall q sigma k. (Ix q, Ix sigma) => DFST q sigma k -> DFST Int sigma k Source #

Remove unreachable states and renumber as integers (starting from 1) using a mark-sweep algorithm. Result is polymorphic.

pruneAndPack :: forall q sigma pd k. (Ix q, Ix sigma, PackedDFA pd k) => DFST q sigma k -> pd sigma Source #

Prune unreachable states and pack into a specialized implementation

rawIntersection :: (Ix q1, Ix q2, Ix sigma) => (k1 -> k2 -> k3) -> DFST q1 sigma k1 -> DFST q2 sigma k2 -> DFST (q1, q2) sigma k3 Source #

Lifts a weight combining function into one that combines DFAs. Performs no pruning and new states are all pairs of old ones. For boolean weights, use (&&) for intersection and (||) for union.

dfaProduct :: (Ix l1, Ix l2, Ix sigma) => (w1 -> w2 -> w3) -> DFST l1 sigma w1 -> DFST l2 sigma w2 -> DFST Int sigma w3 Source #

Product construction with pruning.

nildfa :: (Ix sigma, Monoid k) => (sigma, sigma) -> DFST Int sigma k Source #

Given input bounds, Construct a DFST which always returns mempty for any string.

data PackedText sigma Source #

Structure holding text and word frequencies as a flat array of segment indices (in the input rectangle) for fast transduction.

packSingleText :: Ix sigma => (sigma, sigma) -> [sigma] -> PackedText sigma Source #

Pack a single string

packMultiText :: Ix sigma => (sigma, sigma) -> [([sigma], Int)] -> PackedText sigma Source #

Pack a list of string, fewquency pairs

data ShortDFST sigma Source #

Optimized DFST specialized to transduce integers

Instances

PackedDFA ShortDFST (Sum Int) Source # 

Methods

numStates :: Ix sigma => ShortDFST sigma -> Int Source #

psegBounds :: Ix sigma => ShortDFST sigma -> (sigma, sigma) Source #

unpackDFA :: Ix sigma => ShortDFST sigma -> DFST Int sigma (Sum Int) Source #

packDFA :: Ix sigma => Int16 -> Int16 -> (sigma, sigma) -> ((Int16, sigma) -> Int16) -> ((Int16, sigma) -> Sum Int) -> (Int16 -> Sum Int) -> ShortDFST sigma Source #

Show sigma => Show (ShortDFST sigma) Source # 

Methods

showsPrec :: Int -> ShortDFST sigma -> ShowS #

show :: ShortDFST sigma -> String #

showList :: [ShortDFST sigma] -> ShowS #

NFData sigma => NFData (ShortDFST sigma) Source # 

Methods

rnf :: ShortDFST sigma -> () #

transducePackedShort :: Ix sigma => ShortDFST sigma -> PackedText sigma -> Int Source #

Fast transduction of integers. For multiple words, returns the sum of all transductions.

data MulticountDFST sigma Source #

Optimized DFST specialized to transduce into Multicount mnd count multiple quantities in parallel.

Instances

PackedDFA MulticountDFST Multicount Source # 

Methods

numStates :: Ix sigma => MulticountDFST sigma -> Int Source #

psegBounds :: Ix sigma => MulticountDFST sigma -> (sigma, sigma) Source #

unpackDFA :: Ix sigma => MulticountDFST sigma -> DFST Int sigma Multicount Source #

packDFA :: Ix sigma => Int16 -> Int16 -> (sigma, sigma) -> ((Int16, sigma) -> Int16) -> ((Int16, sigma) -> Multicount) -> (Int16 -> Multicount) -> MulticountDFST sigma Source #

Show sigma => Show (MulticountDFST sigma) Source # 

Methods

showsPrec :: Int -> MulticountDFST sigma -> ShowS #

show :: MulticountDFST sigma -> String #

showList :: [MulticountDFST sigma] -> ShowS #

NFData sigma => NFData (MulticountDFST sigma) Source # 

Methods

rnf :: MulticountDFST sigma -> () #

transducePackedMulti :: Ix sigma => MulticountDFST sigma -> PackedText sigma -> Multicount Source #

Fast transduction of Multicount. For multiple words, returns the sum of all transductions.

data ExpVecDFST sigma Source #

Optimized DFST form calculating verctor expectations over the entire probability distribution defined by the DFA.

Instances

PackedDFA ExpVecDFST (Expectation Vec) Source # 

Methods

numStates :: Ix sigma => ExpVecDFST sigma -> Int Source #

psegBounds :: Ix sigma => ExpVecDFST sigma -> (sigma, sigma) Source #

unpackDFA :: Ix sigma => ExpVecDFST sigma -> DFST Int sigma (Expectation Vec) Source #

packDFA :: Ix sigma => Int16 -> Int16 -> (sigma, sigma) -> ((Int16, sigma) -> Int16) -> ((Int16, sigma) -> Expectation Vec) -> (Int16 -> Expectation Vec) -> ExpVecDFST sigma Source #

Show sigma => Show (ExpVecDFST sigma) Source # 

Methods

showsPrec :: Int -> ExpVecDFST sigma -> ShowS #

show :: ExpVecDFST sigma -> String #

showList :: [ExpVecDFST sigma] -> ShowS #

weightExpVec :: Ix sigma => MulticountDFST sigma -> Vec -> ExpVecDFST sigma Source #

Assign maxent weights to the counts in a Multicount to get expectations (which include probabilities).

expsByLengthVec :: Ix sigma => ExpVecDFST sigma -> Int -> Array Int (Expectation Vec) Source #

Get the total expectations over each length of string up to a maximum

data ExpDoubleDFST sigma Source #

Optimized DFST form calculating scalar expectations over the entire probability distribution defined by the DFA.

Instances

PackedDFA ExpDoubleDFST (Expectation Double) Source # 

Methods

numStates :: Ix sigma => ExpDoubleDFST sigma -> Int Source #

psegBounds :: Ix sigma => ExpDoubleDFST sigma -> (sigma, sigma) Source #

unpackDFA :: Ix sigma => ExpDoubleDFST sigma -> DFST Int sigma (Expectation Double) Source #

packDFA :: Ix sigma => Int16 -> Int16 -> (sigma, sigma) -> ((Int16, sigma) -> Int16) -> ((Int16, sigma) -> Expectation Double) -> (Int16 -> Expectation Double) -> ExpDoubleDFST sigma Source #

Show sigma => Show (ExpDoubleDFST sigma) Source # 

Methods

showsPrec :: Int -> ExpDoubleDFST sigma -> ShowS #

show :: ExpDoubleDFST sigma -> String #

showList :: [ExpDoubleDFST sigma] -> ShowS #

weightExpPartial :: Ix sigma => MulticountDFST sigma -> Vec -> Vec -> ExpDoubleDFST sigma Source #

Assign maxent weights to the counts in a Multicount to and apply a covector to the resulting expectations.

expsByLengthDouble :: Ix sigma => ExpDoubleDFST sigma -> Int -> Array Int (Expectation Double) Source #

Get the total expectations over each length of string up to a maximum

Glob recognition

type SegSet sigma = UArray sigma Bool Source #

Fast reperesentation of a set of segments by its characteristic function over an enclosing rectangle of segments.

data ListGlob sigma Source #

Glob of segment lists, nore generalized version of ngrams allowing for repeated classes as well as single ones. The two boolean parameters restrict the glob to match a prefixes or suffixes only.

Constructors

ListGlob !Bool !Bool [(GlobReps, SegSet sigma)] 

Instances

Ix sigma => Eq (ListGlob sigma) Source # 

Methods

(==) :: ListGlob sigma -> ListGlob sigma -> Bool #

(/=) :: ListGlob sigma -> ListGlob sigma -> Bool #

Ix sigma => Ord (ListGlob sigma) Source # 

Methods

compare :: ListGlob sigma -> ListGlob sigma -> Ordering #

(<) :: ListGlob sigma -> ListGlob sigma -> Bool #

(<=) :: ListGlob sigma -> ListGlob sigma -> Bool #

(>) :: ListGlob sigma -> ListGlob sigma -> Bool #

(>=) :: ListGlob sigma -> ListGlob sigma -> Bool #

max :: ListGlob sigma -> ListGlob sigma -> ListGlob sigma #

min :: ListGlob sigma -> ListGlob sigma -> ListGlob sigma #

Show (ListGlob Char) Source #

Globs are displayed in regex format

(NFData sigma, Ix sigma) => NFData (ListGlob sigma) Source # 

Methods

rnf :: ListGlob sigma -> () #

matchCounter :: forall sigma. Ix sigma => ListGlob sigma -> ShortDFST sigma Source #

Create a DFST countign the violations of a ListGlob. Each SegSet in the glob must have the same bounds and the glob must not be empty.

Orphan instances

(IArray UArray e, NFData i, Ix i) => NFData (UArray i e) Source # 

Methods

rnf :: UArray i e -> () #