lda-0.0.2: Online Latent Dirichlet Allocation

Safe Haskell: Safe-Inferred

NLP.LDA


Description

Latent Dirichlet Allocation

A simple implementation of a collapsed Gibbs sampler for LDA. The library uses topic-modeling terminology (documents, words, topics), even though it is generic. For example, when it is used for word class induction, read documents as word types, words as features, and topics as word classes.
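
For reference, the textbook collapsed Gibbs update for LDA with symmetric priors resamples the topic z_i of each token (word w in document d) from the conditional below, where every count n excludes token i itself. This is the standard form, not necessarily the library's exact bookkeeping; the counts presumably correspond to docTopics, wordTopics and topics, with alpha = alphasum / K and V = vSize.

```latex
P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
  \left(n_{d,k}^{-i} + \alpha\right)
  \frac{n_{w,k}^{-i} + \beta}{n_{k}^{-i} + V\beta},
\qquad \alpha = \frac{\texttt{alphasum}}{K}
```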


Running samplers

runSampler :: Word64 -> LDA -> Sampler a -> (a, LDA)

runSampler seed m s runs sampler s with random seed seed and initial model m, returning the sampler's result together with the updated model. The random number generator used is System.Random.Mersenne.Pure64.

pass :: Vector Doc -> Sampler (Vector Doc)

pass batch runs one pass of Gibbs sampling over the documents in batch and returns the documents with their resampled topic assignments.
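
The sketch below shows what one collapsed Gibbs pass over a single document can look like. It is not the library's code: it keeps the count tables in a hypothetical Counts record of plain nested IntMaps, takes the uniform random draws as an explicit list instead of running inside Sampler, and treats alpha as the per-topic value alphasum / K. The names Counts, get2, add2, update, weights, pick and resampleDoc are all hypothetical.

```haskell
import qualified Data.IntMap.Strict as IM

-- Hypothetical count tables loosely mirroring docTopics, wordTopics and
-- topics; the library's own Table1D/Table2D types may differ.
data Counts = Counts
  { docTopic  :: IM.IntMap (IM.IntMap Double)  -- d -> k -> n_{d,k}
  , wordTopic :: IM.IntMap (IM.IntMap Double)  -- w -> k -> n_{w,k}
  , topicTot  :: IM.IntMap Double              -- k -> n_k
  }

-- Read cell (i, j) of a nested table, treating missing cells as 0.
get2 :: Int -> Int -> IM.IntMap (IM.IntMap Double) -> Double
get2 i j t = maybe 0 (IM.findWithDefault 0 j) (IM.lookup i t)

-- Add x to cell (i, j) of a nested table.
add2 :: Double -> Int -> Int -> IM.IntMap (IM.IntMap Double) -> IM.IntMap (IM.IntMap Double)
add2 x i j = IM.insertWith (IM.unionWith (+)) i (IM.singleton j x)

-- Add (x = 1) or remove (x = -1) one token of word w in document d with topic k.
update :: Double -> Int -> Int -> Int -> Counts -> Counts
update x d w k (Counts dt wt tt) =
  Counts (add2 x d k dt) (add2 x w k wt) (IM.insertWith (+) k x tt)

-- Unnormalized conditional weight of each topic 0 .. kNum-1 for word w in
-- document d; v is the vocabulary size and alpha the per-topic prior.
weights :: Int -> Int -> Double -> Double -> Counts -> Int -> Int -> [Double]
weights kNum v alpha beta c d w =
  [ (get2 d k (docTopic c) + alpha)
      * (get2 w k (wordTopic c) + beta)
      / (IM.findWithDefault 0 k (topicTot c) + fromIntegral v * beta)
  | k <- [0 .. kNum - 1] ]

-- Sample an index from unnormalized weights, given a uniform draw u in [0, 1).
pick :: Double -> [Double] -> Int
pick u ws = go 0 (u * sum ws) ws
  where
    go i _ [_] = i
    go i r (x : xs)
      | r < x     = i
      | otherwise = go (i + 1) (r - x) xs
    go i _ [] = i

-- Resample every token of document d, consuming one uniform draw per token.
-- Tokens are (word, current topic) pairs; if the draws run out, the
-- remaining tokens keep their current topics.
resampleDoc :: Int -> Int -> Double -> Double -> [Double]
            -> Int -> [(Int, Int)] -> Counts -> ([(Int, Int)], Counts)
resampleDoc kNum v alpha beta draws d toks c0 = go draws toks c0
  where
    go _ [] c = ([], c)
    go [] rest c = (rest, c)
    go (u : us) ((w, k) : rest) c =
      let c1 = update (-1) d w k c                     -- remove old assignment
          k' = pick u (weights kNum v alpha beta c1 d w)
          c2 = update 1 d w k' c1                      -- add new assignment
          (rest', cEnd) = go us rest c2
      in ((w, k') : rest', cEnd)
```

Passing the draws in as a list keeps the sketch pure and easy to test; the library presumably draws them from the Mersenne Twister generator inside the Sampler monad instead.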

runLDA :: Word64 -> Int -> LDA -> Vector Doc -> (Vector Doc, LDA)

runLDA seed n m ds runs n passes of the Gibbs sampler over the batch of documents ds, starting from initial model m and using random seed seed. It returns the resampled documents together with the updated model. The random number generator used is System.Random.Mersenne.Pure64.

Datatypes

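type Sampler a = RVarT (State LDA) a

Custom random variable representing the LDA Gibbs sampler: an RVarT random-variable transformer over a State LDA monad, so a sampler can both draw random numbers and update the model.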

data LDA

Abstract type holding the settings and the state of the sampler


data Finalized

Abstract type holding the LDA model and the inverse count tables


type Doc = (D, Vector (W, Maybe Z))

type D = Int

type W = Int

type Z = Int
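
As an illustration of the Doc type, the sketch below turns tokenized text into documents whose topics are all unassigned (Nothing), presumably the state before the first pass. It assumes D is a document id, W a word id and Z a topic id, and uses lists in place of Vector so that it needs only the containers package. toDocs is a hypothetical helper, and dense word ids starting at 0 are a convenient choice rather than a documented requirement.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as M

-- Local stand-ins for the library's types, with lists in place of Vector.
type D = Int  -- document id (assumed)
type W = Int  -- word id (assumed)
type Z = Int  -- topic id (assumed)
type Doc = (D, [(W, Maybe Z)])

-- Give each distinct word a dense id in order of first appearance and
-- leave every token's topic unassigned.
toDocs :: [[String]] -> (M.Map String W, [Doc])
toDocs texts = (vocab, zip [0 ..] docs)
  where
    (vocab, docs) = mapAccumL encodeDoc M.empty texts
    encodeDoc voc ws = mapAccumL encodeWord voc ws
    encodeWord voc w = case M.lookup w voc of
      Just i  -> (voc, (i, Nothing))
      Nothing -> let i = M.size voc in (M.insert w i voc, (i, Nothing))
```

The returned map can be kept to translate word ids back to strings when inspecting topics.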

Access model information

docTopics :: LDA -> Table2D

Document-topic counts

wordTopics :: LDA -> Table2D

Word-topic counts

topics :: LDA -> Table1D

Topic counts
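
The three count tables can be derived from topic-assigned documents as sketched below. The shapes are assumptions: Table2D is modelled as nested IntMaps of Double counts (consistent with the type of compress, but not confirmed) and Table1D as a flat IntMap. countTables and bump2 are hypothetical names.

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM

-- Assumed table shapes; the library's Table2D/Table1D may differ.
type Table2D = IM.IntMap (IM.IntMap Double)
type Table1D = IM.IntMap Double

-- Add one to cell (i, j) of a nested table.
bump2 :: Int -> Int -> Table2D -> Table2D
bump2 i j = IM.insertWith (IM.unionWith (+)) i (IM.singleton j 1)

-- Tally document-topic, word-topic and topic counts from documents of
-- (word, Maybe topic) tokens, skipping tokens that have no topic yet.
countTables :: [(Int, [(Int, Maybe Int)])] -> (Table2D, Table2D, Table1D)
countTables docs = foldl' step (IM.empty, IM.empty, IM.empty) tokens
  where
    tokens = [ (d, w, z) | (d, ws) <- docs, (w, Just z) <- ws ]
    step (dt, wt, tt) (d, w, z) =
      (bump2 d z dt, bump2 w z wt, IM.insertWith (+) z 1 tt)
```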

alphasum :: LDA -> Double

The Dirichlet parameter alpha multiplied by the number of topics K (topic sparseness)

beta :: LDA -> Double

The Dirichlet parameter beta (word sparseness)

topicNum :: LDA -> Int

Number of topics K

vSize :: LDA -> Int

Vocabulary size (number of unique words)

model :: Finalized -> LDA

The underlying LDA model

topicDocs :: Finalized -> Table2D

Inverse document-topic counts

topicWords :: Finalized -> Table2D

Inverse word-topic counts
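
Reading "inverse" as re-indexing a table by topic first, the sketch below inverts a word-topic table into a topic-word table and lists the highest-count words of one topic, a common way to inspect a trained model. The nested-IntMap representation and the names invert and topWords are assumptions, not the library's API.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Assumed representation of a 2-D count table.
type Table2D = IM.IntMap (IM.IntMap Double)

-- Swap the two keys of a nested table: w -> k -> n becomes k -> w -> n.
invert :: Table2D -> Table2D
invert t = IM.fromListWith IM.union
  [ (k, IM.singleton w n) | (w, row) <- IM.toList t, (k, n) <- IM.toList row ]

-- The n highest-count words of topic k in an inverted (topic -> word) table;
-- ties keep ascending word-id order because sortBy is stable.
topWords :: Int -> Int -> Table2D -> [(Int, Double)]
topWords n k inv =
  take n (sortBy (comparing (Down . snd)) (IM.toList (IM.findWithDefault IM.empty k inv)))
```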

Initialization and finalization

initial :: Int -> Double -> Double -> LDA

initial k a b initializes a model with k topics, alpha hyperparameter a/k, and beta hyperparameter b.
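
Because initial divides a by k, the a argument is the total prior mass across all topics; going by the descriptions of initial and alphasum, alphasum should therefore report a itself. As a worked example, initial 50 5.0 0.01 gives:

```latex
\alpha = \frac{a}{k} = \frac{5.0}{50} = 0.1, \qquad
\texttt{alphasum} = \alpha K = 0.1 \times 50 = 5.0, \qquad
\beta = 0.01
```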

finalize :: LDA -> Finalized

finalize m creates a finalized model from LDA model m

Prediction

docTopicWeights :: LDA -> Doc -> Vector Double

docTopicWeights m doc returns unnormalized topic probabilities for document doc given LDA model m.
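
Since docTopicWeights returns unnormalized weights, a caller typically rescales them before reading them as probabilities. The sketch below does this on a plain list standing in for the returned Vector Double; normalize and bestTopic are hypothetical helpers, not part of NLP.LDA.

```haskell
-- Rescale non-negative topic weights so they sum to 1.
normalize :: [Double] -> [Double]
normalize ws
  | total <= 0 = ws            -- nothing sensible to do; return input unchanged
  | otherwise  = map (/ total) ws
  where
    total = sum ws

-- Index of the highest-weight topic (the earliest one on ties).
bestTopic :: [Double] -> Maybe Int
bestTopic [] = Nothing
bestTopic ws = Just (fst (foldr1 better (zip [0 ..] ws)))
  where
    better a b = if snd b > snd a then b else a
```

With Data.Vector the normalization is the same idea: V.map (/ V.sum ws) ws.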

Miscellaneous

compress :: IntMap (IntMap Double) -> IntMap (IntMap Double)

Removes zero counts from the document-topic table.
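
A minimal equivalent of this clean-up on the same IntMap (IntMap Double) type is sketched below. Whether compress also drops rows that become empty is not documented; the hypothetical compressTable does, which is an assumption.

```haskell
import qualified Data.IntMap.Strict as IM

-- Drop zero cells, then drop rows left empty (the second step is assumed).
compressTable :: IM.IntMap (IM.IntMap Double) -> IM.IntMap (IM.IntMap Double)
compressTable = IM.filter (not . IM.null) . IM.map (IM.filter (/= 0))
```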