lda-0.0.2: Online Latent Dirichlet Allocation

Safe Haskell: Safe-Inferred

NLP.LDA

Contents

Running samplers
Datatypes
Access model information
Initialization and finalization
Prediction
Miscellaneous

Description

Latent Dirichlet Allocation

A simple implementation of a collapsed Gibbs sampler for LDA. This library uses topic modeling terminology (documents, words, topics), even though it is generic. For example, if it is used for word class induction, read "word types" for documents, "features" for words, and "word classes" for topics.
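For orientation, the textbook collapsed Gibbs update for LDA can be written in terms of the quantities this module exposes. This is the standard form from the literature, not an expression taken from this package's source, so the library's exact computation may differ in detail. Here n_{d,k} corresponds to the document-topic counts (docTopics), n_{w,k} to the word-topic counts (wordTopics), n_k to the topic counts (topics), alpha to alphasum / K, beta to beta, and V to vSize; the superscript -i means the counts exclude the token currently being resampled.

```latex
p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
  \bigl(n_{d,k}^{-i} + \alpha\bigr)\,
  \frac{n_{w,k}^{-i} + \beta}{n_{k}^{-i} + V\beta}
```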

Running samplers

runSampler :: Word64 -> LDA -> Sampler a -> (a, LDA)

runSampler seed m s runs sampler s with seed seed and initial model m, returning the sampler's result together with the final model. The random number generator used is System.Random.Mersenne.Pure64.

pass :: Vector Doc -> Sampler (Vector Doc)

pass batch runs one pass of Gibbs sampling over the documents in batch and returns them with their updated topic assignments.
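Because Sampler is a monad (it is an RVarT over State LDA), passes can be chained with ordinary monadic composition and then executed with runSampler. The following is a minimal sketch that uses only the signatures documented on this page; the name twoPasses is hypothetical, and the type signature is left out deliberately so that the snippet does not have to assume which Vector module the library imports.

```haskell
import NLP.LDA (pass, runSampler)

-- Hypothetical helper: run two consecutive Gibbs passes over one batch.
-- The inferred type follows the documented signatures:
--   seed  :: Word64
--   m     :: LDA
--   batch :: Vector Doc
-- and the result is (Vector Doc, LDA).
twoPasses seed m batch =
  runSampler seed m (pass batch >>= pass)
```

The second pass receives the documents returned by the first, so any topic assignments made in the first pass carry over.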

runLDA :: Word64 -> Int -> LDA -> Vector Doc -> (Vector Doc, LDA)

runLDA seed n m ds creates and runs an LDA sampler with seed for n passes with initial model m on the batch of documents ds. The random number generator used is System.Random.Mersenne.Pure64.
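A typical batch workflow combines initial, runLDA and finalize. The sketch below relies only on the signatures listed on this page; trainFinalized is a hypothetical name, and the hyperparameter values (10 topics, 1.0, 0.1) are placeholders chosen for illustration rather than recommendations. The type signature is again omitted so that the snippet does not commit to a particular Vector module.

```haskell
import NLP.LDA (finalize, initial, runLDA)

-- Hypothetical helper: train on a batch and return the re-labelled
-- documents together with the finalized model.
--   seed   :: Word64      -- RNG seed
--   passes :: Int         -- number of Gibbs passes
--   docs   :: Vector Doc  -- the batch
trainFinalized seed passes docs =
  let m0         = initial 10 1.0 0.1        -- illustrative settings only
      (docs', m) = runLDA seed passes m0 docs
  in (docs', finalize m)
```

The finalized model can then be inspected with topicWords and topicDocs.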

Datatypes

type Sampler a = RVarT (State LDA) a

Custom random variable representing the LDA Gibbs sampler

data LDA

Abstract type holding the settings and the state of the sampler

data Finalized

Abstract type holding the LDA model, and the inverse count tables

Instances

Generic Finalized

type Doc = (D, Vector (W, Maybe Z))

type D = Int

type W = Int

type Z = Int
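Since documents, words and topics are all plain Int identifiers, raw tokens have to be mapped to integers before sampling. The sketch below shows one way to do that with a vocabulary map. Everything in it is an assumption made for illustration: Vocab, buildVocab and encodeDoc are hypothetical names, the encoded document uses a list where the library uses a Vector (convert with the fromList of whichever Vector module the library expects), and Nothing is assumed to mark a word that has no topic assigned yet.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Local copies of the identifier types documented above.
type D = Int
type W = Int
type Z = Int

-- Hypothetical vocabulary: maps each distinct token to a dense Int id.
type Vocab = Map.Map String W

-- Assign ids in order of first appearance across all documents.
buildVocab :: [[String]] -> Vocab
buildVocab = foldl' add Map.empty . concat
  where
    add v tok
      | Map.member tok v = v
      | otherwise        = Map.insert tok (Map.size v) v

-- Encode one tokenized document. Tokens missing from the vocabulary
-- are skipped; every word starts without a topic (assumed: Nothing).
encodeDoc :: Vocab -> D -> [String] -> (D, [(W, Maybe Z)])
encodeDoc v d toks =
  (d, [ (w, Nothing) | tok <- toks, Just w <- [Map.lookup tok v] ])
```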

Access model information

docTopics :: LDA -> Table2D

Document-topic counts

wordTopics :: LDA -> Table2D

Word-topic counts

topics :: LDA -> Table1D

Topic counts

alphasum :: LDA -> Double

Sum of the alpha Dirichlet parameter over all topics, alpha * K (topic sparseness)

beta :: LDA -> Double

beta Dirichlet parameter (word sparseness)

topicNum :: LDA -> Int

Number of topics K

vSize :: LDA -> Int

Number of unique words

model :: Finalized -> LDA

LDA model

topicDocs :: Finalized -> Table2D

Inverse document-topic counts

topicWords :: Finalized -> Table2D

Inverse word-topic counts
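To make "inverse" concrete: assuming the count tables are keyed as outer id -> inner id -> count (for example word -> topic -> count), the inverse table swaps the two key levels (topic -> word -> count). The sketch below illustrates that transformation and a common use of the inverted table, listing the highest-count words of a topic. The key orientation is an assumption, invert and topN are hypothetical names, and this is not the package's own implementation.

```haskell
import qualified Data.IntMap as IntMap
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Local copies of the table types documented on this page.
type Table1D = IntMap.IntMap Double
type Table2D = IntMap.IntMap Table1D

-- Swap the two key levels of a nested count table.
invert :: Table2D -> Table2D
invert t =
  IntMap.fromListWith IntMap.union
    [ (inner, IntMap.singleton outer c)
    | (outer, row) <- IntMap.toList t
    , (inner, c)   <- IntMap.toList row
    ]

-- The n entries of a row with the largest counts, largest first.
topN :: Int -> Table1D -> [(Int, Double)]
topN n = take n . sortBy (comparing (Down . snd)) . IntMap.toList
```

With a finalized model f, something like fmap (topN 10) (topicWords f) would then give the ten highest-count word ids for each topic, under the same orientation assumption.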

Initialization and finalization

initial :: Int -> Double -> Double -> LDA

initial k a b initializes a model with k topics, alpha hyperparameter a/k (so a is the sum of alpha over all k topics) and beta hyperparameter b.
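Because the second argument is divided by the number of topics, it is easy to pass a per-topic alpha by mistake. The sketch below recovers the per-topic value from the documented accessors; perTopicAlpha and exampleModel are hypothetical names and the numbers are illustrative only. It assumes, as the description of alphasum suggests, that alphasum holds alpha * K.

```haskell
import NLP.LDA (LDA, alphasum, initial, topicNum)

-- Per-topic alpha, derived from the documented accessors
-- (assumes alphasum = alpha * K).
perTopicAlpha :: LDA -> Double
perTopicAlpha m = alphasum m / fromIntegral (topicNum m)

-- Illustrative settings: 20 topics, second argument 2.0, beta 0.01.
-- If initial stores its second argument as alphasum, the per-topic
-- alpha of this model is 2.0 / 20.
exampleModel :: LDA
exampleModel = initial 20 2.0 0.01
```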

finalize :: LDA -> Finalized

finalize m creates a finalized model from LDA model m.

Prediction

docTopicWeights :: LDA -> Doc -> Vector Double

docTopicWeights m doc returns unnormalized topic probabilities for document doc given LDA model m.
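Since the returned weights are unnormalized, they need to be divided by their sum before they can be read as a probability distribution. The sketch below works on a plain list (obtained from the result with the toList of the relevant Vector module); normalize and mostLikelyTopic are hypothetical helpers, and treating position i in the vector as topic id i is an assumption.

```haskell
-- Scale non-negative weights so that they sum to 1.
-- An all-zero (or empty) input is returned unchanged.
normalize :: [Double] -> [Double]
normalize ws
  | total > 0 = map (/ total) ws
  | otherwise = ws
  where
    total = sum ws

-- Index of the largest weight (assumed to be the topic id).
-- Ties are resolved in favour of the larger index.
mostLikelyTopic :: [Double] -> Maybe Int
mostLikelyTopic [] = Nothing
mostLikelyTopic ws = Just (snd (maximum (zip ws [0 ..])))
```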

Miscellaneous

compress :: IntMap (IntMap Double) -> IntMap (IntMap Double)

Remove zero counts from the doc/topic table
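The effect described here can be illustrated with a small stand-alone function over nested IntMaps. This is a sketch of the described behaviour, not the package's implementation: dropZeros is a hypothetical name, and whether the real compress also removes rows that become empty is not stated on this page, so the sketch leaves them in place.

```haskell
import qualified Data.IntMap as IntMap

-- Remove every entry whose count is exactly zero from each row.
-- Rows that end up empty are kept (an assumption, see above).
dropZeros :: IntMap.IntMap (IntMap.IntMap Double)
          -> IntMap.IntMap (IntMap.IntMap Double)
dropZeros = IntMap.map (IntMap.filter (/= 0))
```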

type Table2D = IntMap Table1D

type Table1D = IntMap Double