Safe Haskell: Safe-Inferred
Latent Dirichlet Allocation
Simple implementation of a collapsed Gibbs sampler for LDA. This library uses topic modeling terminology (documents, words, topics), even though it is generic. For example, when used for word class induction, replace documents with word types, words with features, and topics with word classes.
- runSampler :: Word64 -> LDA -> Sampler a -> (a, LDA)
- pass :: Vector Doc -> Sampler (Vector Doc)
- runLDA :: Word64 -> Int -> LDA -> Vector Doc -> (Vector Doc, LDA)
- type Sampler a = RVarT (State LDA) a
- data LDA
- data Finalized
- type Doc = (D, Vector (W, Maybe Z))
- type D = Int
- type W = Int
- type Z = Int
- docTopics :: LDA -> Table2D
- wordTopics :: LDA -> Table2D
- topics :: LDA -> Table1D
- alphasum :: LDA -> Double
- beta :: LDA -> Double
- topicNum :: LDA -> Int
- vSize :: LDA -> Int
- model :: Finalized -> LDA
- topicDocs :: Finalized -> Table2D
- topicWords :: Finalized -> Table2D
- initial :: Int -> Double -> Double -> LDA
- finalize :: LDA -> Finalized
- docTopicWeights :: LDA -> Doc -> Vector Double
- compress :: IntMap (IntMap Double) -> IntMap (IntMap Double)
- type Table2D = IntMap Table1D
- type Table1D = IntMap Double
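To make the shape of the Doc type above concrete, here is a minimal sketch that uses plain lists in place of Vector so that it needs only base. The names DocL, unlabeled and assigned are hypothetical and not part of this library. Reading Nothing as "no topic assigned yet" is an assumption based on the Maybe Z in the type.

```haskell
type D = Int
type W = Int
type Z = Int

-- List-based stand-in for Doc (hypothetical; the library's Doc uses Vector).
type DocL = (D, [(W, Maybe Z)])

-- Build a document whose words have no topic assignment yet (assumed meaning of Nothing).
unlabeled :: D -> [W] -> DocL
unlabeled d ws = (d, [ (w, Nothing) | w <- ws ])

-- Collect the topics of the words that do carry an assignment.
assigned :: DocL -> [Z]
assigned (_, ws) = [ z | (_, Just z) <- ws ]

main :: IO ()
main = do
  print (unlabeled 0 [3, 5])
  print (assigned (1, [(3, Just 2), (5, Nothing)]))
```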
Running samplers
runSampler :: Word64 -> LDA -> Sampler a -> (a, LDA)
runSampler seed m s runs sampler s with the given seed and initial model m. The random number generator used is System.Random.Mersenne.Pure64.
pass :: Vector Doc -> Sampler (Vector Doc)
pass batch runs one pass of Gibbs sampling on the documents in batch.
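As an illustration of what a pass of collapsed Gibbs sampling computes for each word, the following sketch evaluates the textbook conditional weights for resampling one topic assignment. It is not taken from this library's source: the function topicWeights, its argument order, and the convention that topics are numbered 0 to k-1 are assumptions, and the library may organize the computation differently.

```haskell
import qualified Data.IntMap.Strict as IM

type Table1D = IM.IntMap Double
type Table2D = IM.IntMap Table1D

-- Look up a count in a sparse table, treating missing entries as zero.
count1 :: Table1D -> Int -> Double
count1 t k = IM.findWithDefault 0 k t

count2 :: Table2D -> Int -> Int -> Double
count2 t i k = maybe 0 (\row -> count1 row k) (IM.lookup i t)

-- Unnormalized weight of each topic z for word w in document d:
--   (n_dz + alpha) * (n_wz + beta) / (n_z + V * beta)
-- The counts are assumed to already exclude the token being resampled.
topicWeights
  :: Int -> Double -> Double -> Int  -- topics, per-topic alpha, beta, vocabulary size
  -> Table2D -> Table2D -> Table1D   -- doc-topic, word-topic, topic counts
  -> Int -> Int                      -- document id, word id
  -> [Double]
topicWeights k alpha beta v dt wt ts d w =
  [ (count2 dt d z + alpha) * (count2 wt w z + beta)
      / (count1 ts z + fromIntegral v * beta)
  | z <- [0 .. k - 1] ]

main :: IO ()
main = do
  let dt = IM.fromList [(0, IM.fromList [(0, 2)])]
      wt = IM.fromList [(7, IM.fromList [(0, 1)])]
      ts = IM.fromList [(0, 2)]
  print (topicWeights 2 0.5 0.1 10 dt wt ts 0 7)
```

A sampler would draw the new topic with probability proportional to these weights, update the three count tables, and move on to the next word.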
runLDA :: Word64 -> Int -> LDA -> Vector Doc -> (Vector Doc, LDA)
runLDA seed n m ds creates and runs an LDA sampler with the given seed for n passes, with initial model m, on the batch of documents ds. The random number generator used is System.Random.Mersenne.Pure64.
Datatypes
data Finalized
Abstract type holding the LDA model and the inverse count tables.
Access model information
wordTopics :: LDA -> Table2D
Word-topic counts
topicWords :: Finalized -> Table2D
Inverse word-topic counts
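The layout of the inverse table is not spelled out here. One plausible reading, assumed in this sketch, is that wordTopics is keyed by word and then topic, while the inverse is keyed by topic and then word. The helper invert below is hypothetical and only shows how such a transposition of a Table2D could be written with Data.IntMap.

```haskell
import qualified Data.IntMap.Strict as IM

type Table1D = IM.IntMap Double
type Table2D = IM.IntMap Table1D

-- Swap the two key levels of a sparse table:
-- outer -> inner -> count becomes inner -> outer -> count.
invert :: Table2D -> Table2D
invert t =
  IM.fromListWith IM.union
    [ (inner, IM.singleton outer c)
    | (outer, row) <- IM.toList t
    , (inner, c)   <- IM.toList row ]

main :: IO ()
main = do
  -- word 1 seen twice in topic 0 and once in topic 3; word 2 seen five times in topic 0
  let byWord = IM.fromList [ (1, IM.fromList [(0, 2), (3, 1)])
                           , (2, IM.fromList [(0, 5)]) ]
  print (invert byWord)
```

Applying invert twice returns the original table as long as it contains no empty rows.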
Initialization and finalization
initial :: Int -> Double -> Double -> LDA
initial k a b initializes a model with k topics, alpha hyperparameter a/k, and beta hyperparameter b.
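The a/k in the description suggests that the second argument is the total alpha mass, spread evenly over the k topics (compare the alphasum accessor). A small sketch of that arithmetic, with the hypothetical helper perTopicAlpha:

```haskell
-- Per-topic alpha under a symmetric prior whose values sum to alphaSum.
-- Hypothetical helper for illustration; not part of the library.
perTopicAlpha :: Int -> Double -> Double
perTopicAlpha k alphaSum = alphaSum / fromIntegral k

main :: IO ()
main = print (perTopicAlpha 10 50)  -- prints 5.0
```

Under this reading, a call such as initial 10 50 0.1 would give each of the 10 topics an alpha of 5.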
Prediction
docTopicWeights :: LDA -> Doc -> Vector Double
docTopicWeights m doc returns unnormalized topic probabilities for document doc given LDA model m.
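Because the returned weights are unnormalized, they need to be divided by their sum before they can be read as a probability distribution. A minimal sketch, using a plain list in place of Vector and the hypothetical helper normalize:

```haskell
-- Scale non-negative weights so that they sum to 1.
-- If the total is not positive, the input is returned unchanged.
normalize :: [Double] -> [Double]
normalize ws
  | total > 0 = map (/ total) ws
  | otherwise = ws
  where
    total = sum ws

main :: IO ()
main = print (normalize [1, 3])  -- prints [0.25,0.75]
```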