Safe Haskell	None

NLP.SwiftLDA

Contents

Initialization and finalization
Running sampler
Datatypes
Access model information
Querying evolving model
Querying finalized model

Description

Latent Dirichlet Allocation

Imperative implementation of a collapsed Gibbs sampler for LDA. This library uses the topic modeling terminology (documents, words, topics), even though it is generic. For example if used for word class induction, replace documents with word types, words with features and topics with word classes.

Usage example:

 import qualified Data.Vector as V
 import qualified Data.Vector.Unboxed as U
 import  Control.Monad.ST
 import NLP.SwiftLDA
 
 main = do
   -- Initialize model.
   let river = 1
       money = 2
       bank  = 3
   m <- stToIO $ initial (U.singleton 42) 2 10 0.1
   let  docs = [ (1, U.fromList [river, river, bank])  
               , (2, U.fromList [money, money, bank])
               , (3, U.fromList [river, money, bank])
               ]
   -- Run 10 iterations of sampling on this batch of documents.
   result <- stToIO $ run m 10 docs
   -- Display topic assignments.
   print result
   -- Run one iteration of sampling on another batch and display result
   print =<< (stToIO $ run m 1 [(4, U.fromList [bank, bank])]) 
   -- Retrieve and display word-topic weights.
   fm <- stToIO $ finalize m
   print $ wordTopics fm

Synopsis

Initialization and finalization

initial :: Vector Word32 -> Int -> Double -> Double -> ST s (LDA s)Source

initial s k a b initializes model with k topics, a/k alpha hyperparameter, b beta hyperparameter and random seed s.

finalize :: LDA s -> ST s Finalized Source

Create transparent immutable object holding model information from opaque internal representation.

Running sampler

run :: Traversable f => LDA s -> Int -> f Doc -> ST s (f (Vector Z))Source

run m i batch runs an outer loop of i passes of Gibbs sampling over documents in batch using the model m and returns the topic assignments for words in the documents of the batch.