Safe Haskell | Safe |
---|---|
Language | Haskell2010 |
An implementation of BM25F ranking. See:
- A quick overview: http://en.wikipedia.org/wiki/Okapi_BM25
- The Probabilistic Relevance Framework: BM25 and Beyond http://www.soi.city.ac.uk/~ser/papers/foundations_bm25_review.pdf
- An Introduction to Information Retrieval http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf
- score :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Float
- data Context term field feature = Context {
  - numDocsTotal :: !Int
  - avgFieldLength :: field -> Float
  - numDocsWithTerm :: term -> Int
  - paramK1 :: !Float
  - paramB :: field -> Float
  - fieldWeight :: field -> Float
  - featureWeight :: feature -> Float
  - featureFunction :: feature -> FeatureFunction
  }
- data FeatureFunction
- data Doc term field feature = Doc {
  - docFieldLength :: field -> Int
  - docFieldTermFrequency :: field -> term -> Int
  - docFeatureValue :: feature -> Float
  }
- scoreTermsBulk :: forall field term feature. (Ix field, Bounded field) => Context term field feature -> Doc term field feature -> term -> (field -> Int) -> Float
- data Explanation field feature term = Explanation {
  - overallScore :: Float
  - termScores :: [(term, Float)]
  - nonTermScores :: [(feature, Float)]
  - termFieldScores :: [(term, [(field, Float)])]
  }
- explain :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Explanation field feature term
The ranking function
score :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Float
The BM25F score for a document for a given set of terms.
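For orientation, the BM25F formulation described in the references above can be written roughly as follows. This is the textbook form rather than a transcription of this module's code: the mapping of symbols to record fields (N to numDocsTotal, n_t to numDocsWithTerm, k_1 to paramK1, b_f to paramB, w_f to fieldWeight, the average length to avgFieldLength, l_f to docFieldLength, tf_{t,f} to docFieldTermFrequency, w_i to featureWeight, V_i to featureFunction, f_i to docFeatureValue) and the exact IDF variant are assumptions.

```latex
\begin{aligned}
B_f &= (1 - b_f) + b_f \, \frac{l_f}{\bar{l}_f} \\
\widetilde{tf}_t &= \sum_{f} w_f \, \frac{tf_{t,f}}{B_f} \\
\mathrm{idf}(t) &= \log \frac{N - n_t + 0.5}{n_t + 0.5} \\
\mathrm{score}(d, q) &= \sum_{t \in q} \mathrm{idf}(t) \, \frac{\widetilde{tf}_t}{k_1 + \widetilde{tf}_t}
  \;+\; \sum_{i} w_i \, V_i(f_i)
\end{aligned}
```

The first sum is the per-term contribution; the second is the query-independent contribution from document features.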
data Context term field feature
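Constructor Context, with fields:
- numDocsTotal :: !Int
- avgFieldLength :: field -> Float
- numDocsWithTerm :: term -> Int
- paramK1 :: !Float
- paramB :: field -> Float
- fieldWeight :: field -> Float
- featureWeight :: feature -> Float
- featureFunction :: feature -> FeatureFunction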
data FeatureFunction
LogarithmicFunction Float | log (lambda_i + f_i) |
RationalFunction Float | f_i / (lambda_i + f_i) |
SigmoidFunction Float Float | 1 / (lambda + exp(-(lambda' * f_i))) |
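As a minimal sketch of how the three formulas above could be evaluated, the following defines a local stand-in for FeatureFunction and a helper that applies it to a feature value. The helper name applyFeatureFunction is hypothetical and is not part of the exported API shown here; the data type is redeclared only to keep the example self-contained.

```haskell
-- Local stand-in mirroring the documented constructors; it is not imported
-- from the module itself.
data FeatureFunction
  = LogarithmicFunction Float        -- log (lambda + f)
  | RationalFunction    Float        -- f / (lambda + f)
  | SigmoidFunction     Float Float  -- 1 / (lambda + exp (-(lambda' * f)))

-- Hypothetical helper: turn a constructor into the function it describes.
applyFeatureFunction :: FeatureFunction -> Float -> Float
applyFeatureFunction (LogarithmicFunction lambda)     f = log (lambda + f)
applyFeatureFunction (RationalFunction    lambda)     f = f / (lambda + f)
applyFeatureFunction (SigmoidFunction lambda lambda') f =
  1 / (lambda + exp (negate (lambda' * f)))
```

For example, applyFeatureFunction (RationalFunction 1) 1 evaluates to 0.5, and the rational form stays below 1 however large the raw feature value grows (for positive lambda).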
data Doc term field feature
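Constructor Doc, with fields:
- docFieldLength :: field -> Int
- docFieldTermFrequency :: field -> term -> Int
- docFeatureValue :: feature -> Float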
Specialised variants
scoreTermsBulk :: forall field term feature. (Ix field, Bounded field) => Context term field feature -> Doc term field feature -> term -> (field -> Int) -> Float
Most of the time we want to score several different documents for the same set of terms, but sometimes we want to score one document for many terms. In that case we can save some work by scoring in bulk: anything that depends only on the document, and not on the term, is calculated once and shared.
To take advantage of the sharing you must partially apply and name the per-doc score function, e.g.

    let score :: term -> (field -> Int) -> Float
        score = BM25.scoreTermsBulk ctx doc
     in sum [ score t (\f -> counts ! (t, f)) | t <- ts ]
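The sharing relies on ordinary lazy evaluation: values bound outside the returned function are computed at most once per partial application. The sketch below illustrates only that pattern with a toy scorer. The names Field, bulkScorer and exampleTotal are hypothetical, and the arithmetic is a simplified length-normalised term frequency, not this module's scoring function.

```haskell
import Data.Array (Array, Ix, listArray, range, (!))

-- Hypothetical field type, for illustration only.
data Field = Title | Body
  deriving (Eq, Ord, Show, Ix, Bounded)

-- Toy bulk scorer. The per-field normalisers depend only on the document,
-- so they live in a where-bound array that is shared by every call of the
-- function returned after the first three arguments are supplied.
bulkScorer
  :: (Field -> Float)  -- length-normalisation parameter b per field
  -> (Field -> Float)  -- average field length over the corpus
  -> (Field -> Int)    -- field lengths of this document
  -> ((Field -> Int) -> Float)
bulkScorer b avgLen docLen = \termFreq ->
    sum [ fromIntegral (termFreq f) / (norms ! f) | f <- fields ]
  where
    fields = range (minBound, maxBound)
    norms :: Array Field Float
    norms = listArray (minBound, maxBound)
      [ (1 - b f) + b f * fromIntegral (docLen f) / avgLen f | f <- fields ]

-- Name the partially applied scorer once, then reuse it for each term.
exampleTotal :: Float
exampleTotal =
  let scoreTerm = bulkScorer (const 0.5) (const 2) docLen
   in sum [ scoreTerm tf | tf <- [tfA, tfB] ]
  where
    docLen Title = 2
    docLen Body  = 6
    tfA Title = 3
    tfA Body  = 4
    tfB _ = 0
```

Writing bulkScorer b avgLen docLen tf afresh inside the comprehension would still give the same result, but the normaliser array could then be rebuilt for every term, which is the work the named partial application avoids.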
Explaining the score
data Explanation field feature term
A breakdown of the BM25F score that shows how it relates to the inputs, so that you can compare the scores of different documents.
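Constructor Explanation, with fields:
- overallScore :: Float
- termScores :: [(term, Float)]
- nonTermScores :: [(feature, Float)]
- termFieldScores :: [(term, [(field, Float)])]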
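One way such a breakdown might be used when comparing documents is to pick out the term that contributes most to each score. The sketch below redeclares the record locally so that it is self-contained, and the helper topTerm is hypothetical rather than part of this module.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Local stand-in with the same fields as the documented record.
data Explanation field feature term = Explanation
  { overallScore    :: Float
  , termScores      :: [(term, Float)]
  , nonTermScores   :: [(feature, Float)]
  , termFieldScores :: [(term, [(field, Float)])]
  }

-- Hypothetical helper: the term with the largest score contribution,
-- or Nothing when the query had no terms.
topTerm :: Explanation field feature term -> Maybe (term, Float)
topTerm e = case termScores e of
  [] -> Nothing
  ts -> Just (maximumBy (comparing snd) ts)
```

The same approach applies to nonTermScores if the question is which document feature dominates.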