fuzzyset-0.3.2: Fuzzy set data structure for approximate string matching
Copyright(c) 2017-present Heikki Johannes Hildén
LicenseBSD3
Maintainerhildenjohannes@gmail.com
Stabilityexperimental
PortabilityGHC
Safe HaskellSafe-Inferred
LanguageHaskell2010

Data.FuzzySet

Description

 
Synopsis

Getting started

This library provides two similar, but independent APIs. The Data.FuzzySet.Simple module offers a simpler (pure) interface for working with the FuzzySet data structure directly (similar to earlier versions of the library). A disadvantage of this approach is that it scales poorly when the code involves IO, and possibly other effects. For most real-world use cases, it is therefore recommended to use the default API and the FuzzySearch monad exposed by Data.FuzzySet (see below for more examples).

findPlanet :: (MonadIO m, MonadFuzzySearch m) => Text -> m ()
findPlanet planetName = do
  addMany_ [ "Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune" ]
  findOne planetName >>= liftIO . print
>>> runDefaultFuzzySearchT (findPlanet "Joopiter")
Just (0.75,"Jupiter")

Note that all strings are represented as text values. Examples on this page use the OverloadedStrings language extension to allow string literals to be translated into this form.

Import the main module:

import Data.FuzzySet

Fuzzy search involves three types of operations:

Finally, use runFuzzySearch, runDefaultFuzzySearch, runFuzzySearchT, or runDefaultFuzzySearchT to get the result of the computation from the monad.

Simple search example

The following is a simple program to serve as a 'Hello World' example:

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import Data.FuzzySet (FuzzySearch, add_, closestMatch, runDefaultFuzzySearch)

findMovie :: Text -> FuzzySearch (Maybe Text)
findMovie title = do
  add_ "Jurassic Park"
  add_ "Terminator"
  add_ "The Matrix"
  closestMatch title

main :: IO ()
main = do
  let result = runDefaultFuzzySearch (findMovie "The Percolator")
  print result

The output of this program is:

Just "Terminator"

Adding IO

Changing the previous example to instead use the FuzzySearchT transformer, we can combine the search monad with IO and other effects.

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad.Trans.Class (lift)
import Data.Text (Text)
import Data.FuzzySet (FuzzySearchT, add_, closestMatch, runDefaultFuzzySearchT)

findMovie :: Text -> FuzzySearchT IO (Maybe Text)
findMovie = closestMatch

prog :: FuzzySearchT IO ()
prog = do
  add_ "Jurassic Park"
  add_ "Terminator"
  add_ "The Matrix"
  result <- findMovie "The Percolator"
  lift (print result)

main :: IO ()
main = runDefaultFuzzySearchT prog

To make the search more restrictive, we can set a custom min score:

findMovie :: Text -> FuzzySearchT IO (Maybe Text)
findMovie = closestMatchMin 0.8

The output is now:

Nothing

Another example: Favorite fruit

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad (when)
import Control.Monad.IO.Class (liftIO)
import Data.FuzzySet
import Data.Text (Text, pack, unpack)
import qualified Data.Text as Text

repl :: FuzzySearchT IO ()
repl = do
  str <- liftIO $ do
    putStrLn "Enter your favorite fruit below, or type \".exit\"."
    putStr "> "
    getLine
  when (str /= ".exit") $ do
    result <- findOneMin 0.6 (pack str)
    liftIO $ case result of
      Nothing ->
        putStrLn "I don't know that fruit."
      Just (1, match) ->
        putStrLn ("You like " <> unpack (Text.toLower match) <> ". Me too!")
      Just (_, match) ->
        putStrLn ("Did you mean \"" <> unpack match <> "\"?")
    repl

main :: IO ()
main = runDefaultFuzzySearchT $ do
  addMany_ fruits
  repl

fruits :: [Text]
fruits = [ "Apple", "Apricot", "Avocado", "Banana", "Bilberry", "Blackberry", "Blackcurrant", "Blueberry", "Boysenberry", "Currant", "Cherry", "Cherimoya", "Chico fruit", "Cloudberry", "Coconut", "Cranberry", "Cucumber", "Custard apple", "Damson", "Date", "Dragonfruit", "Durian", "Elderberry", "Feijoa", "Fig", "Goji berry", "Gooseberry", "Grape", "Raisin", "Grapefruit", "Guava", "Honeyberry", "Huckleberry", "Jabuticaba", "Jackfruit", "Jambul", "Jujube", "Juniper berry", "Kiwano", "Kiwifruit", "Kumquat", "Lemon", "Lime", "Loquat", "Longan", "Lychee", "Mango", "Mangosteen", "Marionberry", "Melon", "Cantaloupe", "Honeydew", "Watermelon", "Miracle fruit", "Mulberry", "Nectarine", "Nance", "Olive", "Orange", "Blood orange", "Clementine", "Mandarine", "Tangerine", "Papaya", "Passionfruit", "Peach", "Pear", "Persimmon", "Physalis", "Plantain", "Plum", "Prune", "Pineapple", "Plumcot", "Pomegranate", "Pomelo", "Purple mangosteen", "Quince", "Raspberry", "Salmonberry", "Rambutan", "Redcurrant", "Salal berry", "Salak", "Satsuma", "Soursop", "Star fruit", "Solanum quitoense", "Strawberry", "Tamarillo", "Tamarind", "Ugli fruit", "Yuzu" ]

FuzzySearch monad

type FuzzySearch = FuzzySearchT Identity Source #

FuzzySearch monad

class MonadState FuzzySet m => MonadFuzzySearch m Source #

Minimal complete definition

add, findMin

Instances

Instances details
Monad m => MonadFuzzySearch (FuzzySearchT m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

MonadFuzzySearch m => MonadFuzzySearch (MaybeT m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

MonadFuzzySearch m => MonadFuzzySearch (ExceptT e m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

MonadFuzzySearch m => MonadFuzzySearch (ReaderT r m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

(MonadFuzzySearch m, MonadState FuzzySet (SelectT s m)) => MonadFuzzySearch (SelectT s m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

MonadFuzzySearch m => MonadFuzzySearch (StateT FuzzySet m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

(MonadFuzzySearch m, Monoid w) => MonadFuzzySearch (WriterT w m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

MonadFuzzySearch m => MonadFuzzySearch (ContT r m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

Methods

add :: Text -> ContT r m Bool Source #

findMin :: Double -> Text -> ContT r m [FuzzyMatch] Source #

runFuzzySearch Source #

Arguments

:: FuzzySearch a 
-> Int

Lower bound on gram sizes to use (inclusive)

-> Int

Upper bound on gram sizes to use (inclusive)

-> Bool

Whether or not to use the Levenshtein distance to determine the score

-> a

The result of running the computation

Evaluate a FuzzySearch computation with the given options.

runDefaultFuzzySearch :: FuzzySearch a -> a Source #

Evaluate a FuzzySearch computation with the following defaults:

  • Gram size lower: 2
  • Gram size upper: 3
  • Use Levenshtein distance: True

FuzzySearch monad transformer

data FuzzySearchT m a Source #

FuzzySearch monad transformer

Instances

Instances details
MonadTrans FuzzySearchT Source # 
Instance details

Defined in Data.FuzzySet.Monad

Methods

lift :: Monad m => m a -> FuzzySearchT m a #

Monad m => MonadState FuzzySet (FuzzySearchT m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

MonadFix m => MonadFix (FuzzySearchT m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

Methods

mfix :: (a -> FuzzySearchT m a) -> FuzzySearchT m a #

MonadIO m => MonadIO (FuzzySearchT m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

Methods

liftIO :: IO a -> FuzzySearchT m a #

Monad m => Applicative (FuzzySearchT m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

Methods

pure :: a -> FuzzySearchT m a #

(<*>) :: FuzzySearchT m (a -> b) -> FuzzySearchT m a -> FuzzySearchT m b #

liftA2 :: (a -> b -> c) -> FuzzySearchT m a -> FuzzySearchT m b -> FuzzySearchT m c #

(*>) :: FuzzySearchT m a -> FuzzySearchT m b -> FuzzySearchT m b #

(<*) :: FuzzySearchT m a -> FuzzySearchT m b -> FuzzySearchT m a #

Functor m => Functor (FuzzySearchT m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

Methods

fmap :: (a -> b) -> FuzzySearchT m a -> FuzzySearchT m b #

(<$) :: a -> FuzzySearchT m b -> FuzzySearchT m a #

Monad m => Monad (FuzzySearchT m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

Methods

(>>=) :: FuzzySearchT m a -> (a -> FuzzySearchT m b) -> FuzzySearchT m b #

(>>) :: FuzzySearchT m a -> FuzzySearchT m b -> FuzzySearchT m b #

return :: a -> FuzzySearchT m a #

Monad m => MonadFuzzySearch (FuzzySearchT m) Source # 
Instance details

Defined in Data.FuzzySet.Monad

runFuzzySearchT Source #

Arguments

:: Monad m 
=> FuzzySearchT m a 
-> Int

Lower bound on gram sizes to use (inclusive)

-> Int

Upper bound on gram sizes to use (inclusive)

-> Bool

Whether or not to use the Levenshtein distance to determine the score

-> m a

The result of running the computation in the inner monad

Evaluate a FuzzySearchT computation with the given options.

runDefaultFuzzySearchT :: Monad m => FuzzySearchT m a -> m a Source #

Evaluate a FuzzySearchT computation with the following defaults:

  • Gram size lower: 2
  • Gram size upper: 3
  • Use Levenshtein distance: True

Insertion

add Source #

Arguments

:: MonadFuzzySearch m 
=> Text

The new entry

-> m Bool

A flag to indicate whether the value was added (i.e., did not already exist in the set) | Try to match a string against the entries in the set, and return a list of all results with a score greater than or equal to the specified minimum score (i.e., the first argument). The results are ordered by similarity, with the closest match first.

Add a string to the set. A boolean is returned which is True if the string was inserted, or False if it already existed in the set.

add_ :: MonadFuzzySearch m => Text -> m () Source #

Add a string to the set, or do nothing if a key that matches the string already exists.

This function is identical to add, except that the latter returns a boolean to indicate whether any new value was added.

addMany Source #

Arguments

:: MonadFuzzySearch m 
=> [Text]

A list of strings to add to the set

-> m [Text]

A list of values that were inserted

Add a list of strings to the set, all at once.

Unless you need to know the subset of values that were actually inserted, use addMany_ instead.

addMany_ Source #

Arguments

:: MonadFuzzySearch m 
=> [Text]

A list of strings to add to the set

-> m () 

Add a list of strings to the set, all at once.

This function is identical to addMany, except that the latter returns a list of all values that were inserted.

Lookup

find Source #

Arguments

:: MonadFuzzySearch m 
=> Text

The string to search for

-> m [FuzzyMatch]

A list of results (score and matched value)

Try to match the given string against the entries in the set, using a minimum score of 0.33. Return a list of results ordered by similarity score, with the closest match first. Use findMin if you need to specify a custom threshold value.

findMin Source #

Arguments

:: MonadFuzzySearch m 
=> Double

A minimum score

-> Text

The string to search for

-> m [FuzzyMatch]

A list of results (score and matched value)

findOne Source #

Arguments

:: MonadFuzzySearch m 
=> Text

The string to search for

-> m (Maybe FuzzyMatch)

The closest match, if one is found

Try to match the given string against the entries in the set, and return the closest match, if one is found. A minimum score of 0.33 is used. To specify a custom threshold value, instead use findOneMin.

findOneMin Source #

Arguments

:: MonadFuzzySearch m 
=> Double

A minimum score

-> Text

The string to search for

-> m (Maybe FuzzyMatch)

The closest match, if one is found

Try to match the given string against the entries in the set using the specified minimum score and return the closest match, if one is found.

closestMatchMin Source #

Arguments

:: MonadFuzzySearch m 
=> Double

A minimum score

-> Text

The string to search for

-> m (Maybe Text)

The string most closely matching the input, if a match is found

Try to match the given string against the entries in the set using the specified minimum score and return the string that most closely matches the input, if a match is found.

closestMatch Source #

Arguments

:: MonadFuzzySearch m 
=> Text

The string to search for

-> m (Maybe Text)

The string most closely matching the input, if a match is found

Try to match the given string against the entries in the set, and return the string that most closely matches the input, if a match is found. A minimum score of 0.33 is used. To specify a custom threshold value, instead use closestMatchMin.

Inspection

values :: MonadFuzzySearch m => m [Text] Source #

Return the elements of the set. No particular order is guaranteed.

size :: MonadFuzzySearch m => m Int Source #

Return the number of entries in the set.

isEmpty :: MonadFuzzySearch m => m Bool Source #

Return a boolean indicating whether the set is empty.