fuzzyfind: Fuzzy text matching

[ library, mit, program, text ] [ Propose Tags ]

A package that provides an API for fuzzy text search in Haskell, using a modified version of the Smith-Waterman algorithm. The search is intended to behave similarly to the excellent fzf tool by Junegunn Choi.


[Skip to Readme]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

Versions [RSS] 0.1.0, 2.1.0, 2.2.0, 3.0.0, 3.0.1, 3.0.2
Change log CHANGELOG.md
Dependencies base (>=4 && <5), containers (>=0.6 && <0.7), criterion, deepseq, fuzzyfind, massiv (>=0.6 && <0.7 || >=1.0 && <1.1), text (>=1.2 && <1.3 || >=2 && <3) [details]
License MIT
Copyright Copyright (C) 2021 Unison Computing
Author Rúnar Bjarnason
Maintainer Rúnar Bjarnason <runar.bjarnason@unison.cloud>
Category Text
Home page http://github.com/runarorama/fuzzyfind/
Bug tracker http://github.com/runarorama/fuzzyfind/issues
Source repo head: git clone https://github.com/runarorama/fuzzyfind
Uploaded by runar at 2024-06-27T00:39:43Z
Distributions
Executables bench
Downloads 1635 total (14 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2024-06-27 [all 1 reports]

Readme for fuzzyfind-3.0.2

[back to package description]

fuzzyfind

A package that provides an API for fuzzy text search in Haskell, using a modified version of the Smith-Waterman algorithm. The search is intended to behave similarly to the excellent fzf tool by Junegunn Choi.

The core functionality of the library is provided by the bestMatch function:

bestMatch :: String -> String -> Maybe Alignment

Calling bestMatch query string will return Nothing if query is not a subsequence of string. Otherwise, it will return the "best" way to line the characters in query up with the characters in string. Lower-case characters in the query are assumed to be case-insensitive, and upper-case characters are assumed to be case-sensitive.

For example:

> bestMatch "ff" "FuzzyFind"
Just (Alignment {score = 25, result = Result {[Match "F", Gap "uzzy", Match "F", Gap "ind"]}})

The score indicates how "good" the match is. Better matches have higher scores. There's no maximum score (except for the upper limit of the Int datatype), but the lowest score is 0.

A substring from the query will generate a Match, and any characters from the input that don't result in a Match will generate a Gap. Concatenating all the Match and Gap results should yield the original input string.

Note that the matched characters in the input always occur in the same order as they do in the query pattern.

The algorithm prefers (and will generate higher scores for) the following kinds of matches:

  1. Contiguous characters from the query string. For example, bestMatch "pp" will find the last two ps in "pickled pepper".
  2. Characters at the beginnings of words. For example, bestMatch "pp" will find the first two Ps in "Peter Piper".
  3. A character in the input that matches the first character of the query pattern is strongly preferred. For example, bestMatch "mn" "Bat Man" will score higher than bestMatch "mn" "Batman".

All else being equal, matches that occur later in the input string are preferred.

The fuzzyFind function finds input strings that match all the given input patterns. For each input that matches it returns one Alignment. The output is sorted by score, ascending.

fuzzyFind :: [String] -> [String] -> [Alignment]

For example:

> fuzzyFind ["dad", "mac", "dam"] ["tinned macadamia"]
[Alignment {score = 296, result = Result [Gap "tinne", Match "d", Gap " ", Match "macadam", Gap "ia"]}]