alfred-margaret: Fast Aho-Corasick string searching

This is a package candidate release! Here you can preview how this package release will appear once published to the main package index (which can be accomplished via the 'maintain' link below). Please note that once a package has been published to the main package index it cannot be undone! Please consult the package uploading documentation for more information.

[maintain] [Publish]


An efficient implementation of the Aho-Corasick string searching algorithm.

[Skip to Readme]


Change log None available
Dependencies aeson (>=1.4.2 && <1.6), base (>=4.7 && <5), containers (==0.6.*), deepseq (==1.4.*), hashable (>=1.2.7 && <1.4), primitive (>=0.6.4 && <0.8), text (>=1.2.3 && <1.3), unordered-containers (>=0.2.9 && <0.3), vector (==0.12.*) [details]
License BSD-3-Clause
Copyright 2020 Channable
Author The Alfred-Margaret authors
Maintainer Ruud van Asseldonk <>, Fabian Thorand <>
Category Data, Text
Home page
Source repo head: git clone
Uploaded by rkrzr at 2020-10-19T08:21:10Z



Manual Flags


Compile with LLVM

Automatic Flags

Enable aeson support


Use -f <flag> to enable a flag, or -f -<flag> to disable that flag. More info


Maintainer's Corner

For package maintainers and hackage trustees

Readme for alfred-margaret-

[back to package description]


Alfred–Margaret is a fast implementation of the Aho–Corasick string searching algorithm in Haskell. It powers many string-related operations in Channable.

The library is designed to work with the text package. It matches directly on the internal UTF-16 representation of Text for efficiency. See the announcement blog post for a deeper dive into Aho–Corasick, and the optimizations that make this library fast.

Alfred–Margaret is named after Alfred Aho and Margaret Corasick.


Running time to count all matches, in a real-world data set, comparing a Java implementation and a Rust implementation against Alfred–Margaret, and against memcopy to establish a lower bound:

For the full details of this benchmark, see our announcement blog post, which includes more details about the data set, the benchmark setup, and a few things to keep in mind when interpreting this graph.


Check if a string contains one of the needles:

import qualified Data.Text.AhoCorasick.Searcher as Searcher

searcher = ["tshirt", "shirts", "shorts"]

Searcher.containsAny searcher "short tshirts"
> True

Searcher.containsAny searcher "long shirt"
> False

Searcher.containsAny searcher "Short TSHIRTS"
> False

Searcher.containsAnyIgnoreCase searcher "Short TSHIRTS"
> True

Sequentially replace many needles:

import Data.Text.AhoCorasick.Automaton (CaseSensitivity (..))
import qualified Data.Text.AhoCorasick.Replacer as Replacer

replacer = CaseSensitive [("tshirt", "banana"), ("shirt", "pear")] replacer "tshirts for sale"
> "bananas for sale" replacer "tshirts and shirts for sale"
> "bananas and pears for sale" replacer "sweatshirts and shirtshirts"
> "sweabananas and shirbananas" replacer "sweatshirts and shirttshirts"
> "sweabananas and pearbananas"

Get all matches, possibly overlapping:

import qualified Data.Text.AhoCorasick.Automaton as Aho

pairNeedleWithSelf text = (Aho.unpackUtf16 text, text)
automaton = $ fmap pairNeedleWithSelf ["tshirt", "shirts", "shorts"]
allMatches = Aho.runText [] (\matches match -> Aho.Step (match : matches))

allMatches automaton "short tshirts"
> [ Match {matchPos = CodeUnitIndex 13, matchValue = "shirts"}
> , Match {matchPos = CodeUnitIndex 12, matchValue = "tshirt"}
> ]

allMatches automaton "sweatshirts and shirtshirts"
> [ Match {matchPos = CodeUnitIndex 27, matchValue = "shirts"}
> , Match {matchPos = CodeUnitIndex 26, matchValue = "tshirt"}
> , Match {matchPos = CodeUnitIndex 22, matchValue = "shirts"}
> , Match {matchPos = CodeUnitIndex 11, matchValue = "shirts"}
> , Match {matchPos = CodeUnitIndex 10, matchValue = "tshirt"}
> ]


Alfred–Margaret is licensed under the 3-clause BSD license.