pattern-trie-0.1.1: Pattern tries

Safe HaskellNone
LanguageHaskell2010

Data.Trie.Pattern

Contents

Description

A variant of a (radix) trie with the following characteristics:

  • Keys are simple Patterns composed of Matchers and hence a single key can match multiple input Strings.
  • Looking up a match for a String can Capture parts of the string.
  • Both patterns and input strings are understood as being composed of (indivisible) chunks of strings of a generic type s (typically instantiated to either Text or ByteString). More precisely, every chunk of an input String is tested against a Matcher of a Pattern in sequence. As a result, pattern tries usually end up less compact than more general tries, since sharing of prefixes as well as all operations are limited to the granularity of these chunks.

These characteristics hint at the primary intended use-case, whereby keys have a "natural" decomposition into chunks and the same chunks are heavily shared by different keys, e.g. as in directory trees. A pattern trie allows to associate values with simple patterns, whereby a single value can essentially be looked up by all strings matching a pattern, thereby capturing parts of it.

Strictness: A Trie is strict in the spine as well as the values (WHNF).

Ordering: The order of keys and thus elements is unspecified. No particular order may be assumed by folds and traversals, whose combining operations should hence be commutative.

Example:

>>> :set -XOverloadedStrings
>>> import Data.ByteString (ByteString)
>>> let p1 = mempty |> EqStr "home" |> EqStr "alice" |> EqStr "tmp"
>>> let p2 = mempty |> EqStr "home" |> AnyStr        |> EqStr "music"
>>> let p3 = mempty |> EqStr "data" |> EqStr "bob"   |> EqStr "books"
>>> let p4 = mempty |> EqStr "data" |> AnyStr        |> EqStr "books"
>>> let p5 = mempty |> EqStr "data" |> AnyStr        |> EqStr "books" |> EqStr "sicp"
>>> let trie = fromAssocList $ [p1,p2,p3,p4,p5] `zip` [1..] :: Trie ByteString Int
>>> match ["home","alice","tmp"] trie
Just (1,fromList [])
>>> match ["home","bob","tmp"] trie
Nothing
>>> match ["home","alice","music"] trie
Just (2,fromList [Capture {captured = "alice"}])
>>> match ["home","bob","music"] trie
Just (2,fromList [Capture {captured = "bob"}])
>>> match ["data","bob","books"] trie
Just (3,fromList [])
>>> match ["data","alice","books"] trie
Just (4,fromList [Capture {captured = "alice"}])
>>> match ["data","alice","books","sicp"] trie
Just (5,fromList [Capture {captured = "alice"}])
>>> match ["data","bob","books","sicp"] trie
Just (5,fromList [Capture {captured = "bob"}])
>>> matchPrefix ["data","alice","books","wonderland"] trie
Just (4,fromList [Capture {captured = "alice"}],["wonderland"])
>>> matchPrefix ["data","bob","books","wonderland"] trie
Just (4,fromList [Capture {captured = "bob"}],["wonderland"])
>>> let (t,c,s) = matchPrefixTrie ["data","bob","books","wonderland"] trie
>>> (value t, c, s)
(Just 4,fromList [Capture {captured = "bob"}],["wonderland"])
Synopsis

Documentation

data Trie s a Source #

An unordered map from Patterns of strings of type s to values of type a.

Instances
Functor (Trie s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

fmap :: (a -> b) -> Trie s a -> Trie s b #

(<$) :: a -> Trie s b -> Trie s a #

Foldable (Trie s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

fold :: Monoid m => Trie s m -> m #

foldMap :: Monoid m => (a -> m) -> Trie s a -> m #

foldr :: (a -> b -> b) -> b -> Trie s a -> b #

foldr' :: (a -> b -> b) -> b -> Trie s a -> b #

foldl :: (b -> a -> b) -> b -> Trie s a -> b #

foldl' :: (b -> a -> b) -> b -> Trie s a -> b #

foldr1 :: (a -> a -> a) -> Trie s a -> a #

foldl1 :: (a -> a -> a) -> Trie s a -> a #

toList :: Trie s a -> [a] #

null :: Trie s a -> Bool #

length :: Trie s a -> Int #

elem :: Eq a => a -> Trie s a -> Bool #

maximum :: Ord a => Trie s a -> a #

minimum :: Ord a => Trie s a -> a #

sum :: Num a => Trie s a -> a #

product :: Num a => Trie s a -> a #

Traversable (Trie s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

traverse :: Applicative f => (a -> f b) -> Trie s a -> f (Trie s b) #

sequenceA :: Applicative f => Trie s (f a) -> f (Trie s a) #

mapM :: Monad m => (a -> m b) -> Trie s a -> m (Trie s b) #

sequence :: Monad m => Trie s (m a) -> m (Trie s a) #

(Eq s, Eq a) => Eq (Trie s a) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

(==) :: Trie s a -> Trie s a -> Bool #

(/=) :: Trie s a -> Trie s a -> Bool #

(Eq s, Hashable s, Read s, Read a) => Read (Trie s a) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

readsPrec :: Int -> ReadS (Trie s a) #

readList :: ReadS [Trie s a] #

readPrec :: ReadPrec (Trie s a) #

readListPrec :: ReadPrec [Trie s a] #

(Show s, Show a) => Show (Trie s a) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

showsPrec :: Int -> Trie s a -> ShowS #

show :: Trie s a -> String #

showList :: [Trie s a] -> ShowS #

Generic (Trie s a) Source # 
Instance details

Defined in Data.Trie.Pattern

Associated Types

type Rep (Trie s a) :: Type -> Type #

Methods

from :: Trie s a -> Rep (Trie s a) x #

to :: Rep (Trie s a) x -> Trie s a #

(Eq s, Hashable s) => Semigroup (Trie s a) Source #

Note (left preference): If two tries have a value attached to the same Pattern (i.e. to the same key), then t1 <> t2 preserves the value of t1.

Instance details

Defined in Data.Trie.Pattern

Methods

(<>) :: Trie s a -> Trie s a -> Trie s a #

sconcat :: NonEmpty (Trie s a) -> Trie s a #

stimes :: Integral b => b -> Trie s a -> Trie s a #

(Eq s, Hashable s) => Monoid (Trie s a) Source #

Note: mappend = (<>).

Instance details

Defined in Data.Trie.Pattern

Methods

mempty :: Trie s a #

mappend :: Trie s a -> Trie s a -> Trie s a #

mconcat :: [Trie s a] -> Trie s a #

(NFData s, NFData a) => NFData (Trie s a) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

rnf :: Trie s a -> () #

type Rep (Trie s a) Source # 
Instance details

Defined in Data.Trie.Pattern

type Rep (Trie s a) = D1 (MetaData "Trie" "Data.Trie.Pattern" "pattern-trie-0.1.1-JpFJgAGc42dD7lpnMqotfZ" False) (C1 (MetaCons "Trie" PrefixI True) (S1 (MetaSel (Just "strtries") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 (HashMap s (Trie s a))) :*: (S1 (MetaSel (Just "vartrie") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 (Maybe (Trie s a))) :*: S1 (MetaSel (Just "value") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 (Maybe a)))))

value :: Trie s a -> Maybe a Source #

The value at the root of the trie, i.e.

value t == lookup mempty t

Patterns

Definition (Prefix Match): A String is a prefix match for a Pattern, iff all Matchers in the pattern succeed when applied on the chunks of the input string in sequence. A proper prefix match is a prefix match that is not a (full) match.

A prefix match is witnessed by applyMatchesPrefix.

Definition ((Full) Match): A String is a (full) match for a Pattern, iff it is a prefix match and there are no unmatched remaining chunks of the input (i.e. the string and the pattern have the same length).

A (full) match is witnessed by applyMatches.

Definition (Overlapping Patterns): Two patterns are overlapping, iff they are not equal and there exists an input String that is a (full) match for both patterns.

Overlapping patterns are witnessed by overlapping.

type Pattern s = Seq (Matcher s) Source #

A pattern is a sequence of Matchers and serves as a key in a pattern trie.

If two patterns are overlapping for an input string, the preference for a match is given by the partial order EqStr > AnyStr on the competing matchers, i.e. towards the more specific pattern. This partial order is witnessed and subsumed by the total order MatchOrd.

The preference for a prefix match is reversed, i.e. for an input string where only a proper prefix is a match for overlapping patterns, the preference is given by the partial order AnyStr > EqStr, i.e. towards the more general pattern. This partial order is witnessed and subsumed by the total order PrefixMatchOrd.

type Str s = [s] Source #

A (chunked) input string to match on a Pattern in a trie.

Note: Input strings can be infinite. Since the tries are always finite, an infinite input string is only consumed until either a match has been found or the applicable paths in the trie have been exhaustively searched.

data Matcher s Source #

A Matcher is applied on a single chunk of an input String while looking for a match and either succeeds or fails. If it succeeds, it may Capture the chunk.

Constructors

AnyStr

Match and capture an arbitrary chunk of an input string.

EqStr !s

Match a chunk of an input string exactly, capturing nothing.

Instances
Eq s => Eq (Matcher s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

(==) :: Matcher s -> Matcher s -> Bool #

(/=) :: Matcher s -> Matcher s -> Bool #

Read s => Read (Matcher s) Source # 
Instance details

Defined in Data.Trie.Pattern

Show s => Show (Matcher s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

showsPrec :: Int -> Matcher s -> ShowS #

show :: Matcher s -> String #

showList :: [Matcher s] -> ShowS #

Generic (Matcher s) Source # 
Instance details

Defined in Data.Trie.Pattern

Associated Types

type Rep (Matcher s) :: Type -> Type #

Methods

from :: Matcher s -> Rep (Matcher s) x #

to :: Rep (Matcher s) x -> Matcher s #

NFData s => NFData (Matcher s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

rnf :: Matcher s -> () #

type Rep (Matcher s) Source # 
Instance details

Defined in Data.Trie.Pattern

type Rep (Matcher s) = D1 (MetaData "Matcher" "Data.Trie.Pattern" "pattern-trie-0.1.1-JpFJgAGc42dD7lpnMqotfZ" False) (C1 (MetaCons "AnyStr" PrefixI False) (U1 :: Type -> Type) :+: C1 (MetaCons "EqStr" PrefixI False) (S1 (MetaSel (Nothing :: Maybe Symbol) NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 s)))

newtype Capture s Source #

A captured chunk of an input String.

Constructors

Capture 

Fields

Instances
Eq s => Eq (Capture s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

(==) :: Capture s -> Capture s -> Bool #

(/=) :: Capture s -> Capture s -> Bool #

Ord s => Ord (Capture s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

compare :: Capture s -> Capture s -> Ordering #

(<) :: Capture s -> Capture s -> Bool #

(<=) :: Capture s -> Capture s -> Bool #

(>) :: Capture s -> Capture s -> Bool #

(>=) :: Capture s -> Capture s -> Bool #

max :: Capture s -> Capture s -> Capture s #

min :: Capture s -> Capture s -> Capture s #

Read s => Read (Capture s) Source # 
Instance details

Defined in Data.Trie.Pattern

Show s => Show (Capture s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

showsPrec :: Int -> Capture s -> ShowS #

show :: Capture s -> String #

showList :: [Capture s] -> ShowS #

Generic (Capture s) Source # 
Instance details

Defined in Data.Trie.Pattern

Associated Types

type Rep (Capture s) :: Type -> Type #

Methods

from :: Capture s -> Rep (Capture s) x #

to :: Rep (Capture s) x -> Capture s #

NFData s => NFData (Capture s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

rnf :: Capture s -> () #

type Rep (Capture s) Source # 
Instance details

Defined in Data.Trie.Pattern

type Rep (Capture s) = D1 (MetaData "Capture" "Data.Trie.Pattern" "pattern-trie-0.1.1-JpFJgAGc42dD7lpnMqotfZ" True) (C1 (MetaCons "Capture" PrefixI True) (S1 (MetaSel (Just "captured") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 s)))

Testing Patterns

overlapping :: Eq s => Pattern s -> Pattern s -> Bool Source #

Check whether two patterns are overlapping, i.e. whether there exists a String that is a (full) match for both patterns.

>>> let p1 = mempty |> EqStr "a" |> AnyStr
>>> let p2 = mempty |> AnyStr    |> EqStr "b"
>>> let p3 = mempty |> EqStr "a" |> EqStr "c"
>>> overlapping p1 p1
False
>>> overlapping p1 p2
True
>>> overlapping p1 p3
True
>>> overlapping p2 p3
False

newtype MatchOrd s Source #

A total order for matchers that subsumes the partial order for the preference between overlapping patterns on a match.

>>> MatchOrd AnyStr < MatchOrd (EqStr "a")
True
>>> let p1 = mempty |> EqStr "a" |> EqStr "b"
>>> let p2 = mempty |> AnyStr    |> EqStr "b"
>>> matchOrd p1 > matchOrd p2
True

Constructors

MatchOrd (Matcher s) 
Instances
Eq s => Eq (MatchOrd s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

(==) :: MatchOrd s -> MatchOrd s -> Bool #

(/=) :: MatchOrd s -> MatchOrd s -> Bool #

Ord s => Ord (MatchOrd s) Source # 
Instance details

Defined in Data.Trie.Pattern

Methods

compare :: MatchOrd s -> MatchOrd s -> Ordering #

(<) :: MatchOrd s -> MatchOrd s -> Bool #

(<=) :: MatchOrd s -> MatchOrd s -> Bool #

(>) :: MatchOrd s -> MatchOrd s -> Bool #

(>=) :: MatchOrd s -> MatchOrd s -> Bool #

max :: MatchOrd s -> MatchOrd s -> MatchOrd s #

min :: MatchOrd s -> MatchOrd s -> MatchOrd s #

newtype MatchPrefixOrd s Source #

A total order for matchers that subsumes the partial order for the preference between overlapping patterns on a matchPrefix.

>>> MatchPrefixOrd AnyStr > MatchPrefixOrd (EqStr "a")
True
>>> let p1 = mempty |> EqStr "a" |> EqStr "b"
>>> let p2 = mempty |> AnyStr    |> EqStr "b"
>>> matchPrefixOrd p1 < matchPrefixOrd p2
True

Constructors

MatchPrefixOrd (Matcher s) 

apply :: Eq s => Str s -> Pattern s -> (Pattern s, Seq (Capture s), Str s) Source #

Apply a string to a pattern, returning the unmatched suffix of the pattern together with the captured chunks and the remaining (unmatched) suffix of the input string.

>>> let p = mempty |> EqStr "a" |> AnyStr |> EqStr "c"
>>> let s = ["a", "b", "d"]
>>> apply s p
(fromList [EqStr "c"],fromList [Capture {captured = "b"}],["d"])

applyCapture :: Eq s => Str s -> Pattern s -> Seq (Capture s) Source #

Apply a string to a pattern, returning the captures.

>>> let p = mempty |> EqStr "a" |> AnyStr |> EqStr "c"
>>> let s = ["a", "b", "d"]
>>> applyCapture s p
fromList [Capture {captured = "b"}]

unapplyCapture :: Pattern s -> Seq (Capture s) -> Str s Source #

(Re)Construct the longest input String matching a prefix of a pattern, using the given captures to satisfy matchers. As long as there are enough captures to satisfy all matchers in the pattern, the resulting string will always be a (full) match for the pattern.

Furthermore, if an input string s is a (full) match for a pattern p, then

unapplyCapture p (applyCapture s p) == s
>>> let p = mempty |> EqStr "a" |> AnyStr |> EqStr "c"
>>> let s = ["a", "b", "c"]
>>> unapplyCapture p (applyCapture s p)
["a","b","c"]

applyMatch :: Eq s => Str s -> Pattern s -> Maybe (Seq (Capture s)) Source #

Apply a string to a pattern, returning the captures iff the string is a (full) match for the pattern.

>>> let p = mempty |> EqStr "a" |> AnyStr |> EqStr "c"
>>> applyMatch ["a", "b", "c", "d"] p
Nothing
>>> applyMatch ["a", "b", "c"] p
Just (fromList [Capture {captured = "b"}])

applyMatches :: Eq s => Str s -> Pattern s -> Bool Source #

Apply a string to a pattern, returning True iff the string is a (full) match for the pattern.

applyMatchPrefix :: Eq s => Str s -> Pattern s -> Maybe (Seq (Capture s)) Source #

Apply a string to a pattern, returning the captures iff the string is a prefix match for the pattern.

>>> let p = mempty |> EqStr "a" |> AnyStr |> EqStr "c"
>>> applyMatchPrefix ["a", "b", "c", "d"] p
Just (fromList [Capture {captured = "b"}])

applyMatchesPrefix :: Eq s => Str s -> Pattern s -> Bool Source #

Apply a string to a pattern, returning True iff the string is a prefix match for the pattern.

List conversion

fromAssocList :: (Eq s, Hashable s) => [(Pattern s, a)] -> Trie s a Source #

Create a pattern trie from a list of patterns and associated values.

\(\mathcal{O}(n \cdot k)\), where \(n\) is the length of the list and \(k\) is the length of the longest pattern in the list.

toAssocList :: (Eq s, Hashable s) => Trie s a -> [(Pattern s, a)] Source #

Create a list of patterns and associated values from a pattern trie.

\(\mathcal{O}(n \cdot k)\), where \(n\) is the number of values in the trie and \(k\) is the length of the longest pattern in the trie.

Modifications

insert :: (Eq s, Hashable s) => Pattern s -> a -> Trie s a -> Trie s a Source #

Insert the value for the given pattern into the trie.

\(\Theta(k)\), where \(k\) is the length of the pattern.

adjust :: (Eq s, Hashable s) => Pattern s -> (a -> a) -> Trie s a -> Trie s a Source #

Update the value of the given pattern in the trie, if it exists.

\(\mathcal{O}(k)\), where \(k\) is the length of the pattern.

delete :: (Eq s, Hashable s) => Pattern s -> Trie s a -> Trie s a Source #

Remove the value for the given pattern from the trie, if it exists.

\(\mathcal{O}(k)\), where \(k\) is the length of the pattern.

Pattern lookup

Time Complexity (successful lookups)

\(\Theta(k)\), where \(k\) is the length of the pattern.

lookup :: (Eq s, Hashable s) => Pattern s -> Trie s a -> Maybe a Source #

Lookup the value of a pattern. If there is no value in the trie for the given pattern, the result is Nothing.

lookupPrefix :: (Eq s, Hashable s) => Pattern s -> Trie s a -> Maybe (a, Pattern s) Source #

Lookup the value for the longest matching prefix of a pattern, returning it together with the remaining suffix of the pattern. If there is no value in the trie for any prefix of the given pattern, the result is Nothing.

lookupPrefixTrie :: (Eq s, Hashable s) => Pattern s -> Trie s a -> (Trie s a, Pattern s) Source #

Lookup the trie rooted at the longest prefix of a pattern, returning it together with the remaining suffix of the pattern.

String matching

Time Complexity (successful matches)

In what follows \(k\) is always the length of the input string (i.e. the number of chunks).

Best case: \(\Theta(k)\), when the input string matches the most specific pattern in the trie (i.e. with the least captures) from all those that have a matching prefix for the string.

Worst case: \(\mathcal{O}(2^k)\), when there are \(2^{k-1}\) distinct patterns of length at least \(k\) in the trie, all of which have a prefix of length \(k-1\) that is a prefix match for the input string, but none except for the most general of them are an actual (full) match. This is a pathological case that comes about from backtracking to more general patterns and is illustrated in an example with \(k=3\) for the input string ["a","a","b"] below.

Nodes with values are filled, choice points are blue, dead ends are red and dashed lines indicate backtracking. The above trie contains the keys (patterns)

  • mempty |> EqStr "a" |> EqStr "a" |> EqStr "a"
  • mempty |> EqStr "a" |> AnyStr    |> EqStr "a"
  • mempty |> AnyStr    |> EqStr "a" |> EqStr "a"
  • mempty |> AnyStr    |> AnyStr    |> EqStr "b"

with some arbitrary values. The paths are explored left-to-right in a depth-first search. The number of steps for a match in the worst case is more accurately approximated by \[ \underbrace{2^k - 1}_{\text{downwards searching}} + \underbrace{2^{k-1} - 1}_{\text{upwards backtracking}} \] Dropping the asymptotically insignificant constants and lower terms yields the bound. For realistic values of \(k\), however, the difference matters.

match :: (Eq s, Hashable s) => Str s -> Trie s a -> Maybe (a, Seq (Capture s)) Source #

Lookup the value for an input string by matching it against the patterns of a trie. The value of the matching pattern, if any, is returned together with any captured parts of the input string. If the input string does not match any pattern in the trie, the result is Nothing.

matchPrefix :: (Eq s, Hashable s) => Str s -> Trie s a -> Maybe (a, Seq (Capture s), Str s) Source #

Lookup the value for the longest matching prefix of the input string, returning it together with any captured parts and the remaining (unmatched) suffix of the input string. If no prefix of the input string matches any pattern in the trie, the result is Nothing.

matchPrefixTrie :: (Eq s, Hashable s) => Str s -> Trie s a -> (Trie s a, Seq (Capture s), Str s) Source #

Lookup the trie rooted at the longest matching prefix of the input string, returning it together with any captured parts and the remaining (unmatched) suffix of the input string.

In particular, if the input string is a (full) match for a pattern, the returned trie is the subtrie that is rooted at the associated value.

Special folds and traversals

traverseWithKey :: Applicative f => (Pattern s -> a -> f b) -> Trie s a -> f (Trie s b) Source #

foldMapWithKey :: Monoid m => (Pattern s -> a -> m) -> Trie s a -> m Source #

foldrWithKey :: (Pattern s -> a -> b -> b) -> b -> Trie s a -> b Source #

Re-exports

(|>) :: Seq a -> a -> Seq a infixl 5 #

\( O(1) \). Add an element to the right end of a sequence. Mnemonic: a triangle with the single element at the pointy end.