Safe Haskell	Safe-Inferred
Language	Haskell2010

Regex.List

Contents

REs
Char REs
Numeric Char REs
Combinators
Combinators in base
Compile and parse
List operations
Additional information

Description

This module offers regexes, combinators, and operations to work with the list type ([]), and also specifically Strings, which are lists of Chars.

Synopsis

data RE c a
token :: (c -> Maybe a) -> RE c a
satisfy :: (c -> Bool) -> RE c c
single :: Eq c => c -> RE c c
anySingle :: RE c c
list :: Eq c => [c] -> RE c [c]
manyList :: RE c [c]
someList :: RE c [c]
manyListMin :: RE c [c]
someListMin :: RE c [c]
charIgnoreCase :: Char -> RE Char Char
oneOfChar :: CharSet -> RE Char Char
stringIgnoreCase :: String -> RE Char String
manyStringOf :: CharSet -> RE Char String
someStringOf :: CharSet -> RE Char String
manyStringOfMin :: CharSet -> RE Char String
someStringOfMin :: CharSet -> RE Char String
naturalDec :: RE Char Natural
integerDec :: RE Char a -> RE Char Integer
naturalHex :: RE Char Natural
integerHex :: RE Char a -> RE Char Integer
wordRangeDec :: (Word, Word) -> RE Char Word
intRangeDec :: RE Char a -> (Int, Int) -> RE Char Int
wordRangeHex :: (Word, Word) -> RE Char Word
intRangeHex :: RE Char a -> (Int, Int) -> RE Char Int
wordDecN :: Int -> RE Char Word
wordHexN :: Int -> RE Char Word
foldlMany :: (b -> a -> b) -> b -> RE c a -> RE c b
foldlManyMin :: (b -> a -> b) -> b -> RE c a -> RE c b
toMatch :: RE c a -> RE c [c]
withMatch :: RE c a -> RE c ([c], a)
data Many a
- = Repeat a
- | Finite [a]
manyr :: RE c a -> RE c (Many a)
optionalMin :: RE c a -> RE c (Maybe a)
someMin :: RE c a -> RE c [a]
manyMin :: RE c a -> RE c [a]
atLeast :: Int -> RE c a -> RE c [a]
atMost :: Int -> RE c a -> RE c [a]
betweenCount :: (Int, Int) -> RE c a -> RE c [a]
atLeastMin :: Int -> RE c a -> RE c [a]
atMostMin :: Int -> RE c a -> RE c [a]
betweenCountMin :: (Int, Int) -> RE c a -> RE c [a]
sepBy :: RE c a -> RE c sep -> RE c [a]
sepBy1 :: RE c a -> RE c sep -> RE c [a]
endBy :: RE c a -> RE c sep -> RE c [a]
endBy1 :: RE c a -> RE c sep -> RE c [a]
sepEndBy :: RE c a -> RE c sep -> RE c [a]
sepEndBy1 :: RE c a -> RE c sep -> RE c [a]
chainl1 :: RE c a -> RE c (a -> a -> a) -> RE c a
chainr1 :: RE c a -> RE c (a -> a -> a) -> RE c a
reParse :: RE c a -> [c] -> Maybe a
data Parser c a
compile :: RE c a -> Parser c a
compileBounded :: Int -> RE c a -> Maybe (Parser c a)
parse :: Parser c a -> [c] -> Maybe a
parseSure :: Parser c a -> [c] -> a
find :: RE c a -> [c] -> Maybe a
findAll :: RE c a -> [c] -> [a]
splitOn :: RE c a -> [c] -> [[c]]
replace :: RE c [c] -> [c] -> Maybe [c]
replaceAll :: RE c [c] -> [c] -> [c]

`RE`s

data RE c a Source #

A regular expression. Operates on a sequence of elements of type c and capable of parsing into an a.

A RE is a Functor, Applicative, and Alternative.

pure: Succeed without consuming input.
liftA2, <*>, *>, <*: Sequential composition.
empty: Fail.
<|>: Alternative composition. Left-biased, i.e. the result of parsing using a <|> b is the result of parsing using a if it succeeds, otherwise it is the result of parsing using b if it succeeds, otherwise parsing fails.
many: Zero or more. many a parses multiple as sequentially. Biased towards matching more. Use manyMin for a bias towards matching less. Also see the section "Looping parsers".
some: One or more. some a parses multiple as sequentially. Biased towards matching more. Use someMin for a bias towards matching less.

In addition to expected Functor, Applicative, and Alternative laws, RE obeys these Applicative-Alternative laws:

a <*> empty = empty
empty <*> a = empty
(a <|> b) <*> c = (a <*> c) <|> (b <*> c)
a <*> (b <|> c) = (a <*> b) <|> (a <*> c)

Note that, because of bias, it is not true that a <|> b = b <|> a.

Performance note: Prefer the smaller of equivalent regexes, i.e. prefer (a <|> b) <*> c over (a <*> c) <|> (b <*> c).

Instances

Instances details

Alternative (RE c) Source #
Instance details Defined in Regex.Internal.Regex Methods empty :: RE c a # (<\|>) :: RE c a -> RE c a -> RE c a # some :: RE c a -> RE c [a] # many :: RE c a -> RE c [a] #
Applicative (RE c) Source #
Instance details Defined in Regex.Internal.Regex Methods pure :: a -> RE c a # (<>) :: RE c (a -> b) -> RE c a -> RE c b # liftA2 :: (a -> b -> c0) -> RE c a -> RE c b -> RE c c0 # (>) :: RE c a -> RE c b -> RE c b # (<*) :: RE c a -> RE c b -> RE c a #
Functor (RE c) Source #
Instance details Defined in Regex.Internal.Regex Methods fmap :: (a -> b) -> RE c a -> RE c b # (<$) :: a -> RE c b -> RE c a #
Monoid a => Monoid (RE c a) Source #	mempty = pure mempty
Instance details Defined in Regex.Internal.Regex Methods mempty :: RE c a # mappend :: RE c a -> RE c a -> RE c a # mconcat :: [RE c a] -> RE c a #
Semigroup a => Semigroup (RE c a) Source #	(<>) = liftA2 (<>)
Instance details Defined in Regex.Internal.Regex Methods (<>) :: RE c a -> RE c a -> RE c a # sconcat :: NonEmpty (RE c a) -> RE c a # stimes :: Integral b => b -> RE c a -> RE c a #

token :: (c -> Maybe a) -> RE c a Source #

Parse a c into an a if the given function returns Just.

satisfy :: (c -> Bool) -> RE c c Source #

Parse a c if it satisfies the given predicate.

single :: Eq c => c -> RE c c Source #

Parse the given c.

anySingle :: RE c c Source #

Parse any c.

list :: Eq c => [c] -> RE c [c] Source #

Parse the given list.

manyList :: RE c [c] Source #

Parse any list. Biased towards matching more.

someList :: RE c [c] Source #

Parse any non-empty list. Biased towards matching more.

manyListMin :: RE c [c] Source #

Parse any list. Minimal, i.e. biased towards matching less.

someListMin :: RE c [c] Source #

Parse any non-empty String. Minimal, i.e. biased towards matching less.

`Char` `RE`s

charIgnoreCase :: Char -> RE Char Char Source #

Parse the given Char, ignoring case.

Comparisons are performed after applying simple case folding as described by the Unicode standard.

oneOfChar :: CharSet -> RE Char Char Source #

Parse a Char if it is a member of the CharSet.

stringIgnoreCase :: String -> RE Char String Source #

Parse the given String, ignoring case.

Comparisons are performed after applying simple case folding as described by the Unicode standard.

manyStringOf :: CharSet -> RE Char String Source #

Parse any String containing members of the CharSet. Biased towards matching more.

someStringOf :: CharSet -> RE Char String Source #

Parse any non-empty String containing members of the CharSet. Biased towards matching more.

manyStringOfMin :: CharSet -> RE Char String Source #

Parse any String containing members of the CharSet. Minimal, i.e. biased towards matching less.

someStringOfMin :: CharSet -> RE Char String Source #

Parse any non-empty String containing members of the CharSet. Minimal, i.e. biased towards matching less.

Numeric `Char` `RE`s

naturalDec :: RE Char Natural Source #

Parse a decimal Natural. Leading zeros are not accepted. Biased towards matching more.

integerDec :: RE Char a -> RE Char Integer Source #

Parse a decimal Integer. Parse an optional sign, '-' or '+', followed by the given RE, followed by the absolute value of the integer. Leading zeros are not accepted. Biased towards matching more.

naturalHex :: RE Char Natural Source #

Parse a hexadecimal Natural. Both uppercase 'A'..'F' and lowercase 'a'..'f' are accepted. Leading zeros are not accepted. Biased towards matching more.

integerHex :: RE Char a -> RE Char Integer Source #

Parse a hexadecimal Integer. Parse an optional sign, '-' or '+', followed by the given RE, followed by the absolute value of the integer. Both uppercase 'A'..'F' and lowercase 'a'..'f' are accepted. Leading zeros are not accepted. Biased towards matching more.

wordRangeDec :: (Word, Word) -> RE Char Word Source #

Parse a decimal Word in the range [low..high]. Leading zeros are not accepted. Biased towards matching more.

intRangeDec :: RE Char a -> (Int, Int) -> RE Char Int Source #

Parse a decimal Int in the range [low..high]. Parse an optional sign, '-' or '+', followed by the given RE, followed by the absolute value of the integer. Leading zeros are not accepted. Biased towards matching more.

wordRangeHex :: (Word, Word) -> RE Char Word Source #

Parse a hexadecimal Word in the range [low..high]. Both uppercase 'A'..'F' and lowercase 'a'..'f' are accepted. Leading zeros are not accepted. Biased towards matching more.

intRangeHex :: RE Char a -> (Int, Int) -> RE Char Int Source #

Parse a hexadecimal Int in the range [low..high]. Parse an optional sign, '-' or '+', followed by the given RE, followed by the absolute value of the integer. Both uppercase 'A'..'F' and lowercase 'a'..'f' are accepted. Leading zeros are not accepted. Biased towards matching more.

wordDecN :: Int -> RE Char Word Source #

Parse a Word of exactly n decimal digits, including any leading zeros. Will not parse values that do not fit in a Word. Biased towards matching more.

wordHexN :: Int -> RE Char Word Source #

Parse a Word of exactly n hexadecimal digits, including any leading zeros. Both uppercase 'A'..'F' and lowercase 'a'..'f' are accepted. Will not parse values that do not fit in a Word. Biased towards matching more.

Combinators

foldlMany :: (b -> a -> b) -> b -> RE c a -> RE c b Source #

Parse many occurences of the given RE. Biased towards matching more.

Also see the section "Looping parsers".

foldlManyMin :: (b -> a -> b) -> b -> RE c a -> RE c b Source #

Parse many occurences of the given RE. Minimal, i.e. biased towards matching less.

toMatch :: RE c a -> RE c [c] Source #

Rebuild the RE such that the result is the matched section of the list instead.

withMatch :: RE c a -> RE c ([c], a) Source #

Rebuild the RE to include the matched section of the list alongside the result.

data Many a Source #

Constructors

Repeat a	A single value repeating indefinitely
Finite [a]	A finite list

Instances

Instances details

Foldable Many Source #
Instance details Defined in Regex.Internal.Regex Methods fold :: Monoid m => Many m -> m # foldMap :: Monoid m => (a -> m) -> Many a -> m # foldMap' :: Monoid m => (a -> m) -> Many a -> m # foldr :: (a -> b -> b) -> b -> Many a -> b # foldr' :: (a -> b -> b) -> b -> Many a -> b # foldl :: (b -> a -> b) -> b -> Many a -> b # foldl' :: (b -> a -> b) -> b -> Many a -> b # foldr1 :: (a -> a -> a) -> Many a -> a # foldl1 :: (a -> a -> a) -> Many a -> a # toList :: Many a -> [a] # null :: Many a -> Bool # length :: Many a -> Int # elem :: Eq a => a -> Many a -> Bool # maximum :: Ord a => Many a -> a # minimum :: Ord a => Many a -> a # sum :: Num a => Many a -> a # product :: Num a => Many a -> a #
Eq1 Many Source #
Instance details Defined in Regex.Internal.Regex Methods liftEq :: (a -> b -> Bool) -> Many a -> Many b -> Bool #
Ord1 Many Source #
Instance details Defined in Regex.Internal.Regex Methods liftCompare :: (a -> b -> Ordering) -> Many a -> Many b -> Ordering #
Show1 Many Source #
Instance details Defined in Regex.Internal.Regex Methods liftShowsPrec :: (Int -> a -> ShowS) -> ([a] -> ShowS) -> Int -> Many a -> ShowS # liftShowList :: (Int -> a -> ShowS) -> ([a] -> ShowS) -> [Many a] -> ShowS #
Functor Many Source #
Instance details Defined in Regex.Internal.Regex Methods fmap :: (a -> b) -> Many a -> Many b # (<$) :: a -> Many b -> Many a #
NFData1 Many Source #
Instance details Defined in Regex.Internal.Regex Methods liftRnf :: (a -> ()) -> Many a -> () #
Show a => Show (Many a) Source #
Instance details Defined in Regex.Internal.Regex Methods showsPrec :: Int -> Many a -> ShowS # show :: Many a -> String # showList :: [Many a] -> ShowS #
NFData a => NFData (Many a) Source #
Instance details Defined in Regex.Internal.Regex Methods rnf :: Many a -> () #
Eq a => Eq (Many a) Source #
Instance details Defined in Regex.Internal.Regex Methods (==) :: Many a -> Many a -> Bool # (/=) :: Many a -> Many a -> Bool #
Ord a => Ord (Many a) Source #
Instance details Defined in Regex.Internal.Regex Methods compare :: Many a -> Many a -> Ordering # (<) :: Many a -> Many a -> Bool # (<=) :: Many a -> Many a -> Bool # (>) :: Many a -> Many a -> Bool # (>=) :: Many a -> Many a -> Bool # max :: Many a -> Many a -> Many a # min :: Many a -> Many a -> Many a #

manyr :: RE c a -> RE c (Many a) Source #

Zero or more. Biased towards matching more.

Also see the section "Looping parsers".

optionalMin :: RE c a -> RE c (Maybe a) Source #

Zero or one. Minimal, i.e. biased towards zero.

Use Control.Applicative.optional for the same but biased towards one.

someMin :: RE c a -> RE c [a] Source #

One or more. Minimal, i.e. biased towards matching less.

manyMin :: RE c a -> RE c [a] Source #

Zero or more. Minimal, i.e. biased towards matching less.

atLeast :: Int -> RE c a -> RE c [a] Source #

At least n times. Biased towards matching more.

atMost :: Int -> RE c a -> RE c [a] Source #

At most n times. Biased towards matching more.

betweenCount :: (Int, Int) -> RE c a -> RE c [a] Source #

Between m and n times (inclusive). Biased towards matching more.

atLeastMin :: Int -> RE c a -> RE c [a] Source #

At least n times. Minimal, i.e. biased towards matching less.

atMostMin :: Int -> RE c a -> RE c [a] Source #

At most n times. Minimal, i.e. biased towards matching less.

betweenCountMin :: (Int, Int) -> RE c a -> RE c [a] Source #

Between m and n times (inclusive). Minimal, i.e. biased towards matching less.

sepBy :: RE c a -> RE c sep -> RE c [a] Source #

r `sepBy` sep parses zero or more occurences of r, separated by sep. Biased towards matching more.

sepBy1 :: RE c a -> RE c sep -> RE c [a] Source #

r `sepBy1` sep parses one or more occurences of r, separated by sep. Biased towards matching more.

endBy :: RE c a -> RE c sep -> RE c [a] Source #

r `endBy` sep parses zero or more occurences of r, separated and ended by sep. Biased towards matching more.

endBy1 :: RE c a -> RE c sep -> RE c [a] Source #

r `endBy1` sep parses one or more occurences of r, separated and ended by sep. Biased towards matching more.

sepEndBy :: RE c a -> RE c sep -> RE c [a] Source #

r `sepEndBy` sep parses zero or more occurences of r, separated and optionally ended by sep. Biased towards matching more.

sepEndBy1 :: RE c a -> RE c sep -> RE c [a] Source #

r `sepEndBy1` sep parses one or more occurences of r, separated and optionally ended by sep. Biased towards matching more.

chainl1 :: RE c a -> RE c (a -> a -> a) -> RE c a Source #

chainl1 r op parses one or more occurences of r, separated by op. The result is obtained by left associative application of all functions returned by op to the values returned by p. Biased towards matching more.

chainr1 :: RE c a -> RE c (a -> a -> a) -> RE c a Source #

chainr1 r op parses one or more occurences of r, separated by op. The result is obtained by right associative application of all functions returned by op to the values returned by p. Biased towards matching more.

Combinators in `base`

Various combinators are available in base that work with REs, by virtue of RE being Applicative and Alternative. Since this package does not attempt to redefine or re-export such combinators, you need to import and use them. Commonly used combinators are:

Control.Applicative: liftA2, <|>, empty, many, some, optional
Control.Monad: void, replicateM, replicateM_
Data.Foldable: traverse_, for_, sequenceA_, asum
Data.Traversable: traverse, for, sequenceA

Compile and parse

reParse :: RE c a -> [c] -> Maybe a Source #

$O(mn \log m)$. Parse a list with a RE.

Uses compile, see the note there.

If parsing multiple lists using the same RE, it is wasteful to compile the RE every time. So, prefer to

Compile once with compile or compileBounded and use the compiled Parser with parse as many times as required.
Alternately, partially apply this function to a RE and use the function as many times as required.

data Parser c a Source #

A parser compiled from a RE c a.

compile :: RE c a -> Parser c a Source #

$O(m)$. Compile a RE c a to a Parser c a.

Note: compile does not limit the size of the RE. See compileBounded if you would like to limit the size. REs with size greater than (maxBound::Int) `div` 2 are not supported and the behavior of such a RE is undefined.

compileBounded :: Int -> RE c a -> Maybe (Parser c a) Source #

$O(\min(l,m))$. Compile a RE c a to a Parser c a.

Returns Nothing if the size of the RE is greater than the provided limit $l$. You may want to use this if you suspect that the RE may be too large, for instance if the regex is constructed from an untrusted source.

While the exact size of a RE depends on an internal representation, it can be assumed to be in the same order as the length of a regex pattern corresponding to the RE.

parse :: Parser c a -> [c] -> Maybe a Source #

$O(mn \log m)$. Parse a list with a Parser.

parseSure :: Parser c a -> [c] -> a Source #

$O(mn \log m)$. Parse a list with a Parser. Calls error on parse failure.

For use with parsers that are known to never fail.

List operations

find :: RE c a -> [c] -> Maybe a Source #

$O(mn \log m)$. Find the first occurence of the given RE in a list.

Examples

Expand

>>> find (list "meow") "homeowner"
Just "meow"

To test whether a list is present in another list, like above, prefer Data.List.isInfixOf.

>>> find (stringIgnoreCase "haskell") "Look I'm Haskelling!"
Just "Haskell"
>>> find (list "backtracking") "parser-regex"
Nothing

findAll :: RE c a -> [c] -> [a] Source #

$O(mn \log m)$. Find all non-overlapping occurences of the given RE in the list.

Examples

Expand

>>> findAll (list "ana") "banananana"
["ana","ana"]

data Roll = Roll
  Natural -- ^ Rolls
  Natural -- ^ Faces on the die
  deriving Show

roll :: RE Char Roll
roll = Roll <$> (naturalDec <|> pure 1) <* single 'd' <*> naturalDec

>>> findAll roll "3d6, d10, 2d10"
[Roll 3 6,Roll 1 10,Roll 2 10]

splitOn :: RE c a -> [c] -> [[c]] Source #

$O(mn \log m)$. Split a list at occurences of the given RE.

Examples

Expand

>>> splitOn (single ' ') "Glasses are really versatile"
["Glasses","are","really","versatile"]

In cases like above, prefer using words or lines instead, if applicable.

>>> splitOn (single ' ' *> oneOfChar "+-=" *> single ' ') "3 - 1 + 1/2 - 2 = 0"
["3","1","1/2","2","0"]

If the list starts or ends with a delimiter, the result will contain empty lists at those positions.

>>> splitOn (single 'a') "ayaya"
["","y","y",""]

replace :: RE c [c] -> [c] -> Maybe [c] Source #

$O(mn \log m)$. Replace the first match of the given RE with its result. If there is no match, the result is Nothing.

Examples

Expand

>>> replace ("world" <$ list "Haskell") "Hello, Haskell!"
Just "Hello, world!"

>>> replace ("," <$ some (single '.')) "one...two...ten"
Just "one,two...ten"

replaceAll :: RE c [c] -> [c] -> [c] Source #

$O(mn \log m)$. Replace non-overlapping matches of the given RE with their results.

Examples

Expand

>>> replaceAll (" and " <$ list ", ") "red, blue, green"
"red and blue and green"

>>> replaceAll ("Fruit" <$ list "Time" <|> "banana" <$ list "arrow") "Time flies like an arrow"
"Fruit flies like a banana"

sep = oneOfChar "-./"
digits n = replicateM n (oneOfChar digit)
toYmd d m y = concat [y, "-", m, "-", d]
date = toYmd <$> digits 2 <* sep
             <*> digits 2 <* sep
             <*> digits 4

>>> replaceAll date "01/01/1970, 01-04-1990, 03.07.2011"
"1970-01-01, 1990-04-01, 2011-07-03"

Additional information

Recursive definitions

It is not possible to define a RE recursively. If it were permitted, it would be capable of parsing more than regular languages. Unfortunately, there is no good way* to make it impossible to write such a regex in the first place. So it must be avoided by the programmer. As an example, avoid this:

re :: RE Char [String]
re = liftA2 (:) (list "ha") re <|> [] <$ list "!"  -- diverges!

Instead, use appropriate combinators from this module:

re = many (list "ha") <* list "!"

For the same reason, be cautious when using combinators from the other packages on REs. Make sure that they do not attempt to construct a recursive RE.

If you find that your regex is impossible to write without recursion, you are attempting to parse a non-regular language! You need a more powerful parser than what this library has to offer.

* Unlifted datatypes can serve this purpose but they are too inconvenient to work with.

Laziness

Parsing is lazy in the result value, i.e. the a in RE c a or Parser c a. In fact, for the algorithm used in this library, this laziness is essential for good runtime complexity. However, there is little reason to be lazy in other aspects, such as the values of the sequence, c, or the functions and regexes used in combinators. Functions are strict in such arguments.

-- Lazy in the result
reParse (pure ⊥) "" = Just ⊥
reParse (fmap (\_ -> ⊥) (char 'a')) "a" = Just ⊥

-- Strict in places like
single ⊥ = ⊥
fmap ⊥ r = ⊥
liftA2 f r ⊥ = ⊥

Looping parsers

What should be the result of reParse (many (pure ())) ""?

Since many r parses r as many times as possible, and pure () succeeds without consuming input, the result should arguably be the infinite list repeat (). Similarly, reParse (foldlMany f z (pure ())) "" should diverge. Note that this applies to not just pure x, but any regex that can succeed without consuming input, such as many x, manyMin x, etc.

This library considers that such an outcome is not desirable in practice. It would be surprising to get an infinite structure from your parser. So, in the case that many succeeds an infinite number of times, this library treats it as succeeding zero times.

By this rule, reParse (many (pure ())) "" parses as [] and reParse (foldlMany f z (pure ())) "" parses as z.

This behavior makes it impossible to distinguish between zero parses and infinite parses. To address this, an alternate combinator manyr is provided. This parses into a Many, a type that clearly indicates if parsing succeeded without consuming input into an infinite list, or if it succeeded a finite number of times.

Performance

This section may be useful for someone looking to understand the performance of this library without diving into the source code.

Parsing with a RE is done in two distinct steps.

A RE is compiled to a Parser in $O(m)$ time, where $m$ is the size of the RE. This is a nondeterministic finite automaton (NFA).
The Parser is run on a list in $O(mn \log m)$ time, where $n$ is the length of the list. Assumes every Char is parsed in $O(1)$.

Performance note: Use (<$) over (<$>), and (<*)/(*>) over liftA2/(<*>) when ignoring the result of a RE. Knowing the result is ignored allows compiling to a faster parser.

Memory usage for parsing is $O(nm)$.

If the result of a RE is ignored using (<$), (<*), or (*>), only $O(m)$ memory is required.

This applies even as subcomponents. So, any subcomponent RE of a larger RE that is only recognizing a section of the list is cheaper in terms of memory.

REs

Instances

Char REs

Numeric Char REs

Combinators

Instances

Combinators in base

Compile and parse

List operations

Examples

Examples

Examples

Examples

Examples

Additional information

Recursive definitions

Laziness

Looping parsers

Performance

`RE`s

`Char` `RE`s

Numeric `Char` `RE`s

Combinators in `base`