-- | This module offers regexes, combinators, and operations to work with the -- 'Data.Text.Text' type from the @text@ package. -- module Regex.Text ( -- * @RE@s R.RE , T.TextToken , T.REText , T.token , T.satisfy , T.char , T.charIgnoreCase , T.anyChar , T.oneOf , T.text , T.textIgnoreCase , T.manyText , T.someText , T.manyTextMin , T.someTextMin , T.manyTextOf , T.someTextOf , T.manyTextOfMin , T.someTextOfMin -- * Numeric @RE@s , T.naturalDec , T.integerDec , T.naturalHex , T.integerHex , T.wordRangeDec , T.intRangeDec , T.wordRangeHex , T.intRangeHex , T.wordDecN , T.wordHexN -- * Combinators , R.foldlMany , R.foldlManyMin , T.toMatch , T.withMatch , R.Many(..) , R.manyr , R.optionalMin , R.someMin , R.manyMin , R.atLeast , R.atMost , R.betweenCount , R.atLeastMin , R.atMostMin , R.betweenCountMin , R.sepBy , R.sepBy1 , R.endBy , R.endBy1 , R.sepEndBy , R.sepEndBy1 , R.chainl1 , R.chainr1 -- * Combinators in @base@ -- $combase -- * Compile and parse , T.reParse , P.Parser , T.ParserText , P.compile , P.compileBounded , T.parse , T.parseSure -- * Text operations , T.find , T.findAll , T.splitOn , T.replace , T.replaceAll -- * Additional information -- $info ) where import qualified Regex.Internal.Regex as R import qualified Regex.Internal.Parser as P import qualified Regex.Internal.Text as T -- $combase -- -- Various combinators are available in @base@ that work with @RE@s, by virtue -- of @RE@ being @Applicative@ and @Alternative@. -- Since this package does not attempt to redefine or re-export such -- combinators, you need to import and use them. Commonly used combinators -- are: -- -- * "Control.Applicative": @liftA2@, @\<|>@, @empty@, @many@, @some@, -- @optional@ -- * "Control.Monad": @void@, @replicateM@, @replicateM_@ -- * "Data.Foldable": @traverse_@, @for_@, @sequenceA_@, @asum@ -- * "Data.Traversable": @traverse@, @for@, @sequenceA@ -- -- $info -- -- == Recursive definitions -- -- It is not possible to define a @RE@ recursively. If it were permitted, it -- would be capable of parsing more than -- [regular languages](https://en.wikipedia.org/wiki/Regular_language). -- Unfortunately, there is no good way\* to make it impossible to write such -- a regex in the first place. So it must be avoided by the programmer. As an -- example, avoid this: -- -- @ -- re :: REText [Text] -- re = liftA2 (:) (text "ha") re \<|> [] \<$ text "!" -- diverges! -- @ -- -- Instead, use appropriate combinators from this module: -- -- @ -- re = many (text "ha") <* text "!" -- @ -- -- For the same reason, be cautious when using combinators from the other -- packages on @RE@s. Make sure that they do not attempt to construct a -- recursive @RE@. -- -- If you find that your regex is impossible to write without recursion, -- you are attempting to parse a non-regular language! You need a more powerful -- parser than what this library has to offer. -- -- \* [Unlifted datatypes](https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/primitives.html#unlifted-datatypes) -- can serve this purpose but they are too inconvenient to work with. -- -- == Laziness -- -- Parsing is lazy in the result value, i.e. the @a@ in @RE c a@ or -- @Parser c a@. In fact, for the algorithm used in this library, this laziness -- is essential for good runtime complexity. However, there is little reason -- to be lazy in other aspects, such as the values of the sequence, @c@, or the -- functions and regexes used in combinators. Functions are strict in such -- arguments. -- -- @ -- -- Lazy in the result -- reParse (pure ⊥) "" = Just ⊥ -- reParse (fmap (\\_ -> ⊥) (char \'a\')) "a" = Just ⊥ -- -- -- Strict in places like -- char ⊥ = ⊥ -- fmap ⊥ r = ⊥ -- liftA2 f r ⊥ = ⊥ -- @ -- -- == Looping parsers -- -- What should be the result of @reParse (many (pure ())) ""@? -- -- Since @many r@ parses @r@ as many times as possible, and @pure ()@ succeeds -- without consuming input, the result should arguably be the infinite list -- @repeat ()@. Similarly, @reParse (foldlMany f z (pure ())) ""@ should -- diverge. Note that this applies to not just @pure x@, but any regex that -- can succeed without consuming input, such as @many x@, @manyMin x@, etc. -- -- This library considers that such an outcome is not desirable in practice. It -- would be surprising to get an infinite structure from your parser. So, in the -- case that @many@ succeeds an infinite number of times, this library treats it -- as succeeding /zero/ times. -- -- By this rule, @reParse (many (pure ())) ""@ parses as @[]@ and -- @reParse (foldlMany f z (pure ())) ""@ parses as @z@. -- -- This behavior makes it impossible to distinguish between zero parses and -- infinite parses. To address this, an alternate combinator 'Regex.Text.manyr' -- is provided. This parses into a 'Regex.Text.Many', a type that clearly -- indicates if parsing succeeded without consuming input into an infinite list, -- or if it succeeded a finite number of times. -- -- == Performance -- -- This section may be useful for someone looking to understand the performance -- of this library without diving into the source code. -- -- Parsing with a @RE@ is done in two distinct steps. -- -- 1. A @RE@ is compiled to a @Parser@ in \(O(m)\) time, where \(m\) is the size -- of the @RE@. This is a -- [nondeterministic finite automaton](https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton) -- (NFA). -- 2. The @Parser@ is run on a @Text@ in \(O(mn \log m)\) time, where \(n\) is -- the length of the @Text@. Assumes every @Char@ is parsed in \(O(1)\). -- -- /Performance note/: Use @(\<$)@ over @(\<$>)@, and @(\<*)@\/@(*>)@ over -- @liftA2@\/@(\<*>)@ when ignoring the result of a @RE@. Knowing the result is -- ignored allows compiling to a faster parser. -- -- Memory usage for parsing is \(O(nm)\). -- -- * If the result of a @RE@ is ignored using @(\<$)@, @(\<*)@, or @(*>)@, only -- \(O(m)\) memory is required. -- * To parse some slice of the input @Text@ (using one of @manyText@, -- @manyTextOf@, etc.), memory required is \(O(1)\). For @toMatch r@, memory -- required is \(O(m' \min (m',n))\) where \(m'\) is the size of @r@. -- -- This applies even as subcomponents. So, any subcomponent @RE@ of a larger -- @RE@ that is only recognizing text or parsing a slice is cheaper in terms of -- memory. --