Safe Haskell	None
Language	Haskell98

Text.Regex.TDFA

Description

The Text.Regex.TDFA module provides a backend for regular expressions. It provides instances for the classes defined and documented in Text.Regex.Base and re-exported by this module. If you import this along with other backends then you should do so with qualified imports (with renaming for convenience).
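As a minimal sketch of the qualified-import advice above, the module below imports this backend under the alias TDFA. The alias and the helper name hasDigits are illustrative assumptions, not part of the package.

```haskell
-- A qualified import keeps this backend's operators apart from any other
-- regex backend imported into the same module.
import qualified Text.Regex.TDFA as TDFA

-- Hypothetical helper: True when the input contains at least one ASCII digit.
hasDigits :: String -> Bool
hasDigits input = input TDFA.=~ "[0-9]+"
```

With the alias in place, a second backend could be imported under its own name without the two sets of operators clashing.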

This regex-tdfa package implements, correctly, POSIX extended regular expressions. It is highly unlikely that the regex-posix package on your operating system is correct; see http://www.haskell.org/haskellwiki/Regex_Posix for examples of your OS's bugs.

Importing and using

Add to your package.yaml/cabal file:

dependencies:
  - regex-tdfa

In modules where you need to use regexes:

import Text.Regex.TDFA

Note that regex-tdfa does not provide support for Text by default. If you need this functionality, add regex-tdfa-text as a dependency and import Text.Regex.TDFA.Text ().

Basics

λ> let emailRegex = "[a-zA-Z0-9+._-]+@[a-zA-Z-]+\\.[a-z]+"
λ> "my email is email@email.com" =~ emailRegex :: Bool
>>> True

-- non-monadic
λ> <to-match-against> =~ <regex>

-- monadic, uses fail on lack of match
λ> <to-match-against> =~~ <regex>

(=~) and (=~~) are polymorphic in their return type. This is so that regex-tdfa can pick the most efficient way to give you your result based on what you need. For instance, if all you want is to check whether the regex matched or not, there's no need to allocate a result string. If you only want the first match, rather than all the matches, then the matching engine can stop after finding a single hit.

This does mean, though, that you may sometimes have to explicitly specify the type you want, especially if you're trying things out at the REPL.
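To illustrate the return-type polymorphism described above, here is a small sketch in which the same match expression is given three different result types. The binding names (sample, anyWord, wordCount, firstWord) are made up for this example.

```haskell
import Text.Regex.TDFA

sample :: String
sample = "one two three"

-- Bool: did the pattern match anywhere?
anyWord :: Bool
anyWord = sample =~ "[a-z]+"

-- Int: how many non-overlapping matches are there?
wordCount :: Int
wordCount = sample =~ "[a-z]+"

-- String: the text of the first match ("" if there is none).
firstWord :: String
firstWord = sample =~ "[a-z]+"
```

In a source file the type signatures are enough to select the result type. At the REPL the same effect needs an inline annotation such as `:: Int`.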

Common use cases

Get the first match

-- returns empty string if no match
a =~ b :: String  -- or ByteString, or Text...

λ> "alexis-de-tocqueville" =~ "[a-z]+" :: String
>>> "alexis"

λ> "alexis-de-tocqueville" =~ "[0-9]+" :: String
>>> ""

Check if it matched at all

a =~ b :: Bool

λ> "alexis-de-tocqueville" =~ "[a-z]+" :: Bool
>>> True

Get first match + text before/after

-- if no match, will just return whole
-- string in the first element of the tuple
a =~ b :: (String, String, String)

λ> "alexis-de-tocqueville" =~ "de" :: (String, String, String)
>>> ("alexis-", "de", "-tocqueville")

λ> "alexis-de-tocqueville" =~ "kant" :: (String, String, String)
>>> ("alexis-de-tocqueville", "", "")

Get first match + submatches

-- same as above, but also returns a list of just submatches.
-- submatch list is empty if regex doesn't match at all
a =~ b :: (String, String, String, [String])

λ> "div[attr=1234]" =~ "div\\[([a-z]+)=([^]]+)\\]" :: (String, String, String, [String])
>>> ("", "div[attr=1234]", "", ["attr","1234"])

Get all matches

-- can also return Data.Array instead of List
getAllTextMatches (a =~ b) :: [String]

λ> getAllTextMatches ("john anne yifan" =~ "[a-z]+") :: [String]
>>> ["john","anne","yifan"]
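When every match is needed together with its submatches, one option is the [[String]] result type from Text.Regex.Base, where each inner list holds the whole match followed by its captured groups. The sketch below assumes a simple key=value input format. The helper name keyValuePairs is hypothetical.

```haskell
import Text.Regex.TDFA

-- Hypothetical helper: collect every key=value pair found in the input.
keyValuePairs :: String -> [(String, String)]
keyValuePairs input = [ (key, value) | [_whole, key, value] <- allMatches ]
  where
    -- Each inner list is the full match followed by the two captured groups.
    allMatches :: [[String]]
    allMatches = input =~ "([a-z]+)=([0-9]+)"
```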

Feature support

This package does provide captured parenthesized subexpressions.

Unicode support depends on the type of the text being searched. The [Char] and (Seq Char) text types support Unicode. The ByteString and ByteString.Lazy text types only support ASCII. It is possible to support UTF-8 encoded ByteString.Lazy by using the regex-tdfa and regex-tdfa-utf8 packages together (this requires the utf8-string package).

As of version 1.1.1 the following GNU extensions are recognized, all anchors:

\` at beginning of entire text
\' at end of entire text
\< at beginning of word
\> at end of word
\b at either beginning or end of word
\B at neither beginning nor end of word

The above are controlled by the newSyntax Bool in CompOption.

Here a "word" boundary lies between a character that is in the [:word:] character class, which contains [a-zA-Z0-9_], and a character that is not. Note that \< and \b may match before the entire text and \> and \b may match at the end of the entire text.
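As a brief sketch of the word anchors, the pattern below only matches "cat" when it stands as a whole word. It relies on the default options (newSyntax enabled), and the helper name wholeWordCat is an assumption made for this example.

```haskell
import Text.Regex.TDFA

-- Hypothetical helper: find "cat" only where it is a whole word.
-- The backslashes are doubled because this is an ordinary Haskell string literal.
wholeWordCat :: String -> (String, String, String)
wholeWordCat input = input =~ "\\<cat\\>"
```

For the input "concat the cat" the "cat" inside "concat" should be skipped, because it is preceded by a word character, while the final "cat" qualifies since \> may match at the end of the entire text.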

There is no locale support, so collating elements like [.ch.] are simply ignored and equivalence classes like [=a=] are converted to just [a]. Character classes like [:alnum:] are supported over ASCII only. The valid classes are alnum, digit, punct, alpha, graph, space, blank, lower, upper, cntrl, print, xdigit, and word.
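A character class is written inside a bracket expression, so [:digit:] appears as [[:digit:]] in a pattern. The following sketch shows this. The helper name firstNumber is hypothetical.

```haskell
import Text.Regex.TDFA

-- Hypothetical helper: the first run of ASCII digits, or "" if there is none.
firstNumber :: String -> String
firstNumber input = input =~ "[[:digit:]]+"
```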

This package does not provide "basic" regular expressions. This package does not provide back references inside regular expressions.

The package does not provide Perl style regular expressions. Please look at the regex-pcre and pcre-light packages instead.

This package does not provide find-and-replace.
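If a simple replacement is needed anyway, one possible workaround is to ask for the offsets and lengths of all matches and rebuild the string by hand. The function replaceAll below is a hypothetical helper written for this illustration, not something the package exports, and it only handles a fixed replacement string (no references to submatches).

```haskell
import Text.Regex.TDFA

-- Hypothetical helper: replace every non-overlapping match of the pattern
-- with a fixed replacement string.
replaceAll :: String -> String -> String -> String
replaceAll pattern replacement input = go 0 input matches
  where
    -- Offset and length of each match, in order of appearance.
    matches :: [(MatchOffset, MatchLength)]
    matches = getAllMatches (input =~ pattern)

    -- pos is the offset in the original input at which rest begins.
    go _ rest [] = rest
    go pos rest ((off, len) : more) =
      let (before, fromMatch) = splitAt (off - pos) rest
      in before ++ replacement ++ go (off + len) (drop len fromMatch) more
```

For example, replaceAll "[0-9]+" "N" "a1b22c" evaluates to "aNbNc".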

Avoiding backslashes

If you find yourself writing a lot of regexes, take a look at raw-strings-qq. It'll let you write regexes without needing to escape all your backslashes.

{-# LANGUAGE QuasiQuotes #-}

import Text.RawString.QQ
import Text.Regex.TDFA

λ> "2 * (3 + 1) / 4" =~ [r|\([^)]+\)|] :: String
>>> "(3 + 1)"

Synopsis

getVersion_Text_Regex_TDFA :: Version
(=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) => source1 -> source -> target
(=~~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target, Monad m) => source1 -> source -> m target
data Regex
data ExecOption = ExecOption {
    captureGroups :: Bool
}
data CompOption = CompOption {
    caseSensitive :: Bool
    multiline :: Bool
    rightAssoc :: Bool
    newSyntax :: Bool
    lastStarGreedy :: Bool
}
module Text.Regex.Base

Documentation

getVersion_Text_Regex_TDFA :: Version

(=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) => source1 -> source -> target

This is the pure functional matching operator. If the target cannot be produced then some empty result will be returned. If there is an error in processing, then error will be called.

(=~~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target, Monad m) => source1 -> source -> m target

This is the monadic matching operator. If a single match fails, then fail will be called.
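One common way to use the monadic operator is to pick Maybe as the monad, so that a failed match becomes Nothing instead of an empty result. This is a minimal sketch. The helper name firstNumberM is an assumption made for the example.

```haskell
import Text.Regex.TDFA

-- Hypothetical helper: Just the first run of digits, or Nothing when the
-- pattern does not match (fail in the Maybe monad yields Nothing).
firstNumberM :: String -> Maybe String
firstNumberM input = input =~~ "[0-9]+"
```

This makes "no match" distinguishable from "matched the empty string", which the pure operator cannot express with a String result.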

data Regex

The TDFA backend specific Regex type, used by this module's RegexOptions and RegexMaker
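A Regex value can also be built explicitly with makeRegex and then reused through match, both of which come from Text.Regex.Base. The sketch below builds one pattern as a named value and applies it to several inputs. The names digits and firstNumbers are illustrative.

```haskell
import Text.Regex.TDFA

-- The pattern is built once as a named Regex value.
digits :: Regex
digits = makeRegex "[0-9]+"

-- Hypothetical helper: first run of digits in each input ("" when absent).
firstNumbers :: [String] -> [String]
firstNumbers = map (match digits)
```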

Instances

RegexLike Regex String
  Defined in Text.Regex.TDFA.String
  Methods
    matchOnce :: Regex -> String -> Maybe MatchArray
    matchAll :: Regex -> String -> [MatchArray]
    matchCount :: Regex -> String -> Int
    matchTest :: Regex -> String -> Bool
    matchAllText :: Regex -> String -> [MatchText String]
    matchOnceText :: Regex -> String -> Maybe (String, MatchText String, String)
RegexLike Regex ByteString
  Defined in Text.Regex.TDFA.ByteString.Lazy
  Methods
    matchOnce :: Regex -> ByteString -> Maybe MatchArray
    matchAll :: Regex -> ByteString -> [MatchArray]
    matchCount :: Regex -> ByteString -> Int
    matchTest :: Regex -> ByteString -> Bool
    matchAllText :: Regex -> ByteString -> [MatchText ByteString]
    matchOnceText :: Regex -> ByteString -> Maybe (ByteString, MatchText ByteString, ByteString)
RegexLike Regex ByteString
  Defined in Text.Regex.TDFA.ByteString
  Methods
    matchOnce :: Regex -> ByteString -> Maybe MatchArray
    matchAll :: Regex -> ByteString -> [MatchArray]
    matchCount :: Regex -> ByteString -> Int
    matchTest :: Regex -> ByteString -> Bool
    matchAllText :: Regex -> ByteString -> [MatchText ByteString]
    matchOnceText :: Regex -> ByteString -> Maybe (ByteString, MatchText ByteString, ByteString)
RegexOptions Regex CompOption ExecOption
  Defined in Text.Regex.TDFA.Common
  Methods
    blankCompOpt :: CompOption
    blankExecOpt :: ExecOption
    defaultCompOpt :: CompOption
    defaultExecOpt :: ExecOption
    setExecOpts :: ExecOption -> Regex -> Regex
    getExecOpts :: Regex -> ExecOption
RegexContext Regex String String
  Defined in Text.Regex.TDFA.String
  Methods
    match :: Regex -> String -> String
    matchM :: Monad m => Regex -> String -> m String
RegexContext Regex ByteString ByteString
  Defined in Text.Regex.TDFA.ByteString.Lazy
  Methods
    match :: Regex -> ByteString -> ByteString
    matchM :: Monad m => Regex -> ByteString -> m ByteString
RegexContext Regex ByteString ByteString
  Defined in Text.Regex.TDFA.ByteString
  Methods
    match :: Regex -> ByteString -> ByteString
    matchM :: Monad m => Regex -> ByteString -> m ByteString
RegexMaker Regex CompOption ExecOption String
  Defined in Text.Regex.TDFA.String
  Methods
    makeRegex :: String -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> String -> Regex
    makeRegexM :: Monad m => String -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> String -> m Regex
RegexMaker Regex CompOption ExecOption ByteString
  Defined in Text.Regex.TDFA.ByteString.Lazy
  Methods
    makeRegex :: ByteString -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> ByteString -> Regex
    makeRegexM :: Monad m => ByteString -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> ByteString -> m Regex
RegexMaker Regex CompOption ExecOption ByteString
  Defined in Text.Regex.TDFA.ByteString
  Methods
    makeRegex :: ByteString -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> ByteString -> Regex
    makeRegexM :: Monad m => ByteString -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> ByteString -> m Regex
RegexMaker Regex CompOption ExecOption (Seq Char)
  Defined in Text.Regex.TDFA.Sequence
  Methods
    makeRegex :: Seq Char -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> Seq Char -> Regex
    makeRegexM :: Monad m => Seq Char -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> Seq Char -> m Regex
RegexLike Regex (Seq Char)
  Defined in Text.Regex.TDFA.Sequence
  Methods
    matchOnce :: Regex -> Seq Char -> Maybe MatchArray
    matchAll :: Regex -> Seq Char -> [MatchArray]
    matchCount :: Regex -> Seq Char -> Int
    matchTest :: Regex -> Seq Char -> Bool
    matchAllText :: Regex -> Seq Char -> [MatchText (Seq Char)]
    matchOnceText :: Regex -> Seq Char -> Maybe (Seq Char, MatchText (Seq Char), Seq Char)
RegexContext Regex (Seq Char) (Seq Char)
  Defined in Text.Regex.TDFA.Sequence
  Methods
    match :: Regex -> Seq Char -> Seq Char
    matchM :: Monad m => Regex -> Seq Char -> m (Seq Char)

data ExecOption

Constructors

ExecOption

Fields

captureGroups :: Bool
True by default. Set to False to improve speed (and space).
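As a sketch of how captureGroups might be switched off, the code below builds a Regex with makeRegexOpts and a modified defaultExecOpt, then uses it only for a yes/no test where submatches are not needed. The names testOnly and isHexColor are hypothetical, and any speed benefit is the one claimed in the field description above, not something measured here.

```haskell
import Text.Regex.TDFA

-- Hypothetical helper: compile a pattern with submatch capture disabled.
testOnly :: String -> Regex
testOnly = makeRegexOpts defaultCompOpt (defaultExecOpt { captureGroups = False })

-- Hypothetical helper: does the input look like a six-digit hex color?
isHexColor :: String -> Bool
isHexColor = matchTest (testOnly "^#[0-9a-fA-F]{6}$")
```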

Instances

Read ExecOption
  Defined in Text.Regex.TDFA.Common
  Methods
    readsPrec :: Int -> ReadS ExecOption
    readList :: ReadS [ExecOption]
    readPrec :: ReadPrec ExecOption
    readListPrec :: ReadPrec [ExecOption]
Show ExecOption
  Defined in Text.Regex.TDFA.Common
  Methods
    showsPrec :: Int -> ExecOption -> ShowS
    show :: ExecOption -> String
    showList :: [ExecOption] -> ShowS
RegexOptions Regex CompOption ExecOption
  Defined in Text.Regex.TDFA.Common
  Methods
    blankCompOpt :: CompOption
    blankExecOpt :: ExecOption
    defaultCompOpt :: CompOption
    defaultExecOpt :: ExecOption
    setExecOpts :: ExecOption -> Regex -> Regex
    getExecOpts :: Regex -> ExecOption
RegexMaker Regex CompOption ExecOption String
  Defined in Text.Regex.TDFA.String
  Methods
    makeRegex :: String -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> String -> Regex
    makeRegexM :: Monad m => String -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> String -> m Regex
RegexMaker Regex CompOption ExecOption ByteString
  Defined in Text.Regex.TDFA.ByteString.Lazy
  Methods
    makeRegex :: ByteString -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> ByteString -> Regex
    makeRegexM :: Monad m => ByteString -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> ByteString -> m Regex
RegexMaker Regex CompOption ExecOption ByteString
  Defined in Text.Regex.TDFA.ByteString
  Methods
    makeRegex :: ByteString -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> ByteString -> Regex
    makeRegexM :: Monad m => ByteString -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> ByteString -> m Regex
RegexMaker Regex CompOption ExecOption (Seq Char)
  Defined in Text.Regex.TDFA.Sequence
  Methods
    makeRegex :: Seq Char -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> Seq Char -> Regex
    makeRegexM :: Monad m => Seq Char -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> Seq Char -> m Regex

data CompOption

Controls whether the pattern is multiline or case-sensitive, like Text.Regex, and whether to capture the subgroups (\1, \2, etc.). Also controls whether the extra anchor syntax is enabled.

Constructors

CompOption

Fields

caseSensitive :: Bool
True in blankCompOpt and defaultCompOpt
multiline :: Bool
False in blankCompOpt, True in defaultCompOpt. Compile for newline-sensitive matching. "By default, newline is a completely ordinary character with no special meaning in either REs or strings. With this flag, inverted bracket expressions and . never match newline, a ^ anchor matches the null string after any newline in the string in addition to its normal function, and the $ anchor matches the null string before any newline in the string in addition to its normal function."
rightAssoc :: Bool
True (and therefore Right associative) in blankCompOpt and defaultCompOpt
newSyntax :: Bool
False in blankCompOpt, True in defaultCompOpt. Add the extended non-POSIX syntax described in Text.Regex.TDFA haddock documentation.
lastStarGreedy :: Bool
False by default. This is POSIX correct, but it uses more space and is slower. Setting this to True will improve performance, and should be done if you plan to set the captureGroups ExecOption to False.
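The fields above are typically changed by record update on defaultCompOpt and passed to makeRegexOpts. The sketch below shows case-insensitive matching and the effect of turning multiline off. The names caseInsensitive, singleLine and defaultRegex are assumptions made for the example.

```haskell
import Text.Regex.TDFA

-- Hypothetical helper: compile a pattern that ignores letter case.
caseInsensitive :: String -> Regex
caseInsensitive = makeRegexOpts (defaultCompOpt { caseSensitive = False }) defaultExecOpt

-- Hypothetical helper: compile with multiline off, so ^ and $ only anchor
-- at the start and end of the entire text.
singleLine :: String -> Regex
singleLine = makeRegexOpts (defaultCompOpt { multiline = False }) defaultExecOpt

-- For comparison: the default options, where multiline is True.
defaultRegex :: String -> Regex
defaultRegex = makeRegex
```

With these definitions, matchTest (defaultRegex "^b$") "a\nb" is True because ^ may match after the newline, while matchTest (singleLine "^b$") "a\nb" is False.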

Instances

Read CompOption
  Defined in Text.Regex.TDFA.Common
  Methods
    readsPrec :: Int -> ReadS CompOption
    readList :: ReadS [CompOption]
    readPrec :: ReadPrec CompOption
    readListPrec :: ReadPrec [CompOption]
Show CompOption
  Defined in Text.Regex.TDFA.Common
  Methods
    showsPrec :: Int -> CompOption -> ShowS
    show :: CompOption -> String
    showList :: [CompOption] -> ShowS
RegexOptions Regex CompOption ExecOption
  Defined in Text.Regex.TDFA.Common
  Methods
    blankCompOpt :: CompOption
    blankExecOpt :: ExecOption
    defaultCompOpt :: CompOption
    defaultExecOpt :: ExecOption
    setExecOpts :: ExecOption -> Regex -> Regex
    getExecOpts :: Regex -> ExecOption
RegexMaker Regex CompOption ExecOption String
  Defined in Text.Regex.TDFA.String
  Methods
    makeRegex :: String -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> String -> Regex
    makeRegexM :: Monad m => String -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> String -> m Regex
RegexMaker Regex CompOption ExecOption ByteString
  Defined in Text.Regex.TDFA.ByteString.Lazy
  Methods
    makeRegex :: ByteString -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> ByteString -> Regex
    makeRegexM :: Monad m => ByteString -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> ByteString -> m Regex
RegexMaker Regex CompOption ExecOption ByteString
  Defined in Text.Regex.TDFA.ByteString
  Methods
    makeRegex :: ByteString -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> ByteString -> Regex
    makeRegexM :: Monad m => ByteString -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> ByteString -> m Regex
RegexMaker Regex CompOption ExecOption (Seq Char)
  Defined in Text.Regex.TDFA.Sequence
  Methods
    makeRegex :: Seq Char -> Regex
    makeRegexOpts :: CompOption -> ExecOption -> Seq Char -> Regex
    makeRegexM :: Monad m => Seq Char -> m Regex
    makeRegexOptsM :: Monad m => CompOption -> ExecOption -> Seq Char -> m Regex

module Text.Regex.Base