marvin-0.1.4: A modular chat bot

Copyright: (c) Justus Adam 2016
License: BSD3
Maintainer: dev@justus.science
Stability: experimental
Portability: POSIX
Safe Haskell: None
Language: Haskell2010

Marvin.Util.Regex

Documentation

data Regex

Abstract wrapper for a regular expression implementation. Has an IsString instance, so string literals can be used to create a Regex. Alternatively, use r to create one with custom options.

type Match = [Text]

A match to a Regex. Index 0 is the full match, all other indices are match groups.

r :: [MatchOption] -> Text -> Regex

Compile a regex with options.

Normally it is sufficient to just write the regex as a plain string and have it be converted automatically, but if you want certain match options you can use this function.
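The two ways of building a Regex described above can be sketched as follows. This is a minimal illustration, assuming the OverloadedStrings extension is enabled so that string literals can stand in for both Regex and Text; the names greeting and greetingAnyCase and the pattern itself are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Marvin.Util.Regex (MatchOption (CaseInsensitive), Regex, match, r)

-- Built from a plain string literal through the IsString instance,
-- so no match options are attached.
greeting :: Regex
greeting = "hello|hi"

-- The same (hypothetical) pattern compiled with r so that a
-- match option can be supplied.
greetingAnyCase :: Regex
greetingAnyCase = r [CaseInsensitive] "hello|hi"
```

Loaded into GHCi, match greeting "HELLO there" should find nothing, while match greetingAnyCase "HELLO there" should find the leading word.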

match :: Regex -> Text -> Maybe Match

Match a regex against a string and return the first match found (if any).
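Because a Match is a list with the full match at index 0 and the groups after it, a capture group can be pulled out by pattern matching on the list. The sketch below assumes OverloadedStrings; the reminder pattern and the reminderTask helper are hypothetical names used only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Marvin.Util.Regex (Regex, match)

-- Hypothetical pattern with one capture group for the task text.
reminder :: Regex
reminder = "remind me to (.+)"

-- Return the text captured by the first group, if the message matches.
-- The first list element is the full match and is ignored here.
reminderTask :: Text -> Maybe Text
reminderTask msg =
  case match reminder msg of
    Just (_full : task : _) -> Just task
    _                       -> Nothing
```

Matching on the list shape rather than indexing with (!!) keeps the helper total: a Regex without groups simply yields Nothing.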

data MatchOption :: *

Options for controlling matching behaviour.

Constructors

CaseInsensitive

Enable case insensitive matching.

Comments

Allow comments and white space within patterns.

DotAll

If set, '.' matches line terminators. Otherwise '.' matching stops at line end.

Literal

If set, treat the entire pattern as a literal string. Metacharacters or escape sequences in the input sequence will be given no special meaning.

The option CaseInsensitive retains its meaning when used in conjunction with this option. Other options become superfluous.
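As a rough illustration of Literal, the sketch below compiles the same text once as a literal and once as an ordinary pattern. It assumes OverloadedStrings; the names plusLiteral and plusPattern are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Marvin.Util.Regex (MatchOption (CaseInsensitive, Literal), Regex, match, r)

-- With Literal, '+' is an ordinary character, so this looks for the
-- three characters a, +, b (in either case, because of CaseInsensitive).
plusLiteral :: Regex
plusLiteral = r [Literal, CaseInsensitive] "a+b"

-- Without Literal, '+' is a quantifier: one or more 'a' followed by 'b'.
plusPattern :: Regex
plusPattern = "a+b"
```

Against the input "a+b", plusLiteral is expected to match the whole string, whereas plusPattern should not match at all, since the input contains no 'a' directly followed by 'b'.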

Multiline

Control the behaviour of '$' and '^'. If set, recognize line terminators within the string; otherwise, match only at the start and end of the input string.
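The effect of Multiline can be seen by matching an anchored pattern against input that spans several lines. This is a small sketch assuming OverloadedStrings; the log text and the names errorLine, errorLineWholeInput and sampleLog are made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Marvin.Util.Regex (MatchOption (Multiline), Regex, match, r)

-- With Multiline, '^' and '$' also match at line terminators inside the input.
errorLine :: Regex
errorLine = r [Multiline] "^ERROR: (.*)$"

-- Without it, '^' and '$' only match at the start and end of the whole input.
errorLineWholeInput :: Regex
errorLineWholeInput = "^ERROR: (.*)$"

-- Hypothetical three-line input with the interesting line in the middle.
sampleLog :: Text
sampleLog = "INFO: started\nERROR: disk full\nINFO: stopped"
```

Here match errorLine sampleLog should find the middle line and capture "disk full", while match errorLineWholeInput sampleLog should return Nothing because the input does not start with "ERROR: ".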

HaskellLines

Haskell-only line endings. When this mode is enabled, only '\n' is recognized as a line ending in the behavior of '.', '^', and '$'.

UnicodeWord

Unicode word boundaries. If set, '\\b' uses the Unicode TR 29 definition of word boundaries.

Warning: Unicode word boundaries are quite different from traditional regular expression word boundaries. See http://unicode.org/reports/tr29/#Word_Boundaries.

ErrorOnUnknownEscapes

Throw an error on unrecognized backslash escapes. If set, fail with an error on patterns that contain backslash-escaped ASCII letters without a known special meaning. If this flag is not set, these escaped letters represent themselves.

WorkLimit Int

Set a processing limit for match operations.

Some patterns, when matching certain strings, can run in exponential time. For practical purposes, the match operation may appear to be in an infinite loop. When a limit is set a match operation will fail with an error if the limit is exceeded.

The units of the limit are steps of the match engine. Correspondence with actual processor time will depend on the speed of the processor and the details of the specific pattern, but will typically be on the order of milliseconds.

By default, the matching time is not limited.
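Since match is a pure function, an exceeded limit cannot be reported through its Maybe result. The sketch below assumes that the failure surfaces as an exception when the result is evaluated, and catches it in IO. The safeMatch helper, the nested pattern and the limit of 10000 steps are all hypothetical choices, not part of this module.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (SomeException, evaluate, try)
import Data.Text (Text)
import qualified Data.Text as T
import Marvin.Util.Regex (Match, MatchOption (WorkLimit), Regex, match, r)

-- Hypothetical pattern with nested quantifiers, a typical source of
-- heavy backtracking, compiled with an arbitrary step limit.
nested :: Regex
nested = r [WorkLimit 10000] "(a+)+b"

-- Run a match and turn any exception raised while evaluating it
-- (assumed to include an exceeded work limit) into a Left value.
safeMatch :: Regex -> Text -> IO (Either SomeException (Maybe Match))
safeMatch re input = try (evaluate (forceGroups (match re input)))
  where
    -- Force every group so that no unevaluated work escapes the try.
    forceGroups Nothing       = Nothing
    forceGroups (Just groups) = sum (map T.length groups) `seq` Just groups
```

Catching SomeException keeps the sketch independent of the underlying error type; real code would probably want to catch a narrower exception.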

StackLimit Int

Set the amount of heap storage available for use by the match backtracking stack.

ICU uses a backtracking regular expression engine, with the backtrack stack maintained on the heap. This option sets the limit on the amount of memory that can be used for this purpose. A backtracking stack overflow will result in an error from the match operation that caused it.

A limit is desirable because a malicious or poorly designed pattern can use excessive memory, potentially crashing the process. A limit is enabled by default.