Portability | portable |
---|---|
Stability | experimental |
Maintainer | Uwe Schmidt (uwe@fh-wedel.de) |
Convenience functions for the W3C XML Schema regular expression matcher.
For internals see Text.XML.HXT.RelaxNG.XmlSchema.Regex.
The grammar can be found at http://www.w3.org/TR/xmlschema11-2/#regexs
- matchRE :: String -> String -> Maybe Bool
- splitRE :: String -> String -> Maybe (String, String)
- sedRE :: (String -> String) -> String -> String -> Maybe String
- tokenizeRE :: String -> String -> Maybe [String]
- tokenizeRE' :: String -> String -> Maybe [Either String String]
- match :: String -> String -> Bool
- tokenize :: String -> String -> [String]
- tokenize' :: String -> String -> [Either String String]
- sed :: (String -> String) -> String -> String -> String
- split :: String -> String -> (String, String)
Documentation
matchRE :: String -> String -> Maybe Bool
Match a string against a regular expression.
The first argument is the regex, the second is the input string. If the regex is not well formed, Nothing is returned; otherwise the result is Just the match result. As the second example shows, the regex must match the whole input, not just a part of it.
Examples:
    matchRE "x*" "xxx" = Just True
    matchRE "x" "xxx" = Just False
    matchRE "[" "xxx" = Nothing
splitRE :: String -> String -> Maybe (String, String)
Split a string by taking the longest prefix that matches a regular expression.
Nothing is returned if the regex string is syntactically wrong or if there is no matching prefix; otherwise the pair of prefix and rest is returned.
Examples:
    splitRE "a*b" "abc" = Just ("ab","c")
    splitRE "a*" "bc" = Just ("", "bc")
    splitRE "a+" "bc" = Nothing
    splitRE "[" "abc" = Nothing
sedRE :: (String -> String) -> String -> String -> Maybe String
sed-like editing function.
All matching tokens are edited by the first argument, the editing function; all other characters remain as they are.
Examples:
    sedRE (const "b") "a" "xaxax" = Just "xbxbx"
    sedRE (\ x -> x ++ x) "a" "xax" = Just "xaax"
    sedRE undefined "[" undefined = Nothing
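One way to read this description is as a map over a list of marked segments, where matching segments are tagged Right and everything else is tagged Left (the marking that tokenizeRE' uses). The sketch below only illustrates that reading and is not taken from the library's source; `editSegments` is a hypothetical helper, and the segment list is written by hand rather than produced by the matcher.

```haskell
-- Hypothetical helper: apply the edit function to matching segments (Right)
-- and copy non-matching segments (Left) through unchanged.
editSegments :: (String -> String) -> [Either String String] -> String
editSegments edit = concatMap (either id edit)

-- Hand-written segmentation of "xaxax" for the regex "a" (not library output).
segments :: [Either String String]
segments = [Left "x", Right "a", Left "x", Right "a", Left "x"]

-- editSegments (const "b") segments       evaluates to "xbxbx"
-- editSegments (\x -> x ++ x) segments    evaluates to "xaaxaax"
```

The first result agrees with the first sedRE example above, given that segmentation.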
tokenizeRE :: String -> String -> Maybe [String]
Split a string into tokens (words) by giving a regular expression that all tokens must match.
This can be used for simple tokenizers.
Every word in the result list contains at least one character.
All non-matching characters are discarded. If the given regex contains syntax errors, Nothing is returned.
Examples:
    tokenizeRE "a*b" "" = Just []
    tokenizeRE "a*b" "abc" = Just ["ab"]
    tokenizeRE "a*b" "abaab ab" = Just ["ab","aab","ab"]
    tokenizeRE "[a-z]{2,}|[0-9]{2,}|[0-9]+[.][0-9]+" "ab123 456.7abc" = Just ["ab","123","456.7","abc"]
    tokenizeRE "[a-z]*|[0-9]{2,}|[0-9]+[.][0-9]+" "cab123 456.7abc" = Just ["cab","123","456.7","abc"]
    tokenizeRE "[^ \t\n\r]*" "abc def\t\n\rxyz" = Just ["abc","def","xyz"]
    tokenizeRE "[^ \t\n\r]*" = Just . words
tokenizeRE' :: String -> String -> Maybe [Either String String]
Split a string into tokens and delimiters by giving a regular expression that all tokens must match.
This is a generalisation of the tokenizeRE function above.
The non-matching character sequences are marked with Left, the matching ones with Right.
If the regular expression contains syntax errors, Nothing is returned.
The following law holds for every well-formed regular expression re:
    concat . map (either id id) . fromJust . tokenizeRE' re == id
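The law says that no input character is lost: gluing the Left and Right pieces back together in order reproduces the input. A minimal sketch of that round trip, using only base, follows. The names `rejoin` and `sample` are hypothetical, and `sample` is a hand-written list in the shape of a tokenizeRE' result, not actual library output.

```haskell
import Data.Either (rights)

-- Left-hand side of the law without the tokenizer call:
-- strip the Left/Right markers and concatenate the pieces.
rejoin :: [Either String String] -> String
rejoin = concatMap (either id id)

-- Hand-written sample: tokens (Right) separated by delimiters (Left).
sample :: [Either String String]
sample = [Right "ab", Left " ", Right "aab", Left "  ", Right "ab"]

-- rejoin sample   evaluates to "ab aab  ab"
-- rights sample   evaluates to ["ab","aab","ab"]
```

Keeping only the Right entries, as `rights` does, leaves just the matching tokens and discards the delimiters.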
match :: String -> String -> Bool
Convenience function for matchRE.
Syntax errors in the regular expression are treated as no match.
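The behaviour described here amounts to replacing Nothing with False. The sketch below shows that defaulting step under stated assumptions: `matchWith` is a hypothetical wrapper that takes the Maybe-returning matcher as a parameter, and `stubMatchRE` is a stand-in so the block compiles without the library. Neither is part of this module.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical wrapper: turn a Maybe-returning matcher into a Bool-returning
-- one, mapping Nothing (malformed regex) to False.
matchWith :: (String -> String -> Maybe Bool) -> String -> String -> Bool
matchWith matcher re = fromMaybe False . matcher re

-- Stand-in matcher for demonstration only: it rejects "[" as malformed and
-- otherwise compares regex and input literally. It is NOT a regex engine.
stubMatchRE :: String -> String -> Maybe Bool
stubMatchRE "[" _     = Nothing
stubMatchRE re  input = Just (re == input)

-- matchWith stubMatchRE "[" "xxx"   evaluates to False
-- matchWith stubMatchRE "x" "x"     evaluates to True
```

With the library in scope, passing matchRE instead of the stub should give the behaviour documented for match, though the actual definition may differ.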
tokenize :: String -> String -> [String]
Convenience function for tokenizeRE.
Syntax errors in the regular expression result in an empty list.
tokenize' :: String -> String -> [Either String String]
Convenience function for tokenizeRE'.
When the regular expression contains errors, [Left input] is returned, which means no tokens are found.
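Unlike a fixed default such as an empty list, the fallback here depends on the input: the whole input becomes a single non-matching segment. A minimal sketch of that fallback, assuming nothing beyond base, follows. `tokenizeWith'` and `stubTokenizeRE'` are hypothetical names used for illustration; the stub only mimics the Nothing-on-malformed-regex behaviour and does no real matching.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical wrapper: on Nothing (malformed regex) return the whole input
-- as one Left segment, so concatenating the pieces still yields the input.
tokenizeWith'
  :: (String -> String -> Maybe [Either String String])
  -> String -> String -> [Either String String]
tokenizeWith' tokenizer re input = fromMaybe [Left input] (tokenizer re input)

-- Stand-in tokenizer for demonstration only: it rejects "[" as malformed and
-- otherwise treats a non-empty input as a single token. NOT a regex engine.
stubTokenizeRE' :: String -> String -> Maybe [Either String String]
stubTokenizeRE' "[" _     = Nothing
stubTokenizeRE' _   input = Just [Right input | not (null input)]

-- tokenizeWith' stubTokenizeRE' "[" "abc"   evaluates to [Left "abc"]
-- tokenizeWith' stubTokenizeRE' "x" "abc"   evaluates to [Right "abc"]
```

A consequence of this choice is that the concatenation law stated for tokenizeRE' still holds in the error case, because the single Left segment is the input itself.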