pcre2
Regular expressions for Haskell.
Teasers
licensePlate :: Text -> Maybe Text
licensePlate = match "[A-Z]{3}[0-9]{3,4}"
licensePlates :: Text -> [Text]
licensePlates = match "[A-Z]{3}[0-9]{3,4}"
case "The quick brown fox" of
    [regex|\bbrown\s+(?<animal>[A-Za-z]+)\b|] -> Text.putStrLn animal
    _ -> die "nothing brown"
let kv'd = lined . packed . [_regex|(?x)  # Extended PCRE2 syntax
        ^\s*          # Ignore leading whitespace
        ([^=:\s].*?)  # Capture the non-empty key
        \s*           # Ignore trailing whitespace
        [=:]          # Separator
        \s*           # Ignore leading whitespace
        (.*?)         # Capture the possibly-empty value
        \s*$          # Ignore trailing whitespace
    |]

forMOf kv'd file $ execStateT $ do
    k <- gets $ capture @1
    v <- gets $ capture @2
    liftIO $ Text.putStrLn $ "found " <> k <> " set to " <> v

    case myMap ^. at k of
        Just v' | v /= v' -> do
            liftIO $ Text.putStrLn $ "setting " <> k <> " to " <> v'
            _capture @2 .= v'
        _ -> liftIO $ Text.putStrLn $ "no change for " <> k
Features
- No opaque "Regex" object. Instead, quiet functions with simple types; for the
most part it's Text (pattern) -> Text (subject) -> result.
- No custom typeclasses.
- A single datatype for both compile and match options, the Option monoid.
- UTF-8 Text everywhere.
- Match success expressed via Alternative.
- Opt-in Template Haskell facilities for compile-time verification of patterns,
indexing captures, and memoizing inline regexes.
- Opt-in lens support.
- No failure monads to express compile errors, preferring pure functions and
throwing imprecise exceptions with pretty Show instances. Write simple code
and debug it. Or, don't, and use the Template Haskell features instead. Both
are first-class.
- Broad exposure of PCRE2 functionality. We can even register Haskell
callbacks to run during matching!
- Zero-copying of substrings where beneficial.
- Few dependencies.
- Bundled, statically-linked build of up-to-date PCRE2 (version 10.44), with a
complete, exposed Haskell binding.
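
To illustrate the "match success expressed via Alternative" idea, here is a minimal sketch that uses only base. It is not pcre2's implementation: digitRuns is a hypothetical stand-in for a regex engine, and toAlternative, matchDigits, firstRun, and allRuns are names invented for this example. It shows how a single function polymorphic in Alternative can serve both the Maybe and the list result types seen in the licensePlate teasers.

```haskell
import Control.Applicative (Alternative, empty, (<|>))
import Data.Char (isDigit)

-- Hypothetical stand-in for a regex engine (not part of pcre2):
-- every maximal run of digits in the input, left to right.
digitRuns :: String -> [String]
digitRuns s = case dropWhile (not . isDigit) s of
    "" -> []
    s' -> let (run, rest) = span isDigit s' in run : digitRuns rest

-- Offer every result through any Alternative: `empty` when there are none,
-- otherwise the results combined left to right with (<|>).
toAlternative :: Alternative f => [a] -> f a
toAlternative = foldr ((<|>) . pure) empty

-- One definition; the caller's type annotation chooses the result shape.
matchDigits :: Alternative f => String -> f String
matchDigits = toAlternative . digitRuns

-- At Maybe, (<|>) keeps the first success.
firstRun :: String -> Maybe String
firstRun = matchDigits

-- At lists, (<|>) is concatenation, so every result is kept.
allRuns :: String -> [String]
allRuns = matchDigits
```

With these definitions, firstRun "a12b345" evaluates to Just "12", allRuns "a12b345" to ["12","345"], and firstRun "none" to Nothing. The appeal of this design is that failure needs no dedicated error type: the caller picks Maybe, a list, or any other Alternative by annotation alone.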
Currently we are slower than other libraries. For example:
| Operation | pcre2 | pcre-light | regex-pcre-builtin |
| --- | --- | --- | --- |
| Compile and match a regex | 3.9 μs | 1.2 μs | 2.9 μs |
If regex processing really is the bottleneck, we recommend pcre-light,
pcre-heavy, or lens-regex-pcre instead of this library for the very best
performance.
Wishlist
- Many performance optimizations.
- Make use of DFA matching for lazy (infinite) inputs. This likely requires
some upstream changes as well, but in theory it's possible.
- Improve compile time. Maybe support an external libpcre2.
License
Apache 2.0.
PCRE2 is distributed under the 3-clause BSD license.
Main Author
©2020–2025 Steven Shuck