lens-regex-pcre
Hackage and Docs
Based on pcre-heavy
; so it should support any regexes or options which it supports.
Performance is equal, sometimes better than that of pcre-heavy
alone.
Which module should you use?
If you need unicode support, use Control.Lens.Regex.Text
, if not then Control.Lens.Regex.ByteString
is faster.
Working with Regexes in Haskell kinda sucks; it's tough to figure out which libs
to use, and even after you pick one it's tough to figure out how to use it; lens-regex-pcre
hopes to replace most other solutions by being fast, easy to set up, more adaptable with a more consistent interface.
It helps that there are already HUNDREDS of combinators which interop with lenses 😄.
As it turns out; regexes are a very lens-like tool; Traversals allow you to select
and alter zero or more matches; traversals can even carry indexes so you know which match or group you're working
on.
Examples
txt :: Text
txt = "raindrops on roses and whiskers on kittens"
-- Search
>>> has [regex|whisk|] . match txt
True
-- Get matches
>>> txt ^.. [regex|\br\w+|] . match
["raindrops","roses"]
-- Edit matches
>>> txt & [regex|\br\w+|] . match %~ T.intersperse '-' . T.toUpper
"R-A-I-N-D-R-O-P-S on R-O-S-E-S and whiskers on kittens"
-- Get Groups
>>> txt ^.. [regex|(\w+) on (\w+)|] . groups
[["raindrops","roses"],["whiskers","kittens"]]
-- Edit Groups
>>> txt & [regex|(\w+) on (\w+)|] . groups %~ reverse
"roses on raindrops and kittens on whiskers"
-- Get the third match
>>> txt ^? [regex|\w+|] . index 2 . match
Just "roses"
-- Match integers, 'Read' them into ints, then sort them in-place
-- dumping them back into the source text afterwards.
>>> "Monday: 29, Tuesday: 99, Wednesday: 3"
& partsOf ([regex|\d+|] . match . unpacked . _Show @Int) %~ sort
"Monday: 3, Tuesday: 29, Wednesday: 99"
Basically anything you want to do is possible somehow.
See the benchmarks.
Summary
Caveat: I'm by no means a benchmarking expert; if you have tips on how to do this better I'm all ears!
- Search
lens-regex-pcre
is marginally slower than pcre-heavy
, but well within acceptable margins (within 0.6%)
- Replace
lens-regex-pcre
beats pcre-heavy
by ~10%
- Modify
pcre-heavy
doesn't support this operation at all, so I guess lens-regex-pcre
wins here :)
How can it possibly be faster if it's based on pcre-heavy
? lens-regex-pcre
only uses pcre-heavy
for finding the matches, not substitution/replacement. After that it splits the text into chunks and traverses over them with whichever operation you've chosen. The nature of this implementation makes it a lot easier to understand than imperative implementations of the same thing. This means it's pretty easy to make edits, and is also the reason we can support arbitrary traversals/actions. It was easy enough, so I went ahead and made the whole thing use ByteString Builders, which sped it up a lot. I suspect that pcre-heavy
can benefit from the same optimization if anyone feels like back-porting it; it could be (almost) as nicely using simple traverse
without any lenses. The whole thing is only about 25 LOC.
I'm neither a benchmarks nor stats person, so please open an issue if anything here seems fishy.
Without pcre-light
and pcre-heavy
this library wouldn't be possible, so huge thanks to all contributors!
Here are the benchmarks on my 2013 Macbook (2.6 Ghz i5)
benchmarking static pattern search/pcre-heavy ... took 20.78 s, total 56 iterations
benchmarked static pattern search/pcre-heavy
time 375.3 ms (372.0 ms .. 378.5 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 378.1 ms (376.4 ms .. 380.8 ms)
std dev 3.747 ms (922.3 μs .. 5.609 ms)
benchmarking static pattern search/lens-regex-pcre ... took 20.79 s, total 56 iterations
benchmarked static pattern search/lens-regex-pcre
time 379.5 ms (376.2 ms .. 382.4 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 377.3 ms (376.5 ms .. 378.4 ms)
std dev 1.667 ms (1.075 ms .. 2.461 ms)
benchmarking complex pattern search/pcre-heavy ... took 95.95 s, total 56 iterations
benchmarked complex pattern search/pcre-heavy
time 1.741 s (1.737 s .. 1.746 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.746 s (1.744 s .. 1.749 s)
std dev 4.499 ms (3.186 ms .. 6.080 ms)
benchmarking complex pattern search/lens-regex-pcre ... took 97.26 s, total 56 iterations
benchmarked complex pattern search/lens-regex-pcre
time 1.809 s (1.736 s .. 1.908 s)
0.996 R² (0.991 R² .. 1.000 R²)
mean 1.757 s (1.742 s .. 1.810 s)
std dev 42.83 ms (11.51 ms .. 70.69 ms)
benchmarking simple replacement/pcre-heavy ... took 23.32 s, total 56 iterations
benchmarked simple replacement/pcre-heavy
time 423.8 ms (422.4 ms .. 425.3 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 424.0 ms (422.9 ms .. 426.2 ms)
std dev 2.684 ms (1.239 ms .. 4.270 ms)
benchmarking simple replacement/lens-regex-pcre ... took 20.84 s, total 56 iterations
benchmarked simple replacement/lens-regex-pcre
time 382.8 ms (374.3 ms .. 391.5 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 378.2 ms (376.3 ms .. 381.0 ms)
std dev 3.794 ms (2.577 ms .. 5.418 ms)
benchmarking complex replacement/pcre-heavy ... took 24.77 s, total 56 iterations
benchmarked complex replacement/pcre-heavy
time 448.1 ms (444.7 ms .. 450.0 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 450.8 ms (449.5 ms .. 453.9 ms)
std dev 3.129 ms (947.0 μs .. 4.841 ms)
benchmarking complex replacement/lens-regex-pcre ... took 21.99 s, total 56 iterations
benchmarked complex replacement/lens-regex-pcre
time 399.9 ms (398.4 ms .. 402.2 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 399.6 ms (399.0 ms .. 400.4 ms)
std dev 1.135 ms (826.2 μs .. 1.604 ms)
Benchmark lens-regex-pcre-bench: FINISH
Behaviour
Precise Expected behaviour (and examples) can be found in the test suites: