Copyright | ©2019 James Brock |
---|---|
License | BSD2 |
Maintainer | James Brock <jamesbrock@gmail.com> |
Safe Haskell | None |
Language | Haskell2010 |
Replace.Megaparsec is for finding text patterns, and also replacing or splitting on the found patterns. This activity is traditionally done with regular expressions, but Replace.Megaparsec uses Text.Megaparsec parsers instead for the pattern matching.
Replace.Megaparsec can be used in the same sort of “pattern capture” or “find all” situations in which one would use Python re.findall, or Perl m//, or Unix grep.
Replace.Megaparsec can be used in the same sort of “stream editing” or “search-and-replace” situations in which one would use Python re.sub, or Perl s///, or Unix sed, or awk.
Replace.Megaparsec can be used in the same sort of “string splitting” situations in which one would use Python re.split or Perl split.
See the replace-megaparsec package README for usage examples.
Special accelerated inputs
There are specialization re-write rules to speed up all functions in this
module when the input stream type s
is Data.Text or Data.ByteString.
Type constraints
All functions in the Running Parser section require the type of the
stream of text that is input to be
such that
Stream
s
,
because we want to output the same type of stream that was input.
That requirement is satisfied for all the Tokens
s ~ sStream
instances
included with Text.Megaparsec:
Megaparsec parsers have an error type parameter e
. When writing parsers
to be used by this module, the error type parameter e
should usually
be Void
, because every function in this module expects a parser
failure to occur on every token in a non-matching section of the input
stream, so parser failure error descriptions are not returned.
Synopsis
- breakCap :: forall e s a. (Ord e, Stream s, Tokens s ~ s) => Parsec e s a -> s -> Maybe (s, a, s)
- splitCap :: forall e s a. (Ord e, Stream s, Tokens s ~ s) => Parsec e s a -> s -> [Either s a]
- streamEdit :: forall e s a. (Ord e, Stream s, Monoid s, Tokens s ~ s) => Parsec e s a -> (a -> s) -> s -> s
- streamEditT :: forall e s m a. (Ord e, Stream s, Monad m, Monoid s, Tokens s ~ s) => ParsecT e s m a -> (a -> m s) -> s -> m s
- anyTill :: forall e s m a. MonadParsec e s m => m a -> m (Tokens s, a)
- sepCap :: forall e s m a. MonadParsec e s m => m a -> m [Either (Tokens s) a]
- findAll :: MonadParsec e s m => m a -> m [Either (Tokens s) (Tokens s)]
- findAllCap :: MonadParsec e s m => m a -> m [Either (Tokens s) (Tokens s, a)]
Running parser
Functions in this section are ways to run parsers. They take
as arguments a sep
parser and some input, run the parser on the input,
and return a result.
:: (Ord e, Stream s, Tokens s ~ s) | |
=> Parsec e s a | The pattern matching parser |
-> s | The input stream of text |
-> Maybe (s, a, s) | Maybe (prefix, parse_result, suffix) |
Break on and capture one pattern
Find the first occurence of a pattern in a text stream, capture the found pattern, and break the input text stream on the found pattern.
The breakCap
function is like takeWhile
, but can be predicated
beyond more than just the next one token. It's also like breakOn
,
but the needle
can be a pattern instead of a constant string.
Be careful not to look too far
ahead; if the sep
parser looks to the end of the input then breakCap
could be O(n²).
The pattern parser sep
may match a zero-width pattern (a pattern which
consumes no parser input on success).
Output
Nothing
when no pattern match was found.Just (prefix, parse_result, suffix)
for the result of parsing the pattern match, and theprefix
string before and thesuffix
string after the pattern match.prefix
andsuffix
may be zero-length strings.
Access the matched section of text
If you want to capture the matched string, then combine the pattern
parser sep
with match
.
With the matched string, we can reconstruct in input string.
For all input
, sep
, if
let (Just
(prefix, (infix, _), suffix)) = breakCap (match
sep) input
then
input == prefix<>
infix<>
suffix
:: (Ord e, Stream s, Tokens s ~ s) | |
=> Parsec e s a | The pattern matching parser |
-> s | The input stream of text |
-> [Either s a] | List of matching and non-matching input sections. |
Split on and capture all patterns
Find all occurences of the pattern sep
, split the input string, capture
all the patterns and the splits.
The input string will be split on every leftmost non-overlapping occurence
of the pattern sep
. The output list will contain
the parsed result of input string sections which match the sep
pattern
in Right
, and non-matching sections in Left
.
splitCap
depends on sepCap
, see sepCap
for more details.
Access the matched section of text
If you want to capture the matched strings, then combine the pattern
parser sep
with match
.
With the matched strings, we can reconstruct in input string.
For all input
, sep
, if
let output = splitCap (match
sep) input
then
input ==mconcat
(second
fst
<$>
output)
:: (Ord e, Stream s, Monoid s, Tokens s ~ s) | |
=> Parsec e s a | The pattern matching parser |
-> (a -> s) | The |
-> s | The input stream of text to be edited |
-> s | The edited input stream |
Stream editor
Also known as “find-and-replace”, or “match-and-substitute”. Finds all
non-overlapping sections of the stream which match the pattern sep
,
and replaces them with the result of the editor
function.
Access the matched section of text in the editor
If you want access to the matched string in the editor
function,
then combine the pattern parser sep
with match
.
This will effectively change the type of the editor
function
to (s,a) -> s
.
This allows us to write an editor
function which can choose to not
edit the match and just leave it as it is. If the editor
function
returns the first item in the tuple, then streamEdit
will not change
the matched string.
So, for all sep
:
streamEdit (match
sep)fst
≡id
:: (Ord e, Stream s, Monad m, Monoid s, Tokens s ~ s) | |
=> ParsecT e s m a | The pattern matching parser |
-> (a -> m s) | The |
-> s | The input stream of text to be edited |
-> m s | The edited input stream |
Stream editor transformer
Monad transformer version of streamEdit
.
Both the parser sep
and the editor
function run in the underlying
monad context.
If you want to do IO
operations in the editor
function or the
parser sep
, then run this in IO
.
If you want the editor
function or the parser sep
to remember some state,
then run this in a stateful monad.
Parser combinator
Functions in this section are parser combinators. They take
a sep
parser for an argument, combine sep
with another parser,
and return a new parser.
:: MonadParsec e s m | |
=> m a | The pattern matching parser |
-> m (Tokens s, a) | parser |
Specialized manyTill_
Parser combinator to consume input until the sep
pattern matches,
equivalent to
.
On success, returns the prefix before the pattern match and the parsed match.manyTill_
anySingle
sep
sep
may be a zero-width parser, it may succeed without consuming any
input.
This combinator will produce a parser which
acts like takeWhileP
but is predicated beyond more than
just the next one token. anyTill
is also like takeWhileP
in that it will be “fast” when applied to an input stream type s
for which there are specialization re-write rules.
:: MonadParsec e s m | |
=> m a | The pattern matching parser |
-> m [Either (Tokens s) a] | parser |
Separate and capture
Parser combinator to find all of the leftmost non-overlapping occurrences
of the pattern parser sep
in a text stream.
The sepCap
parser will always consume its entire input and can never fail.
sepCap
is similar to the sep*
family of parser combinators
found in
parser-combinators
and
parsers,
but it returns the parsed result of the sep
parser instead
of throwing it away.
Output
The input stream is separated and output into a list of sections:
- Sections which can parsed by the pattern
sep
will be parsed and captured asRight
. - Non-matching sections of the stream will be captured in
Left
.
The output list also has these properties:
- If the input is
""
then the output list will be[]
. - If there are no pattern matches, then
the entire input stream will be returned as one non-matching
Left
section. - The output list will not contain two consecutive
Left
sections.
Zero-width matches forbidden
If the pattern matching parser sep
would succeed without consuming any
input then sepCap
will force it to fail.
If we allow sep
to match a zero-width pattern,
then it can match the same zero-width pattern again at the same position
on the next iteration, which would result in an infinite number of
overlapping pattern matches.
:: MonadParsec e s m | |
=> m a | The pattern matching parser |
-> m [Either (Tokens s) (Tokens s)] | parser |
:: MonadParsec e s m | |
=> m a | The pattern matching parser |
-> m [Either (Tokens s) (Tokens s, a)] | parser |
Find all occurences, parse and capture pattern matches
Parser combinator for finding all occurences of a pattern in a stream.
Will call sepCap
with the match
combinator so that
the text which matched the pattern parser sep
will be returned in
the Right
sections, along with the result of the parse of sep
.
Definition:
findAllCap sep =sepCap
(match
sep)