Copyright | © 2015–2017 Megaparsec contributors © 2007 Paolo Martini © 1999–2001 Daan Leijen |
---|---|
License | FreeBSD |
Maintainer | Mark Karpov <markkarpov@opmbx.org> |
Stability | experimental |
Portability | non-portable |
Safe Haskell | None |
Language | Haskell2010 |
The primitive parser combinators.
- data State s = State { stateInput :: s, statePos :: NonEmpty SourcePos, stateTokensProcessed :: !Word, stateTabWidth :: Pos }
- class Ord (Token s) => Stream s where
- type Parsec e s = ParsecT e s Identity
- data ParsecT e s m a
- class (ErrorComponent e, Stream s, Alternative m, MonadPlus m) => MonadParsec e s m | m -> e s where
- (<?>) :: MonadParsec e s m => m a -> String -> m a
- unexpected :: MonadParsec e s m => ErrorItem (Token s) -> m a
- match :: MonadParsec e s m => m a -> m ([Token s], a)
- region :: MonadParsec e s m => (ParseError (Token s) e -> ParseError (Token s) e) -> m a -> m a
- getInput :: MonadParsec e s m => m s
- setInput :: MonadParsec e s m => s -> m ()
- getPosition :: MonadParsec e s m => m SourcePos
- getNextTokenPosition :: forall e s m. MonadParsec e s m => m (Maybe SourcePos)
- setPosition :: MonadParsec e s m => SourcePos -> m ()
- pushPosition :: MonadParsec e s m => SourcePos -> m ()
- popPosition :: MonadParsec e s m => m ()
- getTokensProcessed :: MonadParsec e s m => m Word
- setTokensProcessed :: MonadParsec e s m => Word -> m ()
- getTabWidth :: MonadParsec e s m => m Pos
- setTabWidth :: MonadParsec e s m => Pos -> m ()
- setParserState :: MonadParsec e s m => State s -> m ()
- parse :: Parsec e s a -> String -> s -> Either (ParseError (Token s) e) a
- parseMaybe :: (ErrorComponent e, Stream s) => Parsec e s a -> s -> Maybe a
- parseTest :: (ShowErrorComponent e, Ord (Token s), ShowToken (Token s), Show a) => Parsec e s a -> s -> IO ()
- runParser :: Parsec e s a -> String -> s -> Either (ParseError (Token s) e) a
- runParser' :: Parsec e s a -> State s -> (State s, Either (ParseError (Token s) e) a)
- runParserT :: Monad m => ParsecT e s m a -> String -> s -> m (Either (ParseError (Token s) e) a)
- runParserT' :: Monad m => ParsecT e s m a -> State s -> m (State s, Either (ParseError (Token s) e) a)
- dbg :: forall e s m a. (Stream s, ShowToken (Token s), ShowErrorComponent e, Show a) => String -> ParsecT e s m a -> ParsecT e s m a
Data types
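data State s
This is Megaparsec's state; it is parametrized over the stream type s.
Constructors: State { stateInput :: s, statePos :: NonEmpty SourcePos, stateTokensProcessed :: !Word, stateTabWidth :: Pos }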
class Ord (Token s) => Stream s where
An instance of Stream s has stream type s. The token type is determined by the stream and can be found via the Token type function.
uncons :: s -> Maybe (Token s, s)
Get the next token from the stream. If the stream is empty, return Nothing.
updatePos :: Proxy s -> Pos -> SourcePos -> Token s -> (SourcePos, SourcePos)
Update the position in the stream given the tab width, the current position, and the current token. The result is a tuple where the first element will be used to report parse errors for the current token, while the second element is the incremented position that will be stored in the parser's state. The stored (incremented) position is used whenever the position can't or shouldn't be updated by consuming a token. For example, when using failure, we don't grab a new token (we need to fail right where we are now), so the error position will be taken from the parser's state.
When you work with streams whose elements do not contain information about their position in the input, the result usually consists of the third argument unchanged and an incremented position calculated with respect to the current token. This is how the default instances of Stream work (they use defaultUpdatePos, which may be a good starting point for your own position-advancing function).
When you wish to deal with a stream of tokens where every token “knows” its start and end position in the input (for example, you have produced the stream with Happy/Alex), the best strategy is to use the start position as the actual element position and provide the end position of the token as the incremented one.
Since: 5.0.0
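A minimal sketch of that strategy, assuming Megaparsec 5.x, where Stream consists of the associated type Token plus the uncons and updatePos methods documented above. The token type Tok, the wrapper TokStream and the helper tokPositions are hypothetical names for illustration; a real lexer would supply its own token type.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Proxy (Proxy (..))
import Text.Megaparsec.Pos (SourcePos, defaultTabWidth, initialPos)
import Text.Megaparsec.Prim (Stream (..))

-- Hypothetical token type, e.g. produced by an Alex lexer: the token text
-- plus the positions where it starts and ends in the original file.
data Tok = Tok
  { tokStart :: SourcePos
  , tokEnd   :: SourcePos
  , tokText  :: String
  } deriving (Eq, Ord, Show)

-- A newtype wrapper keeps this instance apart from the library's own ones.
newtype TokStream = TokStream [Tok]

instance Stream TokStream where
  type Token TokStream = Tok
  uncons (TokStream [])       = Nothing
  uncons (TokStream (t : ts)) = Just (t, TokStream ts)
  -- Tab width and current position are ignored: the token already knows
  -- where it starts (used for error reports) and where it ends (stored in
  -- the parser state as the incremented position).
  updatePos _ _ _ t = (tokStart t, tokEnd t)

-- The pair of positions Megaparsec would derive for a single token.
tokPositions :: Tok -> (SourcePos, SourcePos)
tokPositions = updatePos (Proxy :: Proxy TokStream) defaultTabWidth (initialPos "")
```

The newtype is a design choice to avoid any overlap with the stream instances that ship with the library.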
type Parsec e s = ParsecT e s Identity
Parsec is a non-transformer variant of the more general ParsecT monad transformer.
data ParsecT e s m a
ParsecT e s m a is a parser with a custom data component of error e, stream type s, underlying monad m, and return type a.
Primitive combinators
class (ErrorComponent e, Stream s, Alternative m, MonadPlus m) => MonadParsec e s m | m -> e s where
Type class describing parsers independent of input type.
Minimal complete definition: failure, label, try, lookAhead, notFollowedBy, withRecovery, observing, eof, token, tokens, getParserState, updateParserState
failure :: Set (ErrorItem (Token s)) -> Set (ErrorItem (Token s)) -> Set e -> m a
The most general way to stop parsing and report a ParseError. unexpected is defined in terms of this function:
unexpected item = failure (Set.singleton item) Set.empty Set.empty
Since: 4.2.0
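A small sketch of failure in use, assuming Megaparsec 5.x with the default Dec error component. The Parser alias and the identifier parser are hypothetical; identifier rejects the word “let” by reporting it as unexpected and naming what was expected instead.

```haskell
import Control.Applicative (some)
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.Set as Set
import Text.Megaparsec

type Parser = Parsec Dec String

-- Hypothetical identifier parser that treats "let" as a reserved word.
identifier :: Parser String
identifier = do
  w <- some letterChar
  if w == "let"
    then failure
      (Set.singleton (Tokens ('l' :| "et")))        -- unexpected items
      (Set.singleton (Label ('i' :| "dentifier")))  -- expected items
      Set.empty                                     -- no custom components
    else return w
```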
label :: String -> m a -> m a
The parser label name p behaves as parser p, but whenever the parser p fails without consuming any input, it replaces names of “expected” tokens with the name name.
hidden :: m a -> m a
hidden p behaves just like parser p, but it doesn't show any “expected” tokens in the error message when p fails.
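A sketch contrasting label and hidden, assuming Megaparsec 5.x with the Dec error component. numeral, paddedNumeral and errorFor are hypothetical helpers; errorFor simply renders the error with parseErrorPretty so the effect on the “expecting” line can be inspected.

```haskell
import Control.Applicative (some)
import Text.Megaparsec

type Parser = Parsec Dec String

-- label (via <?>): on an empty failure, report "number" instead of "digit".
numeral :: Parser String
numeral = some digitChar <?> "number"

-- hidden: optional leading white space is never listed as "expected".
paddedNumeral :: Parser String
paddedNumeral = hidden space *> numeral

-- Render the parse error for an input (empty string on success).
errorFor :: Parser a -> String -> String
errorFor p input = either parseErrorPretty (const "") (parse p "" input)
```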
try :: m a -> m a
The parser try p behaves like parser p, except that it backtracks the parser state when p fails (either consuming input or not).
This combinator is used whenever arbitrary look ahead is needed. Since it pretends that it hasn't consumed any input when p fails, the (<|>) combinator will try its second alternative even when the first parser failed while consuming input.
For example, here is a parser that is supposed to parse the word “let” or “lexical”:
>>> parseTest (string "let" <|> string "lexical") "lexical"
1:1:
unexpected "lex"
expecting "let"
What happens here? The first parser consumes “le” and fails (because it doesn't see a “t”). The second parser, however, isn't tried, since the first parser has already consumed some input! try fixes this behavior and allows backtracking to work:
>>> parseTest (try (string "let") <|> string "lexical") "lexical"
"lexical"
try also improves error messages in case of overlapping alternatives, because Megaparsec's hint system can be used:
>>> parseTest (try (string "let") <|> string "lexical") "le"
1:1:
unexpected "le"
expecting "let" or "lexical"
Please note that as of Megaparsec 4.4.0, string backtracks automatically (see tokens), so it does not need try. However, the examples above demonstrate the idea behind try so well that it was decided to keep them. You still need to use try when your alternatives are complex, composite parsers.
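Because string no longer needs try, a composite parser shows the point more directly. This is a minimal sketch assuming Megaparsec 5.x; the Parser alias and the names withoutTry and withTry are illustrative. Both alternatives begin with the same char 'a', so only the try version can fall through to the second branch on input "ac".

```haskell
import Control.Applicative ((<|>))
import Text.Megaparsec

type Parser = Parsec Dec String

-- Without try, the first branch consumes 'a' and then fails on 'b',
-- so (<|>) never runs the second branch.
withoutTry :: Parser Char
withoutTry = (char 'a' *> char 'b') <|> (char 'a' *> char 'c')

-- With try, the failed first branch is rewound to where it started,
-- so the second branch gets a chance.
withTry :: Parser Char
withTry = try (char 'a' *> char 'b') <|> (char 'a' *> char 'c')
```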
lookAhead :: m a -> m a
If p in lookAhead p succeeds (either consuming input or not), the whole parser behaves like p succeeded without consuming anything (the parser state is not updated either). If p fails, lookAhead has no effect, i.e. it will fail consuming input if p fails consuming input. Combine with try if this is undesirable.
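A sketch of both behaviors described above, assuming Megaparsec 5.x; all parser names are hypothetical. peekKeyword peeks at a keyword and then parses the whole word from the same place. peekAB shows that a lookAhead which fails after consuming input also blocks the alternative, while peekABTry wraps it in try so the alternative runs. The checks use parse (which does not demand end of input) because the fallback consumes nothing.

```haskell
import Control.Applicative (some, (<|>))
import Text.Megaparsec

type Parser = Parsec Dec String

-- Peek at the keyword, then parse the whole word starting from the same place.
peekKeyword :: Parser (String, String)
peekKeyword = do
  kw   <- lookAhead (string "let")
  word <- some letterChar
  return (kw, word)

-- On input "ac", the inner parser consumes 'a' before failing, so lookAhead
-- fails consuming input and (<|>) does not try the fallback...
peekAB :: Parser Char
peekAB = lookAhead (char 'a' *> char 'b') <|> pure 'x'

-- ...whereas try turns that into an empty failure, so the fallback runs.
peekABTry :: Parser Char
peekABTry = try (lookAhead (char 'a' *> char 'b')) <|> pure 'x'
```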
notFollowedBy :: m a -> m ()
notFollowedBy p only succeeds when the parser p fails. This parser never consumes any input and never modifies the parser state. It can be used to implement the “longest match” rule.
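A minimal sketch of the “longest match” rule, assuming Megaparsec 5.x. keywordLet is a hypothetical parser that accepts the keyword “let” only when it is not the prefix of a longer word.

```haskell
import Text.Megaparsec

type Parser = Parsec Dec String

-- Accept "let" but reject "letter": the keyword must not be followed
-- by another letter or digit.
keywordLet :: Parser String
keywordLet = string "let" <* notFollowedBy alphaNumChar
```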
withRecovery :: (ParseError (Token s) e -> m a) -> m a -> m a
withRecovery r p allows parsing to continue even if the parser p fails. In this case r is called with the actual ParseError as its argument. Typical usage is to return a value signifying failure to parse this particular object and to consume some part of the input up to the start of the next object.
Note that if r fails, the original error message is reported as if without withRecovery. The recovering parser r cannot influence error messages in any way.
Since: 4.4.0
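A sketch of the typical usage just described, assuming Megaparsec 5.x with the Dec error component. The item format (digits terminated by ';') and the names item and items are hypothetical. A malformed item is skipped up to the next ';' and kept as a Left value, so the remaining items are still parsed.

```haskell
import Control.Applicative (many, some)
import Text.Megaparsec

type Parser = Parsec Dec String

-- Hypothetical item format: one or more digits terminated by ';'.
-- If an item is malformed, the recovery parser skips everything up to and
-- including the next ';' and keeps the error as a Left value.
item :: Parser (Either (ParseError Char Dec) String)
item = withRecovery recover (Right <$> some digitChar <* char ';')
  where
    recover :: ParseError Char Dec -> Parser (Either (ParseError Char Dec) String)
    recover err = Left err <$ manyTill anyChar (char ';')

items :: Parser [Either (ParseError Char Dec) String]
items = many item <* eof
```

On input "12;x;34;" this yields a good item, a recorded error and another good item. At the end of input, both the item parser and the recovery fail without consuming anything, so many stops cleanly.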
observing :: m a -> m (Either (ParseError (Token s) e) a)
observing p allows you to “observe” a failure of the parser p, should it happen, without actually ending parsing, instead getting the ParseError in Left. On success, the parsed value is returned in Right as usual. Note that this primitive just allows you to observe parse errors as they happen; it does not backtrack or change how the parser p works in any way.
Since: 5.1.0
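A minimal sketch, assuming Megaparsec 5.x. The hypothetical digitsOrWord parser observes whether a run of digits could be parsed and falls back to letters otherwise. This is safe only because some digitChar fails without consuming input: as noted above, observing does not backtrack.

```haskell
import Control.Applicative (some)
import Text.Megaparsec

type Parser = Parsec Dec String

-- Try to read digits; if that fails (without consuming input), the error
-- arrives as a Left value and we read letters instead.
digitsOrWord :: Parser (Either String String)
digitsOrWord = do
  r <- observing (some digitChar)
  case r of
    Right ds -> return (Right ds)
    Left _   -> Left <$> some letterChar
```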
eof :: m ()
This parser only succeeds at the end of the input.
token :: (Token s -> Either (Set (ErrorItem (Token s)), Set (ErrorItem (Token s)), Set e) a) -> Maybe (Token s) -> m a
The parser token test mrep accepts a token t with result x when the function test t returns Right x. mrep may provide a representation of the token to report in error messages when the input stream is empty.
This is the most primitive combinator for accepting tokens. For example, the satisfy parser is implemented as:
satisfy f = token testChar Nothing
  where
    testChar x =
      if f x
        then Right x
        else Left (Set.singleton (Tokens (x:|[])), Set.empty, Set.empty)
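Following the same pattern, here is a sketch of a parser built directly on token, assuming Megaparsec 5.x with the Dec error component. digitValue is a hypothetical name. On a non-digit, the test function reports the token as unexpected and a “digit” label as expected, with no custom components.

```haskell
import Data.Char (digitToInt, isDigit)
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.Set as Set
import Text.Megaparsec

type Parser = Parsec Dec String

-- A hypothetical decimal-digit parser returning the digit's numeric value.
digitValue :: Parser Int
digitValue = token test Nothing
  where
    test :: Char -> Either (Set.Set (ErrorItem Char), Set.Set (ErrorItem Char), Set.Set Dec) Int
    test c
      | isDigit c = Right (digitToInt c)
      | otherwise =
          Left ( Set.singleton (Tokens (c :| []))       -- unexpected: this token
               , Set.singleton (Label ('d' :| "igit"))  -- expected: a digit
               , Set.empty )                            -- no custom components
```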
tokens :: (Token s -> Token s -> Bool) -> [Token s] -> m [Token s]
The parser tokens test parses a list of tokens and returns it. The supplied predicate test is used to check equality of given and parsed tokens.
This can be used, for example, to write string:
string = tokens (==)
Note that beginning with Megaparsec 4.4.0, this is an auto-backtracking primitive, which means that if it fails, it never consumes any input. This is done to make its consumption model match how error messages for this primitive are reported (which becomes important as the user gets more control with primitives like withRecovery):
>>> parseTest (string "abc") "abd"
1:1:
unexpected "abd"
expecting "abc"
This means, in particular, that it's no longer necessary to use try with tokens-based parsers, such as string and string'. This feature does not affect performance in any way.
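A sketch of both points, assuming Megaparsec 5.x; abcOrAbd and selectKeyword are hypothetical names. The first parser relies on the auto-backtracking of string, so the second alternative is tried without try. The second passes a custom equality test to tokens to get a case-insensitive keyword, the same idea behind string'.

```haskell
import Control.Applicative ((<|>))
import Data.Char (toLower)
import Text.Megaparsec

type Parser = Parsec Dec String

-- string is built on tokens, which never consumes input when it fails,
-- so no try is needed around the first alternative.
abcOrAbd :: Parser String
abcOrAbd = string "abc" <|> string "abd"

-- A case-insensitive keyword via tokens with a custom equality test.
selectKeyword :: Parser String
selectKeyword = tokens (\a b -> toLower a == toLower b) "select"
```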
getParserState :: m (State s)
Returns the full parser state as a State record.
updateParserState :: (State s -> State s) -> m ()
updateParserState f applies the function f to the parser state.
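A minimal sketch, assuming Megaparsec 5.x and the State record fields listed in the synopsis; remainingLength and dropRest are hypothetical names. The first reads one field of the full state, and the second rewrites the state wholesale. Positions are deliberately left untouched in this toy.

```haskell
import Text.Megaparsec
import Text.Megaparsec.Prim (State (..))

type Parser = Parsec Dec String

-- Read a field of the full state: how many characters are still unparsed.
remainingLength :: Parser Int
remainingLength = length . stateInput <$> getParserState

-- Modify the state: discard the rest of the input (positions unchanged).
dropRest :: Parser ()
dropRest = updateParserState (\st -> st { stateInput = "" })
```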
Instances
MonadParsec e s m => MonadParsec e s (IdentityT m)
(Monoid w, MonadParsec e s m) => MonadParsec e s (WriterT w m) (for both the lazy and the strict WriterT)
MonadParsec e s m => MonadParsec e s (StateT st m) (for both the lazy and the strict StateT)
MonadParsec e s m => MonadParsec e s (ReaderT r m)
(ErrorComponent e, Stream s) => MonadParsec e s (ParsecT e s m)
(Monoid w, MonadParsec e s m) => MonadParsec e s (RWST r w st m) (for both the lazy and the strict RWST)
(<?>) :: MonadParsec e s m => m a -> String -> m a (infix 0)
A synonym for label in the form of an operator.
unexpected :: MonadParsec e s m => ErrorItem (Token s) -> m a
The parser unexpected item always fails with an error message reporting the unexpected item item, without consuming any input.
match :: MonadParsec e s m => m a -> m ([Token s], a)
Return both the result of a parse and the list of tokens that were consumed during parsing. This relies on the change of the stateTokensProcessed value to evaluate how many tokens were consumed.
Since: 5.3.0
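A minimal sketch, assuming Megaparsec 5.x; decimalLit and decimalWithText are hypothetical names. match returns the exact text that a structured parser consumed, here a decimal literal split into its integer and fractional parts.

```haskell
import Control.Applicative (some)
import Text.Megaparsec

type Parser = Parsec Dec String

-- A hypothetical decimal literal such as "3.14", split into its parts.
decimalLit :: Parser (String, String)
decimalLit = (,) <$> some digitChar <* char '.' <*> some digitChar

-- match returns the consumed tokens alongside the structured result.
decimalWithText :: Parser (String, (String, String))
decimalWithText = match decimalLit
```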
region :: MonadParsec e s m => (ParseError (Token s) e -> ParseError (Token s) e) -> m a -> m a
Arguments:
- (ParseError (Token s) e -> ParseError (Token s) e): how to process ParseErrors
- m a: the “region” that processing applies to
Specify how to process ParseErrors that happen inside this wrapper. As a side effect of the current implementation, changing errorPos with this combinator will also change the final statePos in the parser state.
Since: 5.3.0
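A sketch of region, assuming Megaparsec 5.x, where ParseError is a record with an errorCustom field and Dec has a DecFail constructor. inContext and errorFor are hypothetical helpers: inContext adds a custom message to any error raised inside the wrapped parser, and errorFor renders the result.

```haskell
import qualified Data.Set as Set
import Text.Megaparsec
import Text.Megaparsec.Error (Dec (..), ParseError (..))

type Parser = Parsec Dec String

-- Attach a custom message to any error raised inside the wrapped parser.
inContext :: String -> Parser a -> Parser a
inContext msg = region (\e -> e { errorCustom = Set.insert (DecFail msg) (errorCustom e) })

-- Render the parse error for an input (empty string on success).
errorFor :: Parser a -> String -> String
errorFor p input = either parseErrorPretty (const "") (parse p "" input)
```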
Parser state combinators
getInput :: MonadParsec e s m => m s
Return the current input.
setInput :: MonadParsec e s m => s -> m ()
Continue parsing with the given input.
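A toy sketch, assuming Megaparsec 5.x. upcaseRest is a hypothetical “preprocessing” step that reads the remaining input with getInput and replaces it with an upper-cased copy via setInput. Source positions are not adjusted.

```haskell
import Data.Char (toUpper)
import Text.Megaparsec

type Parser = Parsec Dec String

-- Replace the remaining input with an upper-cased copy, then keep parsing.
upcaseRest :: Parser ()
upcaseRest = getInput >>= setInput . map toUpper
```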
getPosition :: MonadParsec e s m => m SourcePos
Return the current source position.
See also: setPosition, pushPosition, popPosition, and SourcePos.
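A minimal sketch, assuming Megaparsec 5.x, where SourcePos has a sourceColumn field and unPos converts a Pos to a Word. located and startColumn are hypothetical helpers that record where a parser started.

```haskell
import Text.Megaparsec
import Text.Megaparsec.Pos (SourcePos (..), unPos)

type Parser = Parsec Dec String

-- Pair the result of a parser with the position where it started.
located :: Parser a -> Parser (SourcePos, a)
located p = (,) <$> getPosition <*> p

-- Column (1-based) at which the wrapped parser started.
startColumn :: Parser a -> Parser Word
startColumn p = unPos . sourceColumn . fst <$> located p
```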
getNextTokenPosition :: forall e s m. MonadParsec e s m => m (Maybe SourcePos)
Get the position where the next token in the stream begins. If the stream is empty, return Nothing.
Since: 5.3.0
setPosition :: MonadParsec e s m => SourcePos -> m ()
setPosition pos sets the current source position to pos.
See also: getPosition, pushPosition, popPosition, and SourcePos.
pushPosition :: MonadParsec e s m => SourcePos -> m ()
Push the given position onto the stack of positions and continue parsing with this position. Useful for working with include files and the like.
See also: getPosition, setPosition, popPosition, and SourcePos.
Since: 5.0.0
popPosition :: MonadParsec e s m => m ()
Pop a position from the stack of positions unless it only contains one element (in that case the stack of positions remains the same). This is how to return to the previous source file after pushPosition.
See also: getPosition, setPosition, pushPosition, and SourcePos.
Since: 5.0.0
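A sketch of include-file handling with these functions, assuming Megaparsec 5.x. withIncluded is a hypothetical helper: it runs p over the included text as if it came from a file named "inc", then restores the outer input and position. Any part of the included text that p leaves unconsumed is simply discarded in this sketch.

```haskell
import Text.Megaparsec
import Text.Megaparsec.Pos (SourcePos (..), initialPos)

type Parser = Parsec Dec String

-- Run p on the included text under the source name "inc", then resume
-- the outer input at the outer position.
withIncluded :: String -> Parser a -> Parser a
withIncluded text p = do
  outer <- getInput
  pushPosition (initialPos "inc")
  setInput text
  r <- p
  popPosition
  setInput outer
  return r
```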
getTokensProcessed :: MonadParsec e s m => m Word
Get the number of tokens processed so far.
Since: 5.2.0
setTokensProcessed :: MonadParsec e s m => Word -> m ()
Set the number of tokens processed so far.
Since: 5.2.0
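A minimal sketch, assuming Megaparsec 5.2 or later; consumedBy is a hypothetical helper that reads the counter before and after running a parser to report how many tokens it consumed.

```haskell
import Control.Applicative (some)
import Text.Megaparsec

type Parser = Parsec Dec String

-- Run p and report how many tokens it consumed.
consumedBy :: Parser a -> Parser (Word, a)
consumedBy p = do
  before <- getTokensProcessed
  r      <- p
  after  <- getTokensProcessed
  return (after - before, r)
```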
getTabWidth :: MonadParsec e s m => m Pos
Return the tab width. The default tab width is equal to defaultTabWidth. You can set a different tab width with the help of setTabWidth.
setTabWidth :: MonadParsec e s m => Pos -> m ()
Set the tab width. If the argument of the function is not a positive number, defaultTabWidth will be used.
setParserState :: MonadParsec e s m => State s -> m ()
setParserState st sets the full parser state to st.
Running parser
parse :: Parsec e s a -> String -> s -> Either (ParseError (Token s) e) a
Arguments:
- Parsec e s a: parser to run
- String: name of source file
- s: input for parser
parse p file input runs parser p over Identity (see runParserT if you're using the ParsecT monad transformer; parse itself is just a synonym for runParser). It returns either a ParseError (Left) or a value of type a (Right). parseErrorPretty can be used to turn a ParseError into the string representation of the error message. See Text.Megaparsec.Error if you need to do more advanced error analysis.
main = case parse numbers "" "11,2,43" of
  Left err -> putStr (parseErrorPretty err)
  Right xs -> print (sum xs)

numbers = integer `sepBy` char ','
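A self-contained variant of the example above, sketched under the assumption of Megaparsec 5.x. Since integer is not defined on this page, a hypothetical stand-in built from digitChar replaces it (the library's Text.Megaparsec.Lexer module offers numeric lexers for real use). sumNumbers returns the rendered error instead of printing it, so the behavior is easy to check.

```haskell
import Control.Applicative (some)
import Text.Megaparsec

type Parser = Parsec Dec String

-- Stand-in for an integer lexer: one or more decimal digits.
integer :: Parser Integer
integer = read <$> some digitChar

numbers :: Parser [Integer]
numbers = integer `sepBy` char ','

-- Sum the numbers, or render the parse error as a string.
sumNumbers :: String -> Either String Integer
sumNumbers input = case parse numbers "" input of
  Left err -> Left (parseErrorPretty err)
  Right xs -> Right (sum xs)
```

For example, sumNumbers "11,2,43" gives Right 56, while "1,x" fails after the comma has been consumed.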
parseMaybe :: (ErrorComponent e, Stream s) => Parsec e s a -> s -> Maybe a
parseMaybe p input runs the parser p on input and returns the result inside Just on success and Nothing on failure. This function also parses eof, so if the parser doesn't consume all of its input, it will fail.
The function is supposed to be useful for lightweight parsing, where error messages (and thus the file name) are not important and the entire input should be parsed. For example, it can be used to parse a single number according to a specification of its format.
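A minimal sketch of that use case, assuming Megaparsec 5.x; readNatural is a hypothetical helper that accepts a string only if all of it is a natural number.

```haskell
import Control.Applicative (some)
import Text.Megaparsec

type Parser = Parsec Dec String

-- Lightweight validation: the whole string must be a natural number.
readNatural :: String -> Maybe Integer
readNatural = parseMaybe natural
  where
    natural :: Parser Integer
    natural = read <$> some digitChar
```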
parseTest :: (ShowErrorComponent e, Ord (Token s), ShowToken (Token s), Show a) => Parsec e s a -> s -> IO ()
Arguments:
- Parsec e s a: parser to run
- s: input for parser
The expression parseTest p input applies the parser p against the input input and prints the result to stdout. Useful for testing.
runParser :: Parsec e s a -> String -> s -> Either (ParseError (Token s) e) a
Arguments:
- Parsec e s a: parser to run
- String: name of source file
- s: input for parser
runParser p file input runs parser p on the input list of tokens input, obtained from the source file file. The file is only used in error messages and may be the empty string. Returns either a ParseError (Left) or a value of type a (Right).
parseFromFile p file = runParser p file <$> readFile file
runParserT :: Monad m => ParsecT e s m a -> String -> s -> m (Either (ParseError (Token s) e) a)
Arguments:
- ParsecT e s m a: parser to run
- String: name of source file
- s: input for parser
runParserT p file input runs parser p on the input list of tokens input, obtained from the source file file. The file is only used in error messages and may be the empty string. Returns a computation in the underlying monad m that returns either a ParseError (Left) or a value of type a (Right).
runParserT' :: Monad m => ParsecT e s m a -> State s -> m (State s, Either (ParseError (Token s) e) a)
Arguments:
- ParsecT e s m a: parser to run
- State s: initial state
This function is similar to runParserT, but like runParser' it accepts and returns the parser state. This is thus the most general way to run a parser.
Since: 4.2.0
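A sketch of the state-passing style with the pure runParser', assuming Megaparsec 5.3, where State has the four fields listed in the synopsis. startState and step are hypothetical helpers: one builds an initial state by hand with a custom source name, and the other parses a single letter and returns the final state so parsing could resume from it.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import Text.Megaparsec
import Text.Megaparsec.Pos (defaultTabWidth, initialPos)
import Text.Megaparsec.Prim (State (..), runParser')

-- Build an initial state by hand, here with a custom source name.
startState :: String -> String -> State String
startState name input = State
  { stateInput           = input
  , statePos             = initialPos name :| []
  , stateTokensProcessed = 0
  , stateTabWidth        = defaultTabWidth
  }

-- Parse one letter and also return the final state, e.g. to resume later.
step :: String -> (State String, Either (ParseError Char Dec) Char)
step = runParser' letterChar . startState "repl"
```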
Debugging
dbg :: forall e s m a. (Stream s, ShowToken (Token s), ShowErrorComponent e, Show a) => String -> ParsecT e s m a -> ParsecT e s m a
Arguments:
- String: debugging label
- ParsecT e s m a: parser to debug
Result: a parser that prints debugging messages.
The parser dbg label p works exactly like p, but when it's evaluated it prints information useful for debugging. The label is only used to refer to this parser in the debugging output. This combinator uses the trace function from Debug.Trace under the hood.
Typical usage is to wrap every sub-parser of a misbehaving parser with dbg, assigning meaningful labels. Then give it a shot and go through the printout. As of the current version, this combinator prints all available information except for hints, which are probably only interesting to the maintainer of Megaparsec itself and may be quite verbose to output in general. Let me know if you would like to be able to see hints as part of the debugging output.
The output itself is pretty self-explanatory, although the following abbreviations should be clarified (they are derived from low-level source code):
- COK: “consumed OK”. The parser consumed input and succeeded.
- CERR: “consumed error”. The parser consumed input and failed.
- EOK: “empty OK”. The parser succeeded without consuming input.
- EERR: “empty error”. The parser failed without consuming input.
Finally, it's not possible to lift this function into some monad transformers without introducing surprising behavior (e.g. unexpected state backtracking) or adding otherwise redundant constraints (e.g. a Show instance for state), so this helper is only available for the ParsecT monad, not MonadParsec in general.
Since: 5.1.0
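A minimal sketch of the typical usage, assuming Megaparsec 5.x with dbg imported from Text.Megaparsec.Prim. keyValue is a hypothetical “key=value” parser whose two sub-parsers are wrapped with dbg. Each wrapped parser prints a trace (via Debug.Trace, normally to stderr) when it runs, and the parse result is unaffected.

```haskell
import Control.Applicative (some)
import Text.Megaparsec
import Text.Megaparsec.Prim (dbg)

type Parser = Parsec Dec String

-- Label each sub-parser so its trace output is easy to find.
keyValue :: Parser (String, String)
keyValue = (,) <$> dbg "key" (some letterChar) <* char '=' <*> dbg "value" (some digitChar)
```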