Copyright | Daan Leijen 1999-2001 Bryan O'Sullivan 2007-2015 Winterland 2016 |
---|---|
License | BSD3 |
Maintainer | drkoster@qq.com |
Stability | experimental |
Portability | portable |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
This library provide parsec/attoparsec style parsing combinators for binary package. By default, this module export combinators in Data.Binary.Get, Data.Binary.Parser.Word8 and Data.Binary.Parser.Numeric, for additional ASCII char parser, please check Data.Binary.Parser.Char8 module.
The behaviour of parsers here is different to that of the
similarly-named parser in Parsec, as this one is all-or-nothing.
To illustrate the difference, the following parser will fail under
Parsec given an input of "for"
:
string "foo" <|> string "for"
The reason for its failure is that the first branch is a
partial match, and will consume the letters 'f'
and 'o'
before failing. In binary-parsers, the above parser will succeed on
that input, because the failed first branch will consume nothing.
There're some redundant combinators get removed, for example:
choice == asum count == replicateM atEnd == isEmpty take == getByteString many1 == some
For fast byte set operations, please use charset package.
It's recommanded to use parseOnly
, parseDetail
... functions to run your parsers since these
functions are faster than binary's counter part by avoiding a small constant overhead.
Check parse
for detail.
A few words on performance and backtracking
There's a common belief that parsers which support backtracking are slow, but it's not neccessarily
true in binary, because binary doesn't do book keeping if you doesn't use <|>
, lookAhead
or their
friends. Combinators in this library like peek
, string
... also try to avoid backtracking so
it's faster to use them rather than do backtracking yourself, for example, peek
is faster than
. In practice, protocols are often designed to avoid backtracking.
For example, if you have following parser:lookAhead
getWord8
branch1 <|> branch2 <|> (skipN 1 >> branch3)
And if you can select the right branch just by looking ahead one byte, then you can rewrite it to:
w <- peek if | w == b1 -> branch1 | w == b2 -> branch2 | w == b3 -> skipN 1 >> branch3
Binary performs as fast as a non-backtracking parser as long as you construct your parser
without using backtracking. And sometime backtracking is indeed neccessary, for example scientifically
is almost impossible to implement correctly if you don't do backtracking.
Synopsis
- type Parser a = Get a
- parseOnly :: Get a -> ByteString -> Either String a
- parseLazy :: Get a -> ByteString -> Either String a
- parseDetail :: Get a -> ByteString -> Either (ByteString, ByteOffset, String) (ByteString, ByteOffset, a)
- parseDetailLazy :: Get a -> ByteString -> Either (ByteString, ByteOffset, String) (ByteString, ByteOffset, a)
- parse :: Get a -> ByteString -> Decoder a
- maybeDecoder :: Decoder r -> Maybe r
- eitherDecoder :: Decoder r -> Either String r
- (<?>) :: Get a -> String -> Get a
- endOfInput :: Get ()
- option :: Alternative f => a -> f a -> f a
- eitherP :: Alternative f => f a -> f b -> f (Either a b)
- match :: Get a -> Get (ByteString, a)
- many' :: MonadPlus m => m a -> m [a]
- some' :: MonadPlus m => m a -> m [a]
- sepBy :: Alternative f => f a -> f s -> f [a]
- sepBy' :: MonadPlus m => m a -> m s -> m [a]
- sepBy1 :: Alternative f => f a -> f s -> f [a]
- sepBy1' :: MonadPlus m => m a -> m s -> m [a]
- manyTill :: Alternative f => f a -> f b -> f [a]
- manyTill' :: MonadPlus m => m a -> m b -> m [a]
- skipMany :: Alternative f => f a -> f ()
- skipMany1 :: Alternative f => f a -> f ()
- module Data.Binary.Get
- module Data.Binary.Parser.Word8
- module Data.Binary.Parser.Numeric
Running parsers
parseOnly :: Get a -> ByteString -> Either String a Source #
Run a parser on ByteString
.
This function does not force a parser to consume all of its input. Instead, any residual input will be discarded. To force a parser to consume all of its input, use something like this:
parseOnly (myParser <* endOfInput)
parseLazy :: Get a -> ByteString -> Either String a Source #
Similar to parseOnly
, but run a parser on lazy ByteString
.
parseDetail :: Get a -> ByteString -> Either (ByteString, ByteOffset, String) (ByteString, ByteOffset, a) Source #
Run a parser on ByteString
.
This function return full parsing results: the rest of input, stop offest and fail message or parsing result.
Since: 0.2.1.0
parseDetailLazy :: Get a -> ByteString -> Either (ByteString, ByteOffset, String) (ByteString, ByteOffset, a) Source #
Similar to parseDetail
, but run a parser on lazy ByteString
.
Since: 0.2.1.0
parse :: Get a -> ByteString -> Decoder a Source #
Run a Get
monad. See Decoder
for what to do next, like providing
input, handling decoding errors and to get the output value.
This's faster than runGetIncremental
becuase it provides an initial chunk rather
than feeding empty
and waiting for chunks, this overhead is noticeable when you're
running small getters over short ByteString
s.
Since: 0.2.1.0
Decoder conversion
maybeDecoder :: Decoder r -> Maybe r Source #
Combinators
endOfInput :: Get () Source #
Match only if all input has been consumed.
option :: Alternative f => a -> f a -> f a Source #
option x p
tries to apply action p
. If p
fails without
consuming input, it returns the value x
, otherwise the value
returned by p
.
priority = option 0 (digitToInt <$> digit)
eitherP :: Alternative f => f a -> f b -> f (Either a b) Source #
Combine two alternatives.
match :: Get a -> Get (ByteString, a) Source #
Return both the result of a parse and the portion of the input that was consumed while it was being parsed.
many' :: MonadPlus m => m a -> m [a] Source #
many' p
applies the action p
zero or more times. Returns a
list of the returned values of p
. The value returned by p
is
forced to WHNF.
word = many' letter
some' :: MonadPlus m => m a -> m [a] Source #
some' p
applies the action p
one or more times. Returns a
list of the returned values of p
. The value returned by p
is
forced to WHNF.
word = some' letter
sepBy :: Alternative f => f a -> f s -> f [a] Source #
sepBy p sep
applies zero or more occurrences of p
, separated
by sep
. Returns a list of the values returned by p
.
commaSep p = p `sepBy` (char ',')
sepBy' :: MonadPlus m => m a -> m s -> m [a] Source #
sepBy' p sep
applies zero or more occurrences of p
, separated
by sep
. Returns a list of the values returned by p
. The value
returned by p
is forced to WHNF.
commaSep p = p `sepBy'` (char ',')
sepBy1 :: Alternative f => f a -> f s -> f [a] Source #
sepBy1 p sep
applies one or more occurrences of p
, separated
by sep
. Returns a list of the values returned by p
.
commaSep p = p `sepBy1` (char ',')
sepBy1' :: MonadPlus m => m a -> m s -> m [a] Source #
sepBy1' p sep
applies one or more occurrences of p
, separated
by sep
. Returns a list of the values returned by p
. The value
returned by p
is forced to WHNF.
commaSep p = p `sepBy1'` (char ',')
manyTill :: Alternative f => f a -> f b -> f [a] Source #
manyTill p end
applies action p
zero or more times until
action end
succeeds, and returns the list of values returned by
p
. This can be used to scan comments:
simpleComment = string "<!--" *> manyTill anyChar (string "-->")
(Note the overlapping parsers anyChar
and string "-->"
.
While this will work, it is not very efficient, as it will cause a
lot of backtracking.)
manyTill' :: MonadPlus m => m a -> m b -> m [a] Source #
manyTill' p end
applies action p
zero or more times until
action end
succeeds, and returns the list of values returned by
p
. This can be used to scan comments:
simpleComment = string "<!--" *> manyTill' anyChar (string "-->")
(Note the overlapping parsers anyChar
and string "-->"
.
While this will work, it is not very efficient, as it will cause a
lot of backtracking.)
The value returned by p
is forced to WHNF.
skipMany :: Alternative f => f a -> f () Source #
Skip zero or more instances of an action.
skipMany1 :: Alternative f => f a -> f () Source #
Skip one or more instances of an action.
Re-exports
module Data.Binary.Get
module Data.Binary.Parser.Word8
module Data.Binary.Parser.Numeric