Copyright | (c) Dong Han 2017-2018 |
---|---|

License | BSD |

Maintainer | winterland1989@gmail.com |

Stability | experimental |

Portability | non-portable |

Safe Haskell | None |

Language | Haskell2010 |

This module provide a simple resumable `Parser`

, which is suitable for binary protocol and simple textual protocol parsing. Both binary parsers (`decodePrim`

,etc) and textual parsers are provided, and they all work on `Bytes`

.

You can use `Alternative`

instance to do backtracking, each branch will either succeed and may consume some input, or fail without consume anything. It's recommend to use `peek`

or `peekMaybe`

to avoid backtracking if possible to get high performance.

Error message can be attached using `<?>`

, which have very small overhead, so it's recommended to attach a message in front of a composed parser like `xPacket = "Foo.Bar.xPacket" ? do ...`

, following is an example message when parsing an integer failed:

>parse int "foo" ([102,111,111],Left ["Z.Data.Parser.Numeric.int","Std.Data.Parser.Base.takeWhile1: no satisfied byte"]) -- It's easy to see we're trying to match a leading sign or digit here

Use parser-combinators to get combinators based on
`Applicative`

or `Monad`

instance, such as `manyTill`

, `sepBy`

, etc.

## Synopsis

- data Result a
- type ParseError = [Text]
- data Parser a
- (<?>) :: Text -> Parser a -> Parser a
- parse :: Parser a -> Bytes -> (Bytes, Either ParseError a)
- parse' :: Parser a -> Bytes -> Either ParseError a
- parseChunk :: Parser a -> Bytes -> Result a
- type ParseChunks m chunk err x = m chunk -> chunk -> m (chunk, Either err x)
- parseChunks :: Monad m => Parser a -> ParseChunks m Bytes ParseError a
- finishParsing :: Result a -> (Bytes, Either ParseError a)
- runAndKeepTrack :: Parser a -> Parser (Result a, [Bytes])
- match :: Parser a -> Parser (Bytes, a)
- ensureN :: Int -> ParseError -> Parser ()
- endOfInput :: Parser ()
- atEnd :: Parser Bool
- decodePrim :: forall a. Unaligned a => Parser a
- decodePrimLE :: forall a. Unaligned (LE a) => Parser a
- decodePrimBE :: forall a. Unaligned (BE a) => Parser a
- scan :: s -> (s -> Word8 -> Maybe s) -> Parser (Bytes, s)
- scanChunks :: forall s. s -> (s -> Bytes -> Either s (Bytes, Bytes, s)) -> Parser (Bytes, s)
- peekMaybe :: Parser (Maybe Word8)
- peek :: Parser Word8
- satisfy :: (Word8 -> Bool) -> Parser Word8
- satisfyWith :: (Word8 -> a) -> (a -> Bool) -> Parser a
- anyWord8 :: Parser Word8
- word8 :: Word8 -> Parser ()
- anyChar8 :: Parser Char
- char8 :: Char -> Parser ()
- skipWord8 :: Parser ()
- endOfLine :: Parser ()
- skip :: Int -> Parser ()
- skipWhile :: (Word8 -> Bool) -> Parser ()
- skipSpaces :: Parser ()
- take :: Int -> Parser Bytes
- takeTill :: (Word8 -> Bool) -> Parser Bytes
- takeWhile :: (Word8 -> Bool) -> Parser Bytes
- takeWhile1 :: (Word8 -> Bool) -> Parser Bytes
- takeRemaining :: Parser Bytes
- bytes :: Bytes -> Parser ()
- bytesCI :: Bytes -> Parser ()
- text :: Text -> Parser ()
- uint :: forall a. (Integral a, Bounded a) => Parser a
- int :: forall a. (Integral a, Bounded a) => Parser a
- integer :: Parser Integer
- uint_ :: forall a. (Integral a, Bounded a) => Parser a
- int_ :: (Integral a, Bounded a) => Parser a
- digit :: Parser Int
- hex :: forall a. (Integral a, FiniteBits a) => Parser a
- hex' :: forall a. (Integral a, FiniteBits a) => Parser a
- hex_ :: (Integral a, Bits a) => Parser a
- rational :: Fractional a => Parser a
- float :: Parser Float
- double :: Parser Double
- scientific :: Parser Scientific
- scientifically :: (Scientific -> a) -> Parser a
- rational' :: Fractional a => Parser a
- float' :: Parser Float
- double' :: Parser Double
- scientific' :: Parser Scientific
- scientifically' :: (Scientific -> a) -> Parser a
- day :: Parser Day
- localTime :: Parser LocalTime
- timeOfDay :: Parser TimeOfDay
- timeZone :: Parser (Maybe TimeZone)
- utcTime :: Parser UTCTime
- zonedTime :: Parser ZonedTime
- fail' :: Text -> Parser a

# Parser types

Simple parsing result, that represent respectively:

- Success: the remaining unparsed data and the parsed value
- Failure: the remaining unparsed data and the error message
- Partial: that need for more input data, supply empty bytes to indicate
`endOfInput`

#### Instances

type ParseError = [Text] Source #

Type alias for error message

Simple CPSed parser

A parser takes a failure continuation, and a success one, while the success continuation is
usually composed by `Monad`

instance, the failure one is more like a reader part, which can
be modified via `<?>`

. If you build parsers from ground, a pattern like this can be used:

xxParser = do ensureN errMsg ... -- make sure we have some bytes Parser $ kf k inp -> -- fail continuation, success continuation and input ... ... kf errMsg (if input not OK) ... k ... (if we get something useful for next parser)

# Running a parser

parse :: Parser a -> Bytes -> (Bytes, Either ParseError a) Source #

Parse the complete input, without resupplying, return the rest bytes

parse' :: Parser a -> Bytes -> Either ParseError a Source #

Parse the complete input, without resupplying

type ParseChunks m chunk err x = m chunk -> chunk -> m (chunk, Either err x) Source #

Type alias for a streaming parser, draw chunk from Monad m (with a initial chunk), return result in `Either err x`

.

parseChunks :: Monad m => Parser a -> ParseChunks m Bytes ParseError a Source #

Run a parser with an initial input string, and a monadic action that can supply more input if needed.

Note, once the monadic action return empty bytes, parsers will stop drawing
more bytes (take it as `endOfInput`

).

finishParsing :: Result a -> (Bytes, Either ParseError a) Source #

Finish parsing and fetch result, feed empty bytes if it's `Partial`

result.

match :: Parser a -> Parser (Bytes, a) Source #

Return both the result of a parse and the portion of the input that was consumed while it was being parsed.

# Basic parsers

ensureN :: Int -> ParseError -> Parser () Source #

Ensure that there are at least `n`

bytes available. If not, the
computation will escape with `Partial`

.

Since this parser is used in many other parsers, an extra error param is provide to attach custom error info.

endOfInput :: Parser () Source #

Test whether all input has been consumed, i.e. there are no remaining
undecoded bytes. Fail if not `atEnd`

.

Test whether all input has been consumed, i.e. there are no remaining undecoded bytes.

# Primitive decoders

decodePrim :: forall a. Unaligned a => Parser a Source #

# More parsers

scan :: s -> (s -> Word8 -> Maybe s) -> Parser (Bytes, s) Source #

A stateful scanner. The predicate consumes and transforms a
state argument, and each transformed state is passed to successive
invocations of the predicate on each byte of the input until one
returns `Nothing`

or the input ends.

This parser does not fail. It will return an empty string if the
predicate returns `Nothing`

on the first byte of input.

scanChunks :: forall s. s -> (s -> Bytes -> Either s (Bytes, Bytes, s)) -> Parser (Bytes, s) Source #

peekMaybe :: Parser (Maybe Word8) Source #

Match any byte, to perform lookahead. Returns `Nothing`

if end of
input has been reached. Does not consume any input.

Match any byte, to perform lookahead. Does not consume any input, but will fail if end of input has been reached.

satisfy :: (Word8 -> Bool) -> Parser Word8 Source #

The parser `satisfy p`

succeeds for any byte for which the
predicate `p`

returns `True`

. Returns the byte that is actually
parsed.

digit = satisfy isDigit where isDigit w = w >= 48 && w <= 57

satisfyWith :: (Word8 -> a) -> (a -> Bool) -> Parser a Source #

The parser `satisfyWith f p`

transforms a byte, and succeeds if
the predicate `p`

returns `True`

on the transformed value. The
parser returns the transformed byte that was parsed.

endOfLine :: Parser () Source #

Match either a single newline byte `'\n'`

, or a carriage
return followed by a newline byte `"\r\n"`

.

skipWhile :: (Word8 -> Bool) -> Parser () Source #

Skip past input for as long as the predicate returns `True`

.

skipSpaces :: Parser () Source #

Skip over white space using `isSpace`

.

takeTill :: (Word8 -> Bool) -> Parser Bytes Source #

Consume input as long as the predicate returns `False`

or reach the end of input,
and return the consumed input.

takeWhile :: (Word8 -> Bool) -> Parser Bytes Source #

Consume input as long as the predicate returns `True`

or reach the end of input,
and return the consumed input.

# Numeric parsers

## Decimal

uint :: forall a. (Integral a, Bounded a) => Parser a Source #

Parse and decode an unsigned decimal number.

Will fail in case of overflow.

int :: forall a. (Integral a, Bounded a) => Parser a Source #

Parse a decimal number with an optional leading `'+'`

or `'-'`

sign
character.

This parser will fail if overflow happens.

uint_ :: forall a. (Integral a, Bounded a) => Parser a Source #

Same with `uint`

, but sliently cast in case of overflow.

int_ :: (Integral a, Bounded a) => Parser a Source #

Same with `int`

, but sliently cast if overflow happens.

## Hex

hex :: forall a. (Integral a, FiniteBits a) => Parser a Source #

Parse and decode an unsigned hex number, fail if input length is larger than (bit_size/4). The hex digits
`'a'`

through `'f'`

may be upper or lower case.

This parser does not accept a leading `"0x"`

string, and consider
sign bit part of the binary hex nibbles, e.g.

`>>>`

`parse' hex "FF" == Right (-1 :: Int8)`

`>>>`

`parse' hex "7F" == Right (127 :: Int8)`

`>>>`

`parse' hex "7Ft" == Right (127 :: Int8)`

`>>>`

`parse' hex "7FF" == Left ["Z.Data.Parser.Numeric.hex","hex numeric number overflow"]`

hex' :: forall a. (Integral a, FiniteBits a) => Parser a Source #

Same with `hex`

, but only take as many as (bit_size/4) bytes.

`>>>`

`parse' hex "FF" == Right (-1 :: Int8)`

`>>>`

`parse' hex "7F" == Right (127 :: Int8)`

`>>>`

`parse' hex "7Ft" == Right (127 :: Int8)`

`>>>`

`parse' hex "7FF" == Right (127 :: Int8)`

hex_ :: (Integral a, Bits a) => Parser a Source #

Same with `hex`

, but silently cast in case of overflow.

`>>>`

`parse' hex "FF" == Right (-1 :: Int8)`

`>>>`

`parse' hex "7F" == Right (127 :: Int8)`

`>>>`

`parse' hex "7Ft" == Right (127 :: Int8)`

`>>>`

`parse' hex "7FF" == Right (-1 :: Int8)`

## Fractional

rational :: Fractional a => Parser a Source #

Parse a rational number.

The syntax accepted by this parser is the same as for `double`

.

*Note*: this parser is not safe for use with inputs from untrusted
sources. An input with a suitably large exponent such as
`"1e1000000000"`

will cause a huge `Integer`

to be allocated,
resulting in what is effectively a denial-of-service attack.

In most cases, it is better to use `double`

or `scientific`

instead.

double :: Parser Double Source #

Parse a rational number and round to `Double`

.

This parser accepts an optional leading sign character, followed by
at least one decimal digit. The syntax similar to that accepted by
the `read`

function, with the exception that a trailing `'.'`

or
`'e'`

*not* followed by a number is not consumed.

Examples with behaviour identical to `read`

:

parse' double "3" == ("", Right 3.0) parse' double "3.1" == ("", Right 3.1) parse' double "3e4" == ("", Right 30000.0) parse' double "3.1e4" == ("", Right 31000.0)

parse' double ".3" == (".3", Left ParserError) parse' double "e3" == ("e3", Left ParserError)

Examples of differences from `read`

:

parse' double "3.foo" == (".foo", Right 3.0) parse' double "3e" == ("e", Right 3.0) parse' double "-3e" == ("e", Right -3.0)

This function does not accept string representations of "NaN" or "Infinity".

scientific :: Parser Scientific Source #

Parse a scientific number.

The syntax accepted by this parser is the same as for `double`

.

scientifically :: (Scientific -> a) -> Parser a Source #

Parse a scientific number and convert to result using a user supply function.

The syntax accepted by this parser is the same as for `double`

.

# Stricter fractional(rfc8259)

rational' :: Fractional a => Parser a Source #

Parse a rational number.

The syntax accepted by this parser is the same as for `double'`

.

*Note*: this parser is not safe for use with inputs from untrusted
sources. An input with a suitably large exponent such as
`"1e1000000000"`

will cause a huge `Integer`

to be allocated,
resulting in what is effectively a denial-of-service attack.

In most cases, it is better to use `double'`

or `scientific'`

instead.

double' :: Parser Double Source #

More strict number parsing(rfc8259).

`scientific`

support parse `2314.`

and `21321exyz`

without eating extra dot or `e`

via
backtrack, this is not allowed in some strict grammer such as JSON, so we make an
non-backtrack strict number parser separately using LL(1) lookahead. This parser also
agree with `read`

on extra dot or e handling:

parse' double "3.foo" == Left ParseError parse' double "3e" == Left ParseError

Leading zeros or `+`

sign is also not allowed:

parse' double "+3.14" == Left ParseError parse' double "0014" == Left ParseError

If you have a similar grammer, you can use this parser to save considerable time.

number = [ minus ] int [ frac ] [ exp ] decimal-point = %x2E ; . digit1-9 = %x31-39 ; 1-9 e = %x65 / %x45 ; e E exp = e [ minus / plus ] 1*DIGIT frac = decimal-point 1*DIGIT

This function does not accept string representations of "NaN" or "Infinity". reference: https://tools.ietf.org/html/rfc8259#section-6

scientific' :: Parser Scientific Source #

Parse a scientific number.

The syntax accepted by this parser is the same as for `double'`

.

scientifically' :: (Scientific -> a) -> Parser a Source #

Parse a scientific number and convert to result using a user supply function.

The syntax accepted by this parser is the same as for `double'`

.

# Time

localTime :: Parser LocalTime Source #

Parse a date and time, of the form `YYYY-MM-DD HH:MM[:SS[.SSS]]`

.
The space may be replaced with a `T`

. The number of seconds is optional
and may be followed by a fractional component.

timeZone :: Parser (Maybe TimeZone) Source #

Parse a time zone, and return `Nothing`

if the offset from UTC is
zero. (This makes some speedups possible.)

utcTime :: Parser UTCTime Source #

Behaves as `zonedTime`

, but converts any time zone offset into a -- UTC time.

zonedTime :: Parser ZonedTime Source #

Parse a date with time zone info. Acceptable formats:

YYYY-MM-DD HH:MM Z YYYY-MM-DD HH:MM:SS Z YYYY-MM-DD HH:MM:SS.SSS Z

The first space may instead be a `T`

, and the second space is
optional. The `Z`

represents UTC. The `Z`

may be replaced with a
time zone offset of the form `+0000`

or `-08:00`

, where the first
two digits are hours, the `:`

is optional and the second two digits
(also optional) are minutes.