Copyright	(c) Dong Han 2017-2018
License	BSD
Maintainer	winterland1989@gmail.com
Stability	experimental
Portability	non-portable
Safe Haskell	None
Language	Haskell2010

Z.Data.Parser

Contents

Parser types
Running a parser
Basic parsers
Primitive decoders
More parsers
Numeric parsers
Stricter fractional(rfc8259)
Time
Misc

Description

This module provide a simple resumable Parser, which is suitable for binary protocol and simple textual protocol parsing. Both binary parsers (decodePrim ,etc) and textual parsers are provided, and they all work on Bytes.

You can use Alternative instance to do backtracking, each branch will either succeed and may consume some input, or fail without consume anything. It's recommend to use peek or peekMaybe to avoid backtracking if possible to get high performance.

Error message can be attached using <?>, which have very small overhead, so it's recommended to attach a message in front of a composed parser like xPacket = "Foo.Bar.xPacket" ? do ..., following is an example message when parsing an integer failed:

    >parse int "foo"
    ([102,111,111],Left ["Z.Data.Parser.Numeric.int","Std.Data.Parser.Base.takeWhile1: no satisfied byte"])
    -- It's easy to see we're trying to match a leading sign or digit here

Use parser-combinators to get combinators based on Applicative or Monad instance, such as manyTill, sepBy, etc.

Synopsis

data Result a
- = Success a !Bytes
- | Failure ParseError !Bytes
- | Partial (ParseStep a)
type ParseError = [Text]
data Parser a
(<?>) :: Text -> Parser a -> Parser a
parse :: Parser a -> Bytes -> (Bytes, Either ParseError a)
parse' :: Parser a -> Bytes -> Either ParseError a
parseChunk :: Parser a -> Bytes -> Result a
type ParseChunks m chunk err x = m chunk -> chunk -> m (chunk, Either err x)
parseChunks :: Monad m => Parser a -> ParseChunks m Bytes ParseError a
finishParsing :: Result a -> (Bytes, Either ParseError a)
runAndKeepTrack :: Parser a -> Parser (Result a, [Bytes])
match :: Parser a -> Parser (Bytes, a)
ensureN :: Int -> ParseError -> Parser ()
endOfInput :: Parser ()
atEnd :: Parser Bool
decodePrim :: forall a. Unaligned a => Parser a
decodePrimLE :: forall a. Unaligned (LE a) => Parser a
decodePrimBE :: forall a. Unaligned (BE a) => Parser a
scan :: s -> (s -> Word8 -> Maybe s) -> Parser (Bytes, s)
scanChunks :: forall s. s -> (s -> Bytes -> Either s (Bytes, Bytes, s)) -> Parser (Bytes, s)
peekMaybe :: Parser (Maybe Word8)
peek :: Parser Word8
satisfy :: (Word8 -> Bool) -> Parser Word8
satisfyWith :: (Word8 -> a) -> (a -> Bool) -> Parser a
anyWord8 :: Parser Word8
word8 :: Word8 -> Parser ()
anyChar8 :: Parser Char
char8 :: Char -> Parser ()
skipWord8 :: Parser ()
endOfLine :: Parser ()
skip :: Int -> Parser ()
skipWhile :: (Word8 -> Bool) -> Parser ()
skipSpaces :: Parser ()
take :: Int -> Parser Bytes
takeTill :: (Word8 -> Bool) -> Parser Bytes
takeWhile :: (Word8 -> Bool) -> Parser Bytes
takeWhile1 :: (Word8 -> Bool) -> Parser Bytes
takeRemaining :: Parser Bytes
bytes :: Bytes -> Parser ()
bytesCI :: Bytes -> Parser ()
text :: Text -> Parser ()
uint :: forall a. (Integral a, Bounded a) => Parser a
int :: forall a. (Integral a, Bounded a) => Parser a
integer :: Parser Integer
uint_ :: forall a. (Integral a, Bounded a) => Parser a
int_ :: (Integral a, Bounded a) => Parser a
digit :: Parser Int
hex :: forall a. (Integral a, FiniteBits a) => Parser a
hex' :: forall a. (Integral a, FiniteBits a) => Parser a
hex_ :: (Integral a, Bits a) => Parser a
rational :: Fractional a => Parser a
float :: Parser Float
double :: Parser Double
scientific :: Parser Scientific
scientifically :: (Scientific -> a) -> Parser a
rational' :: Fractional a => Parser a
float' :: Parser Float
double' :: Parser Double
scientific' :: Parser Scientific
scientifically' :: (Scientific -> a) -> Parser a
day :: Parser Day
localTime :: Parser LocalTime
timeOfDay :: Parser TimeOfDay
timeZone :: Parser (Maybe TimeZone)
utcTime :: Parser UTCTime
zonedTime :: Parser ZonedTime
fail' :: Text -> Parser a

Parser types

data Result a Source #

Simple parsing result, that represent respectively:

Success: the remaining unparsed data and the parsed value
Failure: the remaining unparsed data and the error message
Partial: that need for more input data, supply empty bytes to indicate endOfInput

Constructors

Success a !Bytes
Failure ParseError !Bytes
Partial (ParseStep a)

Instances

Instances details

Functor Result Source #
Instance details Defined in Z.Data.Parser.Base Methods fmap :: (a -> b) -> Result a -> Result b # (<$) :: a -> Result b -> Result a #
Show a => Show (Result a) Source #
Instance details Defined in Z.Data.Parser.Base Methods showsPrec :: Int -> Result a -> ShowS # show :: Result a -> String # showList :: [Result a] -> ShowS #

type ParseError = [Text] Source #

Type alias for error message

data Parser a Source #

Simple CPSed parser

A parser takes a failure continuation, and a success one, while the success continuation is usually composed by Monad instance, the failure one is more like a reader part, which can be modified via <?>. If you build parsers from ground, a pattern like this can be used:

   xxParser = do
     ensureN errMsg ...            -- make sure we have some bytes
     Parser $  kf k inp ->        -- fail continuation, success continuation and input
       ...
       ... kf errMsg (if input not OK)
       ... k ... (if we get something useful for next parser)

Instances

Instances details

Monad Parser Source #
Instance details Defined in Z.Data.Parser.Base Methods (>>=) :: Parser a -> (a -> Parser b) -> Parser b # (>>) :: Parser a -> Parser b -> Parser b # return :: a -> Parser a #
Functor Parser Source #
Instance details Defined in Z.Data.Parser.Base Methods fmap :: (a -> b) -> Parser a -> Parser b # (<$) :: a -> Parser b -> Parser a #
MonadFail Parser Source #
Instance details Defined in Z.Data.Parser.Base Methods fail :: String -> Parser a #
Applicative Parser Source #
Instance details Defined in Z.Data.Parser.Base Methods pure :: a -> Parser a # (<>) :: Parser (a -> b) -> Parser a -> Parser b # liftA2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c # (>) :: Parser a -> Parser b -> Parser b # (<*) :: Parser a -> Parser b -> Parser a #
Alternative Parser Source #
Instance details Defined in Z.Data.Parser.Base Methods empty :: Parser a # (<\|>) :: Parser a -> Parser a -> Parser a # some :: Parser a -> Parser [a] # many :: Parser a -> Parser [a] #
MonadPlus Parser Source #
Instance details Defined in Z.Data.Parser.Base Methods mzero :: Parser a # mplus :: Parser a -> Parser a -> Parser a #

(<?>) :: Text -> Parser a -> Parser a infixr 0 Source #

Running a parser

parse :: Parser a -> Bytes -> (Bytes, Either ParseError a) Source #

Parse the complete input, without resupplying, return the rest bytes

parse' :: Parser a -> Bytes -> Either ParseError a Source #

Parse the complete input, without resupplying

parseChunk :: Parser a -> Bytes -> Result a Source #

Parse an input chunk

type ParseChunks m chunk err x = m chunk -> chunk -> m (chunk, Either err x) Source #

Type alias for a streaming parser, draw chunk from Monad m (with a initial chunk), return result in Either err x.

parseChunks :: Monad m => Parser a -> ParseChunks m Bytes ParseError a Source #

Run a parser with an initial input string, and a monadic action that can supply more input if needed.

Note, once the monadic action return empty bytes, parsers will stop drawing more bytes (take it as endOfInput).

finishParsing :: Result a -> (Bytes, Either ParseError a) Source #

Finish parsing and fetch result, feed empty bytes if it's Partial result.

runAndKeepTrack :: Parser a -> Parser (Result a, [Bytes]) Source #

Run a parser and keep track of all the input chunks it consumes. Once it's finished, return the final result (always Success or Failure) and all consumed chunks.

match :: Parser a -> Parser (Bytes, a) Source #

Return both the result of a parse and the portion of the input that was consumed while it was being parsed.

Basic parsers

ensureN :: Int -> ParseError -> Parser () Source #

Ensure that there are at least n bytes available. If not, the computation will escape with Partial.

Since this parser is used in many other parsers, an extra error param is provide to attach custom error info.

endOfInput :: Parser () Source #

Test whether all input has been consumed, i.e. there are no remaining undecoded bytes. Fail if not atEnd.

atEnd :: Parser Bool Source #

Test whether all input has been consumed, i.e. there are no remaining undecoded bytes.

Primitive decoders

decodePrim :: forall a. Unaligned a => Parser a Source #

decodePrimLE :: forall a. Unaligned (LE a) => Parser a Source #

decodePrimBE :: forall a. Unaligned (BE a) => Parser a Source #

More parsers

scan :: s -> (s -> Word8 -> Maybe s) -> Parser (Bytes, s) Source #

A stateful scanner. The predicate consumes and transforms a state argument, and each transformed state is passed to successive invocations of the predicate on each byte of the input until one returns Nothing or the input ends.

This parser does not fail. It will return an empty string if the predicate returns Nothing on the first byte of input.

scanChunks :: forall s. s -> (s -> Bytes -> Either s (Bytes, Bytes, s)) -> Parser (Bytes, s) Source #

Similar to scan, but working on Bytes chunks, The predicate consumes a Bytes chunk and transforms a state argument, and each transformed state is passed to successive invocations of the predicate on each chunk of the input until one chunk got splited to Right (V.Bytes, V.Bytes) or the input ends.

peekMaybe :: Parser (Maybe Word8) Source #

Match any byte, to perform lookahead. Returns Nothing if end of input has been reached. Does not consume any input.

peek :: Parser Word8 Source #

Match any byte, to perform lookahead. Does not consume any input, but will fail if end of input has been reached.

satisfy :: (Word8 -> Bool) -> Parser Word8 Source #

The parser satisfy p succeeds for any byte for which the predicate p returns True. Returns the byte that is actually parsed.

digit = satisfy isDigit
    where isDigit w = w >= 48 && w <= 57

satisfyWith :: (Word8 -> a) -> (a -> Bool) -> Parser a Source #

The parser satisfyWith f p transforms a byte, and succeeds if the predicate p returns True on the transformed value. The parser returns the transformed byte that was parsed.

anyWord8 :: Parser Word8 Source #

Return a byte, this is an alias to decodePrim Word8@.

word8 :: Word8 -> Parser () Source #

Match a specific byte.

anyChar8 :: Parser Char Source #

Take a byte and return as a 8bit char.

char8 :: Char -> Parser () Source #

Match a specific 8bit char.

skipWord8 :: Parser () Source #

Skip a byte.

endOfLine :: Parser () Source #

Match either a single newline byte '\n', or a carriage return followed by a newline byte "\r\n".

skip :: Int -> Parser () Source #

skip N bytes.

skipWhile :: (Word8 -> Bool) -> Parser () Source #

Skip past input for as long as the predicate returns True.

skipSpaces :: Parser () Source #

Skip over white space using isSpace.

take :: Int -> Parser Bytes Source #

takeTill :: (Word8 -> Bool) -> Parser Bytes Source #

Consume input as long as the predicate returns False or reach the end of input, and return the consumed input.

takeWhile :: (Word8 -> Bool) -> Parser Bytes Source #

Consume input as long as the predicate returns True or reach the end of input, and return the consumed input.

takeWhile1 :: (Word8 -> Bool) -> Parser Bytes Source #

Similar to takeWhile, but requires the predicate to succeed on at least one byte of input: it will fail if the predicate never returns True or reach the end of input

takeRemaining :: Parser Bytes Source #

Take all the remaining input chunks and return as Bytes.

bytes :: Bytes -> Parser () Source #

bytes s parses a sequence of bytes that identically match s.

bytesCI :: Bytes -> Parser () Source #

Same as bytes but ignoring case.

text :: Text -> Parser () Source #

text s parses a sequence of UTF8 bytes that identically match s.

Numeric parsers

Decimal

uint :: forall a. (Integral a, Bounded a) => Parser a Source #

Parse and decode an unsigned decimal number.

Will fail in case of overflow.

int :: forall a. (Integral a, Bounded a) => Parser a Source #

Parse a decimal number with an optional leading '+' or '-' sign character.

This parser will fail if overflow happens.

integer :: Parser Integer Source #

Parser specifically optimized for Integer.

uint_ :: forall a. (Integral a, Bounded a) => Parser a Source #

Same with uint, but sliently cast in case of overflow.

int_ :: (Integral a, Bounded a) => Parser a Source #

Same with int, but sliently cast if overflow happens.

digit :: Parser Int Source #

Take a single decimal digit and return as Int.

Hex

hex :: forall a. (Integral a, FiniteBits a) => Parser a Source #

Parse and decode an unsigned hex number, fail if input length is larger than (bit_size/4). The hex digits 'a' through 'f' may be upper or lower case.

This parser does not accept a leading "0x" string, and consider sign bit part of the binary hex nibbles, e.g.

>>> parse' hex "FF" == Right (-1 :: Int8)
>>> parse' hex "7F" == Right (127 :: Int8)
>>> parse' hex "7Ft" == Right (127 :: Int8)
>>> parse' hex "7FF" == Left ["Z.Data.Parser.Numeric.hex","hex numeric number overflow"]

hex' :: forall a. (Integral a, FiniteBits a) => Parser a Source #

Same with hex, but only take as many as (bit_size/4) bytes.

>>> parse' hex "FF" == Right (-1 :: Int8)
>>> parse' hex "7F" == Right (127 :: Int8)
>>> parse' hex "7Ft" == Right (127 :: Int8)
>>> parse' hex "7FF" == Right (127 :: Int8)

hex_ :: (Integral a, Bits a) => Parser a Source #

Same with hex, but silently cast in case of overflow.

>>> parse' hex "FF" == Right (-1 :: Int8)
>>> parse' hex "7F" == Right (127 :: Int8)
>>> parse' hex "7Ft" == Right (127 :: Int8)
>>> parse' hex "7FF" == Right (-1 :: Int8)

Fractional

rational :: Fractional a => Parser a Source #

Parse a rational number.

The syntax accepted by this parser is the same as for double.

Note: this parser is not safe for use with inputs from untrusted sources. An input with a suitably large exponent such as "1e1000000000" will cause a huge Integer to be allocated, resulting in what is effectively a denial-of-service attack.

In most cases, it is better to use double or scientific instead.

float :: Parser Float Source #

Parse a rational number and round to Float.

Single precision version of double.

double :: Parser Double Source #

Parse a rational number and round to Double.

This parser accepts an optional leading sign character, followed by at least one decimal digit. The syntax similar to that accepted by the read function, with the exception that a trailing '.' or 'e' not followed by a number is not consumed.

Examples with behaviour identical to read:

parse' double "3"     == ("", Right 3.0)
parse' double "3.1"   == ("", Right 3.1)
parse' double "3e4"   == ("", Right 30000.0)
parse' double "3.1e4" == ("", Right 31000.0)

parse' double ".3"    == (".3", Left ParserError)
parse' double "e3"    == ("e3", Left ParserError)

Examples of differences from read:

parse' double "3.foo" == (".foo", Right 3.0)
parse' double "3e"    == ("e",    Right 3.0)
parse' double "-3e"   == ("e",    Right -3.0)

This function does not accept string representations of "NaN" or "Infinity".

scientific :: Parser Scientific Source #

Parse a scientific number.

The syntax accepted by this parser is the same as for double.

scientifically :: (Scientific -> a) -> Parser a Source #

Parse a scientific number and convert to result using a user supply function.

The syntax accepted by this parser is the same as for double.

Stricter fractional(rfc8259)

rational' :: Fractional a => Parser a Source #

Parse a rational number.

The syntax accepted by this parser is the same as for double'.

Note: this parser is not safe for use with inputs from untrusted sources. An input with a suitably large exponent such as "1e1000000000" will cause a huge Integer to be allocated, resulting in what is effectively a denial-of-service attack.

In most cases, it is better to use double' or scientific' instead.

float' :: Parser Float Source #

Parse a rational number and round to Float using stricter grammer.

Single precision version of double'.

double' :: Parser Double Source #

More strict number parsing(rfc8259).

scientific support parse 2314. and 21321exyz without eating extra dot or e via backtrack, this is not allowed in some strict grammer such as JSON, so we make an non-backtrack strict number parser separately using LL(1) lookahead. This parser also agree with read on extra dot or e handling:

parse' double "3.foo" == Left ParseError
parse' double "3e"    == Left ParseError

Leading zeros or + sign is also not allowed:

parse' double "+3.14" == Left ParseError
parse' double "0014" == Left ParseError

If you have a similar grammer, you can use this parser to save considerable time.

     number = [ minus ] int [ frac ] [ exp ]
     decimal-point = %x2E       ; .
     digit1-9 = %x31-39         ; 1-9
     e = %x65 / %x45            ; e E
     exp = e [ minus / plus ] 1*DIGIT
     frac = decimal-point 1*DIGIT

This function does not accept string representations of "NaN" or "Infinity". reference: https://tools.ietf.org/html/rfc8259#section-6

scientific' :: Parser Scientific Source #

Parse a scientific number.

The syntax accepted by this parser is the same as for double'.

scientifically' :: (Scientific -> a) -> Parser a Source #

Parse a scientific number and convert to result using a user supply function.

The syntax accepted by this parser is the same as for double'.

Time

day :: Parser Day Source #

Parse a date of the form [+,-]YYYY-MM-DD.

localTime :: Parser LocalTime Source #

Parse a date and time, of the form YYYY-MM-DD HH:MM[:SS[.SSS]]. The space may be replaced with a T. The number of seconds is optional and may be followed by a fractional component.

timeOfDay :: Parser TimeOfDay Source #

Parse a time of the form HH:MM[:SS[.SSS]].

timeZone :: Parser (Maybe TimeZone) Source #

Parse a time zone, and return Nothing if the offset from UTC is zero. (This makes some speedups possible.)

utcTime :: Parser UTCTime Source #

Behaves as zonedTime, but converts any time zone offset into a -- UTC time.

zonedTime :: Parser ZonedTime Source #

Parse a date with time zone info. Acceptable formats:

  YYYY-MM-DD HH:MM Z
  YYYY-MM-DD HH:MM:SS Z
  YYYY-MM-DD HH:MM:SS.SSS Z

The first space may instead be a T, and the second space is optional. The Z represents UTC. The Z may be replaced with a time zone offset of the form +0000 or -08:00, where the first two digits are hours, the : is optional and the second two digits (also optional) are minutes.

Misc

fail' :: Text -> Parser a Source #

Text version of fail.