flatparse-0.1.0.1: High-performance parsing from strict bytestrings
Safe HaskellNone
LanguageHaskell2010

FlatParse.Stateful

Description

This module implements a Parser supporting a reader environment, custom error types, and an Int state.

Synopsis

Parser types and constructors

newtype Parser r e a Source #

Parser r e a has a reader environment r, an error type e and a return type a.

Constructors

Parser 

Fields

Instances

Instances details
Monad (Parser r e) Source # 
Instance details

Defined in FlatParse.Stateful

Methods

(>>=) :: Parser r e a -> (a -> Parser r e b) -> Parser r e b #

(>>) :: Parser r e a -> Parser r e b -> Parser r e b #

return :: a -> Parser r e a #

Functor (Parser r e) Source # 
Instance details

Defined in FlatParse.Stateful

Methods

fmap :: (a -> b) -> Parser r e a -> Parser r e b #

(<$) :: a -> Parser r e b -> Parser r e a #

Applicative (Parser r e) Source # 
Instance details

Defined in FlatParse.Stateful

Methods

pure :: a -> Parser r e a #

(<*>) :: Parser r e (a -> b) -> Parser r e a -> Parser r e b #

liftA2 :: (a -> b -> c) -> Parser r e a -> Parser r e b -> Parser r e c #

(*>) :: Parser r e a -> Parser r e b -> Parser r e b #

(<*) :: Parser r e a -> Parser r e b -> Parser r e a #

type Res# e a = (# (# a, Addr#, Int# #) | (# #) | (# e #) #) Source #

Primitive result of a parser. Possible results are given by OK#, Err# and Fail# pattern synonyms.

pattern OK# :: a -> Addr# -> Int# -> Res# e a Source #

Contains return value, pointer to the rest of the input buffer and the nex Int state.

pattern Fail# :: Res# e a Source #

Constructor for recoverable failure.

pattern Err# :: e -> Res# e a Source #

Constructor for errors which are by default non-recoverable.

data Result e a Source #

Higher-level boxed data type for parsing results.

Constructors

OK a Int !ByteString

Contains return value, last Int state, unconsumed input.

Fail

Recoverable-by-default failure.

Err !e

Unrecoverble-by-default error.

Instances

Instances details
Functor (Result e) Source # 
Instance details

Defined in FlatParse.Stateful

Methods

fmap :: (a -> b) -> Result e a -> Result e b #

(<$) :: a -> Result e b -> Result e a #

(Show a, Show e) => Show (Result e a) Source # 
Instance details

Defined in FlatParse.Stateful

Methods

showsPrec :: Int -> Result e a -> ShowS #

show :: Result e a -> String #

showList :: [Result e a] -> ShowS #

Running parsers

runParser :: Parser r e a -> r -> Int -> ByteString -> Result e a Source #

Run a parser.

runParserS :: Parser r e a -> r -> Int -> String -> Result e a Source #

Run a parser on a String input. Reminder: OverloadedStrings for ByteString does not yield a valid UTF-8 encoding! For non-ASCII ByteString literal input, use runParserS or packUTF8 for testing.

Actions on the state and the environment

get :: Parser r e Int Source #

Query the Int state.

put :: Int -> Parser r e () Source #

Write the Int state.

modify :: (Int -> Int) -> Parser r e () Source #

Modify the Int state.

ask :: Parser r e r Source #

Query the read-only environment.

local :: (r' -> r) -> Parser r e a -> Parser r' e a Source #

Run a parser in a modified environment.

Errors and failures

empty :: Parser r e a Source #

The failing parser. By default, parser choice (<|>) arbitrarily backtracks on parser failure.

err :: e -> Parser r e a Source #

Throw a parsing error. By default, parser choice (<|>) can't backtrack on parser error. Use try to convert an error to a recoverable failure.

lookahead :: Parser r e a -> Parser r e a Source #

Save the parsing state, then run a parser, then restore the state.

fails :: Parser r e a -> Parser r e () Source #

Convert a parsing failure to a success.

try :: Parser r e a -> Parser r e a Source #

Convert a parsing error into failure.

optional :: Parser r e a -> Parser r e (Maybe a) Source #

Convert a parsing failure to a Maybe. If possible, use optioned instead.

optioned :: Parser r e a -> (a -> Parser r e b) -> Parser r e b -> Parser r e b Source #

CPS'd version of optional. This is usually more efficient, since it gets rid of the extra Maybe allocation.

cut :: Parser r e a -> e -> Parser r e a Source #

Convert a parsing failure to an error.

cutting :: Parser r e a -> e -> (e -> e -> e) -> Parser r e a Source #

Run the parser, if we get a failure, throw the given error, but if we get an error, merge the inner and the newly given errors using the e -> e -> e function. This can be useful for implementing parsing errors which may propagate hints or accummulate contextual information.

Basic lexing and parsing

eof :: Parser r e () Source #

Succeed if the input is empty.

char :: Char -> Q Exp Source #

Parse a UTF-8 character literal. This is a template function, you can use it as $(char 'x'), for example, and the splice in this case has type Parser r e ().

byte :: Word8 -> Parser r e () Source #

Read a Word8.

bytes :: [Word8] -> Q Exp Source #

Read a sequence of bytes. This is a template function, you can use it as $(bytes [3, 4, 5]), for example, and the splice has type Parser r e ().

string :: String -> Q Exp Source #

Parse a UTF-8 string literal. This is a template function, you can use it as $(string "foo"), for example, and the splice has type Parser r e ().

switch :: Q Exp -> Q Exp Source #

This is a template function which makes it possible to branch on a collection of string literals in an efficient way. By using switch, such branching is compiled to a trie of primitive parsing operations, which has optimized control flow, vectorized reads and grouped checking for needed input bytes.

The syntax is slightly magical, it overloads the usual case expression. An example:

    $(switch [| case _ of
        "foo" -> pure True
        "bar" -> pure False |])

The underscore is mandatory in case _ of. Each branch must be a string literal, but optionally we may have a default case, like in

    $(switch [| case _ of
        "foo" -> pure 10
        "bar" -> pure 20
        _     -> pure 30 |])

All case right hand sides must be parsers with the same type. That type is also the type of the whole switch expression.

A switch has longest match semantics, and the order of cases does not matter, except for the default case, which may only appear as the last case.

If a switch does not have a default case, and no case matches the input, then it returns with failure, without having consumed any input. A fallthrough to the default case also does not consume any input.

switchWithPost :: Maybe (Q Exp) -> Q Exp -> Q Exp Source #

Switch expression with an optional first argument for performing a post-processing action after every successful branch matching. For example, if we have ws :: Parser r e () for a whitespace parser, we might want to consume whitespace after matching on any of the switch cases. For that case, we can define a "lexeme" version of switch as follows.

  switch' :: Q Exp -> Q Exp
  switch' = switchWithPost (Just [| ws |])

Note that this switch' function cannot be used in the same module it's defined in, because of the stage restriction of Template Haskell.

rawSwitchWithPost :: Maybe (Q Exp) -> [(String, Q Exp)] -> Maybe (Q Exp) -> Q Exp Source #

Version of switchWithPost without syntactic sugar. The second argument is the list of cases, the third is the default case.

satisfy :: (Char -> Bool) -> Parser r e Char Source #

Parse a UTF-8 Char for which a predicate holds.

satisfyASCII :: (Char -> Bool) -> Parser r e Char Source #

Parse an ASCII Char for which a predicate holds. Assumption: the predicate must only return True for ASCII-range characters. Otherwise this function might read a 128-255 range byte, thereby breaking UTF-8 decoding.

satisfyASCII_ :: (Char -> Bool) -> Parser r e () Source #

Parse an ASCII Char for which a predicate holds.

fusedSatisfy :: (Char -> Bool) -> (Char -> Bool) -> (Char -> Bool) -> (Char -> Bool) -> Parser r e Char Source #

This is a variant of satisfy which allows more optimization. We can pick four testing functions for the four cases for the possible number of bytes in the UTF-8 character. So in fusedSatisfy f1 f2 f3 f4, if we read a one-byte character, the result is scrutinized with f1, for two-bytes, with f2, and so on. This can result in dramatic lexing speedups.

For example, if we want to accept any letter, the naive solution would be to use isLetter, but this accesses a large lookup table of Unicode character classes. We can do better with fusedSatisfy isLatinLetter isLetter isLetter isLetter, since here the isLatinLetter is inlined into the UTF-8 decoding, and it probably handles a great majority of all cases without accessing the character table.

anyWord8 :: Parser r e Word8 Source #

Parse any Word8.

anyWord :: Parser r e Word Source #

Parse any Word.

anyChar :: Parser r e Char Source #

Parse any UTF-8-encoded Char.

anyChar_ :: Parser r e () Source #

Skip any UTF-8-encoded Char.

anyCharASCII :: Parser r e Char Source #

Parse any Char in the ASCII range, fail if the next input character is not in the range. This is more efficient than anyChar if we are only working with ASCII.

anyCharASCII_ :: Parser r e () Source #

Skip any Char in the ASCII range. More efficient than anyChar_ if we're working only with ASCII.

isDigit :: Char -> Bool Source #

isDigit c = '0' <= c && c <= '9'

isGreekLetter :: Char -> Bool Source #

isGreekLetter c = ('Α' <= c && c <= 'Ω') || ('α' <= c && c <= 'ω')

isLatinLetter :: Char -> Bool Source #

isLatinLetter c = ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z')

Combinators

(<|>) :: Parser r e a -> Parser r e a -> Parser r e a infixr 6 Source #

Choose between two parsers. If the first parser fails, try the second one, but if the first one throws an error, propagate the error.

branch :: Parser r e a -> Parser r e b -> Parser r e b -> Parser r e b Source #

Branch on a parser: if the first argument fails, continue with the second, else with the third. This can produce slightly more efficient code than (<|>). Moreover, ḃranch does not backtrack from the true/false cases.

chainl :: (b -> a -> b) -> Parser r e b -> Parser r e a -> Parser r e b Source #

An analogue of the list foldl function: first parse a b, then parse zero or more a-s, and combine the results in a left-nested way by the b -> a -> b function. Note: this is not the usual chainl function from the parsec libraries!

chainr :: (a -> b -> b) -> Parser r e a -> Parser r e b -> Parser r e b Source #

An analogue of the list foldr function: parse zero or more a-s, terminated by a b, and combine the results in a right-nested way using the a -> b -> b function. Note: this is not the usual chainr function from the parsec libraries!

many :: Parser r e a -> Parser r e [a] Source #

Run a parser zero or more times, collect the results in a list. Note: for optimal performance, try to avoid this. Often it is possible to get rid of the intermediate list by using a combinator or a custom parser.

many_ :: Parser r e a -> Parser r e () Source #

Skip a parser zero or more times.

some :: Parser r e a -> Parser r e [a] Source #

Run a parser one or more times, collect the results in a list. Note: for optimal performance, try to avoid this. Often it is possible to get rid of the intermediate list by using a combinator or a custom parser.

some_ :: Parser r e a -> Parser r e () Source #

Skip a parser one or more times.

notFollowedBy :: Parser r e a -> Parser r e b -> Parser r e a Source #

Succeed if the first parser succeeds and the second one fails. The parsing state is restored to the point of the first argument's success.

Positions and spans

newtype Pos Source #

Byte offset counted backwards from the end of the buffer.

Constructors

Pos Int 

Instances

Instances details
Eq Pos Source # 
Instance details

Defined in FlatParse.Stateful

Methods

(==) :: Pos -> Pos -> Bool #

(/=) :: Pos -> Pos -> Bool #

Ord Pos Source # 
Instance details

Defined in FlatParse.Stateful

Methods

compare :: Pos -> Pos -> Ordering #

(<) :: Pos -> Pos -> Bool #

(<=) :: Pos -> Pos -> Bool #

(>) :: Pos -> Pos -> Bool #

(>=) :: Pos -> Pos -> Bool #

max :: Pos -> Pos -> Pos #

min :: Pos -> Pos -> Pos #

Show Pos Source # 
Instance details

Defined in FlatParse.Stateful

Methods

showsPrec :: Int -> Pos -> ShowS #

show :: Pos -> String #

showList :: [Pos] -> ShowS #

data Span Source #

A pair of positions.

Constructors

Span !Pos !Pos 

Instances

Instances details
Eq Span Source # 
Instance details

Defined in FlatParse.Stateful

Methods

(==) :: Span -> Span -> Bool #

(/=) :: Span -> Span -> Bool #

Show Span Source # 
Instance details

Defined in FlatParse.Stateful

Methods

showsPrec :: Int -> Span -> ShowS #

show :: Span -> String #

showList :: [Span] -> ShowS #

getPos :: Parser r e Pos Source #

Get the current position in the input.

setPos :: Pos -> Parser r e () Source #

Set the input position. Warning: this can result in crashes if the position points outside the current buffer. It is always safe to setPos values which came from getPos with the current input.

endPos :: Pos Source #

The end of the input.

spanOf :: Parser r e a -> Parser r e Span Source #

Return the consumed span of a parser. Use spanned if possible for better efficiency.

spanned :: Parser r e a -> (a -> Span -> Parser r e b) -> Parser r e b Source #

Bind the result together with the span of the result. CPS'd version of spanOf for better unboxing.

byteStringOf :: Parser r e a -> Parser r e ByteString Source #

Return the ByteString consumed by a parser. Note: it's more efficient to use spanOf and spanned instead.

byteStringed :: Parser r e a -> (a -> ByteString -> Parser r e b) -> Parser r e b Source #

CPS'd version of byteStringOf. Can be more efficient, because the result is more eagerly unboxed by GHC. It's more efficient to use spanOf or spanned instead.

inSpan :: Span -> Parser r e a -> Parser r e a Source #

Run a parser in a given input span. The input position and the Int state is restored after the parser is finished, so inSpan does not consume input and has no side effect. Warning: this operation may crash if the given span points outside the current parsing buffer. It's always safe to use inSpan if the span comes from a previous spanned or spanOf call on the current input.

Position and span conversions

validPos :: ByteString -> Pos -> Bool Source #

Check whether a Pos points into a ByteString.

posLineCols :: ByteString -> [Pos] -> [(Int, Int)] Source #

Compute corresponding line and column numbers for each Pos in a list. Throw an error on invalid positions. Note: computing lines and columns may traverse the ByteString, but it traverses it only once regardless of the length of the position list.

unsafeSpanToByteString :: Span -> Parser r e ByteString Source #

Create a ByteString from a Span. The result is invalid is the Span points outside the current buffer, or if the Span start is greater than the end position.

mkPos :: ByteString -> (Int, Int) -> Pos Source #

Create a Pos from a line and column number. Throws an error on out-of-bounds line and column numbers.

lines :: ByteString -> [String] Source #

Break an UTF-8-coded ByteString to lines. Throws an error on invalid input. This is mostly useful for grabbing specific source lines for displaying error messages.

Getting the rest of the input

takeLine :: Parser r e String Source #

Parse the rest of the current line as a String. Assumes UTF-8 encoding, throws an error if the encoding is invalid.

traceLine :: Parser r e String Source #

Parse the rest of the current line as a String, but restore the parsing state. Assumes UTF-8 encoding. This can be used for debugging.

takeRest :: Parser r e String Source #

Take the rest of the input as a String. Assumes UTF-8 encoding.

traceRest :: Parser r e String Source #

Get the rest of the input as a String, but restore the parsing state. Assumes UTF-8 encoding. This can be used for debugging.

Internal functions

ensureBytes# :: Int -> Parser r e () Source #

Check that the input has at least the given number of bytes.

scan8# :: Word -> Parser r e () Source #

Unsafely read a concrete byte from the input. It's not checked that the input has enough bytes.

scan16# :: Word -> Parser r e () Source #

Unsafely read two concrete bytes from the input. It's not checked that the input has enough bytes.

scan32# :: Word -> Parser r e () Source #

Unsafely read four concrete bytes from the input. It's not checked that the input has enough bytes.

scan64# :: Word -> Parser r e () Source #

Unsafely read eight concrete bytes from the input. It's not checked that the input has enough bytes.

scanAny8# :: Parser r e Word8 Source #

Unsafely read and return a byte from the input. It's not checked that the input is non-empty.

scanBytes# :: [Word8] -> Q Exp Source #

Template function, creates a Parser r e () which unsafely scans a given sequence of bytes.

setBack# :: Int -> Parser r e () Source #

Decrease the current input position by the given number of bytes.

packUTF8 :: String -> ByteString Source #

Convert a String to an UTF-8-coded ByteString.