streamly-0.8.0: Dataflow programming and declarative concurrency
Copyright    (c) 2020 Composewell Technologies
License      BSD-3-Clause
Maintainer   streamly@composewell.com
Stability    experimental
Portability  GHC
Safe Haskell None
Language     Haskell2010

Streamly.Internal.Data.Parser.ParserD.Type

Description

Streaming and backtracking parsers.

Parsers just extend folds. Please read the Fold design notes in Streamly.Internal.Data.Fold.Type for background on the design.

Parser Design

The Parser type, or a parsing fold, is a generalization of the Fold type. A Fold always succeeds on each input, so it never needs to buffer the input. A Parser, in contrast, may fail and backtrack to replay the input and explore another branch of the parse, so it must buffer the input. A Parser is therefore a fold with some additional requirements. To summarize, unlike a Fold, a Parser:

  1. may not generate a new value of the accumulator on every input; it may generate one only after consuming multiple input elements (e.g. takeEQ).
  2. on success, may return some unconsumed input (e.g. takeWhile).
  3. may fail and return all of the input without consuming it (e.g. satisfy).
  4. may backtrack and start inspecting past input again (e.g. alt).

These use cases require buffering and replaying of input. To facilitate this, the step function of the Fold is augmented to return the next state of the fold along with a command to the fold driver, expressed using a Step functor; the command tells the driver how to manipulate the future input as the parser wishes. The Step functor provides the following commands to the fold driver, corresponding to the use cases outlined above (a simplified driver sketch follows the list):

  1. Continue: buffer the current input and optionally go back to a previous position in the stream.
  2. Partial: buffer the current input and optionally go back to a previous position in the stream, dropping the buffer before that position.
  3. Done: the parser succeeded; return how much input was left over.
  4. Error: the parser failed without producing a result.
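
To make these commands concrete, here is a minimal, list based driver sketch. It is not the library's driver; the name runParserList and its list input are illustrative assumptions, but the interpretation of the four commands follows the descriptions above.

import Control.Monad.Catch (MonadThrow, throwM)

import Streamly.Internal.Data.Parser.ParserD.Type
       (Initial (..), ParseError (..), Parser (..), Step (..))

-- A simplified, list based driver sketch (NOT the library's driver).
runParserList :: MonadThrow m => Parser m a b -> [a] -> m b
runParserList (Parser step initial extract) input = do
    res <- initial
    case res of
        IPartial s -> go s [] input
        IDone b -> return b
        IError err -> throwM (ParseError err)
  where
    -- End of input: fall back to extract, which may throw ParseError.
    go s _ [] = extract s
    -- Feed one element and interpret the returned command. buf holds the
    -- elements fed so far, most recent first, i.e. the backtrack buffer.
    go s buf (x:xs) = do
        r <- step s x
        let buf1 = x : buf
        case r of
            -- Partial n: rewind n elements, drop the rest of the buffer.
            Partial n s1 ->
                let (back, _) = splitAt n buf1
                 in go s1 [] (reverse back ++ xs)
            -- Continue n: rewind n elements, keep the rest for later.
            Continue n s1 ->
                let (back, keep) = splitAt n buf1
                 in go s1 keep (reverse back ++ xs)
            -- Done n: finished; the n leftover elements are simply dropped here.
            Done _ b -> return b
            -- Error: the real driver also throws ParseError in this case.
            Error err -> throwM (ParseError err)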

How a Parser Works

A parser is just like a fold: it keeps consuming input from the stream and accumulating it in an accumulator. The accumulator of the parser could be a singleton value or a collection of values, e.g. a list.

The parser may build a new output value from multiple input items. When it consumes an input item but needs more input to build a complete output value, it uses Continue 0 s, yielding the intermediate state s and asking the driver for more input. When the parser determines that a new output value is complete, it can use Done n b to terminate, leaving n items of input unused and returning the final value of the accumulator as b. If at any time the parser determines that the parse has failed, it can return Error err.
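
As an illustration, here is a hypothetical parser written directly against these constructors (a simplified variant of what the library's takeEQ does; the name takeExactly and its list accumulator are assumptions, and the imports from the driver sketch above apply). It collects exactly n elements, using Continue while it needs more input, Done when the count is reached, and a ParseError from extract if the stream ends too early.

takeExactly :: MonadThrow m => Int -> Parser m a [a]
takeExactly n = Parser step initial extract
  where
    initial
        | n <= 0 = return (IDone [])              -- nothing to do, succeed at once
        | otherwise = return (IPartial (0 :: Int, []))

    step (i, xs) x
        | i + 1 == n = return (Done 0 (reverse (x : xs)))  -- complete, no leftover
        | otherwise = return (Continue 0 (i + 1, x : xs))  -- buffer and ask for more

    extract (i, _) =
        throwM
            $ ParseError
            $ "takeExactly: end of input after " ++ show i
                ++ " of " ++ show n ++ " elements"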

A parser building a collection of values (e.g. a list) can use the Partial constructor whenever a new item of the output collection is generated. Once such a parser has yielded at least one value it is considered successful and cannot fail after that. In the current implementation this is not enforced automatically; instead, there is a rule that after the first Partial the parser MUST terminate only with Done, never with Error. It may be possible to change the implementation so that this rule is not required, but there may be some performance cost to doing so.

takeWhile and some are good examples of combinators that use all the features of this representation efficiently. It is possible to build a collection of parsed items idiomatically using a singleton parser and the Alternative instance instead of a multi-yield parser; however, the multi-yield implementation is amenable to stream fusion and can therefore be much faster.
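
For comparison, here is a hypothetical multi-yield parser sketch (a simplified, list accumulating variant of the idea behind the library's takeWhile; the name takeWhileList is an assumption). Every consumed element is reported with Partial, so after the first element the parser can only terminate with Done, and extract simply returns whatever has been accumulated.

takeWhileList :: Monad m => (a -> Bool) -> Parser m a [a]
takeWhileList predicate = Parser step initial extract
  where
    initial = return (IPartial [])

    step xs x
        | predicate x = return (Partial 0 (x : xs))   -- a valid result so far
        | otherwise = return (Done 1 (reverse xs))    -- current element is unused

    extract xs = return (reverse xs)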

Error Handling

When a parser's step function is invoked it may terminate by either a Done or an Error return value. In an Alternative composition an error return can make the composed parser backtrack and try another parser.

If the stream stops before the parser terminates, we use the parser's extract function to retrieve its last yielded value. If the parser has yielded at least one value then extract MUST return a value without throwing an error; otherwise it throws an error using the ParseError exception.

We chose an exception throwing mechanism for extract, instead of an explicit error return via an Either type, to keep the interface simple, since most of the time we do not need to catch the error in intermediate layers. Note that we cannot use the exception throwing mechanism in the step function for performance reasons: the Error constructor allows loop fusion and therefore better performance.
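
As a usage sketch, assuming the hypothetical runParserList driver and takeExactly parser sketched above, the ParseError thrown by extract can be caught at the top level with Control.Monad.Catch:

import Control.Monad.Catch (try)

-- The input has only 3 elements, so takeExactly 5 cannot finish and the
-- driver's call to extract throws ParseError, which we catch here.
example :: IO ()
example = do
    res <- try (runParserList (takeExactly 5) [1, 2, 3 :: Int])
    case res of
        Left (ParseError err) -> putStrLn ("parse failed: " ++ err)
        Right xs -> print xs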

Future Work

It may make sense to move takeWhile-like parsers, which cannot fail but need some lookahead, to splitting folds. This would allow such combinators to be accepted wherever an unfailing Fold type is required.

Based on application requirements it should be possible to design an even richer interface to manipulate the input stream or buffer. For example, we could seek randomly into the stream, forwards or backwards, or seek relative to the beginning or the end of the stream.

We can distribute and scan/parse a stream using both folds and parsers and merge the resulting streams using different merge strategies (e.g. interleaving or serial).

Documentation

data Initial s b Source #

The type of a Parser's initial action.

Internal

Constructors

IPartial !s

Wait for the step function to be called with state s.

IDone !b

Return a result right away, without consuming any input.

IError String

Return an error right away, without consuming any input.
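
For illustration, here is a sketch of a parser whose initial action uses IDone to succeed without consuming any input. This is how fromPure, documented later on this page, behaves; the name fromPureSketch is an assumption and the real definition may differ.

fromPureSketch :: Monad m => b -> Parser m a b
fromPureSketch b = Parser step initial extract
  where
    initial = return (IDone b)       -- succeed immediately, no input needed
    step () _ = return (Done 1 b)    -- never reached: the driver stops at IDone
    extract () = return b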

Instances

Instances details
Bifunctor Initial Source #

first maps on IPartial and second maps on IDone.

Internal

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

bimap :: (a -> b) -> (c -> d) -> Initial a c -> Initial b d #

first :: (a -> b) -> Initial a c -> Initial b c #

second :: (b -> c) -> Initial a b -> Initial a c #

Functor (Initial s) Source #

Maps a function over the result held by IDone.

fmap = second

Internal

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

fmap :: (a -> b) -> Initial s a -> Initial s b #

(<$) :: a -> Initial s b -> Initial s a #

data Step s b Source #

The return type of a Parser step.

The parse operation feeds the input stream to the parser one element at a time; each invocation of the parser is a parse Step. The parser may or may not consume the item and returns a result. If the result is Partial we can either extract the result or feed more input to the parser. If the result is Continue, we must feed more input in order to get a result. If the parser returns Done then it can no longer take any more input.

If the result is Continue, the parse operation retains the input in a backtracking buffer, in case the parser asks to backtrack in the future. Whenever a Partial n result is returned we first backtrack by n elements in the input and then release any remaining backtracking buffer. Similarly, Continue n backtracks to n elements before the current position and starts feeding the input from that point for future invocations of the parser.

If the parser is not yet done, we can use the extract operation on the state of the parser to extract a result. If the parser has not yet yielded a result, the operation fails with a ParseError exception. If the parser yielded a Partial result in the past, the last partial result is returned. Therefore, once a parser yields a partial result it cannot fail later on.

The parser can never backtrack beyond the position where the last partial result left it; it must ensure that the backtrack position is always after that point.

Pre-release

Constructors

Partial Int s

Partial result with an optional backtrack request.

Partial count state means a partial result is available which can be extracted successfully; state is the opaque state of the parser to be supplied to the next invocation of the step operation. The current input position is reset to count elements back and any input before that is dropped from the backtrack buffer.

Continue Int s

Need more input with an optional backtrack request.

Continue count state means the parser has consumed the current input but no new result is generated; state is the next state of the parser. The current input is retained in the backtrack buffer and the input position is reset to count elements back.

Done Int b

Done with leftover input count and result.

Done count result means the parser has finished and will accept no more input; the last count elements of the input are unused and the result of the parser is in result.

Error String

Parser failed without generating any output.

The parsing operation may backtrack to the beginning and try another alternative.

Instances

Instances details
Functor (Step s) Source # 
Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

fmap :: (a -> b) -> Step s a -> Step s b #

(<$) :: a -> Step s b -> Step s a #

data Parser m a b Source #

A parser is a fold that can fail, represented as Parser step initial extract. Before we drive a parser we call the initial action to retrieve the initial state of the fold. The parser driver invokes step with the state returned by the previous step and the next input element. This results in a new state and a command to the driver, represented by the Step type. The driver keeps invoking the step function until it stops or fails. At any point in time the driver can call extract to inspect the result of the fold. It may result in an error or an output value.

Pre-release

Constructors

forall s. Parser (s -> a -> m (Step s b)) (m (Initial s b)) (s -> m b) 

Instances

Instances details
MonadThrow m => Monad (Parser m a) Source #

See documentation of Parser.
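
For example, parsers compose sequentially with do notation (a sketch, assuming the hypothetical takeExactly parser from the module description above):

-- Parse a length prefix, then exactly that many following elements.
lengthPrefixed :: MonadThrow m => Parser m Int [Int]
lengthPrefixed = do
    prefix <- takeExactly 1      -- a single element giving the count
    takeExactly (sum prefix)     -- the sum of the singleton list is that count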

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

(>>=) :: Parser m a a0 -> (a0 -> Parser m a b) -> Parser m a b #

(>>) :: Parser m a a0 -> Parser m a b -> Parser m a b #

return :: a0 -> Parser m a a0 #

Functor m => Functor (Parser m a) Source # 
Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

fmap :: (a0 -> b) -> Parser m a a0 -> Parser m a b #

(<$) :: a0 -> Parser m a b -> Parser m a a0 #

MonadThrow m => Applicative (Parser m a) Source #

Applicative form of serialWith.
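
For example (a sketch, again assuming the hypothetical takeExactly parser from the module description):

-- Collect two elements and then three elements as a pair of lists.
pairOf :: MonadThrow m => Parser m Int ([Int], [Int])
pairOf = (,) <$> takeExactly 2 <*> takeExactly 3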

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

pure :: a0 -> Parser m a a0 #

(<*>) :: Parser m a (a0 -> b) -> Parser m a a0 -> Parser m a b #

liftA2 :: (a0 -> b -> c) -> Parser m a a0 -> Parser m a b -> Parser m a c #

(*>) :: Parser m a a0 -> Parser m a b -> Parser m a b #

(<*) :: Parser m a a0 -> Parser m a b -> Parser m a a0 #

MonadCatch m => Alternative (Parser m a) Source #

Alternative instance using alt.

Note: The implementation of <|> is not lazy in the second argument. The following code will fail:

>>> Stream.parse (Parser.satisfy (> 0) <|> undefined) $ Stream.fromList [1..10]
*** Exception: Prelude.undefined
...
Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

empty :: Parser m a a0 #

(<|>) :: Parser m a a0 -> Parser m a a0 -> Parser m a a0 #

some :: Parser m a a0 -> Parser m a [a0] #

many :: Parser m a a0 -> Parser m a [a0] #

MonadCatch m => MonadPlus (Parser m a) Source #

See documentation of Parser.

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

mzero :: Parser m a a0 #

mplus :: Parser m a a0 -> Parser m a a0 -> Parser m a a0 #

newtype ParseError Source #

This exception is used for two purposes:

  • When a parser ultimately fails, the user of the parser is notified via this exception.
  • When the "extract" function of a parser needs to throw an error.

Pre-release

Constructors

ParseError String 

rmapM :: Monad m => (b -> m c) -> Parser m a b -> Parser m a c Source #

Map a monadic function on the output of a parser.

Pre-release
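
For example (a sketch, assuming the hypothetical takeExactly parser from the module description above):

-- Return only the number of elements parsed, not the elements themselves.
countOf :: MonadThrow m => Int -> Parser m Int Int
countOf n = rmapM (return . length) (takeExactly n)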

fromPure :: Monad m => b -> Parser m a b Source #

See fromPure.

Pre-release

fromEffect :: Monad m => m b -> Parser m a b Source #

See fromEffect.

Pre-release

serialWith :: MonadThrow m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #

See serialWith.

Note: this implementation of serialWith is fast because of stream fusion, but it has quadratic time complexity, because each composition adds a new branch that every subsequent input element has to go through; therefore it cannot scale to a large number of compositions. After around 100 compositions the performance starts dipping rapidly below that of a CPS style unfused implementation.

Pre-release

split_ :: MonadThrow m => Parser m x a -> Parser m x b -> Parser m x b Source #

See split_.

Pre-release

die :: MonadThrow m => String -> Parser m a b Source #

See die.

Pre-release

dieM :: MonadThrow m => m String -> Parser m a b Source #

See dieM.

Pre-release

splitSome :: MonadCatch m => Parser m a b -> Fold m b c -> Parser m a c Source #

See documentation of some.

Pre-release

splitMany :: MonadCatch m => Parser m a b -> Fold m b c -> Parser m a c Source #

See documentation of many.

Pre-release

splitManyPost :: MonadCatch m => Parser m a b -> Fold m b c -> Parser m a c Source #

Like splitMany, but the inner fold emits an output at the end even if it receives no input.

Internal

alt :: Monad m => Parser m x a -> Parser m x a -> Parser m x a Source #

See alt.

Pre-release

concatMap :: MonadThrow m => (b -> Parser m a c) -> Parser m a b -> Parser m a c Source #

See concatMap.

Pre-release

noErrorUnsafeSplit_ :: MonadThrow m => Parser m x a -> Parser m x b -> Parser m x b Source #

noErrorUnsafeSplitWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #

Works correctly only if the first parser is guaranteed to never fail.

noErrorUnsafeConcatMap :: MonadThrow m => (b -> Parser m a c) -> Parser m a b -> Parser m a c Source #