Copyright | (c) 2020 Composewell Technologies |
---|---|
License | BSD-3-Clause |
Maintainer | streamly@composewell.com |
Stability | experimental |
Portability | GHC |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Streaming and backtracking parsers.
Parsers just extend folds. Please read the Fold
design notes in
Streamly.Internal.Data.Fold.Type for background on the design.
Parser Design
The Parser
type or a parsing fold is a generalization of the Fold
type.
The Fold
type always succeeds on each input. Therefore, it does not need
to buffer the input. In contrast, a Parser
may fail and backtrack to
replay the input again to explore another branch of the parser. Therefore,
it needs to buffer the input. Therefore, a Parser
is a fold with some
additional requirements. To summarize, unlike a Fold
, a Parser
:
- may not generate a new value of the accumulator on every input, it may generate a new accumulator only after consuming multiple input elements (e.g. takeEQ).
- on success may return some unconsumed input (e.g. takeWhile)
- may fail and return all input without consuming it (e.g. satisfy)
- backtrack and start inspecting the past input again (e.g. alt)
These use cases require buffering and replaying of input. To facilitate
this, the step function of the Fold
is augmented to return the next state
of the fold along with a command tag using a Step
functor, the tag tells
the fold driver to manipulate the future input as the parser wishes. The
Step
functor provides the following commands to the fold driver
corresponding to the use cases outlined in the previous para:
Continue
: buffer the current input and optionally go back to a previous position in the streamPartial
: buffer the current input and optionally go back to a previous position in the stream, drop the buffer before that position.Done
: parser succeeded, returns how much input was leftoverError
: indicates that the parser has failed without a result
How a Parser Works?
A parser is just like a fold, it keeps consuming inputs from the stream and accumulating them in an accumulator. The accumulator of the parser could be a singleton value or it could be a collection of values e.g. a list.
The parser may build a new output value from multiple input items. When it
consumes an input item but needs more input to build a complete output item
it uses Continue 0 s
, yielding the intermediate state s
and asking the
driver to provide more input. When the parser determines that a new output
value is complete it can use a Done n b
to terminate the parser with n
items of input unused and the final value of the accumulator returned as
b
. If at any time the parser determines that the parse has failed it can
return Error err
.
A parser building a collection of values (e.g. a list) can use the Partial
constructor whenever a new item in the output collection is generated. If a
parser building a collection of values has yielded at least one value then
it considered successful and cannot fail after that. In the current
implementation, this is not automatically enforced, there is a rule that the
parser MUST use only Done
for termination after the first Partial
, it
cannot use Error
. It may be possible to change the implementation so that
this rule is not required, but there may be some performance cost to it.
takeWhile
and
some
combinators are good examples of
efficient implementations using all features of this representation. It is
possible to idiomatically build a collection of parsed items using a
singleton parser and Alternative
instance instead of using a
multi-yield parser. However, this implementation is amenable to stream
fusion and can therefore be much faster.
Error Handling
When a parser's step
function is invoked it may terminate by either a
Done
or an Error
return value. In an Alternative
composition an error
return can make the composed parser backtrack and try another parser.
If the stream stops before a parser could terminate then we use the
extract
function of the parser to retrieve the last yielded value of the
parser. If the parser has yielded at least one value then extract
MUST
return a value without throwing an error, otherwise it uses the ParseError
exception to throw an error.
We chose the exception throwing mechanism for extract
instead of using an
explicit error return via an Either
type for keeping the interface simple
as most of the time we do not need to catch the error in intermediate
layers. Note that we cannot use exception throwing mechanism in step
function because of performance reasons. Error
constructor in that case
allows loop fusion and better performance.
Future Work
It may make sense to move "takeWhile" type of parsers, which cannot fail but need some lookahead, to splitting folds. This will allow such combinators to be accepted where we need an unfailing Fold type.
Based on application requirements it should be possible to design even a richer interface to manipulate the input stream/buffer. For example, we could randomly seek into the stream in the forward or reverse directions or we can even seek to the end or from the end or seek from the beginning.
We can distribute and scan/parse a stream using both folds and parsers and merge the resulting streams using different merge strategies (e.g. interleaving or serial).
Synopsis
- data Initial s b
- data Step s b
- data Parser m a b = forall s. Parser (s -> a -> m (Step s b)) (m (Initial s b)) (s -> m b)
- newtype ParseError = ParseError String
- rmapM :: Monad m => (b -> m c) -> Parser m a b -> Parser m a c
- parser :: Monad m => (b -> a -> Step b b) -> Initial b b -> Parser m a b
- parserM :: Monad m => (b -> a -> m (Step b b)) -> m (Initial b b) -> Parser m a b
- fromPure :: Monad m => b -> Parser m a b
- fromEffect :: Monad m => m b -> Parser m a b
- serialWith :: MonadThrow m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c
- split_ :: MonadThrow m => Parser m x a -> Parser m x b -> Parser m x b
- die :: MonadThrow m => String -> Parser m a b
- dieM :: MonadThrow m => m String -> Parser m a b
- splitSome :: MonadCatch m => Parser m a b -> Fold m b c -> Parser m a c
- splitMany :: MonadCatch m => Parser m a b -> Fold m b c -> Parser m a c
- splitManyPost :: MonadCatch m => Parser m a b -> Fold m b c -> Parser m a c
- alt :: Monad m => Parser m x a -> Parser m x a -> Parser m x a
- concatMap :: MonadThrow m => (b -> Parser m a c) -> Parser m a b -> Parser m a c
- noErrorUnsafeSplit_ :: MonadThrow m => Parser m x a -> Parser m x b -> Parser m x b
- noErrorUnsafeSplitWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c
- noErrorUnsafeConcatMap :: MonadThrow m => (b -> Parser m a c) -> Parser m a b -> Parser m a c
Types
The type of a Parser'
s initial action.
Internal
IPartial !s | Wait for step function to be called with state |
IDone !b | Return a result right away without an input. |
IError String | Return an error right away without an input. |
The return type of a Parser
step.
The parse operation feeds the input stream to the parser one element at a
time, representing a parse Step
. The parser may or may not consume the
item and returns a result. If the result is Partial
we can either extract
the result or feed more input to the parser. If the result is Continue
, we
must feed more input in order to get a result. If the parser returns Done
then the parser can no longer take any more input.
If the result is Continue
, the parse operation retains the input in a
backtracking buffer, in case the parser may ask to backtrack in future.
Whenever a 'Partial n' result is returned we first backtrack by n
elements
in the input and then release any remaining backtracking buffer. Similarly,
'Continue n' backtracks to n
elements before the current position and
starts feeding the input from that point for future invocations of the
parser.
If parser is not yet done, we can use the extract
operation on the state
of the parser to extract a result. If the parser has not yet yielded a
result, the operation fails with a ParseError
exception. If the parser
yielded a Partial
result in the past the last partial result is returned.
Therefore, if a parser yields a partial result once it cannot fail later on.
The parser can never backtrack beyond the position where the last partial result left it at. The parser must ensure that the backtrack position is always after that.
Pre-release
Partial Int s | Partial result with an optional backtrack request.
|
Continue Int s | Need more input with an optional backtrack request.
|
Done Int b | Done with leftover input count and result.
|
Error String | Parser failed without generating any output. The parsing operation may backtrack to the beginning and try another alternative. |
A parser is a fold that can fail and is represented as Parser step
initial extract
. Before we drive a parser we call the initial
action to
retrieve the initial state of the fold. The parser driver invokes step
with the state returned by the previous step and the next input element. It
results into a new state and a command to the driver represented by Step
type. The driver keeps invoking the step function until it stops or fails.
At any point of time the driver can call extract
to inspect the result of
the fold. If the parser hits the end of input extract
is called.
It may result in an error or an output value.
Pre-release
Instances
(MonadThrow m, MonadReader r m, MonadCatch m) => MonadReader r (Parser m a) Source # | |
(MonadThrow m, MonadState s m) => MonadState s (Parser m a) Source # | |
(MonadThrow m, MonadIO m) => MonadIO (Parser m a) Source # | |
Defined in Streamly.Internal.Data.Parser.ParserD.Type | |
MonadCatch m => Alternative (Parser m a) Source # |
Note: The implementation of
|
MonadThrow m => Applicative (Parser m a) Source # |
|
Defined in Streamly.Internal.Data.Parser.ParserD.Type | |
Functor m => Functor (Parser m a) Source # | |
MonadThrow m => Monad (Parser m a) Source # | See documentation of |
MonadCatch m => MonadPlus (Parser m a) Source # | See documentation of |
newtype ParseError Source #
This exception is used for two purposes:
- When a parser ultimately fails, the user of the parser is intimated via this exception.
- When the "extract" function of a parser needs to throw an error.
Pre-release
Instances
Exception ParseError Source # | |
Defined in Streamly.Internal.Data.Parser.ParserD.Type toException :: ParseError -> SomeException # fromException :: SomeException -> Maybe ParseError # displayException :: ParseError -> String # | |
Show ParseError Source # | |
Defined in Streamly.Internal.Data.Parser.ParserD.Type showsPrec :: Int -> ParseError -> ShowS # show :: ParseError -> String # showList :: [ParseError] -> ShowS # |
rmapM :: Monad m => (b -> m c) -> Parser m a b -> Parser m a c Source #
Map a monadic function on the output of a parser.
Pre-release
Constructors
parser :: Monad m => (b -> a -> Step b b) -> Initial b b -> Parser m a b Source #
Make a parser using a parsing step and a starting value.
Pre-release
parserM :: Monad m => (b -> a -> m (Step b b)) -> m (Initial b b) -> Parser m a b Source #
Like parser
but with a monadic step function.
>>>
parserM step initial = ParserD.Parser step initial return
Pre-release
fromEffect :: Monad m => m b -> Parser m a b Source #
See fromEffect
.
Pre-release
serialWith :: MonadThrow m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #
See serialWith
.
Note: this implementation of serialWith is fast because of stream fusion but has quadratic time complexity, because each composition adds a new branch that each subsequent parse's input element has to go through, therefore, it cannot scale to a large number of compositions. After around 100 compositions the performance starts dipping rapidly beyond a CPS style unfused implementation.
Pre-release
split_ :: MonadThrow m => Parser m x a -> Parser m x b -> Parser m x b Source #
See split_
.
Pre-release
splitSome :: MonadCatch m => Parser m a b -> Fold m b c -> Parser m a c Source #
See documentation of some
.
Pre-release
splitMany :: MonadCatch m => Parser m a b -> Fold m b c -> Parser m a c Source #
See documentation of many
.
Pre-release
splitManyPost :: MonadCatch m => Parser m a b -> Fold m b c -> Parser m a c Source #
Like splitMany, but inner fold emits an output at the end even if no input is received.
Internal
concatMap :: MonadThrow m => (b -> Parser m a c) -> Parser m a b -> Parser m a c Source #
See concatMap
.
Pre-release
noErrorUnsafeSplit_ :: MonadThrow m => Parser m x a -> Parser m x b -> Parser m x b Source #
noErrorUnsafeSplitWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #
Works correctly only if the first parser is guaranteed to never fail.
noErrorUnsafeConcatMap :: MonadThrow m => (b -> Parser m a c) -> Parser m a b -> Parser m a c Source #