Copyright | (c) 2020 Composewell Technologies |
---|---|
License | BSD-3-Clause |
Maintainer | streamly@composewell.com |
Stability | pre-release |
Portability | GHC |
Safe Haskell | None |
Language | Haskell2010 |
Fast, composable stream consumers with ability to terminate, backtrack and fail, supporting stream fusion. Parsers are a natural extension of Streamly.Data.Fold. Parsers and folds can be interconverted.
Please refer to Streamly.Internal.Data.Parser for functions that have not yet been released.
Synopsis
- data Parser a m b
- fromFold :: Monad m => Fold m a b -> Parser a m b
- fromPure :: Monad m => b -> Parser a m b
- fromEffect :: Monad m => m b -> Parser a m b
- die :: Monad m => String -> Parser a m b
- peek :: Monad m => Parser a m a
- eof :: Monad m => Parser a m ()
- one :: Monad m => Parser a m a
- oneOf :: (Monad m, Eq a, Foldable f) => f a -> Parser a m a
- noneOf :: (Monad m, Eq a, Foldable f) => f a -> Parser a m a
- satisfy :: Monad m => (a -> Bool) -> Parser a m a
- streamEqBy :: Monad m => (a -> a -> Bool) -> Stream m a -> Parser a m ()
- listEqBy :: Monad m => (a -> a -> Bool) -> [a] -> Parser a m [a]
- listEq :: (Monad m, Eq a) => [a] -> Parser a m [a]
- lmap :: (a -> b) -> Parser b m r -> Parser a m r
- lmapM :: Monad m => (a -> m b) -> Parser b m r -> Parser a m r
- rmapM :: Monad m => (b -> m c) -> Parser a m b -> Parser a m c
- filter :: Monad m => (a -> Bool) -> Parser a m b -> Parser a m b
- lookAhead :: Monad m => Parser a m b -> Parser a m b
- takeEQ :: Monad m => Int -> Fold m a b -> Parser a m b
- takeWhile :: Monad m => (a -> Bool) -> Fold m a b -> Parser a m b
- takeWhile1 :: Monad m => (a -> Bool) -> Fold m a b -> Parser a m b
- dropWhile :: Monad m => (a -> Bool) -> Parser a m ()
- wordBy :: Monad m => (a -> Bool) -> Fold m a b -> Parser a m b
- groupBy :: Monad m => (a -> a -> Bool) -> Fold m a b -> Parser a m b
- wordWithQuotes :: (Monad m, Eq a) => Bool -> (a -> a -> Maybe a) -> a -> (a -> Maybe a) -> (a -> Bool) -> Fold m a b -> Parser a m b
- many :: Monad m => Parser a m b -> Fold m b c -> Parser a m c
- some :: Monad m => Parser a m b -> Fold m b c -> Parser a m c
- manyTill :: Monad m => Parser a m b -> Parser a m x -> Fold m b c -> Parser a m c
- deintercalate :: Monad m => Parser a m x -> Parser a m y -> Fold m (Either x y) z -> Parser a m z
Setup
To execute the code examples provided in this module in ghci, please run the following commands first.
>>>
:m
>>>
import Control.Applicative ((<|>))
>>>
import Data.Bifunctor (second)
>>>
import Data.Char (isSpace)
>>>
import qualified Data.Foldable as Foldable
>>>
import qualified Data.Maybe as Maybe
>>>
import Streamly.Data.Fold (Fold)
>>>
import Streamly.Data.Parser (Parser)
>>>
import qualified Streamly.Data.Fold as Fold
>>>
import qualified Streamly.Data.Parser as Parser
>>>
import qualified Streamly.Data.Stream as Stream
For APIs that have not been released yet.
>>>
import qualified Streamly.Internal.Data.Fold as Fold
>>>
import qualified Streamly.Internal.Data.Parser as Parser
Overview
Several combinators in this module can be many times faster than CPS based
parsers because of stream fusion. For example,
many
combinator in this module is much
faster than the many
combinator of
Alternative
type class used by CPS based parsers.
The use of Alternative
type class, in parsers has another drawback.
Alternative based parsers use plain Haskell lists to collect the results. In
a strict Monad like IO, the results are necessarily buffered before they can
be consumed. This may not perform optimally in streaming applications
processing large amounts of data. Equivalent combinators in this module can
consume the results of parsing using a Fold
or another parser, thus
providing a scalable and composable consumer.
Note that these parsers do not report the error context (e.g. line number or column). This may be supported in future.
mtl instances are not provided. If the Parser
type is the top most layer
(which should be the case almost always) you can just use fromEffect
to
execute the lower layer monad effects.
Performance Notes
The Parser
type represents a stream consumer by composing state as data
which enables stream fusion. Stream fusion generates a tight loop without
any constructor allocations between the stages, providing C like performance
for the loop. Stream fusion works when multiple functions are combined in a
pipeline statically. Therefore, the operations in this module must be
inlined and must not be used recursively to allow for stream fusion. Note
that operations like sequence
, and asum
that compose pasrers using
recursion should be avoided with these parsers. You can use these with the
ParserK
module instead.
Using the Parser
type, parsing operations like one
, splitWith
etc.
degrade quadratically (O(n^2)) when combined many times. If you need to
combine these operations, say more than 8 times in a single loop, then you
should consider using the continuation style parser type ParserK
instead.
Also, if you need to use these operations in a recursive loop you should use
ParserK
instead.
The ParserK
type represents a stream consumer by composing function calls,
therefore, a function call overhead is incurred at each composition. It is
quite fast in general but may be a few times slower than a fused parser.
However, it allows for scalable dynamic composition especially parsers can
be used in recursive calls. Using the ParserK
type operations like
splitWith
provide linear (O(n)) performance with respect to the number of
compositions..
Parser
and ParserK
types can be interconverted.
Parser Type
A parser is a fold that can fail and is represented as Parser step
initial extract
. Before we drive a parser we call the initial
action to
retrieve the initial state of the fold. The parser driver invokes step
with the state returned by the previous step and the next input element. It
results into a new state and a command to the driver represented by Step
type. The driver keeps invoking the step function until it stops or fails.
At any point of time the driver can call extract
to inspect the result of
the fold. If the parser hits the end of input extract
is called.
It may result in an error or an output value.
Pre-release
Instances
Monad m => Monad (Parser a m) Source # | See documentation of Although this implementation allows stream fusion, it has quadratic
complexity, making it suitable only for a small number of compositions. As a
thumb rule use it for less than 8 compositions, use |
Functor m => Functor (Parser a m) Source # | |
Monad m => MonadFail (Parser a m) Source # | |
Defined in Streamly.Internal.Data.Parser.ParserD.Type | |
Monad m => Applicative (Parser a m) Source # |
|
Defined in Streamly.Internal.Data.Parser.ParserD.Type | |
(Monad m, MonadIO m) => MonadIO (Parser a m) Source # | |
Defined in Streamly.Internal.Data.Parser.ParserD.Type | |
Monad m => Alternative (Parser a m) Source # | Sequential alternative. The input is first passed to the first parser, and if it succeeds, the result is returned. However, if the first parser fails, the parser driver backtracks and tries the same input on the second parser, returning the result if it succeeds. Note: The implementation of
WARNING! this is not suitable for large scale use. As a thumb rule stream
fusion works well for less than 8 compositions of this operation, otherwise
consider using |
Parsers
From Folds
Without Input
fromPure :: Monad m => b -> Parser a m b Source #
A parser that always yields a pure value without consuming any input.
fromEffect :: Monad m => m b -> Parser a m b Source #
A parser that always yields the result of an effectful action without consuming any input.
die :: Monad m => String -> Parser a m b Source #
A parser that always fails with an error message without consuming any input.
peek :: Monad m => Parser a m a Source #
Peek the head element of a stream, without consuming it. Fails if it encounters end of input.
>>>
Stream.parse ((,) <$> Parser.peek <*> Parser.satisfy (> 0)) $ Stream.fromList [1]
Right (1,1)
peek = lookAhead (satisfy True)
eof :: Monad m => Parser a m () Source #
Succeeds if we are at the end of input, fails otherwise.
>>>
Stream.parse ((,) <$> Parser.satisfy (> 0) <*> Parser.eof) $ Stream.fromList [1]
Right (1,())
Element parsers
one :: Monad m => Parser a m a Source #
Consume one element from the head of the stream. Fails if it encounters end of input.
>>>
one = Parser.satisfy $ const True
oneOf :: (Monad m, Eq a, Foldable f) => f a -> Parser a m a Source #
Match any one of the elements in the supplied list.
>>>
oneOf xs = Parser.satisfy (`Foldable.elem` xs)
When performance matters a pattern matching predicate could be more
efficient than a Foldable
datatype:
let p x = case x ofa
-> Truee
-> True _ -> False in satisfy p
GHC may use a binary search instead of linear search in the list. Alternatively, you can also use an array instead of list for storage and search.
noneOf :: (Monad m, Eq a, Foldable f) => f a -> Parser a m a Source #
See performance notes in oneOf
.
>>>
noneOf xs = Parser.satisfy (`Foldable.notElem` xs)
satisfy :: Monad m => (a -> Bool) -> Parser a m a Source #
Returns the next element if it passes the predicate, fails otherwise.
>>>
Stream.parse (Parser.satisfy (== 1)) $ Stream.fromList [1,0,1]
Right 1
>>>
toMaybe f x = if f x then Just x else Nothing
>>>
satisfy f = Parser.maybe (toMaybe f)
Sequences
streamEqBy :: Monad m => (a -> a -> Bool) -> Stream m a -> Parser a m () Source #
Like listEqBy
but uses a stream instead of a list and does not return
the stream.
listEqBy :: Monad m => (a -> a -> Bool) -> [a] -> Parser a m [a] Source #
Match the given sequence of elements using the given comparison function. Returns the original sequence if successful.
Definition:
>>>
listEqBy cmp xs = Parser.streamEqBy cmp (Stream.fromList xs) *> Parser.fromPure xs
Examples:
>>>
Stream.parse (Parser.listEqBy (==) "string") $ Stream.fromList "string"
Right "string"
>>>
Stream.parse (Parser.listEqBy (==) "mismatch") $ Stream.fromList "match"
Left (ParseError "streamEqBy: mismtach occurred")
listEq :: (Monad m, Eq a) => [a] -> Parser a m [a] Source #
Match the input sequence with the supplied list and return it if successful.
>>>
listEq = Parser.listEqBy (==)
Combinators
Mapping on input
lmap :: (a -> b) -> Parser b m r -> Parser a m r Source #
lmap f parser
maps the function f
on the input of the parser.
>>>
Stream.parse (Parser.lmap (\x -> x * x) (Parser.fromFold Fold.sum)) (Stream.enumerateFromTo 1 100)
Right 338350
lmap = Parser.lmapM return
lmapM :: Monad m => (a -> m b) -> Parser b m r -> Parser a m r Source #
lmapM f parser
maps the monadic function f
on the input of the parser.
Map on output
rmapM :: Monad m => (b -> m c) -> Parser a m b -> Parser a m c Source #
rmapM f parser
maps the monadic function f
on the output of the parser.
>>>
rmap = fmap
Filtering
filter :: Monad m => (a -> Bool) -> Parser a m b -> Parser a m b Source #
Include only those elements that pass a predicate.
>>>
Stream.parse (Parser.filter (> 5) (Parser.fromFold Fold.sum)) $ Stream.fromList [1..10]
Right 40
Look Ahead
lookAhead :: Monad m => Parser a m b -> Parser a m b Source #
Run a parser without consuming the input.
Tokenize by length
takeEQ :: Monad m => Int -> Fold m a b -> Parser a m b Source #
Stops after taking exactly n
input elements.
- Stops - after consuming
n
elements. - Fails - if the stream or the collecting fold ends before it can collect
exactly
n
elements.
>>>
Stream.parse (Parser.takeEQ 2 Fold.toList) $ Stream.fromList [1,0,1]
Right [1,0]
>>>
Stream.parse (Parser.takeEQ 4 Fold.toList) $ Stream.fromList [1,0,1]
Left (ParseError "takeEQ: Expecting exactly 4 elements, input terminated on 3")
Tokenize by predicate
takeWhile :: Monad m => (a -> Bool) -> Fold m a b -> Parser a m b Source #
Collect stream elements until an element fails the predicate. The element on which the predicate fails is returned back to the input stream.
- Stops - when the predicate fails or the collecting fold stops.
- Fails - never.
>>>
Stream.parse (Parser.takeWhile (== 0) Fold.toList) $ Stream.fromList [0,0,1,0,1]
Right [0,0]
>>>
takeWhile cond f = Parser.takeWhileP cond (Parser.fromFold f)
We can implement a breakOn
using takeWhile
:
breakOn p = takeWhile (not p)
takeWhile1 :: Monad m => (a -> Bool) -> Fold m a b -> Parser a m b Source #
Like takeWhile
but takes at least one element otherwise fails.
>>>
takeWhile1 cond p = Parser.takeWhileP cond (Parser.takeBetween 1 maxBound p)
dropWhile :: Monad m => (a -> Bool) -> Parser a m () Source #
Drain the input as long as the predicate succeeds, running the effects and discarding the results.
This is also called skipWhile
in some parsing libraries.
>>>
dropWhile p = Parser.takeWhile p Fold.drain
wordBy :: Monad m => (a -> Bool) -> Fold m a b -> Parser a m b Source #
Like splitOn
but strips leading, trailing, and repeated separators.
Therefore, ".a..b."
having .
as the separator would be parsed as
["a","b"]
. In other words, its like parsing words from whitespace
separated text.
- Stops - when it finds a word separator after a non-word element
- Fails - never.
>>>
wordBy = Parser.wordFramedBy (const False) (const False) (const False)
S.wordsBy pred f = S.parseMany (PR.wordBy pred f)
Grouping
groupBy :: Monad m => (a -> a -> Bool) -> Fold m a b -> Parser a m b Source #
Given an input stream [a,b,c,...]
and a comparison function cmp
, the
parser assigns the element a
to the first group, then if a `cmp` b
is
True
b
is also assigned to the same group. If a `cmp` c
is True
then c
is also assigned to the same group and so on. When the comparison
fails the parser is terminated. Each group is folded using the Fold
f
and
the result of the fold is the result of the parser.
- Stops - when the comparison fails.
- Fails - never.
>>>
:{
runGroupsBy eq = Stream.fold Fold.toList . Stream.parseMany (Parser.groupBy eq Fold.toList) . Stream.fromList :}
>>>
runGroupsBy (<) []
[]
>>>
runGroupsBy (<) [1]
[Right [1]]
>>>
runGroupsBy (<) [3, 5, 4, 1, 2, 0]
[Right [3,5,4],Right [1,2],Right [0]]
Framing
:: (Monad m, Eq a) | |
=> Bool | Retain the quotes and escape chars in the output |
-> (a -> a -> Maybe a) | quote char -> escaped char -> translated char |
-> a | Matches an escape elem? |
-> (a -> Maybe a) | If left quote, return right quote, else Nothing. |
-> (a -> Bool) | Matches a word separator? |
-> Fold m a b | |
-> Parser a m b |
Quote and bracket aware word splitting with escaping. Like wordBy
but
word separators within specified quotes or brackets are ignored. Quotes and
escape characters can be processed. If the end quote is different from the
start quote it is called a bracket. The following quoting rules apply:
- In an unquoted string a character may be preceded by an escape character. The escape character is removed and the character following it is treated literally with no special meaning e.g. e.g. h e l l o is a single word, n is same as n.
- Any part of the word can be placed within quotes. Inside quotes all characters are treated literally with no special meaning. Quoting character itself cannot be used within quotes unless escape processing within quotes is applied to allow it.
- Optionally escape processing for quoted part can be specified. Escape character has no special meaning inside quotes unless it is followed by a character that has a escape translation specified, in that case the escape character is removed, and the specified translation is applied to the character following it. This can be used to escape the quoting character itself within quotes.
- There can be multiple quoting characters, when a quote starts, all other quoting characters within that quote lose any special meaning until the quote is closed.
- A starting quote char without an ending char generates a parse error. An ending bracket char without a corresponding bracket begin is ignored.
- Brackets can be nested.
We should note that unquoted and quoted escape processing are different. In unquoted part escape character is always removed. In quoted part it is removed only if followed by a special meaning character. This is consistent with how shell performs escape processing.
Splitting
many :: Monad m => Parser a m b -> Fold m b c -> Parser a m c Source #
Collect zero or more parses. Apply the supplied parser repeatedly on the input stream and push the parse results to a downstream fold.
Stops: when the downstream fold stops or the parser fails. Fails: never, produces zero or more results.
>>>
many = Parser.countBetween 0 maxBound
Compare with many
.
some :: Monad m => Parser a m b -> Fold m b c -> Parser a m c Source #
Collect one or more parses. Apply the supplied parser repeatedly on the input stream and push the parse results to a downstream fold.
Stops: when the downstream fold stops or the parser fails. Fails: if it stops without producing a single result.
>>>
some p f = Parser.manyP p (Parser.takeGE 1 f)
>>>
some = Parser.countBetween 1 maxBound
Compare with some
.
manyTill :: Monad m => Parser a m b -> Parser a m x -> Fold m b c -> Parser a m c Source #
manyTill chunking test f
tries the parser test
on the input, if test
fails it backtracks and tries chunking
, after chunking
succeeds test
is
tried again and so on. The parser stops when test
succeeds. The output of
test
is discarded and the output of chunking
is accumulated by the
supplied fold. The parser fails if chunking
fails.
Stops when the fold f
stops.
De-interleaving
deintercalate :: Monad m => Parser a m x -> Parser a m y -> Fold m (Either x y) z -> Parser a m z Source #
Apply two parsers alternately to an input stream. The input stream is considered an interleaving of two patterns. The two parsers represent the two patterns. Parsing starts at the first parser and stops at the first parser. It can be used to parse a infix style pattern e.g. p1 p2 p1 . Empty input or single parse of the first parser is accepted.
>>>
p1 = Parser.takeWhile1 (not . (== '+')) Fold.toList
>>>
p2 = Parser.satisfy (== '+')
>>>
p = Parser.deintercalate p1 p2 Fold.toList
>>>
Stream.parse p $ Stream.fromList ""
Right []>>>
Stream.parse p $ Stream.fromList "1"
Right [Left "1"]>>>
Stream.parse p $ Stream.fromList "1+"
Right [Left "1"]>>>
Stream.parse p $ Stream.fromList "1+2+3"
Right [Left "1",Right '+',Left "2",Right '+',Left "3"]