Copyright | (c) 2020 Composewell Technologies |
---|---|
License | BSD3 |
Maintainer | streamly@composewell.com |
Stability | experimental |
Portability | GHC |
Safe Haskell | None |
Language | Haskell2010 |
Fast streaming parsers.
Applicative
and Alternative
type class based combinators from the
parser-combinators
package can also be used with the Parser
type. However, there are two
important differences between parser-combinators
and the equivalent ones
provided in this module in terms of performance:
1) parser-combinators
use plain Haskell lists to collect the results, in a
strict Monad like IO, the results are necessarily buffered before they can
be consumed. This may not perform optimally in streaming applications
processing large amounts of data. Equivalent combinators in this module can
consume the results of parsing using a Fold
, thus providing a scalability
and a generic consumer.
2) Several combinators in this module can be many times faster because of
stream fusion. For example, many
combinator
in this module is much faster than the many
combinator
of Alternative
type class.
Failing parsers in this module throw the ParseError
exception.
Synopsis
- data Parser m a b = Parser (s -> a -> m (Step s b)) (m s) (s -> m b)
- fromFold :: Monad m => Fold m a b -> Parser m a b
- any :: Monad m => (a -> Bool) -> Parser m a Bool
- all :: Monad m => (a -> Bool) -> Parser m a Bool
- yield :: Monad m => b -> Parser m a b
- yieldM :: Monad m => m b -> Parser m a b
- die :: MonadThrow m => String -> Parser m a b
- dieM :: MonadThrow m => m String -> Parser m a b
- peek :: MonadThrow m => Parser m a a
- eof :: Monad m => Parser m a ()
- satisfy :: MonadThrow m => (a -> Bool) -> Parser m a a
- take :: Monad m => Int -> Fold m a b -> Parser m a b
- takeEQ :: MonadThrow m => Int -> Fold m a b -> Parser m a b
- takeGE :: MonadThrow m => Int -> Fold m a b -> Parser m a b
- lookAhead :: MonadThrow m => Parser m a b -> Parser m a b
- takeWhile :: Monad m => (a -> Bool) -> Fold m a b -> Parser m a b
- takeWhile1 :: MonadThrow m => (a -> Bool) -> Fold m a b -> Parser m a b
- sliceSepBy :: Monad m => (a -> Bool) -> Fold m a b -> Parser m a b
- sliceSepByMax :: Monad m => (a -> Bool) -> Int -> Fold m a b -> Parser m a b
- sliceEndWith :: (a -> Bool) -> Fold m a b -> Parser m a b
- sliceBeginWith :: (a -> Bool) -> Fold m a b -> Parser m a b
- wordBy :: (a -> Bool) -> Fold m a b -> Parser m a b
- groupBy :: (a -> a -> Bool) -> Fold m a b -> Parser m a b
- eqBy :: MonadThrow m => (a -> a -> Bool) -> [a] -> Parser m a ()
- splitWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c
- teeWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c
- teeWithFst :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c
- teeWithMin :: (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c
- deintercalate :: Fold m a y -> Parser m x a -> Fold m b z -> Parser m x b -> Parser m x (y, z)
- shortest :: Monad m => Parser m x a -> Parser m x a -> Parser m x a
- longest :: MonadCatch m => Parser m x a -> Parser m x a -> Parser m x a
- sequence :: Fold m b c -> t (Parser m a b) -> Parser m a c
- count :: Int -> Fold m b c -> Parser m a b -> Parser m a c
- countBetween :: Int -> Int -> Fold m b c -> Parser m a b -> Parser m a c
- many :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c
- some :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c
- manyTill :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a x -> Parser m a c
- choice :: t (Parser m a b) -> Parser m a b
Documentation
A parser is a fold that can fail and is represented as Parser step
initial extract
. Before we drive a parser we call the initial
action to
retrieve the initial state of the fold. The parser driver invokes step
with the state returned by the previous step and the next input element. It
results into a new state and a command to the driver represented by Step
type. The driver keeps invoking the step function until it stops or fails.
At any point of time the driver can call extract
to inspect the result of
the fold. It may result in an error or an output value.
Internal
Instances
Monad m => Monad (Parser m a) Source # | Monad composition can be used for lookbehind parsers, we can make the future parses depend on the previously parsed values. If we have to parse "a9" or "9a" but not "99" or "aa" we can use the following parser: backtracking :: MonadCatch m => PR.Parser m Char String
backtracking =
sequence [PR.satisfy isDigit, PR.satisfy isAlpha]
We know that if the first parse resulted in a digit at the first place then
the second parse is going to fail. However, we waste that information and
parse the first character again in the second parse only to know that it is
not an alphabetic char. By using lookbehind in a data DigitOrAlpha = Digit Char | Alpha Char lookbehind :: MonadCatch m => PR.Parser m Char String lookbehind = do x1 <- Digit |
Functor m => Functor (Parser m a) Source # | |
Monad m => Applicative (Parser m a) Source # |
|
Defined in Streamly.Internal.Data.Parser.Types | |
MonadCatch m => Alternative (Parser m a) Source # |
Note: The implementation of
|
MonadCatch m => MonadPlus (Parser m a) Source # |
Internal |
Accumulators
fromFold :: Monad m => Fold m a b -> Parser m a b Source #
The resulting parse never terminates and never errors out.
any :: Monad m => (a -> Bool) -> Parser m a Bool Source #
>>>
S.parse (PR.any (== 0)) $ S.fromList [1,0,1]
> Right True
all :: Monad m => (a -> Bool) -> Parser m a Bool Source #
>>>
S.parse (PR.all (== 0)) $ S.fromList [1,0,1]
> Right False
yield :: Monad m => b -> Parser m a b Source #
A parser that always yields a pure value without consuming any input.
Internal
yieldM :: Monad m => m b -> Parser m a b Source #
A parser that always yields the result of an effectful action without consuming any input.
Internal
die :: MonadThrow m => String -> Parser m a b Source #
A parser that always fails with an error message without consuming any input.
Internal
dieM :: MonadThrow m => m String -> Parser m a b Source #
A parser that always fails with an effectful error message and without consuming any input.
Internal
Element parsers
peek :: MonadThrow m => Parser m a a Source #
Peek the head element of a stream, without consuming it. Fails if it encounters end of input.
>>>
S.parse ((,) <$> PR.peek <*> PR.satisfy (> 0)) $ S.fromList [1]
(1,1)
peek = lookAhead (satisfy True)
Internal
eof :: Monad m => Parser m a () Source #
Succeeds if we are at the end of input, fails otherwise.
>>>
S.parse ((,) <$> PR.satisfy (> 0) <*> PR.eof) $ S.fromList [1]
> (1,())
Internal
satisfy :: MonadThrow m => (a -> Bool) -> Parser m a a Source #
Returns the next element if it passes the predicate, fails otherwise.
>>>
S.parse (PR.satisfy (== 1)) $ S.fromList [1,0,1]
> 1
Internal
Sequence parsers
take :: Monad m => Int -> Fold m a b -> Parser m a b Source #
Take at most n
input elements and fold them using the supplied fold.
Stops after n
elements.
Never fails.
>>>
S.parse (PR.take 1 FL.toList) $ S.fromList [1]
[1]
S.chunksOf n f = S.splitParse (FL.take n f)
Internal
takeEQ :: MonadThrow m => Int -> Fold m a b -> Parser m a b Source #
Stops after taking exactly n
input elements.
- Stops - after
n
elements. - Fails - if the stream ends before it can collect
n
elements.
>>>
S.parse (PR.takeEQ 4 FL.toList) $ S.fromList [1,0,1]
> "takeEQ: Expecting exactly 4 elements, got 3"
Internal
takeGE :: MonadThrow m => Int -> Fold m a b -> Parser m a b Source #
Take at least n
input elements, but can collect more.
- Stops - never.
- Fails - if the stream ends before producing
n
elements.
>>>
S.parse (PR.takeGE 4 FL.toList) $ S.fromList [1,0,1]
> "takeGE: Expecting at least 4 elements, got only 3"
>>>
S.parse (PR.takeGE 4 FL.toList) $ S.fromList [1,0,1,0,1]
> [1,0,1,0,1]
Internal
takeWhile :: Monad m => (a -> Bool) -> Fold m a b -> Parser m a b Source #
Collect stream elements until an element fails the predicate. The element on which the predicate fails is returned back to the input stream.
- Stops - when the predicate fails.
- Fails - never.
>>>
S.parse (PR.takeWhile (== 0) FL.toList) $ S.fromList [0,0,1,0,1]
> [0,0]
We can implement a breakOn
using takeWhile
:
breakOn p = takeWhile (not p)
Internal
takeWhile1 :: MonadThrow m => (a -> Bool) -> Fold m a b -> Parser m a b Source #
Like takeWhile
but takes at least one element otherwise fails.
Internal
sliceSepBy :: Monad m => (a -> Bool) -> Fold m a b -> Parser m a b Source #
Collect stream elements until an element succeeds the predicate. Drop the element on which the predicate succeeded. The succeeding element is treated as an infix separator which is dropped from the output.
- Stops - when the predicate succeeds.
- Fails - never.
>>>
S.parse (PR.sliceSepBy (== 1) FL.toList) $ S.fromList [0,0,1,0,1]
> [0,0]
S.splitOn pred f = S.splitParse (PR.sliceSepBy pred f)
>>>
S.toList $ S.splitParse (PR.sliceSepBy (== 1) FL.toList) $ S.fromList [0,0,1,0,1]
> [[0,0],[0],[]]
Internal
sliceSepByMax :: Monad m => (a -> Bool) -> Int -> Fold m a b -> Parser m a b Source #
Split using a condition or a count whichever occurs first. This is a
hybrid of splitOn
and take
. The element on which the condition succeeds
is dropped.
Internal
sliceEndWith :: (a -> Bool) -> Fold m a b -> Parser m a b Source #
Collect stream elements until an element succeeds the predicate. Also take the element on which the predicate succeeded. The succeeding element is treated as a suffix separator which is kept in the output segement.
- Stops - when the predicate succeeds.
- Fails - never.
S.splitWithSuffix pred f = S.splitParse (PR.sliceEndWith pred f)
Unimplemented
sliceBeginWith :: (a -> Bool) -> Fold m a b -> Parser m a b Source #
Collect stream elements until an elements passes the predicate, return the last element on which the predicate succeeded back to the input stream. If the predicate succeeds on the first element itself then it is kept in the stream and we continue collecting. The succeeding element is treated as a prefix separator which is kept in the output segement.
- Stops - when the predicate succeeds in non-leading position.
- Fails - never.
S.splitWithPrefix pred f = S.splitParse (PR.sliceBeginWith pred f)
Unimplemented
wordBy :: (a -> Bool) -> Fold m a b -> Parser m a b Source #
Like splitOn
but strips leading, trailing, and repeated separators.
Therefore, ".a..b."
having .
as the separator would be parsed as
["a","b"]
. In other words, its like parsing words from whitespace
separated text.
- Stops - when it finds a word separator after a non-word element
- Fails - never.
S.wordsBy pred f = S.splitParse (PR.wordBy pred f)
Unimplemented
groupBy :: (a -> a -> Bool) -> Fold m a b -> Parser m a b Source #
groupBy cmp f $ S.fromList [a,b,c,...]
assigns the element a
to the
first group, then if a `cmp` b
is True
b
is also assigned to the same
group. If a `cmp` c
is True
then c
is also assigned to the same
group and so on. When the comparison fails a new group is started. Each
group is folded using the Fold
f
and the result of the fold is emitted
in the output stream.
- Stops - when the comparison fails.
- Fails - never.
S.groupsBy cmp f = S.splitParse (PR.groupBy cmp f)
Unimplemented
eqBy :: MonadThrow m => (a -> a -> Bool) -> [a] -> Parser m a () Source #
Match the given sequence of elements using the given comparison function.
Internal
Binary Combinators
Sequential Applicative
splitWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #
Sequential application. Apply two parsers sequentially to an input stream. The input is provided to the first parser, when it is done the remaining input is provided to the second parser. If both the parsers succeed their outputs are combined using the supplied function. The operation fails if any of the parsers fail.
This undoes an "append" of two streams, it splits the streams using two parsers and zips the results.
This implementation is strict in the second argument, therefore, the following will fail:
>>>
S.parse (PR.satisfy (> 0) *> undefined) $ S.fromList [1]
Internal
Parallel Applicatives
teeWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #
teeWith f p1 p2
distributes its input to both p1
and p2
until both
of them succeed or fail and combines their output using f
. The parser
succeeds if both the parsers succeed.
Internal
teeWithFst :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #
Like teeWith
but ends parsing and zips the results, if available,
whenever the first parser ends.
Internal
teeWithMin :: (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #
Like teeWith
but ends parsing and zips the results, if available,
whenever any of the parsers ends or fails.
Unimplemented
Sequential Interleaving
deintercalate :: Fold m a y -> Parser m x a -> Fold m b z -> Parser m x b -> Parser m x (y, z) Source #
Apply two parsers alternately to an input stream. The input stream is considered an interleaving of two patterns. The two parsers represent the two patterns.
This undoes a "gintercalate" of two streams.
Unimplemented
Parallel Alternatives
shortest :: Monad m => Parser m x a -> Parser m x a -> Parser m x a Source #
Shortest alternative. Apply both parsers in parallel but choose the result from the one which consumed least input i.e. take the shortest succeeding parse.
Internal
longest :: MonadCatch m => Parser m x a -> Parser m x a -> Parser m x a Source #
Longest alternative. Apply both parsers in parallel but choose the result from the one which consumed more input i.e. take the longest succeeding parse.
Internal
N-ary Combinators
Sequential Collection
sequence :: Fold m b c -> t (Parser m a b) -> Parser m a c Source #
sequence f t
collects sequential parses of parsers in the container t
using the fold f
. Fails if the input ends or any of the parsers fail.
Unimplemented
Sequential Repetition
count :: Int -> Fold m b c -> Parser m a b -> Parser m a c Source #
count n f p
collects exactly n
sequential parses of parser p
using
the fold f
. Fails if the input ends or the parser fails before n
results are collected.
Unimplemented
countBetween :: Int -> Int -> Fold m b c -> Parser m a b -> Parser m a c Source #
countBetween m n f p
collects between m
and n
sequential parses of
parser p
using the fold f
. Stop after collecting n
results. Fails if
the input ends or the parser fails before m
results are collected.
Unimplemented
some :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c Source #
Collect one or more parses. Apply the supplied parser repeatedly on the input stream and accumulate the parse results as long as the parser succeeds, stop when it fails. This parser fails if not even one result is collected.
Compare with some
.
Internal
manyTill :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a x -> Parser m a c Source #
manyTill f collect test
tries the parser test
on the input, if test
fails it backtracks and tries collect
, after collect
succeeds test
is
tried again and so on. The parser stops when test
succeeds. The output of
test
is discarded and the output of collect
is accumulated by the
supplied fold. The parser fails if collect
fails.
Internal