Copyright	(c) 2019 Composewell Technologies (c) 2013 Gabriel Gonzalez
License	BSD3
Maintainer	streamly@composewell.com
Stability	experimental
Portability	GHC
Safe Haskell	None
Language	Haskell2010

Streamly.Internal.Data.Fold.Types

Description

Stream Consumers

We can classify stream consumers in the following categories in order of increasing complexity and power:

Accumulators

These are the simplest folds that never fail and never terminate, they accumulate the input values forever and always remain partial and complete at the same time. It means that we can keep adding more input to them or at any time retrieve a consistent result. A sum operation is an example of an accumulator.

We can distribute an input stream to two or more accumulators using a tee style composition. Accumulators cannot be applied on a stream one after the other, which we call a split style composition, as the first one itself will never terminate, therefore, the next one will never get to run.

Splitters

Splitters are accumulators that can terminate. When applied on a stream splitters consume part of the stream, thereby, splitting it. Splitters can be used in a split style composition where one splitter can be applied after the other on an input stream. We can apply a splitter repeatedly on an input stream splitting and consuming it in fragments. Splitters never fail, therefore, they do not need backtracking, but they can lookahead and return unconsumed input. The take operation is an example of a splitter. It terminates after consuming n items. Coupled with an accumulator it can be used to split the stream into chunks of fixed size.

Consider the example of takeWhile operation, it needs to inspect an element for termination decision. However, it does not consume the element on which it terminates. To implement takeWhile a splitter will have to implement a way to return unconsumed input to the driver.

Parsers

Parsers are splitters that can fail and backtrack. Parsers can be composed using an alternative style composition where they can backtrack and apply another parser if one parser fails. satisfy is a simple example of a parser, it would succeed if the condition is satisfied and it would fail otherwise, on failure an alternative parser can be used on the same input.

Types for Stream Consumers

We use the Fold type to implement the Accumulator and Splitter functionality. Parsers are represented by the Parser type. This is a sweet spot to balance ease of use, type safety and performance. Using separate Accumulator and Splitter types would encode more information in types but it would make ease of use, implementation, maintenance effort worse. Combining Accumulator, Splitter and Parser into a single Parser type would make ease of use even better but type safety and performance worse.

One of the design requirements that we have placed for better ease of use and code reuse is that Parser type should be a strict superset of the Fold type i.e. it can do everything that a Fold can do and more. Therefore, folds can be easily upgraded to parsers and we can use parser combinators on folds as well when needed.

Fold Design

A fold is represented by a collection of "initial", "step" and "extract" functions. The "initial" action generates the initial state of the fold. The state is internal to the fold and maintains the accumulated output. The "step" function is invoked using the current state and the next input value and results in a Yield or Stop. A Yield returns the next intermediate state of the fold, a Stop indicates that the fold has terminated and returns the final value of the accumulator.

Every Yield indicates that a new accumulated output is available. The accumulated output can be extracted from the state at any point using "extract". "extract" can never fail. A fold returns a valid output even without any input i.e. even if you call "extract" on "initial" state it provides an output. This is not true for parsers.

In general, "extract" is used in two cases:

When the fold is used as a scan extract is called on the intermediate state every time it is yielded by the fold, the resulting value is yielded as a stream.
When the fold is used as a regular fold, extract is called once when we are done feeding input to the fold.

Alternate Designs

An alternate and simpler design would be to return the intermediate output via Yield along with the state, instead of using "extract" on the yielded state and remove the extract function altogether.

This may even facilitate more efficient implementation. Extract from the intermediate state after each yield may be more costly compared to the fold step itself yielding the output. The fold may have more efficient ways to retrieve the output rather than stuffing it in the state and using extract on the state.

However, removing extract altogether may lead to less optimal code in some cases because the driver of the fold needs to thread around the intermediate output to return it if the stream stops before the fold could Stop. When using this approach, the splitParse (FL.take filesize) benchmark shows a 2x worse performance even after ensuring everything fuses. So we keep the "extract" approach to ensure better perf in all cases.

But we could still yield both state and the output in Yield, the output can be used for the scan use case, instead of using extract. Extract would then be used only for the case when the stream stops before the fold completes.

Synopsis

data Fold m a b = Fold (s -> a -> m s) (m s) (s -> m b)
data Fold2 m c a b = Fold2 (s -> a -> m s) (c -> m s) (s -> m b)
simplify :: Fold2 m c a b -> c -> Fold m a b
toListRevF :: Monad m => Fold m a [a]
lmap :: (a -> b) -> Fold m b r -> Fold m a r
lmapM :: Monad m => (a -> m b) -> Fold m b r -> Fold m a r
lfilter :: Monad m => (a -> Bool) -> Fold m a r -> Fold m a r
lfilterM :: Monad m => (a -> m Bool) -> Fold m a r -> Fold m a r
lcatMaybes :: Monad m => Fold m a b -> Fold m (Maybe a) b
ltake :: Monad m => Int -> Fold m a b -> Fold m a b
ltakeWhile :: Monad m => (a -> Bool) -> Fold m a b -> Fold m a b
lsessionsOf :: MonadAsync m => Double -> Fold m a b -> Fold m b c -> Fold m a c
lchunksOf :: Monad m => Int -> Fold m a b -> Fold m b c -> Fold m a c
lchunksOf2 :: Monad m => Int -> Fold m a b -> Fold2 m x b c -> Fold2 m x a c
duplicate :: Applicative m => Fold m a b -> Fold m a (Fold m a b)
initialize :: Monad m => Fold m a b -> m (Fold m a b)
runStep :: Monad m => Fold m a b -> a -> m (Fold m a b)

Documentation

data Fold m a b Source #

Represents a left fold over an input stream consisting of values of type a to a single value of type b in Monad m.

The fold uses an intermediate state s as accumulator. The step function updates the state and returns the new state. When the fold is done the final result of the fold is extracted from the intermediate state using the extract function.

Since: 0.7.0

Constructors

Fold (s -> a -> m s) (m s) (s -> m b)

Fold step initial extract

Instances

Functor m => Functor (Fold m a) Source #	Maps a function on the output of the fold (the type `b`).
Instance details Defined in Streamly.Internal.Data.Fold.Types Methods fmap :: (a0 -> b) -> Fold m a a0 -> Fold m a b # (<$) :: a0 -> Fold m a b -> Fold m a a0 #
Applicative m => Applicative (Fold m a) Source #	The fold resulting from `<*>` distributes its input to both the argument folds and combines their output using the supplied function.
Instance details Defined in Streamly.Internal.Data.Fold.Types Methods pure :: a0 -> Fold m a a0 # (<>) :: Fold m a (a0 -> b) -> Fold m a a0 -> Fold m a b # liftA2 :: (a0 -> b -> c) -> Fold m a a0 -> Fold m a b -> Fold m a c # (>) :: Fold m a a0 -> Fold m a b -> Fold m a b # (<*) :: Fold m a a0 -> Fold m a b -> Fold m a a0 #
(Monad m, Floating b) => Floating (Fold m a b) Source #	Combines the fold outputs using their `Floating` instances.
Instance details Defined in Streamly.Internal.Data.Fold.Types Methods pi :: Fold m a b # exp :: Fold m a b -> Fold m a b # log :: Fold m a b -> Fold m a b # sqrt :: Fold m a b -> Fold m a b # (**) :: Fold m a b -> Fold m a b -> Fold m a b # logBase :: Fold m a b -> Fold m a b -> Fold m a b # sin :: Fold m a b -> Fold m a b # cos :: Fold m a b -> Fold m a b # tan :: Fold m a b -> Fold m a b # asin :: Fold m a b -> Fold m a b # acos :: Fold m a b -> Fold m a b # atan :: Fold m a b -> Fold m a b # sinh :: Fold m a b -> Fold m a b # cosh :: Fold m a b -> Fold m a b # tanh :: Fold m a b -> Fold m a b # asinh :: Fold m a b -> Fold m a b # acosh :: Fold m a b -> Fold m a b # atanh :: Fold m a b -> Fold m a b # log1p :: Fold m a b -> Fold m a b # expm1 :: Fold m a b -> Fold m a b # log1pexp :: Fold m a b -> Fold m a b # log1mexp :: Fold m a b -> Fold m a b #
(Monad m, Fractional b) => Fractional (Fold m a b) Source #	Combines the fold outputs (type `b`) using their `Fractional` instances.
Instance details Defined in Streamly.Internal.Data.Fold.Types Methods (/) :: Fold m a b -> Fold m a b -> Fold m a b # recip :: Fold m a b -> Fold m a b # fromRational :: Rational -> Fold m a b #
(Monad m, Num b) => Num (Fold m a b) Source #	Combines the fold outputs (type `b`) using their `Num` instances.
Instance details Defined in Streamly.Internal.Data.Fold.Types Methods (+) :: Fold m a b -> Fold m a b -> Fold m a b # (-) :: Fold m a b -> Fold m a b -> Fold m a b # (*) :: Fold m a b -> Fold m a b -> Fold m a b # negate :: Fold m a b -> Fold m a b # abs :: Fold m a b -> Fold m a b # signum :: Fold m a b -> Fold m a b # fromInteger :: Integer -> Fold m a b #
(Semigroup b, Monad m) => Semigroup (Fold m a b) Source #	Combines the outputs of the folds (the type `b`) using their `Semigroup` instances.
Instance details Defined in Streamly.Internal.Data.Fold.Types Methods (<>) :: Fold m a b -> Fold m a b -> Fold m a b # sconcat :: NonEmpty (Fold m a b) -> Fold m a b # stimes :: Integral b0 => b0 -> Fold m a b -> Fold m a b #
(Semigroup b, Monoid b, Monad m) => Monoid (Fold m a b) Source #	Combines the outputs of the folds (the type `b`) using their `Monoid` instances.
Instance details Defined in Streamly.Internal.Data.Fold.Types Methods mempty :: Fold m a b # mappend :: Fold m a b -> Fold m a b -> Fold m a b # mconcat :: [Fold m a b] -> Fold m a b #

data Fold2 m c a b Source #

Experimental type to provide a side input to the fold for generating the initial state. For example, if we have to fold chunks of a stream and write each chunk to a different file, then we can generate the file name using a monadic action. This is a generalized version of Fold.

Constructors

Fold2 (s -> a -> m s) (c -> m s) (s -> m b)

Fold step inject extract

simplify :: Fold2 m c a b -> c -> Fold m a b Source #

Convert more general type Fold2 into a simpler type Fold

toListRevF :: Monad m => Fold m a [a] Source #

Buffers the input stream to a list in the reverse order of the input.

Warning! working on large lists accumulated as buffers in memory could be very inefficient, consider using Streamly.Array instead.

Since: 0.7.0

This is more efficient than toList. toList is exactly the same as reversing the list after toListRevF.

lmap :: (a -> b) -> Fold m b r -> Fold m a r Source #

(lmap f fold) maps the function f on the input of the fold.

>>> S.fold (FL.lmap (\x -> x * x) FL.sum) (S.enumerateFromTo 1 100)
338350

Since: 0.7.0

lmapM :: Monad m => (a -> m b) -> Fold m b r -> Fold m a r Source #

(lmapM f fold) maps the monadic function f on the input of the fold.

Since: 0.7.0

lfilter :: Monad m => (a -> Bool) -> Fold m a r -> Fold m a r Source #

Include only those elements that pass a predicate.

>>> S.fold (lfilter (> 5) FL.sum) [1..10]
40

Since: 0.7.0

lfilterM :: Monad m => (a -> m Bool) -> Fold m a r -> Fold m a r Source #

Like lfilter but with a monadic predicate.

Since: 0.7.0

lcatMaybes :: Monad m => Fold m a b -> Fold m (Maybe a) b Source #

Transform a fold from a pure input to a Maybe input, consuming only Just values.

ltake :: Monad m => Int -> Fold m a b -> Fold m a b Source #

Take first n elements from the stream and discard the rest.

Since: 0.7.0

ltakeWhile :: Monad m => (a -> Bool) -> Fold m a b -> Fold m a b Source #

Takes elements from the input as long as the predicate succeeds.

Since: 0.7.0

lsessionsOf :: MonadAsync m => Double -> Fold m a b -> Fold m b c -> Fold m a c Source #

Group the input stream into windows of n second each and then fold each group using the provided fold function.

For example, we can copy and distribute a stream to multiple folds where each fold can group the input differently e.g. by one second, one minute and one hour windows respectively and fold each resulting stream of folds.

-----Fold m a b----|-Fold n a c-|-Fold n a c-|-...-|----Fold m a c

lchunksOf :: Monad m => Int -> Fold m a b -> Fold m b c -> Fold m a c Source #

For every n input items, apply the first fold and supply the result to the next fold.

lchunksOf2 :: Monad m => Int -> Fold m a b -> Fold2 m x b c -> Fold2 m x a c Source #

duplicate :: Applicative m => Fold m a b -> Fold m a (Fold m a b) Source #

Modify the fold such that when the fold is done, instead of returning the accumulator, it returns a fold. The returned fold starts from where we left i.e. it uses the last accumulator value as the initial value of the accumulator. Thus we can resume the fold later and feed it more input.

> do
    more <- S.fold (FL.duplicate FL.sum) (S.enumerateFromTo 1 10)
    evenMore <- S.fold (FL.duplicate more) (S.enumerateFromTo 11 20)
    S.fold evenMore (S.enumerateFromTo 21 30)
 465

Since: 0.7.0

initialize :: Monad m => Fold m a b -> m (Fold m a b) Source #

Run the initialization effect of a fold. The returned fold would use the value returned by this effect as its initial value.

runStep :: Monad m => Fold m a b -> a -> m (Fold m a b) Source #

Run one step of a fold and store the accumulator as an initial value in the returned fold.