streamly-0.8.3: Dataflow programming and declarative concurrency
Copyright(c) 2019 Composewell Technologies
LicenseBSD-3-Clause
Maintainerstreamly@composewell.com
Stabilityreleased
PortabilityGHC
Safe HaskellSafe-Inferred
LanguageHaskell2010

Streamly.Data.Fold

Description

A Fold is a sink or a consumer of a stream of values. The Fold type consists of an accumulator and an effectful action that absorbs a value into the accumulator.

>>> import Data.Function ((&))
>>> import qualified Streamly.Data.Fold as Fold
>>> import qualified Streamly.Prelude as Stream

For example, a sum Fold represents adding the input to the accumulated sum. A fold driver e.g. fold pushes values from a stream to the Fold one at a time, reducing the stream to a single value.

>>> Stream.fold Fold.sum $ Stream.fromList [1..100]
5050

Conceptually, a Fold is a data type that can mimic a strict left fold (foldl) as well as lazy right fold (foldr). The above example is similar to a left fold using (+) as the step and 0 as the initial value of the accumulator:

>>> Data.List.foldl' (+) 0 [1..100]
5050

Folds have an early termination capability e.g. the head fold would terminate on an infinite stream:

>>> Stream.fold Fold.head $ Stream.fromList [1..]
Just 1

The above example is similar to the following right fold:

>>> Prelude.foldr (\x _ -> Just x) Nothing [1..]
Just 1

Folds can be combined together using combinators. For example, to create a fold that sums first two elements in a stream:

>>> sumTwo = Fold.take 2 Fold.sum
>>> Stream.fold sumTwo $ Stream.fromList [1..100]
3

Folds can be combined to run in parallel on the same input. For example, to compute the average of numbers in a stream without going through the stream twice:

>>> avg = Fold.teeWith (/) Fold.sum (fmap fromIntegral Fold.length)
>>> Stream.fold avg $ Stream.fromList [1.0..100.0]
50.5

Folds can be combined so as to partition the input stream over multiple folds. For example, to count even and odd numbers in a stream:

>>> split n = if even n then Left n else Right n
>>> stream = Stream.map split $ Stream.fromList [1..100]
>>> countEven = fmap (("Even " ++) . show) Fold.length
>>> countOdd = fmap (("Odd "  ++) . show) Fold.length
>>> f = Fold.partition countEven countOdd
>>> Stream.fold f stream
("Even 50","Odd 50")

Terminating folds can be combined to parse the stream serially such that the first fold consumes the input until it terminates and the second fold consumes the rest of the input until it terminates:

>>> f = Fold.serialWith (,) (Fold.take 8 Fold.toList) (Fold.takeEndBy (== '\n') Fold.toList)
>>> Stream.fold f $ Stream.fromList "header: hello\n"
("header: ","hello\n")

A Fold can be applied repeatedly on a stream to transform it to a stream of fold results. To split a stream on newlines:

>>> f = Fold.takeEndBy (== '\n') Fold.toList
>>> Stream.toList $ Stream.foldMany f $ Stream.fromList "Hello there!\nHow are you\n"
["Hello there!\n","How are you\n"]

Similarly, we can split the input of a fold too:

>>> Stream.fold (Fold.many f Fold.toList) $ Stream.fromList "Hello there!\nHow are you\n"
["Hello there!\n","How are you\n"]

Please see Streamly.Internal.Data.Fold for additional Pre-release functions.

Folds vs. Streams

We can often use streams or folds to achieve the same goal. However, streams are more efficient in composition of producers (e.g. serial or mergeBy) whereas folds are more efficient in composition of consumers (e.g. serialWith, partition or teeWith).

Streams are producers, transformations on streams happen on the output side:

>>> :{
 f stream =
       Stream.filter odd stream
     & Stream.map (+1)
     & Stream.sum
:}
>>> f $ Stream.fromList [1..100 :: Int]
2550

Folds are stream consumers with an input stream and an output value, stream transformations on folds happen on the input side:

>>> :{
f =
       Fold.filter odd
     $ Fold.lmap (+1)
     $ Fold.sum
:}
>>> Stream.fold f $ Stream.fromList [1..100 :: Int]
2550

Notice the similiarity in the definition of f in both cases, the only difference is the composition by & vs $ and the use lmap vs map, the difference is due to output vs input side transformations.

Synopsis

Fold Type

data Fold m a b Source #

The type Fold m a b having constructor Fold step initial extract represents a fold over an input stream of values of type a to a final value of type b in Monad m.

The fold uses an intermediate state s as accumulator, the type s is internal to the specific fold definition. The initial value of the fold state s is returned by initial. The step function consumes an input and either returns the final result b if the fold is done or the next intermediate state (see Step). At any point the fold driver can extract the result from the intermediate state using the extract function.

NOTE: The constructor is not yet exposed via exposed modules, smart constructors are provided to create folds. If you think you need the constructor of this type please consider using the smart constructors in Streamly.Internal.Data.Fold instead.

since 0.8.0 (type changed)

Since: 0.7.0

Instances

Instances details
Functor m => Functor (Fold m a) Source #

Maps a function on the output of the fold (the type b).

Instance details

Defined in Streamly.Internal.Data.Fold.Type

Methods

fmap :: (a0 -> b) -> Fold m a a0 -> Fold m a b #

(<$) :: a0 -> Fold m a b -> Fold m a a0 #

Constructors

foldl' :: Monad m => (b -> a -> b) -> b -> Fold m a b Source #

Make a fold from a left fold style pure step function and initial value of the accumulator.

If your Fold returns only Partial (i.e. never returns a Done) then you can use foldl'* constructors.

A fold with an extract function can be expressed using fmap:

mkfoldlx :: Monad m => (s -> a -> s) -> s -> (s -> b) -> Fold m a b
mkfoldlx step initial extract = fmap extract (foldl' step initial)

See also: Streamly.Prelude.foldl'

Since: 0.8.0

foldlM' :: Monad m => (b -> a -> m b) -> m b -> Fold m a b Source #

Make a fold from a left fold style monadic step function and initial value of the accumulator.

A fold with an extract function can be expressed using rmapM:

mkFoldlxM :: Functor m => (s -> a -> m s) -> m s -> (s -> m b) -> Fold m a b
mkFoldlxM step initial extract = rmapM extract (foldlM' step initial)

See also: Streamly.Prelude.foldlM'

Since: 0.8.0

foldr :: Monad m => (a -> b -> b) -> b -> Fold m a b Source #

Make a fold using a right fold style step function and a terminal value. It performs a strict right fold via a left fold using function composition. Note that this is strict fold, it can only be useful for constructing strict structures in memory. For reductions this will be very inefficient.

For example,

toList = foldr (:) []

See also: foldr

Since: 0.8.0

Folds

Accumulators

Folds that never terminate, these folds are much like strict left folds. mconcat is the fundamental accumulator. All other accumulators can be expressed in terms of mconcat using a suitable Monoid. Instead of writing folds we could write Monoids and turn them into folds.

sconcat :: (Monad m, Semigroup a) => a -> Fold m a a Source #

Append the elements of an input stream to a provided starting value.

>>> Stream.fold (Fold.sconcat 10) (Stream.map Data.Monoid.Sum $ Stream.enumerateFromTo 1 10)
Sum {getSum = 65}
sconcat = Fold.foldl' (<>)

Since: 0.8.0

mconcat :: (Monad m, Monoid a) => Fold m a a Source #

Fold an input stream consisting of monoidal elements using mappend and mempty.

>>> Stream.fold Fold.mconcat (Stream.map Data.Monoid.Sum $ Stream.enumerateFromTo 1 10)
Sum {getSum = 55}
mconcat = Fold.sconcat mempty

Since: 0.7.0

foldMap :: (Monad m, Monoid b) => (a -> b) -> Fold m a b Source #

foldMap f = Fold.lmap f Fold.mconcat

Make a fold from a pure function that folds the output of the function using mappend and mempty.

>>> Stream.fold (Fold.foldMap Data.Monoid.Sum) $ Stream.enumerateFromTo 1 10
Sum {getSum = 55}

Since: 0.7.0

foldMapM :: (Monad m, Monoid b) => (a -> m b) -> Fold m a b Source #

foldMapM f = Fold.lmapM f Fold.mconcat

Make a fold from a monadic function that folds the output of the function using mappend and mempty.

>>> Stream.fold (Fold.foldMapM (return . Data.Monoid.Sum)) $ Stream.enumerateFromTo 1 10
Sum {getSum = 55}

Since: 0.7.0

drain :: Monad m => Fold m a () Source #

A fold that drains all its input, running the effects and discarding the results.

drain = drainBy (const (return ()))

Since: 0.7.0

drainBy :: Monad m => (a -> m b) -> Fold m a () Source #

drainBy f = lmapM f drain
drainBy = Fold.foldMapM (void . f)

Drain all input after passing it through a monadic function. This is the dual of mapM_ on stream producers.

See also: mapM_

Since: 0.7.0

last :: Monad m => Fold m a (Maybe a) Source #

Extract the last element of the input stream, if any.

last = fmap getLast $ Fold.foldMap (Last . Just)

Since: 0.7.0

length :: Monad m => Fold m a Int Source #

Determine the length of the input stream.

length = fmap getSum $ Fold.foldMap (Sum . const  1)

Since: 0.7.0

sum :: (Monad m, Num a) => Fold m a a Source #

Determine the sum of all elements of a stream of numbers. Returns additive identity (0) when the stream is empty. Note that this is not numerically stable for floating point numbers.

sum = fmap getSum $ Fold.foldMap Sum

Since: 0.7.0

product :: (Monad m, Num a, Eq a) => Fold m a a Source #

Determine the product of all elements of a stream of numbers. Returns multiplicative identity (1) when the stream is empty. The fold terminates when it encounters (0) in its input.

Compare with Fold.foldMap Product.

Since 0.8.0 (Added Eq constraint)

Since: 0.7.0

maximumBy :: Monad m => (a -> a -> Ordering) -> Fold m a (Maybe a) Source #

Determine the maximum element in a stream using the supplied comparison function.

Since: 0.7.0

maximum :: (Monad m, Ord a) => Fold m a (Maybe a) Source #

maximum = Fold.maximumBy compare

Determine the maximum element in a stream.

Compare with Fold.foldMap Max.

Since: 0.7.0

minimumBy :: Monad m => (a -> a -> Ordering) -> Fold m a (Maybe a) Source #

Computes the minimum element with respect to the given comparison function

Since: 0.7.0

minimum :: (Monad m, Ord a) => Fold m a (Maybe a) Source #

Determine the minimum element in a stream using the supplied comparison function.

minimum = minimumBy compare

Compare with Fold.foldMap Min.

Since: 0.7.0

mean :: (Monad m, Fractional a) => Fold m a a Source #

Compute a numerically stable arithmetic mean of all elements in the input stream.

Since: 0.7.0

variance :: (Monad m, Fractional a) => Fold m a a Source #

Compute a numerically stable (population) variance over all elements in the input stream.

Since: 0.7.0

stdDev :: (Monad m, Floating a) => Fold m a a Source #

Compute a numerically stable (population) standard deviation over all elements in the input stream.

Since: 0.7.0

rollingHash :: (Monad m, Enum a) => Fold m a Int64 Source #

Compute an Int sized polynomial rolling hash of a stream.

rollingHash = Fold.rollingHashWithSalt defaultSalt

Since: 0.8.0

rollingHashWithSalt :: (Monad m, Enum a) => Int64 -> Fold m a Int64 Source #

Compute an Int sized polynomial rolling hash

H = salt * k ^ n + c1 * k ^ (n - 1) + c2 * k ^ (n - 2) + ... + cn * k ^ 0

Where c1, c2, cn are the elements in the input stream and k is a constant.

This hash is often used in Rabin-Karp string search algorithm.

See https://en.wikipedia.org/wiki/Rolling_hash

Since: 0.8.0

toList :: Monad m => Fold m a [a] Source #

Folds the input stream to a list.

Warning! working on large lists accumulated as buffers in memory could be very inefficient, consider using Streamly.Data.Array.Foreign instead.

toList = foldr (:) []

Since: 0.7.0

toListRev :: Monad m => Fold m a [a] Source #

Buffers the input stream to a list in the reverse order of the input.

toListRev = Fold.foldl' (flip (:)) []

Warning! working on large lists accumulated as buffers in memory could be very inefficient, consider using Streamly.Array instead.

Since: 0.8.0

Terminating Folds

These are much like lazy right folds.

index :: Monad m => Int -> Fold m a (Maybe a) Source #

Lookup the element at the given index.

See also: !!

Since: 0.7.0

head :: Monad m => Fold m a (Maybe a) Source #

Extract the first element of the stream, if any.

Since: 0.7.0

find :: Monad m => (a -> Bool) -> Fold m a (Maybe a) Source #

Returns the first element that satisfies the given predicate.

Since: 0.7.0

lookup :: (Eq a, Monad m) => a -> Fold m (a, b) (Maybe b) Source #

In a stream of (key-value) pairs (a, b), return the value b of the first pair where the key equals the given value a.

lookup = snd <$> Fold.find ((==) . fst)

Since: 0.7.0

findIndex :: Monad m => (a -> Bool) -> Fold m a (Maybe Int) Source #

Returns the first index that satisfies the given predicate.

Since: 0.7.0

elemIndex :: (Eq a, Monad m) => a -> Fold m a (Maybe Int) Source #

Returns the first index where a given value is found in the stream.

elemIndex a = Fold.findIndex (== a)

Since: 0.7.0

null :: Monad m => Fold m a Bool Source #

Return True if the input stream is empty.

null = fmap isJust Fold.head

Since: 0.7.0

elem :: (Eq a, Monad m) => a -> Fold m a Bool Source #

Return True if the given element is present in the stream.

elem a = Fold.any (== a)

Since: 0.7.0

notElem :: (Eq a, Monad m) => a -> Fold m a Bool Source #

Returns True if the given element is not present in the stream.

notElem a = Fold.all (/= a)

Since: 0.7.0

all :: Monad m => (a -> Bool) -> Fold m a Bool Source #

Returns True if all elements of a stream satisfy a predicate.

>>> Stream.fold (Fold.all (== 0)) $ Stream.fromList [1,0,1]
False
all p = Fold.lmap p Fold.and

Since: 0.7.0

any :: Monad m => (a -> Bool) -> Fold m a Bool Source #

Returns True if any of the elements of a stream satisfies a predicate.

>>> Stream.fold (Fold.any (== 0)) $ Stream.fromList [1,0,1]
True
any p = Fold.lmap p Fold.or

Since: 0.7.0

and :: Monad m => Fold m Bool Bool Source #

Returns True if all elements are True, False otherwise

and = Fold.all (== True)

Since: 0.7.0

or :: Monad m => Fold m Bool Bool Source #

Returns True if any element is True, False otherwise

or = Fold.any (== True)

Since: 0.7.0

Combinators

Combinators are modifiers of folds. In the type Fold m a b, a is the input type and b is the output type. Transformations can be applied either on the input side or on the output side. Therefore, combinators are of one of the following general shapes:

  • ... -> Fold m a b -> Fold m c b (input transformation)
  • ... -> Fold m a b -> Fold m a c (output transformation)

Output transformations are also known as covariant transformations, and input transformations are also known as contravariant transformations. The input side transformations are more interesting for folds. Most of the following sections describe the input transformation operations on a fold. The names and signatures of the operations are consistent with corresponding operations in Streamly.Prelude. When an operation makes sense on both input and output side we use the prefix l (for left) for input side operations and the prefix r (for right) for output side operations.

Mapping on output

The Functor instance of a fold maps on the output of the fold:

>>> Stream.fold (fmap show Fold.sum) (Stream.enumerateFromTo 1 100)
"5050"

rmapM :: Monad m => (b -> m c) -> Fold m a b -> Fold m a c Source #

Map a monadic function on the output of a fold.

Since: 0.8.0

Mapping on Input

lmap :: (a -> b) -> Fold m b r -> Fold m a r Source #

lmap f fold maps the function f on the input of the fold.

>>> Stream.fold (Fold.lmap (\x -> x * x) Fold.sum) (Stream.enumerateFromTo 1 100)
338350
lmap = Fold.lmapM return

Since: 0.8.0

lmapM :: Monad m => (a -> m b) -> Fold m b r -> Fold m a r Source #

lmapM f fold maps the monadic function f on the input of the fold.

Since: 0.8.0

Filtering

filter :: Monad m => (a -> Bool) -> Fold m a r -> Fold m a r Source #

Include only those elements that pass a predicate.

>>> Stream.fold (Fold.filter (> 5) Fold.sum) $ Stream.fromList [1..10]
40
filter f = Fold.filterM (return . f)

Since: 0.8.0

filterM :: Monad m => (a -> m Bool) -> Fold m a r -> Fold m a r Source #

Like filter but with a monadic predicate.

Since: 0.8.0

catMaybes :: Monad m => Fold m a b -> Fold m (Maybe a) b Source #

Modify a fold to receive a Maybe input, the Just values are unwrapped and sent to the original fold, Nothing values are discarded.

Since: 0.8.0

mapMaybe :: Monad m => (a -> Maybe b) -> Fold m b r -> Fold m a r Source #

mapMaybe f fold maps a Maybe returning function f on the input of the fold, filters out Nothing elements, and return the values extracted from Just.

>>> f x = if even x then Just x else Nothing
>>> fld = Fold.mapMaybe f Fold.toList
>>> Stream.fold fld (Stream.enumerateFromTo 1 10)
[2,4,6,8,10]

Since: 0.8.0

Trimming

take :: Monad m => Int -> Fold m a b -> Fold m a b Source #

Take at most n input elements and fold them using the supplied fold. A negative count is treated as 0.

>>> Stream.fold (Fold.take 2 Fold.toList) $ Stream.fromList [1..10]
[1,2]

Since: 0.8.0

takeEndBy_ :: Monad m => (a -> Bool) -> Fold m a b -> Fold m a b Source #

Like takeEndBy but drops the element on which the predicate succeeds.

>>> Stream.fold (Fold.takeEndBy_ (== '\n') Fold.toList) $ Stream.fromList "hello\nthere\n"
"hello"
>>> Stream.toList $ Stream.foldMany (Fold.takeEndBy_ (== '\n') Fold.toList) $ Stream.fromList "hello\nthere\n"
["hello","there"]
Stream.splitOnSuffix p f = Stream.foldMany (Fold.takeEndBy_ p f)

See splitOnSuffix for more details on splitting a stream using takeEndBy_.

Since: 0.8.0

takeEndBy :: Monad m => (a -> Bool) -> Fold m a b -> Fold m a b Source #

Take the input, stop when the predicate succeeds taking the succeeding element as well.

>>> Stream.fold (Fold.takeEndBy (== '\n') Fold.toList) $ Stream.fromList "hello\nthere\n"
"hello\n"
>>> Stream.toList $ Stream.foldMany (Fold.takeEndBy (== '\n') Fold.toList) $ Stream.fromList "hello\nthere\n"
["hello\n","there\n"]
Stream.splitWithSuffix p f = Stream.foldMany (Fold.takeEndBy p f)

See splitWithSuffix for more details on splitting a stream using takeEndBy.

Since: 0.8.0

Serial Append

serialWith :: Monad m => (a -> b -> c) -> Fold m x a -> Fold m x b -> Fold m x c Source #

Sequential fold application. Apply two folds sequentially to an input stream. The input is provided to the first fold, when it is done - the remaining input is provided to the second fold. When the second fold is done or if the input stream is over, the outputs of the two folds are combined using the supplied function.

>>> f = Fold.serialWith (,) (Fold.take 8 Fold.toList) (Fold.takeEndBy (== '\n') Fold.toList)
>>> Stream.fold f $ Stream.fromList "header: hello\n"
("header: ","hello\n")

Note: This is dual to appending streams using serial.

Note: this implementation allows for stream fusion but has quadratic time complexity, because each composition adds a new branch that each subsequent fold's input element has to traverse, therefore, it cannot scale to a large number of compositions. After around 100 compositions the performance starts dipping rapidly compared to a CPS style implementation.

Time: O(n^2) where n is the number of compositions.

Since: 0.8.0

Parallel Distribution

For applicative composition using distribution see Streamly.Internal.Data.Fold.Tee.

teeWith :: Monad m => (a -> b -> c) -> Fold m x a -> Fold m x b -> Fold m x c Source #

teeWith k f1 f2 distributes its input to both f1 and f2 until both of them terminate and combines their output using k.

>>> avg = Fold.teeWith (/) Fold.sum (fmap fromIntegral Fold.length)
>>> Stream.fold avg $ Stream.fromList [1.0..100.0]
50.5
teeWith k f1 f2 = fmap (uncurry k) ((Fold.tee f1 f2)

For applicative composition using this combinator see Streamly.Internal.Data.Fold.Tee.

See also: Streamly.Internal.Data.Fold.Tee

Since: 0.8.0

tee :: Monad m => Fold m a b -> Fold m a c -> Fold m a (b, c) Source #

Distribute one copy of the stream to each fold and zip the results.

                |-------Fold m a b--------|
---stream m a---|                         |---m (b,c)
                |-------Fold m a c--------|
>>> Stream.fold (Fold.tee Fold.sum Fold.length) (Stream.enumerateFromTo 1.0 100.0)
(5050.0,100)
tee = teeWith (,)

Since: 0.7.0

distribute :: Monad m => [Fold m a b] -> Fold m a [b] Source #

Distribute one copy of the stream to each fold and collect the results in a container.

                |-------Fold m a b--------|
---stream m a---|                         |---m [b]
                |-------Fold m a b--------|
                |                         |
                           ...
>>> Stream.fold (Fold.distribute [Fold.sum, Fold.length]) (Stream.enumerateFromTo 1 5)
[15,5]
distribute = Prelude.foldr (Fold.teeWith (:)) (Fold.fromPure [])

This is the consumer side dual of the producer side sequence operation.

Stops when all the folds stop.

Since: 0.7.0

Partitioning

Direct items in the input stream to different folds using a binary fold selector.

partition :: Monad m => Fold m b x -> Fold m c y -> Fold m (Either b c) (x, y) Source #

Compose two folds such that the combined fold accepts a stream of Either and routes the Left values to the first fold and Right values to the second fold.

partition = partitionBy id

Since: 0.7.0

Unzipping

unzip :: Monad m => Fold m a x -> Fold m b y -> Fold m (a, b) (x, y) Source #

Send the elements of tuples in a stream of tuples through two different folds.

                          |-------Fold m a x--------|
---------stream of (a,b)--|                         |----m (x,y)
                          |-------Fold m b y--------|

unzip = Fold.unzipWith id

This is the consumer side dual of the producer side zip operation.

Since: 0.7.0

Splitting

many :: Monad m => Fold m a b -> Fold m b c -> Fold m a c Source #

Collect zero or more applications of a fold. many split collect applies the split fold repeatedly on the input stream and accumulates zero or more fold results using collect.

>>> two = Fold.take 2 Fold.toList
>>> twos = Fold.many two Fold.toList
>>> Stream.fold twos $ Stream.fromList [1..10]
[[1,2],[3,4],[5,6],[7,8],[9,10]]

Stops when collect stops.

See also: concatMap, foldMany

Since: 0.8.0

chunksOf :: Monad m => Int -> Fold m a b -> Fold m b c -> Fold m a c Source #

chunksOf n split collect repeatedly applies the split fold to chunks of n items in the input stream and supplies the result to the collect fold.

>>> twos = Fold.chunksOf 2 Fold.toList Fold.toList
>>> Stream.fold twos $ Stream.fromList [1..10]
[[1,2],[3,4],[5,6],[7,8],[9,10]]
chunksOf n split = many (take n split)

Stops when collect stops.

Since: 0.8.0

Nesting

concatMap :: Monad m => (b -> Fold m a c) -> Fold m a b -> Fold m a c Source #

Map a Fold returning function on the result of a Fold and run the returned fold. This operation can be used to express data dependencies between fold operations.

Let's say the first element in the stream is a count of the following elements that we have to add, then:

>>> import Data.Maybe (fromJust)
>>> count = fmap fromJust Fold.head
>>> total n = Fold.take n Fold.sum
>>> Stream.fold (Fold.concatMap total count) $ Stream.fromList [10,9..1]
45

Time: O(n^2) where n is the number of compositions.

See also: foldIterateM

Since: 0.8.0

Deprecated

sequence :: Monad m => Fold m a (m b) -> Fold m a b Source #

Deprecated: Use "rmapM id" instead

Flatten the monadic output of a fold to pure output.

Since: 0.7.0

mapM :: Monad m => (b -> m c) -> Fold m a b -> Fold m a c Source #

Deprecated: Use rmapM instead

Map a monadic function on the output of a fold.

Since: 0.7.0