This package defines the Fold
, NonemptyFold
, and EffectfulFold
types and
provides an assortment of ways to construct, combine, and use them.
Every gambler knows that the secret to surviving
Is knowing what to throw away and knowing what to keep
You got to know when to hold 'em, know when to fold 'em
Know when to walk away, and know when to run
— The Gambler by Don Schlitz, popularized by Kenny Rogers
Fold
The foldl'
function in the base
package is used when we want a strictly
evaluated result from traversing a list.
foldl' :: Foldable t => (b -> a -> b) -> b -> t a -> b
For example, to sum a list of numbers:
λ> import qualified Data.List as List
λ> List.foldl' (+) 0 [1..100]
5050
What if we put the first two parameters to List.foldl'
into a datatype?
data Fold a b = Fold
{ initial :: b
, step :: b -> a -> b }
Or, better yet, we can use a trick to turn the datatype into a Functor
(which
will become important when we discuss the Applicative
a bit later):
data Fold a b = forall x. Fold
{ initial :: x
, step :: x -> a -> x
, extract :: x -> b }
We can then express the concept of numeric summation as:
sum :: Num a => Fold a a
sum = Fold{ initial = 0, step = (+), extract = id }
This Fold
can be used to sum lists and other Foldable
collections, but it
can also be used to sum effectful streams. So even without any further
mechanism, just having this datatype gives us some useful expressive power.
There is no need for each streaming library to duplicate all the work of
defining its own copies of sum
, product
, all
, any
, and
, or
,
minimum
, maximum
, etc.; a library that provides some kind of Stream
type
needs only define a function to apply a fold to a stream ...
foldStream :: Fold a b -> Stream m a -> m b
... and then users can make use of any library of folds that they may find or
concoct. gambler
itself contains much of the functionality of the standard
Data.List
module, but there are more things in heaven and earth than are
dreamt of in this package.
NonemptyFold
There are some kinds of folding that only work if the input it nonempty.
Suppose, for example, we want the greatest of all the items. If there are no
items, there is no greatest item. We express this sort of thing with a slight
modification to Fold
:
data NonemptyFold a b = forall x. NonemptyFold
{ initial :: a -> x
, step :: x -> a -> x
, extract :: x -> b }
The only thing that's different is the type of the initial
field has changed
from x
to a -> x
; it is now parameterized on the first item.
The notion of selecting greatest item can now be expressed as:
maximum = NonemptyFold{ initial = id, step = max, extract = id }
A NonemptyFold
can be converted to a Fold
using Fold.Pure.nonemptyFold
.
The conversion changes the fold's return type from b
to Maybe b
to
accommodate the possibility of empty input.
EffectfulFold
There is a related function in base
that does the same thing as foldl'
but
in a monadic context:
foldM :: Foldable t => Monad m => (b -> a -> m b) -> b -> t a -> m b
This allows us to perform effects as we fold.
λ> import qualified Control.Monad as Monad
λ> Monad.foldM (\x a -> putStrLn ("* " <> show a) $> (x + a)) 0 [1..5]
* 1
* 2
* 3
* 4
* 5
15
The type we define corresponding to the arguments of Monad.foldM
is:
data EffectfulFold m a b = forall x. EffectfulFold
{ initial :: m x
, step :: x -> a -> m x
, extract :: x -> m b }
A regular Fold
can be converted to an EffectfulFold
of any monad using
Fold.Effectful.fold
.
Applicative instances
The Fold
and EffectfulFold
applicatives are great for computing multiple folds
over a collection in one pass over the data. For example, suppose that you want
to compute both the sum and the length of a list. The following approach works,
but it uses space inefficiently:
import qualified Data.List as List
sumAndLength :: Num a => [a] -> (a, Natural)
sumAndLength xs = (List.sum xs, List.genericLength xs)
The problem is this goes over the list in two passes. If you demand the result
of sum
, the Haskell runtime will materialize the entire list. However, the
runtime cannot garbage collect the list because the list is still required for
the call to length
. The space requirement of sumAndLength
is therefore
linear with respect to the size of the list. We can do much better.
With gambler
, we can instead write:
import qualified Fold.Pure as Fold
sumAndLength :: Num a => [a] -> (a, Natural)
sumAndLength = Fold.runFold $ (,) <$> Fold.sum <*> Fold.length
This achieves the same result using constant space.
ShortcutFold
Operations like sum
and length
inherently require the entirety of the input.
But suppose, for example, we want to know whether a list contains a particular
item. Once we find the item, we can stop looking. For situations like this, we
have ShortcutFold
.
data ShortcutFold a b = forall x y. ShortcutFold
{ initial :: Vitality x y
, step :: y -> a -> Vitality x y
, extractDead :: x -> b
, extractLive :: y -> b
}
data Vitality a b = Dead a | Alive Will b
data Will = Ambivalent | Tenacious
If initial
or step
returns a Dead
result, then no further input can be fed
into the fold. The Will
type will be discussed below.
Example of testing for the presence of a particular element:
λ> import qualified Fold.Shortcut as Fold
λ> Fold.run (Fold.element 'a') "back"
True
λ> Fold.run (Fold.element 'a') "front"
False
By running a shortcut fold over a partially-defined input list, we can verify
that evaluation does indeed halt once the element is found:
λ> Fold.run (Fold.element 'a') ("ba" ++ undefined)
True
ShortcutNonemptyFold
The non-empty variant of ShortcutFold
is ShortcutNonemptyFold
.
data ShortcutNonemptyFold a b = forall x y. ShortcutNonemptyFold
{ initial :: a -> Vitality x y
, step :: y -> a -> Vitality x y
, extractDead :: x -> b
, extractLive :: y -> b
}
Like NonemptyFold
, the only thing we've changed here is to add an a
parameter to the initial
field.
Shortcutting Applicative instances
Just like Fold
, ShortcutFold
can be combined in an applicative manner and
the input is traversed in a single pass.
λ> import qualified Fold.Shortcut as Fold
λ> Fold.run ((,) <$> Fold.index 4 <*> Fold.index 2) "abcdef"
(Just 'e',Just 'c')
With an applicative composition, traversal continues as long as any of the
constituent parts is alive and still expecting more input. In the example above,
index 2
becomes dead after the third character ('c'
), but the composition
stays alive until the fifth character ('e'
) results in the completion of
index 4
.
A fold indicates that it is finished by returning Dead
as its Vitality
. But
there is one more subtlety: When a fold is Alive
, it can further specify
whether it wants to keep receiving more items or not. This disposition is
called Will
. A Tenacious
will means that the fold wants more input; an
Ambivalent
will mean the fold can accept more input, but doesn't care. The
difference is that tenacious folds keep larger applicative compositions alive,
whereas ambivalent folds do not. Shortcut folds are tagged in the documentation
as "(ambivalent)" or "(tenacious)".
One example of an ambivalent shortcut fold is length
. We see in the example
below that the length fold keeps counting only as long as the other fold it
is composed with still needs input.
λ> Fold.run ((,) <$> Fold.length <*> Fold.element 'e') "abcdef"
(5,True)
λ> Fold.run ((,) <$> Fold.length <*> Fold.element 'z') "abcdef"
(6,False)
Consider another example:
λ> f1 = ((,) <$> Fold.length <*> Fold.element 'c')
λ> f2 = ((,) <$> Fold.length <*> Fold.element 'e')
λ> Fold.run ((,) <$> f1 <*> f2) "abcdef"
((5,True),(5,True))
The example above composes four folds (two length
folds and two element
folds). Notice that both of the length folds returned 5 because they keep
running until the entire fold stops, regardless of the way they're composed. We
can avoid this behavior with the demotivate
utility, which stops a fold from
being so eager. A demotivated fold will stop receiving input when all of its
components are either dead or ambivalent.
λ> Fold.run ((,) <$> Fold.demotivate f1 <*> Fold.demotivate f2) "abcdef"
((3,True),(5,True))
Quick start
To get quickly playing around with gambler
, launch GHCi using cabal
:
cabal repl --build-depends gambler
The example from the previous section can be run as follows:
λ> import qualified Fold
λ> Fold.runFold ((,) <$> Fold.sum <*> Fold.length) [1..1000000]
(500000500000,1000000)
Authors
This package began as mostly a copy of foldl, with some features removed to
minimize its dependency set. What remained in gambler-0.0
was essentially
the same as what could be found in foldl-1.4.13
, subject only to
reorganization, renaming, and minor modifications. The Fold
, EffectfulFold
,
and NonemptyFold
types are the work of Gabriella Gonzalez.
ShortcutFold
and ShortcutNonemptyFold
were added by Mission Valley Software
in gambler-0.1
.
Future plans
Once the Foldable1
class has been added to base
, the type of
Fold.Nonempty.run
may be generalized to accommodate it.