Copyright | (c) Sebastian Graf 2018 |
---|---|
License | ISC |
Maintainer | sgraf1337@gmail.com |
Portability | portable |
Safe Haskell | None |
Language | Haskell2010 |
What is This?
The purpose of datafix
is to separate declaring
data-flow problems
from computing their solutions by
fixed-point iteration.
The need for this library arose when I was combining two analyses within GHC for my master's thesis. I recently held a talk on that topic, feel free to click through if you want to know the details.
You can think of data-flow problems as problems that are solvable by dynamic programming or memoization, except that the dependency graph of data-flow problems doesn't need to be acyclic.
Data-flow problems are declared with the primitives in
Datafix.Description
and solved by Datafix.Worklist.
.solveProblem
With that out of the way, let's set in place the GHCi environment of our examples:
>>>
:set -XScopedTypeVariables
>>>
:set -XTypeApplications
>>>
:set -XTypeFamilies
>>>
import Datafix
>>>
import Data.Proxy (Proxy (..))
>>>
import Algebra.Lattice (JoinSemiLattice (..), BoundedJoinSemiLattice (..))
>>>
import Numeric.Natural
Use Case: Solving Recurrences
Let's start out by computing the fibonacci series:
>>>
:{
fib :: Natural -> Natural fib 0 = 0 fib 1 = 1 fib n = fib (n-1) + fib (n-2) :}
>>>
fib 3
2>>>
fib 10
55
Bring your rabbits to the vet while you can still count them...
Anyway, the fibonacci series is a typical problem exhibiting
overlapping subproblems. As a result, our fib
function from above scales badly in
the size of its input argument n
. Because we repeatedly recompute
solutions, the time complexity of our above function is in \(\mathcal{O}(2^n)\)!
We can do better by using dynamic programming or memoization to keep a cache of already computed sub-problems, which helps computing the \(n\)th item in \(\mathcal{O}(n)\) time and space:
>>>
:{
fib2 :: Natural -> Natural fib2 n = fibs !! fromIntegral n where fibs = 0 : 1 : zipWith (+) fibs (tail fibs) :}
>>>
fib2 3
2>>>
fib2 10
55
That's one of Haskell's pet issues: Expressing dynamic programs as lists through laziness.
As promised in the previous section, we can do the same using datafix
.
First, we need to declare a transfer function that makes the data
dependencies for the recursive case explicit, as if we were using
fix
to eliminate the recursion:
>>>
:{
transferFib :: forall m . (MonadDependency m, Domain m ~ Natural) => Node -> LiftedFunc Natural m transferFib (Node 0) = return 0 transferFib (Node 1) = return 1 transferFib (Node n) = do a <- dependOn @m (Node (n-1)) b <- dependOn @m (Node (n-2)) return (a + b) :}
MonadDependency
contains a single primitive dependOn
for that purpose.
Every point of the fibonacci series is modeled as a seperate Node
of the
data-flow graph.
By looking at the definition of LiftedFunc
, we can see that
LiftedFunc Natural m ~ m Natural
, so for our simple
Natural
Domain
, the transfer function is specified directly in
MonadDependency
.
Note that indeed we eliminated explicit recursion in transferFib
.
This allows the solution algorithm to track and discover dependencies
of the transfer function as it is executed!
With our transfer function (which denotes data-flow nodes in the semantics
of Natural
s) in place, we can construct a DataFlowProblem
:
>>>
:{
fibDfp :: forall m . (MonadDependency m, Domain m ~ Natural) => DataFlowProblem m fibDfp = DFP transferFib (const (eqChangeDetector @(Domain m))) :}
The eqChangeDetector
is important for cyclic dependency graphs and makes
sure we detect when a fixed-point has been reached.
That's it for describing the data-flow problem of fibonacci numbers.
We can ask Datafix.Worklist.
for a solution in a minute.solveProblem
The solveProblem
solver demands an instance of BoundedJoinSemiLattice
on the Domain
for when the data-flow graph is cyclic. We conveniently
delegate to the total Ord
instance for Natural
, knowing
that its semantic interpretation is irrelevant to us:
>>>
instance JoinSemiLattice Natural where (\/) = max
>>>
instance BoundedJoinSemiLattice Natural where bottom = 0
And now the final incantation of the solver:
>>>
solveProblem fibDfp Sparse NeverAbort (Node 10)
55
This will also execute in \(\mathcal{O}(n)\) space and time, all without worrying about a smart solution strategy involving how to tie knots or allocate vectors. Granted, this doesn't really pay off for simple problems like computing fibonacci numbers because of the boilerplate involved and the somewhat devious type-level story, but the intended use case is that of static analysis of programming languages.
Before I delegate you to a blog post about strictness analysis, we will look at a more devious reccurence relation with actual cycles in the resulting data-flow graph.
Use Case: Solving Cyclic Recurrences
The recurrence relation describing fibonacci numbers admits a clear plan of how to compute a solution, because the dependency graph is obviously acyclic: To compute the next new value of the sequence, only the prior two values are needed.
This is not true of the following reccurence relation:
\[ f(n) = \begin{cases} 2 \cdot f(\frac{n}{2}), & n \text{ even}\\ f(n+1)-1, & n \text{ odd} \end{cases} \]
The identity function is the only solution to this, but it is unclear how we could arrive at that conclusion just by translating that relation into Haskell:
>>>
:{
f n | even n = 2 * f (n `div` 2) | odd n = f (n + 1) - 1 :}
Imagine a call f 1
: This will call f 2
recursively, which again
will call f 1
. We hit a cyclic dependency!
Fortunately, we can use datafix
to compute the solution by fixed-point
iteration (which assumes monotonicity of the function to approximate):
>>>
:{
transferF :: forall m . (MonadDependency m, Domain m ~ Int) => Node -> LiftedFunc Int m transferF (Node n) | even n = (* 2) <$> dependOn @m (Node (n `div` 2)) | odd n = (subtract 1) <$> dependOn @m (Node (n + 1)) :}
>>>
:{
fDfp :: forall m . (MonadDependency m, Domain m ~ Int) => DataFlowProblem m fDfp = DFP transferF (const (eqChangeDetector @(Domain m))) :}
Specification of the data-flow problem works the same as for the fib
function.
As for Natural
, we need an instance of BoundedJoinSemiLattice
for Int
to compute a solution:
>>>
instance JoinSemiLattice Int where (\/) = max
>>>
instance BoundedJoinSemiLattice Int where bottom = minBound
Now it's just a matter of calling solveProblem
with the right parameters:
>>>
solveProblem fDfp Sparse NeverAbort (Node 0)
0>>>
solveProblem fDfp Sparse NeverAbort (Node 5)
5>>>
solveProblem fDfp Sparse NeverAbort (Node 42)
42>>>
solveProblem fDfp Sparse NeverAbort (Node (-10))
-10
Note how the specification of the data-flow problem was as unexciting as it was for the fibonacci sequence (modulo boilerplate), yet the recurrence we solved was pretty complicated already.
Of course, encoding the identity function this way is inefficient. But keep in mind that in general, we don't know the solution to a particular recurrence! It's always possible to solve the recurrence by hand upfront, but that's trading precious developer time for what might be a throw-away problem anyway.
Which brings us to the prime and final use case...
Use Case: Static Analysis
Recurrence equations occur all the time in denotational semantics and static data-flow analysis.
For every invocation of the compiler, for every module, for every analysis within the compiler, a recurrence relation representing program semantics is to be solved. Naturally, we can't task a human with solving a bunch of complicated recurrences everytime we hit compile.
In the imperative world, it's common-place to have some kind of fixed-point iteration framework carry out the iteration of the data-flow graph, but I could not find a similar abstraction for functional programming languages yet. Analyses for functional languages are typically carried out as iterated traversals of the syntax tree, but that is unsatisfying for a number of reasons:
- Solution logic of the data-flow problem is intertwined with its specification.
- Solution logic is duplicated among multiple analyses, violating DRY.
- A consequence of the last two points is that performance tweaks
have to be adapted for every analysis separately.
In the case of GHC's Demand Analyser, going from chaotic iteration
(which corresponds to naive iterated tree traversals) to an iteration
scheme that caches results of inner let-bindings, annotations to the
syntax tree are suddenly used like
State
threads, which makes the analysis logic even more complex than it already was.
So, I can only encourage any compiler dev who wants to integrate static
analyses into their compiler to properly specify the data-flow problems
in terms of datafix
and leave the intricacies of finding a good iteration
order to this library :)