syntax-tree: Typed ASTs

Properties

Versions: 0.1.0.0
Change log: ChangeLog.md
Dependencies: array, base (>=4.9 && <5), base-compat, binary, constraints, containers, deepseq, generic-constraints, generic-data, lattices, lens, monad-st, mtl, pretty, QuickCheck, show-combinators, template-haskell, th-abstraction (>=0.3), transformers
License: BSD-3-Clause
Copyright: 2018 Yair Chuchem
Author: Yair Chuchem
Maintainer: yairchu@gmail.com
Category: Algorithms, Compilers/Interpreters, Language, Logic, Unification
Home page: https://github.com/lamdu/syntax-tree#readme
Bug tracker: https://github.com/lamdu/syntax-tree/issues
Source repo: head: git clone https://github.com/lamdu/syntax-tree
Uploaded: by YairChuchem at 2019-09-13T17:37:12Z

Readme for syntax-tree-0.1.0.0

syntax-tree: Higher-Kinded Data + recursion-schemes

Parameterizing data types by a "field constructor" is a widely used technique, employed both by the "Higher-Kinded Data" pattern and by recursion-schemes, but these two approaches do not work well together and each has its own limitations.

syntax-tree is a Haskell library for describing rich nested and mutually recursive types with the "field constructor" parameter pattern, in a modular and composable manner.

Introduction to the "field constructor" pattern and the status quo

Type: Simple type, simple functionality

Suppose we have the following type in an application:

data Person = Person
    { height :: Double
    , weight :: Double
    }

Let's imagine that we want to let a user fill in a Person via a form, where during the process the record may have missing fields.

We may want a way to represent a state with missing fields, but this type doesn't allow for it.

We can either create an additional type for that, or augment Person to provide more functionality. Augmenting Person is preferred because it results in less boilerplate and fewer types to maintain as we make changes.

Type -> Type: Adding a type parameter

A possible solution is to parameterize Person on the field type:

data Person a = Person
    { height :: a
    , weight :: a
    }

This would solve our problem.

We can parameterize with Double for the normal structure, and with Maybe Double for the variant with missing fields!
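
For instance (the type aliases below are just illustrative names, not part of any library):

type CompletePerson = Person Double         -- every field is present
type PartialPerson = Person (Maybe Double)  -- fields may still be missing

-- An empty form, waiting to be filled in:
emptyForm :: PartialPerson
emptyForm = Person { height = Nothing, weight = Nothing }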

This approach reaches its limits when the fields have multiple different types, as in:

data Person = Person
    { height :: Double
    , weight :: Double
    , name :: Text
    }

We would now need an additional parameter to parameterize how to store the fields of type Text! Is there a way to use a single type parameter for both types of fields? Yes, there is:

(Type -> Type) -> Type: Higher-Kinded Data

We could represent Person like so:

data Person f = Person
    { height :: f Double
    , weight :: f Double
    , name :: f Text
    }

For the plain case we would use Person Identity. Identity from Data.Functor.Identity is defined like so:

newtype Identity a = Identity { runIdentity :: a }

And for the variant with missing fields we would use Person Maybe.

The benefit of this parameterization over the previous one is that Person's kind doesn't need to change when adding more field types, so such changes don't propagate all over the code base.

Note that various helper classes such as Rank2.Functor and Rank2.Traversable (from the rank2classes package) allow us to conveniently convert between Person Identity and Person Maybe.
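
For example, validating a form submission could look like this (a minimal sketch, assuming a Rank2.Traversable instance for Person, which rank2classes can derive via Template Haskell):

import Data.Functor.Identity (Identity (..))
import qualified Rank2

-- Succeeds only when every field has been filled in.
complete :: Person Maybe -> Maybe (Person Identity)
complete = Rank2.traverse (fmap Identity)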

HKD for nested structures

Let's employ the same transformation we did for Person to a more complicated data structure:

data Expr
    = Const Int
    | Add Expr Expr
    | Mul Expr Expr

The HKD form of Expr would be:

data Expr f
    = Const (f Int)
    | Add (f (Expr f)) (f (Expr f))
    | Mul (f (Expr f)) (f (Expr f))

This does allow representing nested structures with missing elements, but classes like Rank2.Functor no longer work for it. To understand why, let's look at Rank2.Functor's definition:

class Functor f where
    (<$>) :: (forall a. p a -> q a) -> f p -> f q

The rank-2 function argument expects the field type a to stay the same when converting p to q; however, in the above formulation of Expr, the field type Expr p changes to Expr q when the type parameter changes.

Type -> Type: The recursion-schemes approach

The recursion-schemes formulation of Expr is the same as the Type -> Type approach discussed above:

data Expr a
    = Const Int
    | Add a a
    | Mul a a

(Note that recursion-schemes can generate it for us from the simple definition of Expr using TemplateHaskell)

This approach is limited to a single node type, so we gave up on parameterizing over the Int in Const. That is a significant limitation, but in return we get several advantages.

We can represent an expression as Fix Expr, using:

newtype Fix f = Fix (f (Fix f))

We can then use useful combinators from recursion-schemes for folding and processing of Exprs.
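
For example, evaluation can be written as a fold (a minimal hand-rolled sketch of the idea; recursion-schemes provides an equivalent cata combinator, and we assume a Functor instance for Expr, e.g. via DeriveFunctor):

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix f) = alg (fmap (cata alg) f)

evalExpr :: Fix Expr -> Int
evalExpr = cata step
    where
        step (Const n) = n
        step (Add x y) = x + y
        step (Mul x y) = x * y

-- evalExpr (Fix (Add (Fix (Const 1)) (Fix (Const 2)))) == 3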

unification-fd is a good example of the power of this approach. It implements unification for ASTs, using the parameterization to store unification variables standing for terms.

In contrast to the HKD approach, we can also use rich fix-points which store several different fix-points within, like Diff:

data Diff f
    = Same (f (Fix f))
    | SameTopLevel (f (Diff f))
    | Different (f (Fix f)) (f (Fix f))

(Note how Diff parameterizes f by both Fix and Diff)
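
For instance, a diff between 1 + 2 and 1 + 3 could be represented like this (an illustrative value, using the recursion-schemes style Expr from above):

-- Same top-level Add, identical left operand, differing right operand.
exampleDiff :: Diff Expr
exampleDiff =
    SameTopLevel
    (Add
        (Same (Const 1))
        (Different (Const 2) (Const 3)))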

The main drawback of this approach is that in practice ASTs tend to be mutually recursive datatypes. For example:

data Expr
    = Var Text
    | App Expr Expr
    | Lam Text Typ Expr
data Typ
    = IntT
    | FuncT Typ Typ

This type is an example of an AST with which recursion-schemes can't help us.

Is there a way to represent this structure? Yes:

(Index -> Type) -> Index -> Type: The multirec approach

multirec's way to define the above AST:

data Expr :: Index
data Typ :: Index

data AST :: (Index -> Type) -> Index -> Type where
    Var :: Text -> AST r Expr
    App :: r Expr -> r Expr -> AST r Expr
    Lam :: Text -> r Typ -> r Expr -> AST r Expr
    IntT :: AST r Typ
    FuncT :: r Typ -> r Typ -> AST r Typ

(this is a slight variant of multirec's actual presentation, where for improved legibility Index is used rather than Type)

multirec offers various utilities to process such data types. It offers HFunctor, a variant of Functor for these structures, and various recursive combinators.

But multirec has several limitations.

Knot -> Type: syntax-tree's approach

The syntax-tree representation of the above AST example:

data Expr k
    = Var Text
    | App (k # Expr) (k # Expr)
    | Lam Text (k # Typ) (k # Expr)
data Typ k
    = IntT
    | FuncT (k # Typ) (k # Typ)

The # type operator used above requires introduction:

type k # p = (GetKnot k) ('Knot p)

newtype Knot = Knot (Knot -> Type)

-- GetKnot is a getter from the Knot newtype lifted to the type level
type family GetKnot k where
    GetKnot ('Knot t) = t

('Knot is DataKinds syntax for using the Knot data constructor in types)
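
To see how these definitions tie the knot, here is a worked expansion of k # Expr for some concrete fix-point f :: Knot -> Type (a derivation from the definitions above, not library code):

--   ('Knot f) # Expr
--     = GetKnot ('Knot f) ('Knot Expr)    -- unfold the # synonym
--     = f ('Knot Expr)                    -- unfold the GetKnot family
--
-- So each child field of Expr k is the chosen fix-point applied to the
-- child's node type, wrapped back into a Knot.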

For this representation, syntax-tree offers the power and functionality of HKD, recursion-schemes, and multirec, along with some useful helpers.

Usage examples

First, how do we represent an example expression of the example language declared above? Let's start with a verbose way:

verboseExpr :: Tree Pure Expr
verboseExpr =
    Pure (Lam "x" (Pure IntT) (Pure (Var "x")))

In the above, Tree Pure Expr is an Expr whose fix-point is Pure, the simplest fix-point, so every child node is simply wrapped in a Pure constructor.

To write examples and tests more concisely, syntax-tree provides the KHasPlain class, along with a TemplateHaskell generator for it:

> let e = kPlain # verboseExpr

> e
LamP "x" IntTP (VarP "x")

> :t e
e :: KPlain Expr

It's now easier to see that e represents \(x:Int). x

KPlain is a data family of "plain versions" of expressions. These are generated automatically via TemplateHaskell.

This is similar to how recursion-schemes can derive a parameterized version of an AST, but the other way around: the parameterized type is the source and the plain one is generated. We believe this is a good choice because the parameterized type will be used more often in application code.

So now, let's define some example expressions in the shorter way:

exprA, exprB :: KPlain Expr

exprA = LamP "x" IntTP (VarP "x")

exprB = LamP "x" (FuncTP IntTP IntTP) (VarP "x")

What can we do with these expressions? Let's calculate a diff:

> let d = diffP exprA exprB

> d
CommonBodyP
(Lam "x"
    (DifferentP IntTP (FuncTP IntTP IntTP))
    (CommonSubTreeP (VarP "x"))
)

> :t d
d :: Tree DiffP Expr
-- (An Expr with the DiffP fix-point)

Let's look at the type of diffP:

> :t diffP
diffP ::
    ( RTraversable k
    , Recursively ZipMatch k
    , Recursively KHasPlain k
    ) =>
    KPlain k -> KPlain k -> Tree DiffP k

diffP can calculate the diff for any AST that is recursively traversable, can be matched, and has a plain representation.

Now, let's format this diff better:

> let formatDiff _ x y = "- " <> show x <> "\n+ " <> show y <> "\n"

> putStrLn (foldDiffsP formatDiff d)
- IntTP
+ FuncTP IntTP IntTP

> :t foldDiffsP
foldDiffsP ::
    ( Monoid r
    , Recursively KFoldable k
    , Recursively KHasPlain k
    ) =>
    (forall n. KHasPlain n => KRecWitness k n -> KPlain n -> KPlain n -> r) ->
    Tree DiffP k ->
    r

What does the ignored argument of formatDiff stand for? It is the KRecWitness k n from the type of foldDiffsP above. It is a witness argument that "proves" that the folded node n is a recursive node of k (a type parameter from foldDiffsP's type).

Witness parameters

First, I want to give thanks and credit: We learned of this elegant solution from multirec!

What are witness parameters?

Let's look at how KFunctor is defined:

class KNodes k => KFunctor k where
    -- | 'KFunctor' variant of 'fmap'
    mapK ::
        (forall n. KWitness k n -> Tree p n -> Tree q n) ->
        Tree k p ->
        Tree k q

KFunctor can change the fix-point of a tree of k from p to q, given a rank-n function that transforms an element of p into an element of q, along with a witness that the node n is a node of k.

KWitness is a data family which comes from the KNodes instance of k.

For an example let's see the definition of Expr's KWitness:

data instance KWitness Expr n where
    W_Expr_Expr :: KWitness Expr Expr
    W_Expr_Typ :: KWitness Expr Typ

Note that this GADT gets automatically derived via a TemplateHaskell generator!

What does this give us? When mapping over an Expr, we can pattern match on the witness to know whether the node currently being mapped is an Expr or a Typ, and handle each case accordingly.
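
For instance, here is a hypothetical helper (not part of the library) that rewrites one layer of an Expr whose children are plain Pure sub-trees, using the witness to treat expression children and type children differently:

-- Replace every child type annotation with IntT, leaving child
-- expressions untouched (illustrative only).
intTypes :: Tree Expr Pure -> Tree Expr Pure
intTypes =
    mapK $ \w child ->
        case w of
        W_Expr_Expr -> child     -- the child is a sub-expression: keep it
        W_Expr_Typ -> Pure IntT  -- the child is a type: replace it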

Understanding Knots

multirec ties this knot by using indices to represent types. syntax-tree does this by using DataKinds and the Knot newtype, which is used both for structures and for their fix-points. An implication of the two being the same is that the same classes and combinators are re-used for both.

What Haskell is this

syntax-tree is implemented with GHC and heavily relies on quite a few language extensions, along with many harmless syntactic ones. There are also some extensions we use but would like to avoid (we're looking for alternative solutions but haven't found them).

How does syntax-tree compare/relate to

Note that comparisons to HKD, recursion-schemes, multirec, rank2classes, and unification-fd were discussed in depth above.

In addition:

hyperfunctions

S. Krstic et al. [KLP2001] describe a type which they call a "hyperfunction". Here is its definition from the hyperfunctions package:

newtype Hyper a b = Hyper { invoke :: Hyper b a -> b }

Knots are isomorphic to Hyper Type Type (assuming a PolyKinds variant of Hyper), so they can be seen as type-level "hyperfunctions".

For more info on hyperfunctions and their use cases at the value level, see [LKS2013].

References

[KLP2001] S. Krstić, J. Launchbury, D. Pavlović. Hyperfunctions. Fixed Points in Computer Science (FICS), 2001.

[LKS2013] J. Launchbury, S. Krstić, T. E. Sauerwein. Coroutining Folds with Hyperfunctions. 2013.

Data Types a la Carte

Data Types a la Carte (DTALC) describes how to define ASTs structurally.

I.e., rather than having

data Expr a
    = Val Int
    | Add a a -- "a" stands for a sub-expression (recursion-schemes style)

We can have

newtype Val a = Val Int

data Add a = Add a a

-- Expr is a structural sum of Val and Add
type Expr = Val :+: Add

This enables re-usability of the AST elements Val and Add in various ASTs, where the functionality is shared via type classes. Code using these type classes can work generically for different ASTs.
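
As an illustration of that sharing (a minimal sketch in the style of the Data Types a la Carte paper; the sum type and the Eval class below are defined locally for the example, not taken from any particular library):

-- A structural sum of two AST components.
data (f :+: g) a = InL (f a) | InR (g a)

-- An evaluation algebra, defined once per component...
class Eval f where
    evalAlg :: f Int -> Int

instance Eval Val where
    evalAlg (Val n) = n

instance Eval Add where
    evalAlg (Add x y) = x + y

-- ...and combined generically for any sum of components.
instance (Eval f, Eval g) => Eval (f :+: g) where
    evalAlg (InL x) = evalAlg x
    evalAlg (InR y) = evalAlg y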

Like DTALC, syntax-tree supports defining such reusable AST components, with their functionality shared via type classes.

As an example of a reusable term, let's look at the definition of App:

-- | A term for function applications.
data App expr k = App
    { _appFunc :: k # expr
    , _appArg :: k # expr
    }

Unlike a DTALC-based App, which would be parameterized by a single type parameter (a :: Type), this App takes two type parameters, (expr :: Knot -> Type) and (k :: Knot). expr represents the node type of App expr's child nodes, and k is the tree's fix-point. This enables using App in mutually recursive ASTs, where it may be parameterized by several different exprs.
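
To make this concrete, here is a sketch of how the earlier example language could reuse App instead of declaring its own application constructor (the EApp constructor name is just illustrative):

data Expr k
    = Var Text
    | EApp (App Expr k)       -- the shared App term, instantiated at Expr
    | Lam Text (k # Typ) (k # Expr)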

Unlike the original DTALC paper, which isn't suitable for mutually recursive ASTs, syntax-tree requires declaring an explicit node type for each expression type, for use as App's expr type parameter. Similarly, multirec's DTALC variant also requires explicitly declaring type indices.

While it is possible to declare ASTs as newtypes wrapping Sums and Products of existing terms and to derive all the instances via GeneralizedNewtypeDeriving, our usage and examples declare types in the straightforward way, with named data constructors, as we think this results in more readable and performant code.

bound

bound is a library for expressing ASTs with type-safe de Bruijn indices rather than parameter names, via an AST type constructor that is indexed on the variables in scope.

An interesting aspect of bound's ASTs is that, recursively, they are made up of an infinite number of types.

When implementing syntax-tree we had the explicit goal of making sure that such ASTs are expressible with it. For this reason the AST.Term.NamelessScope module, which implements this technique, is provided, and the test suite includes a language implementation based on it (LangA in the tests).

lens

syntax-tree strives to be maximally compatible with lens, and offers Traversals and Setters wherever possible. Unfortunately, the RankNTypes nature of many combinators in syntax-tree means they do not compose with ordinary optics. For the simpler special case where all child nodes have the same type, the traverseK1 traversal and mappedK1 setter are available.