--- layout: post title: "The Advantage of Measures" date: 2014-02-11 author: Eric Seidel published: true comments: true external-url: categories: basic measures --- Yesterday someone asked on [Reddit][] how one might define GHC's [OrdList][] in a way that statically enforces its three key invariants. The accepted solution required rewriting `OrdList` as a `GADT` indexed by a proof of *emptiness* (which is essentially created by a run-time check), and used the new Closed Type Families extension in GHC 7.8 to define a type-level join of the Emptiness index. Today, let's see a somewhat more direct way of tackling this problem in LiquidHaskell, in which we need not change a single line of code (well.. maybe one), and need not perform any dynamic checks. The OrdList Type ---------------- The `OrdList` type is defined as follows: \begin{code} data OrdList a = None | One a | Many [a] -- Invariant: non-empty | Cons a (OrdList a) | Snoc (OrdList a) a | Two (OrdList a) -- Invariant: non-empty (OrdList a) -- Invariant: non-empty \end{code} As indicated by the comments the key invariants are that: * `Many` should take a *non-empty* list, * `Two` takes two non-empty `OrdList`s. What is a Non-Empty OrdList? ---------------------------- To proceed, we must tell LiquidHaskell what non-empty means. We do this with a [measure][] that describes the *number of elements* in a structure. When this number is strictly positive, the structure is non-empty. \begin{code} We've previously seen how to measure the size of a list. measure len :: [a] -> Int len ([]) = 0 len (x:xs) = 1 + (len xs) \end{code} We can use the same technique to measure the size of an `OrdList`. \begin{code} {-@ measure olen :: OrdList a -> Int olen (None) = 0 olen (One x) = 1 olen (Many xs) = (len xs) olen (Cons x xs) = 1 + (olen xs) olen (Snoc xs x) = 1 + (olen xs) olen (Two x y) = (olen x) + (olen y) @-} {-@ invariant {v:OrdList a | (olen v) >= 0} @-} \end{code} Now, we can use the measures to define aliases for **non-empty** lists and `OrdList`s. \begin{code} {-@ type ListNE a = {v:[a] | (len v) > 0} @-} {-@ type OrdListNE a = {v:OrdList a | (olen v) > 0} @-} \end{code} Capturing the Invariants In a Refined Type ------------------------------------------ Let's return to the original type, and refine it with the above non-empty variants to specify the invariants as \begin{code} part of the data declaration {-@ data OrdList [olen] a = None | One (x :: a) | Many (xs :: ListNE a) | Cons (x :: a) (xs :: OrdList a) | Snoc (xs :: OrdList a) (x :: a) | Two (x :: OrdListNE a) (y :: OrdListNE a) @-} \end{code} Notice immediately that LiquidHaskell can use the refined definition to warn us about malformed `OrdList` values. \begin{code} ok = Many [1,2,3] bad = Many [] badder = Two None ok \end{code} All of the above are accepted by GHC, but only the first one is actually a valid `OrdList`. Happily, LiquidHaskell will reject the latter two, as they violate the invariants. Basic Functions --------------- Now let's look at some of the functions! First, we'll define a handy alias for `OrdList`s of a given size: \begin{code} {-@ type OrdListN a N = {v:OrdList a | (olen v) = N} @-} \end{code} Now, the `nilOL` constructor returns an empty `OrdList`: \begin{code} {-@ nilOL :: OrdListN a {0} @-} nilOL = None \end{code} the `unitOL` constructor returns an `OrdList` with one element: \begin{code} {-@ unitOL :: a -> OrdListN a {1} @-} unitOL as = One as \end{code} and `snocOL` and `consOL` return outputs with precisely one more element: \begin{code} {-@ snocOL :: xs:OrdList a -> a -> OrdListN a {1 + (olen xs)} @-} snocOL as b = Snoc as b {-@ consOL :: a -> xs:OrdList a -> OrdListN a {1 + (olen xs)} @-} consOL a bs = Cons a bs \end{code} **Note:** The `OrdListN a {e}` syntax just lets us use LiquidHaskell expressions `e` as a parameter to the type alias `OrdListN`. Appending `OrdList`s -------------------- The above functions really aren't terribly interesting, however, since their types fall right out of the definition of `olen`. So how about something that takes a little thinking? \begin{code} {-@ appOL :: xs:OrdList a -> ys:OrdList a -> OrdListN a {(olen xs) + (olen ys)} @-} None `appOL` b = b a `appOL` None = a One a `appOL` b = Cons a b a `appOL` One b = Snoc a b a `appOL` b = Two a b \end{code} `appOL` takes two `OrdList`s and returns a list whose length is the **sum of** the two input lists. The most important thing to notice here is that we haven't had to insert any extra checks in `appOL`, unlike the [GADT][] solution. LiquidHaskell uses the definition of `olen` to infer that in the last case of `appOL`, `a` and `b` *must be non-empty*, so they are valid arguments to `Two`. We can prove other things about `OrdList`s as well, like the fact that converting an `OrdList` to a Haskell list preserves length \begin{code} {-@ toOL :: xs:[a] -> OrdListN a {(len xs)} @-} toOL [] = None toOL xs = Many xs \end{code} as does mapping over an `OrdList` \begin{code} {-@ mapOL :: (a -> b) -> xs:OrdList a -> OrdListN b {(olen xs)} @-} mapOL _ None = None mapOL f (One x) = One (f x) mapOL f (Cons x xs) = Cons (f x) (mapOL f xs) mapOL f (Snoc xs x) = Snoc (mapOL f xs) (f x) mapOL f (Two x y) = Two (mapOL f x) (mapOL f y) mapOL f (Many xs) = Many (map f xs) \end{code} as does converting a Haskell list to an `OrdList`. \begin{code} {-@ type ListN a N = {v:[a] | (len v) = N} @-} {-@ fromOL :: xs:OrdList a -> ListN a {(olen xs)} @-} fromOL a = go a [] where {-@ go :: xs:_ -> ys:_ -> {v:_ | (len v) = (olen xs) + (len ys)} @-} go None acc = acc go (One a) acc = a : acc go (Cons a b) acc = a : go b acc go (Snoc a b) acc = go a (b:acc) go (Two a b) acc = go a (go b acc) go (Many xs) acc = xs ++ acc \end{code} There's nothing super interesting to say about the `foldOL`s but I'll include them here for completeness' sake. \begin{code} foldrOL :: (a->b->b) -> b -> OrdList a -> b foldrOL _ z None = z foldrOL k z (One x) = k x z foldrOL k z (Cons x xs) = k x (foldrOL k z xs) foldrOL k z (Snoc xs x) = foldrOL k (k x z) xs foldrOL k z (Two b1 b2) = foldrOL k (foldrOL k z b2) b1 foldrOL k z (Many xs) = foldr k z xs foldlOL :: (b->a->b) -> b -> OrdList a -> b foldlOL _ z None = z foldlOL k z (One x) = k z x foldlOL k z (Cons x xs) = foldlOL k (k z x) xs foldlOL k z (Snoc xs x) = k (foldlOL k z xs) x foldlOL k z (Two b1 b2) = foldlOL k (foldlOL k z b1) b2 foldlOL k z (Many xs) = foldl k z xs \end{code} Concatenatation: Nested Measures -------------------------------- Now, the astute readers will have probably noticed that I'm missing one function, `concatOL`, which glues a list of `OrdList`s into a single long `OrdList`. With LiquidHaskell we can give `concatOL` a super precise type, which states that the size of the output list equals the *sum-of-the-sizes* of the input `OrdLists`. \begin{code} {-@ concatOL :: xs:[OrdList a] -> OrdListN a {(olens xs)} @-} concatOL [] = None concatOL (ol:ols) = ol `appOL` concatOL ols \end{code} The notion of *sum-of-the-sizes* of the input lists is specifed by the measure \begin{code} {-@ measure olens :: [OrdList a] -> Int olens ([]) = 0 olens (ol:ols) = (olen ol) + (olens ols) @-} {-@ invariant {v:[OrdList a] | (olens v) >= 0} @-} \end{code} LiquidHaskell is happy to verify the above signature, again without requiring any explict proofs. Conclusion ---------- The above illustrates the flexibility provided by LiquidHaskell *measures*. Instead of having to bake particular invariants into a datatype using indices or phantom types (as in the [GADT][] approach), we are able to split our properties out into independent *views* of the datatype, yielding an approach that is more modular as * we didn't have to go back and change the definition of `[]` to talk about `OrdList`s, * we didn't have to provide explict non-emptiness witnesses, * we obtained extra information about the behavior of API functions like `concatOL`. [GADT]: http://www.reddit.com/r/haskell/comments/1xiurm/how_to_define_append_for_ordlist_defined_as_gadt/cfbrinr [Reddit]: http://www.reddit.com/r/haskell/comments/1xiurm/how_to_define_append_for_ordlist_defined_as_gadt/ [OrdList]: http://www.haskell.org/platform/doc/2013.2.0.0/ghc-api/OrdList.html [measure]: http://goto.ucsd.edu/~rjhala/liquid/haskell/blog/blog/2013/01/31/safely-catching-a-list-by-its-tail.lhs/