Download

Here's full source (BSD3-licensed, Cabal sdist) for:

Generic Shape

It is sometimes useful to "project out" the shape or skeleton of a data structure. In abstract algebra, structure-preserving maps are called homomorphisms. Category theory is sometimes described of as the study of homomorphisms, which are there called morphisms.

Although I like theory, I'm not a theoretician so this post may annoy those with high expectations in terms of knowledge of type theory, CT etc. However, I'm pleased with the results, and learned a fair bit, and it may help someone out of a practical bind down the road, so here it is. Motivated by a particular need, to work with some very complicated types (GHC AST) in a quick and dirty way. A couple days' work would have sufficed, but to bring the library up to a shareable quality took a few more. Not only did I get my GHC API hacks; by a slight generalisation of the original idea, it looks like it'll solve the most potentially work-intensive plumbing stage of a big project. Delightful! Whether it will be sufficiently performant remains to be seen...

A couple days ago I started searching for information about how to implement such a generic homomorphism in Haskell, and even asked on the IRC, but didn't turn up anything despite some advice from experienced people. If I'd hit on the right keyword -- shape -- I'd have doubtlessly been directed to one of the excellent libraries in packages shapely-data and fixplate.

I think either of these would probably have sufficed for the task, but I don't presently have an inclination to confirm this. These packages are not needed to build this library or test. At some point soon I'd like to make a careful study of shapely-data and perhaps fixplate. I feel humbled when I look at the beautiful API for shapely-data, even though I don't know how to use it or what its capabilities are yet. By comparison, my implementation is much less grounded in theory, and is probably weaker in every way. If nothing else, this mini-project has revitalised my interest in the theory side.

This implementation:

Code and Examples

Homomorphism can preserve some value information, in addition to the structure, mapping to a k-ary tree of homogeneous type, such as Rose a


  data Rose a = R a [Rose a]

with the constant-valued ("parentheses langauge") representation obtained for a ~ ().

Achieving this for a homogeneous recursive source type is straightforward. However, we would like a generic solution, to handle arbitrary (heterogeneous) recursive types.

In Haskell, this is possible using a generics library such as SYB or Uniplate. I chose SYB as I had a little past experience with that. If you want to run the code, you'll need to install the SYB package.

This package supports a generic mapping which works over arbitrary recursive heterogeneous types. It differs from existing generic fmap (SYB's gmapT, or Oleg's gmap), in that it allows you to obtain a homogeneous result in the type of your choice, rather than adhering to the original types. Hence, this is not an fmap, it doesn't satisfy the laws. (It's a homomorphism.) It uses gfoldl, so fold-as-map. This theme is not new, but it's my take on a useful idea. I'm not trying to contribute theoretically, I just needed these tools and didn't find them. :)

What I came up with is summarised in the following code


  {-# LANGUAGE Rank2Types #-}

  import Data.Data ( Data )
  import Data.Data ( gmapQ )
  import Data.Generics.Aliases ( GenericQ )

  data Rose a = R a [Rose a] deriving Show
  type Homo a = Rose a
  type Shape = Homo ()

  -- (Note: This is now ghomK in the API.)
  ghom :: forall r d. Data d =>
             (r -> r -> r)
          -> GenericQ r
          -> d
          -> Homo r
  ghom k f x = foldl k' b (gmapQ (ghom k f) x)
   where
     b = R (f x) []
     k' (R r chs) nod@(R r' _) = R (r `k` r') (chs++[nod])

(If you use (nod:chs) the structure will be flipped horizontally.)

From this we can easily write the desired generic shapeOf


  shapeOf :: forall d. Data d => d -> Shape
  shapeOf = ghom (\_ _->()) (const ())

Without more ado, it's also possible to write a function which maps arbitrary data to its weighted tree representation

  weightedShapeOf :: forall d. Data d => d -> Homo Int
  weightedShapeOf = ghom (+) (const 1)

We can also use it to preserve choice values.

Let's run some tests.

[Apologies that the examples aren't very good, but they at least demonstrate usage.]


data TA = A1 | A2 TB TA TB
data TB = B TA
exprAB = A2 (B A1) A1 (B A1)
-- ((())()(()))

data TC = C1 Float (Int,Int) | C2 TD TC TD | C3 TC
data TD = D TC
exprCD = C2 (D (C1 1.1 (4,5))) (C3 (C1 2.2 (6,7))) (D (C1 3.3 (8,9)))
-- (((()(()())))((()(()())))((()(()()))))

data TE = E1 String | E2 (Int,Int) TF
data TF = F TE String
exprEF = E2 (2,5) (F (E1 "foo") "bar")
-- ((()(())())()())  -- with [Char] as a stop type, so String is treated as atomic

test_list = [[1,2],[3],[4,5,6::Int]]
-- ((()(()()))((()())((()(()(()())))())))

-----------------------------------------------

> showHomo $ shapeOf test_list
  
()
| ()
| | ()
| | ()
| | | ()
| | | ()
| ()
| | ()
| | | ()
| | | ()
| | ()
| | | ()
| | | | ()
| | | | ()
| | | | | ()
| | | | | ()
| | | | | | ()
| | | | | | ()
| | | ()

> showAsParens $ shapeOf test_list
  
((()(()()))((()())((()(()(()())))())))

> showAsParensBool $ ghom (mkQ False (odd::Int->Bool)) test_list
  
(.(.(*)(.(.)(.)))(.(.(*)(.))(.(.(.)(.(*)(.(.)(.))))(.))))

> showAsParensEnriched $ ghom (mkQ False (odd::Int->Bool)) test_list
  
(False(False(True)(False(False)(False)))(False(False(True)(False))(False(False(False)(False(True)(False(False)(False))))(False))))

> showHomo $ ghom (mkQ False (odd::Int->Bool)) test_list
  
False
| False
| | True
| | False
| | | False
| | | False
| False
| | False
| | | True
| | | False
| | False
| | | False
| | | | False
| | | | False
| | | | | True
| | | | | False
| | | | | | False
| | | | | | False
| | | False

> showHomo $ filterHomo id $ ghom (mkQ False (odd::Int->Bool)) test_list
  
False
| True
| True
| True

> showHetero $ ghomDyn test_list
  
[[1,2],[3],[4,5,6]]
  [1,2]
    1
    [2]
      2
      []
  [[3],[4,5,6]]
    [3]
      3
      []
    [[4,5,6]]
      [4,5,6]
        4
        [5,6]
          5
          [6]
            6
            []
      []

> showHetero $ filterHetero (/=(3::Int)) $ ghomDyn test_list
  
[[1,2],[3],[4,5,6]]
  1
  2
  4
  5
  6

> showBi $ heteroToBi False (odd::Int->Bool) $ ghomDyn test_list
  
([[1,2],[3],[4,5,6]], False)
  ([1,2], False)
    (1, True)
    ([2], False)
      (2, False)
      ([], False)
  ([[3],[4,5,6]], False)
    ([3], False)
      (3, True)
      ([], False)
    ([[4,5,6]], False)
      ([4,5,6], False)
        (4, False)
        ([5,6], False)
          (5, True)
          ([6], False)
            (6, False)
            ([], False)
      ([], False)

> showBi $ ghomBi (mkQ False (odd::Int->Bool)) test_list
  
([[1,2],[3],[4,5,6]], False)
  ([1,2], False)
    (1, True)
    ([2], False)
      (2, False)
      ([], False)
  ([[3],[4,5,6]], False)
    ([3], False)
      (3, True)
      ([], False)
    ([[4,5,6]], False)
      ([4,5,6], False)
        (4, False)
        ([5,6], False)
          (5, True)
          ([6], False)
            (6, False)
            ([], False)
      ([], False)

> showBi $ filterBi id $ ghomBi (mkQ False (odd::Int->Bool)) test_list
  
([[1,2],[3],[4,5,6]], False)
  (1, True)
  (3, True)
  (5, True)

> showHomo $ biToHomo $ filterBi id $ ghomBi (mkQ False (odd::Int->Bool)) test_list
  
False
| True
| True
| True

> let f (x::Int) = if odd x then Just x else Nothing

> showHomo $ ghom (mkQ Nothing f) test_list
  
Nothing
| Nothing
| | Just 1
| | Nothing
| | | Nothing
| | | Nothing
| Nothing
| | Nothing
| | | Just 3
| | | Nothing
| | Nothing
| | | Nothing
| | | | Nothing
| | | | Nothing
| | | | | Just 5
| | | | | Nothing
| | | | | | Nothing
| | | | | | Nothing
| | | Nothing

> showHomo $ filterHomoMM $ ghom (mkQ Nothing f) test_list
  
Nothing
| Just 1
| Nothing
| | Just 3
| | Just 5

> showHomo $ unliftHomoM 0 $ filterHomoMM $ ghom (mkQ Nothing f) test_list
  
0
| 1
| 0
| | 3
| | 5

> showAsParens $ shapeOf exprAB
  
((())()(()))

> showAsParens $ shapeOf exprCD
  
(((()(()())))((()(()())))((()(()()))))

> showAsParens $ shapeOf exprEF
  
((()())(((()(()(()()))))(()(()(()())))))

> show $ ( ( unGhomDyn $ ghomDyn exprEF ) :: TE )
  
E2 (2,5) (F (E1 "foo") "bar")

> showHomo $ ( gempty exprEF :: BiM Int)
  
(<<TE>>,Nothing)
| (<<(Int,Int)>>,Nothing)
| | (<<Int>>,Nothing)
| | (<<Int>>,Nothing)
| (<<TF>>,Nothing)
| | (<<TE>>,Nothing)
| | | (<<[Char]>>,Nothing)
| | | | (<<Char>>,Nothing)
| | | | (<<[Char]>>,Nothing)
| | | | | (<<Char>>,Nothing)
| | | | | (<<[Char]>>,Nothing)
| | | | | | (<<Char>>,Nothing)
| | | | | | (<<[Char]>>,Nothing)
| | (<<[Char]>>,Nothing)
| | | (<<Char>>,Nothing)
| | | (<<[Char]>>,Nothing)
| | | | (<<Char>>,Nothing)
| | | | (<<[Char]>>,Nothing)
| | | | | (<<Char>>,Nothing)
| | | | | (<<[Char]>>,Nothing)


Progressive refinement and accumulation:

> (showHomo $
     ( grefine
         (\ x -> case x of { E2 (y,z) _ -> Just (z+3)
                           ; _ -> Nothing })
         ( gempty exprEF :: BiM Int)
     )
  )

(<<TE>>,Just 8)
| (<<(Int,Int)>>,Nothing)
| | (<<Int>>,Nothing)
| | (<<Int>>,Nothing)
| (<<TF>>,Nothing)
| | (<<TE>>,Nothing)
| | | (<<[Char]>>,Nothing)
| | | | (<<Char>>,Nothing)
| | | | (<<[Char]>>,Nothing)
| | | | | (<<Char>>,Nothing)
| | | | | (<<[Char]>>,Nothing)
| | | | | | (<<Char>>,Nothing)
| | | | | | (<<[Char]>>,Nothing)
| | (<<[Char]>>,Nothing)
| | | (<<Char>>,Nothing)
| | | (<<[Char]>>,Nothing)
| | | | (<<Char>>,Nothing)
| | | | (<<[Char]>>,Nothing)
| | | | | (<<Char>>,Nothing)
| | | | | (<<[Char]>>,Nothing)

> (showHomo $
    ( gaccum
        ((\r1 r2 -> r1+r2) :: Int -> Int -> Int)
        (\ x -> case x of { E1 s -> Just (length s)
                          ; _ -> Nothing })
        ( grefine
            (\ x -> case x of { E2 (y,z) _ -> Just (z+3)
                              ; _ -> Nothing })
            ( gempty exprEF :: BiM Int)
        )
    )
  )

(<<TE>>,Just 8)
| (<<(Int,Int)>>,Nothing)
| | (<<Int>>,Nothing)
| | (<<Int>>,Nothing)
| (<<TF>>,Nothing)
| | (<<TE>>,Just 3)
| | | (<<[Char]>>,Nothing)
| | | | (<<Char>>,Nothing)
| | | | (<<[Char]>>,Nothing)
| | | | | (<<Char>>,Nothing)
| | | | | (<<[Char]>>,Nothing)
| | | | | | (<<Char>>,Nothing)
| | | | | | (<<[Char]>>,Nothing)
| | (<<[Char]>>,Nothing)
| | | (<<Char>>,Nothing)
| | | (<<[Char]>>,Nothing)
| | | | (<<Char>>,Nothing)
| | | | (<<[Char]>>,Nothing)
| | | | | (<<Char>>,Nothing)
| | | | | (<<[Char]>>,Nothing)


Testing that a Dynamic node can recover nodes elided below it:

Testing a chain of types:

> let (f::TH->Bool) x = case x of { H _ -> False ; _ -> True }

> show exprGHI

G (H I)

> showBi $ ghomBi (mkQ True f) exprGHI

(<<TG>>, True)
  (<<TH>>, False)
    (<<TI>>, True)

> showBi $ filterBi id $ ghomBi (mkQ True f) exprGHI

(<<TG>>, True)
  (<<TI>>, True)

> ( show $ ( ( unGhomBi $ filterBi id $ ghomBi (mkQ True f) exprGHI ) :: TG ) )

G (H I)

Testing a chain of constructors:

> show exprJ

J1 (J2 (J3 J))

> let (f::TJ->Bool) x = case x of { J1 _ -> False ; J3 _ -> False; _ -> True }

> showBi $ ghomBi (mkQ True f) exprJ

(<<TJ>>, False)
  (<<TJ>>, True)
    (<<TJ>>, False)
      (<<TJ>>, True)

> showBi $ filterBi id $ ghomBi (mkQ True f) exprJ

(<<TJ>>, False)
  (<<TJ>>, True)
    (<<TJ>>, True)

> ( show $ ( ( unGhomBi $ filterBi id $ ghomBi (mkQ True f) exprJ ) :: TJ ) )

J1 (J2 (J3 J))

Testing a mixture of types and constructors:

> show exprKLM

K2 (M1 (L2 L1) (L2 (L3 (M2 K3))))

> let (f::TL->Bool) x = case x of { L2 _ -> False ; L3 _ -> False; _ -> True }

> showBi $ ghomBi (mkQ True f) exprKLM

(<<TK>>, True)
  (<<TM>>, True)
    (<<TL>>, False)
      (<<TL>>, True)
    (<<TL>>, False)
      (<<TL>>, False)
        (<<TM>>, True)
          (<<TK>>, True)

> showBi $ filterBi id $ ghomBi (mkQ True f) exprKLM

(<<TK>>, True)
  (<<TM>>, True)
    (<<TL>>, True)
    (<<TM>>, True)
      (<<TK>>, True)

> ( show $ ( ( unGhomBi $ filterBi id $ ghomBi (mkQ True f) exprKLM ) :: TK ) )

K2 (M1 (L2 L1) (L2 (L3 (M2 K3))))


Testing filterHomoM and filterBiM:

> show test_list

[[1,2],[3],[4,5,6]]

> showHomo $ filterHomoM odd $ ghom (mkQ 0 (id::Int->Int)) test_list

Nothing
| Just 1
| Nothing
| | Just 3
| | Just 5

> showBi $ filterBiM odd $ ghomBi (mkQ 0 (id::Int->Int)) test_list

([[1,2],[3],[4,5,6]], Nothing)
  (1, Just 1)
  ([[3],[4,5,6]], Nothing)
    (3, Just 3)
    (5, Just 5)


Testing abstract datatype:

> show exprN

fromList [("",1.1),("pdsfhp",3.3),("sfv",2.2)]

> show $ Map.toList exprN

[("",1.1),("pdsfhp",3.3),("sfv",2.2)]

> showHomo $ shapeOf exprN

()
| ()
| | ()
| | | ()
| | | ()
| | ()
| | | ()
| | | | ()
| | | | | ()
| | | | | ()
| | | | | | ()
| | | | | | ()
| | | | | | | ()
| | | | | | | ()
| | | | | | | | ()
| | | | | | | | ()
| | | | | | | | | ()
| | | | | | | | | ()
| | | | | | | | | | ()
| | | | | | | | | | ()
| | | | ()
| | | ()
| | | | ()
| | | | | ()
| | | | | | ()
| | | | | | ()
| | | | | | | ()
| | | | | | | ()
| | | | | | | | ()
| | | | | | | | ()
| | | | | ()
| | | | ()

> showAsParensEnriched $ shapeOf exprN

(()(()(()(())(()))(()(()(()(())(()(())(()(())(()(())(()(())(()(())(())))))))(()))(()(()(()(())(()(())(()(())(()))))(()))(())))))

> showHomo $ ghom (mkQ 0.0 (\ (x::Float) -> x)) exprN

0.0
| 0.0
| | 0.0
| | | 0.0
| | | 1.1
| | 0.0
| | | 0.0
| | | | 0.0
| | | | | 0.0
| | | | | 0.0
| | | | | | 0.0
| | | | | | 0.0
| | | | | | | 0.0
| | | | | | | 0.0
| | | | | | | | 0.0
| | | | | | | | 0.0
| | | | | | | | | 0.0
| | | | | | | | | 0.0
| | | | | | | | | | 0.0
| | | | | | | | | | 0.0
| | | | 3.3
| | | 0.0
| | | | 0.0
| | | | | 0.0
| | | | | | 0.0
| | | | | | 0.0
| | | | | | | 0.0
| | | | | | | 0.0
| | | | | | | | 0.0
| | | | | | | | 0.0
| | | | | 2.2
| | | | 0.0

> showHomo $ filterHomo (>0.5) $ ghom (mkQ 0.0 (\ (x::Float) -> x)) exprN

0.0
| 1.1
| 3.3
| 2.2

> showHomo $ filterHomoM (>0.5) $ ghom (mkQ 0.0 (\ (x::Float) -> x)) exprN

Nothing
| Nothing
| | Just 1.1
| | Nothing
| | | Just 3.3
| | | Just 2.2


I have also tested this on GHC parse trees (the motivating problem), and it works fine, but there is a catch. These results are reported here.

So, for what it's worth, I was able to implement the generic homomorphisms. Probably this is already supported by some library, but I was unlucky in my search.

Future work will include:

Andrew Seniuk
June 12, 2014
rasfar@gmail.com