--- layout: post title: CSV Tables date: 2013-10-10 16:12 comments: true tags: - measures author: Ranjit Jhala published: true demo: Csv.hs --- Most demonstrations for verification techniques involve programs with complicated invariants and properties. However, these methods can often be rather useful for describing simple but important aspects of APIs or programs with more humble goals. I saw a rather [nice example][shapeless-csv] of using Scala's `Shapeless` library for preventing off-by-one errors in CSV processing code. Here's the same, short, example rephrased with LiquidHaskell.
23: module CSV where
24: 
25: -- | Using LiquidHaskell for CSV lists
26: -- c.f. http://www.reddit.com/r/scala/comments/1nhzi2/using_shapelesss_sized_type_to_eliminate_real/
The Type -------- Suppose you wanted to represent *tables* as a list of comma-separated values. For example, here's a table listing the articles and prices at the coffee shop I'm sitting in right now:
Item Price
Espresso 2.25
Macchiato 2.75
Cappucino 3.35
Americano 2.25
You might represent this with a simple Haskell data type:
64: 
65: data CSV = Csv { (CSV.CSV) -> [[(GHC.Types.Char)]]headers :: [String]
66:                , (CSV.CSV) -> [[[(GHC.Types.Char)]]]rows    :: [[String]]
67:                }
and now, the above table is just:
73: (CSV.CSV)zumbarMenu = x1:[[(GHC.Types.Char)]]
-> [{VV : [[(GHC.Types.Char)]] | ((len VV) == (len x1))}]
-> (CSV.CSV)Csv {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[  {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Item"     , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Price"]
74:                  [ {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Espresso" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"2.25" ]  
75:                  , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Macchiato", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"2.75" ]
76:                  , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Cappucino", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"3.35" ]
77:                  , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Americano", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"2.25" ]
78:                  ]
The Errors ---------- Our `CSV` type supports tables with an arbitrary number of `headers` and `rows` but of course, we'd like to ensure that each `row` has data for *each* header, that is, we don't end up with tables like this one
89: -- Eeks, we missed the header name!
90: 
91: (CSV.CSV)csvBad1 = x1:[[(GHC.Types.Char)]]
-> [{VV : [[(GHC.Types.Char)]] | ((len VV) == (len x1))}]
-> (CSV.CSV)Csv {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}[  {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Date" {- ??? -} ] 
92:               [ {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) > 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Mon", {VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"1"]
93:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) > 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Tue", {VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"2"]
94:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) > 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Wed", {VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"3"] 
95:               ]
96: 
or this one,
102: -- Blergh! we missed a column.
103: 
104: (CSV.CSV)csvBad2 = x1:[[(GHC.Types.Char)]]
-> [{VV : [[(GHC.Types.Char)]] | ((len VV) == (len x1))}]
-> (CSV.CSV)Csv {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[  {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Name" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Age"  ] 
105:               [ {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Alice", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"32"   ]
106:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Bob"  {- ??? -}]
107:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Cris" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"29"   ] 
108:               ]
Alas, both the above are valid inhabitants of the Haskell `CSV` type, and so, sneak past GHC. The Invariant ------------- Thus, we want to *refine* the `CSV` type to specify that the *number* of elements in each row is *exactly* the same as the *number* of headers. To do so, we merely write a refined data type definition:
123: {-@ data CSV = Csv { headers :: [String]
124:                    , rows    :: [{v:[String] | (len v) = (len headers)}]
125:                    }
126:   @-}
Here `len` is a *measure* [denoting the length of a list][list-measure]. Thus, `(len headers)` is the number of headers in the table, and the refinement on the `rows` field states that *each* `row` is a list of `String`s, with exactly the same number of elements as the number of `headers`. We can now have our arbitrary-arity tables, but LiquidHaskell will make sure that we don't miss entries here or there.
138: -- All is well! 
139: 
140: (CSV.CSV)csvGood = x1:[[(GHC.Types.Char)]]
-> [{VV : [[(GHC.Types.Char)]] | ((len VV) == (len x1))}]
-> (CSV.CSV)Csv {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Id", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Name", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Days"]
141:               [ {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"1", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Jan", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"31"]
142:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"2", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Feb", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"28"]
143:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"3", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Mar", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"31"]
144:               , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"4", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Apr", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"30"] 
145:               ]
Bonus Points ------------ How would you modify the specification to prevent table with degenerate entries like this one?
155: (CSV.CSV)deg = x1:[[(GHC.Types.Char)]]
-> [{VV : [[(GHC.Types.Char)]] | ((len VV) == (len x1))}]
-> (CSV.CSV)Csv {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[  {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Id", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Name", {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Days"]
156:           [ {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"1" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Jan" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"31"]
157:           , {VV : [{VV : [(GHC.Types.Char)] | ((len VV) >= 0)}]<\_ VV -> ((len VV) >= 0)> | (((null VV)) <=> false) && ((len VV) >= 0)}[{VV : [(GHC.Types.Char)]<\_ VV -> false> | (((null VV)) <=> false) && ((len VV) >= 0)}"2" , {VV : [(GHC.Types.Char)] | ((len VV) >= 0)}"Feb" , {VV : [{VV : (GHC.Types.Char) | false}]<\_ VV -> false> | (((null VV)) <=> true) && ((len VV) == 0) && ((len VV) >= 0)}""]
158:           ]
[shapeless-csv]: http://www.reddit.com/r/scala/comments/1nhzi2/using_shapelesss_sized_type_to_eliminate_real/ [list-measure]: /blog/2013/01/31/safely-catching-a-list-by-its-tail.lhs/