arith-encode-1.0.2: A practical arithmetic encoding (aka Godel numbering) library.

Safe HaskellNone
LanguageHaskell2010

Data.ArithEncode

Description

ArithEncode is a library that provides tools for defining arithmetic encodings for arbitrary datatypes. The library is designed so that multiple encoding schemes can be defined for a given datatype, and a given encoding need not encode all possible instances of the datatype.

An Encoding is an object which is passed as the first argument to seven different functions. The primary function of an Encoding is manifest in the encode and decode functions, which define an isomorphism between the datatype and the natural numbers (or a bounded set thereof), represented using Integers. The encode and decode functions have the following properties:

  • decode enc (encode enc v) == v for all values v in the domain
  • encode enc v == encode enc w only if w == v
  • decode enc n == decode enc m only if n == m

The inDomain function indicates whether or not a given value is in the domain of the encoding. Passing a value v where inDomain enc v == False into any other function may result in an IllegalArgument exception. (For performance reasons, encodings are not strictly required to throw IllegalArgument, but the result should not be considered valid if they do not throw the exception).

This library provides a large collection of combinators for constructing more complex Encodings out of simpler ones. The provided combinators should be appropriate for constructing Encodings for most datatypes.

As an example, the following definition creates an Encoding for the Tree Integer type:

tree :: Encoding (Tree Integer)
tree =
  let
    ...
    nodeEncoding nodeenc =
      wrap unmakeNode makeNode (pair interval (seq nodeenc))
  in
    recursive nodeEncoding

In this example, the makeNode and unmakeNode functios are simply "glue"; their definitions are

makeNode (label, children) =
  Node { rootLabel = label, subForest = children }

unmakeNode Node { rootLabel = label, subForest = children } =
  Just (label, children)

The resulting Encoding maps any Tree Integer to a unique Integer value.

Encodings have a number of practical uses. First, all Encodings in this library satisfy a completeness property, which guarantees that they map each value to a finite natural number (or in the case of constructions on Encodings, they preserve completeness). Hence, they can be used as an enumeration procedure for their underlying datatype.

Second, as Encodings define an isomorphism to the natural numbers, they provide an efficient binary encode/decode procedure in theory. In practice, the techniques used to guarantee the completeness property may result in long encodings of some types (particularly sequences). Also, knowledge of the distribution of the domain is necessary in order to achieve the most succinct possible encoding.

Documentation