flat: Principled and efficient bit-oriented binary serialization.

[ bsd3, data, library, parsing, serialization ] [ Propose Tags ]
Versions [RSS] 0.2, 0.2.2, 0.3, 0.3.2, 0.3.4, 0.4, 0.4.2, 0.4.4, 0.5, 0.5.2, 0.6
Change log CHANGELOG
Dependencies array (>=0.5.1.0), base (>=4.8 && <5), bytestring (>=0.10.6), containers, deepseq (>=1.4), dlist (>=0.6), ghc-prim, mono-traversable (>=0.10.0.2), pretty (>=1.1.2), primitive, semigroups, text, vector [details]
License BSD-3-Clause
Copyright Copyright: (c) 2016-2018 Pasqualino `Titto` Assini
Author Pasqualino `Titto` Assini
Maintainer tittoassini@gmail.com
Category Data, Parsing, Serialization
Home page http://quid2.org
Source repo head: git clone https://github.com/Quid2/flat
Uploaded by PasqualinoAssini at 2018-11-06T09:38:41Z
Distributions LTSHaskell:0.6, NixOS:0.6, Stackage:0.6
Reverse Dependencies 5 direct, 1 indirect [details]
Downloads 7303 total (149 in the last 30 days)
Rating 2.25 (votes: 2) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2018-11-06 [all 1 reports]

Readme for flat-0.3.4

[back to package description]

Build Status Hackage version

Haskell implementation of Flat, a principled, portable and efficient binary data format (specs).

How To Use It For Fun and Profit

To (de)serialise a data type, make it an instance of the Flat class.

There is Generics based support to automatically derive instances of additional types.

Let's see some code, we need a couple of extensions:

{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}

Import the Flat library:

import Data.Flat

Define a couple of custom data types, deriving Generic and Flat:

data Direction = North | South | Center | East | West deriving (Show,Generic,Flat)
data List a = Nil | Cons a (List a) deriving (Show,Generic,Flat)

For encoding, use flat, for decoding, use unflat:

unflat . flat $ Cons North (Cons South Nil) :: Decoded (List Direction)
-> Right (Cons North (Cons South Nil))

For the decoding to work correctly, you will naturally need to know the type of the serialised data. This is ok for applications that do not require long-term storage and that do not need to communicate across independently evolving agents. For those who do, you will need to supplement flat with something like zm.

Define Instances for Abstract/Primitive types

A set of primitives are available to define Flat instances for abstract or primitive types.

Instances for some common, primitive or abstract data types (Bool,Words,Int,String,Text,ByteStrings,Tuples, Lists, Sequences, Maps ..) are already defined in Data.Flat.Instances.

Optimal Bit-Encoding

A pecularity of Flat is that it uses an optimal bit-encoding rather than the usual byte-oriented one.

To see this, let's define a pretty printing function: bits encodes a value as a sequence of bits, prettyShow displays it nicely:

p :: Flat a => a -> String
p = prettyShow . bits

Now some encodings:

p West
-> "111"
p (Nil::List Direction)
-> "0"
aList = Cons North (Cons South (Cons Center (Cons East (Cons West Nil))))
p aList
-> "10010111 01110111 10"

As you can see, aList fits in less than 3 bytes rather than 11 as would be the case with other Haskell byte oriented serialisation packages like binary or store.

For the serialisation to work with byte-oriented devices or storage, we need to add some padding:

f :: Flat a => a -> String
f = prettyShow . paddedBits
f West
-> "11100001"
f (Nil::List Direction)
-> "00000001"
f $ Cons North (Cons South (Cons Center (Cons East (Cons West Nil))))
-> "10010111 01110111 10000001"

The padding is a sequence of 0s terminated by a 1 running till the next byte boundary (if we are already at a byte boundary it will add an additional byte of value 1, that's unfortunate but there is a good reason for this, check the specs).

Byte-padding is automatically added by the function flat and removed by unflat.

Performance

For some hard data, see this comparison of the major haskell serialisation libraries.

Briefly:

  • Size: flat produces significantly smaller binaries than all other libraries (3/4 times usually)
  • Encoding: store and flat are usually faster
  • Decoding: store, flat and cereal are usually faster
  • Transfer time (serialisation time + transport time on the network + deserialisation at the receiving end): flat is usually faster for all but the highest network speeds

Compatibility

GHC

Tested with:

  • ghc 7.10.3, 8.0.2, 8.2.2, 8.4.4 and 8.6.1 (x64)

Should also work with (not recently tested):

  • ghc 7.10.3/LLVM 3.5.2 (Arm7)

GHCJS

Passes all tests in the flat testsuite, except for those relative to short bytestrings (Data.ByteString.Short) that are unsupported by ghcjs.

Check stack-ghcjs.yaml to see with what versions of ghcjs it has been tested.

If you use a different version of ghcjs, you might want to run the test suite by setting your compiler in stack-ghcjs.yaml and then running:

stack test --stack-yaml=stack-ghcjs.yaml

NOTE: Versions prior to 0.33 encode Double values incorrectly when they are not aligned with a byte boundary.

NOTE: A native TypeScript/JavaScript version of flat is under development.

ETA

It builds (with etlas 1.5.0.0 and eta eta-0.8.6b2 under macOS Sierra) and seems to be working, though the full test suite could not be run due to Eta's issues compiling some of the test suite dependencies.

Installation

Get the latest stable version from hackage.

Known Bugs and Infelicities

Longish compilation times

'flat` relies more than other serialisation libraries on extensive inlining for its good performance, this unfortunately leads to longer compilation times.

If you have many data types or very large ones this might become an issue.

A couple of good practices that will eliminate or mitigate this problem are:

  • During development, turn optimisations off (stack --fast or -O0 in the cabal file).

  • Keep your serialisation code in a separate module(s).

Data types with more than 512 constructors are currently unsupported

See also the full list of open issues.

Acknowledgements

flat reuses ideas and readapts code from various packages, mainly: store, binary-bits and binary and includes contributions from Justus Sagemüller.