serdoc
======

Unified serialization with semi-automatic documentation

Introduction
------------
SerDoc provides:

- A unified interface for serialization formats ("codecs"), in the form of a
  `Serializable` typeclass.
- A mini-EDSL (`FieldInfo`) for describing serialization formats as first-class
  data structures, and a typeclass (`HasInfo`) to link them to codecs and
  serializable Haskell types.
- Building blocks and utility code for implementing `Codec`, `Serializable`,
  and `HasInfo` for existing or new serialization formats.

It also includes an implementation of these typeclasses for [the `binary`
package](https://hackage.haskell.org/package/binary).

Components
----------
SerDoc is split up into two sub-projects:

- `serdoc-core` (this library), which provides the typeclasses and building blocks
- `serdoc-binary`, which provides instances for `binary`

How To Use
----------

### Serializing and deserializing values

Encoding is straightforward: use `encode`.

For decoding, you have a few options, depending on the codec you use. The most
general form is `decodeM`; apart from that, a family of similar functions is
provided, following the conventions:

- Monadic decoders have an `M` suffix; pure decoders (where the decoding Monad
  is `Except err` or `Identity`) have no `M` suffix (e.g., `decode` vs.
  `decodeM`).
- Flavors that ignore any remaining unconsumed input have a `_` suffix (e.g.
  `decode_`).
- Flavors that convert decoding errors to `Either`s have an `Either` suffix
  (e.g. `decodeMEither`); these require that the decoding monad is `Except err`
  or `ExceptT err m`.

Keep in mind that depending on how the codec works, the serialized data may be
returned / consumed via the `Encoded` type, or passed by (mutable) reference
through the `Context` object. The API purposefully supports both ways, because
a given codec may only support one or the other.

### Implementing `HasInfo` and `Serializable` for an existing codec

- For **newtype** wrappers that use the same serialization format as their
  wrapped payloads, the easiest way is to use `GeneralizedNewtypeDeriving` to
  derive instances for both typeclasses.
- For **newtype** wrappers that should implement a *different* serialization
  format, you may need to hand-write instances; if you do this, take special
  care to ensure that the `HasInfo` instance matches the actual serialization.
- **Lists** and some other data structures are supported out of the box and
  require no explicit instance; they are serialized using a 32-bit list length
  followed by the serialized list elements, in order. If you need a different
  representation, then newtype-wrapping may be necessary.
- For **enumeration types**, a generic wrapper, `ViaEnum`, is provided, which
  you can use in combination with the `DerivingVia` extension; alternatively,
  you can use `enumInfo`, `encodeEnum`, and `decodeEnum` to write the instances
  yourself.
- **`String`** is just a type alias for `[Char]`, so it is serializable if and
  only if an instance for `Char` exists. However, it is usually preferable to
  convert your strings to `Text`.
- For **record types**, consider using `deriveSerDoc` (found in
  `Data.SerDoc.TH`). This Template Haskell function will generate matching
  instances for both typeclasses, following the convention of serializing all
  fields in the order they appear in the type declaration, and labelled by
  their Haskell field names. This only works if instances exist for the type
  of every record field.
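
As a rough illustration of the conventions above, here is a self-contained sketch that uses a toy `ToBytes` class in place of SerDoc's `Serializable` and `HasInfo`. The class, the `ViaEnumSketch` wrapper, the example types, the 16-bit enum tag, and the big-endian byte order are all assumptions made for this example. Only the 32-bit list length prefix, the declaration-order rule for record fields, and the use of `GeneralizedNewtypeDeriving` and `DerivingVia` come from the text above; the real `ViaEnum` and `deriveSerDoc` may differ in detail.

```haskell
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Bits (shiftR)
import Data.Word (Word16, Word32, Word8)

-- Toy stand-in for a serialization class (hypothetical; NOT SerDoc's API).
class ToBytes a where
  toBytes :: a -> [Word8]

instance ToBytes Word8 where
  toBytes w = [w]

-- Big-endian byte order, chosen arbitrarily for this sketch.
instance ToBytes Word16 where
  toBytes w = [fromIntegral (w `shiftR` 8), fromIntegral w]

instance ToBytes Word32 where
  toBytes w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- Lists: a 32-bit length, followed by the elements in order.
instance ToBytes a => ToBytes [a] where
  toBytes xs =
    toBytes (fromIntegral (length xs) :: Word32) ++ concatMap toBytes xs

-- Newtype with the same format as its payload: reuse the payload's instance.
newtype UserId = UserId Word32
  deriving newtype (ToBytes)

-- Hypothetical enum wrapper in the spirit of `ViaEnum`: encodes the
-- constructor index as a Word16 (an assumption of this sketch).
newtype ViaEnumSketch a = ViaEnumSketch a

instance Enum a => ToBytes (ViaEnumSketch a) where
  toBytes (ViaEnumSketch x) = toBytes (fromIntegral (fromEnum x) :: Word16)

data Color = Red | Green | Blue
  deriving (Show, Eq, Enum, Bounded)
  deriving (ToBytes) via (ViaEnumSketch Color)

-- Record: a hand-written instance that serializes all fields in
-- declaration order.
data Pixel = Pixel
  { pixelOwner :: UserId
  , pixelColor :: Color
  , pixelTags  :: [Word8]
  }

instance ToBytes Pixel where
  toBytes p =
    toBytes (pixelOwner p) ++ toBytes (pixelColor p) ++ toBytes (pixelTags p)
```

With these definitions, `toBytes (Pixel (UserId 1) Blue [7, 8])` evaluates to `[0,0,0,1,0,2,0,0,0,2,7,8]`: four bytes for the owner, two for the enum tag, then the length-prefixed tag list.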

### Adding your own codecs

A codec is identified by a phantom type. No values of that type ever need to
exist at runtime; the type only selects the codec we want, so we can define it
as a constructorless `data` type (like `Void`), e.g.:

    data MyFantasticCodec

We then need a `Codec` instance, which is where we define:

- A type for a *context* passed to each invocation of `encode` and `decodeM`;
  this can be anything you want, depending on the needs of your codec. If the
  codec does not require any context, use `()`.
- A monadic data type used for encoding, `MonadEncode`. For pure codecs, this
  can be `Identity`; if you serialize directly to something like a file handle
  or network socket, it will typically have to be `IO`, or some `MonadIO`.
- A monadic data type used for decoding, `MonadDecode`. Since decoding can
  fail, this will typically involve not just the required effects for the
  decoding process itself, but also some form of error handling. It is
  recommended to use `Except err` for pure codecs, and `ExceptT err IO` (or
  `MonadIO m => ExceptT err m`) for codecs that require IO, where `err` is an
  appropriate error data structure for your codec.
- The default encoding to use for enum types (optional, defaults to `Word16`).
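
The choices listed above could look roughly like this for a pure codec. To keep the example self-contained, it declares a local stand-in class, `CodecSketch`, whose associated types mirror the names used in this section. This class is an assumption and not the library's actual `Codec` definition, which may be shaped differently; `MyDecodeError` and `truncatedInput` are likewise hypothetical.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Control.Monad.Trans.Except (Except, runExcept, throwE)
import Data.Functor.Identity (Identity)
import Data.Kind (Type)
import Data.Word (Word16)

-- Local stand-in for the library's `Codec` class (an assumption of this
-- sketch; consult the library for the real definition).
class CodecSketch codec where
  type Context codec :: Type
  type MonadEncode codec :: Type -> Type
  type MonadDecode codec :: Type -> Type
  -- Optional: falls back to Word16 when an instance does not override it.
  type DefEnumEncoding codec :: Type
  type DefEnumEncoding codec = Word16

-- Phantom type identifying the codec; it has no values.
data MyFantasticCodec

-- Hypothetical error type for this codec.
newtype MyDecodeError = MyDecodeError String
  deriving (Show, Eq)

instance CodecSketch MyFantasticCodec where
  type Context MyFantasticCodec = ()                         -- no context needed
  type MonadEncode MyFantasticCodec = Identity               -- pure encoding
  type MonadDecode MyFantasticCodec = Except MyDecodeError   -- pure, can fail
  -- DefEnumEncoding is left at its default (Word16).

-- A decoding action that always fails, to show the error channel.
truncatedInput :: MonadDecode MyFantasticCodec Word16
truncatedInput = throwE (MyDecodeError "unexpected end of input")

-- Because the decoding monad is `Except err`, errors convert to `Either`.
asEither :: MonadDecode MyFantasticCodec a -> Either MyDecodeError a
asEither = runExcept
```

Here `asEither truncatedInput` yields `Left (MyDecodeError "unexpected end of input")`. A codec that writes to a handle or socket would instead pick `IO` for encoding and `ExceptT err IO` for decoding.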

Providing instances for a reasonable set of primitive values and data
structures is highly recommended; a minimum viable set might be:

- `()`
- `Bool`
- `Int`, `Int8`, `Int16`, `Int32`, `Int64`
- `Word`, `Word8`, `Word16`, `Word32`, `Word64`
- Whatever type you picked for the default enum encoding
- `[a]`
- `Maybe a`
- `Either a b`
- Tuples of up to 7 elements
- `ByteString`