perdure: Robust persistence for acyclic immutable data

[ database, library, program ] [ Propose Tags ]

Perdure(TM), a Cognimeta product, aims to provide a simple and robust persistence mechanism for acyclic immutable data with an easily comprehensible cost-model. It persists to raw block devices.

For some classes of applications, it can replace the use of traditional DMBS and keep the data modelled in a manner that is natural for purely functional languages. It is not quite orthogonal persistence for Haskell, which would automatically turn any non-persistent program into a persistent one, but it aims to minimize the scope of changes required to make an application persistent.

The persistence process is strict, it does not persist thunks or closures, but only the fully evaluated data, which must be acyclic. It also requires some changes to the data types and algorithms. References must be inserted at some strategic places within data structures. These cut the structure into separately loadable parts. These parts should hold at least a few hundred bytes to reduce the frequency of disk reads. The application has control over the representation so it can optimize it in terms of the anticipated access patterns. A SizeRef reference type is provided which separates automatically if its size threshold is reached, and inlines the data otherwise. As a convenience, a Map type is also provided which uses SizeRef on internal tree nodes. It can grow beyond the memory size and still remain efficient for lookups and traversals. Other persistent data structures with similar properties can be built on top of Perdure.

For the user application, dereferencing is a pure computation. This is similar to lazy loading, but the reference data types do not hold ordinary references to their referent, only their location. This allows the referent to be unloaded transparently. Also since the reference holds the location and size of the data on the storage medium, dereferencing is very simple and efficient. No index need to be accessed, we only need to check a cache in case the data is already loaded.

Persisters for data types are created using safe combinators. Serialization is done at the bit level, allowing for very compact representations.

Undetected corruption is virtually impossible even in the presence of drive failures. Each separately loadable section has a 128-bit digest, and that digest is stored in the reference(s). This approach alleviates the need for low-level storage replication such as RAID, and takes care of replication at the persistence layer level. This seems appropriate given the increasing need for geographically distributed replication.

Persistence occurs in discrete transactions. These run in the IO monad. Transactions may be requested by multiple threads but they may block until they get their turn. Within a transaction, multiple threads or sparks can be used to examine the current state and compute the next state to be persisted.

Reference counting is used to automatically reclaim unused storage.

Since persisted data is immutable, it is trivial for applications to keep some or all historical states by using a list-like type as their root persisted type. Those past states can be used for analysis or possibly to recover data lost because of an application-level error. The library includes a History data type which automatically preserves a greater number of recent states and fewer older states.

We support 32bit, 64bit, little-endian and big-endian architectures. We allow platforms to write data and generate digests in their native format for maximum speed, but they should be able to read the data written by other platforms with the necessary conversions. Each reference stores the endianness and word size of the referent representation so databases can be moved between platforms without any conversion, and they could be read concurrently by different platforms.

A general mechanism Database.Perdure.Rev is provided to upgrade the persisted types. Its goal is similar to that of the safecopy package. Here however, the Rev module simply exports a type that is similar to Either but whose persister leaves room in the representation to accommodate future versions. We use it to create growing lists of alternatives such as User1.User :> User0.User :> NoRev. This type should always be your root persisted type. You should also use it on the nested parts of your data whose type is more subject to change: this will avoid having to propagate all type changes up to the root.

This library is young and there remains limitations that will need to be addressed in future releases:

  • Memory references are not tracked. At this point it is assumed that the current persisted state is the only root for persistent GC purposes. So no reference must be kept to previous persisted states. This is not enforced by the API so care is needed. Otherwise dereferencing a dangling reference might fail (but will not return bad data thanks to the digest).

  • Care is needed by the application developers to ensure they do not change type persisters. The persisters themselves are not persisted so the library cannot verify that it is using the same persister as with previous executions. There are two problems with persisting persisters: they are not acyclic, and they refer to user specified bijections. We encourage you to treat types and persisters as immutable code. Put each persisted type in a distinct file, such as User0.hs. When a revision is needed create User1.hs.

  • The current allocation strategy is trivial. We must add a new node type in the free space representation that will allow us to search efficiently for a block of sufficient size. We currently scan all free intervals from the start. This relatively simple fix should be done shortly, but until it is done the library will not be appropriate for databases beyond ~100MB.

  • Reclamation of free space simply occurs after some fixed number of state writes.

  • Replication is local only at the moment. We write to all replicas, and read from the first correct one. We need to support remote storage which can become temporarily unavailable, and user controlled policies regarding those occurrences. Note that quorum voting is not needed for replica control of most (non-root) allocations because digests are contained in the references and we can detect when we are reading outdated data.

  • Only raw disks configured for write-through caching will guarantee atomicity. Any other setup introduces more layers, such as file systems, which could lie about when data has actually been written out to disk. Perhaps our view is too pessimistic, and with proper action by this library other storage types could be supported. Currently file based storage is only supported for testing purposes and should not be used for live databases.

  • A directory of bad sectors with alternative locations will need to be added. It will only need to be looked-up and updated for reads with incorrect digests.

  • Sharing is not realized unless it happens in separate steps. During the writing of some state, a reference creates an allocation. In subsequent states we can refer to that same reference in multiple locations and sharing will occur. If a reference is shared in memory before being first written out, it will be written out multiple times without sharing of the referent. In the future we may rely on System.Mem.StableName to improve on that.

  • Currently, references are local. We would like to be able to refer to the allocations of another thread or process. This will be a more involved addition so it will require distributed garbage collection such as weighted reference counting.

  • Documentation and examples are not sufficient and should be our first focus. For a minimal example, look at Database.Perdure.TestState.testStatesF in exe-src.

Given those limitations, Perdure is not applicable for very large scale projects at the moment. But it can be ideal for smaller projects where there is no point in burdening the developer with a distinct data model. It can also be used as a temporary solution before integrating to external databases.

Cognimeta's Iota Charts web application https://iotacharts.com, is based on Perdure and has been has been live for close to a year. Its database is relatively small at ~80MB currently, but we have been very pleased with the results.

Perdure has been developed by Cognimeta Inc. over the past two years. We are releasing this as open source under the permissive Apache license 2.0. The persistence mechanism is relatively simple and concise and its open source nature can provide the inquisitive user with added confidence about the security of its data. Also Cognimeta has been using Haskell exclusively, and has benefited from many excellent open source libraries. We are happy to contribute back, and are hoping for constructive and critical feedback from this very bright community.

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

  • No Candidates
Versions [RSS] 0.1.0, 0.1.1, 0.1.2, 0.2.0, 0.2.1
Dependencies array (>=0.4.0.0), base (>=4.5.0.0 && <5), binary (>=0.5.1.0), bytestring (>=0.9.2.1), cognimeta-utils (==0.1.2), collections-api (>=1.0.0.0), comonad-transformers (>=2.1.1.1), containers (>=0.4.2.1), cryptohash (==0.7.2), data-binary-ieee754 (>=0.4.2.1), data-lens (>=2.9.0), data-lens-fd (>=2.0.2), data-lens-template (>=2.1.5), filepath (>=1.3.0.0), ghc-prim (>=0.2.0.0), MonadRandom (>=0.1.8), mtl (>=2.1.2), perdure (>=0.2.1), primitive (>=0.4.1), QuickCheck (>=2.4.0.1), stm (>=2.3), strict (>=0.3.2), tagged (>=0.4.2.1), template-haskell (>=2.7.0.0), time (>=1.4.0.1), transformers (>=0.3.0.0), unix (>=2.5.1.0) [details]
License LicenseRef-OtherLicense
Copyright (c) 2010-2012 Cognimeta Inc.
Author Patrick Premont for Cognimeta Inc.
Maintainer Patrick Premont <ppremont@cognimeta.com>
Category Database
Home page https://github.com/Cognimeta/perdure
Source repo head: git clone git://github.com/Cognimeta/perdure.git
Uploaded by PatrickPremont at 2012-09-26T15:14:51Z
Distributions
Reverse Dependencies 1 direct, 0 indirect [details]
Executables perdure
Downloads 3803 total (3 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs uploaded by user
Build status unknown [no reports yet]