muesli: A simple document-oriented database

[ database, library, mit ] [ Propose Tags ] [ Report a vulnerability ]

muesli is a simple MVCC document-oriented database featuring ACID transactions, automatic index management and minimal boilerplate.

Import the Database.Muesli.Types module to mark up your types for indexing, Database.Muesli.Query for writing and running queries, and Database.Muesli.Handle for database management. The rest of the modules are internal, but exposed just in case.

See the README.md file for an usage example.

[Skip to Readme]

Modules

[Index]

Database
- Muesli

Downloads

muesli-0.1.1.0.tar.gz [browse] (Cabal source package)
Package description (revised from the package)

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.

Maintainer's Corner

Package maintainers

CalinArdelean

For package maintainers and hackage trustees

edit package information

Candidates

No Candidates

Versions [RSS]	0.1.0.0, 0.1.0.1, 0.1.1.0
Change log	CHANGELOG.md
Dependencies	base (>=4.8 && <5), bytestring, cereal, containers, directory, filepath, hashable, mtl, psqueues, time [details]
License	MIT
Copyright	Copyright (c) 2015 Călin Ardelean
Author	Călin Ardelean
Maintainer	Călin Ardelean
Uploaded	by CalinArdelean at 2015-05-29T13:03:39Z
Revised	Revision 1 made by CalinArdelean at 2025-09-07T10:29:05Z
Category	Database
Home page	https://codeberg.org/nilac/muesli
Bug tracker	https://codeberg.org/nilac/muesli/issues
Source repo	head: git clone https://codeberg.org/nilac/muesli.git
Distributions
Reverse Dependencies	1 direct, 0 indirect [details]
Downloads	2241 total (6 in the last 30 days)
Rating	(no votes yet) [estimated by Bayesian average]
Your Rating	λ λ λ
Status	Docs uploaded by user Build status unknown [no reports yet]

Readme for muesli-0.1.1.0

[back to package description]

muesli

A simple document-oriented database engine for Haskell.

Use cases

backing store for p2p / cloud nodes, mobile apps, etc.
higher capacity replacement for acid-state (only indexes are held in memory).
no dependency substitute for SQLite.
ACIDic replacement for CouchDB and the like.

Features

ACID transactions implemented on the MVCC model.
automatic index management based on tags prepended to fields' types (see example below).
minimal boilerplate: instead of TemplateHaskell we use GHC.Generics and deriving.
monadic queries, with standard primitive operations like: lookup, insert, update, delete, filter, range, and filterRange.
range queries afford efficient cursor-like navigation (paging) through large datasets. For example this is the equivalent SQL for filterRange:

SELECT TOP page * FROM table
WHERE (filterFld = filterVal) AND
      (sortVal = NULL OR sortFld < sortVal) AND
      (sortKey = NULL OR ID < sortKey)
ORDER BY sortFld, ID DESC

easy to reason about performance: all primitive queries run in O(p*log(n)).
type safety: impossible to attempt deserializing a record at a wrong type (or address), and risk getting bogus data with no error thrown. References are tagged with a phantom type and created only by the database. There are also Num/Integral instances to support more generic apps, but normally those are not needed.
multiple backends supported: currently file, and soon (™️) in-memory, remote.
portability: it should work on all platforms, including mobile.
p2p replication: soon (™️)

Note: Some of these features become misfeatures for certain scenarios which would make either a pure in-memory cache, or a real database more appropriate.

In particular the query language is very basic at the current stage. Sure, you can use the customary Functor / Applicative / Monad interface, but you will have to write all kinds of wrapper queries to make things manageable.

The design principle is to only upgrade the query language in tandem with the indexes. Right now the indexes are not very smart, so the query language will not lie about it with some nice but poorly implemented abstraction.

Example use

First, mark up your types. You must use the record syntax to name the accessors so they'll be queryable. You can filter on Reference fields, sort and range on Sortables, and reverse lookup Uniques. The database will extract these keys using the Indexable and Document instances with the help of GHC.Generics, including from deep inside any Foldable.

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric  #-}

import Database.Muesli.Types

data Person = Person
  { personName  :: Unique (Sortable String)
  , personEmail :: String
  } deriving (Show, Generic, Serialize)

instance Document Person

data Content  = Text String | HTML String | XHTML String
  deriving (Show, Generic, Serialize, Indexable)

data BlogPost = BlogPost
  { postURI          :: Unique String
  , postTitle        :: Sortable String
  , postAuthor       :: Maybe (Reference Person)
  , postContributors :: [Reference Person]
  , postTags         :: [Sortable String]
  , postContent      :: Content
  , publishedDate    :: Sortable DateTime
  } deriving (Show, Generic, Serialize)

instance Document BlogPost

Then, write some queries (updateUnique searches by unique key, and either inserts or updates depending on result):

{-# LANGUAGE OverloadedStrings #-}

import Database.Muesli.Query

updatePerson :: String -> String -> Transaction l m (Reference Person, Person)
updatePerson name email = do
  let name' = Sortable name
  let p = Person name' email
  pid <- updateUnique "personName" (Unique name') p
  return (pid, p)

postsByContrib :: Reference Person -> Transaction l m [(Reference BlogPost, BlogPost)]
postsByContrib pid = filter "postContributors" (Just pid) "postTitle"

flagContributor :: Reference Person -> Transaction l m ()
flagContributor pid = do
  is <- postsByContributor pid
  forM_ is $ \(bpid, bp) ->
    update bpid bp { postTags = postTags bp ++ Sortable "stolen" }

Then you can run these transactions with runQuery inside some MonadIO context. Note that Transaction itself is an instance of MonadIO, so you can do arbitrary IO inside. The l parameter specifies which storage backend you use. Currently only a portable binary file backend is implemented, used with Handle FileLogState.

import Database.Muesli.Query
import Database.Muesli.Handle

flagIt :: (MonadIO m, LogState l) => Handle l -> String -> String ->
           m (Either TransactionAbort ())
flagIt h name email = runQuery h $ do
  (pid, _) <- updatePerson name email
  flagContributor pid

main :: IO ()
main = bracket
  (putStrLn "opening DB..." >>
   open (Just "blog.log") (Just "blog.dat") Nothing Nothing)
  (\(h :: Handle FileLogState) -> putStrLn "closing DB..." >> close h)
  (\h -> flagIt h "Bender Bending Rodríguez" "bender@ilovebender.com")

TODO

Implementation

2 files, one for transactions/indexes, and another for serialized data
same file format for transactions and indexes, loading indexes is the same as replaying transactions
transaction file only contains int keys extracted from tagged fields
processing a record (updating indexes) while loading the log is O(log n)
previous 2 points make the initial loading much faster and using significantly less memory then acid-state, which serializes entire records, including potentially very large string fields, typical in "document-oriented" scenarios. It was suggested that in such cases you should store this data in external files. But then, if you want to regain the ACID property, and already have some indexes laying around, you are well on your way of creating muesli.
data file only contains serialized records and gaps, no metadata
LRU cache holds deserialized objects wrapped in Data.Dynamic. On SSDs deserialization is far more costly than file IO, so having our own cache is a better solution than just memory mapping the file.
♻️ GC creates asynchronously new copies of both files, doing cleanup and compaction, and only locks the world at the end
🔒 all locks are held for at most O(log n) time
Reference, Unique and Sortable are newtypes that have a set of general instances for Indexable and Document which are used by a generic function
transactions defer updates by collecting IDs and serialized data, which are checked (under lock) for consistency at the end

Change log

Available here.

License

MIT license. See the license file for details.