streamly-lmdb-0.7.0: Stream data to or from LMDB databases using the streamly library.
Safe Haskell: Safe-Inferred
Language: Haskell2010

Streamly.External.LMDB

Description

Acknowledgments

The functionality for the limits and getting the environment and database, in particular the idea of specifying the read-only or read-write mode at the type level, was mostly obtained from the lmdb-simple library.

Environment

With LMDB, one first creates a so-called “environment,” which one can think of as a file or folder on disk.

data Environment mode Source #

openEnvironment :: Mode mode => FilePath -> Limits -> IO (Environment mode) Source #

Opens an LMDB environment in either ReadWrite or ReadOnly mode. The FilePath argument may be either a directory or a regular file, but it must already exist. If it is a regular file, an additional file with "-lock" appended to the name is used for the reader lock table.

Note that an environment must have been opened in ReadWrite mode at least once before it can be opened in ReadOnly mode.

An environment opened in ReadOnly mode may still modify the reader lock table (except when the filesystem is read-only, in which case no locks are used).
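As a minimal sketch of the above, the mode can be selected with a type annotation on the result. The path /tmp/example-lmdb is a hypothetical location chosen for illustration, and the directory package is assumed to be available for creating it.

```haskell
import Streamly.External.LMDB
  (Environment, ReadWrite, defaultLimits, openEnvironment)
import System.Directory (createDirectoryIfMissing)

main :: IO ()
main = do
  -- Hypothetical path; openEnvironment requires it to exist already.
  let path = "/tmp/example-lmdb"
  createDirectoryIfMissing True path

  -- The type annotation picks the mode. A later run of the program could
  -- annotate with `Environment ReadOnly` instead, now that the environment
  -- has been opened in ReadWrite mode once.
  _env <- openEnvironment path defaultLimits :: IO (Environment ReadWrite)
  putStrLn "environment opened"
```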

closeEnvironment :: Mode mode => Environment mode -> IO () Source #

Closes the given environment.

If you have at most a few dozen environments, there should be no need for this. (It is common practice with LMDB to create one’s environments once and reuse them for the remainder of the program’s execution.) If you find yourself needing this, it is your responsibility to heed the documented caveats.

In particular, before calling this function you will probably want to (a) call closeDatabase on the environment’s databases, and (b) pass precreated transactions and cursors to readLMDB and unsafeReadLMDB, so that no transactions or cursors are left for the garbage collector to clean up. (As an alternative to (b), one could try manually triggering the garbage collector.)

Mode

class Mode a Source #

Minimal complete definition

isReadOnlyMode

Instances

Mode ReadOnly Source #
    Defined in Streamly.External.LMDB.Internal

Mode ReadWrite Source #
    Defined in Streamly.External.LMDB.Internal

data ReadWrite Source #

Instances

Mode ReadWrite Source #
    Defined in Streamly.External.LMDB.Internal

data ReadOnly Source #

Instances

Mode ReadOnly Source #
    Defined in Streamly.External.LMDB.Internal

Limits

data Limits Source #

LMDB environments have various limits on the size and number of databases and concurrent readers.

Constructors

Limits 

Fields

  • mapSize :: !Int

    Memory map size, in bytes (also the maximum size of all databases).

  • maxDatabases :: !Int

    Maximum number of named databases.

  • maxReaders :: !Int

    Maximum number of concurrent ReadOnly transactions (also the number of slots in the lock table).

defaultLimits :: Limits Source #

The default limits are 1 MiB map size, 0 named databases, and 126 concurrent readers. These can be adjusted freely, and in particular the mapSize may be set very large (limited only by available address space). However, LMDB is not optimized for a large number of named databases so maxDatabases should be kept to a minimum.

The default mapSize is intentionally small and should be changed to something appropriate for your application. It ought to be a multiple of the OS page size, and should be chosen as large as possible to accommodate future growth of the database(s). Once set for an environment, this limit cannot be reduced to a value smaller than the space already consumed by the environment; it can, however, be increased later.

If you are going to use any named databases then you will need to change maxDatabases to the number of named databases you plan to use. However, you do not need to change this field if you are only going to use the single main (unnamed) database.

gibibyte :: Int Source #

A convenience constant for obtaining a 1 GiB map size.

tebibyte :: Int Source #

A convenience constant for obtaining a 1 TiB map size.
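The following sketch shows one way to derive a Limits value from defaultLimits with record update syntax. The helper roundUpToPage and the name appLimits are hypothetical, and the 4096-byte page size is an assumption; the actual page size should be queried from the OS.

```haskell
import Streamly.External.LMDB (Limits (..), defaultLimits)

-- Hypothetical helper: round a byte count up to the next multiple of the
-- given page size.
roundUpToPage :: Int -> Int -> Int
roundUpToPage pageSize n = ((n + pageSize - 1) `div` pageSize) * pageSize

-- Assumption: 4096-byte pages. For round sizes, a multiple of the gibibyte
-- or tebibyte constants (e.g. 10 * gibibyte) could be used instead.
appLimits :: Limits
appLimits =
  defaultLimits
    { mapSize = roundUpToPage 4096 1500000000
    , maxDatabases = 2 -- two named databases are planned
    }

main :: IO ()
main = print (mapSize appLimits)
```

The resulting value would then be passed as the second argument of openEnvironment.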

Database

After creating an environment, one creates within it one or more databases.

data Database mode Source #

getDatabase :: Mode mode => Environment mode -> Maybe String -> IO (Database mode) Source #

Gets a database with the given name. When creating a database (i.e., getting it for the first time), one must do so in ReadWrite mode.

If only one database is desired within the environment, the name can be Nothing (known as the “unnamed database”).

If one or more named databases (databases with a Just name) are desired, the maxDatabases of the environment’s limits should have been adjusted accordingly. In this case, the unnamed database will contain the names of the named databases as keys, which one is allowed to read but not write.
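A short sketch of obtaining two named databases follows. The database names "users" and "events" and the path are hypothetical; maxDatabases is raised to match the number of named databases, as described above.

```haskell
import Streamly.External.LMDB
  ( Environment, Limits (..), ReadWrite
  , defaultLimits, getDatabase, openEnvironment )
import System.Directory (createDirectoryIfMissing)

main :: IO ()
main = do
  let path = "/tmp/example-lmdb-named" -- hypothetical path
  createDirectoryIfMissing True path

  -- Two named databases are planned, so maxDatabases must be at least 2.
  let limits = defaultLimits {maxDatabases = 2}
  env <- openEnvironment path limits :: IO (Environment ReadWrite)

  -- Getting a database for the first time creates it, which requires
  -- ReadWrite mode.
  _users <- getDatabase env (Just "users")
  _events <- getDatabase env (Just "events")
  putStrLn "named databases ready"
```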

clearDatabase :: Mode mode => Database mode -> IO () Source #

Clears, i.e., removes all key-value pairs from, the given database.

closeDatabase :: Mode mode => Database mode -> IO () Source #

Closes the given database.

If you have at most a few dozen databases, there should be no need for this. (It is common practice with LMDB to create one’s databases once and reuse them for the remainder of the program’s execution.) If you find yourself needing this, it is your responsibility to heed the documented caveats.

Reading

readLMDB :: (MonadIO m, Mode mode) => Database mode -> Maybe (ReadOnlyTxn, Cursor) -> ReadOptions -> Unfold m Void (ByteString, ByteString) Source #

Creates an unfold with which we can stream key-value pairs from the given database.

If an existing read-only transaction and cursor are not provided, a read-only transaction and cursor are automatically created and kept open for the duration of the unfold; we suggest doing this as a first option. However, if you find this to be a bottleneck (e.g., if profiling shows that significant time is being spent in mdb_txn_begin, or if you find yourself having to increase maxReaders in the environment’s limits because the transactions and cursors are not being garbage collected fast enough), consider precreating a transaction and cursor using beginReadOnlyTxn and openCursor.

In any case, bear in mind at all times LMDB’s caveats regarding long-lived transactions.

If you don’t want the overhead of intermediate ByteStrings (on your way to your eventual data structures), use unsafeReadLMDB instead.
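A minimal sketch of streaming all key-value pairs with an automatically created transaction and cursor is given below. It assumes a streamly release in which Streamly.Data.Stream exports unfold and fold and Streamly.Data.Fold exports drainMapM; the path is hypothetical. Because the unfold's seed type is Void, the sketch passes undefined as the seed, on the assumption that it is never inspected.

```haskell
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Stream as Stream
import Streamly.External.LMDB
  ( Environment, ReadWrite, defaultLimits, defaultReadOptions
  , getDatabase, openEnvironment, readLMDB )
import System.Directory (createDirectoryIfMissing)

main :: IO ()
main = do
  let path = "/tmp/example-lmdb" -- hypothetical path
  createDirectoryIfMissing True path
  env <- openEnvironment path defaultLimits :: IO (Environment ReadWrite)
  db <- getDatabase env Nothing

  -- Nothing: let readLMDB create its own transaction and cursor.
  -- The seed is of type Void and is assumed never to be inspected.
  let pairs = Stream.unfold (readLMDB db Nothing defaultReadOptions) undefined

  -- Print each (key, value) pair as it is streamed out.
  Stream.fold (Fold.drainMapM print) pairs
```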

unsafeReadLMDB :: (MonadIO m, Mode mode) => Database mode -> Maybe (ReadOnlyTxn, Cursor) -> ReadOptions -> (CStringLen -> IO k) -> (CStringLen -> IO v) -> Unfold m Void (k, v) Source #

Similar to readLMDB, except that the keys and values are not automatically converted into Haskell ByteStrings.

To ensure safety, make sure that the memory pointed to by the CStringLen passed to each key/value mapping function call is (a) only read (and never written to); and (b) not used after the mapping function has returned. One way to transform the CStringLens to your desired data structures is to use unsafePackCStringLen (the resulting ByteString shares that memory without copying, so it too must be fully consumed before the mapping function returns).
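The sketch below illustrates two mapping functions that respect these rules. The names copyKey and valueLength are hypothetical: the first copies the bytes into Haskell-managed memory, and the second inspects the bytes in place and forces its result before returning. The streamly module names and the path are assumptions as well.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafePackCStringLen)
import Foreign.C.String (CStringLen)
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Stream as Stream
import Streamly.External.LMDB
  ( Environment, ReadWrite, defaultLimits, defaultReadOptions
  , getDatabase, openEnvironment, unsafeReadLMDB )
import System.Directory (createDirectoryIfMissing)

-- Copies the key out of LMDB's memory, so the result is safe to keep.
copyKey :: CStringLen -> IO B.ByteString
copyKey = B.packCStringLen

-- Wraps the value without copying, then forces the length with ($!) so that
-- nothing refers to LMDB's memory after this function returns.
valueLength :: CStringLen -> IO Int
valueLength csl = do
  bs <- unsafePackCStringLen csl
  pure $! B.length bs

main :: IO ()
main = do
  let path = "/tmp/example-lmdb" -- hypothetical path
  createDirectoryIfMissing True path
  env <- openEnvironment path defaultLimits :: IO (Environment ReadWrite)
  db <- getDatabase env Nothing
  let unfold' = unsafeReadLMDB db Nothing defaultReadOptions copyKey valueLength
      sizes = Stream.unfold unfold' undefined
  -- Total number of value bytes stored in the database.
  total <- Stream.fold Fold.sum (fmap snd sizes)
  print total
```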

Read-only transactions and cursors

beginReadOnlyTxn :: Environment mode -> IO ReadOnlyTxn Source #

Begins an LMDB read-only transaction for use with readLMDB or unsafeReadLMDB. It is your responsibility to (a) use the transaction only on databases in the same environment, (b) make sure that those databases were already obtained before the transaction was begun, and (c) dispose of the transaction with abortReadOnlyTxn.

abortReadOnlyTxn :: ReadOnlyTxn -> IO () Source #

Disposes of a read-only transaction created with beginReadOnlyTxn.

openCursor :: ReadOnlyTxn -> Database mode -> IO Cursor Source #

Opens a cursor for use with readLMDB or unsafeReadLMDB. It is your responsibility to (a) make sure the cursor is used by at most one readLMDB or unsafeReadLMDB Unfold at a time (to be safe, one can open a new cursor for every readLMDB or unsafeReadLMDB call), (b) make sure the provided database is within the environment on which the provided transaction was begun, and (c) dispose of the cursor with closeCursor (logically before abortReadOnlyTxn, although the order doesn’t really matter for read-only transactions).

closeCursor :: Cursor -> IO () Source #

Disposes of a cursor created with openCursor.
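One possible way to meet the responsibilities listed in this section is to wrap the transaction and cursor in bracket, so that both are disposed of even if an exception occurs. This is a sketch under assumptions rather than a prescribed pattern: the path is hypothetical, and the streamly module names are assumed as in a recent streamly release.

```haskell
import Control.Exception (bracket)
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Stream as Stream
import Streamly.External.LMDB
  ( Environment, ReadWrite, abortReadOnlyTxn, beginReadOnlyTxn, closeCursor
  , defaultLimits, defaultReadOptions, getDatabase, openCursor
  , openEnvironment, readLMDB )
import System.Directory (createDirectoryIfMissing)

main :: IO ()
main = do
  let path = "/tmp/example-lmdb" -- hypothetical path
  createDirectoryIfMissing True path
  env <- openEnvironment path defaultLimits :: IO (Environment ReadWrite)

  -- The database is obtained before the transaction is begun.
  db <- getDatabase env Nothing

  -- The inner bracket closes the cursor before the outer one aborts the
  -- transaction.
  bracket (beginReadOnlyTxn env) abortReadOnlyTxn $ \txn ->
    bracket (openCursor txn db) closeCursor $ \cursor -> do
      let unfold' = readLMDB db (Just (txn, cursor)) defaultReadOptions
      kvs <- Stream.fold Fold.toList (Stream.unfold unfold' undefined)
      print (length kvs)
```

The stream is consumed entirely inside the brackets, so the cursor is used by only one unfold and is not touched after it has been closed.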

Read options

data ReadOptions Source #

Constructors

ReadOptions 

Fields

  • readDirection :: !ReadDirection
     
  • readStart :: !(Maybe ByteString)

    If Nothing, a forward [backward] iteration starts at the beginning [end] of the database. Otherwise, it starts at the first key that is greater [less] than or equal to the Just key.

  • readUnsafeFFI :: !Bool

    Use unsafe FFI calls under the hood. This can increase iteration speed, but one should bear in mind that unsafe FFI calls can have an adverse impact on the performance of the rest of the program (e.g., its ability to effectively spawn green threads).

Instances

Show ReadOptions Source #
    Defined in Streamly.External.LMDB

defaultReadOptions :: ReadOptions Source #

By default, we start reading from the beginning of the database (i.e., from the smallest key), and we don’t use unsafe FFI calls.

data ReadDirection Source #

Direction of key iteration.

Constructors

Forward 
Backward 

Instances

Show ReadDirection Source #
    Defined in Streamly.External.LMDB
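As an illustration of the read options, the sketch below iterates backward starting from the key "m"; according to the readStart description, iteration would then begin at the largest key less than or equal to "m". The start key and the path are hypothetical, and the streamly module names are assumptions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Stream as Stream
import Streamly.External.LMDB
  ( Environment, ReadDirection (..), ReadOptions (..), ReadWrite
  , defaultLimits, defaultReadOptions, getDatabase, openEnvironment, readLMDB )
import System.Directory (createDirectoryIfMissing)

-- Backward iteration from the hypothetical key "m"; readUnsafeFFI keeps its
-- default value.
backwardFromM :: ReadOptions
backwardFromM =
  defaultReadOptions
    { readDirection = Backward
    , readStart = Just "m"
    }

main :: IO ()
main = do
  let path = "/tmp/example-lmdb" -- hypothetical path
  createDirectoryIfMissing True path
  env <- openEnvironment path defaultLimits :: IO (Environment ReadWrite)
  db <- getDatabase env Nothing
  let pairs = Stream.unfold (readLMDB db Nothing backwardFromM) undefined
  Stream.fold (Fold.drainMapM print) pairs
```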

Writing

writeLMDB :: MonadIO m => Database ReadWrite -> WriteOptions -> Fold m (ByteString, ByteString) () Source #

Creates a fold with which we can stream key-value pairs into the given database.

It is the responsibility of the user to execute the fold on a bound thread.

The fold currently cannot be used with a scan. (The plan is for this shortcoming to be remedied with or after a future release of streamly that addresses the underlying issue.)

Please specify a suitable transaction size in the write options; the default of 1 (one write transaction for each key-value pair) could yield suboptimal performance. Since writeTransactionSize counts key-value pairs, one could try, e.g., a pair count that amounts to roughly 100 KB per transaction and benchmark from there.

data WriteOptions Source #

Constructors

WriteOptions 

Fields

  • writeTransactionSize :: !Int

    The number of key-value pairs per write transaction.

  • writeOverwriteOptions :: !OverwriteOptions
     
  • writeAppend :: !Bool

    Assume the input data is already ordered. This allows the use of MDB_APPEND under the hood and substantially improves write performance. An exception will be thrown if the assumption about the ordering is not true.

  • writeUnsafeFFI :: !Bool

    Use unsafe FFI calls under the hood. This can increase write performance, but one should bear in mind that unsafe FFI calls can have an adverse impact on the performance of the rest of the program (e.g., its ability to effectively spawn green threads).

defaultWriteOptions :: WriteOptions Source #

By default, we use a write transaction size of 1 (one write transaction for each key-value pair), allow overwriting, don’t assume that the input data is already ordered, and don’t use unsafe FFI calls.
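The following sketch streams a small list of pairs into the unnamed database on a bound thread. The transaction size of 1000 is an arbitrary illustrative value, the path and the name samplePairs are hypothetical, and the streamly module names are assumptions. Note that runInBoundThread requires the program to be linked with GHC's -threaded option.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent (runInBoundThread)
import Data.ByteString (ByteString)
import qualified Streamly.Data.Stream as Stream
import Streamly.External.LMDB
  ( Environment, ReadWrite, WriteOptions (..), defaultLimits
  , defaultWriteOptions, getDatabase, openEnvironment, writeLMDB )
import System.Directory (createDirectoryIfMissing)

-- Hypothetical sample data.
samplePairs :: [(ByteString, ByteString)]
samplePairs = [("baz", "a"), ("foo", "b"), ("bar", "c")]

main :: IO ()
main = do
  let path = "/tmp/example-lmdb" -- hypothetical path
  createDirectoryIfMissing True path
  env <- openEnvironment path defaultLimits :: IO (Environment ReadWrite)
  db <- getDatabase env Nothing

  -- Group up to 1000 pairs per write transaction (illustrative value only;
  -- benchmark to find a suitable size).
  let sink = writeLMDB db defaultWriteOptions {writeTransactionSize = 1000}

  -- Run the fold on a bound thread, as the documentation requires.
  runInBoundThread (Stream.fold sink (Stream.fromList samplePairs))
```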

data OverwriteOptions Source #

Constructors

OverwriteAllow

When a key reoccurs, overwrite the value.

OverwriteAllowSame

When a key reoccurs, throw an exception except when the value is the same.

OverwriteDisallow

When a key reoccurs, throw an exception.

Instances

Eq OverwriteOptions Source #
    Defined in Streamly.External.LMDB
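To illustrate the overwrite options, the sketch below writes a stream in which the key "a" occurs twice, using OverwriteDisallow. Going by the constructor descriptions above, the write is expected to raise an exception; since the concrete exception type is not assumed here, the sketch catches SomeException. The path and the name duplicatePairs are hypothetical, and the program is assumed to be linked with -threaded.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent (runInBoundThread)
import Control.Exception (SomeException, try)
import Data.ByteString (ByteString)
import qualified Streamly.Data.Stream as Stream
import Streamly.External.LMDB
  ( Environment, OverwriteOptions (..), ReadWrite, WriteOptions (..)
  , defaultLimits, defaultWriteOptions, getDatabase, openEnvironment
  , writeLMDB )
import System.Directory (createDirectoryIfMissing)

-- Hypothetical input in which the key "a" reoccurs.
duplicatePairs :: [(ByteString, ByteString)]
duplicatePairs = [("a", "1"), ("a", "2")]

main :: IO ()
main = do
  let path = "/tmp/example-lmdb-overwrite" -- hypothetical path
  createDirectoryIfMissing True path
  env <- openEnvironment path defaultLimits :: IO (Environment ReadWrite)
  db <- getDatabase env Nothing
  let opts = defaultWriteOptions {writeOverwriteOptions = OverwriteDisallow}
      write = Stream.fold (writeLMDB db opts) (Stream.fromList duplicatePairs)
  result <- try (runInBoundThread write) :: IO (Either SomeException ())
  putStrLn (either (const "write rejected") (const "write accepted") result)
```

Switching opts to OverwriteAllow would, per the descriptions above, let the second value replace the first instead.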

Error types