biohazard-0.6.2: bioinformatics support library

Safe HaskellSafe




class Sizeable a where Source


usedBytes :: a -> Int Source

data PQ_Conf Source

A Priority Queue that can fall back to external storage.

Note that such a Priority Queue automatically gives rise to an external sorting algorithm: enqueue everything, dequeue until empty.

Whatever is to be stored in this queue needs to be in Binary, because it may need to be moved to external storage on demand. We also need a way to estimate the memory consumption of an enqueued object. When constructing the queue, the maximum amount of RAM to consume is set. Note that open input streams use memory for buffering, too.

Enqueued objects are kept in an in memory heap until the memory consumption becomes too high. At that point, the whole heap is sorted and dumped to external storage. If necessary, the file to do so is created and kept open. The newly created stream is added to a heap so that dequeueing objects amounts to performing a merge sort on multiple external streams. To conserve on file descriptors, we concatenate multiple streams into a single file, then use pread(2) on that as appropriate. If too many streams are open (how do we set that limit?), we do exactly that: merge-sort all streams and the in-memory heap into a single new stream. One file is created for each generation of streams, so that mergind handles streams of roughly equal length.

XXX Truth be told, this queue isn't backed externally, and ignores all limits. It *is* a Priority Queue, though!

XXX May want to add callbacks for significant events (new file, massive merge, deletion of file?)

XXX Need to track memory consumption of input buffers.

XXX Need a way to decide when too many streams are open. That point is reached when seeking takes about as much time as reading (which depends on buffer size and system characteristics), so that an additional merge pass becomes economical.

XXX These will be useful: unix-bytestring:System.Posix.IO.ByteString.fdPread temporary:System.IO.Temp.openBinaryTempFile lz4:Codec.Compression.LZ4




max_mb :: Int

memory limit

temp_path :: FilePath

path to temporary files (a directory will be created) functions to report progress go here

data PQ a Source

withPQ :: (Binary a, Ord a, Sizeable a) => PQ_Conf -> (PQ a -> IO b) -> IO b Source

makePQ :: (Binary a, Ord a, Sizeable a) => PQ_Conf -> IO (PQ a) Source

Creates a priority queue. Note that the priority queue creates files, which will only be cleaned up if deletePQ is called.

deletePQ :: PQ a -> IO () Source

Deletes the priority queue and all associated temporary files.

enqueuePQ :: (Binary a, Ord a, Sizeable a) => a -> PQ a -> IO () Source

Enqueues an element. This operation may result in the creation of a file or in an enormous merge of already created files.

dequeuePQ :: (Binary a, Ord a, Sizeable a) => PQ a -> IO () Source

Removes the minimum element from the queue. If the queue is already empty, nothing happens. As a result, it is possible that one or more file become empty and are deleted.

getMinPQ :: (Binary a, Ord a, Sizeable a) => PQ a -> IO (Maybe a) Source

peekMinPQ :: (Binary a, Ord a, Sizeable a) => PQ a -> IO (Maybe a) Source

Returns the minimum element from the queue. If the queue is empty, Nothing is returned. Else the minimum element currently in the queue.

sizePQ :: (Binary a, Ord a, Sizeable a) => PQ a -> IO Int Source