biohazard-0.6.5: bioinformatics support library

Safe Haskell: None
Language: Haskell98

Bio.Iteratee.Bgzf

Description

Handling of BGZF files. Right now, we have one Enumeratee each for input and output. The input Enumeratee can optionally supply virtual file offsets, so that seeking is possible.

Documentation

data Block

One BGZF block: virtual offset and contents. Could also be a block of an uncompressed file, if we want to support indexing of uncompressed BAM or some silliness like that.

Constructors

Block 

decompressBgzfBlocks' :: MonadIO m => Int -> Enumeratee ByteString Block m a

Decompress a BGZF stream into a stream of Blocks, np-fold parallel, where np is the Int argument.

decompressBgzf :: MonadIO m => Enumeratee ByteString ByteString m a

Decompress a BGZF stream into a stream of ByteStrings.

decompressPlain :: MonadIO m => Enumeratee ByteString Block m a

Decompresses a plain file. What's actually happening is that the offset in the input stream is tracked and added to the ByteStrings giving Blocks. This results in the same interface as decompressing actual Bgzf.
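
The offset tracking described above can be sketched on a plain list of chunks. This is only an illustration: PlainBlock and tagWithOffsets are hypothetical names, not part of this module, and the real Block may encode its offset differently (for example, shifted to match the virtual offset layout).

```haskell
import qualified Data.ByteString as B
import Data.Int (Int64)

-- Hypothetical stand-in for the library's Block type:
-- the starting offset of a chunk in the input, plus its bytes.
data PlainBlock = PlainBlock
    { pbOffset :: Int64
    , pbBytes  :: B.ByteString
    } deriving (Eq, Show)

-- Pair every chunk with the number of bytes that precede it.
tagWithOffsets :: [B.ByteString] -> [PlainBlock]
tagWithOffsets chunks = zipWith PlainBlock offsets chunks
  where
    -- Running total of chunk lengths, starting at 0.
    offsets = scanl (+) 0 (map (fromIntegral . B.length) chunks)
```

Because every chunk carries its own position, downstream code can treat plain and compressed input alike.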

maxBlockSize :: Int

Maximum block size for BGZF: 64k, leaving some room for headers and incompressible data.

bgzfEofMarker :: ByteString

The EOF marker for BGZF files. This is just an empty string compressed as BGZF. Appended to BAM files to indicate their end.
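
For reference, the SAM/BAM specification gives the empty BGZF block as a fixed 28-byte sequence. The sketch below spells those bytes out and checks whether a buffer ends with them. It assumes bgzfEofMarker holds the same bytes; eofMarkerBytes and endsWithEofMarker are hypothetical names used only for illustration.

```haskell
import qualified Data.ByteString as B

-- The 28-byte empty BGZF block as listed in the SAM/BAM specification.
-- Assumed (not verified here) to be equal to bgzfEofMarker.
eofMarkerBytes :: B.ByteString
eofMarkerBytes = B.pack
    [ 0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00
    , 0x00, 0xff, 0x06, 0x00, 0x42, 0x43, 0x02, 0x00
    , 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00
    , 0x00, 0x00, 0x00, 0x00 ]

-- True if the given bytes (e.g. the tail of a file) end with the marker.
endsWithEofMarker :: B.ByteString -> Bool
endsWithEofMarker = B.isSuffixOf eofMarkerBytes
```

A file that lacks this suffix may have been truncated, which is why tools check for it.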

liftBlock :: Monad m => Iteratee ByteString m a -> Iteratee Block m a

Runs an Iteratee for ByteStrings when decompressing BGZF. Adds internal bookkeeping.

getOffset :: Monad m => Iteratee Block m FileOffset

Get the current virtual offset. The virtual address in a BGZF stream contains the offset of the current block in the upper 48 bits and the current offset into that block in the lower 16 bits. This scheme is compatible with the way BAM files are indexed.
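
The 48/16 bit layout described above can be sketched with plain bit operations. The function names and the use of Word64 and Word16 are assumptions made for this example; the module itself returns a FileOffset.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word16, Word64)

-- Combine the file offset of a compressed block (upper 48 bits) with an
-- offset into its uncompressed contents (lower 16 bits).
-- Hypothetical helper, not part of this module.
packVirtualOffset :: Word64 -> Word16 -> Word64
packVirtualOffset blockStart within =
    (blockStart `shiftL` 16) .|. fromIntegral within

-- Split a virtual offset back into (block start, offset within block).
unpackVirtualOffset :: Word64 -> (Word64, Word16)
unpackVirtualOffset v = (v `shiftR` 16, fromIntegral (v .&. 0xFFFF))
```

With this layout, comparing two virtual offsets as integers orders them first by block and then by position within the block.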

isBgzf :: Monad m => Iteratee ByteString m Bool

Tests whether a stream is in BGZF format. Does not consume any input.

isGzip :: Monad m => Iteratee ByteString m Bool

Tests whether a stream is in GZip format. Also returns True on a Bgzf stream, which is technically a special case of GZip.
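
The relationship between the two formats can be sketched as pure checks on the leading bytes of a stream. This is not the module's implementation: looksLikeGzip and looksLikeBgzf are hypothetical names, and the BGZF check assumes the "BC" extra subfield comes first in the gzip extra field, which is the common case but not guaranteed.

```haskell
import Data.Bits (testBit)
import qualified Data.ByteString as B

-- Every gzip member starts with the magic bytes 0x1f 0x8b.
looksLikeGzip :: B.ByteString -> Bool
looksLikeGzip s = B.take 2 s == B.pack [0x1f, 0x8b]

-- A BGZF block is a gzip member with the FEXTRA flag set and an extra
-- subfield tagged 'B','C' of length 2 (it holds the block size).
-- Assumption: that subfield is the first one, at byte offset 12.
looksLikeBgzf :: B.ByteString -> Bool
looksLikeBgzf s =
    B.length s >= 16
    && looksLikeGzip s
    && B.index s 2 == 8             -- CM: deflate
    && testBit (B.index s 3) 2      -- FLG: FEXTRA bit
    && B.index s 12 == 66           -- SI1 = 'B'
    && B.index s 13 == 67           -- SI2 = 'C'
    && B.index s 14 == 2            -- SLEN, low byte
    && B.index s 15 == 0            -- SLEN, high byte
```

Any input accepted by looksLikeBgzf is also accepted by looksLikeGzip, mirroring the behaviour documented for isGzip.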

parMapChunksIO :: (MonadIO m, Nullable s) => Int -> (s -> IO t) -> Enumeratee s t m a

Parallel map of an IO action over the elements of a stream.

This Enumeratee applies an IO action to every chunk of the input stream. These IO actions are run asynchronously in a limited parallel way. Don't forget to evaluate the result inside the IO action; otherwise laziness may defer the actual work until the result is consumed.
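
The idea of running IO actions with limited parallelism, and why forcing the result matters, can be sketched on a plain list using only base. This is not how parMapChunksIO is implemented: boundedMapIO and tryAny are hypothetical names, and unlike the Enumeratee this sketch does not stream, since it forks one thread per element up front and only limits how many run the action at once.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (SomeException, bracket_, evaluate, throwIO, try)

-- 'try' specialised to catch any exception.
tryAny :: IO a -> IO (Either SomeException a)
tryAny = try

-- Apply an IO action to every element, with at most n actions running
-- at the same time. Results come back in input order.
boundedMapIO :: Int -> (a -> IO b) -> [a] -> IO [b]
boundedMapIO n f xs = do
    sem  <- newQSem n
    vars <- mapM (spawn sem) xs
    mapM collect vars
  where
    spawn sem x = do
        var <- newEmptyMVar
        _ <- forkIO $ do
            -- 'evaluate' forces the result to weak head normal form in
            -- the worker thread, so the work is not left as a thunk
            -- for the consumer to compute later.
            r <- tryAny (bracket_ (waitQSem sem) (signalQSem sem)
                                  (f x >>= evaluate))
            putMVar var r
        return var
    -- Wait for a worker; rethrow its exception in the caller if it failed.
    collect var = takeMVar var >>= either throwIO return
```

Note that evaluate only forces the outermost constructor; for a strict ByteString that is the whole value, while lazy structures would need deeper forcing.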

compressBgzf :: MonadIO m => Enumeratee BgzfChunk ByteString m a

Like compressBgzf', with sensible defaults.

compressBgzf' :: MonadIO m => CompressParams -> Enumeratee BgzfChunk ByteString m a

Compresses a stream of BgzfChunks into a stream of BGZF blocks, in parallel.