biohazard-0.6.5: bioinformatics support library

Safe Haskell: None
Language: Haskell98

Bio.Iteratee.Builder

Description

Buffer builder to assemble Bgzf blocks. (This will probably be renamed.) The plan is to serialize records (BAM and BCF) into a buffer, then create Bgzf chunks from the buffer and reuse it. This should avoid redundant copying and relieve some pressure on the garbage collector. I also hope to plug a mysterious memory leak that doesn't show up in the profiler.

Exported functions that have unsafe in their name and result in a Push omit bounds checking. To use them safely, an appropriate ensureBuffer has to precede them.

Documentation

data BB

The MutableByteArray is garbage collected, so we don't get leaks. Once it has grown to a practical size (and the initial 128k should be very practical), we don't get fragmentation either. We also avoid copies for the most part, since no intermediate ByteStrings, either lazy or strict, have to be allocated.

Constructors

BB 

Fields

buffer :: !(MutableByteArray RealWorld)
 
len :: !Int
 
mark :: !Int
 
mark2 :: !Int
 

newtype Push

Constructors

Push (BB -> IO BB) 
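Because a Push wraps a function BB -> IO BB, several of them can be chained by running one after the other on the same buffer state. The following is a minimal sketch of that idea, not the library's code: Buf, Step, advance and runStep are hypothetical stand-ins, and the Semigroup/Monoid instances are an assumption about how such steps could be combined.

```haskell
-- Hypothetical stand-in for BB: it only tracks the write offset.
newtype Buf = Buf { bufLen :: Int } deriving (Eq, Show)

-- Same shape as Push: a buffer transformation in IO.
newtype Step = Step (Buf -> IO Buf)

-- Chaining runs the left step first and feeds its result to the right one.
instance Semigroup Step where
  Step f <> Step g = Step (\b -> f b >>= g)

instance Monoid Step where
  mempty = Step return

-- Pretend to write n bytes by advancing the offset (assumed helper).
advance :: Int -> Step
advance n = Step (\b -> return b { bufLen = bufLen b + n })

-- Apply a chain of steps to an initial buffer state.
runStep :: Step -> Buf -> IO Buf
runStep (Step f) = f
```

With these definitions, runStep (advance 4 <> advance 2) (Buf 0) threads the buffer through both steps in order.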

newBuffer :: IO BB

Creates a buffer with initial capacity of ~128k.

ensureBuffer :: Int -> Push

Ensures a given free space in the buffer by doubling its capacity if necessary.
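The relationship between an unchecked write and a preceding capacity check can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it uses a ForeignPtr from base instead of a MutableByteArray, the names Buf, newBuf, ensure, unsafePutByte, putByte and peekAt are hypothetical, and it doubles repeatedly until the request fits, which may differ from what ensureBuffer actually does.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical stand-in for BB: memory, capacity, and bytes used.
data Buf = Buf
  { bufMem :: !(ForeignPtr Word8)
  , bufCap :: !Int
  , bufLen :: !Int
  }

newBuf :: Int -> IO Buf
newBuf cap = do
  mem <- mallocForeignPtrBytes cap
  return (Buf mem cap 0)

-- Guarantee at least n free bytes; double the capacity until they fit
-- and copy the bytes written so far into the new allocation.
ensure :: Int -> Buf -> IO Buf
ensure n b
  | bufLen b + n <= bufCap b = return b
  | otherwise = do
      let cap' = grow (bufCap b)
      mem' <- mallocForeignPtrBytes cap'
      withForeignPtr (bufMem b) $ \src ->
        withForeignPtr mem' $ \dst ->
          copyBytes dst src (bufLen b)
      return b { bufMem = mem', bufCap = cap' }
  where
    grow c
      | bufLen b + n <= c = c
      | otherwise         = grow (2 * max 1 c)

-- Unchecked write: the caller must have reserved space beforehand.
unsafePutByte :: Word8 -> Buf -> IO Buf
unsafePutByte w b = do
  withForeignPtr (bufMem b) $ \p -> pokeByteOff p (bufLen b) w
  return b { bufLen = bufLen b + 1 }

-- Checked write: reserve first, then use the unchecked primitive.
putByte :: Word8 -> Buf -> IO Buf
putByte w b = ensure 1 b >>= unsafePutByte w

-- Read back one byte, for inspection only.
peekAt :: Int -> Buf -> IO Word8
peekAt i b = withForeignPtr (bufMem b) $ \p -> peekByteOff p i
```

The design point is that a single ensure covering a whole record lets every field inside it be written with the unchecked variant, so the bounds check is paid once per record rather than once per field.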

unsafeSetMark :: Push

Sets a mark. This can later be filled in with a record length (used to create BAM records).

endRecord :: Push

Ends a record by filling the length into the field that was previously marked. Terrible things will happen if this wasn't preceded by a corresponding setMark.
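The mark-then-fill pattern can be sketched as below. This is a hedged illustration, not the library's code: it assumes the mark reserves a 4-byte little-endian length field (as the block_size field of a BAM record is laid out) and that the stored length excludes the field itself. Buf, newBuf, putByte, markRecord, finishRecord and contents are hypothetical names, and the buffer has a fixed capacity with no bounds checks.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word32)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical stand-in for BB: memory, bytes used, and one mark.
data Buf = Buf
  { bufMem  :: !(ForeignPtr Word8)
  , bufLen  :: !Int
  , bufMark :: !Int
  }

newBuf :: Int -> IO Buf
newBuf cap = do
  mem <- mallocForeignPtrBytes cap
  return (Buf mem 0 0)

-- Write a 32-bit little-endian value at an absolute offset.
pokeLE32 :: Buf -> Int -> Word32 -> IO ()
pokeLE32 b off w = withForeignPtr (bufMem b) $ \p ->
  mapM_ (\i -> pokeByteOff p (off + i) (fromIntegral (w `shiftR` (8 * i)) :: Word8))
        [0 .. 3]

-- Append one byte (no bounds check in this sketch).
putByte :: Word8 -> Buf -> IO Buf
putByte w b = do
  withForeignPtr (bufMem b) $ \p -> pokeByteOff p (bufLen b) w
  return b { bufLen = bufLen b + 1 }

-- Remember the current position and skip 4 bytes for the length field.
markRecord :: Buf -> IO Buf
markRecord b = return b { bufMark = bufLen b, bufLen = bufLen b + 4 }

-- Fill the reserved field with the number of bytes written after it.
finishRecord :: Buf -> IO Buf
finishRecord b = do
  pokeLE32 b (bufMark b) (fromIntegral (bufLen b - bufMark b - 4))
  return b

-- Copy out the bytes written so far, for inspection only.
contents :: Buf -> IO [Word8]
contents b = withForeignPtr (bufMem b) $ \p ->
  mapM (peekByteOff p) [0 .. bufLen b - 1]
```

Writing the length afterwards means the record body never has to be measured or serialized into a temporary buffer first.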

endRecordPart1 :: Push

Ends the first part of a record. The length is filled in *before* the mark, which is specifically done to support the *two* length fields in BCF. It also remembers the current position. Horrible things happen if this isn't preceded by *two* successive invocations of setMark.

endRecordPart2 :: Push

Ends the second part of a record. The length is filled in at the mark, but computed from the second mark only. This is specifically done to support the *two* length fields in BCF. Horrible things happen if this isn't preceded by *two* successive invocations of setMark and one of endRecordPart1.
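The two-part scheme can be sketched as follows, under the assumption that each mark reserves a 4-byte little-endian field, so that two successive marks leave two adjacent length fields (as in a BCF record, which starts with the lengths of its shared and individual parts). This is an illustration of the described behaviour, not the library's code. Buf, newBuf, putByte, markRecord, finishPart1, finishPart2 and contents are hypothetical names, and there are no bounds checks.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word32)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical stand-in for BB: memory, bytes used, and two marks.
data Buf = Buf
  { bufMem   :: !(ForeignPtr Word8)
  , bufLen   :: !Int
  , bufMark  :: !Int
  , bufMark2 :: !Int
  }

newBuf :: Int -> IO Buf
newBuf cap = do
  mem <- mallocForeignPtrBytes cap
  return (Buf mem 0 0 0)

-- Write a 32-bit little-endian value at an absolute offset.
pokeLE32 :: Buf -> Int -> Word32 -> IO ()
pokeLE32 b off w = withForeignPtr (bufMem b) $ \p ->
  mapM_ (\i -> pokeByteOff p (off + i) (fromIntegral (w `shiftR` (8 * i)) :: Word8))
        [0 .. 3]

-- Append one byte (no bounds check in this sketch).
putByte :: Word8 -> Buf -> IO Buf
putByte w b = do
  withForeignPtr (bufMem b) $ \p -> pokeByteOff p (bufLen b) w
  return b { bufLen = bufLen b + 1 }

-- Remember the current position and skip 4 bytes for a length field.
markRecord :: Buf -> IO Buf
markRecord b = return b { bufMark = bufLen b, bufLen = bufLen b + 4 }

-- First part: store its length in the field *before* the mark and
-- remember where the second part begins.
finishPart1 :: Buf -> IO Buf
finishPart1 b = do
  pokeLE32 b (bufMark b - 4) (fromIntegral (bufLen b - bufMark b - 4))
  return b { bufMark2 = bufLen b }

-- Second part: store, at the mark, the length counted from the
-- remembered position only.
finishPart2 :: Buf -> IO Buf
finishPart2 b = do
  pokeLE32 b (bufMark b) (fromIntegral (bufLen b - bufMark2 b))
  return b

-- Copy out the bytes written so far, for inspection only.
contents :: Buf -> IO [Word8]
contents b = withForeignPtr (bufMem b) $ \p ->
  mapM (peekByteOff p) [0 .. bufLen b - 1]
```

In this sketch, marking twice, writing two bytes, finishing part one, writing three more bytes and finishing part two leaves a first length field of 2 and a second of 3 ahead of the five payload bytes.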