Safe Haskell | None |
---|---|
Language | Haskell2010 |
Buffer builder to assemble Bgzf blocks. The plan is to serialize records (BAM and BCF) into a buffer, then produce Bgzf chunks from the buffer. We use a large buffer and always make sure there is plenty of free space in it, which avoids redundant bounds checks. Whenever a block is ready to be compressed, we put it into an MVar. When we run out of space, we simply switch to a new buffer. Multiple threads take pieces from the MVar, compress them, and pass them downstream through another MVar. A final thread restores the order and writes the blocks.
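The threading scheme described above can be sketched with nothing but `base`. This is an illustration under stated assumptions, not the module's actual code: `runPipeline` and `fakeCompress` are hypothetical names, upper-casing a `String` stands in for real compression, pieces are tagged with a sequence number so the collector can restore the order, and at least one worker is assumed.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.Char (toUpper)

-- | Stand-in for real Bgzf compression (an assumption for illustration).
fakeCompress :: String -> String
fakeCompress = map toUpper

-- | Run numbered pieces through @nWorkers@ threads (nWorkers >= 1) and
-- return the results in their original order.
runPipeline :: Int -> [String] -> IO [String]
runPipeline nWorkers pieces = do
    inbox  <- newEmptyMVar   -- producer -> workers
    outbox <- newEmptyMVar   -- workers  -> collector
    let total = length pieces

    -- Worker: take a numbered piece, compress it, pass it on.
    -- A 'Nothing' job tells the worker to stop.
    let worker = do
            job <- takeMVar inbox
            case job of
                Nothing         -> return ()
                Just (n, piece) -> do
                    let out = fakeCompress piece
                    -- force the work to happen in the worker thread
                    length out `seq` putMVar outbox (n, out)
                    worker
    replicateM_ nWorkers (forkIO worker)

    -- Producer: hand out pieces tagged with their sequence number,
    -- then send one stop signal per worker.
    _ <- forkIO $ do
        forM_ (zip [0 :: Int ..] pieces) (putMVar inbox . Just)
        replicateM_ nWorkers (putMVar inbox Nothing)

    -- Collector: park results that arrive early, emit them in order.
    let collect next pending
            | next == total = return []
            | otherwise = case lookup next pending of
                Just out -> do
                    rest <- collect (next + 1)
                                    (filter ((/= next) . fst) pending)
                    return (out : rest)
                Nothing -> do
                    r <- takeMVar outbox
                    collect next (r : pending)
    collect 0 []
```

For example, `runPipeline 3 ["a", "b", "c", "d"]` yields the four upper-cased strings in input order regardless of which worker finishes first. A real writer would stream each block out as soon as it is next in line instead of collecting a list.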
- data BB = BB {}
- newBuffer :: Int -> IO BB
- fillBuffer :: BB -> BgzfTokens -> IO (BB, BgzfTokens)
- expandBuffer :: Int -> BB -> IO BB
- encodeBgzf :: MonadIO m => Int -> Enumeratee (Endo BgzfTokens) ByteString m b
- data BgzfTokens
- = TkWord32 !Word32 BgzfTokens
- | TkWord16 !Word16 BgzfTokens
- | TkWord8 !Word8 BgzfTokens
- | TkFloat !Float BgzfTokens
- | TkDouble !Double BgzfTokens
- | TkString !ByteString BgzfTokens
- | TkDecimal !Int BgzfTokens
- | TkLnString !ByteString BgzfTokens
- | TkSetMark BgzfTokens
- | TkEndRecord BgzfTokens
- | TkEndRecordPart1 BgzfTokens
- | TkEndRecordPart2 BgzfTokens
- | TkEnd
- | TkBclSpecial !BclArgs BgzfTokens
- | TkLowLevel !Int (BB -> IO BB) BgzfTokens
- data BclArgs = BclArgs BclSpecialType !(Vector Word8) !Int !Int !Int !Int
- data BclSpecialType
- int_loop :: Ptr Word8 -> Int -> IO Int
- loop_bcl_special :: Ptr Word8 -> BclArgs -> IO Int
Documentation
We manage a large buffer (multiple megabytes), of which we fill an
initial portion. We remember the size, the used part, and two marks
where we later fill in sizes for the length-prefixed BAM or BCF
records. We move the buffer down when we yield a piece downstream,
and when we run out of space, we simply move to a new buffer.
Garbage collection should take care of the rest. An unused mark
must be set to (maxBound::Int) so it doesn't interfere with flushing.
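To make the role of the marks concrete, here is a minimal sketch of such a buffer. The fields of `BB` are not exported, so the record `Buf` and every name in it are hypothetical, and the flushing rule (only bytes before the earliest mark may leave the buffer) is one plausible reading of the description, not confirmed behaviour. It does show why an unused mark is set to `maxBound`: it then never constrains the minimum.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- | Hypothetical stand-in for BB; the real fields are not exported.
data Buf = Buf
    { bufData  :: ForeignPtr Word8  -- underlying memory
    , bufSize  :: Int               -- total capacity in bytes
    , bufUsed  :: Int               -- bytes filled so far
    , bufMark1 :: Int               -- offset of a pending length prefix
    , bufMark2 :: Int               -- second mark, unused in this sketch
    }

-- | Fresh buffer; both marks start out unused (maxBound).
newBuf :: Int -> IO Buf
newBuf n = do
    fp <- mallocForeignPtrBytes n
    return (Buf fp n 0 maxBound maxBound)

-- | Bytes that could be flushed (assumed rule): everything before the
-- earliest pending mark. Unused marks are maxBound and drop out.
flushable :: Buf -> Int
flushable b = minimum [bufUsed b, bufMark1 b, bufMark2 b]

-- | Remember the current position and reserve 4 bytes for a length.
setMark :: Buf -> Buf
setMark b = b { bufMark1 = bufUsed b, bufUsed = bufUsed b + 4 }

-- | Write the record length (little-endian, as BAM requires) into the
-- reserved bytes and clear the mark.
endRecord :: Buf -> IO Buf
endRecord b = do
    let start = bufMark1 b
        len   = bufUsed b - start - 4
    withForeignPtr (bufData b) $ \p ->
        mapM_ (\i -> pokeByteOff p (start + i)
                         (fromIntegral (len `shiftR` (8 * i)) :: Word8))
              [0 .. 3]
    return b { bufMark1 = maxBound }

-- | Read one byte back, for inspection.
peekByte :: Buf -> Int -> IO Word8
peekByte b i = withForeignPtr (bufData b) $ \p -> peekByteOff p i
```

While a record is open, `flushable` stays at the mark, so a half-written record is never sent downstream; once `endRecord` clears the mark, the whole record becomes flushable.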
fillBuffer :: BB -> BgzfTokens -> IO (BB, BgzfTokens)
expandBuffer :: Int -> BB -> IO BB
Creates a new buffer, copying the content from an old one, with higher capacity.
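A copy-and-grow step of this kind might look like the sketch below. The record `Buf`, the function `growBuf`, and the helpers are hypothetical; in particular, treating the `Int` argument as the minimum amount of free space wanted and doubling the capacity are assumptions, since the page does not say what `expandBuffer`'s argument means.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Marshal.Utils (copyBytes)

-- | Hypothetical minimal buffer: memory, capacity, bytes in use.
data Buf = Buf
    { bufData :: ForeignPtr Word8
    , bufSize :: Int
    , bufUsed :: Int
    }

-- | Allocate a larger buffer and copy the used prefix of the old one.
-- The old memory is left to the garbage collector.
growBuf :: Int -> Buf -> IO Buf
growBuf minFree old = do
    -- assumed policy: at least double, and leave minFree bytes free
    let newSize = max (2 * bufSize old) (bufUsed old + minFree)
    new <- mallocForeignPtrBytes newSize
    withForeignPtr new $ \dst ->
        withForeignPtr (bufData old) $ \src ->
            copyBytes dst src (bufUsed old)
    return old { bufData = new, bufSize = newSize }

-- | Helper for experiments: a full buffer holding exactly these bytes.
bufFromList :: [Word8] -> IO Buf
bufFromList ws = do
    let n = max 1 (length ws)
    fp <- mallocForeignPtrBytes n
    withForeignPtr fp $ \p -> pokeArray p ws
    return (Buf fp n (length ws))

-- | Helper for experiments: read back the used part.
bufToList :: Buf -> IO [Word8]
bufToList b = withForeignPtr (bufData b) $ \p -> peekArray (bufUsed b) p
```

Only the used prefix is copied, so the cost of growing is proportional to the data actually held, not to the capacity.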
encodeBgzf :: MonadIO m => Int -> Enumeratee (Endo BgzfTokens) ByteString m b
Expand a chain of tokens into a buffer, sending finished pieces downstream as soon as possible.
data BgzfTokens
Things we are able to encode. Taking inspiration from binary-serialise-cbor, we define these as a lazy list-like thing and consume it in an interpreter.
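The token-stream idea can be illustrated with a cut-down copy of the type. This is a sketch, not the module's interpreter: `Tokens` keeps only three of the constructors, `Builder`, `word8`, `word16`, `word32`, `interpret`, and `encode` are hypothetical names, and the output is a plain list of bytes instead of a mutable buffer. Little-endian byte order is used because BAM is little-endian.

```haskell
import Data.Bits (Bits, shiftR)
import Data.Monoid (Endo (..))
import Data.Word (Word16, Word32, Word8)

-- | Cut-down token stream: each constructor carries one value and the
-- rest of the stream, so the whole thing behaves like a lazy list.
data Tokens
    = TkWord32 !Word32 Tokens
    | TkWord16 !Word16 Tokens
    | TkWord8  !Word8  Tokens
    | TkEnd

-- | Builders are functions that prepend tokens; 'Endo' makes them a
-- monoid, which matches the 'Endo BgzfTokens' input of 'encodeBgzf'.
type Builder = Endo Tokens

word32 :: Word32 -> Builder
word32 = Endo . TkWord32

word16 :: Word16 -> Builder
word16 = Endo . TkWord16

word8 :: Word8 -> Builder
word8 = Endo . TkWord8

-- | The lowest n bytes of a value, least significant first.
littleEndian :: (Integral a, Bits a) => Int -> a -> [Word8]
littleEndian n w = [ fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. n - 1] ]

-- | The interpreter: walk the stream and emit bytes for each token.
interpret :: Tokens -> [Word8]
interpret (TkWord32 w k) = littleEndian 4 w ++ interpret k
interpret (TkWord16 w k) = littleEndian 2 w ++ interpret k
interpret (TkWord8  w k) = w : interpret k
interpret TkEnd          = []

-- | Close a builder with 'TkEnd' and interpret the result.
encode :: Builder -> [Word8]
encode b = interpret (appEndo b TkEnd)
```

With these definitions, `encode (word8 1 <> word16 0x0302 <> word32 0x07060504)` evaluates to `[1,2,3,4,5,6,7]`. Separating description (tokens) from execution (interpreter) keeps the serializers pure while the interpreter alone handles buffer space and flushing.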
data BclSpecialType