Safe Haskell | None |
---|---|
Language | Haskell2010 |
Parsers for BAM and SAM.
TONOTDO:
- Reader for gzipped/bzipped/bgzf'ed SAM. Storing SAM is a bad idea, so why would anyone ever want to compress, much less index it?
- Proper support for the "=" symbol. It's completely alien to the usual representation of sequences.
Synopsis
- data Block = Block {}
- decompressBgzfBlocks :: MonadIO m => Enumeratee Bytes Block m a
- decompressBgzf :: MonadIO m => Enumeratee Bytes Bytes m a
- compressBgzf :: MonadIO m => Enumeratee BgzfChunk Bytes m a
- decodeBam :: Monad m => (BamMeta -> Iteratee [BamRaw] m a) -> Iteratee Block m (Iteratee [BamRaw] m a)
- getBamRaw :: Monad m => Iteratee Block m [BamRaw]
- decodeAnyBam :: MonadIO m => BamrawEnumeratee m a
- decodeAnyBamFile :: (MonadIO m, MonadMask m) => FilePath -> (BamMeta -> Iteratee [BamRaw] m a) -> m (Iteratee [BamRaw] m a)
- type BamrawEnumeratee m b = Enumeratee' BamMeta Bytes [BamRaw] m b
- type BamEnumeratee m b = Enumeratee' BamMeta Bytes [BamRec] m b
- isBamOrSam :: MonadIO m => Iteratee Bytes m (BamEnumeratee m a)
- isBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a))
- isPlainBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a))
- isGzipBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a))
- isBgzfBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a))
- decodeSam :: Monad m => (BamMeta -> Iteratee [BamRec] m a) -> Iteratee Bytes m (Iteratee [BamRec] m a)
- decodeSam' :: Monad m => Refs -> Enumeratee Bytes [BamRec] m a
- decodeAnyBamOrSam :: MonadIO m => BamEnumeratee m a
- decodeAnyBamOrSamFile :: (MonadIO m, MonadMask m) => FilePath -> (BamMeta -> Iteratee [BamRec] m a) -> m (Iteratee [BamRec] m a)
- concatInputs :: (MonadIO m, MonadMask m) => [FilePath] -> Enumerator' BamMeta [BamRaw] m a
- concatDefaultInputs :: (MonadIO m, MonadMask m) => Enumerator' BamMeta [BamRaw] m a
- mergeInputs :: (MonadIO m, MonadMask m) => (BamMeta -> Enumeratee [BamRaw] [BamRaw] (Iteratee [BamRaw] m) a) -> [FilePath] -> Enumerator' BamMeta [BamRaw] m a
- mergeDefaultInputs :: (MonadIO m, MonadMask m) => (BamMeta -> Enumeratee [BamRaw] [BamRaw] (Iteratee [BamRaw] m) a) -> Enumerator' BamMeta [BamRaw] m a
- combineCoordinates :: Monad m => BamMeta -> Enumeratee [BamRaw] [BamRaw] (Iteratee [BamRaw] m) a
- combineNames :: Monad m => BamMeta -> Enumeratee [BamRaw] [BamRaw] (Iteratee [BamRaw] m) a
Documentation
One BGZF block: virtual offset and contents. Could also be a block of an uncompressed file, if we want to support indexing of uncompressed BAM or some silliness like that.
Constructors: Block
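The "virtual offset" mentioned above can be illustrated with a small sketch. It assumes the layout given in the SAM/BAM specification: the upper 48 bits hold the file offset of the start of the compressed block, and the lower 16 bits hold the position inside the uncompressed block. The names `mkVirtualOffset` and `splitVirtualOffset` are hypothetical and are not part of this module.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

-- | Pack the file offset of a compressed block and a position within
-- its uncompressed contents into one virtual offset.
-- Assumed layout (SAM/BAM spec): upper 48 bits, lower 16 bits.
mkVirtualOffset :: Word64 -> Word64 -> Word64
mkVirtualOffset blockStart within =
  (blockStart `shiftL` 16) .|. (within .&. 0xFFFF)

-- | Inverse of 'mkVirtualOffset': recover (block start, position within block).
splitVirtualOffset :: Word64 -> (Word64, Word64)
splitVirtualOffset v = (v `shiftR` 16, v .&. 0xFFFF)
```

Loaded in GHCi, `splitVirtualOffset (mkVirtualOffset 1000 42)` gives back the pair it was built from. Because the block start sits in the high bits, virtual offsets sort in the same order as positions in the decompressed stream.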
decompressBgzfBlocks :: MonadIO m => Enumeratee Bytes Block m a
decompressBgzf :: MonadIO m => Enumeratee Bytes Bytes m a
Decompress a BGZF stream into a stream of Bytes.
compressBgzf :: MonadIO m => Enumeratee BgzfChunk Bytes m a
Like compressBgzf', with sensible defaults.
decodeBam :: Monad m => (BamMeta -> Iteratee [BamRaw] m a) -> Iteratee Block m (Iteratee [BamRaw] m a)
Decode a BAM stream into raw entries. Note that the entries can be unpacked using decodeBamEntry. Also note that this is an Enumeratee in spirit, only the BamMeta and Refs need to get passed separately.
decodeAnyBam :: MonadIO m => BamrawEnumeratee m a
Checks if a file contains BAM in any of the common forms, then decompresses it appropriately. We support plain BAM, Bgzf'd BAM, and Gzip'ed BAM.
The recommendation for these functions is to use decodeAnyBam (or decodeAnyBamFile) for any code that can handle BamRaw input, but decodeAnyBamOrSam (or decodeAnyBamOrSamFile) for code that needs BamRec. That way, SAM is supported automatically, and seeking will be supported if possible.
decodeAnyBamFile :: (MonadIO m, MonadMask m) => FilePath -> (BamMeta -> Iteratee [BamRaw] m a) -> m (Iteratee [BamRaw] m a)
type BamrawEnumeratee m b = Enumeratee' BamMeta Bytes [BamRaw] m b
type BamEnumeratee m b = Enumeratee' BamMeta Bytes [BamRec] m b
isBamOrSam :: MonadIO m => Iteratee Bytes m (BamEnumeratee m a)
isBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a))
Tests if a data stream is a Bam file.
Recognizes plain Bam, gzipped Bam and bgzf'd Bam. If a file is recognized as Bam, a decoder (suitable Enumeratee) for it is returned. This uses iLookAhead internally, so it shouldn't consume anything from the stream.
isPlainBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a))
Tests if a data stream is a Bam file.
Recognizes plain Bam, gzipped Bam and bgzf'd Bam. If a file is recognized as Bam, a decoder (suitable Enumeratee) for it is returned. This uses iLookAhead internally, so it shouldn't consume anything from the stream.
isGzipBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a))
Tests if a data stream is a Bam file.
Recognizes plain Bam, gzipped Bam and bgzf'd Bam. If a file is recognized as Bam, a decoder (suitable Enumeratee) for it is returned. This uses iLookAhead internally, so it shouldn't consume anything from the stream.
isBgzfBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a))
Tests if a data stream is a Bam file.
Recognizes plain Bam, gzipped Bam and bgzf'd Bam. If a file is recognized as Bam, a decoder (suitable Enumeratee) for it is returned. This uses iLookAhead internally, so it shouldn't consume anything from the stream.
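As a rough illustration of what such a look-ahead test might inspect, the sketch below classifies the leading bytes of a stream using only the bytestring package. It is not this module's implementation. It assumes the magic numbers from the SAM/BAM specification and RFC 1952 (`BAM\1` for plain BAM, `1F 8B` for gzip, and a gzip header with FLG=4 whose extra field starts with a `BC` subfield of length 2 for BGZF). The names `Container`, `sniff` and `isBgzf` are hypothetical.

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)

-- | Hypothetical classification of the outer container only.
data Container = PlainBam | GzipStream | BgzfStream
  deriving (Eq, Show)

-- | Classify a stream by its first bytes. The input is only inspected,
-- which mirrors the "look ahead, consume nothing" behaviour described above.
sniff :: ByteString -> Maybe Container
sniff bs
  | B.pack [0x42, 0x41, 0x4D, 0x01] `B.isPrefixOf` bs = Just PlainBam   -- "BAM\1"
  | isBgzf bs                                         = Just BgzfStream -- check before plain gzip
  | B.pack [0x1F, 0x8B] `B.isPrefixOf` bs             = Just GzipStream
  | otherwise                                         = Nothing

-- | gzip header layout: ID1 ID2 CM FLG, MTIME (4 bytes), XFL, OS, XLEN (2 bytes),
-- then the extra field. Assumption: the BC subfield comes first in the extra field.
isBgzf :: ByteString -> Bool
isBgzf bs =
  B.length bs >= 16
    && B.take 4 bs == B.pack [0x1F, 0x8B, 0x08, 0x04]
    && B.take 4 (B.drop 12 bs) == B.pack [0x42, 0x43, 0x02, 0x00]
```

Note that for the two compressed cases this only identifies the container. Confirming that the payload is really BAM would require decompressing the first block and checking for the `BAM\1` magic inside it.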
decodeSam :: Monad m => (BamMeta -> Iteratee [BamRec] m a) -> Iteratee Bytes m (Iteratee [BamRec] m a)
Iteratee-style parser for SAM files, designed to be compatible with the BAM parsers. Parses plain uncompressed SAM, nothing else. Since it is supposed to work the same way as the BAM parser, it requires the presence of the SQ header lines. These are stripped from the header text and turned into the symbol table.
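The step of stripping SQ lines from the header and turning them into a symbol table could look roughly like the sketch below. It is a simplified stand-in, not the parser used here. It assumes tab-separated `@SQ` lines carrying `SN:` (name) and `LN:` (length) tags as in the SAM specification, and the names `RefSeq`, `extractRefs`, `parseSq` and `lookupTag` are hypothetical.

```haskell
import qualified Data.ByteString.Char8 as C
import Data.ByteString.Char8 (ByteString)
import Data.List (partition)
import Data.Maybe (mapMaybe)

-- | One entry of the symbol table: reference name and length (hypothetical type).
data RefSeq = RefSeq { refName :: ByteString, refLength :: Int }
  deriving (Eq, Show)

-- | Split header text into the reference table and the remaining header lines.
-- The order of @SQ lines is preserved. Malformed @SQ lines are dropped silently.
extractRefs :: ByteString -> ([RefSeq], [ByteString])
extractRefs hdr = (mapMaybe parseSq sqLines, rest)
  where
    (sqLines, rest) = partition isSq (C.lines hdr)
    isSq = C.isPrefixOf (C.pack "@SQ\t")

-- | Parse one @SQ line; both SN and LN must be present and LN must be an integer.
parseSq :: ByteString -> Maybe RefSeq
parseSq l = do
  let fields = drop 1 (C.split '\t' l)
  name <- lookupTag "SN" fields
  lenB <- lookupTag "LN" fields
  (len, trailing) <- C.readInt lenB
  if C.null trailing then Just (RefSeq name len) else Nothing

-- | Find the value of a two-letter tag such as "SN" among "TAG:value" fields.
lookupTag :: String -> [ByteString] -> Maybe ByteString
lookupTag tag fields =
  case [C.drop 3 f | f <- fields, C.pack (tag ++ ":") `C.isPrefixOf` f] of
    (v : _) -> Just v
    []      -> Nothing
```

Keeping the references in header order matters because BAM records refer to a reference by its index in this table rather than by name.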
decodeSam' :: Monad m => Refs -> Enumeratee Bytes [BamRec] m a
Parser for SAM that doesn't look for a header. Has the advantage that it doesn't stall on a pipe that never delivers data. Has the disadvantage that it never reads the header and therefore needs a list of allowed RNAMEs.
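To show why a header-less parser needs the list of allowed RNAMEs, here is a minimal sketch that resolves the RNAME column of one SAM alignment line against a caller-supplied list. It assumes the standard SAM layout of at least 11 tab-separated mandatory fields with RNAME in the third column and `*` meaning "no reference". The function `resolveRname` is hypothetical, and a plain list of names stands in for the Refs type.

```haskell
import qualified Data.ByteString.Char8 as C
import Data.ByteString.Char8 (ByteString)
import Data.List (elemIndex)

-- | Look up the RNAME of one SAM line in the list of allowed reference names.
-- Returns the index of the reference, Nothing for "*" (no reference),
-- or an error message for short lines and unknown names.
resolveRname :: [ByteString] -> ByteString -> Either String (Maybe Int)
resolveRname refs line =
  case C.split '\t' line of
    fields
      | length fields < 11 -> Left "fewer than 11 mandatory fields"
    (_qname : _flag : rname : _)
      | rname == C.pack "*" -> Right Nothing
      | otherwise ->
          maybe (Left ("unknown RNAME: " ++ C.unpack rname))
                (Right . Just)
                (elemIndex rname refs)
    _ -> Left "malformed line"
```

Without a header there is nothing else to check names against, so any RNAME outside the supplied list has to be rejected or handled by the caller.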
decodeAnyBamOrSam :: MonadIO m => BamEnumeratee m a
Checks if a file contains BAM in any of the common forms, then decompresses it appropriately. If the stream doesn't contain BAM at all, it is instead decoded as SAM. Since SAM is next to impossible to recognize reliably, we don't even try. Any old junk is decoded as SAM and will fail later.
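The "try every BAM detector, otherwise assume SAM" policy can be sketched in isolation as below. This is a toy model over String input, not the Iteratee-based code of this module. The `Decoder` type, the two detectors and `chooseDecoder` are hypothetical and exist only to show the shape of the fallback.

```haskell
import Data.Foldable (asum)
import Data.Maybe (fromMaybe)

-- | Run each detector on the same input; the first Just wins,
-- and the fallback is used when every detector declines.
firstMatchOr :: b -> [a -> Maybe b] -> a -> b
firstMatchOr fallback detectors input =
  fromMaybe fallback (asum (map ($ input) detectors))

-- | Toy stand-in for "which decoder should handle this stream".
data Decoder = BamDecoder String | SamDecoder
  deriving (Eq, Show)

-- Toy detectors that look only at leading magic bytes.
plainBam, gzipped :: String -> Maybe Decoder
plainBam s = if take 4 s == "BAM\1" then Just (BamDecoder "plain") else Nothing
gzipped s = if take 2 s == "\31\139" then Just (BamDecoder "gzip") else Nothing

-- | SAM is never tested for: it is simply what remains.
chooseDecoder :: String -> Decoder
chooseDecoder = firstMatchOr SamDecoder [plainBam, gzipped]
```

With this shape, arbitrary input such as `chooseDecoder "random junk"` still selects the SAM decoder, which is why malformed input is only reported later, when parsing actually fails.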
decodeAnyBamOrSamFile :: (MonadIO m, MonadMask m) => FilePath -> (BamMeta -> Iteratee [BamRec] m a) -> m (Iteratee [BamRec] m a)
concatInputs :: (MonadIO m, MonadMask m) => [FilePath] -> Enumerator' BamMeta [BamRaw] m a
concatDefaultInputs :: (MonadIO m, MonadMask m) => Enumerator' BamMeta [BamRaw] m a
mergeInputs :: (MonadIO m, MonadMask m) => (BamMeta -> Enumeratee [BamRaw] [BamRaw] (Iteratee [BamRaw] m) a) -> [FilePath] -> Enumerator' BamMeta [BamRaw] m a
mergeDefaultInputs :: (MonadIO m, MonadMask m) => (BamMeta -> Enumeratee [BamRaw] [BamRaw] (Iteratee [BamRaw] m) a) -> Enumerator' BamMeta [BamRaw] m a
combineCoordinates :: Monad m => BamMeta -> Enumeratee [BamRaw] [BamRaw] (Iteratee [BamRaw] m) a
combineNames :: Monad m => BamMeta -> Enumeratee [BamRaw] [BamRaw] (Iteratee [BamRaw] m) a