biohazard-2.1: bioinformatics support library

Safe Haskell: None
Language: Haskell2010

Bio.Bam.Reader

Description

Parsers for BAM and SAM.

Documentation

decodeBam :: (MonadIO m, MonadLog m) => ByteStream m r -> m (BamMeta, Stream (Of BamRaw) m r)

Decodes either BAM or SAM.

The input may be uncompressed, gzip-compressed, or BGZF-compressed, and may contain either BAM or SAM. BAM is recognized reliably; anything else is treated as SAM. The offsets stored in BAM records are meaningful only for uncompressed or BGZF-compressed BAM.
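The detection order described above can be illustrated with a small sketch. It is not the code used by decodeBam; the names Compression, Format, compression and payloadFormat are hypothetical, and the input is modelled as a plain list of bytes rather than a ByteStream. It relies only on the published magic numbers: gzip members start with 0x1f 0x8b, BGZF blocks are gzip members with the FEXTRA flag set and a "BC" extra subfield (assumed here to come first, as is usual), and decompressed BAM starts with "BAM\1".

```haskell
import Data.Bits (testBit)
import Data.Word (Word8)

-- | How the raw input is wrapped (hypothetical type for this sketch).
data Compression = Bgzf | Gzip | Uncompressed
  deriving (Eq, Show)

-- | What the (decompressed) payload contains (hypothetical type).
data Format = Bam | Sam
  deriving (Eq, Show)

-- | Classify the raw input by its leading bytes.
compression :: [Word8] -> Compression
compression bytes
  | isGzip && isBgzf = Bgzf
  | isGzip           = Gzip
  | otherwise        = Uncompressed
  where
    -- Every gzip member starts with the two magic bytes 0x1f 0x8b.
    isGzip = take 2 bytes == [0x1f, 0x8b]
    -- Byte 3 holds the gzip flags; bit 2 is FEXTRA.
    hasExtra = case drop 3 bytes of
      (flg : _) -> testBit flg 2
      []        -> False
    -- Assumption: the 'B','C' subfield is the first extra subfield (offset 12).
    isBgzf = hasExtra && take 2 (drop 12 bytes) == [66, 67]

-- | Classify the decompressed payload: BAM has a magic number,
-- everything else is treated as SAM.
payloadFormat :: [Word8] -> Format
payloadFormat bytes
  | take 4 bytes == [66, 65, 77, 1] = Bam  -- "BAM\1"
  | otherwise                       = Sam
```

Because SAM has no magic number, it can only be the fallback case, which is why anything not recognized as BAM ends up being parsed as SAM.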

decodeBamFile :: (MonadIO m, MonadLog m, MonadMask m) => FilePath -> (BamMeta -> Stream (Of BamRaw) m () -> m r) -> m r

decodeBamFiles :: (MonadMask m, MonadLog m, MonadIO m) => [FilePath] -> ([(BamMeta, Stream (Of BamRaw) m ())] -> m r) -> m r

Reads multiple BAM files.

A continuation is run on the list of headers and streams. Since no attempt is made to unify the headers, this works even for completely unrelated BAM files. All files are opened at the same time, so a very large number of inputs can exceed the file descriptor limit.
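The "open everything, then run one continuation on the whole list" pattern can be sketched with plain file handles. This is only an illustration of the control flow, not the library's implementation; withFiles and firstLines are hypothetical names, and System.IO handles stand in for the (BamMeta, Stream) pairs.

```haskell
import System.IO (Handle, IOMode (ReadMode), hGetLine, withFile)

-- | Open all files, then pass the handles to the continuation in
-- input order. Each file stays open until the continuation returns,
-- so n inputs occupy n file descriptors at the same time.
withFiles :: [FilePath] -> ([Handle] -> IO r) -> IO r
withFiles []       k = k []
withFiles (f : fs) k =
  withFile f ReadMode $ \h -> withFiles fs (\hs -> k (h : hs))

-- | Example continuation: read the first line of every file.
firstLines :: [FilePath] -> IO [String]
firstLines fs = withFiles fs (mapM hGetLine)
```

The nesting of withFile calls is what guarantees that every handle is closed again, even if the continuation throws; it is also the reason the file descriptor limit becomes relevant for long input lists.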

decodePlainSam :: (MonadLog m, MonadIO m) => ByteStream m r -> m (BamMeta, Stream (Of BamRaw) m r)

Streaming parser for SAM files.

It parses plain, uncompressed SAM and returns a result compatible with decodePlainBam. Because it is meant to behave like the BAM parser, it needs a symbol table for the reference names, which it extracts from the @SQ lines in the header. Note that reading SAM tends to be inefficient; if you care about performance at all, use BAM.
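To make the role of the symbol table concrete, here is a minimal sketch of how one could be built from header lines. It is not the parser used by decodePlainSam; splitTabs, sqName, symbolTable and refIndex are hypothetical helpers operating on String. The sketch assumes what the SAM and BAM formats specify: each @SQ line carries the reference name in its SN tag, and BAM records refer to references by their zero-based position in the header.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)

-- | Split a header line into its tab-separated fields.
splitTabs :: String -> [String]
splitTabs s = case break (== '\t') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitTabs rest

-- | The reference name of an @SQ line, or Nothing for any other
-- line or for an @SQ line without an SN tag.
sqName :: String -> Maybe String
sqName line = case splitTabs line of
  "@SQ" : fields ->
    case [drop 3 f | f <- fields, "SN:" `isPrefixOf` f] of
      (name : _) -> Just name
      []         -> Nothing
  _ -> Nothing

-- | Symbol table: reference name paired with its zero-based index,
-- in the order the @SQ lines appear in the header.
symbolTable :: [String] -> [(String, Int)]
symbolTable headerLines = zip (mapMaybe sqName headerLines) [0 ..]

-- | Look up the index for the reference name of a SAM record.
-- Names that are not in the table (including "*") give Nothing.
refIndex :: [(String, Int)] -> String -> Maybe Int
refIndex table name = lookup name table
```

A lookup per record is one of the costs a SAM reader pays that a BAM reader does not, since BAM stores the numeric index directly.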

concatInputs :: (MonadIO m, MonadLog m, MonadMask m) => [FilePath] -> (BamMeta -> Stream (Of BamRaw) m () -> m r) -> m r

Reads multiple BAM inputs in sequence.

Only one file is opened at a time, so the inputs must also be consumed in sequence. If you can afford to open all inputs simultaneously, you probably want to use mergeInputsOn instead. The filename "-" refers to stdin; if no filenames are given, stdin is read. Since we can't look ahead into further files, the header of the first input is used for the result, and an exception is thrown if one of the subsequent headers is incompatible with the first one.
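The filename conventions can be summarized in a few lines. This is a sketch of the rule as stated above, not code taken from the library; Input and resolveInputs are hypothetical names.

```haskell
-- | Where an input comes from (hypothetical type for this sketch).
data Input = Stdin | File FilePath
  deriving (Eq, Show)

-- | Apply the conventions: an empty list means "read stdin",
-- and the name "-" also stands for stdin.
resolveInputs :: [FilePath] -> [Input]
resolveInputs [] = [Stdin]
resolveInputs fs = map toInput fs
  where
    toInput "-" = Stdin
    toInput f   = File f
```

This makes a command line tool built on concatInputs usable both with explicit file arguments and at the end of a shell pipeline.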

mergeInputsOn :: (Ord x, MonadIO m, MonadLog m, MonadMask m) => (BamRaw -> x) -> [FilePath] -> (BamMeta -> Stream (Of BamRaw) m () -> m r) -> m r

Reads multiple BAM files and merges them.

If all inputs are sorted by the key being merged on, the output will be sorted, too. The headers are all merged sensibly, even if their reference lists differ. However, for performance reasons, we don't want to change the rname and mrnm fields in potentially all records. So instead of allowing arbitrary reference lists to be merged, we throw an exception unless every input is compatible with the effective reference list.
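The sortedness guarantee follows from the usual merge of sorted sequences. The sketch below shows the idea on ordinary lists; it is not the streaming implementation behind mergeInputsOn, and merge2On and mergeAllOn are hypothetical names. The key function plays the role of the (BamRaw -> x) argument.

```haskell
-- | Merge two lists that are each sorted by 'key'. On equal keys the
-- element from the left list comes first.
merge2On :: Ord x => (a -> x) -> [a] -> [a] -> [a]
merge2On _ [] ys = ys
merge2On _ xs [] = xs
merge2On key (x : xs) (y : ys)
  | key x <= key y = x : merge2On key xs (y : ys)
  | otherwise      = y : merge2On key (x : xs) ys

-- | Merge any number of lists. If every input is sorted by 'key',
-- so is the result; unsorted inputs give an unsorted result.
mergeAllOn :: Ord x => (a -> x) -> [[a]] -> [a]
mergeAllOn key = foldr (merge2On key) []
```

Only the head of each input is ever compared, which is why the records themselves never need to be rewritten or buffered, and why all inputs have to be open at the same time.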