biohazard-0.6.15: bioinformatics support library

Safe Haskell: None
Language: Haskell2010

Bio.Bam.Index

Documentation

data BamIndex a Source #

Full index, unifying BAI and CSI style. Both use the same binning scheme; its parameters are fixed in BAI but variable in CSI. Checkpoints are created from the linear index in BAI or from the loffset field in CSI.

Constructors

BamIndex 

Fields

  • minshift :: !Int

    Minshift parameter from CSI

  • depth :: !Int

    Depth parameter from CSI

  • unaln_off :: !Int64

    Best guess at where the unaligned records start

  • extensions :: a

    Room for stuff (needed for tabix)

  • refseq_bins :: !(Vector Bins)

    Records for the binning index, where each bin has a list of segments belonging to it.

  • refseq_ckpoints :: !(Vector Ckpoints)

    Known checkpoints of the form (pos,off) where off is the virtual offset of the first record crossing pos.
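
To illustrate how the minshift and depth parameters drive the binning scheme, here is a minimal sketch of the bin computation, transcribed from the reg2bin routine in the CSI specification. The name reg2bin and its argument order are choices made for this example; they are not part of Bio.Bam.Index, and this page does not say whether the library computes bins in exactly this way. With minshift 14 and depth 5 the function reproduces the fixed BAI scheme.

```haskell
import Data.Bits (shiftL, shiftR)

-- | Bin number for the half-open, zero-based interval [beg,end).
-- Illustrative only: a transcription of reg2bin from the CSI
-- specification, not a function exported by Bio.Bam.Index.
reg2bin :: Int -> Int -> Int -> Int -> Int
reg2bin minShift depth beg end = go depth minShift firstLeaf
  where
    lastPos   = end - 1
    -- number of the first bin on the deepest (smallest-bin) level
    firstLeaf = ((1 `shiftL` (depth * 3)) - 1) `div` 7
    go level shift offset
      | level <= 0 = 0                       -- only the root bin fits
      | beg `shiftR` shift == lastPos `shiftR` shift
                   = offset + (beg `shiftR` shift)
      | otherwise  = go (level - 1) (shift + 3)
                        (offset - (1 `shiftL` ((level - 1) * 3)))
```

The search starts at the smallest bins and moves up one level at a time (each level is eight times coarser) until a single bin contains the whole interval.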

Instances

Show a => Show (BamIndex a) Source # 

Methods

showsPrec :: Int -> BamIndex a -> ShowS #

show :: BamIndex a -> String #

showList :: [BamIndex a] -> ShowS #

readBamIndex :: FilePath -> IO (BamIndex ()) Source #

Reads any index we can find for a file. If the file name has a .bai or .csi extension, we read it. Otherwise we look for the index by appending such an extension, then by replacing the existing extension with each of the two, and finally in the file itself. The first file that exists and can actually be parsed is used.
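
The search order described above can be sketched as a pure function that lists the candidate paths. The name indexCandidates is hypothetical, and the relative order of .bai and .csi within each step is an assumption; the description only fixes the order of the steps themselves.

```haskell
import System.FilePath (replaceExtension, takeExtension, (<.>))

-- | Candidate index paths for a file, in the order they would be tried.
-- Hypothetical helper for illustration; not exported by Bio.Bam.Index.
-- Assumption: .bai is tried before .csi within each step.
indexCandidates :: FilePath -> [FilePath]
indexCandidates fp
  | takeExtension fp `elem` [".bai", ".csi"] = [fp]   -- already an index
  | otherwise =
      [ fp <.> "bai"                -- extension appended
      , fp <.> "csi"
      , replaceExtension fp "bai"   -- extension replaced
      , replaceExtension fp "csi"
      , fp                          -- finally, the file itself
      ]
```

For a file named reads.bam this yields reads.bam.bai, reads.bam.csi, reads.bai, reads.csi and reads.bam, in that order.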

readBaiIndex :: MonadIO m => Iteratee Bytes m (BamIndex ()) Source #

Read an index in BAI or CSI format, recognized automatically. Note that TBI is supposed to be compressed using bgzip; it must be decompressed before being passed to readBaiIndex.

readTabix :: MonadIO m => Iteratee Bytes m TabIndex Source #

Reads a Tabix index. Tabix indices are compressed; decompression is taken care of.

data Region Source #

Constructors

Region 

Fields

newtype Subsequence Source #

A mostly contiguous subset of a sequence, stored as a set of non-overlapping intervals in an IntMap from start position to end position (half-open intervals, naturally).

Constructors

Subsequence (IntMap Int) 
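
As an illustration of this representation, the following sketch keeps a set of non-overlapping half-open intervals in an IntMap keyed by start position. The type Intervals and the functions addInterval and covers are invented for this example and are not the library's API. Merging adjacent intervals is a choice made here; the library may behave differently.

```haskell
import qualified Data.IntMap.Strict as IM

-- | Hypothetical stand-in for Subsequence: start position -> end position,
-- each entry denoting the half-open interval [start,end).
newtype Intervals = Intervals (IM.IntMap Int) deriving (Eq, Show)

-- | Add [beg,end), merging it with every stored interval it overlaps or
-- touches, so the map stays non-overlapping. Linear in the number of
-- stored intervals, which is acceptable for a sketch.
addInterval :: Int -> Int -> Intervals -> Intervals
addInterval beg end (Intervals m)
  | beg >= end = Intervals m                      -- empty interval: no change
  | otherwise  = Intervals (IM.insert beg' end' rest)
  where
    (touching, rest) = IM.partitionWithKey (\s e -> s <= end && e >= beg) m
    beg' = IM.foldrWithKey (\s _ acc -> min s acc) beg touching
    end' = IM.foldr max end touching

-- | Is the position covered by one of the intervals?
covers :: Int -> Intervals -> Bool
covers p (Intervals m) = case IM.lookupLE p m of
  Just (_, e) -> p < e       -- half-open: the end itself is excluded
  Nothing     -> False
```

Because the intervals never overlap, a single lookupLE on the start position is enough to decide membership.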

eneeBamRefseq :: Monad m => BamIndex b -> Refseq -> Enumeratee [BamRaw] [BamRaw] m a Source #

Seeks to a given reference sequence in a BAM file and enumerates only those records aligning to that reference. We use the first checkpoint available for the sequence. This requires an appropriate index, and the file must have been opened in a way that allows seeking. Only the BamRaw records of the requested sequence are enumerated; if the sequence isn't found, nothing is enumerated.
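
A rough sketch of what "the first checkpoint" means as a seek target follows. It assumes the checkpoints of one reference are held as a map from position to BGZF virtual offset. The Checkpoints alias and both function names are invented for this example; the library's actual Ckpoints type may differ. The split into a compressed block offset (upper 48 bits) and an offset within the uncompressed block (lower 16 bits) follows the BGZF definition in the SAM specification.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Int (Int64)
import qualified Data.IntMap.Strict as IM

-- | Hypothetical model of one reference's checkpoints:
-- position -> virtual offset of the first record crossing it.
type Checkpoints = IM.IntMap Int64

-- | The checkpoint with the smallest position, if there is one.
firstCheckpoint :: Checkpoints -> Maybe (Int, Int64)
firstCheckpoint = IM.lookupMin

-- | Split a BGZF virtual offset into (file offset of the compressed
-- block, offset inside the uncompressed block).
splitVirtualOffset :: Int64 -> (Int64, Int)
splitVirtualOffset v = (v `shiftR` 16, fromIntegral (v .&. 0xFFFF))
```

Seeking then amounts to jumping to the block offset in the file, decompressing that block, and skipping the given number of bytes within it.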

eneeBamUnaligned :: Monad m => BamIndex b -> Enumeratee [BamRaw] [BamRaw] m a Source #

Seeks to the part of a BAM file that contains unaligned reads and enumerates those. Sort of the dual to eneeBamRefseq. We use the best guess at where the unaligned records start. If no such guess is available, we decode everything.

subsampleBam :: (MonadIO m, MonadMask m) => FilePath -> Enumerator' BamMeta [BamRaw] m b Source #

Subsample randomly from a BAM file. If an index exists, this produces an infinite stream taken from random locations in the file.

XXX It would be cool if we could subsample from multiple BAM files. It's a bit annoying to code: we'd probably read the indices up front, estimate how many reads we'd find in each file, then open them recursively to form a monad stack where the merging function has to select randomly where to read from. Hm.