biohazard-2.1: bioinformatics support library

Safe HaskellNone
LanguageHaskell2010

Bio.Bam.Index

Synopsis

Documentation

data BamIndex a Source #

Full index, unifying BAI and CSI style. In both cases, we have the binning scheme, parameters are fixed in BAI, but variable in CSI. Checkpoints are created from the linear index in BAI or from the loffset field in CSI.

Constructors

BamIndex 

Fields

  • minshift :: !Int

    Minshift parameter from CSI

  • depth :: !Int

    Depth parameter from CSI

  • unaln_off :: !Int64

    Best guess at where the unaligned records start

  • extensions :: a

    Room for stuff (needed for tabix)

  • refseq_bins :: !(Vector Bins)

    Records for the binning index, where each bin has a list of segments belonging to it.

  • refseq_ckpoints :: !(Vector Ckpoints)

    Known checkpoints of the form (pos,off) where off is the virtual offset of the first record crossing pos.

Instances
Show a => Show (BamIndex a) Source # 
Instance details

Defined in Bio.Bam.Index

Methods

showsPrec :: Int -> BamIndex a -> ShowS #

show :: BamIndex a -> String #

showList :: [BamIndex a] -> ShowS #

withIndexedBam :: (MonadIO m, MonadLog m, MonadMask m) => FilePath -> (BamMeta -> BamIndex () -> Handle -> m r) -> m r Source #

readBamIndex :: FilePath -> IO (BamIndex ()) Source #

Reads any index we can find for a file.

If the file name has a .bai or .csi extension, optionally followed by .gz, we read it. Else we look for the index by adding such an extension and by replacing the extension with these two, and finally try the file itself. The first file that exists is used.

readBaiIndex :: MonadIO m => ByteStream m r -> m (BamIndex ()) Source #

Reads an index in BAI or CSI format, recognized automatically. The index can be compressed, even though this isn't standard.

readTabix :: MonadIO m => ByteStream m r -> m TabIndex Source #

Reads a Tabix index. Note that tabix indices are compressed, this is taken care of automatically.

data Region Source #

Constructors

Region 

Fields

Instances
Eq Region Source # 
Instance details

Defined in Bio.Bam.Regions

Methods

(==) :: Region -> Region -> Bool #

(/=) :: Region -> Region -> Bool #

Ord Region Source # 
Instance details

Defined in Bio.Bam.Regions

Show Region Source # 
Instance details

Defined in Bio.Bam.Regions

newtype Subsequence Source #

A mostly contiguous subset of a sequence, stored as a set of non-overlapping intervals in an IntMap from start position to end position (half-open intervals, naturally).

Constructors

Subsequence (IntMap Int) 
Instances
Show Subsequence Source # 
Instance details

Defined in Bio.Bam.Regions

streamBamRefseq :: (MonadIO m, MonadLog m) => BamIndex b -> Handle -> Refseq -> Stream (Of BamRaw) m () Source #

Streams one reference from a bam file.

Seeks to a given sequence in a Bam file and enumerates only those records aligning to that reference. We use the first checkpoint available for the sequence, which an appropriate index. Streams the BamRaw records of the correct reference sequence only, and produces an empty stream if the sequence isn't found.

streamBamUnaligned :: MonadIO m => BamIndex b -> Handle -> Stream (Of BamRaw) m () Source #

Reads from a Bam file the part with unaligned reads.

Sort of the dual to streamBamRefseq. Since the index does not actually point to the unaligned part at the end, we use a best guess at where the unaligned stuff might start, then skip over any aligned records. Our "fallback guess" is to decode from the current position; this only works if something else already consumed the Bam header.