biohazard-2.1: bioinformatics support library

Safe HaskellNone
LanguageHaskell2010

Bio.Bam.Pileup

Description

Pileup, similar to Samtools

Pileup turns a sorted sequence of reads into a sequence of "piles", one for each site where a genetic variant might be called. We will scan each read's CIGAR line and MD field in concert with the sequence and effective quality. Effective quality is the lowest available quality score of QUAL, MAPQ, and BQ. For aDNA calling, a base is represented as four probabilities, derived from a position dependent damage model.

Synopsis

Documentation

data PrimChunks Source #

The primitive pieces for genotype calling: A position, a base represented as four likelihoods, an inserted sequence, and the length of a deleted sequence. The logic is that we look at a base followed by some indel, and all those indels are combined into a single insertion and a single deletion.

Constructors

Seek !Int PrimBase

skip to position (at start or after N operation)

Indel [Nucleotides] [DamagedBase] PrimBase

observed deletion and insertion between two bases

EndOfRead

nothing anymore

Instances
Show PrimChunks Source # 
Instance details

Defined in Bio.Bam.Pileup

data PrimBase Source #

Constructors

Base

more chunks

Fields

Instances
Show PrimBase Source # 
Instance details

Defined in Bio.Bam.Pileup

data DamagedBase Source #

Represents our knowledge about a certain base, which consists of the base itself (A,C,G,T, encoded as 0..3; no Ns), the quality score (anything that isn't A,C,G,T becomes A with quality 0), and a substitution matrix representing post-mortem but pre-sequencing substitutions.

Unfortunately, none of this can be rolled into something more simple, because damage and sequencing error behave so differently.

Damage information is polymorphic. We might run with a simple version (a matrix) for calling, but we need more (a matrix and a mutable matrix, I think) for estimation.

Constructors

DB

reference base from MD field

Fields

Instances
Show DamagedBase Source # 
Instance details

Defined in Bio.Bam.Pileup

newtype DmgToken Source #

Constructors

DmgToken 

Fields

dissect :: DmgToken -> BamRaw -> [PosPrimChunks] Source #

Decomposes a BAM record into chunks suitable for piling up. We pick apart the CIGAR and MD fields, and combine them with sequence and quality as appropriate. Clipped bases are removed/skipped as needed. We also apply a substitution matrix to each base, which must be supplied along with the read.

data CallStats Source #

Statistics about a genotype call. Probably only useful for filtering (so not very useful), but we keep them because it's easy to track them.

Constructors

CallStats 
Instances
Eq CallStats Source # 
Instance details

Defined in Bio.Bam.Pileup

Show CallStats Source # 
Instance details

Defined in Bio.Bam.Pileup

Generic CallStats Source # 
Instance details

Defined in Bio.Bam.Pileup

Associated Types

type Rep CallStats :: Type -> Type #

Semigroup CallStats Source # 
Instance details

Defined in Bio.Bam.Pileup

Monoid CallStats Source # 
Instance details

Defined in Bio.Bam.Pileup

type Rep CallStats Source # 
Instance details

Defined in Bio.Bam.Pileup

type Rep CallStats = D1 (MetaData "CallStats" "Bio.Bam.Pileup" "biohazard-2.1-LAR9kjxyI4PJ81NLnLu2iq" False) (C1 (MetaCons "CallStats" PrefixI True) ((S1 (MetaSel (Just "read_depth") SourceUnpack SourceStrict DecidedStrict) (Rec0 Int) :*: S1 (MetaSel (Just "reads_mapq0") SourceUnpack SourceStrict DecidedStrict) (Rec0 Int)) :*: (S1 (MetaSel (Just "sum_mapq") SourceUnpack SourceStrict DecidedStrict) (Rec0 Int) :*: S1 (MetaSel (Just "sum_mapq_squared") SourceUnpack SourceStrict DecidedStrict) (Rec0 Int))))

newtype V_Nuc Source #

Constructors

V_Nuc (Vector Nucleotide) 
Instances
Eq V_Nuc Source # 
Instance details

Defined in Bio.Bam.Pileup

Methods

(==) :: V_Nuc -> V_Nuc -> Bool #

(/=) :: V_Nuc -> V_Nuc -> Bool #

Ord V_Nuc Source # 
Instance details

Defined in Bio.Bam.Pileup

Methods

compare :: V_Nuc -> V_Nuc -> Ordering #

(<) :: V_Nuc -> V_Nuc -> Bool #

(<=) :: V_Nuc -> V_Nuc -> Bool #

(>) :: V_Nuc -> V_Nuc -> Bool #

(>=) :: V_Nuc -> V_Nuc -> Bool #

max :: V_Nuc -> V_Nuc -> V_Nuc #

min :: V_Nuc -> V_Nuc -> V_Nuc #

Show V_Nuc Source # 
Instance details

Defined in Bio.Bam.Pileup

Methods

showsPrec :: Int -> V_Nuc -> ShowS #

show :: V_Nuc -> String #

showList :: [V_Nuc] -> ShowS #

newtype V_Nucs Source #

Constructors

V_Nucs (Vector Nucleotides) 
Instances
Eq V_Nucs Source # 
Instance details

Defined in Bio.Bam.Pileup

Methods

(==) :: V_Nucs -> V_Nucs -> Bool #

(/=) :: V_Nucs -> V_Nucs -> Bool #

Ord V_Nucs Source # 
Instance details

Defined in Bio.Bam.Pileup

Show V_Nucs Source # 
Instance details

Defined in Bio.Bam.Pileup

data IndelVariant Source #

Constructors

IndelVariant 
Instances
Eq IndelVariant Source # 
Instance details

Defined in Bio.Bam.Pileup

Ord IndelVariant Source # 
Instance details

Defined in Bio.Bam.Pileup

Show IndelVariant Source # 
Instance details

Defined in Bio.Bam.Pileup

Generic IndelVariant Source # 
Instance details

Defined in Bio.Bam.Pileup

Associated Types

type Rep IndelVariant :: Type -> Type #

type Rep IndelVariant Source # 
Instance details

Defined in Bio.Bam.Pileup

type Rep IndelVariant = D1 (MetaData "IndelVariant" "Bio.Bam.Pileup" "biohazard-2.1-LAR9kjxyI4PJ81NLnLu2iq" False) (C1 (MetaCons "IndelVariant" PrefixI True) (S1 (MetaSel (Just "deleted_bases") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 V_Nucs) :*: S1 (MetaSel (Just "inserted_bases") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 V_Nuc)))

type BasePile = [DamagedBase] Source #

Map quality and a list of encountered bases, with damage information and reference base if known.

type IndelPile = [(Qual, ([Nucleotides], [DamagedBase]))] Source #

Map quality and a list of encountered indel variants. The deletion has the reference sequence, if known, an insertion has the inserted sequence with damage information.

data Pile' a b Source #

Running pileup results in a series of piles. A Pile' has the basic statistics of a VarCall, but no likelihood values and a pristine list of variants instead of a proper call. We emit one pile with two BasePiles (one for each strand) and one IndelPile (the one immediately following) at a time.

Constructors

Pile 
Instances
(Show a, Show b) => Show (Pile' a b) Source # 
Instance details

Defined in Bio.Bam.Pileup

Methods

showsPrec :: Int -> Pile' a b -> ShowS #

show :: Pile' a b -> String #

showList :: [Pile' a b] -> ShowS #

type Pile = Pile' (BasePile, BasePile) (IndelPile, IndelPile) Source #

Raw pile. Bases and indels are piled separately on forward and backward strands.

pileup :: Monad m => Stream (Of PosPrimChunks) m b -> Stream (Of Pile) m b Source #

The pileup enumeratee takes BamRaws, dissects them, interleaves the pieces appropriately, and generates Pile's. The output will contain at most one BasePile and one IndelPile for each position, piles are sorted by position.

This top level driver receives BamRaws. Unaligned reads and duplicates are skipped (but not those merely failing quality checks). Processing stops when the first read with invalid br_rname is encountered or a t end of file.