Safe Haskell | None |
---|---|
Language | Haskell98 |
- type DamageModel a = Bool -> Int -> Vector (Mat44 a)
- data To = Nucleotide :-> Nucleotide
- (!) :: Mat44D -> To -> Double
- noDamage :: Num a => DamageModel a
- data DamageParameters float = DP {
- ssd_sigma :: !float
- ssd_delta :: !float
- ssd_lambda :: !float
- ssd_kappa :: !float
- dsd_sigma :: !float
- dsd_delta :: !float
- dsd_lambda :: !float
- genSubstMat :: Fractional a => a -> a -> Mat44 a
- memoDamageModel :: DamageModel a -> DamageModel a
- univDamage :: Fractional a => DamageParameters a -> DamageModel a
Documentation
type DamageModel a = Bool -> Int -> Vector (Mat44 a) Source
Things specific to ancient DNA, e.g. damage models.
For aDNA, we need a substitution probability. We have three options: use an empirically determined PSSM, use an arithmetically defined PSSM based on the Johnson model, use a context sensitive PSSM based on the Johnson model and an alignment. Using Dindel, actual substitutions relative to a called haplotype would be taken into account. Since we're not going to do that, taking alignments into account is difficult, somewhat approximate, and therefore not worth the hassle.
We represent substitution matrices by the type Mat44D
. Internally,
this is a vector of packed vectors. Conveniently, each of the packed
vectors represents all transition into the given nucleotide.
A DamageModel
is a function that gives substitution matrices for
each position in a read. The DamageModel
can depend on the length
of the read and whether its alignment is reversed. In practice, we
should probably memoize precomputed damage models somehow.
Nucleotide :-> Nucleotide infix 9 |
(!) :: Mat44D -> To -> Double infix 8 Source
Convenience function to access a substitution matrix that has a mnemonic reading.
noDamage :: Num a => DamageModel a Source
DamageModel
for undamaged DNA. The likelihoods follow directly
from the quality score. This needs elaboration to see what to do
with amibiguity codes (even though those haven't actually been
observed in the wild).
data DamageParameters float Source
Parameters for the universal damage model.
We assume the correct model is either no damage, or single strand damage, or double strand damage. Each of them comes with a probability. It turns out that blending them into one is simply accomplished by multiplying these probabilities onto the deamination probabilities.
For single stranded library prep, only one kind of damage occurs (C
to T), it occurs at low frequency (ssd_delta
) everywhere, at high
frequency (ssd_sigma
) in single stranded parts, and the overhang
length is distributed exponentially with parameter ssd_lambda
at
the 5' end and ssd_kappa
at the 3' end. (Without UDG treatment,
those will be equal. With UDG, those are much smaller and in fact
don't literally represent overhangs.)
For double stranded library prep, we get C->T damage at the 5' end
and G->A at the 3' end with rate dsd_sigma
and both in the interior
with rate dsd_delta
. Everything is symmetric, and therefore the
orientation of the aligned read doesn't matter either. Both
overhangs follow a distribution with parameter dsd_lambda
.
DP | |
|
Read float => Read (DamageParameters float) Source | |
Show float => Show (DamageParameters float) Source |
genSubstMat :: Fractional a => a -> a -> Mat44 a Source
Generic substitution matrix, has C->T and G->A deamination as
parameters. Setting p
or q
to 0 as appropriate makes this apply
to the single stranded or undamaged case.
memoDamageModel :: DamageModel a -> DamageModel a Source
univDamage :: Fractional a => DamageParameters a -> DamageModel a Source