Safe Haskell | None |
---|---|
Language | Haskell2010 |
Trimming of reads as found in BAM files. Implements trimming low quality sequence from the 3' end.
- trim_3 :: Int -> BamRec -> BamRec
- trim_3' :: ([Nucleotides] -> [Qual] -> Bool) -> BamRec -> BamRec
- trim_low_quality :: Qual -> a -> [Qual] -> Bool
- default_fwd_adapters :: [Vector Nucleotides]
- default_rev_adapters :: [Vector Nucleotides]
- find_merge :: [Vector Nucleotides] -> [Vector Nucleotides] -> Vector Nucleotides -> Vector Qual -> Vector Nucleotides -> Vector Qual -> (Int, Int, Int)
- mergeBam :: Int -> Int -> [Vector Nucleotides] -> [Vector Nucleotides] -> BamRec -> BamRec -> [BamRec]
- find_trim :: [Vector Nucleotides] -> Vector Nucleotides -> Vector Qual -> (Int, Int, Int)
- trimBam :: Int -> Int -> [Vector Nucleotides] -> BamRec -> [BamRec]
- mergeTrimBam :: Monad m => Int -> Int -> [Vector Nucleotides] -> [Vector Nucleotides] -> Enumeratee [BamRec] [BamRec] m a
- twoMins :: (Bounded a, Ord a) => a -> Int -> (Int -> a) -> (a, Int, a)
- merged_seq :: (Vector v Nucleotides, Vector v Qual) => Int -> v Nucleotides -> v Qual -> v Nucleotides -> v Qual -> v Nucleotides
- merged_qual :: (Vector v Nucleotides, Vector v Qual) => Word8 -> Int -> v Nucleotides -> v Qual -> v Nucleotides -> v Qual -> v Qual
Documentation
trim_3' :: ([Nucleotides] -> [Qual] -> Bool) -> BamRec -> BamRec Source #
Trims from the 3' end of a sequence.
trim_3' p b
trims the 3' end of the sequence in b
at the
earliest position such that p
evaluates to true on every suffix
that was trimmed off. Note that the 3' end may be the beginning of
the sequence if it happens to be stored in reverse-complemented form.
Also note that trimming from the 3' end may not make sense for reads
that were constructed by merging paired end data (but we cannot take
care of that here). Further note that trimming may break dependent
information, notably the "mate" information of the mate and many
optional fields.
trim_low_quality :: Qual -> a -> [Qual] -> Bool Source #
Trim predicate to get rid of low quality sequence.
trim_low_quality q ns qs
evaluates to true if all qualities in qs
are smaller (i.e. worse) than q
.
default_fwd_adapters :: [Vector Nucleotides] Source #
For merging, we don't need the complete adapters (length around 70!), only a sufficient prefix. Taking only the more-or-less constant part (length around 30), there aren't all that many different adapters in the world. To deal with pretty much every library, we only need the following forward adapters, which will be the default (defined here in the direction they would be sequenced in): Genomic R2, Multiplex R2, Fraft P7.
default_rev_adapters :: [Vector Nucleotides] Source #
Like default_rev_adapters
, these are the few adapters needed for
the reverse read (defined in the direction they would be sequenced in
as part of the second read): Genomic R1, CL 72.
find_merge :: [Vector Nucleotides] -> [Vector Nucleotides] -> Vector Nucleotides -> Vector Qual -> Vector Nucleotides -> Vector Qual -> (Int, Int, Int) Source #
Finds the merge point. Input is list of forward adapters, list of reverse adapters, sequence1, quality1, sequence2, quality2; output is merge point and two qualities (YM, YN).
mergeBam :: Int -> Int -> [Vector Nucleotides] -> [Vector Nucleotides] -> BamRec -> BamRec -> [BamRec] Source #
Overlap-merging of read pairs. We shall compute the likelihood for every possible overlap, then select the most likely one (unless it looks completely random), compute a quality from the second best merge, then merge and clamp the quality accordingly. (We could try looking for chimaera after completing the merge, if only we knew which ones to expect?)
Two reads go in, with two adapter lists. We return Nothing
if all
merges looked mostly random. Else we return the two original reads,
flagged as eflagVestigial
*and* the merged version, flagged as
eflagMerged
and optionally eflagTrimmed
. All reads contain the
computed qualities (in YM and YN), which we also return.
The merging automatically limits quality scores some of the time. We additionally impose a hard limit of 63 to avoid difficulties representing the result, and even that is ridiculous. Sane people would further limit the returned quality! (In practice, map quality later imposes a limit anyway, so no worries...)
find_trim :: [Vector Nucleotides] -> Vector Nucleotides -> Vector Qual -> (Int, Int, Int) Source #
Finds the trimming point. Input is list of forward adapters, sequence, quality; output is trim point and two qualities (YM, YN).
trimBam :: Int -> Int -> [Vector Nucleotides] -> BamRec -> [BamRec] Source #
Trimming for a single read: we need one adapter only (the one coming after the read), here provided as a list of options, and then we merge with an empty second read. Results in up to two reads (the original, possibly flagged, and the trimmed one, definitely flagged, and two qualities).
mergeTrimBam :: Monad m => Int -> Int -> [Vector Nucleotides] -> [Vector Nucleotides] -> Enumeratee [BamRec] [BamRec] m a Source #
merged_seq :: (Vector v Nucleotides, Vector v Qual) => Int -> v Nucleotides -> v Qual -> v Nucleotides -> v Qual -> v Nucleotides Source #
merged_qual :: (Vector v Nucleotides, Vector v Qual) => Word8 -> Int -> v Nucleotides -> v Qual -> v Nucleotides -> v Qual -> v Qual Source #