biohazard-1.0.1: bioinformatics support library

Safe Haskell: None
Language: Haskell2010

Bio.Bam.Filter

Description

Quality filters adapted from a prehistoric pipeline.

Documentation

filterPairs :: Monad m => (BamRec -> [BamRec]) -> (Maybe BamRec -> Maybe BamRec -> [BamRec]) -> Enumeratee [BamRec] [BamRec] m a

A filter/transformation applied to pairs of reads. We supply one function to be applied to single reads and one to be applied to pairs; each returns the list of records to emit. The latter can receive incomplete pairs, too, if mates have been separated or filtered asymmetrically.
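The pairing logic can be illustrated on plain lists. The following is a minimal sketch, not the library's implementation: `Rec`, its fields, and `filterPairsList` are hypothetical stand-ins for `BamRec` and the `Enumeratee`, and the sketch assumes that the two mates of a pair are adjacent in the input.

```haskell
-- Toy stand-in for BamRec (hypothetical; the real record has many more fields).
data Rec = Rec
  { recName   :: String  -- query name shared by both mates
  , recPaired :: Bool    -- True if the read is flagged as paired
  , recFirst  :: Bool    -- True for the first mate, False for the second
  } deriving (Eq, Show)

-- Pure-list analogue of a pair filter. Assumes mates are adjacent.
filterPairsList
  :: (Rec -> [Rec])                     -- applied to unpaired reads
  -> (Maybe Rec -> Maybe Rec -> [Rec])  -- applied to pairs, possibly incomplete
  -> [Rec] -> [Rec]
filterPairsList single pair = go
  where
    go [] = []
    -- Unpaired read: hand it to the single-read function.
    go (r : rs)
      | not (recPaired r) = single r ++ go rs
    -- Two adjacent mates with the same name: a complete pair.
    go (a : b : rs)
      | recPaired b && recName a == recName b && recFirst a /= recFirst b =
          (if recFirst a then pair (Just a) (Just b) else pair (Just b) (Just a))
            ++ go rs
    -- Paired read whose mate is missing: an incomplete pair.
    go (r : rs)
      | recFirst r = pair (Just r) Nothing ++ go rs
      | otherwise  = pair Nothing (Just r) ++ go rs
```

With a pair function that returns both mates only when both are present and `[]` otherwise, lone mates are dropped while unpaired reads and complete pairs pass through unchanged.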

type QualFilter = BamRec -> BamRec

A quality filter is simply a transformation on BamRecs. By convention, quality filters should set flagFailsQC; a further step can then remove the failed reads. Filtering individual reads tends to result in mate pairs with inconsistent flags, which in turn results in lone mates and all sorts of trouble with programs that expect non-broken BAM files. It is therefore recommended to use filterPairs with suitable predicates to do the post-processing.
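The mark-then-remove convention might look as follows. This is a sketch under stated assumptions: `Rec`, `recFailsQC`, `failUnless`, `minLength`, and `dropFailed` are hypothetical names used for illustration and are not part of this module.

```haskell
-- Toy stand-in for BamRec with a single QC flag (hypothetical).
data Rec = Rec
  { recSeq     :: String
  , recFailsQC :: Bool
  } deriving (Eq, Show)

-- Analogue of QualFilter: a transformation on records.
type Filter = Rec -> Rec

-- Mark a read as failed when the given check does not hold.
-- The read itself is kept; only the flag changes.
failUnless :: (Rec -> Bool) -> Filter
failUnless ok r
  | ok r      = r
  | otherwise = r { recFailsQC = True }

-- Example filter (hypothetical): require a minimum read length.
minLength :: Int -> Filter
minLength n = failUnless (\r -> length (recSeq r) >= n)

-- Separate, later step that drops reads marked as failed.
dropFailed :: [Rec] -> [Rec]
dropFailed = filter (not . recFailsQC)
```

Because filters only set a flag, several of them compose with ordinary function composition, and the decision about what to do with failed reads (and their mates) is deferred to one later step.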

complexSimple :: Double -> QualFilter

Simple complexity filter aka "Nancy Filter". A read is considered not-sufficiently-complex if the most common base accounts for greater than the cutoff fraction of all non-N bases.
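The criterion can be sketched on a plain string of bases. This is an illustration, not the library code: `isComplexSimple` is a hypothetical name, and treating a read without any non-N bases as failing is an assumption the description does not settle.

```haskell
import Data.Char (toUpper)
import Data.List (group, sort)

-- True if the read is sufficiently complex, that is, if the most common
-- base does not exceed the cutoff fraction of all non-N bases.
isComplexSimple :: Double -> String -> Bool
isComplexSimple cutoff sq
  | null bases = False  -- assumption: no called bases means "not complex"
  | otherwise  = fromIntegral top <= cutoff * fromIntegral (length bases)
  where
    bases = filter (/= 'N') (map toUpper sq)
    -- Count of the most frequent base among the non-N bases.
    top   = maximum (map length (group (sort bases)))
```

For example, with a cutoff of 0.75 a read of eight As, one C and one G fails (8 of 10 bases are A), while `ACGTACGT` passes.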

complexEntropy :: Double -> QualFilter

Filter on order-zero empirical entropy. The entropy per base must be greater than the cutoff.
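Order-zero empirical entropy is computed from the base frequencies alone, as H = -Σ p_b · log p_b over the distinct bases b. A minimal sketch follows; the names are hypothetical, and measuring in bits (logarithm base 2) and counting every symbol including N are assumptions, since the description specifies neither.

```haskell
import Data.List (group, sort)

-- Order-zero empirical entropy per base, in bits (assumed unit).
entropyPerBase :: String -> Double
entropyPerBase sq
  | null sq   = 0
  | otherwise = negate (sum [ p * logBase 2 p
                            | c <- counts
                            , let p = fromIntegral c / n ])
  where
    counts = map length (group (sort sq))     -- occurrences of each symbol
    n      = fromIntegral (length sq) :: Double

-- A read passes if its entropy per base is greater than the cutoff.
passesEntropy :: Double -> String -> Bool
passesEntropy cutoff sq = entropyPerBase sq > cutoff
```

Under these assumptions a homopolymer has entropy 0 and a read with all four bases in equal proportion has entropy 2 bits per base, so useful cutoffs lie between those values.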

qualityAverage :: Int -> QualFilter

Filter on average quality. Reads without a quality string pass.
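A sketch of the idea on plain Phred integers is given below. `passesAverage` is a hypothetical name, an empty list stands in for a missing quality string, and passing when the mean is equal to the cutoff (rather than strictly greater) is an assumption.

```haskell
-- True if the mean quality is at least the cutoff (assumed comparison).
-- An empty list models a read without a quality string, which passes.
passesAverage :: Int -> [Int] -> Bool
passesAverage _      [] = True
passesAverage cutoff qs = sum qs >= cutoff * length qs  -- avoids division
```

Comparing the sum against `cutoff * length qs` is equivalent to comparing the mean against the cutoff and keeps the arithmetic in integers.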

qualityMinimum :: Int -> Qual -> QualFilter

Filter on minimum quality. In qualityMinimum n q, a read passes if it has no more than n bases with quality less than q. Reads without a quality string pass.
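The rule stated above can be written down directly. In this sketch `passesMinimum` is a hypothetical name, qualities are plain integers rather than `Qual`, and an empty list stands in for a missing quality string.

```haskell
-- passesMinimum n q qs: True if at most n qualities are below q.
-- An empty list models a read without a quality string, which passes.
passesMinimum :: Int -> Int -> [Int] -> Bool
passesMinimum _ _ [] = True
passesMinimum n q qs = length (filter (< q) qs) <= n
```

For instance, `passesMinimum 1 20 [30, 10, 25]` holds because only one base falls below 20, whereas `passesMinimum 1 20 [10, 15, 30]` does not.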

qualityFromOldIllumina :: BamRec -> BamRec

Convert quality scores from old Illumina scale (different formula and offset 64 in FastQ).
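The old scale (often called Solexa scale) is defined on the odds p/(1-p) of an error instead of the error probability p, so a Solexa score Qs corresponds to the Phred score 10 · log10(10^(Qs/10) + 1). The sketch below applies that standard formula to a single FastQ character; the function names are hypothetical, and rounding to the nearest integer is an assumption about how the result is discretized.

```haskell
import Data.Char (ord)

-- Convert a Solexa-scaled score to a Phred-scaled score.
-- Rounding to the nearest integer is an assumption of this sketch.
solexaToPhred :: Int -> Int
solexaToPhred qs =
  round (10 * logBase 10 (10 ** (fromIntegral qs / 10) + 1 :: Double))

-- Decode one FastQ quality character written with offset 64
-- on the old scale and return its Phred score.
oldIlluminaChar :: Char -> Int
oldIlluminaChar c = solexaToPhred (ord c - 64)
```

The two scales nearly coincide for high scores and diverge for low ones: a Solexa score of 40 maps to Phred 40, while a Solexa score of 0 maps to Phred 3.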

qualityFromNewIllumina :: BamRec -> BamRec

Convert quality scores from new Illumina scale (standard formula but offset 64 in FastQ).
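Here the scores are already Phred-scaled, so only the character offset differs from the standard offset of 33. A minimal sketch on FastQ characters follows; the function names are hypothetical and the code does not reflect how the library stores qualities inside a `BamRec`.

```haskell
import Data.Char (chr, ord)

-- Decode one FastQ quality character written with offset 64.
phred64ToPhred :: Char -> Int
phred64ToPhred c = ord c - 64

-- Re-encode a whole quality string from offset 64 to the standard offset 33.
-- Assumes every character is at least '@' (code 64).
phred64ToPhred33 :: String -> String
phred64ToPhred33 = map (\c -> chr (ord c - 64 + 33))
```

As an example, the offset-64 character `h` decodes to Phred 40 and is written as `I` with offset 33.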