biohazard-0.6.3: bioinformatics support library

Safe Haskell: None
Language: Haskell98

Bio.Bam.Filter

Documentation

filterPairs :: Monad m => (BamRec -> [BamRec]) -> (Maybe BamRec -> Maybe BamRec -> [BamRec]) -> Enumeratee [BamRec] [BamRec] m a

Quality filters adapted from old pipeline.

A filter/transformation applied to pairs of reads. We supply one function to be applied to single reads and one to be applied to pairs. The latter can also receive incomplete pairs if mates have been separated or filtered asymmetrically.
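The pairing behaviour described above can be sketched on plain lists. This is an illustration only, not the library's implementation: `Rec`, its fields, and `pairUp` are hypothetical stand-ins, the real function is an Enumeratee over chunks of BamRec, and the sketch assumes that mates sit next to each other in the input.

```haskell
-- Hypothetical stand-in for BamRec; only the fields this sketch needs.
data Rec = Rec
  { recName   :: String
  , recPaired :: Bool   -- read belongs to a pair
  , recFirst  :: Bool   -- read is the first mate
  } deriving (Eq, Show)

-- List analogue of the pairing logic. Unpaired reads go to `single`;
-- adjacent mates with the same name go to `pair` as (first, second);
-- a mate whose partner is missing goes to `pair` with Nothing on the other side.
pairUp :: (Rec -> [Rec]) -> (Maybe Rec -> Maybe Rec -> [Rec]) -> [Rec] -> [Rec]
pairUp single pair = go
  where
    go [] = []
    go (r:rs)
      | not (recPaired r) = single r ++ go rs
    go (r:s:rs)
      | recPaired s && recName r == recName s && recFirst r /= recFirst s
          = (if recFirst r then pair (Just r) (Just s)
                           else pair (Just s) (Just r)) ++ go rs
    go (r:rs)
      | recFirst r = pair (Just r) Nothing ++ go rs
      | otherwise  = pair Nothing (Just r) ++ go rs
```

With a pair function that returns both mates only when both are present, lone mates are dropped while complete pairs and unpaired reads pass through.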

type QualFilter = BamRec -> BamRec

A quality filter is simply a transformation on BamRecs. By convention, quality filters should set flagFailsQC; a further step can then remove the failed reads. Filtering individual reads tends to produce mate pairs with inconsistent flags, which in turn results in lone mates and all sorts of trouble with programs that expect non-broken BAM files. It is therefore recommended to use filterPairs with suitable functions to do the post-processing.
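A minimal sketch of this convention, using a hypothetical `Rec` type in place of BamRec: filters only mark reads, a pair function then makes the marks consistent across mates, and removal happens last. The names `failUnless`, `syncPair`, and `dropFailed` are illustrative, and dropping lone mates is one possible policy, not necessarily what the library does.

```haskell
-- Hypothetical stand-in for BamRec.
data Rec = Rec
  { recSeq     :: String
  , recFailsQC :: Bool   -- stands in for the flagFailsQC bit
  } deriving (Eq, Show)

-- Same shape as QualFilter: a transformation that marks, never removes.
type Filter = Rec -> Rec

markFailed :: Rec -> Rec
markFailed r = r { recFailsQC = True }

-- Mark a read as failed when the check does not hold;
-- an existing failure mark is never cleared.
failUnless :: (Rec -> Bool) -> Filter
failUnless ok r
  | ok r      = r
  | otherwise = markFailed r

-- Pair function in the shape expected by a pair-wise post-processing step:
-- if either mate failed, both are marked; lone mates are dropped (assumed policy).
syncPair :: Maybe Rec -> Maybe Rec -> [Rec]
syncPair (Just a) (Just b)
  | recFailsQC a || recFailsQC b = [markFailed a, markFailed b]
  | otherwise                    = [a, b]
syncPair _ _ = []

-- Final step: remove everything that carries the failure mark.
dropFailed :: [Rec] -> [Rec]
dropFailed = filter (not . recFailsQC)
```

Because filters share one type, several checks compose with ordinary function composition, for example `failUnless p . failUnless q`.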

complexSimple :: Double -> QualFilter

Simple complexity filter aka "Nancy Filter". A read is considered not-sufficiently-complex if the most common base accounts for greater than the cutoff fraction of all non-N bases.
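The criterion can be illustrated on a plain string of bases. This is a sketch under assumptions, not the library's code: `dominantFraction` and `isLowComplexity` are hypothetical names, only A, C, G, and T are counted, and a read with no countable bases is treated as having fraction 0.

```haskell
import Data.Char (toUpper)

-- Fraction of counted bases taken up by the most common base.
-- N and any other ambiguity codes are ignored (assumption of this sketch).
dominantFraction :: String -> Double
dominantFraction s
  | total == 0 = 0
  | otherwise  = fromIntegral (maximum counts) / fromIntegral total
  where
    bases  = map toUpper s
    counts = [ length (filter (== b) bases) | b <- "ACGT" ]
    total  = sum counts

-- A read is not sufficiently complex if the dominant base
-- accounts for more than the cutoff fraction.
isLowComplexity :: Double -> String -> Bool
isLowComplexity cutoff s = dominantFraction s > cutoff
```

For the read `AAAANNCG`, four of the six non-N bases are A, so the read counts as low-complexity at a cutoff of 0.6 but not at 0.7.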

complexEntropy :: Double -> QualFilter

Filter on order-zero empirical entropy. The entropy per base must be greater than the cutoff.
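Order-zero empirical entropy is the Shannon entropy of the observed base frequencies, ignoring any dependence between neighbouring positions. The sketch below computes it in bits per base on a plain string. The function names are hypothetical, and both the unit (log base 2) and the restriction to A, C, G, and T are assumptions; the library may differ on either point.

```haskell
-- Order-zero empirical entropy in bits per base:
-- the negated sum of p * log2 p over the observed base frequencies p.
entropyPerBase :: String -> Double
entropyPerBase s
  | total == 0 = 0
  | otherwise  = negate (sum [ p * logBase 2 p
                             | c <- counts, c > 0
                             , let p = fromIntegral c / fromIntegral total ])
  where
    counts = [ length (filter (== b) s) | b <- "ACGT" ]
    total  = sum counts

-- A read passes if its entropy per base exceeds the cutoff.
passesEntropy :: Double -> String -> Bool
passesEntropy cutoff s = entropyPerBase s > cutoff
```

Under these assumptions a homopolymer has entropy 0, and a read with all four bases in equal proportion reaches the maximum of 2 bits per base.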

qualityAverage :: Int -> QualFilter

Filter on average quality. Reads without a quality string pass.
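As an illustration, the check can be written on a list of integer quality scores. The name `passesAverage` is hypothetical, and treating the cutoff as inclusive (average at least the cutoff) is an assumption, since the documentation does not state the direction of the comparison.

```haskell
-- Average-quality check on a list of Phred scores.
-- An empty list stands for a read without a quality string and always passes.
-- Comparing the sum against cutoff * length avoids a division.
passesAverage :: Int -> [Int] -> Bool
passesAverage _      [] = True
passesAverage cutoff qs = sum qs >= cutoff * length qs
```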

qualityMinimum :: Int -> Qual -> QualFilter

Filter on minimum quality. In qualityMinimum n q, a read passes if it has no more than n bases with quality less than q. Reads without a quality string pass.
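The rule translates directly into a count over the quality scores. The sketch uses a plain list of Int in place of the library's Qual type, and `passesMinimum` is a hypothetical name.

```haskell
-- A read passes if at most n of its bases have quality below q.
-- An empty list stands for a read without a quality string and always passes.
passesMinimum :: Int -> Int -> [Int] -> Bool
passesMinimum _ _ [] = True
passesMinimum n q qs = length (filter (< q) qs) <= n
```

For example, with n = 1 and q = 20, the scores [30, 10, 25] pass (one base below 20) and [10, 10, 25] fail (two bases below 20).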

qualityFromOldIllumina :: BamRec -> BamRec

Convert quality scores from old Illumina scale (different formula and offset 64 in FastQ).
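The "different formula" of the old scale is the Solexa odds-based score, which relates to the Phred score by Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). The sketch below applies that relation to FastQ characters with offset 64. The function names are hypothetical and the rounding to the nearest integer is an assumption; the library's conversion may round differently or use a lookup table.

```haskell
import Data.Char (ord)

-- Convert one Solexa-scaled score to the Phred scale.
solexaToPhred :: Int -> Int
solexaToPhred qs =
  round (10 * logBase 10 (10 ** (fromIntegral qs / 10) + 1 :: Double))

-- Decode a FastQ quality string in the old Illumina encoding:
-- subtract the offset 64, then convert each score to Phred.
decodeOldIllumina :: String -> [Int]
decodeOldIllumina = map (solexaToPhred . subtract 64 . ord)
```

The two scales agree closely for high scores and diverge at the low end: a Solexa score of 40 maps to Phred 40, while a Solexa score of 0 maps to Phred 3.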

qualityFromNewIllumina :: BamRec -> BamRec

Convert quality scores from new Illumina scale (standard formula but offset 64 in FastQ).
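Since the new scale already uses the standard Phred formula, only the offset needs correcting. A minimal sketch on FastQ characters, with hypothetical function names and assuming the usual target offset of 33:

```haskell
import Data.Char (chr, ord)

-- Decode a FastQ quality string with offset 64 into Phred scores.
decodeNewIllumina :: String -> [Int]
decodeNewIllumina = map (subtract 64 . ord)

-- Re-encode an offset-64 quality string with the standard offset 33,
-- i.e. shift every character down by 64 - 33 = 31.
toStandardOffset :: String -> String
toStandardOffset = map (chr . subtract 31 . ord)
```

The character `h` (ASCII 104) decodes to Phred 40 and is written as `I` (ASCII 73) under offset 33.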