Safe Haskell	None
Language	Haskell2010

Bio.Bam.Fastq

Description

Parser for FastA/FastQ, ByteStream style, written such that it works well with module Bio.Bam.

Input streams are broken into numbered lines, then into records. A record may be preceded by empty lines, which are ignored, or by random junk, which is also ignored but results in a warning. The record proper starts with a header line indicating either a FastA record (begins with > or ;) or a FastQ record (begins with @). Further description lines beginning with ; are allowed and silently ignored. All following lines not starting with +, >, ; or @ are sequence lines. In a FastQ record only, these are followed by a separator line starting with +, which is ignored, and by exactly as many quality lines as there were sequence lines. A missing separator results in a warning, and the record is parsed without quality scores.
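The line categories described above can be illustrated with a small classifier. This is only a sketch of the dispatch on the first character, not the module's actual implementation; the type LineKind and the function classifyLine are hypothetical names introduced for illustration.

```haskell
import Data.Char (isSpace)

-- | Hypothetical line categories, named for illustration only.
data LineKind
  = Blank          -- empty or all white space
  | FastaHeader    -- starts with '>'
  | FastqHeader    -- starts with '@'
  | Semicolon      -- starts with ';': a FastA header or a description
                   -- line, depending on where it appears in the record
  | Separator      -- starts with '+'
  | SequenceLine   -- anything else: sequence inside a record, junk outside
  deriving (Eq, Show)

-- | Classify one line by its first character.
classifyLine :: String -> LineKind
classifyLine l
  | all isSpace l = Blank
  | otherwise = case l of
      ('>' : _) -> FastaHeader
      ('@' : _) -> FastqHeader
      (';' : _) -> Semicolon
      ('+' : _) -> Separator
      _         -> SequenceLine
```

A classifier like this is only meaningful outside the quality block. With offset-33 qualities, '@' and '+' are themselves valid quality characters, which is presumably why the number of quality lines is tied to the number of sequence lines rather than detected from their first character.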

In sequence lines, IUPAC-IUB ambiguity codes are converted to Nucleotides, and white space is skipped silently. Any other character becomes an unknown base ('=' in SAM), and a warning is emitted. Note that downstream tools are unlikely to handle the resulting unknown bases and/or empty records gracefully. If the quality lines do not have the same total length as the sequence lines (this includes quality lines missing due to end-of-stream), a warning is emitted and the record receives no quality scores, just as if it were a FastA record. Otherwise, if the quality lines have a different layout than the sequence lines, a warning is emitted, but the qualities are still used.
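The two quality checks can be sketched as follows. This is a minimal illustration, assuming white space has already been stripped from the sequence lines; the type QualCheck and the function checkQualities are hypothetical, and the mapping of the two failure cases to the IncoherentQualities and IncongruentQualities warnings is inferred from their descriptions rather than taken from the source code.

```haskell
-- | Hypothetical outcome of comparing quality lines to sequence lines.
data QualCheck
  = Coherent     -- same total length, same line layout
  | Incongruent  -- same total length, different layout: warn, keep qualities
  | Incoherent   -- different total length: warn, drop qualities
  deriving (Eq, Show)

-- | Compare sequence lines and quality lines, total length first,
--   then per-line layout.
checkQualities :: [String] -> [String] -> QualCheck
checkQualities seqLines qualLines
  | sum seqLens /= sum qualLens = Incoherent
  | seqLens /= qualLens         = Incongruent
  | otherwise                   = Coherent
  where
    seqLens  = map length seqLines
    qualLens = map length qualLines
```

For example, sequence lines of lengths 4 and 2 paired with quality lines of lengths 3 and 3 have matching totals but a different layout, so the qualities would still be used.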

Quality scores must be stored as raw bytes with offset 33. (Other variants, like 454's ASCII qualities and Solexa's raw bytes with offset 64, are difficult to detect and extinct in the wild anyway.) If the second word of the header stores multiple fields, we try to extract Illumina's "QC failed" flag and either an index sequence or a read group name from it.
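As an illustration of the two conventions mentioned here, the sketch below decodes offset-33 qualities and splits a header's second word. It assumes the common Illumina layout read:filter:control:index, in which a filter field of Y marks a read that failed QC; the functions decodeQual33, splitColon and parseIlluminaWord are hypothetical and do not claim to match what the module does internally.

```haskell
import Data.Char (ord)

-- | Decode offset-33 quality characters into numeric scores.
decodeQual33 :: String -> [Int]
decodeQual33 = map (\c -> ord c - 33)

-- | Split a string on ':' (helper, base only).
splitColon :: String -> [String]
splitColon s = case break (== ':') s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitColon rest

-- | Assumed layout "read:filter:control:index".
--   Returns (QC failed?, index sequence) if the word has four fields.
parseIlluminaWord :: String -> Maybe (Bool, String)
parseIlluminaWord w = case splitColon w of
  [_read, filt, _ctrl, idx] -> Just (filt == "Y", idx)
  _                         -> Nothing
```

Here decodeQual33 "!I5" yields [0,40,20], and parseIlluminaWord "1:Y:18:ATCACG" yields Just (True,"ATCACG"). A word without exactly four colon-separated fields yields Nothing in this sketch.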

Other flags are commonly encoded into the sequence names. We do not handle those here, but most of the conventions at MPI EVAN are dealt with by removeWarts.

Synopsis

parseFastq :: MonadLog m => ByteStream m r -> Stream (Of BamRec) m r
data EmptyRecord = EmptyRecord !Int !Bytes
data IncoherentQualities = IncoherentQualities !Int !Bytes
data IncongruentQualities = IncongruentQualities !Int !Bytes
data JunkFound = JunkFound !Int !Bytes
data QualitiesMissing = QualitiesMissing !Int !Bytes
data SequenceHasGaps = SequenceHasGaps !Int !Bytes

Documentation

parseFastq :: MonadLog m => ByteStream m r -> Stream (Of BamRec) m r

data EmptyRecord

Constructors

EmptyRecord !Int !Bytes

Instances

Show EmptyRecord
    Defined in Bio.Bam.Fastq
    Methods
        showsPrec :: Int -> EmptyRecord -> ShowS
        show :: EmptyRecord -> String
        showList :: [EmptyRecord] -> ShowS
Exception EmptyRecord
    Defined in Bio.Bam.Fastq
    Methods
        toException :: EmptyRecord -> SomeException
        fromException :: SomeException -> Maybe EmptyRecord
        displayException :: EmptyRecord -> String

data IncoherentQualities

Emitted when a quality record does not fit the sequence record.

Constructors

IncoherentQualities !Int !Bytes

Instances

Show IncoherentQualities
    Defined in Bio.Bam.Fastq
    Methods
        showsPrec :: Int -> IncoherentQualities -> ShowS
        show :: IncoherentQualities -> String
        showList :: [IncoherentQualities] -> ShowS
Exception IncoherentQualities
    Defined in Bio.Bam.Fastq
    Methods
        toException :: IncoherentQualities -> SomeException
        fromException :: SomeException -> Maybe IncoherentQualities
        displayException :: IncoherentQualities -> String

data IncongruentQualities

Emitted when a quality record has a different layout than the sequence record.

Constructors

IncongruentQualities !Int !Bytes

Instances

Show IncongruentQualities
    Defined in Bio.Bam.Fastq
    Methods
        showsPrec :: Int -> IncongruentQualities -> ShowS
        show :: IncongruentQualities -> String
        showList :: [IncongruentQualities] -> ShowS
Exception IncongruentQualities
    Defined in Bio.Bam.Fastq
    Methods
        toException :: IncongruentQualities -> SomeException
        fromException :: SomeException -> Maybe IncongruentQualities
        displayException :: IncongruentQualities -> String

data JunkFound

Emitted when random text is found instead of a header.

Constructors

JunkFound !Int !Bytes

Instances

Show JunkFound
    Defined in Bio.Bam.Fastq
    Methods
        showsPrec :: Int -> JunkFound -> ShowS
        show :: JunkFound -> String
        showList :: [JunkFound] -> ShowS
Exception JunkFound
    Defined in Bio.Bam.Fastq
    Methods
        toException :: JunkFound -> SomeException
        fromException :: SomeException -> Maybe JunkFound
        displayException :: JunkFound -> String

data QualitiesMissing

Emitted when a quality separator was expected, but not found.

Constructors

QualitiesMissing !Int !Bytes

Instances

Show QualitiesMissing
    Defined in Bio.Bam.Fastq
    Methods
        showsPrec :: Int -> QualitiesMissing -> ShowS
        show :: QualitiesMissing -> String
        showList :: [QualitiesMissing] -> ShowS
Exception QualitiesMissing
    Defined in Bio.Bam.Fastq
    Methods
        toException :: QualitiesMissing -> SomeException
        fromException :: SomeException -> Maybe QualitiesMissing
        displayException :: QualitiesMissing -> String

data SequenceHasGaps

Emitted when a sequence record contains strange characters.

Constructors

SequenceHasGaps !Int !Bytes

Instances

Show SequenceHasGaps
    Defined in Bio.Bam.Fastq
    Methods
        showsPrec :: Int -> SequenceHasGaps -> ShowS
        show :: SequenceHasGaps -> String
        showList :: [SequenceHasGaps] -> ShowS
Exception SequenceHasGaps
    Defined in Bio.Bam.Fastq
    Methods
        toException :: SequenceHasGaps -> SomeException
        fromException :: SomeException -> Maybe SequenceHasGaps
        displayException :: SequenceHasGaps -> String
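
Because every warning type in this module has an Exception instance, a consumer that receives a warning wrapped as SomeException could recognise it by type with fromException. The sketch below shows that pattern with local stand-in types. It is an assumption that warnings reach the consumer in this wrapped form; the stand-ins use String where the real constructors carry Bytes, and the function describeWarning is hypothetical.

```haskell
import Control.Exception (Exception, SomeException, fromException, toException)

-- Stand-ins for two of the module's warning types (real fields: !Int !Bytes).
data JunkFound = JunkFound !Int String deriving Show
instance Exception JunkFound

data QualitiesMissing = QualitiesMissing !Int String deriving Show
instance Exception QualitiesMissing

-- | Dispatch on the concrete warning type hidden inside SomeException.
describeWarning :: SomeException -> String
describeWarning e
  | Just (JunkFound n _) <- fromException e =
      "junk found, Int field " ++ show n
  | Just (QualitiesMissing n _) <- fromException e =
      "qualities missing, Int field " ++ show n
  | otherwise =
      "other warning: " ++ show e
```

With these stand-ins, describeWarning (toException (JunkFound 3 "xyz")) evaluates to "junk found, Int field 3".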