Safe Haskell: None
This module defines common data structures for biosequences, i.e. data that represents nucleotide or protein sequences.
Anything resembling or wrapping a sequence should implement the BioSeq class (and BioSeqQual if quality information is available).
The data types are mostly wrappers around lazy bytestrings from Data.ByteString.Lazy and Data.ByteString.Lazy.Char8, but most users of this module should not need to access the underlying data types directly.
- newtype Qual = Qual {}
- newtype Offset = Offset {}
- newtype SeqData = SeqData {
- unSD :: ByteString
- newtype SeqLabel = SeqLabel {
- unSL :: ByteString
- newtype QualData = QualData {
- unQD :: ByteString
- class BioSeq s where
- class BioSeq sq => BioSeqQual sq where
- toFasta :: BioSeq s => s -> ByteString
- toFastaQual :: BioSeqQual s => s -> ByteString
- toFastQ :: BioSeqQual s => s -> ByteString
- module Data.Stringable
Data definitions
A quality value is in the range 0..255.
An Offset is a zero-based index into a sequence.
Sequence data are lazy bytestrings of ASCII characters.
Sequence labels are likewise lazy bytestrings of ASCII characters.
Quality data are lazy bytestrings of Qual values.
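As a rough illustration of the definitions above, the wrappers can be sketched with local stand-in types. The primed names, the Word8 and Int64 representations, and the helpers residueAt and quals are assumptions made for this example and are not part of the module; only the bytestring functions are real library calls.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.Int (Int64)
import Data.Word (Word8)

-- Hypothetical stand-ins for the module's newtypes (representations assumed).
newtype Qual'     = Qual' Word8             deriving (Show, Eq, Ord) -- fits 0..255
newtype Offset'   = Offset' Int64           deriving (Show, Eq, Ord) -- zero-based
newtype SeqData'  = SeqData' LC.ByteString  deriving (Show, Eq)
newtype QualData' = QualData' L.ByteString  deriving (Show, Eq)

-- | Look up the residue at a zero-based offset, if the offset is in range.
residueAt :: SeqData' -> Offset' -> Maybe Char
residueAt (SeqData' s) (Offset' i)
  | i >= 0 && i < LC.length s = Just (LC.index s i)
  | otherwise                 = Nothing

-- | Unpack quality data into individual quality values.
quals :: QualData' -> [Qual']
quals (QualData' q) = map Qual' (L.unpack q)
```

Keeping each kind of bytestring behind its own newtype means a label cannot be passed where sequence data is expected, even though both share the same underlying representation.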
Class definitions
The BioSeq class models sequence data, and any data object that represents a biological sequence should implement it.
seqid :: s -> SeqLabel
Sequence identifier (typically the first word of the header).
seqheader :: s -> SeqLabel
Sequence header (may contain whitespace); by convention the first word matches the sequence identifier.
seqdata :: s -> SeqData
Sequence data.
seqlength :: s -> Offset
Sequence length.
seqlabel :: s -> SeqLabel
Deprecated: 'seqlabel' is deprecated, use 'seqid' or 'seqheader' instead.
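To make the shape of an instance concrete, here is a minimal sketch that defines a simplified local version of the class and implements it for a hypothetical FastaRecord type. The class below is a stand-in written for this example, not the library's definition; the method names seqdata and seqlength, the default for seqlength, and the record fields are assumptions.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.Char (isSpace)
import Data.Int (Int64)

-- Local stand-ins for the wrapper types (Offset representation assumed).
newtype SeqLabel = SeqLabel { unSL :: LC.ByteString } deriving (Show, Eq)
newtype SeqData  = SeqData  { unSD :: LC.ByteString } deriving (Show, Eq)
newtype Offset   = Offset Int64                       deriving (Show, Eq)

-- Simplified local class mirroring the documented methods.
class BioSeq s where
  seqid     :: s -> SeqLabel
  seqheader :: s -> SeqLabel
  seqdata   :: s -> SeqData
  seqlength :: s -> Offset
  -- Assumed default: derive the length from the sequence data.
  seqlength = Offset . LC.length . unSD . seqdata

-- Hypothetical record for a sequence read from a Fasta file.
data FastaRecord = FastaRecord
  { frHeader :: LC.ByteString
  , frBases  :: LC.ByteString
  }

instance BioSeq FastaRecord where
  seqheader = SeqLabel . frHeader
  -- The identifier is the first whitespace-delimited word of the header.
  seqid     = SeqLabel . LC.takeWhile (not . isSpace) . frHeader
  seqdata   = SeqData . frBases
```

With this instance, a record whose header is "read1 sample A" has the identifier "read1", following the convention that the first word of the header is the sequence ID.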
class BioSeq sq => BioSeqQual sq where
The BioSeqQual class extends BioSeq with quality data. Any corresponding data object should be an instance; this allows output of Fasta-formatted quality data (via toFastaQual), as well as the combined FastQ format (via toFastQ).
Helper functions
toFasta :: BioSeq s => s -> ByteString
Any BioSeq can be formatted as Fasta, with 60-character lines.
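The 60-character layout can be sketched as follows. This is an illustration working directly on lazy bytestrings, not the module's implementation; wrapLines and fastaSketch are hypothetical names, and the trailing newline after the last line is an assumption of this sketch.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.Int (Int64)

-- | Split a bytestring into chunks of at most n characters (n must be > 0).
wrapLines :: Int64 -> LC.ByteString -> [LC.ByteString]
wrapLines n s
  | LC.null s = []
  | otherwise = let (h, t) = LC.splitAt n s in h : wrapLines n t

-- | Hypothetical formatter: a '>' header line, then the sequence
--   broken into 60-character lines.
fastaSketch :: LC.ByteString -> LC.ByteString -> LC.ByteString
fastaSketch header sq = LC.unlines (LC.cons '>' header : wrapLines 60 sq)
```

A 130-residue sequence would come out as one header line followed by lines of 60, 60, and 10 characters.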
toFastaQual :: BioSeqQual s => s -> ByteString
Output Fasta-formatted quality data (.qual files), where quality values are output as whitespace-separated integers.
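A minimal sketch of the integer rendering, assuming each quality value is stored as one byte: qualLine and qualSketch are hypothetical helpers, and this version writes all values on a single line, which may differ from how the module wraps its output.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC

-- | Render raw quality bytes as space-separated decimal integers.
qualLine :: L.ByteString -> LC.ByteString
qualLine = LC.unwords . map (LC.pack . show) . L.unpack

-- | Hypothetical formatter: a '>' header line followed by the quality values.
qualSketch :: LC.ByteString -> L.ByteString -> LC.ByteString
qualSketch header qs = LC.unlines [LC.cons '>' header, qualLine qs]
```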
toFastQ :: BioSeqQual s => s -> ByteString
Output FastQ-formatted data. For simplicity, only the Sanger quality format is supported, and only four lines per sequence (i.e. no line breaks in sequence or quality data).
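The four-line Sanger layout can be sketched like this. In the Sanger format each quality value is written as the ASCII character with code quality + 33. The names sangerEncode and fastqSketch are hypothetical, and clamping values to 93 (so the output stays within printable ASCII) is a choice made for this sketch rather than documented behaviour of the module.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC

-- | Encode raw quality bytes as Sanger characters (quality + 33),
--   clamping at 93 so the result never exceeds '~'.
sangerEncode :: L.ByteString -> LC.ByteString
sangerEncode = L.map ((+ 33) . min 93)

-- | Hypothetical formatter: exactly four lines per record, with no
--   line breaks inside the sequence or quality strings.
fastqSketch :: LC.ByteString -> LC.ByteString -> L.ByteString -> LC.ByteString
fastqSketch header sq qs =
  LC.unlines [LC.cons '@' header, sq, LC.pack "+", sangerEncode qs]
```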
module Data.Stringable