biocore-0.3.1: A bioinformatics library

Safe Haskell: None

Bio.Core.Sequence

Description

This module defines common data structures for biosequences, i.e. data that represents nucleotide or protein sequences.

Basically, anything resembling or wrapping a sequence should implement the BioSeq class (and BioSeqQual if quality information is available).

The data types are mostly wrappers around lazy bytestrings from Data.ByteString.Lazy and Data.ByteString.Lazy.Char8, but most users of this module should not need to access the underlying data types directly.

Data definitions

newtype Qual

A quality value is in the range 0..255.

Constructors

Qual 

Fields

unQual :: Word8
 

newtype Offset

An Offset is a zero-based index into a sequence.

Constructors

Offset 

Fields

unOff :: Int64
 

newtype SeqData

Sequence data are lazy bytestrings of ASCII characters.

Constructors

SeqData 

Fields

unSD :: ByteString
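
As an illustration of how Offset and SeqData fit together, here is a minimal sketch. The two newtypes are redeclared locally (mirroring the declarations on this page) so that the code compiles with only base and bytestring; real code would import them from Bio.Core.Sequence. The names residueAt, sdLength and example are hypothetical helpers, not part of the library.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BC
import Data.Int (Int64)

-- Local stand-ins mirroring the documented newtypes (assumption: the
-- derived instances here are chosen for this sketch only).
newtype Offset  = Offset  { unOff :: Int64 }         deriving (Eq, Show)
newtype SeqData = SeqData { unSD  :: BC.ByteString } deriving (Eq, Show)

-- | Hypothetical helper: the residue at a zero-based offset, or Nothing
-- if the offset lies outside the sequence.
residueAt :: SeqData -> Offset -> Maybe Char
residueAt (SeqData bs) (Offset i)
  | i >= 0 && i < BC.length bs = Just (BC.index bs i)
  | otherwise                  = Nothing

-- | Hypothetical helper: the length of the sequence data as an Offset.
sdLength :: SeqData -> Offset
sdLength = Offset . BC.length . unSD

-- | A small nucleotide sequence built from an ASCII string.
example :: SeqData
example = SeqData (BC.pack "ACGTN")
```

Because offsets are zero-based, Offset 0 addresses the first residue, and the last valid offset is one less than the sequence length.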
 

newtype SeqLabel

Sequence labels are lazy bytestrings of ASCII characters.

Constructors

SeqLabel 

Fields

unSL :: ByteString
 

newtype QualData

Quality data are lazy bytestrings of Quals.

Constructors

QualData 

Fields

unQD :: ByteString
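
Since each byte of a QualData value holds one quality score, converting between a list of Quals and QualData is a matter of packing and unpacking bytes. The sketch below redeclares the two newtypes locally so that it compiles with only base and bytestring; toQualData and fromQualData are hypothetical helper names, not library functions.

```haskell
import qualified Data.ByteString.Lazy as B
import Data.Word (Word8)

-- Local stand-ins mirroring the documented newtypes (assumption: the
-- derived instances here are chosen for this sketch only).
newtype Qual     = Qual     { unQual :: Word8 }        deriving (Eq, Ord, Show)
newtype QualData = QualData { unQD   :: B.ByteString } deriving (Eq, Show)

-- | Hypothetical helper: pack a list of quality values, one byte each.
toQualData :: [Qual] -> QualData
toQualData = QualData . B.pack . map unQual

-- | Hypothetical helper: unpack quality data into individual values.
fromQualData :: QualData -> [Qual]
fromQualData = map Qual . B.unpack . unQD
```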
 

Class definitions

class BioSeq s where

The BioSeq class models sequence data, and any data object that represents a biological sequence should implement it.

Methods

seqid :: s -> SeqLabel

Sequence identifier (typically first word of the header)

seqheader :: s -> SeqLabel

Sequence header (may contain whitespace). By convention, the first word matches the seqid.

seqdata :: s -> SeqData

Sequence data

seqlength :: s -> Offset

Sequence length

seqlabel :: s -> SeqLabel

Deprecated. Instead, use seqid if you want the unique ID, or seqheader if you want the FASTA style header with ID and comments.

Deprecated: Warning: 'seqlabel' is deprecated, use 'seqid' or 'seqheader' instead.
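
To illustrate what implementing BioSeq might look like, the following sketch defines an instance for a simple record type. The class and newtypes are redeclared locally (mirroring this page, minus the deprecated seqlabel method) so that the code compiles with only base and bytestring; real code would import them from Bio.Core.Sequence. Entry and example are hypothetical names, and taking the first whitespace-delimited word as the identifier is just the convention described above, not a requirement.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BC
import Data.Char (isSpace)
import Data.Int (Int64)

-- Local stand-ins mirroring the documented types (assumption: the
-- derived instances here are chosen for this sketch only).
newtype Offset   = Offset   { unOff :: Int64 }         deriving (Eq, Show)
newtype SeqData  = SeqData  { unSD  :: BC.ByteString } deriving (Eq, Show)
newtype SeqLabel = SeqLabel { unSL  :: BC.ByteString } deriving (Eq, Show)

-- Local mirror of the class, without the deprecated seqlabel method.
class BioSeq s where
  seqid     :: s -> SeqLabel
  seqheader :: s -> SeqLabel
  seqdata   :: s -> SeqData
  seqlength :: s -> Offset

-- | Hypothetical record holding one FASTA-style entry.
data Entry = Entry
  { entryHeader :: BC.ByteString  -- full header line, without the '>'
  , entryData   :: BC.ByteString  -- residues as ASCII characters
  }

instance BioSeq Entry where
  seqheader = SeqLabel . entryHeader
  -- By convention, the identifier is the first word of the header.
  seqid     = SeqLabel . BC.takeWhile (not . isSpace) . entryHeader
  seqdata   = SeqData . entryData
  seqlength = Offset . BC.length . entryData

example :: Entry
example = Entry (BC.pack "seq1 sample from lane 3") (BC.pack "ACGTACGT")
```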

class BioSeq sq => BioSeqQual sq where

The BioSeqQual class extends BioSeq with quality data. Any corresponding data object should be an instance; this allows output as Fasta-formatted quality data (via toFastaQual) as well as in the combined FastQ format (via toFastQ).

Methods

seqqual :: sq -> QualData
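
A type with quality information needs both instances. The sketch below again uses local mirrors of the documented class and newtype declarations so that it compiles with only base and bytestring. QualEntry, example and hasQualPerResidue are hypothetical names, and the check that there is exactly one quality value per residue is an assumption of this sketch rather than something this page guarantees.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.Lazy.Char8 as BC
import Data.Char (isSpace)
import Data.Int (Int64)

-- Local stand-ins mirroring the documented types.
newtype Offset   = Offset   { unOff :: Int64 }        deriving (Eq, Show)
newtype SeqData  = SeqData  { unSD  :: B.ByteString } deriving (Eq, Show)
newtype SeqLabel = SeqLabel { unSL  :: B.ByteString } deriving (Eq, Show)
newtype QualData = QualData { unQD  :: B.ByteString } deriving (Eq, Show)

-- Local mirrors of the classes (deprecated seqlabel omitted).
class BioSeq s where
  seqid     :: s -> SeqLabel
  seqheader :: s -> SeqLabel
  seqdata   :: s -> SeqData
  seqlength :: s -> Offset

class BioSeq sq => BioSeqQual sq where
  seqqual :: sq -> QualData

-- | Hypothetical record: a sequence with one quality byte per residue.
data QualEntry = QualEntry
  { qeHeader :: B.ByteString
  , qeData   :: B.ByteString
  , qeQual   :: B.ByteString
  }

instance BioSeq QualEntry where
  seqheader = SeqLabel . qeHeader
  seqid     = SeqLabel . BC.takeWhile (not . isSpace) . qeHeader
  seqdata   = SeqData . qeData
  seqlength = Offset . B.length . qeData

instance BioSeqQual QualEntry where
  seqqual = QualData . qeQual

-- | Hypothetical sanity check (assumption: one quality per residue).
hasQualPerResidue :: BioSeqQual sq => sq -> Bool
hasQualPerResidue s = B.length (unQD (seqqual s)) == unOff (seqlength s)

example :: QualEntry
example = QualEntry (BC.pack "read1 lane 3") (BC.pack "ACGT") (B.pack [30, 30, 20, 2])
```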

Helper functions

toFasta :: BioSeq s => s -> ByteString

Any BioSeq can be formatted as Fasta, 60-char lines.
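
The 60-character wrapping can be sketched as follows. This is not the library's implementation: fastaRecord and chunksOf are hypothetical helpers that work on a plain header and sequence rather than on a BioSeq instance, and ending every line (including the last) with a newline is an assumption.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BC
import Data.Int (Int64)

-- | Split a lazy bytestring into pieces of at most n characters.
-- The width n must be positive, otherwise this would not terminate.
chunksOf :: Int64 -> BC.ByteString -> [BC.ByteString]
chunksOf n bs
  | BC.null bs = []
  | otherwise  = let (h, t) = BC.splitAt n bs in h : chunksOf n t

-- | Hypothetical stand-in for toFasta: a '>' header line followed by
-- the sequence wrapped at 60 characters per line.
fastaRecord :: BC.ByteString -> BC.ByteString -> BC.ByteString
fastaRecord header sq = BC.unlines (BC.cons '>' header : chunksOf 60 sq)
```

A 130-residue sequence, for instance, yields a header line followed by lines of 60, 60 and 10 characters.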

toFastaQual :: BioSeqQual s => s -> ByteString

Output Fasta-formatted quality data (.qual files), where quality values are output as whitespace-separated integers.
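
As a rough sketch of this output format, each quality byte is rendered as a decimal integer and the integers are joined with spaces. The helpers qualLine and qualRecord are hypothetical and operate on raw bytestrings. Unlike real .qual output, this sketch puts all values on a single line and makes no claim about how the library wraps lines.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.Lazy.Char8 as BC

-- | Render quality bytes as space-separated decimal integers.
qualLine :: B.ByteString -> BC.ByteString
qualLine = BC.unwords . map (BC.pack . show) . B.unpack

-- | Hypothetical stand-in for toFastaQual: a '>' header line followed
-- by one (unwrapped) line of quality values.
qualRecord :: BC.ByteString -> B.ByteString -> BC.ByteString
qualRecord header quals = BC.unlines [BC.cons '>' header, qualLine quals]
```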

toFastQ :: BioSeqQual s => s -> ByteString

Output FastQ-formatted data. For simplicity, only the Sanger quality format is supported, and only four lines per sequence (i.e. no line breaks in sequence or quality data).
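
The four-line layout with Sanger-encoded qualities can be sketched like this. In the Sanger format, a quality q is written as the ASCII character with code q + 33. The helpers sangerEncode and fastqRecord are hypothetical and work on raw bytestrings. Clamping qualities to 93 (so the result stays within printable ASCII) is an assumption of this sketch, not documented library behaviour.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.Lazy.Char8 as BC

-- | Sanger encoding: quality q becomes the byte q + 33. Values above 93
-- are clamped here (an assumption) to stay within printable ASCII.
sangerEncode :: B.ByteString -> B.ByteString
sangerEncode = B.map (\q -> min 93 q + 33)

-- | Hypothetical stand-in for toFastQ: exactly four lines per record,
-- with no line breaks inside the sequence or quality strings.
fastqRecord :: BC.ByteString -> BC.ByteString -> B.ByteString -> BC.ByteString
fastqRecord header sq quals =
  BC.unlines [BC.cons '@' header, sq, BC.pack "+", sangerEncode quals]
```

With qualities 0, 10, 40 and 93, the quality line reads `!+I~`.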