biohazard-0.6.15: bioinformatics support library

Safe HaskellNone
LanguageHaskell2010

Bio.Base

Description

Common data types used everywhere. This module is a collection of very basic "bioinformatics" data types that are simple, but don't make sense to define over and over.

Synopsis

Documentation

newtype Nucleotide Source

A nucleotide base. We only represent A,C,G,T. The contained Word8 ist guaranteed to be 0..3.

Constructors

N 

Fields

unN :: Word8
 

Instances

Bounded Nucleotide Source 
Enum Nucleotide Source 
Eq Nucleotide Source 
Ord Nucleotide Source 
Read Nucleotide Source 
Show Nucleotide Source 
Ix Nucleotide Source 
Storable Nucleotide Source 
Unbox Nucleotide Source 
MVector MVector Nucleotide Source 
Vector Vector Nucleotide Source 
data Vector Nucleotide = V_Nucleotide (Vector Word8) Source 
data MVector s Nucleotide = MV_Nucleotide (MVector s Word8) Source 

newtype Nucleotides Source

A nucleotide base in an alignment. Experience says we're dealing with Ns and gaps all the type, so purity be damned, they are included as if they were real bases.

To allow Nucleotidess to be unpacked and incorporated into containers, we choose to represent them the same way as the BAM file format: as a 4 bit wide field. Gaps are encoded as 0 where they make sense, N is 15. The contained Word8 is guaranteed to be 0..15.

Constructors

Ns 

Fields

unNs :: Word8
 

newtype Qual Source

Qualities are stored in deciban, also known as the Phred scale. To represent a value p, we store -10 * log_10 p. Operations work directly on the "Phred" value, as the name suggests. The same goes for the Ord instance: greater quality means higher "Phred" score, meand lower error probability.

Constructors

Q 

Fields

unQ :: Word8
 

Instances

Bounded Qual Source 
Eq Qual Source 
Ord Qual Source 
Show Qual Source 
Storable Qual Source 
Unbox Qual Source 
MVector MVector Qual Source 
Vector Vector Qual Source 
data Vector Qual = V_Qual (Vector Word8) Source 
data MVector s Qual = MV_Qual (MVector s Word8) Source 

toQual :: (Floating a, RealFrac a) => a -> Qual Source

fromQual :: Qual -> Double Source

fromQualRaised :: Double -> Qual -> Double Source

probToQual :: (Floating a, RealFrac a) => Prob' a -> Qual Source

newtype Prob' a Source

A positive floating point value stored in log domain. We store the natural logarithm (makes computation easier), but allow conversions to the familiar "Phred" scale used for Qual values.

Constructors

Pr 

Fields

unPr :: a
 

Instances

Unbox a0 => MVector MVector (Prob' a) Source 
Unbox a0 => Vector Vector (Prob' a) Source 
Eq a => Eq (Prob' a) Source 
(Floating a, Fractional a, Ord a) => Fractional (Prob' a) Source 
(Floating a, Ord a) => Num (Prob' a) Source 
Ord a => Ord (Prob' a) Source 
RealFloat a => Show (Prob' a) Source 
Storable a => Storable (Prob' a) Source 
Unbox a0 => Unbox (Prob' a) Source 
data MVector s (Prob' a0) = MV_Prob' (MVector s a) Source 
data Vector (Prob' a0) = V_Prob' (Vector a) Source 

type Prob = Prob' Double Source

Common way of using Prob'.

toProb :: Floating a => a -> Prob' a Source

fromProb :: Floating a => Prob' a -> a Source

qualToProb :: Floating a => Qual -> Prob' a Source

pow :: Num a => Prob' a -> a -> Prob' a infixr 8 Source

data Word8 :: *

Instances

Bounded Word8 
Enum Word8 
Eq Word8 
Integral Word8 
Data Word8 
Num Word8 
Ord Word8 
Read Word8 
Real Word8 
Show Word8 
Ix Word8 
Storable Word8 
Bits Word8 
FiniteBits Word8 
Unbox Word8 
Prim Word8 
Lift Word8 
Hashable Word8 
Random Word8 
MVector MVector Word8 
Vector Vector Word8 
data Vector Word8 = V_Word8 (Vector Word8) 
data MVector s Word8 = MV_Word8 (MVector s Word8) 

toNucleotide :: Char -> Nucleotide Source

Converts a character into a Nucleotides. The usual codes for A,C,G,T and U are understood, - and . become gaps and everything else is an N.

toNucleotides :: Char -> Nucleotides Source

Converts a character into a Nucleotides. The usual codes for A,C,G,T and U are understood, - and . become gaps and everything else is an N.

isGap :: Nucleotides -> Bool Source

Tests if a Nucleotides is a gap. Returns true only for the gap.

isBase :: Nucleotides -> Bool Source

Tests if a Nucleotides is a base. Returns True for everything but gaps.

isProperBase :: Nucleotides -> Bool Source

Tests if a Nucleotides is a proper base. Returns True for A,C,G,T only.

compl :: Nucleotide -> Nucleotide Source

Complements a Nucleotides.

compls :: Nucleotides -> Nucleotides Source

Complements a Nucleotides.

type Seqid = ByteString Source

Sequence identifiers are ASCII strings Since we tend to store them for a while, we use strict byte strings.

data Position Source

Coordinates in a genome. The position is zero-based, no questions about it. Think of the position as pointing to the crack between two bases: looking forward you see the next base to the right, looking in the reverse direction you see the complement of the first base to the left.

To encode the strand, we (virtually) reverse-complement any sequence and prepend it to the normal one. That way, reversed coordinates have a negative sign and automatically make sense. Position 0 could either be the beginning of the sequence or the end on the reverse strand... that ambiguity shouldn't really matter.

Constructors

Pos 

Fields

p_seq :: !Seqid

sequence (e.g. some chromosome)

p_start :: !Int

offset, zero-based

Instances

shiftPosition :: Int -> Position -> Position Source

Moves a Position. The position is moved forward according to the strand, negative indexes move backward accordingly.

data Range Source

Ranges in genomes We combine a position with a length. In 'Range pos len', pos is always the start of a stretch of length len. Positions therefore move in the opposite direction on the reverse strand. To get the same stretch on the reverse strand, shift r_pos by r_length, then reverse direction (or call reverseRange).

Constructors

Range 

Fields

r_pos :: !Position
 
r_length :: !Int
 

Instances

shiftRange :: Int -> Range -> Range Source

Moves a Range. This is just shiftPosition lifted.

reverseRange :: Range -> Range Source

Reverses a Range to give the same Range on the opposite strand.

extendRange :: Int -> Range -> Range Source

Extends a range. The length of the range is simply increased.

insideRange :: Range -> Range -> Range Source

Expands a subrange. (range1 insideRange range2) interprets range1 as a subrange of range2 and computes its absolute coordinates. The sequence name of range1 is ignored.

wrapRange :: Int -> Range -> Range Source

Wraps a range to a region. This simply normalizes the start position to be in the interval '[0,n)', which only makes sense if the Range is to be mapped onto a circular genome. This works on both strands and the strand information is retained.

w2c :: Word8 -> Char

c2w :: Char -> Word8

findAuxFile :: FilePath -> IO FilePath Source

Finds a file by searching the environment variable BIOHAZARD like a PATH.