biohazard-0.6.5: bioinformatics support library

Safe Haskell: None
Language: Haskell98

Bio.Align

Documentation

data Mode

Mode argument for myersAlign; determines where free gaps are allowed.

Constructors

Globally

align globally, with no free gaps at either end

HasPrefix

align so that the second sequence is a prefix of the first

IsPrefix

align so that the first sequence is a prefix of the second

myersAlign :: Int -> ByteString -> Mode -> ByteString -> (Int, ByteString, ByteString)

Align two strings. myersAlign maxd seqA mode seqB tries to align seqA to seqB, which succeeds as long as the alignment incurs no more than maxd gaps or mismatches. The mode argument determines whether either of the sequences is allowed to have an overhanging tail.
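To make the effect of the mode concrete, here is a minimal sketch that computes only the distance, using the textbook quadratic edit-distance table rather than Myers' algorithm. It is not the library's implementation: the local Mode type is a stand-in for the one documented above, plain String replaces ByteString, characters are compared by simple equality, the maxd bound is left out, and the reading of each mode as "which sequence may keep a free tail" is an assumption based on the constructor descriptions.

```haskell
-- Local stand-in for the library's Mode type, so the sketch is self-contained.
data Mode = Globally | HasPrefix | IsPrefix

-- Compute one row of the edit-distance table from the previous row.
-- The previous row belongs to a prefix of seqA; `a` is the next character.
nextRow :: String -> [Int] -> Char -> [Int]
nextRow _    []              _ = []
nextRow seqB prev@(p0 : ps) a = scanl step (p0 + 1) (zip3 seqB prev ps)
  where
    -- left: cell to the left, diag: upper-left cell, up: cell above
    step left (b, diag, up) = minimum [diag + cost, up + 1, left + 1]
      where cost = if a == b then 0 else 1

-- All rows of the table. Row i, column j holds the distance between the
-- first i characters of seqA and the first j characters of seqB.
table :: String -> String -> [[Int]]
table seqA seqB = scanl (nextRow seqB) [0 .. length seqB] seqA

-- Distance under each mode (assumed interpretation, see lead-in).
distance :: Mode -> String -> String -> Int
distance mode seqA seqB = case mode of
    Globally  -> last (last rows)        -- both sequences consumed entirely
    HasPrefix -> minimum (map last rows) -- seqB consumed, tail of seqA is free
    IsPrefix  -> minimum (last rows)     -- seqA consumed, tail of seqB is free
  where
    rows = table seqA seqB
```

Under these assumptions, distance Globally "ACGTTT" "ACGT" charges for the two trailing characters, while distance HasPrefix "ACGTTT" "ACGT" treats them as a free overhang.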

The result is a triple of the actual distance (gaps + mismatches) and the two padded sequences, which are the original sequences with dashes inserted for gaps.
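The relationship between the padded sequences and the inputs can be illustrated with two small helpers. Both names are hypothetical and not part of Bio.Align; the sketch works on String instead of ByteString and, as a simplification, counts mismatches by plain character equality.

```haskell
-- Remove the gap characters to recover the original sequence.
unpad :: String -> String
unpad = filter (/= '-')

-- Count the columns that contribute to the distance: a gap on either side
-- or two letters that differ. Assumes both padded sequences have equal length.
columnDistance :: String -> String -> Int
columnDistance padA padB = length (filter id (zipWith (/=) padA padB))
```

For the padded pair "ACGTA" and "AC-TA", unpad recovers "ACGTA" and "ACTA", and columnDistance reports the single gap column.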

The algorithm is the O(nd) algorithm by Myers, implemented in C. A gap and a mismatch score the same. The strings are expected to encode DNA; the code understands IUPAC-IUB ambiguity codes. Two characters match iff there is at least one nucleotide both can code for. Note that N is a wildcard, while X matches nothing.
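The matching rule can be sketched by mapping each code letter to the set of nucleotides it stands for and testing whether two sets intersect. This illustrates the rule only and is not the library's C code: the names codeTable, nucleotides and matches are hypothetical, and the sketch assumes upper-case input using the standard IUPAC letters.

```haskell
import Data.Bits ((.&.))
import Data.Maybe (fromMaybe)

-- Bit set of nucleotides per code letter: A = 1, C = 2, G = 4, T = 8.
codeTable :: [(Char, Int)]
codeTable =
  [ ('A', 1), ('C', 2), ('G', 4), ('T', 8)
  , ('M', 3), ('R', 5), ('W', 9), ('S', 6), ('Y', 10), ('K', 12)
  , ('V', 7), ('H', 11), ('D', 13), ('B', 14)
  , ('N', 15)  -- wildcard: any nucleotide
  , ('X', 0)   -- codes for no nucleotide, so it matches nothing
  ]

-- Letters outside the table are treated like X (an assumption).
nucleotides :: Char -> Int
nucleotides c = fromMaybe 0 (lookup c codeTable)

-- Two characters match iff at least one nucleotide is shared.
matches :: Char -> Char -> Bool
matches x y = nucleotides x .&. nucleotides y /= 0
```

With this encoding, R (A or G) matches S (G or C) because both can code for G, N matches every letter except X, and X does not even match itself.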

showAligned :: Int -> [ByteString] -> [ByteString]

Nicely print an alignment. An alignment is simply a list of strings with inserted gaps to make them align. We split them into manageable chunks, stack them vertically and add a line showing asterisks in every column where all aligned strings agree. The result is almost the Clustal format.
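The layout described above can be sketched as follows. This is an illustration, not the code behind showAligned: it works on String, the names chunksOf, agreement and layout are hypothetical, the Int argument is assumed to be the chunk width, all rows are assumed to have equal length, and the empty line between blocks is a guess at the formatting.

```haskell
import Data.List (transpose)

-- Split a list into pieces of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0 || null xs = []
  | otherwise         = take n xs : chunksOf n (drop n xs)

-- One marker per column: '*' where every row has the same character.
agreement :: [String] -> String
agreement rows = map mark (transpose rows)
  where
    mark []       = ' '
    mark (c : cs) = if all (== c) cs then '*' else ' '

-- Cut every row into chunks of `width` columns, stack the corresponding
-- chunks, and follow each stack with its agreement line and an empty line.
layout :: Int -> [String] -> [String]
layout width rows = concatMap block (transpose (map (chunksOf width) rows))
  where
    block chunk = chunk ++ [agreement chunk, ""]
```

For example, layout 4 ["ACGTAC", "AC-TAG"] produces two blocks: the first stacks "ACGT" over "AC-T" with the marker line "** *", and the second stacks "AC" over "AG" with the marker line "* ".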