elynx-seq-0.0.1: Handle molecular sequences

Copyright: (c) Dominik Schrempf 2018
License: GPL-3
Maintainer: dominik.schrempf@gmail.com
Stability: unstable
Portability: portable
Safe Haskell: None
Language: Haskell2010

ELynx.Data.Sequence.MultiSequenceAlignment

Description

Creation date: Thu Oct 4 18:40:18 2018.

Documentation

msaNSequences :: MultiSequenceAlignment -> Int

Number of sequences.

  • Input, output

summarizeMSA :: MultiSequenceAlignment -> ByteString

Similar to summarizeSequenceList, but with a different header.

  • Manipulation

msaJoin :: MultiSequenceAlignment -> MultiSequenceAlignment -> MultiSequenceAlignment

Join two MultiSequenceAlignments vertically. That is, add more sequences to an alignment. See also msaConcatenate.

msaConcatenate :: MultiSequenceAlignment -> MultiSequenceAlignment -> MultiSequenceAlignment

Concatenate two MultiSequenceAlignments horizontally. That is, add more sites to an alignment. See also msaJoin.
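
The difference between joining and concatenating can be sketched on a toy model. The Alignment type and the names joinRows and concatSites below are assumptions made for illustration (one String per sequence, all of equal length); they are not the library's MultiSequenceAlignment type or its implementation.

```haskell
-- Toy model (an assumption for illustration): one String per sequence,
-- all of equal length. This is not the library's MultiSequenceAlignment.
type Alignment = [String]

-- Vertical join: more sequences, same number of sites.
joinRows :: Alignment -> Alignment -> Alignment
joinRows = (++)

-- Horizontal concatenation: same sequences, more sites.
-- Rows are paired by position; surplus rows on either side are dropped.
concatSites :: Alignment -> Alignment -> Alignment
concatSites = zipWith (++)
```

In this sketch, joinRows ["AC", "GT"] ["TT"] gives three sequences of two sites each, while concatSites ["AC", "GT"] ["A", "C"] gives two sequences of three sites each. The toy functions do not check that the alignments have compatible dimensions.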

filterColumnsOnlyStd :: MultiSequenceAlignment -> MultiSequenceAlignment

Only keep columns with standard characters. Alignment columns with IUPAC characters are removed.

filterColumnsStd :: Double -> MultiSequenceAlignment -> MultiSequenceAlignment

Only keep columns in which the proportion of standard characters is larger than the given number.

filterColumnsNoGaps :: MultiSequenceAlignment -> MultiSequenceAlignment

Only keep columns without gaps or unknown characters.
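
The three column filters share one pattern: transpose the alignment, keep the columns that satisfy a predicate, and transpose back. The following is a minimal sketch of that pattern on a toy list-of-strings model. The names (filterColumns, onlyStd, stdAbove, noGaps) and the DNA conventions (ACGT as standard characters, '-' as gap, 'N' as unknown) are assumptions for illustration, not the library's definitions.

```haskell
import Data.List (transpose)

-- Toy model (an assumption): one String per sequence, all of equal length.
type Alignment = [String]

-- Assumed DNA conventions for this sketch only.
isStd :: Char -> Bool
isStd c = c `elem` "ACGT"

isGapOrUnknown :: Char -> Bool
isGapOrUnknown c = c `elem` "-N"

-- Keep the columns that satisfy the predicate.
filterColumns :: (String -> Bool) -> Alignment -> Alignment
filterColumns p = transpose . filter p . transpose

-- Keep columns consisting of standard characters only.
onlyStd :: Alignment -> Alignment
onlyStd = filterColumns (all isStd)

-- Keep columns whose proportion of standard characters exceeds the threshold.
stdAbove :: Double -> Alignment -> Alignment
stdAbove t = filterColumns (\col -> proportion col > t)
  where
    proportion col =
      fromIntegral (length (filter isStd col)) / fromIntegral (length col)

-- Keep columns without gaps or unknown characters.
noGaps :: Alignment -> Alignment
noGaps = filterColumns (not . any isGapOrUnknown)
```

One limitation of the list model: if every column is removed, the result is the empty list rather than a list of empty sequences.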

  • Analysis

type FrequencyData = Matrix Double

Frequency data; do not store the actual characters, but only their frequencies.

toFrequencyData :: MultiSequenceAlignment -> FrequencyData

Calculate the frequency of characters in a multi sequence alignment.
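
As a rough illustration of what such frequency data might contain, the sketch below computes per-site relative character frequencies on a toy list-of-strings model. The function name columnFrequencies, the explicit alphabet argument, and the orientation of the result (one row per site) are assumptions; the source does not state how the library lays out its Matrix.

```haskell
import Data.List (transpose)

-- For each site (column), the relative frequency of each alphabet character.
-- Result: one list per site, with one entry per alphabet character, in order.
-- Characters outside the alphabet are not counted, but they still contribute
-- to the column length used as the denominator.
columnFrequencies :: String -> [String] -> [[Double]]
columnFrequencies alphabet rows = map freqs (transpose rows)
  where
    freqs col =
      [ fromIntegral (count c col) / fromIntegral (length col) | c <- alphabet ]
    count c = length . filter (== c)
```

For example, with the alphabet "ACGT" and the sequences "AC" and "AG", the first site is entirely A and the second site is split evenly between C and G.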

kEffEntropy :: FrequencyData -> [Double]

Diversity analysis: effective number of characters per alignment column, measured using entropy.

kEffHomoplasy :: FrequencyData -> [Double]

Diversity analysis. See kEffEntropy.
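
The documentation does not spell out the formulas behind these diversity measures. The sketch below uses two common definitions of an effective number of characters for a single frequency vector, and it is an assumption that the library follows them: the exponential of the Shannon entropy, and the reciprocal of the homoplasy (the sum of squared frequencies). The names effEntropy and effHomoplasy are hypothetical.

```haskell
-- Shannon entropy (natural logarithm) of a frequency vector.
-- Zero frequencies are skipped, since p * log p tends to 0 as p tends to 0.
entropy :: [Double] -> Double
entropy ps = negate (sum [ p * log p | p <- ps, p > 0 ])

-- Assumed definition: effective number of characters as exp(entropy).
effEntropy :: [Double] -> Double
effEntropy = exp . entropy

-- Assumed definition: effective number of characters as 1 / sum of squares.
effHomoplasy :: [Double] -> Double
effHomoplasy ps = 1 / sum [ p * p | p <- ps ]
```

Under these definitions, a column with four equally frequent characters has an effective number of 4 by both measures, and a column with a single character has an effective number of 1.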

countIUPACChars :: MultiSequenceAlignment -> Int

Count the number of standard (i.e., not extended IUPAC) characters in the alignment.

countGaps :: MultiSequenceAlignment -> Int

Count the number of gaps in the alignment.

countUnknowns :: MultiSequenceAlignment -> Int

Count the number of unknown characters in the alignment.
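
The counting functions can be pictured as a single fold over all characters of the alignment with a different predicate each time. This is a minimal sketch on a toy list-of-strings model; the names countWhere, countGapChars and countUnknownChars, and the choice of '-' as gap and 'N' as unknown character, are assumptions for illustration.

```haskell
-- Toy model (an assumption): one String per sequence.
type Alignment = [String]

-- Count all characters in the alignment that satisfy the predicate.
countWhere :: (Char -> Bool) -> Alignment -> Int
countWhere p = length . filter p . concat

-- Assumed conventions: '-' marks a gap, 'N' marks an unknown character.
countGapChars :: Alignment -> Int
countGapChars = countWhere (== '-')

countUnknownChars :: Alignment -> Int
countUnknownChars = countWhere (== 'N')
```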

  • Sub sample

subSample :: [Int] -> MultiSequenceAlignment -> MultiSequenceAlignment

Sample the given sites from a multi sequence alignment.
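
Sampling sites by index can be sketched as follows on a toy list-of-strings model. The name sampleSites is hypothetical, and the zero-based indexing and the handling of repeated indices are assumptions of this sketch, not documented behavior of the library.

```haskell
-- Toy model (an assumption): one String per sequence, all of equal length.
type Alignment = [String]

-- Pick the sites at the given zero-based indices, in the given order.
-- A repeated index repeats the site; an out-of-range index is an error.
sampleSites :: [Int] -> Alignment -> Alignment
sampleSites is = map (\row -> [ row !! i | i <- is ])
```

For example, sampleSites [2, 0, 0] ["ACG", "TTA"] yields ["GAA", "ATT"]. List indexing is linear in the index, so a real implementation would more likely use a matrix or vector type.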

randomSubSample :: PrimMonad m => Int -> MultiSequenceAlignment -> Gen (PrimState m) -> m MultiSequenceAlignment

Randomly sample a given number of sites of the multi sequence alignment.