BiobaseInfernal-0.7.1.0: Infernal data structures and tools

Safe Haskell: None

Biobase.SElab.CM

Description

Infernal CMs.

TODO order of nucleotides? ACGU?

TODO fastCM :: CM -> FastCM to make a data structure that is suitable for high-performance applications.

Documentation

data CMVersion

Encode the CM versions we can parse

data NodeType

Encode CM node types.

Constructors

BIF 
MATP 
MATL 
MATR 
BEGL 
BEGR 
ROOT 
END 

newtype NodeID

Node IDs

Constructors

NodeID 

Fields

unNodeID :: Int
 

data StateType

Encode CM state types.

Constructors

D 
MP 
ML 
MR 
IL 
IR 
S 
E 
B 
EL 

newtype StateID

State IDs

Constructors

StateID 

Fields

unStateID :: Int
 

data Emits

Certain states (IL, IR, ML, MR) emit a single nucleotide; one state (MP) emits a pair; all other states emit nothing.

Constructors

EmitsSingle 

Fields

_single :: [(Char, BitScore)]
 
EmitsPair 

Fields

_pair :: [(Char, Char, BitScore)]
 
EmitNothing 
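
As a minimal sketch of the emission rule described for Emits, the mapping from state type to emission behaviour might look as follows. The StateType declaration is a local stand-in that mirrors the constructors listed in this module, and EmitKind and emitKind are hypothetical names that are not part of Biobase.SElab.CM.

```haskell
-- Local stand-in mirroring the StateType constructors documented in this module.
data StateType = D | MP | ML | MR | IL | IR | S | E | B | EL
  deriving (Eq, Show)

-- Hypothetical summary type; the library itself uses 'Emits'.
data EmitKind = EmitsNone | EmitsOne | EmitsTwo
  deriving (Eq, Show)

-- MP emits a pair; ML, MR, IL and IR emit one nucleotide;
-- every other state type emits nothing.
emitKind :: StateType -> EmitKind
emitKind MP = EmitsTwo
emitKind st
  | st `elem` [ML, MR, IL, IR] = EmitsOne
  | otherwise                  = EmitsNone
```

A check of this kind could be used to validate that a parsed state carries the emission constructor expected for its type.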

data State

A single state.

Constructors

State 

Fields

_stateID :: StateID

The ID of this state

_nodeID :: NodeID

the node to which this state belongs

_nodeType :: NodeType

node type for this state

_stateType :: StateType

type of the state

_transitions :: [(StateID, BitScore)]

transitions from this state, each given as a state ID and a bit score

_emits :: Emits

what this state emits, if anything

data CM

This is an Infernal covariance model. We have a number of blocks:

  • basic information like the name of the CM, accession number, etc.
  • advanced information: nodes and their states, and the states themselves.
  • unsorted information from the header / basic block

The CM data structure is not suitable for high-performance applications.

  • score inequalities: trusted (lowest seed score) >= gathering (lowest full score) >= noise (random strings)
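
The score inequalities can be expressed as a small consistency check. This is only a sketch: BitScore is redefined here as a local stand-in, cutoffsConsistent is a hypothetical helper that does not exist in the library, and treating a missing noise cutoff as acceptable is an assumption.

```haskell
-- Local stand-in for the library's BitScore type.
newtype BitScore = BitScore Double
  deriving (Eq, Ord, Show)

-- Hypothetical check for: trusted >= gathering >= noise.
-- A missing noise cutoff (Nothing) is treated as satisfying the constraint.
cutoffsConsistent :: BitScore -> BitScore -> Maybe BitScore -> Bool
cutoffsConsistent trusted gathering noise =
  trusted >= gathering && maybe True (gathering >=) noise
```

The arguments correspond, in order, to the _trustedCutoff, _gathering and _noiseCutoff fields of a CM.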

Local entries into the CM.

The localBegin lens returns a map of state IDs. We either have just the root node (with the S state), or a set of states of type MP, ML, MR, or B.

The localEnd lens, on the other hand, returns the set of possible early exits from the model.

Constructors

CM 

Fields

_name :: Identification Rfam

name of the model, e.g. tRNA

_accession :: Accession Rfam

RFxxxxx identification

_version :: CMVersion

We can parse version 1.0 and 1.1 CMs

_trustedCutoff :: BitScore

lowest score of any seed member

_gathering :: BitScore

all scores at or above gathering score are in the full alignment

_noiseCutoff :: Maybe BitScore

highest score NOT included as member

_nullModel :: Vector BitScore

Null-model: categorical distribution on ACGU

_nodes :: Map NodeID (NodeType, [StateID])

each node has a set of states

_states :: Map StateID State

each state has a type, some emit characters, and some have children

_localBegin :: Map StateID BitScore

Entries into the CM.

_localEnd :: Map StateID BitScore

Exits out of the CM.

_unsorted :: Map ByteString ByteString

all lines that are not otherwise handled. A multi-line entry is stored as a single key with the multi-line text as its value

_hmm :: Maybe HMM3
 

type ID2CM = Map (Identification Rfam) CM

Map of model names to individual CMs.

type AC2CM = Map (Accession Rfam) CM

Map of model accession numbers to individual CMs.
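
A minimal sketch of how such lookup maps might be built from a list of models. The CM record here is a simplified stand-in with String keys instead of Identification Rfam and Accession Rfam, and byName and byAccession are hypothetical helpers, not functions exported by this module.

```haskell
import qualified Data.Map as Map

-- Simplified stand-in for CM: only the two fields used as keys,
-- with plain Strings instead of 'Identification Rfam' and 'Accession Rfam'.
data CM = CM
  { cmName      :: String
  , cmAccession :: String
  } deriving (Eq, Show)

-- Index models by name (the role ID2CM plays).
byName :: [CM] -> Map.Map String CM
byName cms = Map.fromList [ (cmName cm, cm) | cm <- cms ]

-- Index models by accession (the role AC2CM plays).
byAccession :: [CM] -> Map.Map String CM
byAccession cms = Map.fromList [ (cmAccession cm, cm) | cm <- cms ]
```

Note that Map.fromList keeps the last value when a key occurs more than once, so duplicate names or accessions would silently overwrite earlier models.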

makeLocal :: Double -> Double -> CM -> CM

Give a CM local start/end behaviour, using the given pbegin and pend probabilities.

makeLocalBegin :: Double -> CM -> CM

Insert all legal local beginnings, disable root node (and root states). The pbegin probability is the total probability for local begins. The remaining 1-pbegin is the probability to start with node 1.
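
The probability bookkeeping described for makeLocalBegin can be illustrated with a small sketch. splitBegin is a hypothetical helper, and splitting pbegin uniformly over the legal local-begin states is an assumption made for illustration; the documentation above only states that pbegin is the total and that 1-pbegin remains for node 1.

```haskell
-- Hypothetical helper: given pbegin and the number of legal local-begin
-- states, return the probability of starting at node 1 together with the
-- probability assigned to each local-begin state.
-- The uniform split of pbegin is an assumption made for this sketch.
splitBegin :: Double -> Int -> (Double, [Double])
splitBegin pbegin n
  | n <= 0    = (1, [])                 -- no local begins: always start at node 1
  | otherwise = (1 - pbegin, replicate n (pbegin / fromIntegral n))
```

Whatever the actual split, the first component and the list entries should sum to 1.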

makeLocalEnd :: Double -> CM -> CM

Insert all legal local ends.