gsc-weighting-0.2.2: Generic implementation of Gerstein/Sonnhammer/Chothia weighting.

Safe Haskell: None
Language: Haskell98

Data.Weighting.GSC

Documentation

gsc :: Dendrogram a -> Dendrogram (a, Distance)

O(n^2). Calculates the Gerstein/Sonnhammer/Chothia weights for all elements of a dendrogram. The weights are attached to the leaves of the dendrogram, while the distances in the branches are left unchanged.

Branch distances d should be non-increasing from the root towards the leaves and lie between 0 (at the leaves) and 1. The final weights are normalized to average 1, i.e. they sum to the number of sequences, just as they would if every weight were 1.

For example, suppose we have

dendro = Branch 0.8
           (Branch 0.5
             (Branch 0.2
               (Leaf A)
               (Leaf B))
             (Leaf C))
           (Leaf D)

This is the same example as in the GSC paper, except that they use similarities where we use distances (i.e. applying (1-) to the distances gives exactly their example). Then gsc dendro is

gsc dendro == Branch 0.8
                (Branch 0.5
                  (Branch 0.2
                    (Leaf (A,0.7608695652173914))
                    (Leaf (B,0.7608695652173914)))
                  (Leaf (C,1.0869565217391306)))
                (Leaf (D,1.3913043478260871))

which is exactly what they calculated.
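
The weights in this example can be reproduced with a short bottom-up pass over the tree. The following is only a minimal sketch of one reading of the GSC scheme, not this package's actual source: the Dendrogram and Distance types are local stand-ins defined so the sketch is self-contained, and the names gscSketch, rawWeights, spread, mapW, total, size, height and leaves are hypothetical helpers. Each subtree hangs from its parent by an edge of length (parent distance - own distance); that length is shared among the leaves below in proportion to their current weights, and the result is finally rescaled to average 1.

```haskell
module Main where

-- Local stand-ins (assumptions), defined only to keep the sketch self-contained.
type Distance = Double

data Dendrogram a
  = Leaf a
  | Branch Distance (Dendrogram a) (Dendrogram a)
  deriving (Show, Eq)

-- Distance stored at the root of a subtree; leaves sit at 0.
height :: Dendrogram a -> Distance
height (Leaf _)       = 0
height (Branch d _ _) = d

-- Number of leaves in a subtree.
size :: Dendrogram a -> Int
size (Leaf _)       = 1
size (Branch _ l r) = size l + size r

-- Sum of the weights currently attached to the leaves.
total :: Dendrogram (a, Distance) -> Distance
total (Leaf (_, w))  = w
total (Branch _ l r) = total l + total r

-- Apply a function to every leaf weight.
mapW :: (Distance -> Distance) -> Dendrogram (a, Distance) -> Dendrogram (a, Distance)
mapW f (Leaf (x, w))  = Leaf (x, f w)
mapW f (Branch d l r) = Branch d (mapW f l) (mapW f r)

-- Share an edge of length e among the leaves of a subtree, in proportion
-- to their current weights (evenly if all weights are still zero).
spread :: Distance -> Dendrogram (a, Distance) -> Dendrogram (a, Distance)
spread e t
  | s > 0     = mapW (\w -> w + e * w / s) t
  | otherwise = mapW (\w -> w + e / n) t
  where
    s = total t
    n = fromIntegral (size t)

-- Unnormalized weights, computed bottom-up.
rawWeights :: Dendrogram a -> Dendrogram (a, Distance)
rawWeights (Leaf x)       = Leaf (x, 0)
rawWeights (Branch d l r) = Branch d (up l) (up r)
  where up t = spread (d - height t) (rawWeights t)

-- Rescale so that the weights average to 1.
gscSketch :: Dendrogram a -> Dendrogram (a, Distance)
gscSketch t
  | s > 0     = mapW (* (n / s)) r
  | otherwise = mapW (const 1) r
  where
    r = rawWeights t
    s = total r
    n = fromIntegral (size r)

-- Leaf annotations, left to right.
leaves :: Dendrogram a -> [a]
leaves (Leaf x)       = [x]
leaves (Branch _ l r) = leaves l ++ leaves r

-- The example dendrogram from above, with strings as labels.
dendro :: Dendrogram String
dendro =
  Branch 0.8
    (Branch 0.5
      (Branch 0.2 (Leaf "A") (Leaf "B"))
      (Leaf "C"))
    (Leaf "D")

main :: IO ()
main = mapM_ print (leaves (gscSketch dendro))
```

Tracing the sketch by hand on the example: A and B each start with 0.2, then each gains 0.15 from the edge of length 0.3 above them (0.35 each), while C receives 0.5. The next edge of length 0.3 is split over A, B and C in the ratio 0.35 : 0.35 : 0.5, giving roughly 0.4375, 0.4375 and 0.625, and D receives 0.8. These sum to about 2.3, so scaling by 4 / 2.3 gives approximately 0.761, 0.761, 1.087 and 1.391, in agreement with the values listed above up to floating-point rounding.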