biohazard-1.1.1: bioinformatics support library

Safe HaskellSafe
LanguageHaskell2010

Bio.Util.Numeric

Description

Random useful stuff I didn't know where to put.

Synopsis

Documentation

wilson :: Double -> Int -> Int -> (Double, Double, Double) Source #

Calculates the Wilson Score interval. If (l,m,h) = wilson c x n, then m is the binary proportion and (l,h) it's c-confidence interval for x positive examples out of n observations. c is typically something like 0.05.

invnormcdf :: (Ord a, Floating a) => a -> a Source #

choose :: Integral a => a -> a -> a Source #

Binomial coefficient: \( \mbox{choose n k} = \frac{n!}{(n-k)! k!} \)

estimateComplexity :: (Integral a, Floating b, Ord b) => a -> a -> Maybe b Source #

Try to estimate complexity of a whole from a sample. Suppose we sampled total things and among those singles occured only once. How many different things are there?

Let the total number be m. The copy number follows a Poisson distribution with paramter lambda. Let \( z := e^{\lambda} \), then we have:

\[ P( 0 ) = e^{-\lambda} = \frac{1}{z} \\ P( 1 ) = \lambda e^{-\lambda} = \frac{\ln z}{z} \\ P(\ge 1) = 1 - e^{-\lambda} = 1 - \frac{1}{z} \\ \] \[ \mbox{singles} = m \frac{\ln z}{z} \\ \mbox{total} = m \left( 1 - \frac{1}{z} \right) \\ \] \[ D := \frac{\mbox{total}}{\mbox{singles}} = (1 - \frac{1}{z}) * \frac{z}{\ln z} \\ f := z - 1 - D \ln z = 0 \]

To get z, we solve using Newton iteration and then substitute to get m:

\[ df/dz = 1 - D/z \\ z' = z - \frac{ z (z - 1 - D \ln z) }{ z - D } \\ m = \mbox{singles} * \frac{z}{\ln z} \]

It converges as long as the initial z is large enough, and 10D (in the line for zz below) appears to work well.

showNum :: Show a => a -> String Source #

log1p :: (Floating a, Ord a) => a -> a Source #

Computes log (1+x) to a relative precision of 10^-8 even for very small x. Stolen from http://www.johndcook.com/cpp_log_one_plus_x.html

expm1 :: (Floating a, Ord a) => a -> a Source #

Computes \( e^x - 1 \) to a relative precision of 10^-10 even for very small x. Stolen from http://www.johndcook.com/cpp_expm1.html

(<#>) :: (Floating a, Ord a) => a -> a -> a infixl 5 Source #

Computes \( \ln \left( e^x + e^y \right) \) without leaving the log domain and hence without losing precision.

log1mexp :: (Floating a, Ord a) => a -> a Source #

Computes \( \ln (1 - e^x) \), following Martin Mächler.

log1pexp :: (Floating a, Ord a) => a -> a Source #

Computes \( \ln (1 + e^x) \), following Martin Mächler.

lsum :: (Floating a, Ord a) => [a] -> a Source #

Computes \( \ln ( \sum_i e^{x_i} ) \) sensibly. The list must be sorted in descending(!) order.

llerp :: (Floating a, Ord a) => a -> a -> a -> a Source #

Computes \( \ln \left( c e^x + (1-c) e^y \right) \).