biohazard-0.6.2: bioinformatics support library

Bio.Util

Synopsis

# Documentation

wilson :: Double -> Int -> Int -> (Double, Double, Double) Source

Random useful stuff I didn't know where to put.

calculates the Wilson Score interval. If `(l,m,h) = wilson c x n`, then `m` is the binary proportion and `(l,h)` it's `c`-confidence interval for `x` positive examples out of `n` observations. `c` is typically something like 0.05.

invnormcdf :: (Ord a, Floating a) => a -> a Source

choose :: Integral a => a -> a -> a Source

Binomial coefficient: `n choose k == n! / ((n-k)! k!)`

estimateComplexity :: (Integral a, Floating b, Ord b) => a -> a -> Maybe b Source

Try to estimate complexity of a whole from a sample. Suppose we sampled `total` things and among those `singles` occured only once. How many different things are there?

Let the total number be `m`. The copy number follows a Poisson distribution with paramter `lambda`. Let `z := e^{lambda}`, then we have:

P( 0 ) = e^{-lambda} = 1/z P( 1 ) = lambda e^{-lambda} = ln z / z P(>=1) = 1 - e^{-lambda} = 1 - 1/z

singles = m ln z / z total = m (1 - 1/z)

D := totalsingles = (1 - 1z) * z / ln z f := z - 1 - D ln z = 0

To get `z`, we solve using Newton iteration and then substitute to get `m`:

dfdz = 1 - Dz z' := z - z (z - 1 - D ln z) / (z - D) m = singles * z /log z

It converges as long as the initial `z` is large enough, and `10D` (in the line for `zz` below) appears to work well.

showNum :: Show a => a -> String Source

float2mini :: RealFloat a => a -> Word8 Source

Conversion to 0.4.4 format minifloat: This minifloat fits into a byte. It has no sign, four bits of precision, and the range is from 0 to 63488, initially in steps of 1/8. Nice to store quality scores with reasonable precision and range.

mini2float :: Fractional a => Word8 -> a Source

Conversion from 0.4.4 format minifloat, see `float2mini`.

log1p :: (Floating a, Ord a) => a -> a Source

Computes `log (1+x)` to a relative precision of `10^-8` even for very small `x`. Stolen from http://www.johndcook.com/cpp_log_one_plus_x.html

expm1 :: (Floating a, Ord a) => a -> a Source

Computes `exp x - 1` to a relative precision of `10^-10` even for very small `x`. Stolen from http://www.johndcook.com/cpp_expm1.html

phredplus :: Double -> Double -> Double infixl 3 Source

Computes `-10 * log_10 (10 ** (-x/10) + 10 ** (-y/10))` without losing precision. Used to add numbers on "the Phred scale", otherwise known as (deci-)bans.

phredminus :: Double -> Double -> Double infixl 3 Source

Computes `-10 * log_10 (10 ** (-x/10) - 10 ** (-y/10))` without losing precision. Used to subtract numbers on "the Phred scale", otherwise known as (deci-)bans.

phredsum :: [Double] -> Double Source

Computes `-10 * log_10 (sum [10 ** (-x/10) | x <- xs])` without losing precision.

(<#>) :: Double -> Double -> Double infixl 3 Source

Computes `1-p` without leaving the "Phred scale"