| Safe Haskell | Safe |
|---|---|
| Language | Haskell98 |
Bio.Util
- wilson :: Double -> Int -> Int -> (Double, Double, Double)
- invnormcdf :: (Ord a, Floating a) => a -> a
- choose :: Integral a => a -> a -> a
- estimateComplexity :: (Integral a, Floating b, Ord b) => a -> a -> Maybe b
- showNum :: Show a => a -> String
- showOOM :: Double -> String
- float2mini :: RealFloat a => a -> Word8
- mini2float :: Fractional a => Word8 -> a
- log1p :: (Floating a, Ord a) => a -> a
- expm1 :: (Floating a, Ord a) => a -> a
- phredplus :: Double -> Double -> Double
- phredminus :: Double -> Double -> Double
- phredsum :: [Double] -> Double
- (<#>) :: Double -> Double -> Double
- phredconverse :: Double -> Double
Documentation
wilson :: Double -> Int -> Int -> (Double, Double, Double)
Random useful stuff I didn't know where to put.
Calculates the Wilson score interval.
If (l,m,h) = wilson c x n, then m is the binomial proportion and
(l,h) is its confidence interval for x positive examples out of
n observations. c is typically something like 0.05, so it acts as the
significance level: c = 0.05 asks for a 95% interval.
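The textbook Wilson score formula can be sketched as follows. This is not the library's code: wilsonZ is a hypothetical helper that takes the normal quantile z directly (about 1.96 for a 95% interval) instead of deriving it from c, and it assumes the middle component is simply x/n.

```haskell
-- Sketch of the Wilson score interval for a given normal quantile z.
-- 'wilsonZ' is a hypothetical name and is not part of Bio.Util.
wilsonZ :: Double -> Int -> Int -> (Double, Double, Double)
wilsonZ z x n = ((mid - half) / d, p, (mid + half) / d)
  where
    nn   = fromIntegral n
    p    = fromIntegral x / nn                 -- observed proportion (assumed to be m)
    mid  = p + z * z / (2 * nn)                -- shifted centre, before scaling by d
    half = z * sqrt (p * (1 - p) / nn + z * z / (4 * nn * nn))
    d    = 1 + z * z / nn                      -- common denominator
```

For example, wilsonZ 1.96 5 10 gives roughly (0.237, 0.5, 0.763). With n = 0 the sketch returns NaN, since it makes no attempt to handle an empty sample.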
invnormcdf :: (Ord a, Floating a) => a -> a
estimateComplexity :: (Integral a, Floating b, Ord b) => a -> a -> Maybe b
Try to estimate the complexity of a whole from a sample. Suppose we
sampled total things and among those singles occurred only once.
How many different things are there?
Let the number of different things be m. The copy number follows a Poisson
distribution with parameter lambda. Let z := e^{lambda}, then
we have:
    P( 0 ) = e^{-lambda}        = 1/z
    P( 1 ) = lambda e^{-lambda} = ln z / z
    P(>=1) = 1 - e^{-lambda}    = 1 - 1/z

    singles = m ln z / z
    total   = m (1 - 1/z)

    D := total / singles = (1 - 1/z) * z / ln z
    f := z - 1 - D ln z = 0
To get z, we solve using Newton iteration and then substitute to
get m:
    df/dz = 1 - D/z
    z'    := z - z (z - 1 - D ln z) / (z - D)
    m     = singles * z / ln z
The iteration converges as long as the initial z is large enough; a
starting value of 10*D (the zz binding in the source) appears to work well.
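The Newton iteration described above can be sketched like this. It is an illustration only: complexitySketch, the iteration cap of 100 and the relative tolerance of 1e-12 are assumptions, not taken from the library.

```haskell
-- Sketch of the Newton iteration for f(z) = z - 1 - D ln z.
-- 'complexitySketch' is a hypothetical name; the tolerance and the
-- iteration cap are assumptions made for this example.
complexitySketch :: Int -> Int -> Maybe Double
complexitySketch total singles
  | singles <= 0 || total <= singles = Nothing   -- D undefined or D <= 1: no finite estimate
  | otherwise                        = go (100 :: Int) (10 * d)
  where
    d = fromIntegral total / fromIntegral singles :: Double

    go 0 _ = Nothing                             -- gave up without converging
    go k z
      | abs (z' - z) <= 1e-12 * z = Just (fromIntegral singles * z' / log z')
      | otherwise                 = go (k - 1) z'
      where
        -- Newton step: z' = z - f(z) / f'(z), with f'(z) = 1 - D/z
        z' = z - z * (z - 1 - d * log z) / (z - d)
```

As a sanity check, 1000 different things with lambda = 1 would yield about 368 singles among about 632 observed things, and complexitySketch 632 368 recovers a value close to 1000.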
float2mini :: RealFloat a => a -> Word8
Conversion to 0.4.4 format minifloat. This minifloat fits into a byte: it has no sign bit, four bits of exponent, and four bits of precision. The range is from 0 to 63488, initially in steps of 1/8. Nice for storing quality scores with reasonable precision and range.
mini2float :: Fractional a => Word8 -> a
Conversion from 0.4.4 format minifloat; see float2mini.
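To make the format concrete, here is a minimal decoding sketch. The bit layout is an assumption (exponent in the high nibble, mantissa in the low nibble, denormals when the exponent is zero), chosen because it reproduces the stated range of 0 to 63488 and the initial step of 1/8. decodeMini is a hypothetical name and may differ from what mini2float actually does.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- Sketch of a 0.4.4 minifloat decoder under an assumed bit layout:
-- high nibble = exponent e, low nibble = mantissa m.
decodeMini :: Word8 -> Double
decodeMini w
  | e == 0    = fromIntegral m / 8                     -- denormals: 0, 1/8, ..., 15/8
  | otherwise = fromIntegral (16 + m) * 2 ^^ (e - 4)   -- implicit leading bit
  where
    e = fromIntegral (w `shiftR` 4) :: Int
    m = fromIntegral (w .&. 0x0F)   :: Int
```

Under this layout decodeMini 0x01 is 0.125, decodeMini 0x10 is 2.0 and decodeMini 0xFF is 63488.0, and the decoded values increase strictly with the byte value.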
log1p :: (Floating a, Ord a) => a -> a
Computes log (1+x) to a relative precision of 10^-8 even for
very small x. Stolen from http://www.johndcook.com/cpp_log_one_plus_x.html
expm1 :: (Floating a, Ord a) => a -> a
Computes exp x - 1 to a relative precision of 10^-10 even for
very small x. Stolen from http://www.johndcook.com/cpp_expm1.html
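One common way to reach these precisions for both log1p and expm1 is to switch to a two-term Taylor expansion for small arguments. The sketch below illustrates that idea. The names log1pSketch and expm1Sketch and the thresholds 1e-4 and 1e-5 are assumptions, picked so that the truncation error stays within the stated relative precisions; they are not necessarily what the library uses.

```haskell
-- Sketch: Taylor fallback for small |x|, plain formula otherwise.
-- Names and thresholds are assumptions made for this example.
log1pSketch :: Double -> Double
log1pSketch x
  | abs x > 1e-4 = log (1 + x)
  | otherwise    = (1 - 0.5 * x) * x     -- x - x^2/2, error about x^3/3

expm1Sketch :: Double -> Double
expm1Sketch x
  | abs x < 1e-5 = x + 0.5 * x * x       -- x + x^2/2, error about x^3/6
  | otherwise    = exp x - 1
```

The point of the fallback is that log (1 + x) and exp x - 1 lose most of their significant digits once x is far below 1, because 1 + x rounds away the low bits of x.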
phredplus :: Double -> Double -> Double infixl 3
Computes -10 * log_10 (10 ** (-x/10) + 10 ** (-y/10)) without
losing precision. Used to add numbers on "the Phred scale",
otherwise known as (deci-)bans.
phredminus :: Double -> Double -> Double infixl 3
Computes -10 * log_10 (10 ** (-x/10) - 10 ** (-y/10)) without
losing precision. Used to subtract numbers on "the Phred scale",
otherwise known as (deci-)bans.
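Addition and subtraction on the Phred scale can be sketched by factoring out the larger probability and using log1p and expm1 for the remainder. The names plusSketch, minusSketch and log1mexpNeg are hypothetical, and the sketch uses log1p and expm1 from the Numeric module of base rather than the versions in this module.

```haskell
import Numeric (expm1, log1p)

-- A Phred value is -k * ln p, with k = 10 / ln 10.
k :: Double
k = 10 / log 10

-- Phred value of p_x + p_y: factor out the larger probability (smaller Phred value).
plusSketch :: Double -> Double -> Double
plusSketch x y = lo - k * log1p (exp ((lo - hi) / k))
  where
    lo = min x y
    hi = max x y

-- Phred value of p_x - p_y; only meaningful for x <= y (that is, p_x >= p_y).
minusSketch :: Double -> Double -> Double
minusSketch x y = x - k * log1mexpNeg ((y - x) / k)

-- ln (1 - e^(-a)) for a >= 0, choosing the stable form on each side of ln 2.
log1mexpNeg :: Double -> Double
log1mexpNeg a
  | a <= log 2 = log (negate (expm1 (negate a)))
  | otherwise  = log1p (negate (exp (negate a)))
```

For instance, plusSketch 10 10 is about 6.99 (0.1 + 0.1 = 0.2), and minusSketch 10 10 is Infinity, the Phred value of probability zero. Calling minusSketch with x > y yields NaN in this sketch.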
phredsum :: [Double] -> Double
Computes -10 * log_10 (sum [10 ** (-x/10) | x <- xs]) without losing
precision.
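One way to avoid underflow when summing many Phred values is to subtract the smallest value before exponentiating, so that the largest term becomes exactly 1. The sketch below shows that idea. sumSketch is a hypothetical name, the treatment of the empty list is an assumption, and the library may well use a different scheme.

```haskell
-- Sketch: sum of probabilities on the Phred scale, shifted by the minimum.
-- 'sumSketch' is a hypothetical name, not part of Bio.Util.
sumSketch :: [Double] -> Double
sumSketch [] = 1 / 0                     -- assumption: empty sum is probability 0
sumSketch xs = lo - k * log (sum [exp ((lo - x) / k) | x <- xs])
  where
    lo = minimum xs                      -- smallest Phred value = largest probability
    k  = 10 / log 10
```

With this shift, sumSketch [4000, 4000] is about 3996.99, whereas evaluating 10 ** (-400) directly underflows to zero.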
phredconverse :: Double -> Double
Computes the Phred-scaled value of 1-p from the Phred-scaled value of p, without leaving the "Phred scale".
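A minimal sketch of the converse, assuming the input x is the Phred value of p: it evaluates ln (1 - e^(-a)) with a = x * ln 10 / 10 and picks the numerically stable form on either side of ln 2. converseSketch is a hypothetical name, and log1p and expm1 are taken from the Numeric module of base.

```haskell
import Numeric (expm1, log1p)

-- Sketch: Phred value of 1 - p, given the Phred value x of p.
-- 'converseSketch' is a hypothetical name, not part of Bio.Util.
converseSketch :: Double -> Double
converseSketch x
  | a <= log 2 = negate k * log (negate (expm1 (negate a)))   -- p close to 1
  | otherwise  = negate k * log1p (negate (exp (negate a)))   -- p close to 0
  where
    k = 10 / log 10
    a = x / k                            -- a = -ln p
```

For example, converseSketch 10 is about 0.458 (p = 0.1, so 1 - p = 0.9), and applying the function twice returns approximately the original value.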