Safe Haskell | Safe-Inferred |
---|---|
Language | Haskell2010 |
regression-simple
provides (hopefully) simple regression functions.
The
is the simplest one.linear
:: Foldable f => (a -> (Double, Double)) -> f a -> V2
There are variants with weights, y-errors, and x and y-errors.
In addition, package includes Levenberg–Marquardt algorithm implementation
to fit arbitrary functions (with one, two or three parameters),
as long as you can give their partial derivatives as well (ad
package is handy for that).
For multiple independent variable ordinary least squares or Levenberg-Marquard with functions with > 3 parameter you should look elsewhere.
Package has been tested to return similar results as fit
functionality in gnuplot
(L-M doesn't always converge to exactly the same points in parameter space).
Synopsis
- linear :: Foldable f => (a -> (Double, Double)) -> f a -> V2
- linearFit :: Foldable f => (a -> (Double, Double)) -> f a -> Fit V2
- linearWithWeights :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V2
- linearWithYerrors :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V2
- linearWithXYerrors :: Foldable f => (a -> (Double, Double, Double, Double)) -> f a -> NonEmpty (Fit V2)
- linearFit' :: LinRegAcc -> Fit V2
- data LinRegAcc = LinRegAcc {}
- zeroLinRegAcc :: LinRegAcc
- addLinReg :: LinRegAcc -> Double -> Double -> LinRegAcc
- addLinRegW :: LinRegAcc -> Double -> Double -> Double -> LinRegAcc
- quadratic :: Foldable f => (a -> (Double, Double)) -> f a -> V3
- quadraticFit :: Foldable f => (a -> (Double, Double)) -> f a -> Fit V3
- quadraticWithWeights :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V3
- quadraticWithYerrors :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V3
- quadraticWithXYerrors :: Foldable f => (a -> (Double, Double, Double, Double)) -> f a -> NonEmpty (Fit V3)
- quadraticFit' :: QuadRegAcc -> Fit V3
- data QuadRegAcc = QuadRegAcc {}
- zeroQuadRegAcc :: QuadRegAcc
- addQuadReg :: QuadRegAcc -> Double -> Double -> QuadRegAcc
- addQuadRegW :: QuadRegAcc -> Double -> Double -> Double -> QuadRegAcc
- quadRegAccToLin :: QuadRegAcc -> LinRegAcc
- levenbergMarquardt1 :: Foldable f => (Double -> a -> (Double, Double, Double)) -> Double -> f a -> NonEmpty (Fit Double)
- levenbergMarquardt1WithWeights :: Foldable f => (Double -> a -> (Double, Double, Double, Double)) -> Double -> f a -> NonEmpty (Fit Double)
- levenbergMarquardt1WithYerrors :: Foldable f => (Double -> a -> (Double, Double, Double, Double)) -> Double -> f a -> NonEmpty (Fit Double)
- levenbergMarquardt1WithXYerrors :: Foldable f => (Double -> a -> (Double, Double, Double, Double, Double, Double)) -> Double -> f a -> NonEmpty (Fit Double)
- levenbergMarquardt2 :: Foldable f => (V2 -> a -> (Double, Double, V2)) -> V2 -> f a -> NonEmpty (Fit V2)
- levenbergMarquardt2WithWeights :: Foldable f => (V2 -> a -> (Double, Double, V2, Double)) -> V2 -> f a -> NonEmpty (Fit V2)
- levenbergMarquardt2WithYerrors :: Foldable f => (V2 -> a -> (Double, Double, V2, Double)) -> V2 -> f a -> NonEmpty (Fit V2)
- levenbergMarquardt2WithXYerrors :: Foldable f => (V2 -> a -> (Double, Double, V2, Double, Double, Double)) -> V2 -> f a -> NonEmpty (Fit V2)
- levenbergMarquardt3 :: Foldable f => (V3 -> a -> (Double, Double, V3)) -> V3 -> f a -> NonEmpty (Fit V3)
- levenbergMarquardt3WithWeights :: Foldable f => (V3 -> a -> (Double, Double, V3, Double)) -> V3 -> f a -> NonEmpty (Fit V3)
- levenbergMarquardt3WithYerrors :: Foldable f => (V3 -> a -> (Double, Double, V3, Double)) -> V3 -> f a -> NonEmpty (Fit V3)
- levenbergMarquardt3WithXYerrors :: Foldable f => (V3 -> a -> (Double, Double, V3, Double, Double, Double)) -> V3 -> f a -> NonEmpty (Fit V3)
- data Fit v = Fit {}
- data V2 = V2 !Double !Double
- data V3 = V3 !Double !Double !Double
Linear regression
linear :: Foldable f => (a -> (Double, Double)) -> f a -> V2 Source #
Linear regression.
>>>
let input1 = [(0, 1), (1, 3), (2, 5)]
>>>
PP $ linear id input1
V2 2.0000 1.00000
>>>
let input2 = [(0.1, 1.2), (1.3, 3.1), (1.9, 4.9), (3.0, 7.1), (4.1, 9.0)]
>>>
PP $ linear id input2
V2 2.0063 0.88685
linearFit :: Foldable f => (a -> (Double, Double)) -> f a -> Fit V2 Source #
Like linear
but returns complete Fit
.
To get confidence intervals you should multiply the errors
by quantile (studentT (n - 2)) ci'
from statistics
package
or similar.
For big n
using value 1 gives 68% interval and using value 2 gives 95% confidence interval.
See https://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values
(quantile
calculates one-sided values, you need two-sided, thus adjust ci
value).
The first input is perfect fit:
>>>
let fit = linearFit id input1
>>>
PP fit
Fit (V2 2.0000 1.00000) (V2 0.00000 0.00000) 1 0.00000
The second input is quite good:
>>>
PP $ linearFit id input2
Fit (V2 2.0063 0.88685) (V2 0.09550 0.23826) 3 0.25962
But the third input isn't so much, standard error of a slope parameter is 20%.
>>>
let input3 = [(0, 2), (1, 3), (2, 6), (3, 11)]
>>>
PP $ linearFit id input3
Fit (V2 3.0000 1.00000) (V2 0.63246 1.1832) 2 4.0000
linearWithWeights :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V2 Source #
Weighted linear regression.
>>>
let input2 = [(0.1, 1.2), (1.3, 3.1), (1.9, 4.9), (3.0, 7.1), (4.1, 9.0)]
>>>
PP $ linearFit id input2
Fit (V2 2.0063 0.88685) (V2 0.09550 0.23826) 3 0.25962
>>>
let input2w = [(0.1, 1.2, 1), (1.3, 3.1, 1), (1.9, 4.9, 1), (3.0, 7.1, 1/4), (4.1, 9.0, 1/4)]
>>>
PP $ linearWithWeights id input2w
Fit (V2 2.0060 0.86993) (V2 0.12926 0.23696) 3 0.22074
linearWithYerrors :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V2 Source #
Linear regression with y-errors.
>>>
let input2y = [(0.1, 1.2, 0.12), (1.3, 3.1, 0.31), (1.9, 4.9, 0.49), (3.0, 7.1, 0.71), (4.1, 9.0, 1.9)]
>>>
let fit = linearWithYerrors id input2y
>>>
PP fit
Fit (V2 1.9104 0.98302) (V2 0.13006 0.10462) 3 2.0930
When we know actual y-errors, we can calculate the Q-value using statistics
package:
>>>
import qualified Statistics.Distribution as S
>>>
import qualified Statistics.Distribution.ChiSquared as S
>>>
S.cumulative (S.chiSquared (fitNDF fit)) (fitWSSR fit)
0.446669639443138
or using math-functions
>>>
import Numeric.SpecFunctions (incompleteGamma)
>>>
incompleteGamma (fromIntegral (fitNDF fit) / 2) (fitWSSR fit / 2)
0.446669639443138
It is not uncommon to deem acceptable on equal terms any models with, say, Q > 0.001. If Q is too large, too near to 1 is most likely caused by overestimating the y-errors.
:: Foldable f | |
=> (a -> (Double, Double, Double, Double)) | \(x_i, y_i, \delta x_i, \delta y_i\) |
-> f a | data |
-> NonEmpty (Fit V2) |
Iterative linear regression with x and y errors.
Orear, J. (1982). Least squares when both variables have uncertainties. American Journal of Physics, 50(10), 912–916. doi:10.1119/1.12972
>>>
let input2xy = [(0.1, 1.2, 0.01, 0.12), (1.3, 3.1, 0.13, 0.31), (1.9, 4.9, 0.19, 0.49), (3.0, 7.1, 0.3, 0.71), (4.1, 9.0, 0.41, 1.9)]
>>>
let fit :| fits = linearWithXYerrors id input2xy
First fit is done using linearWithYerrors
:
>>>
PP fit
Fit (V2 1.9104 0.98302) (V2 0.13006 0.10462) 3 2.0930
After that the effective variance is used to refine the fit, just a few iterations is often enough:
>>>
PP $ take 3 fits
Fit (V2 1.9092 0.99251) (V2 0.12417 0.08412) 3 1.2992 Fit (V2 1.9092 0.99250) (V2 0.12418 0.08414) 3 1.2998 Fit (V2 1.9092 0.99250) (V2 0.12418 0.08414) 3 1.2998
Step-by-step interface
Linear regression accumulator.
zeroLinRegAcc :: LinRegAcc Source #
All-zeroes LinRegAcc
.
Add a weighted point to linreg accumulator.
Quadratic regression
quadratic :: Foldable f => (a -> (Double, Double)) -> f a -> V3 Source #
Quadratic regression.
>>>
let input1 = [(0, 1), (1, 3), (2, 5)]
>>>
quadratic id input1
V3 0.0 2.0 1.0
>>>
let input2 = [(0.1, 1.2), (1.3, 3.1), (1.9, 4.9), (3.0, 7.1), (4.1, 9.0)]
>>>
PP $ quadratic id input2
V3 (-0.00589) 2.0313 0.87155
>>>
let input3 = [(0, 2), (1, 3), (2, 6), (3, 11)]
>>>
PP $ quadratic id input3
V3 1.00000 0.00000 2.0000
quadraticWithWeights :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V3 Source #
Weighted quadratic regression.
>>>
let input2w = [(0.1, 1.2, 1), (1.3, 3.1, 1), (1.9, 4.9, 1), (3.0, 7.1, 1/4), (4.1, 9.0, 1/4)]
>>>
PP $ quadraticWithWeights id input2w
Fit (V3 0.02524 1.9144 0.91792) (V3 0.10775 0.42106 0.35207) 2 0.21484
quadraticWithYerrors :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V3 Source #
Quadratic regression with y-errors.
>>>
let input2y = [(0.1, 1.2, 0.12), (1.3, 3.1, 0.31), (1.9, 4.9, 0.49), (3.0, 7.1, 0.71), (4.1, 9.0, 0.9)]
>>>
PP $ quadraticWithYerrors id input2y
Fit (V3 0.08776 1.6667 1.0228) (V3 0.10131 0.31829 0.11917) 2 1.5398
quadraticWithXYerrors Source #
:: Foldable f | |
=> (a -> (Double, Double, Double, Double)) | \(x_i, y_i, \delta x_i, \delta y_i\) |
-> f a | data |
-> NonEmpty (Fit V3) |
Iterative quadratic regression with x and y errors.
Orear, J. (1982). Least squares when both variables have uncertainties. American Journal of Physics, 50(10), 912–916. doi:10.1119/1.12972
Step-by-step interface
quadraticFit' :: QuadRegAcc -> Fit V3 Source #
Calculate quadratic fit from QuadRegAcc
.
data QuadRegAcc Source #
Quadratic regression accumulator.
Instances
Show QuadRegAcc Source # | |
Defined in Math.Regression.Simple showsPrec :: Int -> QuadRegAcc -> ShowS # show :: QuadRegAcc -> String # showList :: [QuadRegAcc] -> ShowS # | |
NFData QuadRegAcc Source # | |
Defined in Math.Regression.Simple rnf :: QuadRegAcc -> () # |
zeroQuadRegAcc :: QuadRegAcc Source #
All-zeroes QuadRegAcc
.
:: QuadRegAcc | |
-> Double | x |
-> Double | y |
-> QuadRegAcc |
Add a point to quadreg accumulator.
:: QuadRegAcc | |
-> Double | x |
-> Double | y |
-> Double | w |
-> QuadRegAcc |
Add a weighted point to quadreg accumulator.
quadRegAccToLin :: QuadRegAcc -> LinRegAcc Source #
Convert QuadRegAcc
to LinRegAcc
.
Using this we can try quadratic and linear fits with a single data scan.
Levenberg–Marquardt algorithm
One parameter
:: Foldable f | |
=> (Double -> a -> (Double, Double, Double)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \partial_\beta f(\beta, x_i)\) |
-> Double | initial parameter, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit Double) | non-empty list of iteration results |
Levenberg–Marquardt for functions with one parameter.
See levenbergMarquardt2
for examples, this is very similar.
For example we can fit \(f = x \mapsto \beta x + 1\), its derivative is \(\partial_\beta f = x \mapsto x\).
>>>
let scale a (x, y) = (y, a * x + 1, x)
>>>
PP $ NE.last $ levenbergMarquardt1 scale 1 input2
Fit 1.9685 0.04735 4 0.27914
Not bad, but worse then linear fit which fits the intercept point too.
levenbergMarquardt1WithWeights Source #
:: Foldable f | |
=> (Double -> a -> (Double, Double, Double, Double)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \partial_\beta f(\beta, x_i), w_i\) |
-> Double | initial parameter, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit Double) | non-empty list of iteration results |
levenbergMarquardt1
with weights.
levenbergMarquardt1WithYerrors Source #
:: Foldable f | |
=> (Double -> a -> (Double, Double, Double, Double)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \partial_\beta f(\beta, x_i), \delta y_i\) |
-> Double | initial parameter, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit Double) | non-empty list of iteration results |
levenbergMarquardt1
with Y-errors.
levenbergMarquardt1WithXYerrors Source #
:: Foldable f | |
=> (Double -> a -> (Double, Double, Double, Double, Double, Double)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \partial_\beta f(\beta, x_i), \partial_x f(\beta, x_i), \delta x_i, \delta y_i\) |
-> Double | initial parameter, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit Double) | non-empty list of iteration results |
levenbergMarquardt1
with XY-errors.
Two parameters
:: Foldable f | |
=> (V2 -> a -> (Double, Double, V2)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i)\) |
-> V2 | initial parameters, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit V2) | non-empty list of iteration results |
Levenberg–Marquardt for functions with two parameters.
You can use this sledgehammer to do a a linear fit:
>>>
let lin (V2 a b) (x, y) = (y, a * x + b, V2 x 1)
We can then use levenbergMarquardt2
to find a fit:
>>>
PP $ levenbergMarquardt2 lin (V2 1 1) input2
Fit (V2 1.00000 1.00000) (V2 1.0175 2.5385) 3 29.470 Fit (V2 1.2782 1.4831) (V2 0.57784 1.4416) 3 9.5041 Fit (V2 1.7254 1.4730) (V2 0.18820 0.46952) 3 1.0082 Fit (V2 1.9796 0.95226) (V2 0.09683 0.24157) 3 0.26687 Fit (V2 2.0060 0.88759) (V2 0.09550 0.23826) 3 0.25962 Fit (V2 2.0063 0.88685) (V2 0.09550 0.23826) 3 0.25962
This is the same result what linearFit
returns:
>>>
PP $ linearFit id input2
Fit (V2 2.0063 0.88685) (V2 0.09550 0.23826) 3 0.25962
Using AD
You can use ad
to calculate derivatives for you.
>>>
import qualified Numeric.AD.Mode.Reverse.Double as AD
We need a (Traversable
) homogenic triple to represent the two parameters and x
:
>>>
data H3 a = H3 a a a deriving (Functor, Foldable, Traversable)
Then we define a function ad
can operate with:
>>>
let linearF (H3 a b x) = a * x + b
which we can use to fit the curve in generic way:
>>>
let lin' (V2 a b) (x, y) = case AD.grad' linearF (H3 a b x) of (f, H3 da db _f') -> (y, f, V2 da db)
>>>
PP $ levenbergMarquardt2 lin' (V2 1 1) input2
Fit (V2 1.00000 1.00000) (V2 1.0175 2.5385) 3 29.470 Fit (V2 1.2782 1.4831) (V2 0.57784 1.4416) 3 9.5041 Fit (V2 1.7254 1.4730) (V2 0.18820 0.46952) 3 1.0082 Fit (V2 1.9796 0.95226) (V2 0.09683 0.24157) 3 0.26687 Fit (V2 2.0060 0.88759) (V2 0.09550 0.23826) 3 0.25962 Fit (V2 2.0063 0.88685) (V2 0.09550 0.23826) 3 0.25962
Non-polynomial example
We can fit other curves too, for example an example from Wikipedia https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm#Example
>>>
let rateF (H3 vmax km s) = (vmax * s) / (km + s)
>>>
let rateF' (V2 vmax km) (x, y) = case AD.grad' rateF (H3 vmax km x) of (f, H3 vmax' km' _) -> (y, f, V2 vmax' km')
>>>
let input = zip [0.038,0.194,0.425,0.626,1.253,2.500,3.740] [0.050,0.127,0.094,0.2122,0.2729,0.2665,0.3317]
>>>
PP $ levenbergMarquardt2 rateF' (V2 0.9 0.2) input
Fit (V2 0.90000 0.20000) (V2 0.43304 0.43936) 5 1.4455 Fit (V2 0.61786 0.36360) (V2 0.23270 0.50259) 5 0.26730 Fit (V2 0.39270 0.49787) (V2 0.05789 0.24170) 5 0.01237 Fit (V2 0.36121 0.54525) (V2 0.04835 0.23315) 5 0.00785 Fit (V2 0.36168 0.55530) (V2 0.04880 0.23790) 5 0.00784 Fit (V2 0.36182 0.55620) (V2 0.04885 0.23826) 5 0.00784 Fit (V2 0.36184 0.55626) (V2 0.04885 0.23829) 5 0.00784
We get the same result as in the article: 0.362 and 0.556
The algorithm terminates when a scaling parameter \(\lambda\) becomes larger than 1e20 or smaller than 1e-20, or relative WSSR change is smaller than 1e-10, or sum-of-squared-residuals candidate becomes NaN
(i.e. when it would start to produce garbage).
You may want to terminate sooner, Numerical Recipes suggest to stop when WSSR decreases by a neglible amount absolutely or fractionally.
levenbergMarquardt2WithWeights Source #
:: Foldable f | |
=> (V2 -> a -> (Double, Double, V2, Double)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), w_i\) |
-> V2 | initial parameters, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit V2) | non-empty list of iteration results |
levenbergMarquardt2
with weights.
Because levenbergMarquardt2
is an iterative algorithm,
not only we can use it to fit curves with known y-errors (levenbergMarquardt2WithYerrors
),
but also with both x and y-errors (levenbergMarquardt2WithXYerrors
).
levenbergMarquardt2WithYerrors Source #
:: Foldable f | |
=> (V2 -> a -> (Double, Double, V2, Double)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), \delta y_i\) |
-> V2 | initial parameters, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit V2) | non-empty list of iteration results |
levenbergMarquardt2
with Y-errors.
levenbergMarquardt2WithXYerrors Source #
:: Foldable f | |
=> (V2 -> a -> (Double, Double, V2, Double, Double, Double)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), \partial_x f(\beta, x_i), \delta x_i, \delta y_i\) |
-> V2 | initial parameters, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit V2) | non-empty list of iteration results |
levenbergMarquardt2
with XY-errors.
Three parameters
:: Foldable f | |
=> (V3 -> a -> (Double, Double, V3)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i)\) |
-> V3 | initial parameters, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit V3) | non-empty list of iteration results |
Levenberg–Marquardt for functions with three parameters.
See levenbergMarquardt2
for examples, this is very similar.
>>>
let quad (V3 a b c) (x, y) = (y, a * x * x + b * x + c, V3 (x * x) x 1)
>>>
PP $ NE.last $ levenbergMarquardt3 quad (V3 2 2 2) input3
Fit (V3 1.00000 (-0.00000) 2.0000) (V3 0.00000 0.00000 0.00000) 1 0.00000
Same as quadratic fit, just less direct:
>>>
PP $ quadraticFit id input3
Fit (V3 1.00000 0.00000 2.0000) (V3 0.00000 0.00000 0.00000) 1 0.00000
levenbergMarquardt3WithWeights Source #
:: Foldable f | |
=> (V3 -> a -> (Double, Double, V3, Double)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), w_i\) |
-> V3 | initial parameters, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit V3) | non-empty list of iteration results |
levenbergMarquardt3
with weights.
levenbergMarquardt3WithYerrors Source #
:: Foldable f | |
=> (V3 -> a -> (Double, Double, V3, Double)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), \delta y_i\) |
-> V3 | initial parameters, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit V3) | non-empty list of iteration results |
levenbergMarquardt3
with Y-errors.
levenbergMarquardt3WithXYerrors Source #
:: Foldable f | |
=> (V3 -> a -> (Double, Double, V3, Double, Double, Double)) | \(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), \partial_x f(\beta, x_i), \delta x_i, \delta y_i\) |
-> V3 | initial parameters, \(\beta_0\) |
-> f a | data, \(d\) |
-> NonEmpty (Fit V3) | non-empty list of iteration results |
levenbergMarquardt3
with XY-errors.
Auxiliary types
Result of a curve fit.
2d vector. Strict pair of Double
s.
Also used to represent linear polynomial: V2 a b
\(= a x + b\).
3d vector. Strict triple of Double
s.
Also used to represent quadratic polynomial: V3 a b c
\(= a x^2 + b x + c\).