Copyright | (c) Levent Erkok
---|---
License | BSD3
Maintainer | erkokl@gmail.com
Stability | experimental
Safe Haskell | None
Language | Haskell2010
Several examples involving IEEE-754 floating-point numbers, i.e., the single-precision Float (SFloat) and double-precision Double (SDouble) types.
Note that arithmetic with floating-point numbers is full of surprises; due to precision issues, associativity of arithmetic operations typically does not hold. Also, the presence of NaN is always something to look out for.
Synopsis
- assocPlus :: SFloat -> SFloat -> SFloat -> SBool
- assocPlusRegular :: IO ThmResult
- nonZeroAddition :: IO ThmResult
- multInverse :: IO ThmResult
- roundingAdd :: IO SatResult
FP addition is not associative
assocPlus :: SFloat -> SFloat -> SFloat -> SBool
Prove that floating-point addition is not associative. For illustration purposes, we will require one of the inputs to be a NaN. We have:

>>> prove $ assocPlus (0/0)
Falsifiable. Counter-example:
  s0 = 0.0 :: Float
  s1 = 0.0 :: Float
Indeed:

>>> let i = 0/0 :: Float
>>> i + (0.0 + 0.0)
NaN
>>> ((i + 0.0) + 0.0)
NaN
But keep in mind that NaN does not equal itself in the floating-point world! We have:

>>> let nan = 0/0 :: Float in nan == nan
False
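For reference, the predicate itself is essentially a one-liner; the following is a minimal sketch, assuming the standard Data.SBV import (consult the SBV sources for the exact definition):

```haskell
import Data.SBV

-- Associativity of floating-point addition, as a symbolic predicate:
-- true precisely when the two groupings agree.
assocPlus :: SFloat -> SFloat -> SFloat -> SBool
assocPlus x y z = x + (y + z) .== (x + y) + z
```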
assocPlusRegular :: IO ThmResult
Prove that addition is not associative, even if we ignore NaN/Infinity values. To do this, we use the predicate fpIsPoint, which is true of a floating-point number (SFloat or SDouble) if it is neither NaN nor Infinity. (That is, it is a representable point on the real-number line.)
We have:
>>> assocPlusRegular
Falsifiable. Counter-example:
  x = 128.00029 :: Float
  y = -7.27236e-4 :: Float
  z = -6.875994e-3 :: Float
Indeed, we have:
>>> let x = 128.00029 :: Float
>>> let y = -7.27236e-4 :: Float
>>> let z = -6.875994e-3 :: Float
>>> x + (y + z)
127.99268
>>> (x + y) + z
127.99269
Note the difference in the results!
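A minimal sketch of how this property can be stated, assuming the standard Data.SBV import; we use fpIsPoint to rule out NaN/Infinity, including on the intermediate sums (consult the SBV sources for the exact definition):

```haskell
import Data.SBV

-- Associativity of addition over "regular" floats only:
-- NaN/Infinity inputs and overflowing sums are ruled out.
assocPlusRegular :: IO ThmResult
assocPlusRegular = prove $ do
        x <- sFloat "x"
        y <- sFloat "y"
        z <- sFloat "z"
        let lhs = x + (y + z)
            rhs = (x + y) + z
        -- rule out NaN/Infinity, including the intermediate results
        mapM_ (constrain . fpIsPoint) [x, y, z, lhs, rhs]
        return $ lhs .== rhs
```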
FP addition by non-zero can result in no change
nonZeroAddition :: IO ThmResult
Demonstrate that a+b = a does not necessarily mean b is 0 in the floating-point world, even when we disallow the obvious solution where a and b are Infinity.
We have:
>>> nonZeroAddition
Falsifiable. Counter-example:
  a = 5.060287e28 :: Float
  b = 3.6780381e19 :: Float
Indeed, we have:
>>> let a = 5.060287e28 :: Float
>>> let b = 3.6780381e19 :: Float
>>> a + b == a
True
>>> b == 0
False
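A minimal sketch of the corresponding query, assuming the standard Data.SBV import; we assert a + b .== a over regular floats and ask the prover whether b must then be 0:

```haskell
import Data.SBV

-- If a + b == a for regular (non-NaN, finite) floats, must b be 0? (No!)
nonZeroAddition :: IO ThmResult
nonZeroAddition = prove $ do
        a <- sFloat "a"
        b <- sFloat "b"
        constrain $ fpIsPoint a
        constrain $ fpIsPoint b
        constrain $ a + b .== a
        return $ b .== 0
```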
FP multiplicative inverses may not exist
multInverse :: IO ThmResult
This example illustrates that a * (1/a) does not necessarily equal 1. Again, we protect against division by 0 and NaN/Infinity.
We have:
>>> multInverse
Falsifiable. Counter-example:
  a = 2.4907063e38 :: Float
Indeed, we have:
>>> let a = 2.4907063e38 :: Float
>>> a * (1/a)
1.0000001
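A minimal sketch, assuming the standard Data.SBV import; note that requiring fpIsPoint (1/a) also rules out a being 0, since 1/0 is Infinity:

```haskell
import Data.SBV

-- Does a * (1/a) always equal 1 for regular, non-zero floats? (No!)
multInverse :: IO ThmResult
multInverse = prove $ do
        a <- sFloat "a"
        constrain $ fpIsPoint a
        constrain $ fpIsPoint (1 / a)
        return $ a * (1 / a) .== 1
```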
Effect of rounding modes
roundingAdd :: IO SatResult
One interesting aspect of floating point is that the chosen rounding mode can affect the results of a computation if the exact result cannot be precisely represented. SBV exports the functions fpAdd, fpSub, fpMul, fpDiv, fpFMA, and fpSqrt, which allow users to specify the IEEE-supported RoundingMode for the operation. This example illustrates how SBV can be used to find rounding modes where, for instance, addition can produce different results. We have:
>>> roundingAdd
Satisfiable. Model:
  rm = RoundTowardPositive :: RoundingMode
  x  = -2.3509886e-38 :: Float
  y  = -6.0e-45 :: Float
(Note that depending on your version of Z3, you might get a different result.)
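A minimal sketch of how such a query can be phrased, assuming the standard Data.SBV import: we pick a symbolic rounding mode other than the default and ask for inputs where fpAdd under that mode differs from ordinary (default-mode) addition (consult the SBV sources for the exact definition):

```haskell
import Data.SBV

-- Find a rounding mode and inputs for which addition differs
-- from the default RoundNearestTiesToEven result.
roundingAdd :: IO SatResult
roundingAdd = sat $ do
        rm <- free "rm" :: Symbolic SRoundingMode
        constrain $ rm ./= literal RoundNearestTiesToEven
        x <- sFloat "x"
        y <- sFloat "y"
        let lhs = fpAdd rm x y
            rhs = x + y   -- addition at the default rounding mode
        constrain $ fpIsPoint lhs
        constrain $ fpIsPoint rhs
        return $ lhs ./= rhs
```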
Unfortunately we can't directly validate this result at the Haskell level, as Haskell only supports RoundNearestTiesToEven. We have:

>>> -2.3509886e-38 + (-6.0e-45) :: Float
-2.3509893e-38
While we cannot directly see the result when the mode is RoundTowardPositive in Haskell, we can use SBV to provide us with that result as follows:

>>> sat $ \z -> z .== fpAdd sRoundTowardPositive (-2.3509886e-38) (-6.0e-45 :: SFloat)
Satisfiable. Model:
  s0 = -2.350989e-38 :: Float
We can see why these two results are indeed different: RoundTowardPositive (which rounds towards positive infinity) produces a larger result. Indeed, if we treat these numbers as Double values, we get:

>>> -2.3509886e-38 + (-6.0e-45) :: Double
-2.3509892e-38

We see that the "more precise" result is larger than the Float value, justifying the larger result obtained with RoundTowardPositive. A more detailed study is beyond our current scope, so we'll merely note that floating-point representation and semantics is indeed a thorny subject, and point to http://ece.uwaterloo.ca/~dwharder/NumericalAnalysis/02Numerics/Double/paper.pdf as an excellent guide.