Safe Haskell | None |
---|---|
Language | Haskell2010 |
tick
uses the rdtsc chipset to measure time performance of a computation.
The measurement unit - a Cycle
- is one oscillation of the chip crystal as measured by the rdtsc instruction which inspects the TSC register.
For reference, a computer with a frequency of 2 GHz means that one cycle is equivalent to 0.5 nanoseconds.
- type Cycle = Word64
- tick_ :: IO Cycle
- warmup :: Int -> IO Double
- tick :: (a -> b) -> a -> IO (Cycle, b)
- app :: t -> () -> t
- tickIO :: IO a -> IO (Cycle, a)
- ticks :: Int -> (a -> b) -> a -> IO ([Cycle], b)
- qtick :: Int -> (a -> b) -> a -> IO (Double, b)
- ticksIO :: Int -> IO a -> IO ([Cycle], a)
- tickns :: Int -> (a -> b) -> [a] -> IO ([[Cycle]], [b])
- force :: NFData a => a -> a
- replicateM' :: Monad m => Int -> m a -> m [a]
- average :: Foldable f => f Cycle -> Double
- deciles :: (Functor f, Foldable f) => Int -> f Cycle -> [Double]
- percentile :: (Functor f, Foldable f) => Double -> f Cycle -> Double
Documentation
>>>
:set -XNoImplicitPrelude
>>>
import Perf.Cycle
>>>
let n = 1000
>>>
let a = 1000
>>>
let f x = foldl' (+) 0 [1 .. x]
tick_ measures the number of cycles it takes to read the rdtsc chip twice: the difference is then how long it took to read the clock the second time.
Below are indicative measurements using tick_:
>>>
onetick <- tick_
>>>
ticks' <- replicateM 10 tick_
>>>
manyticks <- replicateM 1000000 tick_
>>>
let average = L.fold ((/) <$> L.sum <*> L.genericLength)
>>>
let avticks = average (fromIntegral <$> manyticks)
>>>
let qticks = deciles 10 manyticks
>>>
let tick999 = percentile 0.999 manyticks
one tick_: 78 cycles next 10: [20,18,20,20,20,20,18,16,20,20] average over 1m: 20.08 cycles 99.999% perc: 7,986 99.9% perc: 50.97 99th perc: 24.99 40th perc: 18.37 [min, 10th, 20th, .. 90th, max]: 12.00 16.60 17.39 17.88 18.37 18.86 19.46 20.11 20.75 23.04 5.447e5
The distribution of tick_ measurements is highly skewed, with the maximum being around 50k cycles, which is of the order of a GC. The important point on the distribution is around the 30th to 50th percentile, where you get a clean measure, usually free of GC activity and cache miss-fires
warmup :: Int -> IO Double Source #
Warm up the register, to avoid a high first measurement. Without a warmup, one or more larger values can occur at the start of a measurement spree, and often are in the zone of an L2 miss.
>>>
t <- tick_ -- first measure can be very high
>>>
_ <- warmup 100
>>>
t <- tick_ -- should be around 20 (3k for ghci)
tick :: (a -> b) -> a -> IO (Cycle, b) Source #
`tick f a` strictly applies a to f, and returns a (Cycle, f a)
>>>
_ <- warmup 100
>>>
(cs, _) <- tick f a
one tick: 197012 cycles average over 1000: 10222.79 cycles -- 10 cycles per operation [min, 30th, median, 90th, 99th, max]: 1.002e4 1.011e4 1.013e4 1.044e4 1.051e4 2.623e4
tickIO :: IO a -> IO (Cycle, a) Source #
evaluates and measures an `IO a`
>>>
(cs, _) <- tickIO (pure (f a))
ticks :: Int -> (a -> b) -> a -> IO ([Cycle], b) Source #
n measurements of a tick
returns a list of Cycles and the last evaluated f a
GHC is very good as memoization, and any of the functions that measuring a computation multiple times are fraught. When a computation actually gets memoized is an inexact science. Current readings are:
sum to 1000.0 Perf.ticks n f a 8.37e3 cycles Main.ticks n f a 8.38e3 cycles Perf.ticksIO n (pure $ f a) 8.38e3 cycles Perf.qtick n f a 8.38e3 cycles Main.qtick n f a 8.38e3 cycles replicateM n (tick f a) 8.37e3 cycles replicateM' n (tick f a) 9.74e3 cycles replicateM n (tickIO (pure (f a))) 1.21e4 cycles replicateM n (tick (app (f a)) ()) 9.72e3 cycles replicateM n (tick identity (f n)) 18.2 cycles replicateM n (tick (const (f a)) ()) 9.71e3 cycles (replicateM n . tick f) <$> [1,10,100,1000,10000]: 16.3 16.2 16.3 16.2 16.2 Perf.tickns n f [1,10,100,1000,10000]: 16.2 16.2 16.2 16.2 16.2
>>>
let n = 1000
>>>
(cs, fa) <- ticks n f a
qtick :: Int -> (a -> b) -> a -> IO (Double, b) Source #
returns the 40th percentile measurement and the last evaluated f a
>>>
(c, fa) <- qtick n f a
ticksIO :: Int -> IO a -> IO ([Cycle], a) Source #
n measuremenst of a tickIO
returns an IO tuple; list of Cycles and the last evaluated f a
>>>
(cs, fa) <- ticksIO n (pure $ f a)
tickns :: Int -> (a -> b) -> [a] -> IO ([[Cycle]], [b]) Source #
n measurements on each of a list of a's to be applied to f.
Currently memoizing it's ass off
tickns n f [1,10,100,1000]
replicateM' :: Monad m => Int -> m a -> m [a] Source #
a replicateM with good attributes
average :: Foldable f => f Cycle -> Double Source #
average of a Cycle foldable
cAv <- average <$> ticks n f a
deciles :: (Functor f, Foldable f) => Int -> f Cycle -> [Double] Source #
compute deciles
c5 <- decile 5 <$> ticks n f a
percentile :: (Functor f, Foldable f) => Double -> f Cycle -> Double Source #
compute a percentile
c <- percentoile 0.4 <$> ticks n f a