| Safe Haskell | None |
|---|---|
| Language | Haskell2010 |
Perf.Cycle
Contents
Description
tick uses the rdtsc chipset to measure time performance of a computation.
The measurement unit - a Cycle - is one oscillation of the chip crystal as measured by the rdtsc instruction which inspects the TSC register.
For reference, a computer with a frequency of 2 GHz means that one cycle is equivalent to 0.5 nanoseconds.
- type Cycle = Word64
- tick_ :: IO Cycle
- warmup :: Int -> IO Double
- tick :: (a -> b) -> a -> IO (Cycle, b)
- app :: t -> () -> t
- tickIO :: IO a -> IO (Cycle, a)
- ticks :: Int -> (a -> b) -> a -> IO ([Cycle], b)
- qtick :: Int -> (a -> b) -> a -> IO (Double, b)
- ticksIO :: Int -> IO a -> IO ([Cycle], a)
- tickns :: Int -> (a -> b) -> [a] -> IO ([[Cycle]], [b])
- force :: NFData a => a -> a
- replicateM' :: Monad m => Int -> m a -> m [a]
- average :: Foldable f => f Cycle -> Double
- deciles :: (Functor f, Foldable f) => Int -> f Cycle -> [Double]
- percentile :: (Functor f, Foldable f) => Double -> f Cycle -> Double
Documentation
>>>:set -XNoImplicitPrelude>>>import Perf.Cycle>>>let n = 1000>>>let a = 1000>>>let f x = foldl' (+) 0 [1 .. x]
tick_ measures the number of cycles it takes to read the rdtsc chip twice: the difference is then how long it took to read the clock the second time.
Below are indicative measurements using tick_:
>>>onetick <- tick_>>>ticks' <- replicateM 10 tick_>>>manyticks <- replicateM 1000000 tick_>>>let average = L.fold ((/) <$> L.sum <*> L.genericLength)>>>let avticks = average (fromIntegral <$> manyticks)>>>let qticks = deciles 10 manyticks>>>let tick999 = percentile 0.999 manyticks
one tick_: 78 cycles next 10: [20,18,20,20,20,20,18,16,20,20] average over 1m: 20.08 cycles 99.999% perc: 7,986 99.9% perc: 50.97 99th perc: 24.99 40th perc: 18.37 [min, 10th, 20th, .. 90th, max]: 12.00 16.60 17.39 17.88 18.37 18.86 19.46 20.11 20.75 23.04 5.447e5
The distribution of tick_ measurements is highly skewed, with the maximum being around 50k cycles, which is of the order of a GC. The important point on the distribution is around the 30th to 50th percentile, where you get a clean measure, usually free of GC activity and cache miss-fires
warmup :: Int -> IO Double Source #
Warm up the register, to avoid a high first measurement. Without a warmup, one or more larger values can occur at the start of a measurement spree, and often are in the zone of an L2 miss.
>>>t <- tick_ -- first measure can be very high>>>_ <- warmup 100>>>t <- tick_ -- should be around 20 (3k for ghci)
tick :: (a -> b) -> a -> IO (Cycle, b) Source #
`tick f a` strictly applies a to f, and returns a (Cycle, f a)
>>>_ <- warmup 100>>>(cs, _) <- tick f a
one tick: 197012 cycles average over 1000: 10222.79 cycles -- 10 cycles per operation [min, 30th, median, 90th, 99th, max]: 1.002e4 1.011e4 1.013e4 1.044e4 1.051e4 2.623e4
tickIO :: IO a -> IO (Cycle, a) Source #
evaluates and measures an `IO a`
>>>(cs, _) <- tickIO (pure (f a))
ticks :: Int -> (a -> b) -> a -> IO ([Cycle], b) Source #
n measurements of a tick
returns a list of Cycles and the last evaluated f a
GHC is very good as memoization, and any of the functions that measuring a computation multiple times are fraught. When a computation actually gets memoized is an inexact science. Current readings are:
sum to 1000.0 Perf.ticks n f a 8.37e3 cycles Main.ticks n f a 8.38e3 cycles Perf.ticksIO n (pure $ f a) 8.38e3 cycles Perf.qtick n f a 8.38e3 cycles Main.qtick n f a 8.38e3 cycles replicateM n (tick f a) 8.37e3 cycles replicateM' n (tick f a) 9.74e3 cycles replicateM n (tickIO (pure (f a))) 1.21e4 cycles replicateM n (tick (app (f a)) ()) 9.72e3 cycles replicateM n (tick identity (f n)) 18.2 cycles replicateM n (tick (const (f a)) ()) 9.71e3 cycles (replicateM n . tick f) <$> [1,10,100,1000,10000]: 16.3 16.2 16.3 16.2 16.2 Perf.tickns n f [1,10,100,1000,10000]: 16.2 16.2 16.2 16.2 16.2
>>>let n = 1000>>>(cs, fa) <- ticks n f a
qtick :: Int -> (a -> b) -> a -> IO (Double, b) Source #
returns the 40th percentile measurement and the last evaluated f a
>>>(c, fa) <- qtick n f a
ticksIO :: Int -> IO a -> IO ([Cycle], a) Source #
n measuremenst of a tickIO
returns an IO tuple; list of Cycles and the last evaluated f a
>>>(cs, fa) <- ticksIO n (pure $ f a)
tickns :: Int -> (a -> b) -> [a] -> IO ([[Cycle]], [b]) Source #
n measurements on each of a list of a's to be applied to f.
Currently memoizing it's ass off
tickns n f [1,10,100,1000]
replicateM' :: Monad m => Int -> m a -> m [a] Source #
a replicateM with good attributes
average :: Foldable f => f Cycle -> Double Source #
average of a Cycle foldable
cAv <- average <$> ticks n f a
deciles :: (Functor f, Foldable f) => Int -> f Cycle -> [Double] Source #
compute deciles
c5 <- decile 5 <$> ticks n f a
percentile :: (Functor f, Foldable f) => Double -> f Cycle -> Double Source #
compute a percentile
c <- percentoile 0.4 <$> ticks n f a