tdigest-0.2.1: On-line accumulation of rank-based statistics

Safe HaskellNone
LanguageHaskell2010

Data.TDigest.Tree.Internal

Contents

Description

Internals of TDigest.

Tree implementation is based on Adams’ Trees Revisited by Milan Straka http://fox.ucw.cz/papers/bbtree/bbtree.pdf

Synopsis

Documentation

data TDigest (compression :: Nat) Source #

TDigest is a tree of centroids.

compression is a 1/δ. The greater the value of compression the less likely value merging will happen.

Constructors

Node !Size !Mean !Weight !Weight !(TDigest compression) !(TDigest compression)

Tree node

Nil

Empty tree

Instances
KnownNat comp => Reducer Double (TDigest comp) Source #

Both cons and snoc are insert

Instance details

Defined in Data.TDigest.Tree.Internal

Methods

unit :: Double -> TDigest comp #

snoc :: TDigest comp -> Double -> TDigest comp #

cons :: Double -> TDigest comp -> TDigest comp #

Show (TDigest compression) Source # 
Instance details

Defined in Data.TDigest.Tree.Internal

Methods

showsPrec :: Int -> TDigest compression -> ShowS #

show :: TDigest compression -> String #

showList :: [TDigest compression] -> ShowS #

KnownNat comp => Semigroup (TDigest comp) Source # 
Instance details

Defined in Data.TDigest.Tree.Internal

Methods

(<>) :: TDigest comp -> TDigest comp -> TDigest comp #

sconcat :: NonEmpty (TDigest comp) -> TDigest comp #

stimes :: Integral b => b -> TDigest comp -> TDigest comp #

KnownNat comp => Monoid (TDigest comp) Source # 
Instance details

Defined in Data.TDigest.Tree.Internal

Methods

mempty :: TDigest comp #

mappend :: TDigest comp -> TDigest comp -> TDigest comp #

mconcat :: [TDigest comp] -> TDigest comp #

KnownNat comp => Binary (TDigest comp) Source #

TDigest isn't compressed after de-serialisation, but it can be still smaller.

Instance details

Defined in Data.TDigest.Tree.Internal

Methods

put :: TDigest comp -> Put #

get :: Get (TDigest comp) #

putList :: [TDigest comp] -> Put #

NFData (TDigest comp) Source #

TDigest has only strict fields.

Instance details

Defined in Data.TDigest.Tree.Internal

Methods

rnf :: TDigest comp -> () #

HasHistogram (TDigest comp) Maybe Source # 
Instance details

Defined in Data.TDigest.Tree.Internal

But*, the benefit vs. code explosion is not yet worth.

totalWeight :: TDigest comp -> Weight Source #

Total count of samples.

>>> totalWeight (tdigest [1..100] :: TDigest 5)
100.0

size :: TDigest comp -> Int Source #

minimumValue :: TDigest comp -> Mean Source #

Center of left-most centroid. Note: may be different than min element inserted.

>>> minimumValue (tdigest [1..100] :: TDigest 3)
1.0

maximumValue :: TDigest comp -> Mean Source #

Center of right-most centroid. Note: may be different than max element inserted.

>>> maximumValue (tdigest [1..100] :: TDigest 3)
99.0

combineDigest :: KnownNat comp => TDigest comp -> TDigest comp -> TDigest comp Source #

insertCentroid :: forall comp. KnownNat comp => Centroid -> TDigest comp -> TDigest comp Source #

node :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp Source #

Constructor which calculates size and total weight.

balanceR :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp Source #

Balance after right insertion.

balanceL :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp Source #

Balance after left insertion.

node' :: Int -> Mean -> Weight -> Weight -> TDigest comp -> TDigest comp -> TDigest comp Source #

Alias to Node

singNode :: Mean -> Weight -> TDigest comp Source #

Create singular node.

combinedCentroid :: Mean -> Weight -> Mean -> Weight -> Centroid Source #

Add two weighted means together.

threshold Source #

Arguments

:: Double

total weight

-> Double

quantile

-> Double

compression (1/δ)

-> Double 

Calculate the threshold, i.e. maximum weight of centroid.

compress :: forall comp. KnownNat comp => TDigest comp -> TDigest comp Source #

Compress TDigest.

Reinsert the centroids in "better" order (in original paper: in random) so they have opportunity to merge.

Compression will happen only if size is both: bigger than relMaxSize * comp and bigger than absMaxSize.

forceCompress :: forall comp. KnownNat comp => TDigest comp -> TDigest comp Source #

Perform compression, even if current size says it's not necessary.

toMVector Source #

Arguments

:: KnownNat comp 
=> TDigest comp

t-Digest

-> ST s (MVector s (Centroid, Double))

return also a "space left in the centroid" value for "shuffling"

relMaxSize :: Int Source #

Relative size parameter. Hard-coded value: 25.

absMaxSize :: Int Source #

Absolute size parameter. Hard-coded value: 1000.

debugPrint :: TDigest comp -> IO () Source #

Output the TDigest tree.

validate :: TDigest comp -> Either String (TDigest comp) Source #

Check various invariants in the TDigest tree.

insert Source #

Arguments

:: KnownNat comp 
=> Double

element

-> TDigest comp 
-> TDigest comp 

Insert single value into TDigest.

insert' Source #

Arguments

:: KnownNat comp 
=> Double

element

-> TDigest comp 
-> TDigest comp 

Insert single value, don't compress TDigest even if needed.

For sensibly bounded input, it makes sense to let TDigest grow (it might grow linearly in size), and after that compress it once.

singleton :: KnownNat comp => Double -> TDigest comp Source #

Make a TDigest of a single data point.

tdigest :: (Foldable f, KnownNat comp) => f Double -> TDigest comp Source #

Strict foldl' over Foldable structure.

>>> :set -XDataKinds