tdigest: On-line accumulation of rank-based statistics

[ bsd3, library, numeric ] [ Propose Tags ]

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl for more details https://github.com/tdunning/t-digest/blob/07b8f2ca2be8d0a9f04df2feadad5ddc1bb73c88/docs/t-digest-paper/histo.pdf.


[Skip to Readme]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

Versions [RSS] 0, 0.1, 0.2, 0.2.1, 0.2.1.1, 0.3, 0.3.1
Change log CHANGELOG.md
Dependencies base (>=4.12.0.0 && <4.22), binary (>=0.8.6.0 && <0.10), deepseq (>=1.4.4.0 && <1.6), foldable1-classes-compat (>=0.1 && <0.2), reducers (>=3.12.2 && <3.13), transformers (>=0.5.6.2 && <0.7), vector (>=0.12.0.1 && <0.14), vector-algorithms (>=0.7.0.1 && <0.10) [details]
License BSD-3-Clause
Author Oleg Grenrus <oleg.grenrus@iki.fi>
Maintainer Oleg Grenrus <oleg.grenrus@iki.fi>
Category Numeric
Home page https://github.com/phadej/haskell-tdigest#readme
Bug tracker https://github.com/phadej/haskell-tdigest/issues
Source repo head: git clone https://github.com/phadej/haskell-tdigest(tdigest)
Uploaded by phadej at 2024-07-03T13:03:21Z
Distributions Arch:0.3, LTSHaskell:0.3.1, NixOS:0.3, Stackage:0.3.1
Reverse Dependencies 11 direct, 19 indirect [details]
Downloads 11290 total (158 in the last 30 days)
Rating 2.0 (votes: 1) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2024-07-03 [all 1 reports]

Readme for tdigest-0.3.1

[back to package description]

tdigest

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl

Synopsis

λ *Data.TDigest > median (tdigest [1..1000] :: TDigest 3)
Just 499.0090729817737

Benchmarks

Using 50M exponentially distributed numbers:

  • average: 16s; incorrect approximation of median, mostly to measure prng speed
  • sorting using vector-algorithms: 33s; using 1000MB of memory
  • sparking t-digest (using some par): 53s
  • buffered t-digest: 68s
  • sequential t-digest: 65s

Example histogram

tdigest-simple -m tdigest -d standard -s 100000 -c 10 -o output.svg -i 34
cp output.svg example.svg
inkscape --export-png=example.png --export-dpi=80 --export-background-opacity=0 --without-gui example.svg

Example