tdigest: On-line accumulation of rank-based statistics

[ bsd3, library, numeric ] [ Propose Tags ]

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl for more details https://github.com/tdunning/t-digest/blob/07b8f2ca2be8d0a9f04df2feadad5ddc1bb73c88/docs/t-digest-paper/histo.pdf.


[Skip to Readme]

Downloads

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

Versions [RSS] 0, 0.1, 0.2, 0.2.1, 0.2.1.1, 0.3
Change log CHANGELOG.md
Dependencies base (>=4.7 && <4.21), base-compat (>=0.10.1 && <0.14), binary (>=0.7.1.0 && <0.10), deepseq (>=1.3.0.2 && <1.6), foldable1-classes-compat (>=0.1 && <0.2), reducers (>=3.12.2 && <3.13), semigroups (>=0.18.4 && <0.21), transformers (>=0.3 && <0.7), vector (>=0.12.0.1 && <0.14), vector-algorithms (>=0.7.0.1 && <0.10) [details]
License BSD-3-Clause
Author Oleg Grenrus <oleg.grenrus@iki.fi>
Maintainer Oleg Grenrus <oleg.grenrus@iki.fi>
Revised Revision 1 made by phadej at 2023-11-21T21:03:37Z
Category Numeric
Home page https://github.com/phadej/haskell-tdigest#readme
Bug tracker https://github.com/phadej/haskell-tdigest/issues
Source repo head: git clone https://github.com/phadej/haskell-tdigest(tdigest)
Uploaded by phadej at 2023-04-11T17:54:42Z
Distributions Arch:0.3, LTSHaskell:0.3, NixOS:0.3, Stackage:0.3
Reverse Dependencies 11 direct, 19 indirect [details]
Downloads 10778 total (119 in the last 30 days)
Rating 2.0 (votes: 1) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2023-04-11 [all 1 reports]

Readme for tdigest-0.3

[back to package description]

tdigest

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl

Synopsis

λ *Data.TDigest > median (tdigest [1..1000] :: TDigest 3)
Just 499.0090729817737

Benchmarks

Using 50M exponentially distributed numbers:

  • average: 16s; incorrect approximation of median, mostly to measure prng speed
  • sorting using vector-algorithms: 33s; using 1000MB of memory
  • sparking t-digest (using some par): 53s
  • buffered t-digest: 68s
  • sequential t-digest: 65s

Example histogram

tdigest-simple -m tdigest -d standard -s 100000 -c 10 -o output.svg -i 34
cp output.svg example.svg
inkscape --export-png=example.png --export-dpi=80 --export-background-opacity=0 --without-gui example.svg

Example