datasets: Classical data sets for statistics and machine learning

[ data, data-mining, library, machine-learning, mit, statistics ]

Classical machine learning and statistics datasets from the UCI Machine Learning Repository and other sources.

The datasets package defines two kinds of data sets:

  • small data sets that are embedded in the package as pure values (directly, or indirectly via `file-embed`) and need no network access or IO to load. These include Iris, Anscombe and OldFaithful.

  • other data sets that must be fetched over the network with Numeric.Datasets.getDataset and are cached in a local temporary directory.

The datafiles/ directory of this package includes copies of a few famous datasets, such as Titanic, Nightingale and Michelson.

Example:

import Numeric.Datasets (getDataset)
import Numeric.Datasets.Iris (iris)
import Numeric.Datasets.Abalone (abalone)

main :: IO ()
main = do
  -- The Iris data set is embedded in the package, so it is a pure value
  print (length iris)
  print (head iris)
  -- The Abalone data set is fetched over the network (and cached locally)
  abas <- getDataset abalone
  print (length abas)
  print (head abas)
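
The fetch-then-cache behaviour described above for network data sets can be illustrated with a small standalone sketch. This is not the package's implementation: `cachedFetch`, the `datasets-cache-sketch` directory name and the `download` action are hypothetical names introduced here for illustration, and the "download" is a stand-in that returns a fixed string instead of making a network request. It only assumes the `directory` and `filepath` packages that ship with GHC.

```haskell
import Control.Monad (unless)
import System.Directory
  (createDirectoryIfMissing, doesFileExist, getTemporaryDirectory)
import System.FilePath ((</>))

-- | Return the contents of a cached file, running the supplied fetch
-- action only when no cached copy exists yet.
-- Hypothetical helper, not part of the datasets API.
cachedFetch :: FilePath -> FilePath -> IO String -> IO String
cachedFetch cacheDir name fetch = do
  createDirectoryIfMissing True cacheDir
  let path = cacheDir </> name
  cached <- doesFileExist path
  -- Only run the (potentially expensive) fetch on a cache miss.
  unless cached $ fetch >>= writeFile path
  readFile path

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let dir = tmp </> "datasets-cache-sketch"
      -- Stand-in for a network request; announces itself when it runs.
      download = putStrLn "fetching..." >> pure "1,2,3\n"
  first  <- cachedFetch dir "example.csv" download
  second <- cachedFetch dir "example.csv" download
  -- Both calls yield the cached contents.
  print (first == second)
```

Because the second call finds the file already on disk, the stand-in download action is not run again; with the real package the analogous effect is that repeated calls to getDataset do not need to re-download the data while the cached copy is present.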

Versions 0.1.0, 0.1.0.1, 0.2, 0.2.0.1, 0.2.0.2, 0.2.0.3, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.3.0, 0.4.0
Change log changelog.md
Dependencies aeson (>=1.4.2.0), attoparsec (>=0.13), base (>=4.6 && <5), bytestring (>=0.10.8.2), cassava (>=0.5.1.0), deepseq (>=1.4.4.0), directory (>=1.3.3.0), exceptions (>=0.10.0), file-embed (>=0.0.11), filepath (>=1.4.2.1), hashable (>=1.2.7.0), JuicyPixels (>=3.3.3), microlens (>=0.4.10), mtl (>=2.2.2), mwc-random (>=0.14.0.0), parallel (>=3.2.2.0), req (>=2.0.0), safe-exceptions (>=0.1.7.0), streaming (>=0.2.2.0), streaming-attoparsec (>=1.0.0), streaming-bytestring (>=0.1.6), streaming-cassava (>=0.1.0.1), streaming-commons (>=0.2.1.0), stringsearch (>=0.3.6.6), tar (>=0.5.1.0), text (>=1.2.3.1), time (>=1.8.0.2), transformers (>=0.5.5.0), vector (>=0.12.0.2), zlib (>=0.6.2)
License MIT
Author Tom Nielsen <tanielsen@gmail.com>
Maintainer Marco Zocca <ocramz fripost org>
Category Statistics, Machine Learning, Data Mining, Data
Home page https://github.com/DataHaskell/dh-core
Bug tracker https://github.com/DataHaskell/dh-core/issues
Source repo head: git clone https://github.com/DataHaskell/dh-core/datasets
Uploaded by ocramz at 2019-02-12T21:11:06Z
Reverse Dependencies 1 direct, 1 indirect
Downloads 10036 total (14 in the last 30 days)
Rating 2.0 (votes: 1) [estimated by Bayesian average]
Status Docs available
Last success reported on 2019-02-12