discrimination-0: Fast generic linear-time sorting, joins and container construction.

Safe HaskellTrustworthy
LanguageHaskell2010

Data.Discrimination.Grouping

Contents

Synopsis

Documentation

newtype Group a Source

Discriminator

Constructors

Group 

Fields

runGroup :: forall b. [(a, b)] -> [[b]]
 

class Grouping a where Source

Eq equipped with a compatible stable unordered discriminator.

Minimal complete definition

Nothing

Methods

grouping :: Group a Source

For every surjection f,

contramap f groupinggrouping

class Grouping1 f where Source

Minimal complete definition

Nothing

Methods

grouping1 :: Group a -> Group (f a) Source

Combinators

nub :: Grouping a => [a] -> [a] Source

O(n). This upgrades nub from Data.List from O(n^2) to O(n) by using unordered discrimination.

nub = nubWith id
nub as = head <$> group as

nubWith :: Grouping b => (a -> b) -> [a] -> [a] Source

O(n). nub with a Schwartzian transform.

nubWith f as = head <$> groupWith f as

group :: Grouping a => [a] -> [[a]] Source

O(n). Similar to group, except we do not require groups to be clustered.

This combinator still operates in linear time, at the expense of productivity.

The result equivalence classes are _not_ sorted, but the grouping is stable.

group = groupWith id

groupWith :: Grouping b => (a -> b) -> [a] -> [[a]] Source

O(n). This is a replacement for groupWith using discrimination.

The result equivalence classes are _not_ sorted, but the grouping is stable.

groupingEq :: Grouping a => a -> a -> Bool Source

Valid definition for (==) in terms of Grouping.

Internals

groupingBag :: Foldable f => Group k -> Group (f k) Source

Construct an stable unordered discriminator that partitions into equivalence classes based on the equivalence of keys as a multiset.

groupingSet :: Foldable f => Group k -> Group (f k) Source

Construct an stable unordered discriminator that partitions into equivalence classes based on the equivalence of keys as a set.

groupingShort :: Group Int Source

Shared bucket set for small integers

groupingNat :: Int -> Group Int Source

Perform stable unordered discrimination by bucket.

This reuses arrays unlike the more obvious ST implementation, so it wins by a huge margin in a race, especially when we have a large keyspace, sparsely used, with low contention. This will leak a number of arrays equal to the maximum concurrent contention for this resource. If this becomes a bottleneck we can make multiple stacks of working pads and index the stack with the hash of the current thread id to reduce contention at the expense of taking more memory.

You should create a thunk that holds the discriminator from groupingNat n for a known n and then reuse it.