Safe Haskell | None |
---|---|
Language | Haskell2010 |
This library provides an implementation of Vantage Point Trees [1], a data structure useful for indexing data points that exist in some metric space.
The current implementation is not particolarly optimized and assumes the data resides entirely in memory but it seems to work decently well for index sizes in the 10's of thousands.
Usage
build
: construct an index from a dataset and a distance functionrange
: find points in the index that lie within a given distance from the query
Additionally, small trees can be rendered to screen with draw
for debugging purposes.
References
1) P. N. Yianilos - Data structures and algorithms for nearest neighbor search in general metric spaces - http://web.cs.iastate.edu/~honavar/nndatastructures.pdf
Documentation
Vantage point trees
Instances
Foldable (VPTree d) Source # | |
Defined in Data.VPTree.Internal fold :: Monoid m => VPTree d m -> m # foldMap :: Monoid m => (a -> m) -> VPTree d a -> m # foldMap' :: Monoid m => (a -> m) -> VPTree d a -> m # foldr :: (a -> b -> b) -> b -> VPTree d a -> b # foldr' :: (a -> b -> b) -> b -> VPTree d a -> b # foldl :: (b -> a -> b) -> b -> VPTree d a -> b # foldl' :: (b -> a -> b) -> b -> VPTree d a -> b # foldr1 :: (a -> a -> a) -> VPTree d a -> a # foldl1 :: (a -> a -> a) -> VPTree d a -> a # elem :: Eq a => a -> VPTree d a -> Bool # maximum :: Ord a => VPTree d a -> a # minimum :: Ord a => VPTree d a -> a # | |
(Eq d, Eq a) => Eq (VPTree d a) Source # | |
(Show d, Show a) => Show (VPTree d a) Source # | |
Generic (VPTree d a) Source # | |
(NFData d, NFData a) => NFData (VPTree d a) Source # | |
Defined in Data.VPTree.Internal | |
type Rep (VPTree d a) Source # | |
Defined in Data.VPTree.Internal type Rep (VPTree d a) = D1 ('MetaData "VPTree" "Data.VPTree.Internal" "vp-tree-0.1.0.1-EcwP6uYkoBw2ZyJEMHHmtu" 'False) (C1 ('MetaCons "VPT" 'PrefixI 'True) (S1 ('MetaSel ('Just "vpTree") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (VT d a)) :*: S1 ('MetaSel ('Just "vptDistFun") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (a -> a -> d)))) |
Construction
:: (RealFrac p, Floating d, Ord d, Eq a) | |
=> (a -> a -> d) | distance function |
-> p | proportion of remaining dataset to sample at each level, \(0 < p <= 1 \) |
-> Vector a | dataset used for constructing the index |
-> VPTree d a |
Build a VPTree
The supplied distance function d
must satisfy the definition of a metric, i.e.
- identity of indiscernible elements : \( d(x, y) = 0 \leftrightarrow x \equiv y \)
- symmetry : \( d(x, y) = d(y, x) \)
- triangle inequality : \( d(x, y) + d(y, z) >= d(x, z) \)
The current implementation makes multiple passes over the whole dataset, which is why the entire indexing dataset must be present in memory (packed as a Vector
).
Implementation detail : construction of a VP-tree requires a randomized algorithm, but we run that in the ST monad so the result is pure.
Query
Range query : find all points in the tree closer to the query point than a given threshold