Copyright | (c) Eben Kadile 2018 |
---|---|
License | BSD 3 Clause |
Maintainer | eben.cowley42@gmail.com |
Stability | experimental |
Safe Haskell | None |
Language | Haskell2010 |
This module contains functions for constructing filtrations and computing persistent homology, persistence landscapes, and computing bottleneck distance between barcode diagrams.
A filtration is a finite sequence of simplicial complexes where each complex is a subset of the next. This means that a filtration can be thought of as a single simplicial complex where each of the simplices is labeled with a "filtration index" that represents the index in the sequence where that simplex enters the filtration.
One way to create a filtration, given a simplicial complex, a metric for the vertices, and a list of distances, is to loop through the distances from greatest to least: create a simplicial complex each iteration which excludes simplices that contain pairs of vertices which are further than the current distance apart. This method will produce a filtration of Vietoris-Rips complexes - each filtration index will correspond to a Rips complex whose scale is the corresponding distance. This filtration represents the topology of the data at each of the scales with which it was constructed.
NOTE: It's important that, even though the smallest filtration index represents the smallest scale at which the data is being anaylzed, all functions in this library receive your list of scales sorted in *decreasing* order.
An essential thing to note in this library is the distinction between "fast" and "light" functions. Light functions call the metric every time distance between two points is required, which is a lot. Fast functions store the distances between points and access them in constant time, BUT this means they use O(n^2) memory with respect to the number of data points, so it's a really bad idea to use this optimization on substantially large data if you don't have a lot of RAM.
Persistent homology is the main event of topological data analysis. It allows one to identify clusters, tunnels, cavities, and higher dimensional holes that persist in the data throughout many scales. The output of the persistence algorithm is a barcode diagram. A single barcode represents the filtration index where a feature appears and the index where it disappears (if it does). Alternatively, a barcode can represent the scale at which a feature and the scale at which it ends. Thus, short barcodes are typically interpretted as sampling irregularities and long barcodes are interpretted as actual features of whatever the underlying data set represents. In this context, what a feature *is* depends on which dimension the barcode diagram is; 0-dimensional features are connected components, 1-dimensional features are loops or tunnels, 2-dimensional features are hollow volumes, and higher dimensional features correspond to heigher-dimensional cavities.
After you've got the barcodes of a data set, you might want to compare it with that of a different data set. This is the purpose of bottleneck distance, which corresponds to the Hausdorff distance between barcode diagrams.
Another way to compare barcode diagrams is by using persistence landscapes. The peristence landscape of a barcode diagram is a finite sequence of piecewise-linear, real-valued functions. This means they can be used to take averages and compute distances between barcode diagrams. See "A Persistence Landscapes Toolbox For Topological Statistics" by Bubenik and Dlotko for more information.
WARNING: The persistence landscape functions have not been fully tested. Use them with caution. If you get any errors or unexpected output, please don't hesitate to email me.
Synopsis
- type FilterSimplex = (Int, Vector Int, Vector Int)
- type SimpleFiltration = (Int, Vector (Vector FilterSimplex))
- type Filtration = Vector (Vector FilterSimplex)
- data Extended a
- = Finite a
- | Infinity
- | MinusInfty
- type BarCode a = (a, Extended a)
- type Landscape = Vector (Vector (Extended Double, Extended Double))
- sim2String :: FilterSimplex -> String
- filtr2String :: Either SimpleFiltration Filtration -> String
- getComplex :: Int -> Either SimpleFiltration Filtration -> SimplicialComplex
- getDimension :: Either SimpleFiltration Filtration -> Int
- simple2Filtr :: SimpleFiltration -> Filtration
- filterByWeightsFast :: Ord a => Either (Vector a) [a] -> (SimplicialComplex, Graph a) -> SimpleFiltration
- makeRipsFiltrationFast :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration
- makeRipsFiltrationFastPar :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration
- filterByWeightsLight :: Ord a => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimplicialComplex -> SimpleFiltration
- makeRipsFiltrationLight :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration
- makeRipsFiltrationLightPar :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration
- indexBarCodes :: Filtration -> Vector (Vector (BarCode Int))
- indexBarCodesSimple :: SimpleFiltration -> Vector (Vector (BarCode Int))
- scaleBarCodes :: Either (Vector a) [a] -> Filtration -> Vector (Vector (BarCode a))
- scaleBarCodesSimple :: Either (Vector a) [a] -> SimpleFiltration -> Vector (Vector (BarCode a))
- indexMetric :: BarCode Int -> BarCode Int -> Extended Double
- bottleNeckDistance :: Ord b => (BarCode a -> BarCode a -> Extended b) -> Vector (BarCode a) -> Vector (BarCode a) -> Maybe (Extended b)
- bottleNeckDistances :: Ord b => (BarCode a -> BarCode a -> Extended b) -> Vector (Vector (BarCode a)) -> Vector (Vector (BarCode a)) -> Vector (Maybe (Extended b))
- calcLandscape :: Vector (BarCode Int) -> Landscape
- evalLandscape :: Landscape -> Int -> Extended Double -> Extended Double
- evalLandscapeAll :: Landscape -> Extended Double -> Vector (Extended Double)
- linearComboLandscapes :: [Double] -> [Landscape] -> Landscape
- avgLandscapes :: [Landscape] -> Landscape
- diffLandscapes :: Landscape -> Landscape -> Landscape
- normLp :: Extended Double -> (Double, Double) -> Double -> Landscape -> Maybe Double
- metricLp :: Extended Double -> (Double, Double) -> Double -> Landscape -> Landscape -> Maybe Double
Types
type FilterSimplex = (Int, Vector Int, Vector Int) Source #
This type synonym exists to make other synonyms more concise. Each simplex in a filtration is represented as a triple: its filtration index, the indices of its vertices in the original data, and the indices of its faces in the next lowest dimension. Edges do not have reference to their faces, as it would be redundant with their vertices. All simplices are sorted according to filtration index upon construction of the filtration. In each dimension, all simplices are sorted in increasing order of filtration index, and every simplices face indices are sorted in decreasing order; both of these facts are critical to the computation of persistent homology.
type SimpleFiltration = (Int, Vector (Vector FilterSimplex)) Source #
A type representing a filtration whose vertices all have filtration index 0. Slightly faster and slightly less memory usage. The first component is simply the number of vertices. The second component is a vector with an entry for each dimension of simplices, starting at dimension 1 for edges.
type Filtration = Vector (Vector FilterSimplex) Source #
Representation of a filtration which, unlike SimpleFiltration, can cope with vertices that have a non-zero filtration index. Vertices of the filtration are represented like all other simplices except that they don't their own have vertices or faces.
Note that, since this library currently only deals with static pointcloud data, all of the filtration construction functions produce vertices whose filtration index is 0. Thus, if you want to use this type you will have to construct the instances yourself.
A type extending the number line to positive and negative infinity. Used for representing infinite barcodes, bottleneck distance, and persistence landscapes.
Instances
Eq a => Eq (Extended a) Source # | |
Num a => Num (Extended a) Source # | Arithmetic is defined in the canonical way based on the arithmetic of |
Defined in Persistence.Filtration | |
(Ord a, Eq a) => Ord (Extended a) Source # | The ordering is inherited from the type a, Infinity is greater than everything else and MinusInfty is less than everything else. |
Show a => Show (Extended a) Source # | |
type BarCode a = (a, Extended a) Source #
(x, Finite y) is a topological feature that appears at the index or scale x and disappears at the index or scale y. (x, Infinity) begins at x and doesn't disappear.
type Landscape = Vector (Vector (Extended Double, Extended Double)) Source #
A Persistence landscape is a certain type of piecewise linear function based on a barcode diagram. It can be represented as a list of critical points paired with critical values. Useful for taking averages and differences between barcode diagrams.
Utilities
sim2String :: FilterSimplex -> String Source #
Shows all the information in a simplex.
filtr2String :: Either SimpleFiltration Filtration -> String Source #
Shows all the information in a filtration.
getComplex :: Int -> Either SimpleFiltration Filtration -> SimplicialComplex Source #
Gets the simplicial complex specified by the filtration index. This is O(n) with respect to the number of simplices.
getDimension :: Either SimpleFiltration Filtration -> Int Source #
Return the dimension of the highest dimensional simplex in the filtration (constant time).
simple2Filtr :: SimpleFiltration -> Filtration Source #
Convert a simple filtration into an ordinary filtration.
Construction
:: Ord a | |
=> Either (Vector a) [a] | Scales in decreasing order |
-> (SimplicialComplex, Graph a) | Simplicial complex and a graph encoding the distance between every data point as well as whether or not they are within the largest scale of each other. |
-> SimpleFiltration |
This function creates a filtration out of a simplicial complex by removing simplices that contain edges that are too long for each scale in the list. This is really a helper function to be called by makeRipsFiltrationFast, but I decided to expose it in case you have a simplicial complex and weighted graph lying around. The scales MUST be in decreasing order.
makeRipsFiltrationFast Source #
:: (Ord a, Eq b) | |
=> Either (Vector a) [a] | Scales in decreasing order |
-> (b -> b -> a) | Metric |
-> Either (Vector b) [b] | Data set |
-> SimpleFiltration |
This function constructs a filtration of the Vietoris-Rips complexes associated with the scales. Note that this a fast function, meaning it uses O(n^2) memory to quickly access distances where n is the number of data points.
makeRipsFiltrationFastPar Source #
:: (Ord a, Eq b) | |
=> Either (Vector a) [a] | Scales in decreasing order |
-> (b -> b -> a) | Metric |
-> Either (Vector b) [b] | Data set |
-> SimpleFiltration |
Same as above except it uses parallelism when computing the Vietoris-Rips complex of the largest scale.
:: Ord a | |
=> Either (Vector a) [a] | Scales in decreasing order |
-> (b -> b -> a) | Metric |
-> Either (Vector b) [b] | Data set |
-> SimplicialComplex | Vietoris-Rips complex of the data at the largest scale. |
-> SimpleFiltration |
The same as filterbyWeightsFast except it uses far less memory at the cost of speed. Note that the scales must be in decreasing order.
makeRipsFiltrationLight Source #
:: (Ord a, Eq b) | |
=> Either (Vector a) [a] | List of scales in decreasing order |
-> (b -> b -> a) | Metric |
-> Either (Vector b) [b] | Data set |
-> SimpleFiltration |
Constructs the filtration of Vietoris-Rips complexes corresponding to each of the scales.
makeRipsFiltrationLightPar Source #
:: (Ord a, Eq b) | |
=> Either (Vector a) [a] | List of scales in decreasing order |
-> (b -> b -> a) | Metric |
-> Either (Vector b) [b] | Data set |
-> SimpleFiltration |
Same as above except it uses parallelism when computing the Vietoris-Rips complex of the largest scale.
Persistent homology
indexBarCodes :: Filtration -> Vector (Vector (BarCode Int)) Source #
The nth entry in the list will describe the n-dimensional topology of the filtration. That is, the first list will represent clusters, the second list will represent tunnels or punctures, the third will represent hollow volumes, and the nth index list will represent n-dimensional holes in the data. Features are encoded by the filtration indices where they appear and disappear.
indexBarCodesSimple :: SimpleFiltration -> Vector (Vector (BarCode Int)) Source #
Same as above except this function acts on filtrations whose vertices all have filtration index zero (for a very slight speedup).
scaleBarCodes :: Either (Vector a) [a] -> Filtration -> Vector (Vector (BarCode a)) Source #
The nth entry in the list will describe the n-dimensional topology of the filtration. However, features are encoded by the scales where they appear and disappear. For consistency, scales must be in decreasing order.
scaleBarCodesSimple :: Either (Vector a) [a] -> SimpleFiltration -> Vector (Vector (BarCode a)) Source #
Same as above except acts only on filtrations whose vertices all have filtration index 0. Note that scales must be in decreasing order.
Comparing barcode diagrams
indexMetric :: BarCode Int -> BarCode Int -> Extended Double Source #
The standard (Euclidean) metric between index barcodes. The distance between infinite and finite barcodes is infinite, and the distance between two infinite barcodes is the absolute value of the difference of their fst component.
bottleNeckDistance :: Ord b => (BarCode a -> BarCode a -> Extended b) -> Vector (BarCode a) -> Vector (BarCode a) -> Maybe (Extended b) Source #
Given a metric, return the Hausdorff distance (referred to as bottleneck distance in TDA) between the two sets. Returns nothing if either list of barcodes is empty.
bottleNeckDistances :: Ord b => (BarCode a -> BarCode a -> Extended b) -> Vector (Vector (BarCode a)) -> Vector (Vector (BarCode a)) -> Vector (Maybe (Extended b)) Source #
Get's all the bottleneck distances; a good way to determine the similarity of the topology of two filtrations.
calcLandscape :: Vector (BarCode Int) -> Landscape Source #
Compute the persistence landscape of the barcodes for a single dimension.
evalLandscape :: Landscape -> Int -> Extended Double -> Extended Double Source #
Evaluate the nth function in the landscape for the given point.
evalLandscapeAll :: Landscape -> Extended Double -> Vector (Extended Double) Source #
Evaluate all the real-valued functions in the landscape.
linearComboLandscapes :: [Double] -> [Landscape] -> Landscape Source #
Compute a linear combination of the landscapes. If the coefficient list is too short, the rest of the coefficients are assumed to be zero. If it is too long, the extra coefficients are discarded.
avgLandscapes :: [Landscape] -> Landscape Source #
Average the persistence landscapes.
diffLandscapes :: Landscape -> Landscape -> Landscape Source #
Subtract the second landscape from the first.
:: Extended Double | p, the power of the norm |
-> (Double, Double) | Interval to compute the integral over |
-> Double | Step size |
-> Landscape | Persistence landscape whose norm is to be computed |
-> Maybe Double |
If p>=1 then it will compute the L^p norm on the given interval. Uses trapezoidal approximation. You should ensure that the stepsize partitions the interval evenly.
:: Extended Double | p, power of the metric |
-> (Double, Double) | Interval on which the integral will be computed |
-> Double | Step size |
-> Landscape | First landscape |
-> Landscape | Second landscape |
-> Maybe Double |
Given the same information as above, computes the L^p distance between the two landscapes. One way to compare the topologies of two filtrations.