Persistence-1.0: Quickly detect clusters and holes in data.

Copyright(c) Eben Cowley 2018
LicenseBSD 3 Clause
Safe HaskellNone



This module contains functions for constructing filtrations and computing persistent homology, as well as a few utility functions for working with filtrations.

A filtration is a finite sequence of simplicial complexes where each complex is a subset of the next. This means that a filtration can be thought of as a single simplicial complex where each of the simplices is labeled with a "filtration index" that represents the index in the sequence where that simplex enters the filtration.

One way to create a filtration, given a simplicial complex, a metric for the vertices, and a list of distances, is to loop through the distances from greatest to least: create a simplicial complex each iteration which excludes simplices that contain pairs of vertices which are further than the current distance apart. This method will produce a filtration of Vietoris-Rips complexes - each filtration index will correspond to a VR complex whose scale is the corresponding distance.

An essential thing to note about the way this library is set up is the distinction between "fast" and "light" functions. Light functions call the metric every time distance between two points is required, which is a lot. Fast functions store the distances between points and access them in constant time, BUT this means they use O(n^2) memory with respect to the number of data points, so it's a really bad idea to use this optimization on substantially large data.

IMPORTANT NOTE: This library currently only handles filtrations where all the vertices have filtration index 0. Since I haven't thought of any way filtrations that don't respect this property could come up in applications, I have chosen to exclude them in favor of less memory usage and a slight speedup.

Persistent homology is the main event of topological data analysis. It allows one to identify clusters, holes, and voids that persist in the data throughout many scales. The output of the persistence algorithm is a barcode diagram, which currently have a somewhat crude representation in this library. A single barcode represents the filtration index where a feature appears and the index where it disappears (if it does). Thus, short barcodes are typically interpretted as noise and long barcodes are interpretted as actual features.



type Filtration = (Int, Vector (Vector (Int, Vector Int, Vector Int))) Source #

The first component is simply the number of vertices (all vertices are assumed to have filtration index 0). The second component is a vector with an entry for each dimension of simplices, starting at dimension 1 for edges. Each simplex is represented as a triple: its filtration index, the indices of its vertices in the original data, and the indices of its faces in the next lowest dimension. Edges do not have reference to their faces, as it would be redundant with their vertices. All simplices are sorted according to filtration index upon construction of the filtration.

type BarCode = (Int, Maybe Int) Source #

(i, Just j) is a feature that appears at filtration index i and disappears at index j, (i, Nothing) begins at i and doesn't disappear.

sim2String :: (Int, Vector Int, Vector Int) -> String Source #

Shows all the information in a simplex.

filtr2String :: Filtration -> String Source #

Shows all the information in a filtration.

getComplex :: Int -> Filtration -> SimplicialComplex Source #

Gets the simplicial complex specified by the filtration index. This is O(n) with respect to the number of simplices.

getNumSimplices :: [Int] -> [Int] -> Filtration -> Int Source #

The first argument is a list of dimensions, the second argument is a list of filtration indices. The function returns the number of simplices in the filtration whose dimension and index exist in the respective lists.

getDimension :: Filtration -> Int Source #

Return the dimension of the highest dimensional simplex in the filtration (constant time).

filterByWeightsFast :: Ord a => [a] -> (SimplicialComplex, Graph a) -> Filtration Source #

Given a list of scales, a simplicial complex, and a weighted graph (see SimplicialComplex) which encodes a metric on the vertices, this function creates a filtration out of a simplicial complex by removing simplices that contain edges that are too long for each scale in the list. Really, this is a helper function to be called by makeVRFiltrationFast, but I decided to expose it in case you have a simplicial complex and weighted graph lying around. The scales MUST be in decreasing order for this function.

makeVRFiltrationFast :: (Ord a, Eq b) => [a] -> (b -> b -> a) -> [b] -> Filtration Source #

Given a list of scales, a metric, and a data set, this function constructs a filtration of the Vietoris-Rips complexes associated with the scales. The scales MUST be in decreasing order. Note that this a fast function, meaning it uses O(n^2) memory to quickly access distances where n is the number of data points.

filterByWeightsLight :: Ord a => [a] -> (b -> b -> a) -> [b] -> SimplicialComplex -> Filtration Source #

The same as filterbyWeightsFast except it uses far less memory at the cost of speed. Note that the scales must be in decreasing order.

makeVRFiltrationLight :: (Ord a, Eq b) => [a] -> (b -> b -> a) -> [b] -> Filtration Source #

Given a list of scales in decreasing order, a metric, and a data set, this constructs the filtration of Vietoris-Rips complexes corresponding to the scales.

persistentHomology :: Filtration -> [[BarCode]] Source #

Each index in the list is a list of barcodes whose dimension corresponds to the index. That is, the first list will represent clusters, the second list will represent tunnels or punctures, the third will represent hollow volumes, and the nth index list will represent n-dimensional holes in the data.