hmatrix-nipals-0.2: NIPALS method for Principal Components Analysis on large data-sets.

Numeric.LinearAlgebra.NIPALS

Description

Nonlinear Iterative Partial Least Squares

Simplified Interface

firstPC :: Matrix Double -> (Vector Double, Vector Double, Matrix Double)

Calculate the first principal component of a set of samples.

Each row in the matrix is one sample. Note that this is the transpose of the layout used when computing principal components with svd or leftSV.

Example:

 let (pc,scores,residuals) = firstPC $ fromRows samples

This is calculated by passing a default estimate of the scores to firstPCFromScores.

firstPCFromScores :: Matrix Double -> Vector Double -> (Vector Double, Vector Double, Matrix Double)

Calculate the first principal component of a set of samples given a starting estimate of the scores.

Each row in the matrix is one sample. Note that this is the transpose of the layout used when computing principal components with svd or leftSV.

The second argument is a starting guess for the score vector. If this is close to the actual score vector, the algorithm converges much faster.

Example:

 let (pc,scores,residuals) = firstPCFromScores (fromRows samples) scoresGuess
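
To make the role of the starting scores concrete, here is a minimal sketch of the textbook NIPALS iteration on plain lists. It is not this package's implementation: the names Vec, nipalsStep and firstPCSketch, the iteration cap of 200 and the tolerance are all assumptions made for illustration. The sketch does no mean-centering and assumes the samples are non-empty and not all orthogonal to the starting scores.

```haskell
import Data.List (transpose)

-- Hypothetical stand-in for Vector Double, so the sketch needs only base.
type Vec = [Double]

dot :: Vec -> Vec -> Double
dot xs ys = sum (zipWith (*) xs ys)

normalize :: Vec -> Vec
normalize v = map (/ sqrt (dot v v)) v

-- One NIPALS step. Each element of 'samples' is one row (one sample).
nipalsStep :: [Vec] -> Vec -> (Vec, Vec)
nipalsStep samples t = (p, t')
  where
    -- Loading: p = X^T t, scaled to unit length.
    p  = normalize (map (dot t) (transpose samples))
    -- New scores: t' = X p.
    t' = map (dot p) samples

-- Repeat the step until the scores stop changing, or the (assumed) cap of
-- 200 iterations is reached. Returns (loading, scores).
firstPCSketch :: [Vec] -> Vec -> (Vec, Vec)
firstPCSketch samples = go (200 :: Int)
  where
    go n t
      | n <= 1 || converged = (p, t')
      | otherwise           = go (n - 1) t'
      where
        (p, t')   = nipalsStep samples t
        d         = zipWith (-) t' t
        -- Assumed relative tolerance on the squared change in the scores.
        converged = dot d d <= 1e-18 * dot t' t'
```

A starting guess that is already close to the true scores makes the convergence test succeed after very few steps, which is the effect described above.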

Monadic Interface

firstPCFromScoresM :: Monad m => m [Vector Double] -> Vector Double -> m (Vector Double, Vector Double)

Calculate the first principal component, computing the samples afresh on every pass.

This function calculates exactly the same results as firstPCFromScores (minus the residual), but instead of an input Matrix it takes a monadic action that yields the list of samples. It guarantees that the list returned by the action will be consumed in a single pass. However, the action may be demanded many times.

Unlike in firstPCFromScores, the residual can't be calculated lazily here, because that would require demanding the samples again. Instead, use residual to calculate it.

There is no corresponding firstPCM that guesses the initial score vector for you. If you need this function instead of firstPC, you should supply a reasonable starting point yourself; otherwise convergence will be very slow, and every extra pass demands the samples again.
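
The single-pass guarantee can be illustrated with a sketch in which each demand of the action is folded over exactly once. This is one possible arrangement, not necessarily how the package does it: Vec, axpy, loadingFromScores, onePass and firstPCSketchM are hypothetical names, and the fixed pass count stands in for a real convergence test. It assumes the action yields at least one sample.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for Vector Double, so the sketch needs only base.
type Vec = [Double]

dot :: Vec -> Vec -> Double
dot xs ys = sum (zipWith (*) xs ys)

normalize :: Vec -> Vec
normalize v = map (/ sqrt (dot v v)) v

-- acc + k * x, forcing every element so no thunks pile up during a pass.
-- The accumulator may be the infinite list 'repeat 0'; zipWith truncates it.
axpy :: Double -> Vec -> Vec -> Vec
axpy k x acc = let r = zipWith (\xi ai -> ai + k * xi) x acc in sum r `seq` r

-- First pass: turn the starting scores into a unit-length loading, X^T t.
loadingFromScores :: Vec -> [Vec] -> Vec
loadingFromScores t samples =
  normalize (foldl' (\acc (ti, x) -> axpy ti x acc) (repeat 0) (zip t samples))

-- Later passes: one traversal yields both the scores t = X p and the next
-- unnormalised loading X^T t, because each score depends on one sample only.
onePass :: Vec -> [Vec] -> (Vec, Vec)
onePass p = finish . foldl' step ([], repeat 0)
  where
    step (ts, acc) x =
      let s    = dot x p
          acc' = axpy s x acc
      in  acc' `seq` (s : ts, acc')
    finish (ts, acc) = (reverse ts, acc)

-- Run a fixed (assumed) number of passes. Returns (loading, scores).
firstPCSketchM :: Monad m => Int -> m [Vec] -> Vec -> m (Vec, Vec)
firstPCSketchM passes samplesM t0 = do
  p0 <- loadingFromScores t0 <$> samplesM
  go passes p0
  where
    go k p = do
      (t, q) <- onePass p <$> samplesM  -- the action is demanded again here
      if k <= 1 then pure (p, t) else go (k - 1) (normalize q)
```

Because the samples are never held between passes, the action can regenerate or re-read them each time, at the cost of one demand per iteration.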

residual
  :: [Vector Double]     -- The samples
  -> Vector Double       -- The component (also called the loading)
  -> Vector Double       -- The scores
  -> [Vector Double]     -- The residuals for each sample

Calculate the residuals of a series of samples given a component and score vector.

 (p,t) <- firstPCFromScoresM samplesM (randomVector 0 Gaussian numSamples)
 samples <- samplesM
 let r = residual samples p t
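
For reference, a minimal list-based sketch of what such a residual computation might look like, assuming the usual NIPALS definition in which each sample has its projection onto the component subtracted. Vec and residualSketch are hypothetical names, not part of this package.

```haskell
-- Hypothetical stand-in for Vector Double, so the sketch needs only base.
type Vec = [Double]

-- Residual of each sample x_i with score t_i and component p:
--   r_i = x_i - t_i * p
residualSketch :: [Vec] -> Vec -> Vec -> [Vec]
residualSketch samples p t = zipWith deflate samples t
  where
    deflate x ti = zipWith (\xi pj -> xi - ti * pj) x p
```

Since each residual depends only on its own sample and score, the result can be produced lazily, one sample at a time.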