Portability	non-portable
Stability	experimental
Safe Haskell	None

TimeSeries.Scratch

Contents

Notes
Scratches

Description

Scratches and notes.

Synopsis

Notes

From: 2.2.3 Random Projection

sketch_of_t :: [Double]Source

Size 2 sketch vector of t constructed from random vectors v1 and v2.

>>> sketch_of_t
[0.30000000000000004,-4.58]

dotp :: Num a => [a] -> [a] -> aSource

From: 2.3 Matching Pursuit

Quote:

"Given a vector pool containing n time series V = (v_1,v_2,...,v_n), a target vector v_t, tolerated error \eps, and approximating vector set V_a = \theta. Define cos\theta a_1 = ṽ_t * ṽ_1 as the cosine between v_t and a vector v_1 in V. Here vector v = v / ‖v‖.

Set i = 1;
Search the pool V and find the vector v_1 whose |costheta_1| with respect to v_t is maximal;
Compute the residue r = v_t - c_1v_1 where c_1 = (‖v_t‖/‖v_1‖) cos\theta. V_a = V_a ⋃ {v_1}
If ‖r‖ < epsilon terminate, return V_a.
Else set i = i + 1 and v_t = r, go back to 2."

From: 3.4.1 The Sketch Approach

Quote from the pdf:

" ... Quantitatively, given a point x ∈ Rᵐ, we compute its dot product with d random vectors rᵢ ∈ {1, -1}ᵐ. The first random projection of x is given by:

y1 = (x*r_1, x*r_2, ..., x*r_d)

We compute 2b more such random projections y_1,...,y_2b_{+1}. If w is another point in Rᵐ and z_1,...,z_2b_{+1} are its projections using dot products with the same random vectors then the median of

‖y_1-z_1‖, ‖y_2-z_2‖, ..., ‖y_2b_{+1}-z_2b_{+1}‖

is a good estimate of ‖x-w‖. It lies within a theta(1/d) factor of ‖x-w‖ with probability 1 - (1/2)ᵇ. ... "

And,

"...Our approach is to use a \"structured\" random vector. The apparently oxymoronic idea is to form each structured random vector r from the concatenation of nb = sw/nb random vectors: r = s_1,...s_{nb}, where each si has length bw. Further each si is either u or -u, and u is a random vector in {1, -1}ᵐ. This choice is determined by a random binary nb-vector b: if b = 1, si = u and if bi = 0, si = -u. The structured approach leads to an asymptonic performance of O(nb) integer additions and O(log bw) floating point operations per datum and per random vector..."

vx :: [Double]Source

vw :: [Double]Source

vx' :: [Double]Source

vw' :: [Double]Source

norm :: Floating a => [a] -> aSource

dist_direct :: [Double] -> [Double] -> Double Source

Direct distance.

dist_sketch Source

Arguments

:: Int	Sketch size d.
-> [Double]	First vector.
-> [Double]	Second vector.
-> Double

Sketch distance.

chunks :: Int -> [a] -> [[a]]Source

direct_div_by_sketch_distance :: Integer -> Int -> Double Source

d / s, where d is direct distance and s is sketch distance.

print_dist_ratios :: IO ()Source

Print direct_div_by_sketch with different sketch sizes.

From: 3.4.2 Partitioning Sketch Vectors

Quote from the pdf:

"... Note that we use correlation and distance more or less interchangebly because one can be computed from the other once the data is normalized. Pearson correlation is related to Euclidean distance as follows:

 D^2(x̂,ŷ) = 2(1 - corr(x,y))

Here x̂ and ŷ are obtained from the raw time sereis by computing:"

      x - avg(x)
 x̂ = ———————————–
        var(x)

type DistanceFunc = [Double] -> [Double] -> Double Source

type CorrFunc = [Double] -> [Double] -> Double Source

corr_from_direct_distance :: CorrFunc Source

From above:

                 D^2(x̂,ŷ)
 corr(x,y) = 1 - ———————
                    2

corr_from_sketch_distance Source

Arguments

:: Int	Sketch size.
-> CorrFunc

Like corr_from_direct_distance, but using dist_sketch.

corr_from_distance :: DistanceFunc -> CorrFunc Source

Using normalized input vector X and Y.

direct_diff_with_sketch_corr :: Integer -> Int -> (Double, Double)Source

print_corr_diffs :: IO ()Source

From: 3.4.3 Algorithmic Framework

"... Suppose we are seeking points within some distance d in the original time series space.

Partition each sketch vector s of size N into groups of some size g.
The ith group of each sketch vector s is placed in the ith grid structure of dimension g.
If two sketch vectors s1 and s2 are within distance c ✕ d in more than a fraction f of the groups, then the corresponding windows are candidate highly correlated windows and will be checked exactly."

From: 3.5 The Issues in Implementation

"... data normalization and its sketch within a sliding window are as follows (k = sw):

       X_k - avg(X_k)
 X̂_k = ——————————————
         var(X_k)

 Xsk^k = X̂_k⋅R_k

... difficulty lies in that avg(X_k) and var(X_k) change over each basic window .... we will show that the updating is trivial and the sketch needs to be computed only once."

incremental_avg_and_var Source

Arguments

:: Integer	Random seed
-> Int	bw, basic window size
-> Int	sw, sliding window size
-> [Double]	Input data
-> [([Double], Double, Double)]	Sliding window contents, avg(Xₖ), and var(Xₖ) (k = sw)

Update avg and var in a manner described in section 3.5.

compare_naive_and_incremental_avg_and_var :: Integer -> Int -> Int -> [Double] -> IO [Double]Source

Self descriptive function to show the comparison of naive and incremental avg and var.

comparison_of_avg_and_var :: IO ()Source

Sample comparison of avg and var, as shown in the function name.

From: 3.6.3 Performance Tests

"... To make the comparison concrete, we should specify our software architecture a bit more. The multi-dimentional search structure we use is in fact a grid structure. The reason we have rejected more sophisticated structures is that we are asking a radius query: which windows (represented as points) are within a certain distance of a given point? A multi-scale structure such as a quadtree or R-tree would not help in this case. Moreover, the grid structure can be stored densely in a hash table so empty cells take up no space. .."

From: Appendix B: Structured Random Projection for Sliding Window

"Given a sliding window length sw and a basic window length bw, we show how to compute the sketches for the sliding windows starting at each timepoint ... we will use a far more efficient approach based on the convolution between "structured random vector" and a data vector of length sw."

And from 3.5: The Issues in Implementation

$ "... To normalize the whole sliding window based on the same average and variance, the sketch of the basic window Xbwⁱ,i=0,1,...,nb will be updated as follows.

                          bw-1
         (Xbwⁱ⋅R) - avgsw  ∑  Ri
                          i=0
 Xskⁱ = ——————————————————————————
                varsw

where avg will be updated by removing the oldest basic window and adding the new arrival Xnb, ...."

unit_random_vector_and_control_vector Source

Arguments

:: Integer	Random seed.
-> Int	nb, sliding window size / basic window size.
-> Int	bw, basic window size.
-> RandomVector

Random vector rbw:

rbw = (r₀,r₁,...,rbw-1

and control vector b:

b = (b₀,b₁,...,bnb-1

whole_random_vector Source

Arguments

:: Integer	Random seed.
-> Int	nb, sliding window size / basic window size.
-> Int	bw, basic window size.
-> [Double]	r.

Random vector for sliding window, built from unit random vector and control vector:

r = (rbw⋅b₀,rbw⋅b₁,...,rbw⋅bnb-1

data RandomVector Source

Random vector used for structured sketches.

Constructors

RandomVector
Fields rvUnit :: [Double] Random values treated as unit. rvControl :: [Double] Random values treated as controlling vector.

Instances

Eq RandomVector
Show RandomVector

seq_one_sketch :: Integer -> Int -> Int -> [Double] -> [([Double], Double)]Source

print_comparisons_of_single_sketch :: IO ()Source

Print comparisions of single sketch.

Showing sliding window contents, sketch from structured vector, sketch from direct dot product, diff of the two, and ratio of the two.

sequence_of_sketches Source

Arguments

:: Integer	Random seed.
-> Int	Size of sketch.
-> Int	bw, basic window size.
-> Int	sw, sliding window size.
-> [Double]	Input data stream.
-> [([Double], [Double])]	Sliding window contents and sketches.

Scratch of: Figure B.6: Structured convolution procedure every basic window. Still has "Curse of Dimensionality".

This time, sketch size is controlled with given argument.

XXX: Normalization is improper.

print_convolved_sketches :: Int -> IO ()Source

push :: [a] -> [a] -> [a]Source

Push the contents of first list to second list, returning list having same length as second list.

print_comparisons_incr :: IO ()Source

Show ratio of correlation value computed with direct function and computed with standardized convolved sketch distance.

From: Appendix C: An Upper Bound of the Grid Size

Quote:

"Now let's examine how to embed this sketch filter in a grid structure. At first, we assume our parameter group is (N,g,c,f) and therfore there are totally ng = N/g groups. We will assign one grid structure to each sketch group. For each grid structure the critical parameter is the largest value A and diameter a. Now let's bound the size of A...".

Scratches

w0 :: Window Source

w1 :: Window Source

w1_copied_from_w0 :: Window Source

len :: Num b => [a] -> bSource

mean :: Fractional a => [a] -> aSource

stddev :: [Double] -> Double Source

variance_direct :: [Double] -> Double Source

covariance_direct :: [Double] -> [Double] -> Double Source

correlation_direct_01 :: [Double] -> [Double] -> Double Source

correlation_direct_02 :: [Double] -> [Double] -> Double Source

standardize :: [Double] -> [Double]Source

prg64Doubles :: Integer -> [Double]Source

Generate list of doubles between 0 to 1 with PRG64.

bwins01 :: BasicWindows Source

swins01 :: IntMap Window Source

sq :: Floating a => a -> aSource