Portability | non-portable |
---|---|
Stability | experimental |
Safe Haskell | None |
- Notes
- From: 2.2.3 Random Projection
- From: 2.3 Matching Pursuit
- From: 3.4.1 The Sketch Approach
- From: 3.4.2 Partitioning Sketch Vectors
- From: 3.4.3 Algorithmic Framework
- From: 3.5 The Issues in Implementation
- From: 3.6.3 Performance Tests
- From: Appendix B: Structured Random Projection for Sliding Window
- From: Appendix C: An Upper Bound of the Grid Size
- Scratches
Scratches and notes.
- sketch_of_t :: [Double]
- dotp :: Num a => [a] -> [a] -> a
- vx :: [Double]
- vw :: [Double]
- vx' :: [Double]
- vw' :: [Double]
- norm :: Floating a => [a] -> a
- dist_direct :: [Double] -> [Double] -> Double
- dist_sketch :: Int -> [Double] -> [Double] -> Double
- chunks :: Int -> [a] -> [[a]]
- direct_div_by_sketch_distance :: Integer -> Int -> Double
- print_dist_ratios :: IO ()
- type DistanceFunc = [Double] -> [Double] -> Double
- type CorrFunc = [Double] -> [Double] -> Double
- corr_from_direct_distance :: CorrFunc
- corr_from_sketch_distance :: Int -> CorrFunc
- corr_from_distance :: DistanceFunc -> CorrFunc
- direct_diff_with_sketch_corr :: Integer -> Int -> (Double, Double)
- print_corr_diffs :: IO ()
- incremental_avg_and_var :: Integer -> Int -> Int -> [Double] -> [([Double], Double, Double)]
- compare_naive_and_incremental_avg_and_var :: Integer -> Int -> Int -> [Double] -> IO [Double]
- comparison_of_avg_and_var :: IO ()
- unit_random_vector_and_control_vector :: Integer -> Int -> Int -> RandomVector
- whole_random_vector :: Integer -> Int -> Int -> [Double]
- data RandomVector = RandomVector {}
- seq_one_sketch :: Integer -> Int -> Int -> [Double] -> [([Double], Double)]
- print_comparisons_of_single_sketch :: IO ()
- sequence_of_sketches :: Integer -> Int -> Int -> Int -> [Double] -> [([Double], [Double])]
- print_convolved_sketches :: Int -> IO ()
- push :: [a] -> [a] -> [a]
- print_comparisons_incr :: IO ()
- w0 :: Window
- w1 :: Window
- w1_copied_from_w0 :: Window
- len :: Num b => [a] -> b
- mean :: Fractional a => [a] -> a
- stddev :: [Double] -> Double
- variance_direct :: [Double] -> Double
- covariance_direct :: [Double] -> [Double] -> Double
- correlation_direct_01 :: [Double] -> [Double] -> Double
- correlation_direct_02 :: [Double] -> [Double] -> Double
- standardize :: [Double] -> [Double]
- prg64Doubles :: Integer -> [Double]
- bwins01 :: BasicWindows
- swins01 :: IntMap Window
- sq :: Floating a => a -> a
Notes
From: 2.2.3 Random Projection
sketch_of_t :: [Double]Source
Size 2 sketch vector of t constructed from random vectors v1 and v2.
>>>
sketch_of_t
[0.30000000000000004,-4.58]
From: 2.3 Matching Pursuit
Quote:
"Given a vector pool containing n time series V = (v_1,v_2,...,v_n), a target vector v_t, tolerated error \eps, and approximating vector set V_a = \theta. Define cos\theta a_1 = ṽ_t * ṽ_1 as the cosine between v_t and a vector v_1 in V. Here vector v = v / ‖v‖.
- Set i = 1;
- Search the pool V and find the vector v_1 whose |costheta_1| with respect to v_t is maximal;
- Compute the residue r = v_t - c_1v_1 where c_1 = (‖v_t‖/‖v_1‖) cos\theta. V_a = V_a ⋃ {v_1}
- If ‖r‖ < epsilon terminate, return V_a.
- Else set i = i + 1 and v_t = r, go back to 2."
From: 3.4.1 The Sketch Approach
Quote from the pdf:
" ... Quantitatively, given a point x ∈ Rᵐ, we compute its dot product with d random vectors rᵢ ∈ {1, -1}ᵐ. The first random projection of x is given by:
- y1 = (x*r_1, x*r_2, ..., x*r_d)
We compute 2b more such random projections y_1,...,y_2b_{+1}. If w is another point in Rᵐ and z_1,...,z_2b_{+1} are its projections using dot products with the same random vectors then the median of
- ‖y_1-z_1‖, ‖y_2-z_2‖, ..., ‖y_2b_{+1}-z_2b_{+1}‖
is a good estimate of ‖x-w‖. It lies within a theta(1/d) factor of ‖x-w‖ with probability 1 - (1/2)ᵇ. ... "
And,
"...Our approach is to use a \"structured\" random vector. The apparently oxymoronic idea is to form each structured random vector r from the concatenation of nb = sw/nb random vectors: r = s_1,...s_{nb}, where each si has length bw. Further each si is either u or -u, and u is a random vector in {1, -1}ᵐ. This choice is determined by a random binary nb-vector b: if b = 1, si = u and if bi = 0, si = -u. The structured approach leads to an asymptonic performance of O(nb) integer additions and O(log bw) floating point operations per datum and per random vector..."
dist_direct :: [Double] -> [Double] -> DoubleSource
Direct distance.
Sketch distance.
direct_div_by_sketch_distance :: Integer -> Int -> DoubleSource
d / s, where d is direct distance and s is sketch distance.
print_dist_ratios :: IO ()Source
Print direct_div_by_sketch
with different sketch sizes.
From: 3.4.2 Partitioning Sketch Vectors
Quote from the pdf:
"... Note that we use correlation and distance more or less interchangebly because one can be computed from the other once the data is normalized. Pearson correlation is related to Euclidean distance as follows:
D^2(x̂,ŷ) = 2(1 - corr(x,y))
Here x̂ and ŷ are obtained from the raw time sereis by computing:"
x - avg(x) x̂ = ———————————– var(x)
type DistanceFunc = [Double] -> [Double] -> DoubleSource
corr_from_direct_distance :: CorrFuncSource
From above:
D^2(x̂,ŷ) corr(x,y) = 1 - ——————— 2
corr_from_sketch_distanceSource
Like corr_from_direct_distance
, but using dist_sketch
.
corr_from_distance :: DistanceFunc -> CorrFuncSource
Using normalized input vector X and Y.
print_corr_diffs :: IO ()Source
From: 3.4.3 Algorithmic Framework
"... Suppose we are seeking points within some distance d in the original time series space.
- Partition each sketch vector s of size N into groups of some size g.
- The ith group of each sketch vector s is placed in the ith grid structure of dimension g.
- If two sketch vectors s1 and s2 are within distance c ✕ d in more than a fraction f of the groups, then the corresponding windows are candidate highly correlated windows and will be checked exactly."
From: 3.5 The Issues in Implementation
"... data normalization and its sketch within a sliding window are as follows (k = sw):
X_k - avg(X_k) X̂_k = —————————————— var(X_k) Xsk^k = X̂_k⋅R_k
... difficulty lies in that avg(X_k) and var(X_k) change over each basic window .... we will show that the updating is trivial and the sketch needs to be computed only once."
:: Integer | Random seed |
-> Int | bw, basic window size |
-> Int | sw, sliding window size |
-> [Double] | Input data |
-> [([Double], Double, Double)] | Sliding window contents, avg(Xₖ), and var(Xₖ) (k = sw) |
Update avg and var in a manner described in section 3.5.
compare_naive_and_incremental_avg_and_var :: Integer -> Int -> Int -> [Double] -> IO [Double]Source
Self descriptive function to show the comparison of naive and incremental avg and var.
comparison_of_avg_and_var :: IO ()Source
Sample comparison of avg and var, as shown in the function name.
From: 3.6.3 Performance Tests
"... To make the comparison concrete, we should specify our software architecture a bit more. The multi-dimentional search structure we use is in fact a grid structure. The reason we have rejected more sophisticated structures is that we are asking a radius query: which windows (represented as points) are within a certain distance of a given point? A multi-scale structure such as a quadtree or R-tree would not help in this case. Moreover, the grid structure can be stored densely in a hash table so empty cells take up no space. .."
From: Appendix B: Structured Random Projection for Sliding Window
"Given a sliding window length sw and a basic window length bw, we show how to compute the sketches for the sliding windows starting at each timepoint ... we will use a far more efficient approach based on the convolution between "structured random vector" and a data vector of length sw."
And from 3.5: The Issues in Implementation
$ "... To normalize the whole sliding window based on the same average and variance, the sketch of the basic window Xbwⁱ,i=0,1,...,nb will be updated as follows.
bw-1 (Xbwⁱ⋅R) - avgsw ∑ Ri i=0 Xskⁱ = —————————————————————————— varsw
where avg will be updated by removing the oldest basic window and adding the new arrival Xnb, ...."
unit_random_vector_and_control_vectorSource
:: Integer | Random seed. |
-> Int | nb, sliding window size / basic window size. |
-> Int | bw, basic window size. |
-> RandomVector |
Random vector rbw:
- rbw = (r₀,r₁,...,rbw-1
and control vector b:
- b = (b₀,b₁,...,bnb-1
:: Integer | Random seed. |
-> Int | nb, sliding window size / basic window size. |
-> Int | bw, basic window size. |
-> [Double] | r. |
Random vector for sliding window, built from unit random vector and control vector:
- r = (rbw⋅b₀,rbw⋅b₁,...,rbw⋅bnb-1
print_comparisons_of_single_sketch :: IO ()Source
Print comparisions of single sketch.
Showing sliding window contents, sketch from structured vector, sketch from direct dot product, diff of the two, and ratio of the two.
:: Integer | Random seed. |
-> Int | Size of sketch. |
-> Int | bw, basic window size. |
-> Int | sw, sliding window size. |
-> [Double] | Input data stream. |
-> [([Double], [Double])] | Sliding window contents and sketches. |
Scratch of: Figure B.6: Structured convolution procedure every basic window. Still has "Curse of Dimensionality".
This time, sketch size is controlled with given argument.
XXX: Normalization is improper.
print_convolved_sketches :: Int -> IO ()Source
push :: [a] -> [a] -> [a]Source
Push the contents of first list to second list, returning list having same length as second list.
print_comparisons_incr :: IO ()Source
Show ratio of correlation value computed with direct function and computed with standardized convolved sketch distance.
From: Appendix C: An Upper Bound of the Grid Size
Quote:
"Now let's examine how to embed this sketch filter in a grid structure. At first, we assume our parameter group is (N,g,c,f) and therfore there are totally ng = N/g groups. We will assign one grid structure to each sketch group. For each grid structure the critical parameter is the largest value A and diameter a. Now let's bound the size of A...".
Scratches
mean :: Fractional a => [a] -> aSource
variance_direct :: [Double] -> DoubleSource
covariance_direct :: [Double] -> [Double] -> DoubleSource
correlation_direct_01 :: [Double] -> [Double] -> DoubleSource
correlation_direct_02 :: [Double] -> [Double] -> DoubleSource
standardize :: [Double] -> [Double]Source
prg64Doubles :: Integer -> [Double]Source
Generate list of doubles between 0 to 1 with PRG64
.