# Geomancy > Linear is nice, but slow. Those are naughty, but a bit faster. * All data types are monomorphic, unpacked and specialized. * `Mat4` and `Vec4` are `ByteArray#`. * `Mat4`x`Mat4` and `Mat4`x`Vec4` is done with SIMD. ## Matrix layout CPU-side matrices compose in MVP order, optimized for `mconcat (local1 : local2 : ... : root)` operation. GPU-side, in GLSL, it is `PVM * v`. ## The Numbers Storing a list of 1000 transformations (e.g. rendering instance data): ``` benchmarking 4x4 poke/1000/geomancy time 11.76 μs (11.66 μs .. 11.92 μs) 0.999 R² (0.998 R² .. 1.000 R²) mean 11.75 μs (11.69 μs .. 11.86 μs) std dev 283.4 ns (199.0 ns .. 399.0 ns) variance introduced by outliers: 26% (moderately inflated) ``` If you're willing to adjust your shaders, it's only 2.4 times slower. ``` benchmarking 4x4 poke/1000/linear time 28.29 μs (28.21 μs .. 28.38 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 28.40 μs (28.34 μs .. 28.50 μs) std dev 267.4 ns (145.5 ns .. 419.9 ns) ``` Keeping your shaders straight make the affair 6.1x slower. ``` benchmarking 4x4 poke/1000/linear/T time 73.70 μs (73.06 μs .. 74.49 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 72.77 μs (72.50 μs .. 73.22 μs) std dev 1.129 μs (793.5 ns .. 1.580 μs) ``` Folding down a `gloss`-style scene graph is where it is all started: ``` benchmarking 4x4 multiply/1000/geomancy time 20.79 μs (20.77 μs .. 20.83 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 20.80 μs (20.78 μs .. 20.83 μs) std dev 76.71 ns (60.01 ns .. 99.06 ns) benchmarking 4x4 multiply/1000/linear time 173.9 μs (173.6 μs .. 174.4 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 173.5 μs (173.2 μs .. 174.4 μs) std dev 1.733 μs (727.8 ns .. 3.422 μs) ``` Add that time to the poking that'll follow. Sure, it is in the lower microseconds range, but this budget can be used elsewhere.