gl-block
Using
Primitive types need a Block instance.
With those in place, you can build structures and derive their Storable instances generically, according to the intended usage.
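{-# LANGUAGE DeriveAnyClass, DeriveGeneric, DerivingStrategies, DerivingVia #-}
import Foreign.Storable (Storable)
import GHC.Generics (Generic)
import Graphics.Gl.Block (Block, Packed(..), Std140(..), Std430(..))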
-- | Attribute streams can be packed tightly together
data VertexAttrs = VertexAttrs
  { color     :: Vec4
  , texCoords :: Vec2
  }
  deriving stock (Eq, Ord, Show, Generic)    -- The regular stuff and Generic
  deriving anyclass Block                    -- The layout class, with Generic defaults
  deriving Storable via (Packed VertexAttrs) -- Free goodies!
-- | Uniform data requires jumping through the flaming hoops of padding and alignment.
-- You can use derive-storable-plugin or hs2c instead, but there are gotchas.
data SceneUniform = SceneUniform
  { projection    :: Mat4
  , viewPosition  :: Vec3 -- Here comes the jazz
  , viewDirection :: Vec3
  }
  deriving stock (Eq, Ord, Show, Generic)
  deriving anyclass Block
  deriving Storable via (Std140 SceneUniform) -- With comfy padding
-- | Shader storage buffers waste less space on padding, but the rules are still specific to the domain.
data Material = Material
  { baseColor         :: Vec4
  , metallicRoughness :: Vec2
  , emission          :: Vec4
  }
  deriving stock (Eq, Ord, Show, Generic)
  deriving anyclass Block
  deriving Storable via (Std430 Material) -- Less alignment, fewer calculations
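To see why the layout wrapper matters, it can help to work out the offsets for SceneUniform by hand. The sketch below is not gl-block code: `Field`, `packedLayout` and `std140Layout` are hypothetical names made up for this illustration, and the alignment figures are the usual GLSL std140 values (a vec3 occupies 12 bytes but is aligned like a vec4). It covers flat structs only, with no arrays or nesting.

```haskell
import Data.List (mapAccumL)

-- | A field described by name, base alignment and size (all in bytes).
-- Hypothetical helper type, not part of gl-block.
data Field = Field
  { fieldName  :: String
  , fieldAlign :: Int
  , fieldSize  :: Int
  }

-- Alignment and size of a few GLSL types under std140.
vec2, vec3, vec4, mat4 :: String -> Field
vec2 name = Field name 8 8
vec3 name = Field name 16 12 -- 12 bytes of data, aligned like a vec4
vec4 name = Field name 16 16
mat4 name = Field name 16 64 -- four vec4 columns

-- | Round an offset up to the next multiple of an alignment.
alignUp :: Int -> Int -> Int
alignUp alignment offset =
  ((offset + alignment - 1) `div` alignment) * alignment

-- | Tightly packed: every field starts where the previous one ended.
-- Returns the per-field offsets and the total size.
packedLayout :: [Field] -> ([(String, Int)], Int)
packedLayout fields = (offsets, total)
  where
    (total, offsets) = mapAccumL step 0 fields
    step offset field =
      (offset + fieldSize field, (fieldName field, offset))

-- | Simplified std140: each field starts at a multiple of its alignment,
-- and the struct size is rounded up to a multiple of 16.
std140Layout :: [Field] -> ([(String, Int)], Int)
std140Layout fields = (offsets, alignUp 16 end)
  where
    (end, offsets) = mapAccumL step 0 fields
    step offset field =
      let start = alignUp (fieldAlign field) offset
      in  (start + fieldSize field, (fieldName field, start))

-- | The fields of SceneUniform from the example above.
sceneUniform :: [Field]
sceneUniform =
  [mat4 "projection", vec3 "viewPosition", vec3 "viewDirection"]
```

Under these assumptions, `packedLayout sceneUniform` puts viewDirection at offset 76 for a total of 88 bytes, while `std140Layout sceneUniform` moves it to offset 80 and pads the struct to 96 bytes. That gap after viewPosition is the "jazz" the comment refers to.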
Benchmarks
The benchmark consists of filling a Storable vector.
- Packed layout is on par with manual instances.
- Std140 is slower, but not catastrophically so.
- Std430 seems to regain some performance due to being a tad simpler.
There's only one "manual" case standing for all the layouts since it would only be different in pointer offsets.
And no way in hell I'm going to calculate them by hand!
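For readers unfamiliar with what a "manual" instance involves, here is a minimal sketch of a hand-written Storable instance. The `Sample` type and `roundTrip` helper are hypothetical and not taken from the benchmark suite; the offsets shown are for a tightly packed layout of three 4-byte floats.

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (Storable (..))

-- | Hypothetical struct: a scalar followed by a vec2-like pair.
data Sample = Sample
  { weight :: Float
  , u      :: Float
  , v      :: Float
  }
  deriving (Eq, Show)

-- | Hand-written, tightly packed instance: fields at byte offsets 0, 4, 8.
instance Storable Sample where
  sizeOf _ = 12
  alignment _ = 4
  peek ptr =
    Sample
      <$> peekByteOff ptr 0
      <*> peekByteOff ptr 4
      <*> peekByteOff ptr 8
  poke ptr (Sample w a b) = do
    pokeByteOff ptr 0 w
    pokeByteOff ptr 4 a
    pokeByteOff ptr 8 b

-- | Write a value to temporary memory and read it back.
roundTrip :: Sample -> IO Sample
roundTrip s = alloca $ \ptr -> poke ptr s >> peek ptr
```

A std140 variant of the same instance would keep the peek and poke structure and change only the numbers: if the pair were treated as a vec2, it would start at offset 8 and the size would grow to 16. That is why one manual case can stand in for all three layouts in the benchmark.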
struct
  10
    manual: OK (0.36s)
      19.0 ns ± 1.9 ns, 325 B allocated, 0 B copied, 117 MB peak memory
    packed: OK (0.68s)
      18.5 ns ± 782 ps, 327 B allocated, 0 B copied, 117 MB peak memory, 0.98x
    std140: OK (0.58s)
      64.9 ns ± 4.0 ns, 827 B allocated, 0 B copied, 117 MB peak memory, 3.42x
    std430: OK (0.28s)
      62.1 ns ± 3.3 ns, 658 B allocated, 0 B copied, 117 MB peak memory, 3.28x
  1000
    manual: OK (0.41s)
      334 ns ± 30 ns, 23 KB allocated, 0 B copied, 135 MB peak memory
    packed: OK (0.40s)
      320 ns ± 23 ns, 23 KB allocated, 0 B copied, 135 MB peak memory, 0.96x
    std140: OK (0.44s)
      733 ns ± 64 ns, 47 KB allocated, 0 B copied, 137 MB peak memory, 2.19x
    std430: OK (0.55s)
      460 ns ± 30 ns, 32 KB allocated, 0 B copied, 137 MB peak memory, 1.38x
  1000000
    manual: OK (0.42s)
      344 μs ± 27 μs, 23 MB allocated, 167 B copied, 141 MB peak memory
    packed: OK (0.42s)
      350 μs ± 14 μs, 23 MB allocated, 164 B copied, 141 MB peak memory, 1.02x
    std140: OK (0.92s)
      817 μs ± 36 μs, 46 MB allocated, 241 B copied, 163 MB peak memory, 2.38x
    std430: OK (1.20s)
      523 μs ± 20 μs, 30 MB allocated, 157 B copied, 164 MB peak memory, 1.52x
Caveat: nested structures have degraded performance.