gl-block
Using
Primitive types should have the Block
instance.
With that in place, you can build structures and get Storable instances derived generically, according to intented usage.
import GHC.Generics (Generic)
import Graphics.Gl.Block (Block, Packed(..), Std140(..), Std430(..))
-- | Attribute streams can be packed tightly together
data VertexAttrs = VertexAttrs
{ color :: Vec4
, texCoords :: Vec2
}
deriving stock (Eq, Ord, Show, Generic) -- The regular stuff and Generic
deriving anyclass Block -- The layout class, with Generic defaults
deriving Storable via (Packed VertexAttrs) -- Free goodies!
-- | Uniform data require jumping through padding and alignment flaming hoop.
-- You can use derive-storable-plugin or hs2c instead, but there are gotchas.
data SceneUniform = SceneUniform
{ projection :: Mat4
, viewPosition :: Vec3 -- Here comes the jazz
, viewDirection :: Vec3
}
deriving stock (Eq, Ord, Show, Generic)
deriving anyclass Block
deriving Storable via (Std140 VertexAttrs) -- With comfy padding
-- | Shader buffer objects are less vacuous, but the rules are specific to the domain.
data Material = Material
{ baseColor :: Vec4
, metallicRoughness :: Vec2
, emission :: Vec4
}
deriving stock (Eq, Ord, Show, Generic)
deriving anyclass Block
deriving Storable via (Std430 VertexAttrs) -- Less alignment, less calculations
Benchmarks
The benchmark consists of filling a Storable vector.
- Packed layout is on par with manual instances.
- Std140 is slower, but not catastrophically so.
- Std430 seems to regain some performance due to being a tad simpler.
There's only one "manual" case standing for all the layouts since it would only be different in pointer offsets.
And no way in hell I'm going to calculate them by hand!
struct
10
manual: OK (2.26s)
60.6 ns ± 5.4 ns
packed: OK (1.21s)
63.7 ns ± 5.8 ns
std140: OK (1.19s)
256 ns ± 26 ns
std430: OK (0.13s)
269 ns ± 24 ns
1000
manual: OK (1.95s)
3.34 μs ± 244 ns
packed: OK (1.96s)
3.34 μs ± 235 ns
std140: OK (1.97s)
6.70 μs ± 425 ns
std430: OK (1.44s)
4.89 μs ± 93 ns
1000000
manual: OK (2.59s)
2.22 ms ± 176 μs
packed: OK (1.50s)
2.47 ms ± 24 μs
std140: OK (2.70s)
4.54 ms ± 431 μs
std430: OK (1.08s)
3.28 ms ± 244 μs
Caveat: nested structures have degraded performance.