# text-builder-linear [![Hackage](http://img.shields.io/hackage/v/text-builder-linear.svg)](https://hackage.haskell.org/package/text-builder-linear) [![Stackage LTS](http://stackage.org/package/text-builder-linear/badge/lts)](http://stackage.org/lts/package/text-builder-linear) [![Stackage Nightly](http://stackage.org/package/text-builder-linear/badge/nightly)](http://stackage.org/nightly/package/text-builder-linear) _Linear types for linear times!_ Builder for strict `Text` and `ByteString`, based on linear types. It consistently outperforms lazy `Builder` from `text` as well as a strict builder from `text-builder`, and scales better. ## Example ```haskell > :set -XOverloadedStrings > import Data.Text.Builder.Linear > fromText "foo" <> fromChar '_' <> fromDec (42 :: Int) "foo_42" ``` ## Design String builders in Haskell serve the same purpose as `StringBuilder` in Java to prevent quadratic slow down in concatenation. Classic builders such as `Data.Text.Lazy.Builder` are lazy and fundamentally are [`dlist`](https://hackage.haskell.org/package/dlist) with bells and whistles: instead of actually concatenating substrings we compose actions, which implement concatenation, building a tree of thunks. The tree can be forced partially, left-to-right, producing chunks of strict `Text`, combined into a lazy one. Neither input, nor output need to be materialized in full, which potentially allows for fusion. Such builders allow linear time complexity, but constant factors are relatively high, because thunks are expensive. To a certain degree this is mitigated by inlining, which massively reduces number of nodes. Strict builders such as [`text-builder`](https://hackage.haskell.org/package/text-builder) offer another design: they first inspect their input in full to determine output length, then allocate a buffer of required size and fill it in one go. If everything inlines nicely, the length may be known in compile time, which gives blazingly fast runtime. In more complex cases it still builds a tree of thunks and forces all inputs to be materialized. This package offers two interfaces. One is a mutable `Buffer` with linear API, which operates very similar to `StringBuilder` in Java. It allocates a buffer with extra space at the ends to append new strings. If there is not enough free space to insert new data, it allocates a twice larger buffer and copies itself there. The dispatch happens in runtime, so we do not need to inspect and materialize all inputs beforehand; and inlining is mostly irrelevant. Exponential growth provides for amortized linear time. Such structure can be implemented without linear types, but that would greatly affect user experience by polluting everything with `ST` monad. Users are encouraged to use `Buffer` API, and built-in benchmarks refer to it. The second interface is more traditional `newtype Builder = Builder (Buffer ⊸ Buffer)` with `Monoid` instance. This type provides easy migration from other builders, but may suffer from insufficient inlining, allocating a tree of thunks. It is still significantly faster than `Data.Text.Lazy.Builder`, as witnessed by benchmarks for `blaze-builder` below. ## Case study Let's benchmark builders, which concatenate all `Char` from `minBound` to `maxBound`, producing a large `Text`: ```haskell #!/usr/bin/env cabal {- cabal: build-depends: base, tasty-bench, text, text-builder, text-builder-linear ghc-options: -O2 -} import qualified Data.Text as T import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.Builder as TLB import qualified Text.Builder as TB import qualified Data.Text.Builder.Linear as TBL import System.Environment (getArgs) import Test.Tasty.Bench mkBench :: Monoid a => String -> (Char -> a) -> (a -> Int) -> Benchmark mkBench s f g = bench s $ nf (g . foldMap f . enumFromTo minBound) maxBound {-# INLINE mkBench #-} main :: IO () main = defaultMain [ mkBench "text, lazy" TLB.singleton (fromIntegral . TL.length . TLB.toLazyText) , mkBench "text, strict" TLB.singleton (T.length . TL.toStrict . TLB.toLazyText) , mkBench "text-builder" TB.char (T.length . TB.run) , mkBench "text-builder-linear" TBL.fromChar (T.length . TBL.runBuilder) ] ``` Running this program with `cabal run Main.hs -- +RTS -T` yields following results: ``` text, lazy: 4.25 ms ± 107 μs, 11 MB allocated, 912 B copied text, strict: 7.18 ms ± 235 μs, 24 MB allocated, 10 MB copied text-builder: 80.1 ms ± 3.0 ms, 218 MB allocated, 107 MB copied text-builder-linear: 5.37 ms ± 146 μs, 44 MB allocated, 78 KB copied ``` The first result seems the best both in time and memory and corresponds to the usual `Text` builder, where we do not materialize the entire result at all. It builds chunks of lazy `Text` lazily and consumes them at once by `TL.length`. Thus there are 11 MB of allocations in nursery, none of which survive generation 0 garbage collector, so nothing is copied. The second result is again the usual `Text` builder, but emulates a strict consumer: we materialize a strict `Text` before computing length. Allocation are doubled, and half of them (corresponding to the strict `Text`) survive to the heap. Time is also almost twice longer, but still quite good. The third result is for `text-builder` and demonstrates how bad things could go with strict builders, aiming to precompute the precise length of the buffer: allocating a thunk per char is tremendously slow and expensive. The last result corresponds to the current package. We generate a strict `Text` by growing and reallocating the buffer, thus allocations are quite high. Nevertheless, it is already faster than the usual `Text` builder with strict consumer and does not strain the garbage collector. Things get very different if we remove `{-# INLINE mkBench #-}`: ``` text, lazy: 36.9 ms ± 599 μs, 275 MB allocated, 30 KB copied text, strict: 44.7 ms ± 1.3 ms, 287 MB allocated, 25 MB copied text-builder: 77.6 ms ± 2.2 ms, 218 MB allocated, 107 MB copied text-builder-linear: 5.35 ms ± 212 μs, 44 MB allocated, 79 KB copied ``` Builders from `text` package degrade rapidly, 6-8x slower and 10-20x more allocations. That's because their constant factors rely crucially on everything getting inlined, which makes their performance fragile and unreliable in large-scale applications. On the bright side of things, our builder remains as fast as before and now is a clear champion. ## Benchmarks for `Text` Measured with GHC 9.6 on aarch64: |Group / size|`text`|`text-builder`| |This package| | |------------|-----:|-------------:|-:|-----------:|-:| | **Text** |||||| |1|47.4 ns|24.2 ns|0.51x|35.2 ns|0.74x| |10|509 ns|195 ns|0.38x|197 ns|0.39x| |100|4.94 μs|1.74 μs|0.35x|1.66 μs|0.34x| |1000|52.6 μs|17.0 μs|0.32x|15.0 μs|0.28x| |10000|646 μs|206 μs|0.32x|155 μs|0.24x| |100000|12.2 ms|3.34 ms|0.27x|2.60 ms|0.21x| |1000000|159 ms|55.3 ms|0.35x|16.1 ms|0.10x| | **Char** |||||| |1|46.9 ns|21.1 ns|0.45x|22.3 ns|0.48x| |10|229 ns|152 ns|0.66x|79.9 ns|0.35x| |100|2.00 μs|1.23 μs|0.61x|618 ns|0.31x| |1000|21.9 μs|10.3 μs|0.47x|6.28 μs|0.29x| |10000|285 μs|153 μs|0.54x|68.5 μs|0.24x| |100000|7.70 ms|4.08 ms|0.53x|992 μs|0.13x| |1000000|110 ms|106 ms|0.96x|9.19 ms|0.08x| | **Decimal** |||||| |1|97.7 ns|872 ns|8.92x|80.2 ns|0.82x| |10|864 ns|8.72 μs|10.09x|684 ns|0.79x| |100|9.07 μs|93.5 μs|10.32x|7.25 μs|0.80x| |1000|92.4 μs|1.06 ms|11.44x|67.5 μs|0.73x| |10000|1.13 ms|13.4 ms|11.88x|667 μs|0.59x| |100000|18.7 ms|141 ms|7.57x|7.57 ms|0.41x| |1000000|229 ms|1.487 s|6.48x|67.8 ms|0.30x| | **Hexadecimal** |||||| |1|403 ns|749 ns|1.86x|43.9 ns|0.11x| |10|3.94 μs|7.66 μs|1.94x|308 ns|0.08x| |100|42.8 μs|89.0 μs|2.08x|2.88 μs|0.07x| |1000|486 μs|986 μs|2.03x|27.7 μs|0.06x| |10000|7.10 ms|12.6 ms|1.77x|283 μs|0.04x| |100000|80.1 ms|133 ms|1.65x|3.53 ms|0.04x| |1000000|867 ms|1.340 s|1.55x|28.9 ms|0.03x| | **Double** |||||| |1|7.56 μs|18.3 μs|2.42x|414 ns|0.05x| |10|76.5 μs|188 μs|2.46x|4.23 μs|0.06x| |100|754 μs|2.35 ms|3.11x|44.4 μs|0.06x| |1000|7.94 ms|25.8 ms|3.25x|436 μs|0.05x| |10000|79.1 ms|285 ms|3.60x|4.90 ms|0.06x| |100000|796 ms|2.938 s|3.69x|45.1 ms|0.06x| |1000000|8.003 s|32.411 s|4.05x|436 ms|0.05x| If you are not convinced by synthetic data, here are benchmarks for [`blaze-markup` after migration to `Data.Text.Builder.Linear`](https://github.com/Bodigrim/blaze-markup): ``` bigTable 992 μs ± 80 μs, 49% less than baseline basic 4.35 μs ± 376 ns, 47% less than baseline wideTree 1.26 ms ± 85 μs, 53% less than baseline wideTreeEscaping 217 μs ± 7.8 μs, 58% less than baseline deepTree 242 μs ± 23 μs, 48% less than baseline manyAttributes 811 μs ± 79 μs, 58% less than baseline customAttribute 1.68 ms ± 135 μs, 56% less than baseline ``` ## Benchmarks for `ByteString` Somewhat surprisingly, `text-builder-linear` now offers rendering to strict `ByteString` as well. It is consistently faster than `bytestring` when a string gets over 32k (which is `defaultChunkSize` for `bytestring` builder). For mid-sized strings `bytestring` is slightly faster in certain disciplines, mostly by virtue of using `cbits` via FFI, while this package remains 100% native Haskell. Benchmarks below were measured with GHC 9.6 on aarch64 and include comparison to [`bytestring-strict-builder`](https://hackage.haskell.org/package/bytestring-strict-builder): |Group / size|`bytestring`|`…-strict-builder`| |This package| | |------------|-----------:|-----------------:|-:|-----------:|-:| | **Text** |||||| |1|106 ns|33.5 ns|0.32x|35.2 ns|0.33x| |10|322 ns|217 ns|0.68x|197 ns|0.61x| |100|2.49 μs|1.89 μs|0.76x|1.66 μs|0.67x| |1000|21.8 μs|18.5 μs|0.85x|15.0 μs|0.69x| |10000|231 μs|212 μs|0.92x|155 μs|0.67x| |100000|3.97 ms|3.54 ms|0.89x|2.60 ms|0.66x| |1000000|81.2 ms|51.5 ms|0.63x|16.1 ms|0.20x| | **Char** |||||| |1|99.0 ns|19.4 ns|0.20x|22.3 ns|0.23x| |10|270 ns|82.9 ns|0.31x|79.9 ns|0.30x| |100|1.77 μs|723 ns|0.41x|618 ns|0.35x| |1000|20.4 μs|8.37 μs|0.41x|6.28 μs|0.31x| |10000|322 μs|129 μs|0.40x|68.5 μs|0.21x| |100000|10.4 ms|2.50 ms|0.24x|992 μs|0.10x| |1000000|143 ms|67.4 ms|0.47x|9.19 ms|0.06x| | **Decimal** |||||| |1|152 ns|174 ns|1.14x|80.2 ns|0.53x| |10|685 ns|1.55 μs|2.26x|684 ns|1.00x| |100|5.88 μs|17.2 μs|2.93x|7.25 μs|1.23x| |1000|60.3 μs|196 μs|3.25x|67.5 μs|1.12x| |10000|648 μs|4.25 ms|6.57x|667 μs|1.03x| |100000|11.2 ms|62.8 ms|5.62x|7.57 ms|0.68x| |1000000|150 ms|655 ms|4.37x|67.8 ms|0.45x| | **Hexadecimal** |||||| |1|94.7 ns|||43.9 ns|0.46x| |10|255 ns|||308 ns|1.21x| |100|1.72 μs|||2.88 μs|1.67x| |1000|18.9 μs|||27.7 μs|1.46x| |10000|250 μs|||283 μs|1.13x| |100000|6.94 ms|||3.53 ms|0.51x| |1000000|93.2 ms|||28.9 ms|0.31x| | **Double** |||||| |1|457 ns|||414 ns|0.91x| |10|3.94 μs|||4.23 μs|1.07x| |100|40.3 μs|||44.4 μs|1.10x| |1000|398 μs|||436 μs|1.10x| |10000|5.65 ms|||4.90 ms|0.87x| |100000|63.3 ms|||45.1 ms|0.71x| |1000000|673 ms|||436 ms|0.65x|