Migration Guide for 1.0
----

Between the last major version (0.3.2.1) and the current major epoch (1.0), many API-related constructs have changed, and I'd like to justify them here and now so that users may have an immortalized explanation for what is most likely a disruptive change to their code.

## A faster loop

First, I'd like to say that I don't *like* breaking people's code. As an author and maintainer, I try to make sure that any API breakage is justified either by a significant UX improvement, or by a measurable performance increase large enough to warrant it. I believe both of these criteria are met by the 0.3.x -> 1.0 upgrade: not only is the API safer to use, but the use of type data to establish the provenance of values encoded by this library also allows the performance-sensitive loops to be much cleaner, eschewing error checking where type data suffices.

To prove this point, I've benchmarked the library between these last two epochs. The benchmarks say it all (all benchmarks were done on a ThinkPad P15 Gen 2 with an Intel i9-11950H and 64GB DDR4, running Ubuntu 22.04 with stock GHC 8.10.7, -O2):

In `base16-0.3.2.1`:

```
benchmarking decode/25/base16-bytestring
time                 31.66 ns   (31.64 ns .. 31.69 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 31.62 ns   (31.60 ns .. 31.67 ns)
std dev              113.1 ps   (71.94 ps .. 207.9 ps)

benchmarking decode/25/base16
time                 32.31 ns   (32.27 ns .. 32.35 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 32.33 ns   (32.30 ns .. 32.42 ns)
std dev              178.2 ps   (84.80 ps .. 340.1 ps)

benchmarking decode/100/base16-bytestring
time                 74.31 ns   (74.27 ns .. 74.35 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 74.37 ns   (74.34 ns .. 74.41 ns)
std dev              122.0 ps   (102.0 ps .. 147.8 ps)

benchmarking decode/100/base16
time                 83.74 ns   (83.70 ns .. 83.78 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 83.57 ns   (83.43 ns .. 83.68 ns)
std dev              380.3 ps   (273.2 ps .. 473.3 ps)

benchmarking decode/1k/base16-bytestring
time                 582.5 ns   (582.3 ns .. 582.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 582.8 ns   (582.6 ns .. 583.1 ns)
std dev              791.4 ps   (632.0 ps .. 1.101 ns)

benchmarking decode/1k/base16
time                 686.1 ns   (685.7 ns .. 686.4 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 686.2 ns   (685.9 ns .. 686.6 ns)
std dev              1.086 ns   (910.3 ps .. 1.357 ns)

benchmarking decode/10k/base16-bytestring
time                 5.640 μs   (5.633 μs .. 5.649 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 5.663 μs   (5.656 μs .. 5.671 μs)
std dev              25.71 ns   (21.29 ns .. 29.83 ns)

benchmarking decode/10k/base16
time                 6.628 μs   (6.609 μs .. 6.649 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.614 μs   (6.609 μs .. 6.622 μs)
std dev              20.78 ns   (12.02 ns .. 37.27 ns)

benchmarking decode/100k/base16-bytestring
time                 58.41 μs   (58.38 μs .. 58.45 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 58.41 μs   (58.38 μs .. 58.45 μs)
std dev              111.5 ns   (90.90 ns .. 152.0 ns)

benchmarking decode/100k/base16
time                 66.40 μs   (66.21 μs .. 66.64 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 66.55 μs   (66.47 μs .. 66.62 μs)
std dev              264.0 ns   (209.8 ns .. 332.3 ns)

benchmarking decode/1mm/base16-bytestring
time                 577.4 μs   (576.6 μs .. 578.1 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 577.0 μs   (576.5 μs .. 577.6 μs)
std dev              1.997 μs   (1.661 μs .. 2.474 μs)

benchmarking decode/1mm/base16
time                 670.9 μs   (670.3 μs .. 671.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 671.1 μs   (670.7 μs .. 671.9 μs)
std dev              2.003 μs   (1.211 μs .. 3.227 μs)
```

vs in `base16-1.0.0.0`:

```
benchmarking decode/25/base16-bytestring
time                 24.29 ns   (24.27 ns .. 24.32 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 24.27 ns   (24.24 ns .. 24.30 ns)
std dev              95.03 ps   (76.90 ps .. 125.9 ps)
benchmarking decode/25/base16
time                 25.64 ns   (25.49 ns .. 25.81 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 25.64 ns   (25.57 ns .. 25.72 ns)
std dev              262.9 ps   (220.7 ps .. 312.9 ps)

benchmarking decode/100/base16-bytestring
time                 75.10 ns   (74.95 ns .. 75.31 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 75.33 ns   (75.25 ns .. 75.40 ns)
std dev              267.5 ps   (202.6 ps .. 340.3 ps)

benchmarking decode/100/base16
time                 60.99 ns   (60.92 ns .. 61.05 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 60.95 ns   (60.89 ns .. 61.03 ns)
std dev              238.1 ps   (186.6 ps .. 325.0 ps)

benchmarking decode/1k/base16-bytestring
time                 606.2 ns   (605.3 ns .. 607.4 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 606.4 ns   (605.5 ns .. 609.2 ns)
std dev              4.832 ns   (1.865 ns .. 9.636 ns)

benchmarking decode/1k/base16
time                 472.5 ns   (472.0 ns .. 473.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 471.6 ns   (471.3 ns .. 472.0 ns)
std dev              1.165 ns   (965.8 ps .. 1.434 ns)

benchmarking decode/10k/base16-bytestring
time                 5.885 μs   (5.881 μs .. 5.890 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 5.891 μs   (5.888 μs .. 5.895 μs)
std dev              13.03 ns   (10.87 ns .. 16.58 ns)

benchmarking decode/10k/base16
time                 4.560 μs   (4.551 μs .. 4.567 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.549 μs   (4.544 μs .. 4.554 μs)
std dev              16.61 ns   (14.04 ns .. 19.41 ns)

benchmarking decode/100k/base16-bytestring
time                 58.71 μs   (58.56 μs .. 58.84 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 58.59 μs   (58.54 μs .. 58.66 μs)
std dev              201.4 ns   (163.3 ns .. 251.0 ns)

benchmarking decode/100k/base16
time                 45.74 μs   (45.69 μs .. 45.80 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 45.72 μs   (45.67 μs .. 45.78 μs)
std dev              172.5 ns   (146.4 ns .. 209.1 ns)

benchmarking decode/1mm/base16-bytestring
time                 584.6 μs   (583.1 μs .. 586.7 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 587.8 μs   (586.7 μs .. 589.0 μs)
std dev              3.931 μs   (3.108 μs .. 5.364 μs)

benchmarking decode/1mm/base16
time                 459.0 μs   (458.5 μs .. 459.7 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 458.9 μs   (458.4 μs .. 459.5 μs)
std dev              1.839 μs   (1.355 μs .. 2.951 μs)
```

Benchmarks are included in this repo so you can reproduce these results on your own. You can see parity in the `encode` step between the previous library iterations and the new epoch, with a *marked* improvement in decode speed (up to 25% faster on average between the old and new versions in the optimal case, and up to 40% in the suboptimal case), which justifies the performance aspect to me. Without deferring to pipelining instructions, hex encoding can only get so fast. In the future, this change also opens the library up to optimal SIMD implementations.

## A sounder API

Second, I do not believe that these changes are so burdensome that a migration to the new paradigm would be untenable. While it may be inconvenient to unwrap `Base16` types, in the `encode` case all one must do is call `extractBase16` to extract the value from its wrapper (all caveats implied), and in the `decode` case an untyped variant is supplied that is semantically consistent with the old behavior (the loop is the same). Hence, a migration is fairly easy to sketch out:

```
"encodeBase16'"        -> "extractBase16 . encodeBase16'"
"encodeBase16"         -> "extractBase16 . encodeBase16"
"decodeBase16"         -> "decodeBase16Untyped"
"decodeBase16Unpadded" -> "decodeBase16UnpaddedUntyped"
"decodeBase16Padded"   -> "decodeBase16PaddedUntyped"
"decodeBase16*With"    -> "decodeBase16*WithUntyped"
```

And that is all.
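To make the mapping concrete, here is a minimal before/after sketch. It assumes the 1.0 module layout (`Data.ByteString.Base16` for the codec functions, `Data.Base16.Types` for the wrapper), and the literals are purely illustrative:

```
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Data.Text (Text)

import Data.ByteString.Base16 (encodeBase16, decodeBase16Untyped)
import Data.Base16.Types (extractBase16)

-- 0.3.x: 'encodeBase16' returned a bare 'Text' value.
-- 1.0:   it returns 'Base16 Text', so unwrap it with 'extractBase16'.
encoded :: Text
encoded = extractBase16 (encodeBase16 "hello world")

-- 0.3.x: 'decodeBase16' took an arbitrary 'ByteString' and error-checked it.
-- 1.0:   that same error-checking loop lives under the 'Untyped' suffix.
decoded :: Either Text ByteString
decoded = decodeBase16Untyped "68656c6c6f20776f726c64"
```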
To make use of the new loops, one need only call `assertBase16` on a value and proceed with `decodeBase16` as usual in order to decode (a short sketch follows below). You'll note that an untyped `encodeBase16` is not supplied, because it is trivial to extract a `Base16`-encoded value once you have it. However, I want to encourage people to use the new API, so in the untyped case I have supplied only a decode with error checking, because sometimes we deal with other people's data and cannot establish provenance. In the encode case, I would rather keep that provenance a part of the API, and the user may opt to strip that data when sending it to others or around their systems. It's not my problem at that point!
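Here is a minimal sketch of that typed path, again assuming the module layout above; the hex literal is only trusted for the sake of illustration:

```
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)

import Data.ByteString.Base16 (decodeBase16)
import Data.Base16.Types (Base16, assertBase16)

-- We trust this value to be valid hex, e.g. because we produced it
-- ourselves. 'assertBase16' brands it with that provenance ...
trusted :: Base16 ByteString
trusted = assertBase16 "deadbeef"

-- ... and 'decodeBase16' can then run the faster loop, skipping error
-- checking and returning the bytes directly instead of an 'Either'.
decoded :: ByteString
decoded = decodeBase16 trusted
```

Note that `assertBase16` asserts provenance rather than checking it, so it belongs only where you can actually vouch for the data; for everyone else's bytes, the untyped decoders remain the right tool.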