| Copyright | (c) 2010 Jasper Van der Jeugt (c) 2010 - 2011 Simon Meier | 
|---|---|
| License | BSD3-style (see LICENSE) | 
| Maintainer | Simon Meier <iridcode@gmail.com> | 
| Portability | GHC | 
| Safe Haskell | Trustworthy | 
| Language | Haskell98 | 
Data.ByteString.Builder
Description
Builders are used to efficiently construct sequences of bytes from
  smaller parts.
Typically,
  such a construction is part of the implementation of an encoding, i.e.,
  a function for converting Haskell values to sequences of bytes.
Examples of encodings are the generation of the sequence of bytes
  representing an HTML document to be sent in an HTTP response by a
  web application or the serialization of a Haskell value using
  a fixed binary format.
For an efficient implementation of an encoding,
  it is important that (a) little time is spent on converting
  the Haskell values to the resulting sequence of bytes and
  (b) that the representation of the resulting sequence
  is such that it can be consumed efficiently.
Builders support (a) by providing an O(1) concatenation operation
  and efficient implementations of basic encodings for Chars, Ints,
  and other standard Haskell values.
They support (b) by providing their result as a lazy ByteString,
  which is internally just a linked list of pointers to chunks
  of consecutive raw memory.
Lazy ByteStrings can be efficiently consumed by functions that
  write them to a file or send them over a network socket.
Note that each chunk boundary incurs expensive extra work (e.g., a system call)
  that must be amortized over the work spent on consuming the chunk body.
Builders therefore take special care to ensure that the
  average chunk size is large enough.
The precise meaning of large enough is application dependent.
The current implementation is tuned
  for an average chunk size between 4kb and 32kb,
  which should suit most applications.
As a simple example of an encoding implementation, we show how to efficiently convert the following representation of mixed-data tables to a UTF-8 encoded Comma-Separated-Values (CSV) table.
data Cell = StringC String
          | IntC Int
          deriving( Eq, Ord, Show )
type Row   = [Cell]
type Table = [Row]
We use the following imports and abbreviate mappend to simplify reading.
import qualified Data.ByteString.Lazy    as L
import           Data.ByteString.Builder
import           Data.Monoid
import           Data.Foldable           (foldMap)
import           Data.List               (intersperse)
infixr 4 <>
(<>) :: Monoid m => m -> m -> m
(<>) = mappend
CSV is a character-based representation of tables. For maximal modularity,
we could first render Tables as Strings and then encode this String
using some Unicode character encoding. However, this sacrifices performance
due to the intermediate String representation being built and thrown away
right afterwards. We get rid of this intermediate String representation by
fixing the character encoding to UTF-8 and using Builders to convert
Tables directly to UTF-8 encoded CSV tables represented as lazy
ByteStrings.
encodeUtf8CSV :: Table -> L.ByteString
encodeUtf8CSV = toLazyByteString . renderTable
renderTable :: Table -> Builder
renderTable rs = mconcat [renderRow r <> charUtf8 '\n' | r <- rs]
renderRow :: Row -> Builder
renderRow []     = mempty
renderRow (c:cs) =
    renderCell c <> mconcat [ charUtf8 ',' <> renderCell c' | c' <- cs ]
renderCell :: Cell -> Builder
renderCell (StringC cs) = renderString cs
renderCell (IntC i)     = intDec i
renderString :: String -> Builder
renderString cs = charUtf8 '"' <> foldMap escape cs <> charUtf8 '"'
  where
    escape '\\' = charUtf8 '\\' <> charUtf8 '\\'
    escape '\"' = charUtf8 '\\' <> charUtf8 '\"'
    escape c    = charUtf8 c
Note that the ASCII encoding is a subset of the UTF-8 encoding,
  which is why we can use the optimized function intDec to
  encode an Int as a decimal number with UTF-8 encoded digits.
Using intDec is more efficient than stringUtf8 . show, as it avoids constructing an intermediate String.
Avoiding this intermediate data structure significantly improves
  performance because encoding Cells is the core operation
  for rendering CSV-tables.
See Data.ByteString.Builder.Prim for further
  information on how to improve the performance of renderString.
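As a rough illustration of the kind of improvement meant here, the following sketch fuses escaping and UTF-8 encoding into a single bounded primitive that is mapped over the input String. The name renderStringPrim and the helpers fixed2 and renderedText are hypothetical and chosen for this sketch only; the primitives and combinators come from Data.ByteString.Builder.Prim. No performance figure is claimed for this variant.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.ByteString.Builder (Builder, charUtf8, toLazyByteString)
import qualified Data.ByteString.Builder.Prim as P
import Data.ByteString.Builder.Prim ((>$<), (>*<))

-- Hypothetical variant of renderString: escaping and UTF-8 encoding are
-- combined into one bounded primitive that is mapped over the String.
renderStringPrim :: String -> Builder
renderStringPrim cs = mconcat
    [ charUtf8 '"', P.primMapListBounded escape cs, charUtf8 '"' ]
  where
    escape :: P.BoundedPrim Char
    escape =
      P.condB (== '\\') (fixed2 ('\\', '\\')) $
      P.condB (== '"')  (fixed2 ('\\', '"'))  $
      P.charUtf8

    -- Always write the given pair of ASCII characters, ignoring the input.
    fixed2 :: (Char, Char) -> P.BoundedPrim Char
    fixed2 x = P.liftFixedToBounded (const x >$< (P.char7 >*< P.char7))

-- Hypothetical helper: run the Builder and view the bytes as Latin-1 Chars.
renderedText :: String -> String
renderedText = LC.unpack . toLazyByteString . renderStringPrim
```

The escaping rules are the same as in the definition of renderString above: a backslash and a double quote are each prefixed with a backslash, and every other character is UTF-8 encoded.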
We demonstrate our UTF-8 CSV encoding function on the following table.
strings :: [String]
strings = ["hello", "\"1\"", "λ-wörld"]
table :: Table
table = [map StringC strings, map IntC [-3..3]]
The expression encodeUtf8CSV table results in the following lazy
ByteString.
Chunk "\"hello\",\"\\\"1\\\"\",\"\206\187-w\195\182rld\"\n-3,-2,-1,0,1,2,3\n" Empty
We can clearly see that we are converting to a binary format. The 'λ' and 'ö' characters, which have a Unicode codepoint above 127, are expanded to their corresponding UTF-8 multi-byte representation.
We use the criterion library (http://hackage.haskell.org/package/criterion)
  to benchmark the efficiency of our encoding function on the following table.
import Criterion.Main     -- add this import to the ones above
maxiTable :: Table
maxiTable = take 1000 $ cycle table
main :: IO ()
main = defaultMain
  [ bench "encodeUtf8CSV maxiTable (original)" $
      whnf (L.length . encodeUtf8CSV) maxiTable
  ]
On a Core2 Duo 2.20GHz on a 32-bit Linux,
  the above code takes 1ms to generate the 22,500-byte lazy ByteString.
Looking again at the definitions above,
  we see that we took care to avoid intermediate data structures,
  as otherwise we would sacrifice performance.
For example,
  the following (arguably simpler) definition of renderRow is about 20% slower.
renderRow :: Row -> Builder
renderRow = mconcat . intersperse (charUtf8 ',') . map renderCell
Similarly, using O(n) concatenations like ++ or the equivalent concat
  operations on strict and lazy ByteStrings should be avoided.
The following definition of renderString is also about 20% slower.
renderString :: String -> Builder
renderString cs = stringUtf8 $ "\"" ++ concatMap escape cs ++ "\""
  where
    escape '\\' = "\\\\"
    escape '\"' = "\\\""
    escape c    = return c
Apart from removing intermediate data-structures, encodings can be optimized further by fine-tuning their execution parameters using the functions in Data.ByteString.Builder.Extra and their "inner loops" using the functions in Data.ByteString.Builder.Prim.
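A minimal sketch of what fine-tuning the execution parameters can look like, assuming the driver toLazyByteStringWith and the allocation strategy safeStrategy from Data.ByteString.Builder.Extra. The name toLazyByteStringSmall and the buffer sizes are illustrative assumptions, not tuned recommendations.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Builder (Builder, charUtf8, intDec, toLazyByteString)
import Data.ByteString.Builder.Extra
  (safeStrategy, smallChunkSize, toLazyByteStringWith)

-- Hypothetical driver: the first buffer holds 128 bytes, later buffers hold
-- smallChunkSize bytes. The sizes are for illustration only.
toLazyByteStringSmall :: Builder -> L.ByteString
toLazyByteStringSmall =
  toLazyByteStringWith (safeStrategy 128 smallChunkSize) L.empty

-- Sample input: the numbers 1..1000, one per line.
sample :: Builder
sample = mconcat [intDec n `mappend` charUtf8 '\n' | n <- [1 .. 1000 :: Int]]

-- The denoted bytes are the same as with the default driver; only the way
-- they are split into chunks may differ.
sameBytes :: Bool
sameBytes = toLazyByteStringSmall sample == toLazyByteString sample
```

Smaller buffers may suit short outputs where allocating a large first chunk would be wasteful; whether that pays off depends on the application and should be measured.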
Synopsis
- data Builder
- toLazyByteString :: Builder -> ByteString
- hPutBuilder :: Handle -> Builder -> IO ()
- byteString :: ByteString -> Builder
- lazyByteString :: ByteString -> Builder
- shortByteString :: ShortByteString -> Builder
- int8 :: Int8 -> Builder
- word8 :: Word8 -> Builder
- int16BE :: Int16 -> Builder
- int32BE :: Int32 -> Builder
- int64BE :: Int64 -> Builder
- word16BE :: Word16 -> Builder
- word32BE :: Word32 -> Builder
- word64BE :: Word64 -> Builder
- floatBE :: Float -> Builder
- doubleBE :: Double -> Builder
- int16LE :: Int16 -> Builder
- int32LE :: Int32 -> Builder
- int64LE :: Int64 -> Builder
- word16LE :: Word16 -> Builder
- word32LE :: Word32 -> Builder
- word64LE :: Word64 -> Builder
- floatLE :: Float -> Builder
- doubleLE :: Double -> Builder
- char7 :: Char -> Builder
- string7 :: String -> Builder
- char8 :: Char -> Builder
- string8 :: String -> Builder
- charUtf8 :: Char -> Builder
- stringUtf8 :: String -> Builder
- int8Dec :: Int8 -> Builder
- int16Dec :: Int16 -> Builder
- int32Dec :: Int32 -> Builder
- int64Dec :: Int64 -> Builder
- intDec :: Int -> Builder
- integerDec :: Integer -> Builder
- word8Dec :: Word8 -> Builder
- word16Dec :: Word16 -> Builder
- word32Dec :: Word32 -> Builder
- word64Dec :: Word64 -> Builder
- wordDec :: Word -> Builder
- floatDec :: Float -> Builder
- doubleDec :: Double -> Builder
- word8Hex :: Word8 -> Builder
- word16Hex :: Word16 -> Builder
- word32Hex :: Word32 -> Builder
- word64Hex :: Word64 -> Builder
- wordHex :: Word -> Builder
- int8HexFixed :: Int8 -> Builder
- int16HexFixed :: Int16 -> Builder
- int32HexFixed :: Int32 -> Builder
- int64HexFixed :: Int64 -> Builder
- word8HexFixed :: Word8 -> Builder
- word16HexFixed :: Word16 -> Builder
- word32HexFixed :: Word32 -> Builder
- word64HexFixed :: Word64 -> Builder
- floatHexFixed :: Float -> Builder
- doubleHexFixed :: Double -> Builder
- byteStringHex :: ByteString -> Builder
- lazyByteStringHex :: ByteString -> Builder
The Builder type
Builders denote sequences of bytes.
 They are Monoids where
   mempty is the zero-length sequence and
   mappend is concatenation, which runs in O(1).
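The Monoid structure can be observed directly by executing small Builders. The following sketch uses a hypothetical helper bytesOf to list the bytes that a Builder denotes.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Builder (Builder, string7, toLazyByteString, word8)
import Data.Word (Word8)

-- Hypothetical helper: execute a Builder and list the bytes it denotes.
bytesOf :: Builder -> [Word8]
bytesOf = L.unpack . toLazyByteString

-- mempty denotes the zero-length sequence.
emptyBytes :: [Word8]
emptyBytes = bytesOf mempty

-- mappend concatenates the denoted sequences: "hi" followed by byte 33 ('!').
greeting :: [Word8]
greeting = bytesOf (string7 "hi" `mappend` word8 33)
```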
Executing Builders
Internally, Builders are buffer-filling functions. They are
 executed by a driver that provides them with an actual buffer to
 fill. Once called with a buffer, a Builder fills it and returns a
 signal to the driver telling it that it is either done, has filled the
 current buffer, or wants to directly insert a reference to a chunk of
 memory. In the last two cases, the Builder also returns a
 continuation Builder that the driver can call to fill the next
 buffer. Here, we provide the two drivers that satisfy almost all use
 cases. See Data.ByteString.Builder.Extra, for information
 about fine-tuning them.
toLazyByteString :: Builder -> ByteString
Execute a Builder and return the generated chunks as a lazy ByteString.
 The work is performed lazily, i.e., only when a chunk of the lazy ByteString
 is forced.
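One way to observe this laziness is to execute a Builder that denotes an unbounded sequence and consume only a prefix of the result. The names naturals and firstDigits below are hypothetical; the sketch assumes that only the first chunk needs to be forced to obtain the first few bytes.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.ByteString.Builder (Builder, intDec, toLazyByteString)

-- A Builder denoting an unbounded byte sequence: 1, 2, 3, ... in decimal.
naturals :: Builder
naturals = mconcat (map intDec [1 ..])

-- Taking a short prefix forces only as many chunks as are needed for it,
-- so this terminates even though the Builder never finishes.
firstDigits :: String
firstDigits = LC.unpack (LC.take 12 (toLazyByteString naturals))
```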
hPutBuilder :: Handle -> Builder -> IO ()
Output a Builder to a Handle.
 The Builder is executed directly on the buffer of the Handle. If the
 buffer is too small (or not present), then it is replaced with a large
 enough buffer.
It is recommended that the Handle is set to binary and
 BlockBuffering mode. See hSetBinaryMode and
 hSetBuffering.
This function is more efficient than hPut . toLazyByteString because in
 many cases no buffer allocation has to be done. Moreover, the results of
 several executions of short Builders are concatenated in the Handle's
 buffer, therefore avoiding unnecessary buffer flushes.
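A minimal sketch of the recommended Handle setup, assuming the output goes to a file. The helper writeBuilder and the sample Builder numberLines are hypothetical names introduced for this example.

```haskell
import Data.ByteString.Builder (Builder, charUtf8, hPutBuilder, intDec)
import System.IO
  ( BufferMode (BlockBuffering), IOMode (WriteMode)
  , hSetBinaryMode, hSetBuffering, withFile )

-- Hypothetical helper: write a Builder to a file through a Handle that is
-- set to binary and BlockBuffering mode before the Builder is executed.
writeBuilder :: FilePath -> Builder -> IO ()
writeBuilder path b = withFile path WriteMode $ \h -> do
  hSetBinaryMode h True
  hSetBuffering h (BlockBuffering Nothing)
  hPutBuilder h b

-- A small Builder to write: the numbers 1..3, one per line.
numberLines :: Builder
numberLines = mconcat [intDec n `mappend` charUtf8 '\n' | n <- [1 .. 3 :: Int]]
```

Calling writeBuilder "numbers.txt" numberLines would then write the three lines to a file of that (arbitrarily chosen) name.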
Creating Builders
Binary encodings
byteString :: ByteString -> Builder
Create a Builder denoting the same sequence of bytes as a strict
 ByteString.
 The Builder inserts large ByteStrings directly, but copies small ones
 to ensure that the generated chunks are large on average.
lazyByteString :: ByteString -> Builder
Create a Builder denoting the same sequence of bytes as a lazy
 ByteString.
 The Builder inserts large chunks of the lazy ByteString directly,
 but copies small ones to ensure that the generated chunks are large on
 average.
shortByteString :: ShortByteString -> Builder
Construct a Builder that copies the ShortByteString.
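The three functions above can be mixed freely, since each yields an ordinary Builder. The following sketch combines one value of each ByteString flavour; the names header and headerText are hypothetical.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as LC
import qualified Data.ByteString.Short as Short
import Data.ByteString.Builder
  (Builder, byteString, lazyByteString, shortByteString, toLazyByteString)

-- One Builder assembled from a strict, a lazy, and a short ByteString.
header :: Builder
header = mconcat
  [ byteString (S.pack "strict,")
  , lazyByteString (LC.pack "lazy,")
  , shortByteString (Short.toShort (S.pack "short"))
  ]

-- The resulting bytes, viewed as Latin-1 characters.
headerText :: String
headerText = LC.unpack (toLazyByteString header)
```

All three inputs here are small, so by the description above they would be copied rather than inserted directly.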
Big-endian
Little-endian
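The difference between the big-endian and little-endian families is only the byte order of the output. A small sketch with word32BE and word32LE from the synopsis above; the names bigEndian and littleEndian are hypothetical.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Builder (toLazyByteString, word32BE, word32LE)
import Data.Word (Word8)

-- Big-endian: the most significant byte is written first.
bigEndian :: [Word8]
bigEndian = L.unpack (toLazyByteString (word32BE 0x01020304))

-- Little-endian: the least significant byte is written first.
littleEndian :: [Word8]
littleEndian = L.unpack (toLazyByteString (word32LE 0x01020304))
```

Here bigEndian is [1,2,3,4] and littleEndian is [4,3,2,1].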
Character encodings
ASCII (Char7)
The ASCII encoding is a 7-bit encoding. The Char7 encoding implemented here works by truncating the Unicode codepoint to 7 bits, prefixing it with a leading 0, and encoding the resulting 8 bits as a single byte. For the codepoints 0-127 this corresponds to the ASCII encoding.
ISO/IEC 8859-1 (Char8)
The ISO/IEC 8859-1 encoding is an 8-bit encoding often known as Latin-1. The Char8 encoding implemented here works by truncating the Unicode codepoint to 8 bits and encoding the result as a single byte. For the codepoints 0-255 this corresponds to the ISO/IEC 8859-1 encoding.
UTF-8
The UTF-8 encoding can encode all Unicode codepoints. We recommend
 using it always for encoding Chars and Strings unless an application
 really requires another encoding.
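To make the difference between the three character encodings concrete, the following sketch encodes the same character with each of them. The helper encodeWith is a hypothetical name; the character 'ö' (U+00F6, decimal 246) is written as an escape to keep the source ASCII-only.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Builder
  (Builder, char7, char8, charUtf8, toLazyByteString)
import Data.Word (Word8)

-- Hypothetical helper: encode one Char with the given encoding function
-- and list the resulting bytes.
encodeWith :: (Char -> Builder) -> Char -> [Word8]
encodeWith enc = L.unpack . toLazyByteString . enc

oUmlaut :: Char
oUmlaut = '\246'  -- U+00F6

asChar7, asChar8, asUtf8 :: [Word8]
asChar7 = encodeWith char7 oUmlaut     -- truncated to 7 bits: information is lost
asChar8 = encodeWith char8 oUmlaut     -- fits into 8 bits: the Latin-1 byte
asUtf8  = encodeWith charUtf8 oUmlaut  -- multi-byte UTF-8 sequence
```

Here asChar7 is [118] (the byte for 'v'), asChar8 is [246], and asUtf8 is [195,182], which matches the \195\182 seen in the CSV output above.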
Formatting numbers as text
Formatting of numbers as ASCII text.
Note that you can also use these functions for the ISO/IEC 8859-1 and UTF-8 encodings, as the ASCII encoding is equivalent on the codepoints 0-127.
Decimal numbers
Decimal encoding of numbers using ASCII encoded characters.
int8Dec :: Int8 -> Builder
Decimal encoding of an Int8 using the ASCII digits.
e.g.
toLazyByteString (int8Dec 42)   = "42"
toLazyByteString (int8Dec (-1)) = "-1"
Hexadecimal numbers
Encoding positive integers as hexadecimal numbers using lower-case ASCII characters. The shortest possible representation is used. For example,
>>> toLazyByteString (word16Hex 0x0a10)
Chunk "a10" Empty
Note that there is no support for using upper-case characters. Please contact the maintainer if your application cannot work without hexadecimal encodings that use upper-case characters.
word8Hex :: Word8 -> Builder
Shortest hexadecimal encoding of a Word8 using lower-case characters.
word16Hex :: Word16 -> Builder
Shortest hexadecimal encoding of a Word16 using lower-case characters.
word32Hex :: Word32 -> Builder
Shortest hexadecimal encoding of a Word32 using lower-case characters.
word64Hex :: Word64 -> Builder
Shortest hexadecimal encoding of a Word64 using lower-case characters.
wordHex :: Word -> Builder
Shortest hexadecimal encoding of a Word using lower-case characters.
Fixed-width hexadecimal numbers
byteStringHex :: ByteString -> Builder
Encode each byte of a ByteString using its fixed-width hex encoding.
lazyByteStringHex :: ByteString -> Builder
Encode each byte of a lazy ByteString using its fixed-width hex encoding.
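The following sketch contrasts the shortest and the fixed-width hexadecimal encodings on the same value, and shows the per-byte encoding of a strict ByteString. The helper render and the three example bindings are hypothetical names.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.ByteString.Builder
  (Builder, byteStringHex, toLazyByteString, word16Hex, word16HexFixed)

-- Hypothetical helper: execute a Builder and view the ASCII output as a String.
render :: Builder -> String
render = LC.unpack . toLazyByteString

-- Shortest encoding: leading zeros are dropped.
shortest :: String
shortest = render (word16Hex 0x0a10)

-- Fixed-width encoding: a Word16 always takes four hex digits.
fixedWidth :: String
fixedWidth = render (word16HexFixed 0x0a10)

-- Every byte of the ByteString is written as exactly two hex digits.
bytesHex :: String
bytesHex = render (byteStringHex (S.pack [0x00, 0xff, 0x0a]))
```

Here shortest is "a10", fixedWidth is "0a10", and bytesHex is "00ff0a".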