Copyright	(c) 2010 - 2011 Simon Meier
License	BSD3-style (see LICENSE)
Maintainer	Simon Meier <iridcode@gmail.com>
Stability	unstable, private
Portability	GHC
Safe Haskell	Unsafe
Language	Haskell2010

Data.ByteString.Builder.Internal

Contents

Buffer management
Build signals and steps
The Builder monoid
- Primitive combinators
- Execution
The Put monad
- Execution
- Conversion to and from Builders

Description

Warning:* this module is internal. If you find that you need it then please contact the maintainers and explain what you are trying to do and discuss what you would need in the public API. It is important that you do this as the module may not be exposed at all in future releases.

Core types and functions for the Builder monoid and its generalization, the Put monad.

The design of the Builder monoid is optimized such that

buffers of arbitrary size can be filled as efficiently as possible and
sequencing of Builders is as cheap as possible.

We achieve (1) by completely handing over control over writing to the buffer to the BuildStep implementing the Builder. This BuildStep is just told the start and the end of the buffer (represented as a BufferRange). Then, the BuildStep can write to as big a prefix of this BufferRange in any way it desires. If the BuildStep is done, the BufferRange is full, or a long sequence of bytes should be inserted directly, then the BuildStep signals this to its caller using a BuildSignal.

We achieve (2) by requiring that every Builder is implemented by a BuildStep that takes a continuation BuildStep, which it calls with the updated BufferRange after it is done. Therefore, only two pointers have to be passed in a function call to implement concatenation of Builders. Moreover, many Builders are completely inlined, which enables the compiler to sequence them without a function call and with no boxing at all.

This design gives the implementation of a Builder full access to the IO monad. Therefore, utmost care has to be taken to not overwrite anything outside the given BufferRanges. Moreover, further care has to be taken to ensure that Builders and Puts are referentially transparent. See the comments of the builder and put functions for further information. Note that there are no safety belts at all, when implementing a Builder using an IO action: you are writing code that might enable the next buffer-overflow attack on a Haskell server!

Synopsis

data Buffer = Buffer !(ForeignPtr Word8) !BufferRange
data BufferRange = BufferRange !(Ptr Word8) !(Ptr Word8)
newBuffer :: Int -> IO Buffer
bufferSize :: Buffer -> Int
byteStringFromBuffer :: Buffer -> ByteString
data ChunkIOStream a
- = Finished Buffer a
- | Yield1 ByteString (IO (ChunkIOStream a))
buildStepToCIOS :: AllocationStrategy -> BuildStep a -> IO (ChunkIOStream a)
ciosUnitToLazyByteString :: AllocationStrategy -> ByteString -> ChunkIOStream () -> ByteString
ciosToLazyByteString :: AllocationStrategy -> (a -> (b, ByteString)) -> ChunkIOStream a -> (b, ByteString)
data BuildSignal a
type BuildStep a = BufferRange -> IO (BuildSignal a)
finalBuildStep :: BuildStep ()
done :: Ptr Word8 -> a -> BuildSignal a
bufferFull :: Int -> Ptr Word8 -> BuildStep a -> BuildSignal a
insertChunk :: Ptr Word8 -> ByteString -> BuildStep a -> BuildSignal a
fillWithBuildStep :: BuildStep a -> (Ptr Word8 -> a -> IO b) -> (Ptr Word8 -> Int -> BuildStep a -> IO b) -> (Ptr Word8 -> ByteString -> BuildStep a -> IO b) -> BufferRange -> IO b
data Builder
builder :: (forall r. BuildStep r -> BuildStep r) -> Builder
runBuilder :: Builder -> BuildStep ()
runBuilderWith :: Builder -> BuildStep a -> BuildStep a
empty :: Builder
append :: Builder -> Builder -> Builder
flush :: Builder
ensureFree :: Int -> Builder
byteStringCopy :: ByteString -> Builder
byteStringInsert :: ByteString -> Builder
byteStringThreshold :: Int -> ByteString -> Builder
lazyByteStringCopy :: ByteString -> Builder
lazyByteStringInsert :: ByteString -> Builder
lazyByteStringThreshold :: Int -> ByteString -> Builder
shortByteString :: ShortByteString -> Builder
maximalCopySize :: Int
byteString :: ByteString -> Builder
lazyByteString :: ByteString -> Builder
toLazyByteStringWith :: AllocationStrategy -> ByteString -> Builder -> ByteString
data AllocationStrategy
safeStrategy :: Int -> Int -> AllocationStrategy
untrimmedStrategy :: Int -> Int -> AllocationStrategy
customStrategy :: (Maybe (Buffer, Int) -> IO Buffer) -> Int -> (Int -> Int -> Bool) -> AllocationStrategy
smallChunkSize :: Int
defaultChunkSize :: Int
chunkOverhead :: Int
data Put a
put :: (forall r. (a -> BuildStep r) -> BuildStep r) -> Put a
runPut :: Put a -> BuildStep a
putToLazyByteString :: Put a -> (a, ByteString)
putToLazyByteStringWith :: AllocationStrategy -> (a -> (b, ByteString)) -> Put a -> (b, ByteString)
hPut :: forall a. Handle -> Put a -> IO a
putBuilder :: Builder -> Put ()
fromPut :: Put () -> Builder

Buffer management

data Buffer Source #

A Buffer together with the BufferRange of free bytes. The filled space starts at offset 0 and ends at the first free byte.

Constructors

Buffer !(ForeignPtr Word8) !BufferRange

data BufferRange Source #

A range of bytes in a buffer represented by the pointer to the first byte of the range and the pointer to the first byte after the range.

Constructors

BufferRange !(Ptr Word8) !(Ptr Word8)

newBuffer :: Int -> IO Buffer Source #

Allocate a new buffer of the given size.

bufferSize :: Buffer -> Int Source #

Combined size of the filled and free space in the buffer.

byteStringFromBuffer :: Buffer -> ByteString Source #

Convert the filled part of a Buffer to a strict ByteString.

data ChunkIOStream a Source #

A stream of chunks that are constructed in the IO monad.

This datatype serves as the common interface for the buffer-by-buffer execution of a BuildStep by buildStepToCIOS. Typical users of this interface are ciosToLazyByteString or iteratee-style libraries like enumerator.

Constructors

Finished Buffer a	The partially filled last buffer together with the result.
Yield1 ByteString (IO (ChunkIOStream a))	Yield a non-empty strict `ByteString`.

buildStepToCIOS Source #

Arguments

:: AllocationStrategy	Buffer allocation strategy to use
-> BuildStep a	`BuildStep` to execute
-> IO (ChunkIOStream a)

Convert a BuildStep to a ChunkIOStream stream by executing it on Buffers allocated according to the given AllocationStrategy.

ciosUnitToLazyByteString :: AllocationStrategy -> ByteString -> ChunkIOStream () -> ByteString Source #

Convert a ChunkIOStream () to a lazy ByteString using unsafeDupablePerformIO.

ciosToLazyByteString :: AllocationStrategy -> (a -> (b, ByteString)) -> ChunkIOStream a -> (b, ByteString) Source #

Convert a ChunkIOStream to a lazy tuple of the result and the written ByteString using unsafeDupablePerformIO.

Build signals and steps

data BuildSignal a Source #

BuildSignals abstract signals to the caller of a BuildStep. There are three signals: done, bufferFull, or 'insertChunks signals

type BuildStep a = BufferRange -> IO (BuildSignal a) Source #

BuildSteps may be called *multiple times* and they must not rise an async. exception.

finalBuildStep :: BuildStep () Source #

The final build step that returns the done signal.

done Source #

Arguments

:: Ptr Word8	Next free byte in current `BufferRange`
-> a	Computed value
-> BuildSignal a

Signal that the current BuildStep is done and has computed a value.

bufferFull Source #

Arguments

:: Int	Minimal size of next `BufferRange`.
-> Ptr Word8	Next free byte in current `BufferRange`.
-> BuildStep a	`BuildStep` to run on the next `BufferRange`. This `BuildStep` may assume that it is called with a `BufferRange` of at least the required minimal size; i.e., the caller of this `BuildStep` must guarantee this.
-> BuildSignal a

Signal that the current buffer is full.

insertChunk Source #

Arguments

:: Ptr Word8	Next free byte in current `BufferRange`
-> ByteString	Chunk to insert.
-> BuildStep a	`BuildStep` to run on next `BufferRange`
-> BuildSignal a

Signal that a ByteString chunk should be inserted directly.

fillWithBuildStep Source #

Arguments

:: BuildStep a	Build step to use for filling the `BufferRange`.
-> (Ptr Word8 -> a -> IO b)	Handling the `done` signal
-> (Ptr Word8 -> Int -> BuildStep a -> IO b)	Handling the `bufferFull` signal
-> (Ptr Word8 -> ByteString -> BuildStep a -> IO b)	Handling the `insertChunk` signal
-> BufferRange	Buffer range to fill.
-> IO b	Value computed while filling this `BufferRange`.

Fill a BufferRange using a BuildStep.

The Builder monoid

data Builder Source #

Builders denote sequences of bytes. They are Monoids where mempty is the zero-length sequence and mappend is concatenation, which runs in O(1).

Instances

Instances details

Show Builder Source #	Since: 0.11.1.0
Instance details Defined in Data.ByteString.Builder Methods showsPrec :: Int -> Builder -> ShowS # show :: Builder -> String # showList :: [Builder] -> ShowS #
IsString Builder Source #
Instance details Defined in Data.ByteString.Builder Methods fromString :: String -> Builder #
Semigroup Builder Source #
Instance details Defined in Data.ByteString.Builder.Internal Methods (<>) :: Builder -> Builder -> Builder # sconcat :: NonEmpty Builder -> Builder # stimes :: Integral b => b -> Builder -> Builder #
Monoid Builder Source #
Instance details Defined in Data.ByteString.Builder.Internal Methods mempty :: Builder # mappend :: Builder -> Builder -> Builder # mconcat :: [Builder] -> Builder #

builder Source #

Arguments

:: (forall r. BuildStep r -> BuildStep r)

A function that fills a BufferRange, calls the continuation with the updated BufferRange once its done, and signals its caller how to proceed using done, bufferFull, or insertChunk.

This function must be referentially transparent; i.e., calling it multiple times with equally sized BufferRanges must result in the same sequence of bytes being written. If you need mutable state, then you must allocate it anew upon each call of this function. Moroever, this function must call the continuation once its done. Otherwise, concatenation of Builders does not work. Finally, this function must write to all bytes that it claims it has written. Otherwise, the resulting Builder is not guaranteed to be referentially transparent and sensitive data might leak.

-> Builder

Construct a Builder. In contrast to BuildSteps, Builders are referentially transparent.

runBuilder Source #

Arguments

:: Builder	`Builder` to run
-> BuildStep ()	`BuildStep` that writes the byte stream of this `Builder` and signals `done` upon completion.

Run a Builder with the finalBuildStep.

runBuilderWith Source #

Arguments

:: Builder	`Builder` to run
-> BuildStep a	Continuation `BuildStep`
-> BuildStep a

Run a Builder.

Primitive combinators

empty :: Builder Source #

The Builder denoting a zero-length sequence of bytes. This function is only exported for use in rewriting rules. Use mempty otherwise.

append :: Builder -> Builder -> Builder Source #

Concatenate two Builders. This function is only exported for use in rewriting rules. Use mappend otherwise.

flush :: Builder Source #

Flush the current buffer. This introduces a chunk boundary.

ensureFree :: Int -> Builder Source #

ensureFree n ensures that there are at least n free bytes for the following Builder.

byteStringCopy :: ByteString -> Builder Source #

Construct a Builder that copies the strict ByteString.

Use this function to create Builders from smallish (<= 4kb) ByteStrings or if you need to guarantee that the ByteString is not shared with the chunks generated by the Builder.

byteStringInsert :: ByteString -> Builder Source #

Construct a Builder that always inserts the strict ByteString directly as a chunk.

This implies flushing the output buffer, even if it contains just a single byte. You should therefore use byteStringInsert only for large (> 8kb) ByteStrings. Otherwise, the generated chunks are too fragmented to be processed efficiently afterwards.

byteStringThreshold :: Int -> ByteString -> Builder Source #

Construct a Builder that copies the strict ByteStrings, if it is smaller than the treshold, and inserts it directly otherwise.

For example, byteStringThreshold 1024 copies strict ByteStrings whose size is less or equal to 1kb, and inserts them directly otherwise. This implies that the average chunk-size of the generated lazy ByteString may be as low as 513 bytes, as there could always be just a single byte between the directly inserted 1025 byte, strict ByteStrings.

lazyByteStringCopy :: ByteString -> Builder Source #

Construct a Builder that copies the lazy ByteString.

lazyByteStringInsert :: ByteString -> Builder Source #

Construct a Builder that inserts all chunks of the lazy ByteString directly.

lazyByteStringThreshold :: Int -> ByteString -> Builder Source #

Construct a Builder that uses the thresholding strategy of byteStringThreshold for each chunk of the lazy ByteString.

shortByteString :: ShortByteString -> Builder Source #

Construct a Builder that copies the ShortByteString.

maximalCopySize :: Int Source #

The maximal size of a ByteString that is copied. 2 * smallChunkSize to guarantee that on average a chunk is of smallChunkSize.

byteString :: ByteString -> Builder Source #

Create a Builder denoting the same sequence of bytes as a strict ByteString. The Builder inserts large ByteStrings directly, but copies small ones to ensure that the generated chunks are large on average.

lazyByteString :: ByteString -> Builder Source #

Create a Builder denoting the same sequence of bytes as a lazy ByteString. The Builder inserts large chunks of the lazy ByteString directly, but copies small ones to ensure that the generated chunks are large on average.

Execution

toLazyByteStringWith Source #

Arguments

:: AllocationStrategy	Buffer allocation strategy to use
-> ByteString	Lazy `ByteString` to use as the tail of the generated lazy `ByteString`
-> Builder	`Builder` to execute
-> ByteString	Resulting lazy `ByteString`

Heavy inlining. Execute a Builder with custom execution parameters.

This function is inlined despite its heavy code-size to allow fusing with the allocation strategy. For example, the default Builder execution function toLazyByteString is defined as follows.

{-# NOINLINE toLazyByteString #-}
toLazyByteString =
  toLazyByteStringWith (safeStrategy smallChunkSize defaultChunkSize) L.empty

where L.empty is the zero-length lazy ByteString.

In most cases, the parameters used by toLazyByteString give good performance. A sub-performing case of toLazyByteString is executing short (<128 bytes) Builders. In this case, the allocation overhead for the first 4kb buffer and the trimming cost dominate the cost of executing the Builder. You can avoid this problem using

toLazyByteStringWith (safeStrategy 128 smallChunkSize) L.empty

This reduces the allocation and trimming overhead, as all generated ByteStrings fit into the first buffer and there is no trimming required, if more than 64 bytes and less than 128 bytes are written.

data AllocationStrategy Source #

A buffer allocation strategy for executing Builders.

safeStrategy Source #

Arguments

:: Int	Size of first buffer
-> Int	Size of successive buffers
-> AllocationStrategy	An allocation strategy that guarantees that at least half of the allocated memory is used for live data

Use this strategy for generating lazy ByteStrings whose chunks are likely to survive one garbage collection. This strategy trims buffers that are filled less than half in order to avoid spilling too much memory.

untrimmedStrategy Source #

Arguments

:: Int	Size of the first buffer
-> Int	Size of successive buffers
-> AllocationStrategy	An allocation strategy that does not trim any of the filled buffers before converting it to a chunk

Use this strategy for generating lazy ByteStrings whose chunks are discarded right after they are generated. For example, if you just generate them to write them to a network socket.

customStrategy Source #

Arguments

:: (Maybe (Buffer, Int) -> IO Buffer)	Buffer allocation function. If `Nothing` is given, then a new first buffer should be allocated. If `Just (oldBuf, minSize)` is given, then a buffer with minimal size `minSize` must be returned. The strategy may reuse the `oldBuf`, if it can guarantee that this referentially transparent and `oldBuf` is large enough.
-> Int	Default buffer size.
-> (Int -> Int -> Bool)	A predicate `trim used allocated` returning `True`, if the buffer should be trimmed before it is returned.
-> AllocationStrategy

Create a custom allocation strategy. See the code for safeStrategy and untrimmedStrategy for examples.

smallChunkSize :: Int Source #

The recommended chunk size. Currently set to 4k, less the memory management overhead

defaultChunkSize :: Int Source #

The chunk size used for I/O. Currently set to 32k, less the memory management overhead

chunkOverhead :: Int Source #

The memory management overhead. Currently this is tuned for GHC only.

The Put monad

data Put a Source #

A Put action denotes a computation of a value that writes a stream of bytes as a side-effect. Puts are strict in their side-effect; i.e., the stream of bytes will always be written before the computed value is returned.

Puts are a generalization of Builders. The typical use case is the implementation of an encoding that might fail (e.g., an interface to the zlib compression library or the conversion from Base64 encoded data to 8-bit data). For a Builder, the only way to handle and report such a failure is ignore it or call error. In contrast, Put actions are expressive enough to allow reportng and handling such a failure in a pure fashion.

Put () actions are isomorphic to Builders. The functions putBuilder and fromPut convert between these two types. Where possible, you should use Builders, as sequencing them is slightly cheaper than sequencing Puts because they do not carry around a computed value.

Instances

Instances details

Monad Put Source #
Instance details Defined in Data.ByteString.Builder.Internal Methods (>>=) :: Put a -> (a -> Put b) -> Put b # (>>) :: Put a -> Put b -> Put b # return :: a -> Put a #
Functor Put Source #
Instance details Defined in Data.ByteString.Builder.Internal Methods fmap :: (a -> b) -> Put a -> Put b # (<$) :: a -> Put b -> Put a #
Applicative Put Source #
Instance details Defined in Data.ByteString.Builder.Internal Methods pure :: a -> Put a # (<>) :: Put (a -> b) -> Put a -> Put b # liftA2 :: (a -> b -> c) -> Put a -> Put b -> Put c # (>) :: Put a -> Put b -> Put b # (<*) :: Put a -> Put b -> Put a #

put Source #

Arguments

:: (forall r. (a -> BuildStep r) -> BuildStep r)

A function that fills a BufferRange, calls the continuation with the updated BufferRange and its computed value once its done, and signals its caller how to proceed using done, bufferFull, or insertChunk signals.

This function must be referentially transparent; i.e., calling it multiple times with equally sized BufferRanges must result in the same sequence of bytes being written and the same value being computed. If you need mutable state, then you must allocate it anew upon each call of this function. Moroever, this function must call the continuation once its done. Otherwise, monadic sequencing of Puts does not work. Finally, this function must write to all bytes that it claims it has written. Otherwise, the resulting Put is not guaranteed to be referentially transparent and sensitive data might leak.

-> Put a

Construct a Put action. In contrast to BuildSteps, Puts are referentially transparent in the sense that sequencing the same Put multiple times yields every time the same value with the same side-effect.

runPut Source #

Arguments

:: Put a	Put to run
-> BuildStep a	`BuildStep` that first writes the byte stream of this `Put` and then yields the computed value using the `done` signal.

Run a Put.

Execution

putToLazyByteString Source #

Arguments

:: Put a	`Put` to execute
-> (a, ByteString)	Result and lazy `ByteString` written as its side-effect

Execute a Put and return the computed result and the bytes written during the computation as a lazy ByteString.

This function is strict in the computed result and lazy in the writing of the bytes. For example, given

infinitePut = sequence_ (repeat (putBuilder (word8 1))) >> return 0

evaluating the expression

fst $ putToLazyByteString infinitePut

does not terminate, while evaluating the expression

L.head $ snd $ putToLazyByteString infinitePut

does terminate and yields the value 1 :: Word8.

An illustrative example for these strictness properties is the implementation of Base64 decoding (http://en.wikipedia.org/wiki/Base64).

type DecodingState = ...

decodeBase64 :: ByteString -> DecodingState -> Put (Maybe DecodingState)
decodeBase64 = ...

The above function takes a strict ByteString supposed to represent Base64 encoded data and the current decoding state. It writes the decoded bytes as the side-effect of the Put and returns the new decoding state, if the decoding of all data in the ByteString was successful. The checking if the strict ByteString represents Base64 encoded data and the actual decoding are fused. This makes the common case, where all data represents Base64 encoded data, more efficient. It also implies that all data must be decoded before the final decoding state can be returned. Puts are intended for implementing such fused checking and decoding/encoding, which is reflected in their strictness properties.

putToLazyByteStringWith Source #

Arguments

:: AllocationStrategy	Buffer allocation strategy to use
-> (a -> (b, ByteString))	Continuation to use for computing the final result and the tail of its side-effect (the written bytes).
-> Put a	`Put` to execute
-> (b, ByteString)	Resulting lazy `ByteString`

Execute a Put with a buffer-allocation strategy and a continuation. For example, putToLazyByteString is implemented as follows.

putToLazyByteString = putToLazyByteStringWith
    (safeStrategy smallChunkSize defaultChunkSize) (x -> (x, L.empty))

hPut :: forall a. Handle -> Put a -> IO a Source #

Run a Put action redirecting the produced output to a Handle.

The output is buffered using the Handles associated buffer. If this buffer is too small to execute one step of the Put action, then it is replaced with a large enough buffer.

Conversion to and from Builders

putBuilder :: Builder -> Put () Source #

Run a Builder as a side-effect of a Put () action.

fromPut :: Put () -> Builder Source #

Convert a Put () action to a Builder.