arx-0.3.0: Archive execution tool.

Safe HaskellNone
LanguageHaskell98

System.Posix.ARX.HEREDat

Description

Utilities for encoding arbitrary data as Bourne shell fragments that stream the data to standard output, using HERE documents and simple shell decoders.

Synopsis

Documentation

data Chunk Source #

A chunk describes a block of binary data ready for inclusion in a shell script. For many data blocks, no encoding or decoding is necessary; these are stored in a SafeChunk. Those blocks needing byte-translation are stored in an EncodedChunk.

chunk :: ByteString -> Chunk Source #

Converts a ByteString into a string safe for inclusion in a shell HERE document and annotates with information to construct a shell decoder for that document, if necessary.

A ByteString with nulls is rewritten in a complicated way. Two escape characters are chosen from a class of ASCII printable characters that look like reasonable escape characters; the two that show up least frequently in the document (including 0 times) become the null replacer and the escaper. All instances of these two characters are rewritten to escape sequences formed with the escaper, while nulls are rewritten to the null replacer. Given the two characters thus chosen, a command line with tr and sed in sequence can be constructed to decode the document.

This encoding doubles the amount of space consumed by the escape characters. In the worst case, where the data is made of all 20 potential escapes, evenly distributed, and one null (so we can't punt on escaping), the data will grow in size by 10 percent. For data that is more evenly distributed over the bytes -- as we might expect of compressed tarballs -- we expect a size growth of two 256ths, or less than 0.8 percent.

encode :: Word8 -> Word8 -> ByteString -> ByteString Source #

Given a byte to replace nulls and an escape byte, rewrites the data such that nulls are mapped to the replace byte, replace bytes are mapped to a pair of escape bytes and the escape byte is is mapped to an escape byte followed by an underscore. For example, if the null replace byte is ! and the escape byte is # then all nulls become !, any ! become ## and all # become #_.

This escaping scheme is dictated by the needs of our Sed decoder, which is just two global substitions, one after another. If the escaping were such that, with our characters above, # escaped to ## and ! to #_, then #_ in the input becomes ##_. We want to run the subsitution for # first, to catch this; it produces #_; then Sed feeds the input to the second substitution which unfortunately renders !. In the alternate scheme, the input is encoded #__, the ! decoder runs first and ignores it, then the # decoder runs and catches it. When using a pipeline of stream processors to interpret escape sequences, it seems best to ensure that only the very last processor inserts escape characters, to prevent their further interpretation.

decode :: Word8 -> Word8 -> ByteString -> ByteString Source #

Given the byte used to replace nulls and the escape byte, undoes the result of the encode operation -- rewriting null replacers to literal nulls and escape patterns to the original bytes. This function is not intended to be used in practice -- it will be shell commands that unpack the data -- but serves to document the ideas behind decoding as well as offering a way to check the encoder.

escapes :: [EscapeChar] Source #

The candidate escape characters, with the forms to be used in constructed tr and sed commands.

safeForHereDoc :: ByteString -> Bool Source #

Many binary strings can be embedded as-is in a HEREDOC, without escaping.

encoded :: Chunk -> Bool Source #

Predicate to determine whether data is represented as an encoded chunk or is unencoded.