Copyright | (c) Don Stewart 2006 (c) Duncan Coutts 2006-2011 (c) Michael Thompson 2015 (c) Udo Stenzel 2018 |
---|---|
License | BSD-style |
Maintainer | u.stenzel@web.de |
Stability | experimental |
Portability | portable |
Safe Haskell | None |
Language | Haskell2010 |
See the simple examples of use here. We begin with a slight modification of the documentation to Data.ByteStream.Lazy:
A time and space-efficient implementation of effectful byte streams
using a stream of packed Word8
arrays, suitable for high performance
use, both in terms of large data quantities, or high speed
requirements. ByteStreams are encoded as streams of strict chunks
of bytes.
A key feature of ByteStreams is the means to manipulate large or unbounded streams of data without requiring the entire sequence to be resident in memory. To take advantage of this you have to write your functions in a streaming style, e.g. classic pipeline composition. The default I/O chunk size is 32k, which should be good in most circumstances.
Some operations, such as concat
, append
, reverse
and cons
, have
better complexity than their Data.ByteStream equivalents, due to
optimisations resulting from the list spine structure. For other
operations streaming, like lazy, ByteStreams are usually within a few percent of
strict ones.
This module is intended to be imported qualified
, to avoid name
clashes with Prelude functions. eg.
import qualified Bio.Streaming.Bytes as B
Original GHC implementation by Bryan O'Sullivan.
Rewritten to use UArray
by Simon Marlow.
Rewritten to support slices and use ForeignPtr
by David Roundy.
Rewritten again and extended by Don Stewart and Duncan Coutts.
Lazy variant by Duncan Coutts and Don Stewart.
Streaming variant by Michael Thompson, following the ideas of Gabriel Gonzales' pipes-bytestring
Adapted for use in biohazard by Udo Stenzel.
Synopsis
- data ByteStream m r
- = Empty r
- | Chunk !Bytes !Int64 (ByteStream m r)
- | Go (m (ByteStream m r))
- empty :: ByteStream m ()
- singleton :: Word8 -> ByteStream m ()
- fromLazy :: LazyBytes -> ByteStream m ()
- fromChunks :: Monad m => Stream (Of Bytes) m r -> ByteStream m r
- toLazy :: Monad m => ByteStream m r -> m (Of LazyBytes r)
- toStrict :: Monad m => ByteStream m r -> m (Of Bytes r)
- effects :: Monad m => ByteStream m r -> m r
- mwrap :: m (ByteStream m r) -> ByteStream m r
- cons :: Word8 -> ByteStream m r -> ByteStream m r
- nextByte :: Monad m => ByteStream m r -> m (Either r (Word8, ByteStream m r))
- nextByteOff :: Monad m => ByteStream m r -> m (Either r (Word8, Int64, ByteStream m r))
- break :: Monad m => (Word8 -> Bool) -> ByteStream m r -> ByteStream m (ByteStream m r)
- drop :: Monad m => Int64 -> ByteStream m r -> ByteStream m r
- dropWhile :: Monad m => (Word8 -> Bool) -> ByteStream m r -> ByteStream m r
- splitAt :: Monad m => Int64 -> ByteStream m r -> ByteStream m (ByteStream m r)
- splitAt' :: Monad m => Int -> ByteStream m r -> m (Of Bytes (ByteStream m r))
- trim :: Monad m => Int64 -> ByteStream m () -> ByteStream m ()
- lines :: Monad m => ByteStream m r -> Stream (ByteStream m) m r
- lines' :: Monad m => ByteStream m r -> Stream (Of Bytes) m r
- concat :: Monad m => Stream (ByteStream m) m r -> ByteStream m r
- toByteStream :: MonadIO m => Builder -> ByteStream m ()
- toByteStreamWith :: MonadIO m => AllocationStrategy -> Builder -> ByteStream m ()
- concatBuilders :: Stream (Of Builder) IO () -> Builder
- withOutputFile :: (MonadIO m, MonadMask m) => FilePath -> (Handle -> m a) -> m a
- writeFile :: (MonadIO m, MonadMask m) => FilePath -> ByteStream m r -> m r
- hGetContents :: MonadIO m => Handle -> ByteStream m ()
- hGetContentsN :: MonadIO m => Int -> Handle -> ByteStream m ()
- hPut :: MonadIO m => Handle -> ByteStream m r -> m r
- nextChunk :: Monad m => ByteStream m r -> m (Either r (Bytes, ByteStream m r))
- nextChunkOff :: Monad m => ByteStream m r -> m (Either r (Bytes, Int64, ByteStream m r))
- consChunk :: Bytes -> ByteStream m r -> ByteStream m r
- consChunkOff :: Bytes -> Int64 -> ByteStream m r -> ByteStream m r
- chunk :: Bytes -> ByteStream m ()
- copy :: Monad m => ByteStream m r -> ByteStream (ByteStream m) r
- mapChunksM_ :: Monad m => (Bytes -> m ()) -> ByteStream m r -> m r
- gzip :: MonadIO m => ByteStream m r -> ByteStream m r
- gunzip :: MonadIO m => ByteStream m r -> ByteStream m r
- gunzipWith :: MonadIO m => (ByteStream m r -> ByteStream m r) -> ByteStream m r -> ByteStream m r
The ByteStream
type
data ByteStream m r Source #
A space-efficient representation of a succession of Word8
vectors, supporting many
efficient operations.
An effectful ByteStream
contains 8-bit bytes, or by using certain
operations can be interpreted as containing 8-bit characters. It
also contains an offset, which will be needed to track the virtual
offsets in the BGZF decode.
Empty r | |
Chunk !Bytes !Int64 (ByteStream m r) | |
Go (m (ByteStream m r)) |
Instances
Introducing and eliminating ByteStream
s
empty :: ByteStream m () Source #
O(1) The empty ByteStream
-- i.e. return ()
Note that ByteStream m w
is
generally a monoid for monoidal values of w
, like ()
singleton :: Word8 -> ByteStream m () Source #
O(1) Yield a Word8
as a minimal ByteStream
fromLazy :: LazyBytes -> ByteStream m () Source #
O(c) Transmute a pseudo-pure lazy bytestring to its representation as a monadic stream of chunks.
>>>
Q.putStrLn $ Q.fromLazy "hi"
hi>>>
Q.fromLazy "hi"
Chunk "hi" (Empty (())) -- note: a 'show' instance works in the identity monad>>>
Q.fromLazy $ BL.fromChunks ["here", "are", "some", "chunks"]
Chunk "here" (Chunk "are" (Chunk "some" (Chunk "chunks" (Empty (())))))
fromChunks :: Monad m => Stream (Of Bytes) m r -> ByteStream m r Source #
O(c) Converts a stream of strict bytestrings into a byte stream.
toLazy :: Monad m => ByteStream m r -> m (Of LazyBytes r) Source #
O(n) Convert an effectful byte stream into a single lazy ByteStream
with the same internal chunk structure, retaining the original
return value.
This is the canonical way of breaking streaming (toStrict
and the
like are far more demonic). Essentially one is dividing the interleaved
layers of effects and bytes into one immense layer of effects,
followed by the memory of the succession of bytes.
Because one preserves the return value, toLazy
is a suitable argument
for mapped
B.mapped Q.toLazy :: Stream (ByteStream m) m r -> Stream (Of LazyBytes) m r
>>>
Q.toLazy "hello"
"hello" :> ()>>>
B.toListM $ traverses Q.toLazy $ Q.lines "one\ntwo\nthree\nfour\nfive\n"
["one","two","three","four","five",""] -- [LazyBytes]
toStrict :: Monad m => ByteStream m r -> m (Of Bytes r) Source #
O(n) Convert a monadic byte stream into a single strict ByteStream
,
retaining the return value of the original pair. This operation is
for use with mapped
.
mapped R.toStrict :: Monad m => Stream (ByteStream m) m r -> Stream (Of ByteStream) m r
It is subject to all the objections one makes to Data.ByteStream.Lazy toStrict
;
all of these are devastating.
effects :: Monad m => ByteStream m r -> m r Source #
Perform the effects contained in an effectful bytestring, ignoring the bytes.
mwrap :: m (ByteStream m r) -> ByteStream m r Source #
Reconceive an effect that results in an effectful bytestring as an effectful bytestring. Compare Streaming.mwrap. The closes equivalent of
>>>
Streaming.wrap :: f (Stream f m r) -> Stream f m r
is here consChunk
. mwrap
is the smart constructor for the internal Go
constructor.
Basic interface
cons :: Word8 -> ByteStream m r -> ByteStream m r Source #
O(1) cons
is analogous to '(:)' for lists.
nextByte :: Monad m => ByteStream m r -> m (Either r (Word8, ByteStream m r)) Source #
O(1) Extract the head and tail of a ByteStream
, or its return value
if it is empty. This is the 'natural' uncons for an effectful byte stream.
nextByteOff :: Monad m => ByteStream m r -> m (Either r (Word8, Int64, ByteStream m r)) Source #
Substrings
Breaking strings
break :: Monad m => (Word8 -> Bool) -> ByteStream m r -> ByteStream m (ByteStream m r) Source #
drop :: Monad m => Int64 -> ByteStream m r -> ByteStream m r Source #
dropWhile :: Monad m => (Word8 -> Bool) -> ByteStream m r -> ByteStream m r Source #
splitAt :: Monad m => Int64 -> ByteStream m r -> ByteStream m (ByteStream m r) Source #
splitAt' :: Monad m => Int -> ByteStream m r -> m (Of Bytes (ByteStream m r)) Source #
Strictly splits off a piece. This breaks streaming, so reserve its
use for small strings or when conversion to strict Bytes
is needed
anyway.
trim :: Monad m => Int64 -> ByteStream m () -> ByteStream m () Source #
Breaking into many substrings
lines :: Monad m => ByteStream m r -> Stream (ByteStream m) m r Source #
Turns a ByteStream into a connected stream of ByteStreams that
divide at newline characters. The resulting strings do not contain
newlines. This is the genuinely streaming lines
which only
breaks chunks, and thus never increases the use of memory.
Because ByteStream
s are usually read in binary mode, with no line
ending conversion, this function recognizes both \n
and \r\n
endings (regardless of the current platform).
lines' :: Monad m => ByteStream m r -> Stream (Of Bytes) m r Source #
Turns a ByteStream
into a stream of strict Bytes
that divide at
newline characters. The resulting strings do not contain newlines.
This will cost memory if the lines are very long.
Special folds
concat :: Monad m => Stream (ByteStream m) m r -> ByteStream m r Source #
O(n) Concatenate a stream of byte streams.
Builders
toByteStream :: MonadIO m => Builder -> ByteStream m () Source #
toByteStreamWith :: MonadIO m => AllocationStrategy -> Builder -> ByteStream m () Source #
Take a builder and convert it to a genuine streaming bytestring, using a specific allocation strategy.
I/O with ByteStream
s
Files
writeFile :: (MonadIO m, MonadMask m) => FilePath -> ByteStream m r -> m r Source #
Writes a ByteStream
to a file. Actually writes to a temporary
file and renames it on successful completion. The filename "-"
causes it to write to stdout instead.
I/O with Handles
hGetContents :: MonadIO m => Handle -> ByteStream m () Source #
Read entire handle contents lazily into a ByteStream
. Chunks
are read on demand, using the default chunk size.
Note: the Handle
should be placed in binary mode with
hSetBinaryMode
for hGetContents
to
work correctly.
hGetContentsN :: MonadIO m => Int -> Handle -> ByteStream m () Source #
Read entire handle contents lazily into a ByteStream
. Chunks
are read on demand, in at most k
-sized chunks. It does not block
waiting for a whole k
-sized chunk, so if less than k
bytes are
available then they will be returned immediately as a smaller chunk.
The handle is closed on EOF.
Note: the Handle
should be placed in binary mode with
hSetBinaryMode
for hGetContentsN
to
work correctly.
hPut :: MonadIO m => Handle -> ByteStream m r -> m r Source #
Outputs a ByteStream
to the specified Handle
.
Simple chunkwise operations
nextChunk :: Monad m => ByteStream m r -> m (Either r (Bytes, ByteStream m r)) Source #
nextChunkOff :: Monad m => ByteStream m r -> m (Either r (Bytes, Int64, ByteStream m r)) Source #
consChunk :: Bytes -> ByteStream m r -> ByteStream m r Source #
Smart constructor for Chunk
.
consChunkOff :: Bytes -> Int64 -> ByteStream m r -> ByteStream m r Source #
copy :: Monad m => ByteStream m r -> ByteStream (ByteStream m) r Source #
mapChunksM_ :: Monad m => (Bytes -> m ()) -> ByteStream m r -> m r Source #
compression support
gzip :: MonadIO m => ByteStream m r -> ByteStream m r Source #
Compresses a byte stream using GZip with default parameters.
gunzip :: MonadIO m => ByteStream m r -> ByteStream m r Source #
Decompresses GZip if present. If any GZip stream is found, all such streams are decompressed and any remaining data is discarded. Else, the input is returned unchanged. If the input is BGZF, the result will contain meaningful virtual offsets. If the input contains exactly one GZip stream, the result will have meaningfull offsets into the uncompressed data. Else, the offsets will be bogus.
gunzipWith :: MonadIO m => (ByteStream m r -> ByteStream m r) -> ByteStream m r -> ByteStream m r Source #
Checks if the input is GZip at all, and runs gunzip if it is. If
it isn't, it runs k
on the input.