biohazard-2.1: bioinformatics support library

Copyright(c) Don Stewart 2006
(c) Duncan Coutts 2006-2011
(c) Michael Thompson 2015
(c) Udo Stenzel 2018
LicenseBSD-style
Maintaineru.stenzel@web.de
Stabilityexperimental
Portabilityportable
Safe HaskellNone
LanguageHaskell2010

Bio.Streaming.Bytes

Contents

Description

See the simple examples of use here. We begin with a slight modification of the documentation to Data.ByteStream.Lazy:

A time and space-efficient implementation of effectful byte streams using a stream of packed Word8 arrays, suitable for high performance use, both in terms of large data quantities, or high speed requirements. ByteStreams are encoded as streams of strict chunks of bytes.

A key feature of ByteStreams is the means to manipulate large or unbounded streams of data without requiring the entire sequence to be resident in memory. To take advantage of this you have to write your functions in a streaming style, e.g. classic pipeline composition. The default I/O chunk size is 32k, which should be good in most circumstances.

Some operations, such as concat, append, reverse and cons, have better complexity than their Data.ByteStream equivalents, due to optimisations resulting from the list spine structure. For other operations streaming, like lazy, ByteStreams are usually within a few percent of strict ones.

This module is intended to be imported qualified, to avoid name clashes with Prelude functions. eg.

import qualified Bio.Streaming.Bytes as B

Original GHC implementation by Bryan O'Sullivan. Rewritten to use UArray by Simon Marlow. Rewritten to support slices and use ForeignPtr by David Roundy. Rewritten again and extended by Don Stewart and Duncan Coutts. Lazy variant by Duncan Coutts and Don Stewart. Streaming variant by Michael Thompson, following the ideas of Gabriel Gonzales' pipes-bytestring Adapted for use in biohazard by Udo Stenzel.

Synopsis

The ByteStream type

data ByteStream m r Source #

A space-efficient representation of a succession of Word8 vectors, supporting many efficient operations.

An effectful ByteStream contains 8-bit bytes, or by using certain operations can be interpreted as containing 8-bit characters. It also contains an offset, which will be needed to track the virtual offsets in the BGZF decode.

Constructors

Empty r 
Chunk !Bytes !Int64 (ByteStream m r) 
Go (m (ByteStream m r)) 
Instances
MonadTrans ByteStream Source # 
Instance details

Defined in Bio.Streaming.Bytes

Methods

lift :: Monad m => m a -> ByteStream m a #

Monad m => Monad (ByteStream m) Source # 
Instance details

Defined in Bio.Streaming.Bytes

Methods

(>>=) :: ByteStream m a -> (a -> ByteStream m b) -> ByteStream m b #

(>>) :: ByteStream m a -> ByteStream m b -> ByteStream m b #

return :: a -> ByteStream m a #

fail :: String -> ByteStream m a #

Monad m => Functor (ByteStream m) Source # 
Instance details

Defined in Bio.Streaming.Bytes

Methods

fmap :: (a -> b) -> ByteStream m a -> ByteStream m b #

(<$) :: a -> ByteStream m b -> ByteStream m a #

Monad m => Applicative (ByteStream m) Source # 
Instance details

Defined in Bio.Streaming.Bytes

Methods

pure :: a -> ByteStream m a #

(<*>) :: ByteStream m (a -> b) -> ByteStream m a -> ByteStream m b #

liftA2 :: (a -> b -> c) -> ByteStream m a -> ByteStream m b -> ByteStream m c #

(*>) :: ByteStream m a -> ByteStream m b -> ByteStream m b #

(<*) :: ByteStream m a -> ByteStream m b -> ByteStream m a #

MonadIO m => MonadIO (ByteStream m) Source # 
Instance details

Defined in Bio.Streaming.Bytes

Methods

liftIO :: IO a -> ByteStream m a #

MFunctor ByteStream Source # 
Instance details

Defined in Bio.Streaming.Bytes

Methods

hoist :: Monad m => (forall a. m a -> n a) -> ByteStream m b -> ByteStream n b #

(m ~ Identity, Show r) => Show (ByteStream m r) Source # 
Instance details

Defined in Bio.Streaming.Bytes

Methods

showsPrec :: Int -> ByteStream m r -> ShowS #

show :: ByteStream m r -> String #

showList :: [ByteStream m r] -> ShowS #

r ~ () => IsString (ByteStream m r) Source # 
Instance details

Defined in Bio.Streaming.Bytes

Methods

fromString :: String -> ByteStream m r #

(Semigroup r, Monad m) => Semigroup (ByteStream m r) Source # 
Instance details

Defined in Bio.Streaming.Bytes

Methods

(<>) :: ByteStream m r -> ByteStream m r -> ByteStream m r #

sconcat :: NonEmpty (ByteStream m r) -> ByteStream m r #

stimes :: Integral b => b -> ByteStream m r -> ByteStream m r #

(Semigroup r, Monoid r, Monad m) => Monoid (ByteStream m r) Source # 
Instance details

Defined in Bio.Streaming.Bytes

Methods

mempty :: ByteStream m r #

mappend :: ByteStream m r -> ByteStream m r -> ByteStream m r #

mconcat :: [ByteStream m r] -> ByteStream m r #

Introducing and eliminating ByteStreams

empty :: ByteStream m () Source #

O(1) The empty ByteStream -- i.e. return () Note that ByteStream m w is generally a monoid for monoidal values of w, like ()

singleton :: Word8 -> ByteStream m () Source #

O(1) Yield a Word8 as a minimal ByteStream

fromLazy :: LazyBytes -> ByteStream m () Source #

O(c) Transmute a pseudo-pure lazy bytestring to its representation as a monadic stream of chunks.

>>> Q.putStrLn $ Q.fromLazy "hi"
hi
>>> Q.fromLazy "hi"
Chunk "hi" (Empty (()))  -- note: a 'show' instance works in the identity monad
>>> Q.fromLazy $ BL.fromChunks ["here", "are", "some", "chunks"]
Chunk "here" (Chunk "are" (Chunk "some" (Chunk "chunks" (Empty (())))))

fromChunks :: Monad m => Stream (Of Bytes) m r -> ByteStream m r Source #

O(c) Converts a stream of strict bytestrings into a byte stream.

toLazy :: Monad m => ByteStream m r -> m (Of LazyBytes r) Source #

O(n) Convert an effectful byte stream into a single lazy ByteStream with the same internal chunk structure, retaining the original return value.

This is the canonical way of breaking streaming (toStrict and the like are far more demonic). Essentially one is dividing the interleaved layers of effects and bytes into one immense layer of effects, followed by the memory of the succession of bytes.

Because one preserves the return value, toLazy is a suitable argument for mapped

  B.mapped Q.toLazy :: Stream (ByteStream m) m r -> Stream (Of LazyBytes) m r
>>> Q.toLazy "hello"
"hello" :> ()
>>> B.toListM $ traverses Q.toLazy $ Q.lines "one\ntwo\nthree\nfour\nfive\n"
["one","two","three","four","five",""]  -- [LazyBytes]

toStrict :: Monad m => ByteStream m r -> m (Of Bytes r) Source #

O(n) Convert a monadic byte stream into a single strict ByteStream, retaining the return value of the original pair. This operation is for use with mapped.

mapped R.toStrict :: Monad m => Stream (ByteStream m) m r -> Stream (Of ByteStream) m r

It is subject to all the objections one makes to Data.ByteStream.Lazy toStrict; all of these are devastating.

effects :: Monad m => ByteStream m r -> m r Source #

Perform the effects contained in an effectful bytestring, ignoring the bytes.

mwrap :: m (ByteStream m r) -> ByteStream m r Source #

Reconceive an effect that results in an effectful bytestring as an effectful bytestring. Compare Streaming.mwrap. The closes equivalent of

>>> Streaming.wrap :: f (Stream f m r) -> Stream f m r

is here consChunk. mwrap is the smart constructor for the internal Go constructor.

Basic interface

cons :: Word8 -> ByteStream m r -> ByteStream m r Source #

O(1) cons is analogous to '(:)' for lists.

nextByte :: Monad m => ByteStream m r -> m (Either r (Word8, ByteStream m r)) Source #

O(1) Extract the head and tail of a ByteStream, or its return value if it is empty. This is the 'natural' uncons for an effectful byte stream.

Substrings

Breaking strings

break :: Monad m => (Word8 -> Bool) -> ByteStream m r -> ByteStream m (ByteStream m r) Source #

break p is equivalent to span (not . p).

drop :: Monad m => Int64 -> ByteStream m r -> ByteStream m r Source #

O(n/c) drop n xs returns the suffix of xs after the first n elements, or [] if n > length xs.

>>> Q.putStrLn $ Q.drop 6 "Wisconsin"
sin
>>> Q.putStrLn $ Q.drop 16 "Wisconsin"
>>> 

dropWhile :: Monad m => (Word8 -> Bool) -> ByteStream m r -> ByteStream m r Source #

dropWhile p xs returns the suffix remaining after takeWhile p xs.

splitAt :: Monad m => Int64 -> ByteStream m r -> ByteStream m (ByteStream m r) Source #

O(n/c) splitAt n xs is equivalent to (take n xs, drop n xs).

>>> rest <- Q.putStrLn $ Q.splitAt 3 "therapist is a danger to good hyphenation, as Knuth notes"
the
>>> Q.putStrLn $ Q.splitAt 19 rest
rapist is a danger

splitAt' :: Monad m => Int -> ByteStream m r -> m (Of Bytes (ByteStream m r)) Source #

Strictly splits off a piece. This breaks streaming, so reserve its use for small strings or when conversion to strict Bytes is needed anyway.

trim :: Monad m => Int64 -> ByteStream m () -> ByteStream m () Source #

Breaking into many substrings

lines :: Monad m => ByteStream m r -> Stream (ByteStream m) m r Source #

Turns a ByteStream into a connected stream of ByteStreams that divide at newline characters. The resulting strings do not contain newlines. This is the genuinely streaming lines which only breaks chunks, and thus never increases the use of memory.

Because ByteStreams are usually read in binary mode, with no line ending conversion, this function recognizes both \n and \r\n endings (regardless of the current platform).

lines' :: Monad m => ByteStream m r -> Stream (Of Bytes) m r Source #

Turns a ByteStream into a stream of strict Bytes that divide at newline characters. The resulting strings do not contain newlines. This will cost memory if the lines are very long.

Special folds

concat :: Monad m => Stream (ByteStream m) m r -> ByteStream m r Source #

O(n) Concatenate a stream of byte streams.

Builders

toByteStreamWith :: MonadIO m => AllocationStrategy -> Builder -> ByteStream m () Source #

Take a builder and convert it to a genuine streaming bytestring, using a specific allocation strategy.

I/O with ByteStreams

Files

withOutputFile :: (MonadIO m, MonadMask m) => FilePath -> (Handle -> m a) -> m a Source #

writeFile :: (MonadIO m, MonadMask m) => FilePath -> ByteStream m r -> m r Source #

Writes a ByteStream to a file. Actually writes to a temporary file and renames it on successful completion. The filename "-" causes it to write to stdout instead.

I/O with Handles

hGetContents :: MonadIO m => Handle -> ByteStream m () Source #

Read entire handle contents lazily into a ByteStream. Chunks are read on demand, using the default chunk size.

Note: the Handle should be placed in binary mode with hSetBinaryMode for hGetContents to work correctly.

hGetContentsN :: MonadIO m => Int -> Handle -> ByteStream m () Source #

Read entire handle contents lazily into a ByteStream. Chunks are read on demand, in at most k-sized chunks. It does not block waiting for a whole k-sized chunk, so if less than k bytes are available then they will be returned immediately as a smaller chunk.

The handle is closed on EOF.

Note: the Handle should be placed in binary mode with hSetBinaryMode for hGetContentsN to work correctly.

hPut :: MonadIO m => Handle -> ByteStream m r -> m r Source #

Outputs a ByteStream to the specified Handle.

Simple chunkwise operations

nextChunk :: Monad m => ByteStream m r -> m (Either r (Bytes, ByteStream m r)) Source #

consChunk :: Bytes -> ByteStream m r -> ByteStream m r Source #

Smart constructor for Chunk.

chunk :: Bytes -> ByteStream m () Source #

Yield-style smart constructor for Chunk.

mapChunksM_ :: Monad m => (Bytes -> m ()) -> ByteStream m r -> m r Source #

compression support

gzip :: MonadIO m => ByteStream m r -> ByteStream m r Source #

Compresses a byte stream using GZip with default parameters.

gunzip :: MonadIO m => ByteStream m r -> ByteStream m r Source #

Decompresses GZip if present. If any GZip stream is found, all such streams are decompressed and any remaining data is discarded. Else, the input is returned unchanged. If the input is BGZF, the result will contain meaningful virtual offsets. If the input contains exactly one GZip stream, the result will have meaningfull offsets into the uncompressed data. Else, the offsets will be bogus.

gunzipWith :: MonadIO m => (ByteStream m r -> ByteStream m r) -> ByteStream m r -> ByteStream m r Source #

Checks if the input is GZip at all, and runs gunzip if it is. If it isn't, it runs k on the input.