streamly-core-0.2.2: Streaming, parsers, arrays, serialization and more
Copyright(c) 2019 Composewell Technologies
LicenseBSD3
Maintainerstreamly@composewell.com
Stabilityreleased
PortabilityGHC
Safe HaskellSafe-Inferred
LanguageHaskell2010

Streamly.Data.Array

Description

Unboxed immutable arrays with streaming interfaces.

Please refer to Streamly.Internal.Data.Array for functions that have not yet been released.

For arrays that work on boxed types, not requiring the Unbox constraint, please refer to Streamly.Data.Array.Generic. For arrays that can be mutated in-place, please see Streamly.Data.MutArray.

Synopsis

Setup

To execute the code examples provided in this module in ghci, please run the following commands first.

>>> :m
>>> :set -XFlexibleContexts
>>> :set -XMagicHash
>>> import Data.Function ((&))
>>> import Data.Functor.Identity (Identity(..))
>>> import System.IO.Unsafe (unsafePerformIO)
>>> import Streamly.Data.Array (Array)
>>> import Streamly.Data.Stream (Stream)
>>> import qualified Streamly.Data.Array as Array
>>> import qualified Streamly.Data.Fold as Fold
>>> import qualified Streamly.Data.ParserK as ParserK
>>> import qualified Streamly.Data.Stream as Stream
>>> import qualified Streamly.Data.StreamK as StreamK

For APIs that have not been released yet.

>>> import qualified Streamly.Internal.Data.Array as Array
>>> import qualified Streamly.Internal.Data.Stream as Stream

Overview

This module provides APIs to create and use unboxed immutable arrays. Once created, their contents cannot be modified. Only types that are unboxable via the Unbox type class can be stored in these arrays. Note that the array memory grows automatically when creating a new array, therefore, an array can be created from a variable length stream.

Folding Arrays

Convert array to stream, and fold the stream:

>>> fold f arr = Array.read arr & Stream.fold f
>>> fold Fold.sum (Array.fromList [1,2,3::Int])
6

Transforming Arrays

Convert array to stream, transform, and fold back to array:

>>> amap f arr = Array.read arr & fmap f & Stream.fold Array.create
>>> amap (+1) (Array.fromList [1,2,3::Int])
fromList [2,3,4]

Pinned and Unpinned Arrays

The array type can use both pinned and unpinned memory under the hood. The default array creation operations create unpinned arrays. IO operations automatically copy an array to pinned memory if the array passed to it is unpinned. Programmers can use appropriate pinned array generation APIs to reduce the copying if it happens.

Unpinned arrays have the advantage of allowing automatic defragmentation of the memory by GC. Whereas pinned arrays have the advantage of not requiring a copy by GC. Normally you would want to use unpinned arrays. However, in some cases, for example, for long lived large data storage, and for interfacing with the operating system or foreign (non-Haskell) consumers you may want to use pinned arrays.

Creating Arrays from Non-IO Streams

Array creation folds require MonadIO otherwise the compiler may incorrectly share the array memory thinking it is pure.

See the fromPureStream unreleased API to generate an array from an Identity stream safely without using MonadIO constraint.

Note that Identity streams can be generalized to IO streams:

>>> pure = Stream.fromList [1,2,3] :: Stream Identity Int
>>> generally = Stream.morphInner (return . runIdentity)
>>> Stream.fold Array.create (generally pure :: Stream IO Int)
fromList [1,2,3]

Programming Tips

This module is designed to be imported qualified:

>>> import qualified Streamly.Data.Array as Array

The Array Type

data Array a Source #

Instances

Instances details
a ~ Char => IsString (Array a) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Methods

fromString :: String -> Array a #

Unbox a => Monoid (Array a) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Methods

mempty :: Array a #

mappend :: Array a -> Array a -> Array a #

mconcat :: [Array a] -> Array a #

Unbox a => Semigroup (Array a) Source #

This should not be used for combining many or N arrays as it would copy the two arrays everytime to a new array. For coalescing multiple arrays use fromChunksK instead.

Instance details

Defined in Streamly.Internal.Data.Array.Type

Methods

(<>) :: Array a -> Array a -> Array a #

sconcat :: NonEmpty (Array a) -> Array a #

stimes :: Integral b => b -> Array a -> Array a #

Unbox a => IsList (Array a) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Associated Types

type Item (Array a) #

Methods

fromList :: [Item (Array a)] -> Array a #

fromListN :: Int -> [Item (Array a)] -> Array a #

toList :: Array a -> [Item (Array a)] #

(Unbox a, Read a, Show a) => Read (Array a) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

(Show a, Unbox a) => Show (Array a) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Methods

showsPrec :: Int -> Array a -> ShowS #

show :: Array a -> String #

showList :: [Array a] -> ShowS #

Eq (Array Int16) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Eq (Array Int32) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Eq (Array Int64) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Eq (Array Int8) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Methods

(==) :: Array Int8 -> Array Int8 -> Bool #

(/=) :: Array Int8 -> Array Int8 -> Bool #

Eq (Array Word16) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Eq (Array Word32) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Eq (Array Word64) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Eq (Array Word8) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Eq (Array Char) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Methods

(==) :: Array Char -> Array Char -> Bool #

(/=) :: Array Char -> Array Char -> Bool #

Eq (Array Int) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Methods

(==) :: Array Int -> Array Int -> Bool #

(/=) :: Array Int -> Array Int -> Bool #

(Unbox a, Eq a) => Eq (Array a) Source #

If the type allows a byte-by-byte comparison this instance can be overlapped by a more specific instance that uses byteCmp. Byte comparison can be significantly faster.

Instance details

Defined in Streamly.Internal.Data.Array.Type

Methods

(==) :: Array a -> Array a -> Bool #

(/=) :: Array a -> Array a -> Bool #

(Unbox a, Ord a) => Ord (Array a) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

Methods

compare :: Array a -> Array a -> Ordering #

(<) :: Array a -> Array a -> Bool #

(<=) :: Array a -> Array a -> Bool #

(>) :: Array a -> Array a -> Bool #

(>=) :: Array a -> Array a -> Bool #

max :: Array a -> Array a -> Array a #

min :: Array a -> Array a -> Array a #

Serialize (Array a) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

type Item (Array a) Source # 
Instance details

Defined in Streamly.Internal.Data.Array.Type

type Item (Array a) = a

Pinning & Unpinning

Arrays are created unpinned by default unless pinned versions of creation APIs are used. Look for APIs with pinned prefix in Streamly.Internal.Data.Array for some unreleased pinned creation APIs. If an array is to be sent to the OS without any further modification then it should be created pinned in the first place instead of pinning it later. Pinning an unpinned array has a copy overhead. OS interfacing APIs create a pinned array directly or convert an unpinned array to pinned array before sending it to the OS.

pin :: Array a -> IO (Array a) Source #

Return a copy of the Array in pinned memory if unpinned, else return the original array.

unpin :: Array a -> IO (Array a) Source #

Return a copy of the Array in unpinned memory if pinned, else return the original array.

isPinned :: Array a -> Bool Source #

Return True if the array is allocated in pinned memory.

Construction

When performance matters, the fastest way to generate an array is createOf. IsList and IsString instances can be used to conveniently construct arrays from literal values. OverloadedLists extension or fromList can be used to construct an array from a list literal. Similarly, OverloadedStrings extension or fromList can be used to construct an array from a string literal.

From Stream

createOf :: forall m a. (MonadIO m, Unbox a) => Int -> Fold m a (Array a) Source #

createOf n folds a maximum of n elements from the input stream to an Array.

create :: forall m a. (MonadIO m, Unbox a) => Fold m a (Array a) Source #

Fold the whole input to a single array.

Caution! Do not use this on infinite streams.

writeLastN :: (Storable a, Unbox a, MonadIO m) => Int -> Fold m a (Array a) Source #

writeLastN n folds a maximum of n elements from the end of the input stream to an Array.

From List

fromListN :: Unbox a => Int -> [a] -> Array a Source #

Create an Array from the first N elements of a list. The array is allocated to size N, if the list terminates before N elements then the array may hold less than N elements.

fromList :: Unbox a => [a] -> Array a Source #

Create an Array from a list. The list must be of finite size.

To List

toList :: Unbox a => Array a -> [a] Source #

Convert an Array into a list.

To Stream

read :: (Monad m, Unbox a) => Array a -> Stream m a Source #

Convert an Array into a stream.

Pre-release

readRev :: (Monad m, Unbox a) => Array a -> Stream m a Source #

Convert an Array into a stream in reverse order.

Pre-release

Unfolds

reader :: forall m a. (Monad m, Unbox a) => Unfold m (Array a) a Source #

Unfold an array into a stream.

readerRev :: forall m a. (Monad m, Unbox a) => Unfold m (Array a) a Source #

Unfold an array into a stream in reverse order.

Casting

cast :: forall a b. Unbox b => Array a -> Maybe (Array b) Source #

Cast an array having elements of type a into an array having elements of type b. The length of the array should be a multiple of the size of the target element otherwise Nothing is returned.

asBytes :: Array a -> Array Word8 Source #

Cast an Array a into an Array Word8.

Random Access

length :: Unbox a => Array a -> Int Source #

O(1) Get the length of the array i.e. the number of elements in the array.

getIndex :: forall a. Unbox a => Int -> Array a -> Maybe a Source #

O(1) Lookup the element at the given index. Index starts from 0.

Serialization

pinnedSerialize :: Serialize a => a -> Array Word8 Source #

Serialize a Haskell type to a pinned byte array. The array is allocated using pinned memory so that it can be used directly in OS APIs for writing to file or sending over the network.

Properties: 1. Identity: deserialize . pinnedSerialize == id 2. Encoded equivalence: pinnedSerialize a == pinnedSerialize a

deserialize :: Serialize a => Array Word8 -> a Source #

Decode a Haskell type from a byte array containing its serialized representation.

Re-exports

class Unbox a where Source #

The Unbox type class provides operations for serialization (unboxing) and deserialization (boxing) of fixed-length, non-recursive Haskell data types to and from their byte stream representation.

Unbox uses fixed size encoding, therefore, size is independent of the value, it must be determined solely by the type. This restriction makes types with Unbox instances suitable for storing in arrays. Note that sum types may have multiple constructors of different sizes, the size of a sum type is computed as the maximum required by any constructor.

The peekAt operation reads as many bytes from the mutable byte array as the size of the data type and builds a Haskell data type from these bytes. pokeAt operation converts a Haskell data type to its binary representation which consists of size bytes and then stores these bytes into the mutable byte array. These operations do not check the bounds of the array, the user of the type class is expected to check the bounds before peeking or poking.

IMPORTANT: The serialized data's byte ordering remains the same as the host machine's byte order. Therefore, it can not be deserialized from host machines with a different byte ordering.

Instances can be derived via Generics, Template Haskell, or written manually. Note that the data type must be non-recursive. WARNING! Generic and Template Haskell deriving, both hang for recursive data types. Deriving via Generics is more convenient but Template Haskell should be preferred over Generics for the following reasons:

  1. Instances derived via Template Haskell provide better and more reliable performance.
  2. Generic deriving allows only 256 fields or constructor tags whereas template Haskell has no limit.

Here is an example, for deriving an instance of this type class using generics:

>>> import GHC.Generics (Generic)
>>> :{
data Object = Object
    { _int0 :: Int
    , _int1 :: Int
    } deriving Generic
:}
>>> import Streamly.Data.MutByteArray (Unbox(..))
>>> instance Unbox Object

To derive the instance via Template Haskell:

import Streamly.Data.MutByteArray (deriveUnbox)
$(deriveUnbox [d|instance Unbox Object|])

See deriveUnbox for more information on deriving using Template Haskell.

If you want to write the instance manually:

>>> :{
instance Unbox Object where
    sizeOf _ = 16
    peekAt i arr = do
       -- Check the array bounds
        x0 <- peekAt i arr
        x1 <- peekAt (i + 8) arr
        return $ Object x0 x1
    pokeAt i arr (Object x0 x1) = do
       -- Check the array bounds
        pokeAt i arr x0
        pokeAt (i + 8) arr x1
:}

Minimal complete definition

Nothing

Methods

sizeOf :: Proxy a -> Int Source #

Get the size. Size cannot be zero, should be at least 1 byte.

default sizeOf :: SizeOfRep (Rep a) => Proxy a -> Int Source #

peekAt :: Int -> MutByteArray -> IO a Source #

peekAt byte-offset array reads an element of type a from the the given the byte offset in the array.

IMPORTANT: The implementation of this interface may not check the bounds of the array, the caller must not assume that.

default peekAt :: (Generic a, PeekRep (Rep a)) => Int -> MutByteArray -> IO a Source #

peekByteIndex :: Int -> MutByteArray -> IO a Source #

Deprecated: Use peekAt.

pokeAt :: Int -> MutByteArray -> a -> IO () Source #

pokeAt byte-offset array writes an element of type a to the the given the byte offset in the array.

IMPORTANT: The implementation of this interface may not check the bounds of the array, the caller must not assume that.

default pokeAt :: (Generic a, PokeRep (Rep a)) => Int -> MutByteArray -> a -> IO () Source #

pokeByteIndex :: Int -> MutByteArray -> a -> IO () Source #

Deprecated: Use pokeAt.

Instances

Instances details
Unbox IntPtr Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox WordPtr Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Fingerprint Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Int16 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Int32 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Int64 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Int8 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox IoSubSystem Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Word16 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Word32 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Word64 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Word8 Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox MicroSecond64 Source # 
Instance details

Defined in Streamly.Internal.Data.Time.Units

Unbox MilliSecond64 Source # 
Instance details

Defined in Streamly.Internal.Data.Time.Units

Unbox NanoSecond64 Source # 
Instance details

Defined in Streamly.Internal.Data.Time.Units

Unbox () Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Bool Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Char Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Double Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Float Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Int Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox Word Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox a => Unbox (Complex a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox a => Unbox (Identity a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox a => Unbox (Down a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox (FunPtr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox (Ptr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox a => Unbox (Ratio a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox (StablePtr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

Unbox a => Unbox (Const a b) Source # 
Instance details

Defined in Streamly.Internal.Data.Unbox

class Serialize a where Source #

The Serialize type class provides operations for serialization and deserialization of general Haskell data types to and from their byte stream representation.

Unlike Unbox, Serialize uses variable length encoding, therefore, it can serialize recursive and variable length data types like lists, or variable length sum types where the length of the value may vary depending on a particular constructor. For variable length data types the length is encoded along with the data.

The deserializeAt operation reads bytes from the mutable byte array and builds a Haskell data type from these bytes, the number of bytes it reads depends on the type and the encoded value it is reading. serializeAt operation converts a Haskell data type to its binary representation which must consist of as many bytes as added by the addSizeTo operation for that value and then stores these bytes into the mutable byte array. The programmer is expected to use the addSizeTo operation and allocate an array of sufficient length before calling serializeAt.

IMPORTANT: The serialized data's byte ordering remains the same as the host machine's byte order. Therefore, it can not be deserialized from host machines with a different byte ordering.

Instances can be derived via Template Haskell, or written manually.

Here is an example, for deriving an instance of this type class using template Haskell:

>>> :{
data Object = Object
    { _obj1 :: [Int]
    , _obj2 :: Int
    }
:}
import Streamly.Data.MutByteArray (deriveSerialize)
$(deriveSerialize [d|instance Serialize Object|])

See deriveSerialize and deriveSerializeWith for more information on deriving using Template Haskell.

Here is an example of a manual instance.

>>> import Streamly.Data.MutByteArray (Serialize(..))
>>> :{
instance Serialize Object where
    addSizeTo acc obj = addSizeTo (addSizeTo acc (_obj1 obj)) (_obj2 obj)
    deserializeAt i arr len = do
         -- Check the array bounds before reading
        (i1, x0) <- deserializeAt i arr len
        (i2, x1) <- deserializeAt i1 arr len
        pure (i2, Object x0 x1)
    serializeAt i arr (Object x0 x1) = do
        i1 <- serializeAt i arr x0
        i2 <- serializeAt i1 arr x1
        pure i2
:}

Methods

addSizeTo :: Int -> a -> Int Source #

addSizeTo accum value returns accum incremented by the size of the serialized representation of value in bytes. Size cannot be zero. It should be at least 1 byte.

deserializeAt :: Int -> MutByteArray -> Int -> IO (Int, a) Source #

deserializeAt byte-offset array arrayLen deserializes a value from the given byte-offset in the array. Returns a tuple consisting of the next byte-offset and the deserialized value.

The arrayLen passed is the entire length of the input buffer. It is to be used to check if we would overflow the input buffer when deserializing.

Throws an exception if the operation would exceed the supplied arrayLen.

serializeAt :: Int -> MutByteArray -> a -> IO Int Source #

serializeAt byte-offset array value writes the serialized representation of the value in the array at the given byte-offset. Returns the next byte-offset.

This is an unsafe operation, the programmer must ensure that the array has enough space available to serialize the value as determined by the addSizeTo operation.

Instances

Instances details
Serialize Int16 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Int32 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Int64 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Int8 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Word16 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Word32 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Word64 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Word8 Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Integer Source # 
Instance details

Defined in Streamly.Internal.Data.MutByteArray

Serialize () Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Methods

addSizeTo :: Int -> () -> Int Source #

deserializeAt :: Int -> MutByteArray -> Int -> IO (Int, ()) Source #

serializeAt :: Int -> MutByteArray -> () -> IO Int Source #

Serialize Bool Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Char Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Double Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Float Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Int Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize Word Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize (FunPtr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize (Ptr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize (StablePtr a) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize (Array a) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Serialize a => Serialize (Maybe a) Source # 
Instance details

Defined in Streamly.Internal.Data.MutByteArray

Serialize a => Serialize [a] Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Methods

addSizeTo :: Int -> [a] -> Int Source #

deserializeAt :: Int -> MutByteArray -> Int -> IO (Int, [a]) Source #

serializeAt :: Int -> MutByteArray -> [a] -> IO Int Source #

(Serialize a, Serialize b) => Serialize (Either a b) Source # 
Instance details

Defined in Streamly.Internal.Data.MutByteArray

Serialize (Proxy a) Source # 
Instance details

Defined in Streamly.Internal.Data.MutByteArray

(Serialize a, Serialize b) => Serialize (a, b) Source # 
Instance details

Defined in Streamly.Internal.Data.Serialize.Type

Methods

addSizeTo :: Int -> (a, b) -> Int Source #

deserializeAt :: Int -> MutByteArray -> Int -> IO (Int, (a, b)) Source #

serializeAt :: Int -> MutByteArray -> (a, b) -> IO Int Source #

Deprecated

writeN :: forall m a. (MonadIO m, Unbox a) => Int -> Fold m a (Array a) Source #

write :: forall m a. (MonadIO m, Unbox a) => Fold m a (Array a) Source #