| Copyright | (c) 2018 Composewell Technologies | 
|---|---|
| License | BSD3 | 
| Maintainer | streamly@composewell.com | 
| Stability | experimental | 
| Portability | GHC | 
| Safe Haskell | Safe-Inferred | 
| Language | Haskell2010 | 
Streamly.Data.Unicode.Stream
Description
Deprecated: Use Streamly.Unicode.Stream instead
Processing Unicode Strings
A Char stream is the canonical representation to process Unicode strings.
 It can be processed efficiently using regular stream processing operations.
 A byte stream of Unicode text read from an IO device or from an
 Array in memory can be decoded into a Char stream
 using the decoding routines in this module.  A String ([Char]) can be
 converted into a Char stream using fromList.  An Array
 Char can be unfolded into a stream using the array
 read unfold.
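As a minimal sketch of the flow described above, the following decodes UTF-8 bytes into a Char stream and then applies ordinary stream operations to it. It assumes streamly 0.8.x, where Streamly.Prelude provides fromList, filter and length and the decoders are also exported from Streamly.Unicode.Stream. The name countNonSpace is a hypothetical helper, not part of the library.

```haskell
import Data.Char (isSpace)
import Data.Word (Word8)
import qualified Streamly.Prelude as Stream
import qualified Streamly.Unicode.Stream as Unicode

-- | Count the non-whitespace characters in a UTF-8 encoded byte sequence.
-- The bytes are decoded into a Char stream, which is then processed with
-- regular stream combinators (filter, length).
-- Assumption: streamly 0.8.x module layout; countNonSpace is illustrative.
countNonSpace :: [Word8] -> IO Int
countNonSpace =
      Stream.length
    . Stream.filter (not . isSpace)
    . Unicode.decodeUtf8
    . Stream.fromList
```

A list is used as the byte source only to keep the sketch self-contained; a stream read from a file or socket would be decoded the same way.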
Storing Unicode Strings
A stream of Char can be encoded into a byte stream using the encoding
 routines in this module and then written to IO devices or to arrays in
 memory.
If you have to store a Char stream in memory you can convert it into a
 String using toList or using the
 toList fold. The String type can be more efficient
 than pinned arrays for short, short-lived strings.
For longer or long-lived streams you can fold the Char
 stream into an Array Char using the array write fold.
 The Array type provides a more compact representation and uses pinned
 memory, which reduces GC overhead. If space efficiency is a concern you can
 use encodeUtf8' on the Char stream before writing it to an Array, which
 gives an even more compact representation.
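The two storage options can be compared with a small sketch. It assumes streamly 0.8.x, where the array type and its write fold live in Streamly.Data.Array.Foreign and Streamly.Prelude exports fold. The helper names toCharArray and toUtf8Array are hypothetical, and the lenient encodeUtf8 is used here in place of encodeUtf8'.

```haskell
import Data.Word (Word8)
import Streamly.Data.Array.Foreign (Array)
import qualified Streamly.Data.Array.Foreign as Array
import qualified Streamly.Prelude as Stream
import qualified Streamly.Unicode.Stream as Unicode

-- | Store a String as an array of Char, one fixed-size element per character.
-- Assumption: streamly 0.8.x module names; helper names are illustrative.
toCharArray :: String -> IO (Array Char)
toCharArray = Stream.fold Array.write . Stream.fromList

-- | Store a String as an array of UTF-8 encoded bytes.
toUtf8Array :: String -> IO (Array Word8)
toUtf8Array = Stream.fold Array.write . Unicode.encodeUtf8 . Stream.fromList
```

Array.length reports the number of elements, so for mostly ASCII text the byte array holds roughly one Word8 per character, whereas each element of the Char array occupies four bytes under GHC's Storable instance for Char.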
String Literals
SerialT Identity Char and Array Char are instances of IsString and
 IsList; therefore, the OverloadedStrings and OverloadedLists extensions
 can be used for convenience when specifying Unicode string literals using
 these types.
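A minimal sketch of a string literal used as a Char stream, assuming streamly 0.8.x, where SerialT is exported from Streamly.Prelude. The names greeting and greetingString are illustrative only.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Functor.Identity (Identity, runIdentity)
import Streamly.Prelude (SerialT)
import qualified Streamly.Prelude as Stream

-- | A Char stream written as a string literal. The type signature selects
-- the IsString instance of SerialT Identity Char.
-- Assumption: streamly 0.8.x; the names here are illustrative.
greeting :: SerialT Identity Char
greeting = "hello"

-- | Convert the literal stream back to an ordinary String.
greetingString :: String
greetingString = runIdentity (Stream.toList greeting)
```

The type annotation matters: with OverloadedStrings enabled, an unannotated literal is polymorphic and the compiler needs the signature to pick the stream instance.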
Pitfalls
- Case conversion: Some Unicode characters translate to more than one code point on case conversion. The toUpper and toLower functions in the base package do not handle such characters. Therefore, operations like map toUpper on a character stream or character array may not always perform the correct conversion.
- String comparison: In some cases, visually identical strings may have different Unicode representations; therefore, a character stream or character array cannot always be compared directly. A normalized comparison may be needed to check string equivalence correctly.
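Both pitfalls can be reproduced with plain lists and the base package alone, which is what the following sketch does; the same behaviour carries over to a Char stream or Array Char because the per-character functions are identical. All names in the block are illustrative.

```haskell
import Data.Char (toUpper)

-- | German sharp s (U+00DF). Its full uppercase form is the two-character
-- sequence "SS", which a Char -> Char function cannot produce.
sharpS :: Char
sharpS = '\x00DF'

-- | Per-character upper-casing, as map toUpper would do on a stream or array.
upperNaive :: String -> String
upperNaive = map toUpper

-- | Precomposed "e with acute accent" (U+00E9), a single code point.
precomposed :: String
precomposed = ['\x00E9']

-- | The letter e followed by a combining acute accent (U+0301).
decomposed :: String
decomposed = ['e', '\x0301']
```

Here upperNaive leaves sharpS unchanged instead of expanding it, and precomposed == decomposed is False even though both typically render as the same glyph. A normalization step (not shown) would be needed before comparing them.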
Experimental APIs
Some experimental APIs to conveniently process text using the
 Array Char representation directly can be found in
 Streamly.Internal.Memory.Unicode.Array.
Synopsis
- decodeLatin1 :: (IsStream t, Monad m) => t m Word8 -> t m Char
- decodeUtf8 :: (Monad m, IsStream t) => t m Word8 -> t m Char
- encodeLatin1 :: (IsStream t, Monad m) => t m Char -> t m Word8
- encodeUtf8 :: (Monad m, IsStream t) => t m Char -> t m Word8
- decodeUtf8Lax :: (IsStream t, Monad m) => t m Word8 -> t m Char
- encodeLatin1Lax :: (IsStream t, Monad m) => t m Char -> t m Word8
- encodeUtf8Lax :: (IsStream t, Monad m) => t m Char -> t m Word8
Construction (Decoding)
decodeLatin1 :: (IsStream t, Monad m) => t m Word8 -> t m Char Source #
Decode a stream of bytes to Unicode characters by mapping each byte to the
 corresponding Unicode Char in the 0-255 range.
Since: 0.7.0 (Streamly.Data.Unicode.Stream)
Since: 0.8.0
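A minimal usage sketch, assuming streamly 0.8.x with the decoder imported from Streamly.Unicode.Stream; latin1ToString is a hypothetical helper name.

```haskell
import Data.Word (Word8)
import qualified Streamly.Prelude as Stream
import qualified Streamly.Unicode.Stream as Unicode

-- | Decode Latin-1 bytes into a String. Every byte maps to the Char with
-- the same numeric value, so decoding cannot fail.
-- Assumption: streamly 0.8.x; latin1ToString is illustrative.
latin1ToString :: [Word8] -> IO String
latin1ToString = Stream.toList . Unicode.decodeLatin1 . Stream.fromList
```

For instance, the byte 0xE9 decodes to U+00E9, so the bytes 0x63 0x61 0x66 0xE9 come out as the four characters of "café".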
decodeUtf8 :: (Monad m, IsStream t) => t m Word8 -> t m Char Source #
Decode a UTF-8 encoded bytestream to a stream of Unicode characters. Any invalid codepoint encountered is replaced with the Unicode replacement character (U+FFFD).
Since: 0.7.0 (Streamly.Data.Unicode.Stream)
Since: 0.8.0 (Lenient Behaviour)
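The lenient behaviour can be observed with a small sketch, assuming streamly 0.8.x and the decoder from Streamly.Unicode.Stream; utf8ToString is a hypothetical helper name.

```haskell
import Data.Word (Word8)
import qualified Streamly.Prelude as Stream
import qualified Streamly.Unicode.Stream as Unicode

-- | Leniently decode UTF-8 bytes into a String. Invalid input does not
-- raise an error; it shows up as U+FFFD in the result.
-- Assumption: streamly 0.8.x; utf8ToString is illustrative.
utf8ToString :: [Word8] -> IO String
utf8ToString = Stream.toList . Unicode.decodeUtf8 . Stream.fromList
```

Feeding it the bytes 0x61 0xFF 0x62, where 0xFF can never occur in valid UTF-8, yields a string that still begins with 'a' and ends with 'b', with the replacement character standing in for the invalid byte.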
Elimination (Encoding)
encodeLatin1 :: (IsStream t, Monad m) => t m Char -> t m Word8 Source #
Like encodeLatin1' but silently maps input codepoints beyond 255 to
 arbitrary Latin1 chars in the 0-255 range. No error or exception is thrown
 when such a mapping occurs.
Since: 0.7.0 (Streamly.Data.Unicode.Stream)
Since: 0.8.0 (Lenient Behaviour)
encodeUtf8 :: (Monad m, IsStream t) => t m Char -> t m Word8 Source #
Encode a stream of Unicode characters to a UTF-8 encoded bytestream. Any invalid characters (surrogate code points, U+D800-U+DFFF) in the input stream are replaced by the Unicode replacement character U+FFFD.
Since: 0.7.0 (Streamly.Data.Unicode.Stream)
Since: 0.8.0 (Lenient Behaviour)
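A minimal encoding sketch, assuming streamly 0.8.x and the encoder from Streamly.Unicode.Stream; stringToUtf8 is a hypothetical helper name.

```haskell
import Data.Word (Word8)
import qualified Streamly.Prelude as Stream
import qualified Streamly.Unicode.Stream as Unicode

-- | Encode a String as UTF-8 bytes. Surrogate code points, which have no
-- UTF-8 encoding, are emitted as the encoding of U+FFFD instead.
-- Assumption: streamly 0.8.x; stringToUtf8 is illustrative.
stringToUtf8 :: String -> IO [Word8]
stringToUtf8 = Stream.toList . Unicode.encodeUtf8 . Stream.fromList
```

The Greek letter lambda (U+03BB) encodes to the two bytes 0xCE 0xBB, while a lone surrogate such as '\xD800' encodes to 0xEF 0xBF 0xBD, which is the UTF-8 form of U+FFFD.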
Deprecations
decodeUtf8Lax :: (IsStream t, Monad m) => t m Word8 -> t m Char Source #
Deprecated: Please use decodeUtf8 instead
Same as decodeUtf8
encodeLatin1Lax :: (IsStream t, Monad m) => t m Char -> t m Word8 Source #
Deprecated: Please use encodeLatin1 instead
Same as encodeLatin1
encodeUtf8Lax :: (IsStream t, Monad m) => t m Char -> t m Word8 Source #
Deprecated: Please use encodeUtf8 instead
Same as encodeUtf8