Copyright | © Herbert Valerio Riedel 2017 |
---|---|
License | BSD3 |
Maintainer | hvr@gnu.org |
Stability | stable |
Safe Haskell | Trustworthy |
Language | Haskell2010 |
Memory-efficient representation of Unicode text strings.
- data ShortText
- null :: ShortText -> Bool
- length :: ShortText -> Int
- isAscii :: ShortText -> Bool
- fromString :: String -> ShortText
- toString :: ShortText -> String
- fromText :: Text -> ShortText
- toText :: ShortText -> Text
- fromShortByteString :: ShortByteString -> Maybe ShortText
- toShortByteString :: ShortText -> ShortByteString
- fromByteString :: ByteString -> Maybe ShortText
- toByteString :: ShortText -> ByteString
- toBuilder :: ShortText -> Builder
The ShortText type
A compact representation of Unicode strings.
This type relates to Text as ShortByteString relates to ByteString by providing a more compact type. Please consult the documentation of Data.ByteString.Short for more information.
Currently, a boxed unshared Text has a memory footprint of 6 words (i.e. 48 bytes on 64-bit systems) plus 2 or 4 bytes per code-point (due to the internal UTF-16 representation). Each Text value which can share its payload with another Text requires only 4 words additionally. Unlike ByteString, Text uses unpinned memory.
In comparison, the footprint of a boxed ShortText is only 4 words (i.e. 32 bytes on 64-bit systems) plus 1, 2, 3, or 4 bytes per code-point (due to the internal UTF-8 representation).
It can be shown that for realistic data UTF-16 has a space overhead of 50% over UTF-8.
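The footprint figures above can be turned into a rough estimate. The following is a minimal sketch that uses only `base`; the names `estimateText` and `estimateShortText` are hypothetical, a 64-bit word size is assumed, and alignment padding and payload sharing are ignored, so the results are approximations rather than measurements.

```haskell
import Data.Char (ord)

-- Assumed machine word size in bytes (64-bit system).
wordSize :: Int
wordSize = 8

-- UTF-16: 2 bytes per code-point, or 4 bytes above U+FFFF (surrogate pair).
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- UTF-8: 1 to 4 bytes per code-point, depending on its value.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Hypothetical estimate for a boxed unshared Text: 6 words plus UTF-16 payload.
estimateText :: String -> Int
estimateText s = 6 * wordSize + sum (map utf16Bytes s)

-- Hypothetical estimate for a boxed ShortText: 4 words plus UTF-8 payload.
estimateShortText :: String -> Int
estimateShortText s = 4 * wordSize + sum (map utf8Bytes s)

main :: IO ()
main = do
  print (estimateText "hello")       -- 48 + 10 = 58
  print (estimateShortText "hello")  -- 32 + 5  = 37
```

For short ASCII strings such as identifiers or keys, the fixed header dominates, which is where the smaller ShortText header matters most.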
Eq ShortText | |
Ord ShortText | |
Read ShortText | |
Show ShortText | |
IsString ShortText | Behaviour for |
Semigroup ShortText | |
Monoid ShortText | |
Binary ShortText | |
NFData ShortText | |
Hashable ShortText | |
Basic operations
isAscii :: ShortText -> Bool
O(n) Test whether ShortText contains only ASCII code-points (i.e. only U+0000 through U+007F).
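As a small illustration of the ASCII test, the sketch below assumes the module is imported qualified as `TS` from `Data.Text.Short` (the `text-short` package); the value names are made up for the example.

```haskell
import qualified Data.Text.Short as TS

-- Every code-point is within U+0000..U+007F.
asciiOnly :: TS.ShortText
asciiOnly = TS.fromString "hello"

-- Contains U+00E9, which lies outside the ASCII range.
withAccent :: TS.ShortText
withAccent = TS.fromString "h\233llo"

main :: IO ()
main = do
  print (TS.isAscii asciiOnly)   -- True
  print (TS.isAscii withAccent)  -- False
```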
Conversions
String
fromString :: String -> ShortText
O(n) Construct/pack from String.
Note: This function is total because it replaces the (invalid) code-points U+D800 through U+DFFF with the replacement character U+FFFD.
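The replacement behaviour described in the note can be observed with a round trip through `toString`. This is a sketch that assumes the `TS` qualified import of `Data.Text.Short`; the input is built as a list of `Char` so that the lone surrogate is explicit.

```haskell
import qualified Data.Text.Short as TS

-- A String holding a lone surrogate (U+D800), which cannot be encoded as UTF-8.
withSurrogate :: String
withSurrogate = ['a', '\xD800', 'b']

-- Pack and unpack again; the surrogate should come back as U+FFFD.
roundTripped :: String
roundTripped = TS.toString (TS.fromString withSurrogate)

main :: IO ()
main = print (roundTripped == ['a', '\xFFFD', 'b'])  -- True
```

A consequence worth keeping in mind is that `toString . fromString` is the identity only for strings without code-points in the range U+D800 through U+DFFF.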
Text
ByteString
fromShortByteString :: ShortByteString -> Maybe ShortText
O(n) Construct ShortText from UTF-8 encoded ShortByteString.
This operation doesn't copy the input ShortByteString, but it cannot be O(1) because the UTF-8 encoding must be validated.
Returns Nothing in case of invalid UTF-8 encoding.
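To show both outcomes of the validation, the sketch below feeds one well-formed and one malformed byte sequence to `fromShortByteString`. It assumes the qualified imports `TS` (`Data.Text.Short`) and `SBS` (`Data.ByteString.Short`); the value names are illustrative only.

```haskell
import qualified Data.ByteString.Short as SBS
import qualified Data.Text.Short as TS

-- The two bytes 0x68 0x69 are the UTF-8 encoding of "hi".
validBytes :: SBS.ShortByteString
validBytes = SBS.pack [0x68, 0x69]

-- The byte 0xFF never occurs in well-formed UTF-8.
invalidBytes :: SBS.ShortByteString
invalidBytes = SBS.pack [0xFF]

main :: IO ()
main = do
  print (TS.fromShortByteString validBytes == Just (TS.fromString "hi"))  -- True
  print (TS.fromShortByteString invalidBytes == Nothing)                  -- True
```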
toShortByteString :: ShortText -> ShortByteString
O(0) Converts to UTF-8 encoded ShortByteString.
This operation has effectively no overhead, as it's currently merely a newtype-cast.
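Because the result is the UTF-8 payload itself, the bytes can be inspected directly. The following sketch assumes the qualified imports `TS` (`Data.Text.Short`) and `SBS` (`Data.ByteString.Short`) and unpacks the encoding of the single code-point U+00E9.

```haskell
import qualified Data.ByteString.Short as SBS
import qualified Data.Text.Short as TS

-- U+00E9 is encoded in UTF-8 as the two bytes 0xC3 0xA9.
eAcute :: TS.ShortText
eAcute = TS.fromString "\233"

main :: IO ()
main = print (SBS.unpack (TS.toShortByteString eAcute))  -- [195,169]
```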
fromByteString :: ByteString -> Maybe ShortText
O(n) Construct ShortText from UTF-8 encoded ByteString.
Returns Nothing in case of invalid UTF-8 encoding.
toByteString :: ShortText -> ByteString
O(n) Converts to UTF-8 encoded ByteString.
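One way to use the ByteString conversions together is to wrap the `Maybe` result in a descriptive error. The helper `decodeUtf8Short` below is hypothetical and not part of the library; the sketch assumes the qualified imports `TS` (`Data.Text.Short`) and `BS` (`Data.ByteString`).

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text.Short as TS

-- Hypothetical helper: report invalid UTF-8 as an error message instead of Nothing.
decodeUtf8Short :: BS.ByteString -> Either String TS.ShortText
decodeUtf8Short bs = maybe (Left "invalid UTF-8") Right (TS.fromByteString bs)

main :: IO ()
main = do
  let t = TS.fromString "na\239ve"
  -- Encoding and then decoding a ShortText yields the original value.
  print (decodeUtf8Short (TS.toByteString t) == Right t)  -- True
  -- 0xC3 starts a two-byte sequence but has no continuation byte.
  print (decodeUtf8Short (BS.pack [0xC3]))                -- Left "invalid UTF-8"
```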