| Copyright | © Herbert Valerio Riedel 2017 |
|---|---|
| License | BSD3 |
| Maintainer | hvr@gnu.org |
| Stability | stable |
| Safe Haskell | Trustworthy |
| Language | Haskell2010 |
Data.Text.Short
Description
Memory-efficient representation of Unicode text strings.
- data ShortText
- null :: ShortText -> Bool
- length :: ShortText -> Int
- isAscii :: ShortText -> Bool
- fromString :: String -> ShortText
- toString :: ShortText -> String
- fromText :: Text -> ShortText
- toText :: ShortText -> Text
- fromShortByteString :: ShortByteString -> Maybe ShortText
- toShortByteString :: ShortText -> ShortByteString
- fromByteString :: ByteString -> Maybe ShortText
- toByteString :: ShortText -> ByteString
- toBuilder :: ShortText -> Builder
The ShortText type
A compact representation of Unicode strings.
This type relates to Text as ShortByteString relates to ByteString by providing a more compact type. Please consult the documentation of Data.ByteString.Short for more information.
Currently, a boxed unshared Text has a memory footprint of 6 words (i.e. 48 bytes on 64-bit systems) plus 2 or 4 bytes per code-point (due to the internal UTF-16 representation). Each Text value which can share its payload with another Text requires only 4 additional words. Unlike ByteString, Text uses unpinned memory.
In comparison, the footprint of a boxed ShortText is only 4 words (i.e. 32 bytes on 64-bit systems) plus 1, 2, 3, or 4 bytes per code-point (due to the internal UTF-8 representation).
It can be shown that for realistic data UTF-16 has a space overhead of 50% over UTF-8.
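The figures above can be turned into a rough back-of-the-envelope estimate. The sketch below is not part of this module: the names wordBytes, utf16Bytes, utf8Bytes, textFootprint and shortTextFootprint are hypothetical, a 64-bit system is assumed, and any rounding the runtime applies to heap objects is ignored.

```haskell
import Data.Char (ord)

-- | Machine word size in bytes (assumption: 64-bit system).
wordBytes :: Int
wordBytes = 8

-- | UTF-16 payload size: 2 bytes per code-point below U+10000, 4 bytes otherwise.
utf16Bytes :: String -> Int
utf16Bytes = sum . map (\c -> if ord c < 0x10000 then 2 else 4)

-- | UTF-8 payload size: 1 to 4 bytes per code-point.
utf8Bytes :: String -> Int
utf8Bytes = sum . map width
  where
    width c
      | ord c < 0x80    = 1
      | ord c < 0x800   = 2
      | ord c < 0x10000 = 3
      | otherwise       = 4

-- | Estimated footprint of a boxed unshared Text: 6 words plus payload.
textFootprint :: String -> Int
textFootprint s = 6 * wordBytes + utf16Bytes s

-- | Estimated footprint of a boxed ShortText: 4 words plus payload.
shortTextFootprint :: String -> Int
shortTextFootprint s = 4 * wordBytes + utf8Bytes s

main :: IO ()
main = do
  print (textFootprint "hello")       -- 48 + 10 = 58
  print (shortTextFootprint "hello")  -- 32 + 5  = 37
```

For short ASCII strings the fixed per-value overhead dominates, which is where the smaller header of ShortText matters most.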
Instances
| Eq ShortText Source # | |
| Ord ShortText Source # | |
| Read ShortText Source # | |
| Show ShortText Source # | |
| IsString ShortText Source # | Behaviour for |
| Semigroup ShortText Source # | |
| Monoid ShortText Source # | |
| Binary ShortText Source # | |
| NFData ShortText Source # | |
| Hashable ShortText Source # | |
Basic operations
isAscii :: ShortText -> Bool Source #
O(n) Test whether ShortText contains only ASCII code-points (i.e. only U+0000 through U+007F).
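A minimal usage sketch, assuming the module is imported qualified as TS (the alias is a choice made here, not something the package prescribes):

```haskell
import qualified Data.Text.Short as TS

main :: IO ()
main = do
  -- Only code-points U+0000 through U+007F.
  print (TS.isAscii (TS.fromString "hello"))      -- True
  -- '\233' is U+00E9, which lies outside the ASCII range.
  print (TS.isAscii (TS.fromString "h\233llo"))   -- False
```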
Conversions
String
fromString :: String -> ShortText Source #
O(n) Construct/pack from String
Note: This function is total because it replaces the (invalid) code-points U+D800 through U+DFFF with the replacement character U+FFFD.
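The replacement behaviour described in the note can be observed with a round trip through toString. This is a small sketch; the sample string is made up for illustration.

```haskell
import qualified Data.Text.Short as TS

main :: IO ()
main = do
  -- '\xD800' is a lone surrogate, which is not a valid Unicode scalar value.
  let st = TS.fromString "a\xD800z"
  -- The surrogate is expected to come back as the replacement character U+FFFD.
  print (TS.toString st == "a\xFFFDz")
```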
Text
ByteString
fromShortByteString :: ShortByteString -> Maybe ShortText Source #
O(n) Construct ShortText from UTF-8 encoded ShortByteString
This operation doesn't copy the input ShortByteString but it cannot be O(1) because we need to validate the UTF-8 encoding.
Returns Nothing in case of invalid UTF-8 encoding.
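As an illustration of the validation step, the sketch below feeds one well-formed and one malformed byte sequence to fromShortByteString. The byte values are examples chosen here; SBS.pack comes from Data.ByteString.Short.

```haskell
import qualified Data.ByteString.Short as SBS
import qualified Data.Text.Short as TS

main :: IO ()
main = do
  let valid   = SBS.pack [0x68, 0x69]  -- UTF-8 for "hi"
      invalid = SBS.pack [0xC3, 0x28]  -- lead byte 0xC3 not followed by a continuation byte
  print (fmap TS.toString (TS.fromShortByteString valid))    -- Just "hi"
  print (fmap TS.toString (TS.fromShortByteString invalid))  -- Nothing
```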
toShortByteString :: ShortText -> ShortByteString Source #
O(0) Converts to UTF-8 encoded ShortByteString
This operation has effectively no overhead, as it's currently merely a newtype-cast.
fromByteString :: ByteString -> Maybe ShortText Source #
O(n) Construct ShortText from UTF-8 encoded ByteString
Returns Nothing in case of invalid UTF-8 encoding.
toByteString :: ShortText -> ByteString Source #
O(n) Converts to UTF-8 encoded ByteString
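A short round-trip sketch through ByteString, assuming qualified imports named BS and TS (aliases picked for this example only):

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text.Short as TS

main :: IO ()
main = do
  -- '\239' is U+00EF, which UTF-8 encodes as the two bytes 0xC3 0xAF.
  let st    = TS.fromString "na\239ve"
      bytes = TS.toByteString st
  print (BS.unpack bytes)                      -- [110,97,195,175,118,101]
  -- Decoding the encoded bytes gives back the original value.
  print (TS.fromByteString bytes == Just st)   -- True
  -- 0xFF never occurs in well-formed UTF-8, so decoding fails.
  print (fmap TS.toString (TS.fromByteString (BS.pack [0xFF])))  -- Nothing
```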