Copyright | (c) 2008 2009 Tom Harper (c) 2009 2010 Bryan O'Sullivan (c) 2009 Duncan Coutts |
---|---|
License | BSD-style |
Maintainer | bos@serpentine.com |
Stability | experimental |
Portability | GHC |
Safe Haskell | None |
Language | Haskell2010 |
Warning: this is an internal module, and does not have a stable API or name. Functions in this module may not check or enforce preconditions expected by public modules. Use at your own risk!
Basic UTF-8 validation and character manipulation.
- ord2 :: Char -> (Word8, Word8)
- ord3 :: Char -> (Word8, Word8, Word8)
- ord4 :: Char -> (Word8, Word8, Word8, Word8)
- chr2 :: Word8 -> Word8 -> Char
- chr3 :: Word8 -> Word8 -> Word8 -> Char
- chr4 :: Word8 -> Word8 -> Word8 -> Word8 -> Char
- continuationByte :: Word8 -> Bool
- validate1 :: Word8 -> Bool
- validate2 :: Word8 -> Word8 -> Bool
- validate3 :: Word8 -> Word8 -> Word8 -> Bool
- validate4 :: Word8 -> Word8 -> Word8 -> Word8 -> Bool
- decodeChar :: (Char -> Int -> a) -> Word8 -> Word8 -> Word8 -> Word8 -> a
- decodeCharIndex :: (Char -> Int -> a) -> (Int -> Word8) -> Int -> a
- reverseDecodeCharIndex :: (Char -> Int -> a) -> (Int -> Word8) -> Int -> a
- encodeChar :: (Word8 -> a) -> (Word8 -> Word8 -> a) -> (Word8 -> Word8 -> Word8 -> a) -> (Word8 -> Word8 -> Word8 -> Word8 -> a) -> Char -> a
- charTailBytes :: Char -> Int
Documentation
Validation
continuationByte :: Word8 -> Bool
Utility function: check whether a byte is a UTF-8 continuation byte, that is, a byte of the form 10xxxxxx (0x80 to 0xBF).
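The check can be illustrated with a single bit mask. This is a minimal sketch using only `base`; the name `isContinuation` is hypothetical and this is not necessarily how the module implements it.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- | Hypothetical stand-in for 'continuationByte'.
-- A continuation byte has its top two bits set to 10, so masking with
-- 0xC0 (11000000) must leave exactly 0x80 (10000000).
isContinuation :: Word8 -> Bool
isContinuation w = w .&. 0xC0 == 0x80
```

For instance, `isContinuation 0xA9` holds (0xA9 is the second byte of U+00E9), while `isContinuation 0xC3` does not, since 0xC3 is a leading byte.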
decodeChar :: (Char -> Int -> a) -> Word8 -> Word8 -> Word8 -> Word8 -> a
Hybrid combination of unsafeChr8, chr2, chr3 and chr4. This function will not touch the bytes it doesn't need.
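To make the "will not touch the bytes it doesn't need" behaviour concrete, here is a minimal sketch of a continuation-style decoder built on `base` only. The name `decodeSketch` and the helper `tail6` are hypothetical; the sketch assumes the input is already valid UTF-8 and is not the module's actual implementation.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Hypothetical sketch: the first byte selects the sequence length, and
-- each branch only refers to the bytes it needs. Thanks to laziness, the
-- unused arguments are never evaluated. Assumes valid UTF-8 input.
decodeSketch :: (Char -> Int -> a) -> Word8 -> Word8 -> Word8 -> Word8 -> a
decodeSketch k b1 b2 b3 b4
  | b1 < 0x80 = k (chr n1) 1
  | b1 < 0xE0 = k (chr (((n1 .&. 0x1F) `shiftL` 6) .|. tail6 b2)) 2
  | b1 < 0xF0 = k (chr (((n1 .&. 0x0F) `shiftL` 12)
                        .|. (tail6 b2 `shiftL` 6)
                        .|. tail6 b3)) 3
  | otherwise = k (chr (((n1 .&. 0x07) `shiftL` 18)
                        .|. (tail6 b2 `shiftL` 12)
                        .|. (tail6 b3 `shiftL` 6)
                        .|. tail6 b4)) 4
  where
    n1 = fromIntegral b1 :: Int

    -- Payload (low six bits) of a continuation byte.
    tail6 :: Word8 -> Int
    tail6 w = fromIntegral w .&. 0x3F
```

Passing `(,)` as the continuation returns the decoded character together with the number of bytes consumed, so `decodeSketch (,) 0xC3 0xA9 undefined undefined` evaluates without forcing the last two arguments.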
decodeCharIndex :: (Char -> Int -> a) -> (Int -> Word8) -> Int -> a
Version of decodeChar which works with an indexing function.
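An index-based decoder is convenient for walking a buffer, because the `Int` handed to the continuation says how far to advance. The following is a sketch under stated assumptions: `decodeAt` and `decodeAll` are hypothetical names, a plain list stands in for the byte buffer, and the input is assumed to be valid UTF-8.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Hypothetical index-based decoder: reads the leading byte at index i
-- and then only as many following bytes as that leading byte announces.
decodeAt :: (Char -> Int -> a) -> (Int -> Word8) -> Int -> a
decodeAt k ix i
  | b < 0x80  = k (chr b) 1
  | b < 0xE0  = finish (b .&. 0x1F) 1
  | b < 0xF0  = finish (b .&. 0x0F) 2
  | otherwise = finish (b .&. 0x07) 3
  where
    b = fromIntegral (ix i) :: Int
    -- Fold n continuation bytes into the bits taken from the leading byte.
    finish lead n = k (chr (foldl step lead [1 .. n])) (n + 1)
    step acc j = (acc `shiftL` 6) .|. (fromIntegral (ix (i + j)) .&. 0x3F)

-- | Decode a whole byte list by repeatedly advancing by the reported length.
-- List indexing is slow; it is used here only to keep the example small.
decodeAll :: [Word8] -> String
decodeAll bytes = go 0
  where
    len = length bytes
    go i
      | i >= len  = []
      | otherwise = decodeAt (\c n -> c : go (i + n)) (bytes !!) i
```

In real code the indexing function would typically read from an unboxed array rather than a list.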
reverseDecodeCharIndex :: (Char -> Int -> a) -> (Int -> Word8) -> Int -> a
Version of decodeCharIndex that takes the rightmost index of a character and tracks back to the left. Note that this function requires the input to be valid UTF-8.
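One way to decode from the right is to collect payload bits while stepping left over continuation bytes, stopping at the leading byte. The sketch below illustrates that idea with `base` only; `reverseDecodeAt` is a hypothetical name, the approach is not claimed to match the module's implementation, and like the documented function it assumes valid UTF-8.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Hypothetical right-to-left decoder. Index i is the LAST byte of the
-- character. The continuation receives the character and its byte length.
reverseDecodeAt :: (Char -> Int -> a) -> (Int -> Word8) -> Int -> a
reverseDecodeAt k ix i = go i 0 0
  where
    -- j: current index, sh: bit offset for the next payload, acc: bits so far
    go j sh acc
      | isCont b  = go (j - 1) (sh + 6) (acc .|. ((b .&. 0x3F) `shiftL` sh))
      | otherwise = k (chr (acc .|. leadBits b sh)) (i - j + 1)
      where
        b = fromIntegral (ix j) :: Int

    isCont :: Int -> Bool
    isCont b = b .&. 0xC0 == 0x80

    -- Payload of the leading byte, shifted above the collected tail bits.
    leadBits :: Int -> Int -> Int
    leadBits b sh
      | b < 0x80  = b
      | b < 0xE0  = (b .&. 0x1F) `shiftL` sh
      | b < 0xF0  = (b .&. 0x0F) `shiftL` sh
      | otherwise = (b .&. 0x07) `shiftL` sh
```

On invalid input (for example, a run of continuation bytes with no leading byte) this sketch would walk past the start of the buffer, which is why the validity precondition matters.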
encodeChar :: (Word8 -> a) -> (Word8 -> Word8 -> a) -> (Word8 -> Word8 -> Word8 -> a) -> (Word8 -> Word8 -> Word8 -> Word8 -> a) -> Char -> a
Encodes a character as UTF-8 by passing its one to four bytes to the function for the matching sequence length. Encoding is fast because the caller supplies a custom function for each code path, and these should be inlined properly.
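The four-continuation shape can be sketched as follows, using `base` only. `encodeSketch` and `toBytes` are hypothetical names, and this is an illustration rather than the module's code; in particular, the sketch does nothing special for surrogate code points (U+D800 to U+DFFF), which a caller would need to handle separately.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Hypothetical continuation-style encoder: exactly one of the four
-- functions is called, chosen by how many bytes the code point needs.
encodeSketch
  :: (Word8 -> a)
  -> (Word8 -> Word8 -> a)
  -> (Word8 -> Word8 -> Word8 -> a)
  -> (Word8 -> Word8 -> Word8 -> Word8 -> a)
  -> Char
  -> a
encodeSketch f1 f2 f3 f4 c
  | n < 0x80    = f1 (byte n)
  | n < 0x800   = f2 (byte (0xC0 .|. (n `shiftR` 6))) (cont n)
  | n < 0x10000 = f3 (byte (0xE0 .|. (n `shiftR` 12)))
                     (cont (n `shiftR` 6)) (cont n)
  | otherwise   = f4 (byte (0xF0 .|. (n `shiftR` 18)))
                     (cont (n `shiftR` 12)) (cont (n `shiftR` 6)) (cont n)
  where
    n = ord c
    byte = fromIntegral :: Int -> Word8
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = byte (0x80 .|. (x .&. 0x3F))

-- | Example instantiation: collect the bytes into a list.
toBytes :: Char -> [Word8]
toBytes = encodeSketch
  (\a -> [a])
  (\a b -> [a, b])
  (\a b c' -> [a, b, c'])
  (\a b c' d -> [a, b, c', d])
```

A caller writing into a mutable buffer would pass four functions that store the bytes at consecutive offsets instead of building a list.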
charTailBytes :: Char -> Int
Count the number of UTF-8 tail bytes needed to encode a character.
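The tail-byte count follows directly from the UTF-8 code point ranges. A minimal sketch, with the hypothetical name `tailBytes` and no claim about how the module computes it:

```haskell
import Data.Char (ord)

-- | Hypothetical stand-in for 'charTailBytes': the number of bytes that
-- follow the leading byte in the UTF-8 encoding of a character.
tailBytes :: Char -> Int
tailBytes c
  | n < 0x80    = 0  -- U+0000..U+007F: one byte in total
  | n < 0x800   = 1  -- U+0080..U+07FF: two bytes
  | n < 0x10000 = 2  -- U+0800..U+FFFF: three bytes
  | otherwise   = 3  -- U+10000..U+10FFFF: four bytes
  where
    n = ord c
```

The total encoded length of a character is therefore `1 + tailBytes c`.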