Copyright	(c) 2008, 2009 Tom Harper, (c) 2009, 2010 Bryan O'Sullivan, (c) 2009 Duncan Coutts
License	BSD-style
Maintainer	bos@serpentine.com
Stability	experimental
Portability	GHC
Safe Haskell	None
Language	Haskell2010

Data.Text.Internal.Encoding.Utf8

Description

Warning: this is an internal module, and does not have a stable API or name. Functions in this module may not check or enforce preconditions expected by public modules. Use at your own risk!

Basic UTF-8 validation and character manipulation.

Documentation

ord2 :: Char -> (Word8, Word8)

ord3 :: Char -> (Word8, Word8, Word8)

ord4 :: Char -> (Word8, Word8, Word8, Word8)

chr2 :: Word8 -> Word8 -> Char

chr3 :: Word8 -> Word8 -> Word8 -> Char

chr4 :: Word8 -> Word8 -> Word8 -> Word8 -> Char
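
The functions above carry no description, so the following is only a sketch of the standard two-byte UTF-8 layout that ord2 and chr2 presumably correspond to. The names encode2 and decode2 are hypothetical and use only base; they are not this module's implementation.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: split a code point in U+0080..U+07FF into a
-- lead byte (110xxxxx) and a continuation byte (10xxxxxx).
-- No range check is performed on the input.
encode2 :: Char -> (Word8, Word8)
encode2 c = (lead, cont)
  where
    n    = ord c
    lead = fromIntegral ((n `shiftR` 6) .|. 0xC0)
    cont = fromIntegral ((n .&. 0x3F) .|. 0x80)

-- Hypothetical inverse: reassemble the code point from the two bytes.
-- The bytes are assumed to form a valid two-byte sequence.
decode2 :: Word8 -> Word8 -> Char
decode2 b1 b2 = chr (hi .|. lo)
  where
    hi = (fromIntegral b1 .&. 0x1F) `shiftL` 6
    lo = fromIntegral b2 .&. 0x3F
```

For example, U+00E9 (é) splits into the bytes 0xC3 and 0xA9, and decode2 0xC3 0xA9 gives the character back.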

Validation

continuationByte :: Word8 -> Bool

Utility function: check whether a byte is a UTF-8 continuation byte.
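
A minimal sketch of such a check, assuming the usual definition of a continuation byte (bit pattern 10xxxxxx). The name isContinuation is hypothetical and stands in for the library function.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical stand-in: a continuation byte has the form 10xxxxxx,
-- so masking the top two bits must yield 0x80.
isContinuation :: Word8 -> Bool
isContinuation w = w .&. 0xC0 == 0x80
```

Under this definition the bytes 0x80 through 0xBF are continuation bytes; ASCII bytes and lead bytes are not.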

validate1 :: Word8 -> Bool

validate2 :: Word8 -> Word8 -> Bool

validate3 :: Word8 -> Word8 -> Word8 -> Bool

validate4 :: Word8 -> Word8 -> Word8 -> Word8 -> Bool
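
The validation functions are undocumented here. The sketch below assumes they test the well-formed byte ranges given in the Unicode Standard (no overlong forms, no surrogates); that is an assumption, and the names between, wellFormed1, wellFormed2 and wellFormed3 are hypothetical.

```haskell
import Data.Word (Word8)

-- Hypothetical helper: inclusive range check.
between :: Word8 -> Word8 -> Word8 -> Bool
between x lo hi = x >= lo && x <= hi

-- One-byte sequence: ASCII only.
wellFormed1 :: Word8 -> Bool
wellFormed1 b1 = b1 <= 0x7F

-- Two-byte sequence: lead bytes 0xC0 and 0xC1 are excluded because
-- they could only produce overlong encodings.
wellFormed2 :: Word8 -> Word8 -> Bool
wellFormed2 b1 b2 = between b1 0xC2 0xDF && between b2 0x80 0xBF

-- Three-byte sequence: the second byte is restricted after 0xE0
-- (overlong forms) and after 0xED (surrogates U+D800..U+DFFF).
wellFormed3 :: Word8 -> Word8 -> Word8 -> Bool
wellFormed3 b1 b2 b3 = secondOk && between b3 0x80 0xBF
  where
    secondOk
      | b1 == 0xE0           = between b2 0xA0 0xBF
      | between b1 0xE1 0xEC = between b2 0x80 0xBF
      | b1 == 0xED           = between b2 0x80 0x9F
      | between b1 0xEE 0xEF = between b2 0x80 0xBF
      | otherwise            = False
```

A four-byte check would follow the same pattern with lead bytes 0xF0 through 0xF4.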

decodeChar :: (Char -> Int -> a) -> Word8 -> Word8 -> Word8 -> Word8 -> a

A hybrid of unsafeChr8, chr2, chr3 and chr4. This function does not touch the bytes it does not need.
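
One way such a decoder could be structured is sketched below. The name decodeCPS is hypothetical, the second continuation argument is assumed to be the number of bytes consumed, and the input is assumed to be valid UTF-8; none of this is confirmed by the documentation above.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for a continuation-passing decoder. The
-- continuation receives the decoded character and (by assumption)
-- the number of bytes that made it up. No validation is performed.
decodeCPS :: (Char -> Int -> a) -> Word8 -> Word8 -> Word8 -> Word8 -> a
decodeCPS k b1 b2 b3 b4
  | b1 < 0x80 = k (chr (w b1)) 1
  | b1 < 0xE0 = k (chr (((w b1 .&. 0x1F) `shiftL` 6) .|. low6 b2)) 2
  | b1 < 0xF0 = k (chr (((w b1 .&. 0x0F) `shiftL` 12)
                        .|. (low6 b2 `shiftL` 6)
                        .|. low6 b3)) 3
  | otherwise = k (chr (((w b1 .&. 0x07) `shiftL` 18)
                        .|. (low6 b2 `shiftL` 12)
                        .|. (low6 b3 `shiftL` 6)
                        .|. low6 b4)) 4
  where
    w :: Word8 -> Int
    w = fromIntegral
    -- Payload bits of a continuation byte.
    low6 :: Word8 -> Int
    low6 b = w b .&. 0x3F
```

Only the lead byte is inspected to choose a branch, and each branch mentions just the bytes it uses. With lazy evaluation, decodeCPS (,) 0xC3 0xA9 undefined undefined yields the pair of U+00E9 and 2 without forcing the last two arguments.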

decodeCharIndex :: (Char -> Int -> a) -> (Int -> Word8) -> Int -> a

Version of decodeChar which works with an indexing function.

reverseDecodeCharIndex :: (Char -> Int -> a) -> (Int -> Word8) -> Int -> a

Version of decodeCharIndex that takes the rightmost index of a character and tracks back to the left. Note that this function requires the input to be valid Unicode.
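
The backward step can be illustrated as follows. This is a sketch of the general technique, not the library's code: leadIndex is a hypothetical helper that walks left over continuation bytes until it reaches the lead byte, relying on the input being valid so that the walk terminates.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical helper: given an indexing function and the index of the
-- last byte of a character, return the index of its lead byte.
-- Assumes valid input, so a lead byte is reached within three steps.
leadIndex :: (Int -> Word8) -> Int -> Int
leadIndex idx i
  | idx i .&. 0xC0 == 0x80 = leadIndex idx (i - 1)  -- continuation byte: keep going left
  | otherwise              = i                      -- ASCII or lead byte: stop
```

Once the lead byte is located, the character can be decoded forward from that index in the usual way.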

encodeChar :: (Word8 -> a) -> (Word8 -> Word8 -> a) -> (Word8 -> Word8 -> Word8 -> a) -> (Word8 -> Word8 -> Word8 -> Word8 -> a) -> Char -> a

Provides fast UTF-8 encoding of a character. The caller supplies one function for each code path (an encoding of one, two, three or four bytes), and these functions should be inlined properly.
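
A sketch of how an encoder with this shape might dispatch, using only base. The names encodeCPS and encodeToList are hypothetical, and the sketch does not reject surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical stand-in: choose a continuation by encoded length and
-- hand it the bytes of the UTF-8 encoding.
encodeCPS
  :: (Word8 -> a)
  -> (Word8 -> Word8 -> a)
  -> (Word8 -> Word8 -> Word8 -> a)
  -> (Word8 -> Word8 -> Word8 -> Word8 -> a)
  -> Char -> a
encodeCPS f1 f2 f3 f4 c
  | n < 0x80    = f1 (byte n)
  | n < 0x800   = f2 (byte ((n `shiftR` 6) .|. 0xC0)) (cont n)
  | n < 0x10000 = f3 (byte ((n `shiftR` 12) .|. 0xE0))
                     (cont (n `shiftR` 6)) (cont n)
  | otherwise   = f4 (byte ((n `shiftR` 18) .|. 0xF0))
                     (cont (n `shiftR` 12)) (cont (n `shiftR` 6)) (cont n)
  where
    n = ord c
    byte :: Int -> Word8
    byte = fromIntegral
    -- Build a continuation byte from the low six bits.
    cont :: Int -> Word8
    cont x = fromIntegral ((x .&. 0x3F) .|. 0x80)

-- Example use: collect the encoded bytes into a list.
encodeToList :: Char -> [Word8]
encodeToList =
  encodeCPS (\a -> [a]) (\a b -> [a, b])
            (\a b c -> [a, b, c]) (\a b c d -> [a, b, c, d])
```

Passing the bytes as separate arguments, rather than returning a list or tuple, lets a caller write them straight into a buffer without an intermediate allocation, provided the continuations are inlined.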

charTailBytes :: Char -> Int

Count the number of UTF-8 tail bytes needed to encode a character.
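
Assuming "tail bytes" means the continuation bytes that follow the lead byte, the count is the encoded length minus one. The sketch below uses the hypothetical name tailBytes and only base.

```haskell
import Data.Char (ord)

-- Hypothetical stand-in: number of bytes after the lead byte,
-- determined by the code point thresholds of UTF-8.
tailBytes :: Char -> Int
tailBytes c
  | n < 0x80    = 0  -- one-byte encoding
  | n < 0x800   = 1  -- two-byte encoding
  | n < 0x10000 = 2  -- three-byte encoding
  | otherwise   = 3  -- four-byte encoding
  where
    n = ord c
```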