Data.String.Unicode

Description

Unicode and UTF-8 Conversion Functions

Unicode Type declarations

Unicode is represented as the Char type. A precondition for this is support for the full Unicode character range in the compiler (e.g. ghc but not hugs).

the type for Unicode strings

UTF-8 characters are represented by the Char type

UTF-8 strings are implemented as Haskell strings
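The declarations above can be pictured as plain type synonyms. This is only a sketch: `UString` is the one name that appears in the text of this page, while `Unicode`, `UTF8Char` and `UTF8String` are assumed names used for illustration.

```haskell
-- Sketch of the type declarations; names other than UString are assumptions.
type Unicode    = Char       -- a single Unicode code point
type UString    = [Unicode]  -- a Unicode string
type UTF8Char   = Char       -- assumed: one UTF-8 byte stored in a Char (0..255)
type UTF8String = String     -- a sequence of UTF-8 bytes

-- Because these are synonyms, a UString is an ordinary Haskell String.
greeting :: UString
greeting = "gr\252n"         -- U+00FC written as a decimal escape
```

Since nothing in the types separates decoded text from encoded bytes, keeping the two apart is up to the caller.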

Decoding function that returns a pair containing the result string and a list of decoding errors

Decoding function where decoding errors are interleaved with decoded characters
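The two decoder shapes are closely related: the pair form can be derived from the interleaved form by separating errors from characters. The sketch below illustrates this with a toy ASCII decoder. The names `DecodeEmbedded`, `DecodePair`, `asciiEmbedded` and `toPair` are hypothetical and not taken from this module.

```haskell
import Data.Char (ord)
import Data.Either (partitionEithers)

-- Hypothetical type names. Left carries an error message, Right a character.
type DecodeEmbedded = String -> [Either String Char]
type DecodePair     = String -> (String, [String])

-- Toy decoder: accepts 7-bit ASCII and reports everything else as an error.
asciiEmbedded :: DecodeEmbedded
asciiEmbedded = zipWith check [0 :: Int ..]
  where
    check i c
      | ord c < 128 = Right c
      | otherwise   = Left ("non-ASCII value " ++ show (ord c)
                            ++ " at position " ++ show i)

-- Derive the pair-returning form by splitting errors from characters.
toPair :: DecodeEmbedded -> DecodePair
toPair decode input = (chars, errs)
  where
    (errs, chars) = partitionEithers (decode input)
```

The interleaved form keeps each error at the point where it occurred, which suits lazy processing. The pair form is more convenient when only the decoded text and a summary of problems are needed.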

UTF-8 to Unicode conversion with deletion of leading byte order mark, as described in XML standard F.1
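Once the input has been decoded, the byte order mark appears as the single character U+FEFF at the start of the string. A minimal sketch of the removal step follows. The name `dropLeadingBom` is hypothetical, and the UTF-8 decoding itself is not shown.

```haskell
-- Drop a leading byte order mark (U+FEFF) from an already decoded string.
-- Only the first character is inspected; later occurrences are kept.
dropLeadingBom :: String -> String
dropLeadingBom ('\xFEFF' : rest) = rest
dropLeadingBom s                 = s
```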

code conversion from latin1 to Unicode

UCS-2 to UTF-8 conversion with byte order mark analysis

UCS-2 big endian to Unicode conversion

UCS-2 little endian to Unicode conversion
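The three UCS-2 conversions differ only in how each pair of bytes is combined and in whether the byte order mark is consulted. The following sketch uses the hypothetical names `ucs2BE`, `ucs2LE` and `ucs2Auto`, and assumes every input Char holds one byte (0..255). Falling back to big endian when no mark is present is an assumption of this sketch, not documented behaviour.

```haskell
import Data.Char (chr, ord)

-- Combine byte pairs, most significant byte first (big endian).
ucs2BE :: String -> String
ucs2BE (hi : lo : rest) = chr (ord hi * 256 + ord lo) : ucs2BE rest
ucs2BE _                = []   -- a trailing odd byte is dropped in this sketch

-- Combine byte pairs, least significant byte first (little endian).
ucs2LE :: String -> String
ucs2LE (lo : hi : rest) = chr (ord hi * 256 + ord lo) : ucs2LE rest
ucs2LE _                = []

-- Inspect the byte order mark to choose the byte order; the mark is dropped.
-- Assumption: input without a mark is treated as big endian.
ucs2Auto :: String -> String
ucs2Auto ('\xFE' : '\xFF' : rest) = ucs2BE rest
ucs2Auto ('\xFF' : '\xFE' : rest) = ucs2LE rest
ucs2Auto bytes                    = ucs2BE bytes
```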

UTF-16 big endian to UTF-8 conversion with removal of byte order mark

UTF-16 little endian to UTF-8 conversion with removal of byte order mark
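What separates UTF-16 from UCS-2 is the surrogate pair: two 16-bit code units that together encode one character above U+FFFF. The sketch below shows that step on code units that have already been assembled from bytes. The names `utf16Units`, `combine`, `isHigh` and `isLow` are hypothetical, and re-encoding the result as UTF-8 is not shown.

```haskell
import Data.Char (chr)

-- Decode UTF-16 code units into characters.
-- A leading byte order mark (0xFEFF) is removed first.
utf16Units :: [Int] -> String
utf16Units (0xFEFF : rest) = combine rest
utf16Units units           = combine units

-- Merge each high/low surrogate pair into one character.
combine :: [Int] -> String
combine (hi : lo : rest)
  | isHigh hi && isLow lo =
      chr (0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)) : combine rest
combine (u : rest) = chr u : combine rest  -- unpaired surrogates pass through
combine []         = []

isHigh, isLow :: Int -> Bool
isHigh u = u >= 0xD800 && u <= 0xDBFF
isLow  u = u >= 0xDC00 && u <= 0xDFFF
```

For example, the code units 0xD83D and 0xDE00 combine to U+1F600.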

conversion from Unicode (Char) to a UTF-8 encoded string.

conversion from Unicode strings (UString) to UTF-8 encoded strings.
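The encoding direction follows the standard UTF-8 bit layout: one byte below U+0080, two below U+0800, three below U+10000, and four otherwise. The sketch below uses the hypothetical names `encodeCharUtf8` and `encodeStringUtf8`. It stores each output byte in a Char, matching the representation described above, and is not the module's own implementation.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Encode one character as UTF-8 bytes, each byte stored in a Char.
encodeCharUtf8 :: Char -> String
encodeCharUtf8 c
  | n < 0x80    = [chr n]
  | n < 0x800   = map chr [0xC0 .|. shiftR n 6, cont n]
  | n < 0x10000 = map chr [0xE0 .|. shiftR n 12, cont (shiftR n 6), cont n]
  | otherwise   = map chr [ 0xF0 .|. shiftR n 18, cont (shiftR n 12)
                          , cont (shiftR n 6), cont n ]
  where
    n      = ord c
    cont x = 0x80 .|. (x .&. 0x3F)   -- continuation byte: 10xxxxxx

-- Encode a whole string character by character.
encodeStringUtf8 :: String -> String
encodeStringUtf8 = concatMap encodeCharUtf8
```

For example, U+00E9 encodes to the two bytes 0xC3 0xA9, and U+20AC to 0xE2 0x82 0xAC.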

substitute all Unicode characters that are not legal 1-byte UTF-8 XML characters with a character reference.

This function can be used to translate all text nodes and attribute values into pure ASCII.
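A minimal sketch of this substitution follows, using the hypothetical names `isXml1Byte` and `escapeToAscii`. It treats tab, line feed, carriage return and the range U+0020 to U+007F as the legal 1-byte XML characters. Emitting decimal references (`&#N;`) rather than hexadecimal ones is an assumption of this sketch.

```haskell
import Data.Char (ord)

-- True for characters that are legal in XML and fit in one UTF-8 byte.
isXml1Byte :: Char -> Bool
isXml1Byte c = c `elem` "\t\n\r" || (c >= ' ' && ord c < 0x80)

-- Replace every other character by a decimal character reference.
escapeToAscii :: String -> String
escapeToAscii = concatMap escape
  where
    escape c
      | isXml1Byte c = [c]
      | otherwise    = "&#" ++ show (ord c) ++ ";"
```

Applied to a string containing U+00FC between "gr" and "n", this yields `gr&#252;n`.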