Safe Haskell | Safe |
---|---|
Language | Haskell2010 |
This module provides a simple parser for UTF-8. It converts a
string of bytes into a list of Unicode tokens. Invalid input bytes
are converted to special Invalid tokens, leaving it up to the
consuming application to decide what to do with them.
Documentation
A token in a parsed UTF8 stream is either a valid Unicode character or an invalid input character.
parse_utf8 :: String -> [Token]
Parse a UTF-8 stream into tokens. Overlong forms and code points above 0x10ffff are rejected; the validity of individual Unicode code points is not otherwise checked.
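
The module's source is not shown here, so the following is only a minimal sketch of how a decoder with the behaviour described above might be written. The names `Tok`, `Valid`, `Invalid` and `decodeSketch` are hypothetical and are not the module's actual API. The sketch assumes that each `Char` of the input holds one byte (0..255), and that a failed sequence yields one invalid token for its lead byte before decoding resumes at the next byte; the real `parse_utf8` may recover differently.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical token type for this sketch; the module's real constructors
-- may have different names and fields.
data Tok
  = Valid Char    -- a successfully decoded code point
  | Invalid Char  -- one input byte that could not be decoded
  deriving (Eq, Show)

-- Decode a String whose Chars are assumed to hold one byte each (0..255).
decodeSketch :: String -> [Tok]
decodeSketch [] = []
decodeSketch (c : rest)
  | b < 0x80              = Valid c : decodeSketch rest
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F) 0x80
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F) 0x800
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07) 0x10000
  | otherwise             = bad  -- stray continuation byte or illegal lead
  where
    b = ord c

    -- Emit one Invalid token for the lead byte and resume at the next byte.
    bad = Invalid c : decodeSketch rest

    -- n: number of continuation bytes, lead: payload bits of the lead byte,
    -- minCp: smallest code point that legitimately needs this length.
    multi n lead minCp =
      case splitAt n rest of
        (conts, rest')
          | length conts == n
          , all isCont conts
          , let cp = foldl step lead conts
          , cp >= minCp       -- reject overlong forms
          , cp <= 0x10FFFF    -- reject values beyond the Unicode range
          -> Valid (chr cp) : decodeSketch rest'
        _ -> bad

    -- Continuation bytes have the bit pattern 10xxxxxx.
    isCont x = let v = ord x in v >= 0x80 && v < 0xC0

    -- Append the six payload bits of a continuation byte.
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)
```

In this sketch, `decodeSketch "\xC0\x80"` (an overlong encoding of U+0000) produces two `Invalid` tokens rather than a `Valid '\0'`, and surrogate code points are passed through unchecked, in line with the note that individual code points are not validated.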