Safe Haskell | Safe-Inferred |
---|
This module provides a simple parser for UTF8. It converts a
string of bytes into a list of unicode tokens. Invalid input bytes
are converted to special Invalid
tokens, leaving it up to the
consuming application to decide what to do with them.
Documentation
A token in a parsed UTF8 stream is either a valid Unicode character or an invalid input character.
parse_utf8 :: String -> [Token]Source
Parse a UTF8 stream into tokens. Rejects overlong forms and code points above 0x10ffff. Does not check validity of individual unicode code points.