| Copyright | (c) The University of Glasgow 2003 | 
|---|---|
| License | see libraries/base/LICENSE | 
| Maintainer | ghc-devs@haskell.org | 
| Stability | internal | 
| Portability | non-portable (GHC extensions) | 
| Safe Haskell | Trustworthy | 
| Language | Haskell2010 | 
GHC.Internal.Unicode
Description
Implementations for the character predicates (isLower, isUpper, etc.) and the conversions (toUpper, toLower). The implementation uses libunicode on Unix systems if that is available.
Synopsis
- unicodeVersion :: Version
- data GeneralCategory
- = UppercaseLetter
- | LowercaseLetter
- | TitlecaseLetter
- | ModifierLetter
- | OtherLetter
- | NonSpacingMark
- | SpacingCombiningMark
- | EnclosingMark
- | DecimalNumber
- | LetterNumber
- | OtherNumber
- | ConnectorPunctuation
- | DashPunctuation
- | OpenPunctuation
- | ClosePunctuation
- | InitialQuote
- | FinalQuote
- | OtherPunctuation
- | MathSymbol
- | CurrencySymbol
- | ModifierSymbol
- | OtherSymbol
- | Space
- | LineSeparator
- | ParagraphSeparator
- | Control
- | Format
- | Surrogate
- | PrivateUse
- | NotAssigned
 
- generalCategory :: Char -> GeneralCategory
- isAscii :: Char -> Bool
- isLatin1 :: Char -> Bool
- isControl :: Char -> Bool
- isAsciiUpper :: Char -> Bool
- isAsciiLower :: Char -> Bool
- isPrint :: Char -> Bool
- isSpace :: Char -> Bool
- isUpper :: Char -> Bool
- isUpperCase :: Char -> Bool
- isLower :: Char -> Bool
- isLowerCase :: Char -> Bool
- isAlpha :: Char -> Bool
- isDigit :: Char -> Bool
- isOctDigit :: Char -> Bool
- isHexDigit :: Char -> Bool
- isAlphaNum :: Char -> Bool
- isPunctuation :: Char -> Bool
- isSymbol :: Char -> Bool
- toUpper :: Char -> Char
- toLower :: Char -> Char
- toTitle :: Char -> Char
Documentation
unicodeVersion :: Version
Version of the Unicode standard used by base: 16.0.0.
Since: base-4.15.0.0
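As a minimal sketch, the value can be rendered with showVersion. This assumes the public GHC.Unicode module re-exports unicodeVersion (rather than importing the internal module directly); the helper name unicodeVersionString is hypothetical.

```haskell
import Data.Version (showVersion)
import GHC.Unicode (unicodeVersion)

-- Hypothetical helper: render the Unicode version this copy of base
-- was built against, e.g. as a dotted string.
unicodeVersionString :: String
unicodeVersionString = showVersion unicodeVersion

main :: IO ()
main = putStrLn ("Unicode version: " ++ unicodeVersionString)
```

The printed version depends on the installed GHC, so no fixed output is shown here.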
data GeneralCategory
Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard (the Unicode Character Database, in particular).
Examples
Basic usage:
>>> :t OtherLetter
OtherLetter :: GeneralCategory
Eq instance:
>>> UppercaseLetter == UppercaseLetter
True
>>> UppercaseLetter == LowercaseLetter
False
Ord instance:
>>> NonSpacingMark <= MathSymbol
True
Enum instance:
>>> enumFromTo ModifierLetter SpacingCombiningMark
[ModifierLetter,OtherLetter,NonSpacingMark,SpacingCombiningMark]
Read instance:
>>> read "DashPunctuation" :: GeneralCategory
DashPunctuation
>>> read "17" :: GeneralCategory
*** Exception: Prelude.read: no parse
Show instance:
>>> show EnclosingMark
"EnclosingMark"
Bounded instance:
>>> minBound :: GeneralCategory
UppercaseLetter
>>> maxBound :: GeneralCategory
NotAssigned
Ix instance:
>>> import GHC.Internal.Data.Ix ( index )
>>> index (OtherLetter,Control) FinalQuote
12
>>> index (OtherLetter,Control) Format
*** Exception: Error in array index
Constructors
| UppercaseLetter | Lu: Letter, Uppercase | 
| LowercaseLetter | Ll: Letter, Lowercase | 
| TitlecaseLetter | Lt: Letter, Titlecase | 
| ModifierLetter | Lm: Letter, Modifier | 
| OtherLetter | Lo: Letter, Other | 
| NonSpacingMark | Mn: Mark, Non-Spacing | 
| SpacingCombiningMark | Mc: Mark, Spacing Combining | 
| EnclosingMark | Me: Mark, Enclosing | 
| DecimalNumber | Nd: Number, Decimal | 
| LetterNumber | Nl: Number, Letter | 
| OtherNumber | No: Number, Other | 
| ConnectorPunctuation | Pc: Punctuation, Connector | 
| DashPunctuation | Pd: Punctuation, Dash | 
| OpenPunctuation | Ps: Punctuation, Open | 
| ClosePunctuation | Pe: Punctuation, Close | 
| InitialQuote | Pi: Punctuation, Initial quote | 
| FinalQuote | Pf: Punctuation, Final quote | 
| OtherPunctuation | Po: Punctuation, Other | 
| MathSymbol | Sm: Symbol, Math | 
| CurrencySymbol | Sc: Symbol, Currency | 
| ModifierSymbol | Sk: Symbol, Modifier | 
| OtherSymbol | So: Symbol, Other | 
| Space | Zs: Separator, Space | 
| LineSeparator | Zl: Separator, Line | 
| ParagraphSeparator | Zp: Separator, Paragraph | 
| Control | Cc: Other, Control | 
| Format | Cf: Other, Format | 
| Surrogate | Cs: Other, Surrogate | 
| PrivateUse | Co: Other, Private Use | 
| NotAssigned | Cn: Other, Not Assigned | 
generalCategory :: Char -> GeneralCategory
The Unicode general category of the character. This relies on the
 Enum instance of GeneralCategory, which must remain in the
 same order as the categories are presented in the Unicode
 standard.
Examples
Basic usage:
>>> generalCategory 'a'
LowercaseLetter
>>> generalCategory 'A'
UppercaseLetter
>>> generalCategory '0'
DecimalNumber
>>> generalCategory '%'
OtherPunctuation
>>> generalCategory '♥'
OtherSymbol
>>> generalCategory '\31'
Control
>>> generalCategory ' '
Space
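Because GeneralCategory has an Ord instance, category values can be sorted and grouped. The following sketch tallies the categories that occur in a string; it imports from Data.Char (assumed here to re-export the same function), and categoryCounts is a hypothetical helper name.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)
import Data.List (group, sort)

-- Hypothetical helper: count how many characters of each general
-- category occur in a string, in constructor order.
categoryCounts :: String -> [(GeneralCategory, Int)]
categoryCounts s =
  [ (c, length g) | g@(c : _) <- group (sort (map generalCategory s)) ]

main :: IO ()
main = print (categoryCounts "Hi, 42!")
-- [(UppercaseLetter,1),(LowercaseLetter,1),(DecimalNumber,2),(OtherPunctuation,2),(Space,1)]
```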
isAscii :: Char -> Bool
Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.
isLatin1 :: Char -> Bool
Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
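Since the ASCII range is a subset of Latin-1, the two predicates can be combined to bucket a character by range. A small sketch, assuming the Data.Char re-exports; codeRange is a hypothetical name.

```haskell
import Data.Char (isAscii, isLatin1)

-- Hypothetical helper: name the narrowest of the two ranges that
-- contains the character. isAscii is checked first because every
-- ASCII character is also a Latin-1 character.
codeRange :: Char -> String
codeRange c
  | isAscii c  = "ASCII"
  | isLatin1 c = "Latin-1"
  | otherwise  = "beyond Latin-1"

main :: IO ()
main = mapM_ (putStrLn . codeRange) ['a', '\xE9', '\x3BB']
-- ASCII
-- Latin-1
-- beyond Latin-1
```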
isControl :: Char -> Bool
Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.
isAsciiUpper :: Char -> Bool
Selects ASCII upper-case letters, i.e. 'A'..'Z'.
isAsciiLower :: Char -> Bool
Selects ASCII lower-case letters, i.e. 'a'..'z'.
isPrint :: Char -> Bool
Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).
This function returns False if its argument has one of the following GeneralCategorys, or True otherwise:
- LineSeparator
- ParagraphSeparator
- Control
- Format
- Surrogate
- PrivateUse
- NotAssigned
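The category-based rule can be sketched directly on top of generalCategory. This is an illustrative re-implementation, not the library's source: isPrintSketch is a hypothetical name, and the set of excluded categories (line and paragraph separators plus the "Other" categories) is the assumption being demonstrated.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isPrint)

-- Hypothetical re-implementation, for illustration only: a character
-- is treated as printable unless it falls in one of these categories.
isPrintSketch :: Char -> Bool
isPrintSketch c = case generalCategory c of
  LineSeparator      -> False
  ParagraphSeparator -> False
  Control            -> False
  Format             -> False
  Surrogate          -> False
  PrivateUse         -> False
  NotAssigned        -> False
  _                  -> True

main :: IO ()
main =
  -- Compare the sketch with the library predicate over Latin-1.
  print (all (\c -> isPrintSketch c == isPrint c) ['\0' .. '\xFF'])
```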
isSpace :: Char -> Bool
Returns True for any Unicode space character, and the control
 characters \t, \n, \r, \f, \v.
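A common use of isSpace is trimming surrounding whitespace. A minimal sketch, assuming the Data.Char re-export and using dropWhileEnd from Data.List; trim is a hypothetical helper, not part of this module.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical helper: strip leading and trailing whitespace,
-- including the control characters \t, \n, \r, \f and \v.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

main :: IO ()
main = putStrLn (trim "\t hello world \n")
-- hello world
```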
isUpper :: Char -> Bool
Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.
Note: this predicate does not work for letter-like characters such as:
 'Ⓐ' (U+24B6 circled Latin capital letter A) and
 'Ⅳ' (U+2163 Roman numeral four). This is due to selecting only
 characters with the GeneralCategory UppercaseLetter or TitlecaseLetter.
See isUpperCase for a more intuitive predicate. Note that
 unlike isUpperCase, isUpper does select title-case characters such as
 'Dž' (U+01C5 Latin capital letter d with small letter z with caron) or
 'ᾯ' (U+1FAF Greek capital letter omega with dasia and perispomeni and
 prosgegrammeni).
isUpperCase :: Char -> Bool
Selects upper-case Unicode letter-like characters.
Note: this predicate selects characters with the Unicode property
 Uppercase, which includes letter-like characters such as:
 'Ⓐ' (U+24B6 circled Latin capital letter A) and
 'Ⅳ' (U+2163 Roman numeral four).
See isUpper for the legacy predicate. Note that
 unlike isUpperCase, isUpper does select title-case characters such as
 'Dž' (U+01C5 Latin capital letter d with small letter z with caron) or
 'ᾯ' (U+1FAF Greek capital letter omega with dasia and perispomeni and
 prosgegrammeni).
Since: base-4.18.0.0
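The difference between the two predicates is easiest to see side by side. The sketch below assumes base-4.18 or later and the Data.Char re-exports; compareUpper is a hypothetical helper, and the sample characters are the ones named in the notes above, written as escapes.

```haskell
import Data.Char (isUpper, isUpperCase)

-- Hypothetical helper: pair each character with the results of
-- isUpper and isUpperCase, in that order.
compareUpper :: [Char] -> [(Char, Bool, Bool)]
compareUpper cs = [ (c, isUpper c, isUpperCase c) | c <- cs ]

main :: IO ()
main =
  -- 'A', U+24B6 (circled A), U+2163 (Roman numeral four),
  -- U+01C5 (title-case digraph), 'a'
  mapM_ print (compareUpper ['A', '\x24B6', '\x2163', '\x01C5', 'a'])
```

Going by the notes above, U+24B6 and U+2163 should be accepted only by isUpperCase, and U+01C5 only by isUpper.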
isLower :: Char -> Bool
Selects lower-case alphabetic Unicode characters (letters).
Note: this predicate does not work for letter-like characters such as:
 'ⓐ' (U+24D0 circled Latin small letter a) and
 'ⅳ' (U+2173 small Roman numeral four). This is due to selecting only
 characters with the GeneralCategory LowercaseLetter.
See isLowerCase for a more intuitive predicate.
isLowerCase :: Char -> Bool
Selects lower-case Unicode letter-like characters.
Note: this predicate selects characters with the Unicode property
 Lowercase, which includes letter-like characters such as:
 'ⓐ' (U+24D0 circled Latin small letter a) and
 'ⅳ' (U+2173 small Roman numeral four).
See isLower for the legacy predicate.
Since: base-4.18.0.0
isAlpha :: Char -> Bool
Selects alphabetic Unicode characters (lower-case, upper-case and
 title-case letters, plus letters of caseless scripts and modifier letters).
 This function is equivalent to isLetter.
This function returns True if its argument has one of the
 following GeneralCategorys, or False otherwise:
- UppercaseLetter
- LowercaseLetter
- TitlecaseLetter
- ModifierLetter
- OtherLetter
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Letter".
isOctDigit :: Char -> Bool
Selects ASCII octal digits, i.e. '0'..'7'.
isHexDigit :: Char -> Bool
Selects ASCII hexadecimal digits,
 i.e. '0'..'9', 'a'..'f', 'A'..'F'.
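isHexDigit works well as a validation guard before converting digits. The sketch below pairs it with digitToInt from Data.Char, which is not part of this module; parseHex is a hypothetical helper and does not guard against Int overflow on very long inputs.

```haskell
import Data.Char (digitToInt, isHexDigit)

-- Hypothetical helper: parse a non-empty string of ASCII hexadecimal
-- digits into an Int, rejecting anything else.
parseHex :: String -> Maybe Int
parseHex s
  | null s           = Nothing
  | all isHexDigit s = Just (foldl (\acc d -> acc * 16 + digitToInt d) 0 s)
  | otherwise        = Nothing

main :: IO ()
main = do
  print (parseHex "ff")   -- Just 255
  print (parseHex "1A")   -- Just 26
  print (parseHex "xyz")  -- Nothing
```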
isAlphaNum :: Char -> Bool
Selects alphabetic or numeric Unicode characters.
Note that numeric digits outside the ASCII range, as well as numeric
 characters which aren't digits, are selected by this function but not by
 isDigit. Such characters may be part of identifiers but are not used by
 the printer and reader to represent numbers, e.g., Roman numerals like 'V' and full-width digits like '１' (aka '\65297').
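This function returns True if its argument has one of the
 following GeneralCategorys, or False otherwise:
- UppercaseLetter
- LowercaseLetter
- TitlecaseLetter
- ModifierLetter
- OtherLetter
- DecimalNumber
- LetterNumber
- OtherNumber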
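The gap between isAlphaNum and isDigit can be illustrated by filtering the same input with both. A small sketch, assuming the Data.Char re-exports; the helper names and the sample characters (a full-width digit and a Roman numeral code point, written as escapes) are chosen for illustration.

```haskell
import Data.Char (isAlphaNum, isDigit)

-- Keep only the characters accepted by isDigit (ASCII '0'..'9').
asciiDigits :: String -> String
asciiDigits = filter isDigit

-- Keep every character accepted by isAlphaNum.
alphaNums :: String -> String
alphaNums = filter isAlphaNum

main :: IO ()
main = do
  -- '\65297' is FULLWIDTH DIGIT ONE; '\8548' is ROMAN NUMERAL FIVE.
  let sample = ['a', '1', '\65297', '-', '\8548']
  print (asciiDigits sample)  -- "1"
  print (alphaNums sample)    -- "a1\65297\8548"
```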
isPunctuation :: Char -> Bool
Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.
This function returns True if its argument has one of the
 following GeneralCategorys, or False otherwise:
- ConnectorPunctuation
- DashPunctuation
- OpenPunctuation
- ClosePunctuation
- InitialQuote
- FinalQuote
- OtherPunctuation
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Punctuation".
Examples
Basic usage:
>>> isPunctuation 'a'
False
>>> isPunctuation '7'
False
>>> isPunctuation '♥'
False
>>> isPunctuation '"'
True
>>> isPunctuation '?'
True
>>> isPunctuation '—'
True
isSymbol :: Char -> Bool
Selects Unicode symbol characters, including mathematical and currency symbols.
This function returns True if its argument has one of the
 following GeneralCategorys, or False otherwise:
- MathSymbol
- CurrencySymbol
- ModifierSymbol
- OtherSymbol
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Symbol".
Examples
Basic usage:
>>> isSymbol 'a'
False
>>> isSymbol '6'
False
>>> isSymbol '='
True
The definition of "math symbol" may be a little counter-intuitive depending on one's background:
>>> isSymbol '+'
True
>>> isSymbol '-'
False
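To see how ASCII operator-like characters divide between the two predicates, the same string can be filtered with each. A sketch assuming the Data.Char re-exports; symbolsOf and punctuationOf are hypothetical helper names.

```haskell
import Data.Char (isPunctuation, isSymbol)

-- Keep only the characters that isSymbol accepts.
symbolsOf :: String -> String
symbolsOf = filter isSymbol

-- Keep only the characters that isPunctuation accepts.
punctuationOf :: String -> String
punctuationOf = filter isPunctuation

main :: IO ()
main = do
  let sample = "+-=<>!?$"
  putStrLn (symbolsOf sample)      -- +=<>$
  putStrLn (punctuationOf sample)  -- -!?
```

The hyphen-minus lands on the punctuation side because its general category is DashPunctuation, which is why isSymbol rejects it.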
toUpper :: Char -> Char
Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.
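Because toUpper maps one Char to one Char, it can be lifted over a string with map. A minimal sketch, assuming the Data.Char re-export; shout is a hypothetical helper name.

```haskell
import Data.Char (toUpper)

-- Hypothetical helper: upper-case every letter in a string; digits,
-- spaces and punctuation are returned unchanged.
shout :: String -> String
shout = map toUpper

main :: IO ()
main = putStrLn (shout "hello, world 42")
-- HELLO, WORLD 42
```

A per-character mapping cannot change the length of a string, so case conversions that expand one character into several are outside what this approach can express.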