text-icu-0.8.0.2: Bindings to the ICU library
Copyright: (c) 2010 Bryan O'Sullivan
License: BSD-style
Maintainer: bos@serpentine.com
Stability: experimental
Portability: GHC
Safe Haskell: Safe-Inferred
Language: Haskell98

Data.Text.ICU.Collate

Description

String collation functions for Unicode, implemented as bindings to the International Components for Unicode (ICU) libraries.

Unicode collation API

data MCollator

Mutable string collator type.

data Attribute

Constructors

French Bool

Direction of secondary weights, used in French. True results in secondary weights being considered backwards, while False treats secondary weights in the order in which they appear.

AlternateHandling AlternateHandling

Controls the handling of variable elements. NonIgnorable is the default.

CaseFirst (Maybe CaseFirst)

Controls the ordering of upper and lower case letters. Nothing (the default) orders upper and lower case letters in accordance with their tertiary weights.

CaseLevel Bool

Controls whether an extra case level (positioned before the third level) is generated. When False (the default), no case level is generated; when True, it is. The contents of the case level are affected by the value of the CaseFirst attribute. A simple way to ignore accent differences while still distinguishing case is to set the strength to Primary and enable the case level.

NormalizationMode Bool

Controls whether the normalization check and any necessary normalization are performed. When False (the default), no normalization check is performed, and the result is guaranteed to be correct only if the input is in so-called FCD form (see the ICU User Guide for details). When True, an incremental check determines whether the input is in FCD form; if it is not, incremental NFD normalization is performed.

Strength Strength

HiraganaQuaternaryMode Bool

When turned on, this attribute positions Hiragana before all non-ignorables on the quaternary level. This is a sneaky way to produce JIS sort order.

Numeric Bool

When enabled, this attribute generates a collation key for the numeric value of substrings of digits. This is a way to get '100' to sort after '2'.
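A minimal sketch of how some of these attributes change a comparison, assuming text-icu 0.8 linked against a system ICU. The helper compareWith is hypothetical, and Root is assumed to be the root-locale constructor of LocaleName exported by Data.Text.ICU (not documented on this page). Each call opens a fresh collator, which keeps the cases independent but would be wasteful in real code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes text-icu 0.8 and LocaleName(Root) from Data.Text.ICU.
import Data.Text (Text)
import Data.Text.ICU (LocaleName (..))
import Data.Text.ICU.Collate

-- Hypothetical helper: open a root-locale collator, apply the given
-- attributes in order, then compare the two strings.
compareWith :: [Attribute] -> Text -> Text -> IO Ordering
compareWith attrs a b = do
  c <- open Root
  mapM_ (setAttribute c) attrs
  collate c a b

main :: IO ()
main = do
  -- Numeric: runs of digits compare by numeric value.
  compareWith [] "100" "2" >>= print               -- LT ('1' sorts before '2')
  compareWith [Numeric True] "100" "2" >>= print   -- GT (100 > 2)
  -- French: accent (secondary) weights are read from the end of the string.
  -- "c\x00F4te" is "côte"; "cot\x00E9" is "coté".
  compareWith [] "c\x00F4te" "cot\x00E9" >>= print             -- GT
  compareWith [French True] "c\x00F4te" "cot\x00E9" >>= print  -- LT
  -- Primary strength plus CaseLevel: accents are ignored, case is not.
  let accentBlind = [Strength Primary, CaseLevel True]
  compareWith accentBlind "a" "\x00E1" >>= print  -- EQ ("a" vs "á")
  compareWith accentBlind "a" "A" >>= print       -- not EQ: case still counts
```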

Instances

Show Attribute
NFData Attribute
Eq Attribute

data AlternateHandling

Control the handling of variable weight elements.

Constructors

NonIgnorable

Treat all codepoints with non-ignorable primary weights in the same way.

Shifted

Cause codepoints with primary weights that are equal to or below the variable top value to be ignored on the primary level and moved to the quaternary level.
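A sketch of the difference between the two settings, assuming text-icu 0.8, LocaleName(Root) from Data.Text.ICU, and ICU's default variable top (which covers spaces and punctuation). The strings are arbitrary illustrations.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes text-icu 0.8 and the default variable top.
import Data.Text.ICU (LocaleName (..))
import Data.Text.ICU.Collate

main :: IO ()
main = do
  c <- open Root
  -- NonIgnorable (default): the space has a real primary weight,
  -- lower than any letter, so "de luge" sorts first.
  collate c "de luge" "deluge" >>= print   -- LT
  -- Shifted: the space moves to the quaternary level, which the
  -- default Tertiary strength never examines.
  setAttribute c (AlternateHandling Shifted)
  collate c "de luge" "deluge" >>= print   -- EQ
  -- Raising the strength to Quaternary makes the space count again.
  setAttribute c (Strength Quaternary)
  collate c "de luge" "deluge" >>= print   -- not EQ
```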

data CaseFirst

Control the ordering of upper and lower case letters.

Constructors

UpperFirst

Force upper case letters to sort before lower case.

LowerFirst

Force lower case letters to sort before upper case.
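A short sketch of the CaseFirst attribute, assuming text-icu 0.8 and LocaleName(Root) from Data.Text.ICU. Note that the attribute wraps a Maybe CaseFirst, so Nothing restores the default behaviour.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes text-icu 0.8 and LocaleName(Root) from Data.Text.ICU.
import Data.Text.ICU (LocaleName (..))
import Data.Text.ICU.Collate

main :: IO ()
main = do
  c <- open Root
  -- Default (Nothing): tertiary weights put lower case first.
  collate c "a" "A" >>= print   -- LT
  setAttribute c (CaseFirst (Just UpperFirst))
  collate c "a" "A" >>= print   -- GT
  setAttribute c (CaseFirst (Just LowerFirst))
  collate c "a" "A" >>= print   -- LT
```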

data Strength

The strength attribute. The usual strength for most locales (except Japanese) is Tertiary. Quaternary strength is useful when combined with the Shifted setting of the AlternateHandling attribute, and for JIS X 4061 collation, where it is used to distinguish between Katakana and Hiragana (achieved by setting HiraganaQuaternaryMode to True). Otherwise, the quaternary level is affected only by the number of non-ignorable codepoints in the string. Identical strength is rarely useful, as it amounts to comparing the codepoints of the NFD form of the string.
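A sketch that walks every strength level, relying on the Bounded and Enum instances listed for Strength, and assuming the constructors Primary through Identical plus LocaleName(Root) from Data.Text.ICU. The helper compareAt is hypothetical. At Primary both pairs compare equal; Secondary separates the accented pair; Tertiary also separates the case pair.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes text-icu 0.8 and LocaleName(Root) from Data.Text.ICU.
import Data.Text (Text)
import Data.Text.ICU (LocaleName (..))
import Data.Text.ICU.Collate

-- Hypothetical helper: compare two strings in the root locale at a given strength.
compareAt :: Strength -> Text -> Text -> IO Ordering
compareAt s a b = do
  c <- open Root
  setAttribute c (Strength s)
  collate c a b

main :: IO ()
main = mapM_ report [minBound .. maxBound]
  where
    -- "r\x00F4le" is "rôle" (accent difference); "Role" differs only in case.
    report s = do
      accent <- compareAt s "role" "r\x00F4le"
      caseDiff <- compareAt s "role" "Role"
      print (s, accent, caseDiff)
```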

Instances

Bounded Strength
Enum Strength
Show Strength
NFData Strength
Eq Strength

Functions

open

Arguments

:: LocaleName

The locale containing the required collation rules.

-> IO MCollator 

Open an MCollator for comparing strings.
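A sketch showing why the locale matters, assuming text-icu 0.8 with standard ICU locale data and the Locale constructor of LocaleName from Data.Text.ICU. English treats "ö" as an accented "o", while Swedish sorts it as a separate letter after "z".

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes text-icu 0.8 and LocaleName(Locale) from Data.Text.ICU.
import Data.Text.ICU (LocaleName (..))
import Data.Text.ICU.Collate

main :: IO ()
main = do
  en <- open (Locale "en")
  sv <- open (Locale "sv")
  -- "\x00F6" is "ö".
  collate en "\x00F6" "z" >>= print  -- LT: sorts with "o"
  collate sv "\x00F6" "z" >>= print  -- GT: sorts after "z"
```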

openRules

Arguments

:: Text

A string describing the collation rules.

-> Maybe Bool

The normalization mode: one of Just False (expect the text not to need normalization), Just True (normalize), or Nothing (set the mode according to the rules).

-> Maybe Strength

The default collation strength: one of Just Primary, Just Secondary, Just Tertiary, Just Identical, or Nothing (the default strength). The strength can also be set in the rules.

-> IO MCollator 

Produce an MCollator according to the rules supplied.
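A sketch of a one-rule tailoring, assuming text-icu 0.8 and ICU's standard rule syntax. The rule "&b < a" resets at "b" and places "a" immediately after it with a primary difference, so "a" now sorts after "b".

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes text-icu 0.8; the rule string is illustrative.
import Data.Text.ICU.Collate

main :: IO ()
main = do
  -- Nothing: take the normalization mode from the rules.
  -- Just Primary: default strength for this collator.
  c <- openRules "&b < a" Nothing (Just Primary)
  collate c "a" "b" >>= print  -- GT: "a" was moved after "b"
```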

collate :: MCollator -> Text -> Text -> IO Ordering

Compare two strings.

collateIter :: MCollator -> CharIterator -> CharIterator -> IO Ordering

Compare two CharIterators.

If either iterator was constructed from a ByteString, it does not need to be copied or converted internally, so this function can be quite cheap.
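A sketch comparing UTF-8 bytes against a Text without first decoding the bytes. It assumes that fromUtf8 and fromText are the CharIterator constructors exported by Data.Text.ICU in text-icu 0.8; neither is documented on this page, so treat them as assumptions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes fromUtf8 and fromText are exported by Data.Text.ICU.
import Data.Text.Encoding (encodeUtf8)
import Data.Text.ICU (LocaleName (..), fromText, fromUtf8)
import Data.Text.ICU.Collate

main :: IO ()
main = do
  c <- open Root
  -- Left side: UTF-8 ByteString (no internal conversion needed).
  -- Right side: Text.
  let bytes = fromUtf8 (encodeUtf8 "apple")
      text  = fromText "banana"
  collateIter c bytes text >>= print  -- LT
```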

Utility functions

getRules :: MCollator -> IO Text

Get the rules of an MCollator.
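A small sketch that reads back the rules of a rule-based collator, assuming text-icu 0.8. The rule string is an arbitrary illustration; the exact text returned is whatever ICU reports for the collator.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes text-icu 0.8; the rule string is illustrative.
import qualified Data.Text.IO as TIO
import Data.Text.ICU.Collate

main :: IO ()
main = do
  c <- openRules "&b < a" Nothing Nothing
  rules <- getRules c
  TIO.putStrLn rules  -- the tailoring rules the collator was built from
```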

getAttribute :: MCollator -> Attribute -> IO Attribute

Get the value of an MCollator attribute.

It is safe to provide a dummy argument to an Attribute constructor when using this function, so the following will work:

getAttribute mcol (NormalizationMode undefined)

setAttribute :: MCollator -> Attribute -> IO ()

Set the value of an MCollator attribute.
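A sketch of a set/get round trip, using the dummy-argument trick described for getAttribute. It assumes text-icu 0.8, LocaleName(Root) from Data.Text.ICU, and that the root locale's default strength is Tertiary.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes text-icu 0.8 and LocaleName(Root) from Data.Text.ICU.
import Data.Text.ICU (LocaleName (..))
import Data.Text.ICU.Collate

main :: IO ()
main = do
  c <- open Root
  -- The undefined payload is never inspected; only the constructor matters.
  getAttribute c (Strength undefined) >>= print  -- Strength Tertiary
  setAttribute c (Strength Primary)
  getAttribute c (Strength undefined) >>= print  -- Strength Primary
```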

sortKey :: MCollator -> Text -> IO ByteString

Create a key for sorting the Text using the given MCollator. Comparing two ByteStrings produced by sortKey gives the same result as calling collate on the two original Texts.
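A sketch of the usual reason for sort keys: compute each key once, then sort with plain ByteString comparison. It assumes text-icu 0.8 and LocaleName(Locale) from Data.Text.ICU; sortByKey is a hypothetical helper.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes text-icu 0.8 and LocaleName(Locale) from Data.Text.ICU.
import Data.List (sortOn)
import Data.Text (Text)
import Data.Text.ICU (LocaleName (..))
import Data.Text.ICU.Collate

-- Hypothetical helper: decorate each Text with its collation key,
-- sort on the keys (ByteString's Ord), then drop the keys.
sortByKey :: MCollator -> [Text] -> IO [Text]
sortByKey c xs = do
  keyed <- mapM (\t -> do k <- sortKey c t; pure (k, t)) xs
  pure (map snd (sortOn fst keyed))

main :: IO ()
main = do
  c <- open (Locale "en")
  sortByKey c ["banana", "Apple", "cherry", "apple"] >>= print
  -- ["apple","Apple","banana","cherry"]
```

Building every key up front costs one ICU call per element instead of one per comparison, which pays off for larger lists or keys that are stored and reused.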

clone :: MCollator -> IO MCollator

Make a copy of a mutable MCollator. Subsequent changes to the input MCollator will not affect the state of the returned MCollator.
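A sketch of the independence guarantee, assuming text-icu 0.8 and LocaleName(Root) from Data.Text.ICU: the original is modified after cloning, and the clone keeps its earlier settings.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes text-icu 0.8 and LocaleName(Root) from Data.Text.ICU.
import Data.Text.ICU (LocaleName (..))
import Data.Text.ICU.Collate

main :: IO ()
main = do
  base <- open Root
  copy <- clone base
  -- Change the original only after cloning.
  setAttribute base (Strength Primary)
  collate base "a" "A" >>= print  -- EQ: Primary ignores case
  collate copy "a" "A" >>= print  -- LT: the clone is still Tertiary
```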

freeze :: MCollator -> IO Collator

Make a safe copy of a mutable MCollator for use in pure code. Subsequent changes to the MCollator will not affect the state of the returned Collator.
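A sketch of configuring a collator in IO and then using it purely, assuming text-icu 0.8 and that Data.Text.ICU exports a pure collate :: Collator -> Text -> Text -> Ordering along with LocaleName(Locale). Because the frozen comparison is pure, it can be handed straight to sortBy.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes Data.Text.ICU exports a pure collate over Collator.
import Data.List (sortBy)
import Data.Text.ICU (LocaleName (..))
import qualified Data.Text.ICU as ICU
import Data.Text.ICU.Collate (Attribute (..), freeze, open, setAttribute)

main :: IO ()
main = do
  m <- open (Locale "en")
  setAttribute m (Numeric True)  -- compare digit runs numerically
  frozen <- freeze m
  print (sortBy (ICU.collate frozen) ["file10", "file2", "file1"])
  -- ["file1","file2","file10"]
```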