-- !!! THIS FILE ASSUMES UTF-8 ENCODING !!! {- Defaults for scoring whenever more explicit rules do not match. -} -- How to score a gap in a sequence. GapLinear -4 -- Later on we want affine gap scoring, it's already here but not used GapOpen -4 GapExtend -4 -- Some types of grammars have differently scored prefixes and suffixes PrefixSuffixOpen -1 PrefixSuffixExtend 0 -- This score is used when we align the same character but it's one we didn't -- specify in our sets Match 1 -- And finally how to score a mismatch that doesn't fit into the above Mismatch -999999 {- Provide a number of character groups. They will be used to define similar and equal characters below. -} CharGroup C b c d f g h j k m n p q s t v w x y z CharGroup L l r CharGroup VowelA A a Á á Å å Ä ä Æ æ à ã A\' a\' CharGroup VowelE E e É é E\' e\' CharGroup VowelI I i Í í I\' i\' CharGroup VowelO o Ó Ö O\' o\' CharGroup VowelU u ú ü u\' CharGroup Vowelx a å ä ã æ æ á a\' e é e\' ə ɐ æ œ í í i o ö ó u ú ú ü CharGroup Vowel $VowelA $VowelE $VowelI $VowelO $VowelU $Vowelx CharGroup V {ToLower,ToUpper} $Vowel -- we'll set up some characters that should be ignored CharGroup Ignored ' " ! . , ; 1 2 3 4 5 6 7 8 9 0 {- Set up similarity scores between groups. These should go from lower to higher scores, as latter rules overwrite earlier rules. -} Similarity $L $L -1 -- these two rules do the same, because 'V' is given the {ToLower,ToUpper} -- extension in CharGroup. Similarity $V $V 0 Similarity {ToLower,ToUpper} $Vowel {ToLower,ToUpper} $Vowel 0 Similarity $C $C 1 -- these are not the same character, but treated as highly similar, so we end -- up giving a high score. Similarity $VowelA $VowelA 2 {- And now, we just look at individual characters, maybe qualified. Now, each character is only equal to itself, we just use the character groups to provide easy enumeration. However, the extensions {ToLower,ToUpper} expand equality. Say, V = {A, ä}. Then Equality V 2 means A=A and ä=ä with score 2. Including {ToLower,ToUpper} yields A=A, A=a, a=A, a=a and Ä=Ä, Ä=ä, ä=Ä, ä=ä with score 2; but NOT A=ä! -} Equality {ToLower,ToUpper} $V 2 Equality {ToLower,ToUpper} $L 3 Equality {ToLower,ToUpper} $C 4 {- These characters should be *ignored* when gaps are considered. They are essentially "not in the word" or "free", but the scores can be set, of course. Fst and Snd designate which of the two tapes we talk about. If this is missing, the ignored characters are ignored on both tapes. -} IgnoredFst {ToLower,ToUpper} $Ignored 0 IgnoredSnd {ToLower,ToUpper} $Ignored 0 Ignored $Ignored 0