Copyright | © 2016-2017 George Steel and Peter Jurgec |
---|---|
License | GPL-2+ |
Maintainer | george.steel@gmail.com |
Safe Haskell | None |
Language | Haskell2010 |
Functions for saving and loading lexicons and ClassGlob constraint grammars in standard formats.
- segmentFiero :: Set String -> String -> [String]
- joinFiero :: Set String -> [String] -> String
- data LexRow = LexRow [String] Int
- parseWordlist :: Set String -> Text -> [LexRow]
- collateWordlist :: Set String -> Text -> [LexRow]
- serWordlist :: Set String -> [LexRow] -> Text
- serWordlistSpaced :: [LexRow] -> Text
- data PhonoGrammar = PhonoGrammar { lengthDist :: Array Length Int, constraintSet :: [ClassGlob], weightSet :: Vec }
- parseGrammar :: Text -> Maybe PhonoGrammar
- serGrammarRules :: [ClassGlob] -> Vec -> Text
- serGrammar :: PhonoGrammar -> Text
Documentation
segmentFiero
:: Set String | All possible segments |
-> String | Raw text |
-> [String] | Segmented text segmentFiero [] = error "Empty segment list." |
Given a set of possible segments and a string, breaks the string into segments. Uses the rules of Fiero orthography (a phonetic writing system using ASCII characters), in which the longest possible match is always taken and an apostrophe acts as a digraph break.
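The longest-match rule can be illustrated with a minimal sketch. This is not the library's implementation: `segmentSketch` is a hypothetical name, and emitting an unrecognised character as a one-character segment is an assumption made only to keep the example total.

```haskell
import qualified Data.Set as S
import Data.List (isPrefixOf, sortOn)
import Data.Ord (Down (..))

-- | Hypothetical sketch of greedy longest-match segmentation.
-- An apostrophe is consumed as a break between segments. A character that
-- starts no known segment becomes a one-character segment (an assumption).
segmentSketch :: S.Set String -> String -> [String]
segmentSketch segs = go
  where
    -- Candidates sorted longest first, so the first prefix hit is the longest match.
    candidates = sortOn (Down . length) (filter (not . null) (S.toList segs))

    go [] = []
    go ('\'' : rest) = go rest
    go str@(c : rest) =
      case filter (`isPrefixOf` str) candidates of
        (s : _) -> s : go (drop (length s) str)
        []      -> [c] : go rest
```

With the segment set `t`, `s`, `ts`, `a`, the input `tsa` would be read as `ts` followed by `a`, while `t'sa` forces the split `t`, `s`, `a`.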
joinFiero :: Set String -> [String] -> String
Joins segments together using Fiero rules, inserting apostrophes where necessary.
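One way to decide where an apostrophe is needed is sketched below. It is only an illustration under assumptions: `joinSketch` is a hypothetical name, and the test it applies (a longer known segment would match at the same position) is one plausible reading of "where necessary", not a confirmed description of the library's logic.

```haskell
import qualified Data.Set as S
import Data.List (isPrefixOf)

-- | Hypothetical sketch: concatenate segments, inserting an apostrophe after a
-- segment whenever a longer known segment would otherwise match at that position.
joinSketch :: S.Set String -> [String] -> String
joinSketch segs = go
  where
    go [] = []
    go [s] = s
    go (s : rest) =
      let tailText = go rest
          plain = s ++ tailText
          -- A longer segment that is a prefix of the plain join would be
          -- chosen by greedy segmentation instead of s.
          ambiguous =
            any (\t -> length t > length s && t `isPrefixOf` plain) (S.toList segs)
      in if ambiguous then s ++ "'" ++ tailText else plain
```

With the segment set `t`, `s`, `ts`, `a`, joining `t`, `s`, `a` would give `t'sa`, since a plain `tsa` would be read back as `ts` followed by `a`.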
parseWordlist :: Set String -> Text -> [LexRow]
Parse a lexicon from a file. Words are segmented using Fiero rules (which also decode space-separated segments and single-character segments). A word may optionally be followed by a tab character and an integer giving its frequency (1 by default).
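The line layout described above (word, optional tab, optional integer frequency) can be sketched as follows. `splitEntry` is a hypothetical helper, not part of this module, and falling back to 1 when the count is malformed is an assumption; segmenting the word itself is left out.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Read as TR

-- | Hypothetical helper: split one lexicon line into raw word text and frequency.
-- A missing frequency defaults to 1; an unparsable one also falls back to 1
-- (the latter is an assumption of this sketch).
splitEntry :: T.Text -> (T.Text, Int)
splitEntry line =
  case T.splitOn (T.pack "\t") line of
    (w : f : _) | Right (n, _) <- TR.decimal (T.strip f) -> (w, n)
    (w : _) -> (w, 1)
    [] -> (line, 1)
```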
collateWordlist :: Set String -> Text -> [LexRow]
Collate a list of words and frequencies from raw phonetic text.
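As a rough illustration of collation, the sketch below counts how often each word occurs in a text. `countWords` is a hypothetical name, and treating whitespace as the only word boundary is an assumption; the actual function also segments each word and returns LexRow values.

```haskell
import qualified Data.Map.Strict as M

-- | Hypothetical sketch: count how often each whitespace-separated word occurs.
countWords :: String -> [(String, Int)]
countWords raw = M.toList (M.fromListWith (+) [(w, 1) | w <- words raw])
```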
serWordlist :: Set String -> [LexRow] -> Text
Serializes a list of words and frequencies to a string for decoding with parseWordlist. Connects segments using Fiero rules.
serWordlistSpaced :: [LexRow] -> Text
Serializes a list of words and frequencies to a string for decoding with parseWordlist. Puts spaces between segments.
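A space-separated layout might look like the sketch below. The names `Row` and `rowsToSpaced` are hypothetical, and the exact output layout (one row per line, a tab before the count) is an assumption inferred from the input format that parseWordlist accepts, not a confirmed description of the serializer.

```haskell
-- | Hypothetical stand-in for a lexicon row: segments plus frequency.
type Row = ([String], Int)

-- | Sketch: one line per row, segments separated by spaces, then a tab and the count.
rowsToSpaced :: [Row] -> String
rowsToSpaced rows = unlines [unwords segs ++ "\t" ++ show n | (segs, n) <- rows]
```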
data PhonoGrammar
Representation of a ClassGlob grammar.
PhonoGrammar | Fields: lengthDist :: Array Length Int, constraintSet :: [ClassGlob], weightSet :: Vec |
parseGrammar :: Text -> Maybe PhonoGrammar
Parse a grammar from a file. Blank lines and lines beginning with # are ignored. The first regular line must contain a list of (Length,Int) pairs, and each subsequent line must contain a weight followed by a ClassGlob.
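The file layout described above can be sketched in three small steps. All names here are hypothetical, and several details are assumptions: that the pair list uses Haskell list syntax, that Length can be read as an Int, and that weights are read as Double. The ClassGlob text is left unparsed because its syntax is not described in this section.

```haskell
import Data.Char (isSpace)
import Data.Maybe (listToMaybe)

-- | Drop blank lines and lines starting with '#'.
contentLines :: String -> [String]
contentLines = filter keep . lines
  where
    keep l = not (all isSpace l) && take 1 l /= "#"

-- | Hypothetical: read the first content line as (length, count) pairs,
-- assuming Haskell list syntax such as [(1,2),(2,5)].
parseLengths :: String -> Maybe [(Int, Int)]
parseLengths s = listToMaybe [xs | (xs, rest) <- reads s, all isSpace rest]

-- | Hypothetical: split a rule line into a numeric weight and the remaining
-- rule text, which is returned unparsed.
parseRule :: String -> Maybe (Double, String)
parseRule s = listToMaybe [(w, dropWhile isSpace rest) | (w, rest) <- reads s]
```

Returning `Maybe` from each step mirrors the `Maybe PhonoGrammar` result of parseGrammar: any line that does not fit the expected shape yields `Nothing` rather than an exception.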
serGrammarRules :: [ClassGlob] -> Vec -> Text
Serialize a grammar without its length distribution.
serGrammar :: PhonoGrammar -> Text
Serialize a grammar, including its length distribution.