morfeusz-0.4.2: Bindings to the morphological analyser Morfeusz

Safe Haskell: None

NLP.Morfeusz


Description

The module provides the analyse wrapper function, which uses the Morfeusz library for morphosyntactic analysis. The result is represented as a directed acyclic graph (DAG) with Token-labeled edges. The DAG representation is needed when an input word has more than one correct segmentation.

>>> :m NLP.Morfeusz
>>> :set -XOverloadedStrings
>>> mapM_ print . analyse False $ "miałem"
Edge {from = 0, to = 1, label = Token {orth = "mia\322", interps = [Interp {base = "mie\263", msd = "praet:sg:m1.m2.m3:imperf"}]}}
Edge {from = 0, to = 2, label = Token {orth = "mia\322em", interps = [Interp {base = "mia\322", msd = "subst:sg:inst:m3"}]}}
Edge {from = 1, to = 2, label = Token {orth = "em", interps = [Interp {base = "by\263", msd = "aglt:sg:pri:imperf:wok"}]}}
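Where the segmentations diverge can be read directly from the from and to fields of the edges. The following is a minimal sketch, not part of the library: it mirrors the documented Edge record locally (with plain String labels standing in for Token), rebuilds the "miałem" DAG from the output above, and lists the nodes that have more than one outgoing edge. The helpers outgoing and branchingNodes are hypothetical names.

```haskell
import Data.List (nub)

-- Local mirror of the documented Edge record (an assumption, not the library's type).
data Edge a = Edge { from :: Int, to :: Int, label :: a }
  deriving (Show, Eq)

-- The "miałem" DAG from the example above, with orth strings as labels.
mialemDag :: [Edge String]
mialemDag =
  [ Edge 0 1 "miał"
  , Edge 0 2 "miałem"
  , Edge 1 2 "em"
  ]

-- Hypothetical helper: all edges leaving node n.
outgoing :: Int -> [Edge a] -> [Edge a]
outgoing n = filter ((== n) . from)

-- Hypothetical helper: nodes at which more than one segmentation starts.
branchingNodes :: [Edge a] -> [Int]
branchingNodes dag =
  [ n | n <- nub (map from dag), length (outgoing n dag) > 1 ]
```

Here branchingNodes mialemDag evaluates to [0]: both segmentations start at node 0 and meet again at node 2.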

You can use the paths function to extract all paths from the resulting DAG. If you are not interested in all possible segmentations, just take the first of the possible paths:

>>> mapM_ print . paths . analyse False $ "miałem"
[Token {orth = "mia\322em", interps = [Interp {base = "mia\322", msd = "subst:sg:inst:m3"}]}]
[Token {orth = "mia\322", interps = [Interp {base = "mie\263", msd = "praet:sg:m1.m2.m3:imperf"}]},Token {orth = "em", interps = [Interp {base = "by\263", msd = "aglt:sg:pri:imperf:wok"}]}]
>>> mapM_ print . head . paths . analyse False $ "miałem"
Token {orth = "mia\322em", interps = [Interp {base = "mia\322", msd = "subst:sg:inst:m3"}]}
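Since head is partial, a total way of picking a path may be preferable outside of GHCi. The sketch below is generic over the element type, so it does not depend on the library; firstPath and shortestPath are hypothetical helpers, and preferring the path with the fewest tokens is only an illustrative policy, not something the library does.

```haskell
import Data.List (sortOn)
import Data.Maybe (listToMaybe)

-- Hypothetical helper: the first path, or Nothing if there are no paths.
firstPath :: [[a]] -> Maybe [a]
firstPath = listToMaybe

-- Hypothetical helper: the path with the fewest segments; ties keep the
-- original order because sortOn is stable.
shortestPath :: [[a]] -> Maybe [a]
shortestPath = listToMaybe . sortOn length
```

With the library itself one would write, for instance, firstPath (paths (analyse False s)) in place of head (paths (analyse False s)).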


Types

type DAG a = [Edge a]

A DAG with annotated edges.

data Edge a

A directed edge between two nodes of type Int, carrying a label of type a.

Constructors

Edge 

Fields

from :: Int
 
to :: Int
 
label :: a
 

Instances

Functor Edge 
Eq a => Eq (Edge a) 
(Eq (Edge a), Ord a) => Ord (Edge a) 
Show a => Show (Edge a) 
Storable (Edge RawInterp)

We only provide the peek functionality.
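The Functor instance transforms edge labels while leaving the graph structure intact; for example, map (fmap orth) turns a DAG Token into a DAG Text. The sketch below reproduces this with a local stand-in for Edge whose Functor instance is derived (an assumption; since label is the only field of type a, a lawful instance has to map exactly that field). The helpers relabel and nodes are hypothetical names.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Data.List (nub, sort)

-- Local stand-in for the documented Edge type; the derived Functor
-- applies the function to label and leaves from and to untouched.
data Edge a = Edge { from :: Int, to :: Int, label :: a }
  deriving (Show, Eq, Functor)

type DAG a = [Edge a]

-- Hypothetical helper: apply f to every edge label of a DAG.
relabel :: (a -> b) -> DAG a -> DAG b
relabel f = map (fmap f)

-- Hypothetical helper: all distinct node identifiers, in ascending order.
nodes :: DAG a -> [Int]
nodes dag = sort (nub (concatMap (\e -> [from e, to e]) dag))
```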

data Token

A token with a list of recognized interpretations. If the list of interpretations is empty, the token is unknown to Morfeusz.

Constructors

Token 

Fields

orth :: Text
 
interps :: [Interp]
 

Instances
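Because an empty interps list marks a token that Morfeusz does not recognize, unknown words can be picked out with a simple filter, applied for example to map label dag or to one of the lists returned by paths. This sketch uses local stand-ins for Token and Interp, with String in place of Text so that it has no dependencies; unknownOrths is a hypothetical helper.

```haskell
-- Local stand-ins for the documented records (String instead of Text).
data Interp = Interp { base :: String, msd :: String }
  deriving (Show, Eq)

data Token = Token { orth :: String, interps :: [Interp] }
  deriving (Show, Eq)

-- Hypothetical helper: orthographic forms of the tokens that have no
-- interpretation, i.e. the tokens unknown to Morfeusz.
unknownOrths :: [Token] -> [String]
unknownOrths ts = [ orth t | t <- ts, null (interps t) ]
```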

data Interp

An interpretation of a word: its base form (lemma) and its morphosyntactic description (MSD) tag.

Constructors

Interp 

Fields

base :: Text
 
msd :: Text
 

Instances
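The msd field is a positional tag such as praet:sg:m1.m2.m3:imperf. Assuming the usual Morfeusz/NKJP tag conventions, where ":" separates grammatical categories and "." separates alternative values of one category, a tag can be split into a list of alternatives per position, and its first field names the grammatical class (praet, subst and aglt in the examples above). The sketch below uses only the Prelude; splitOnChar, msdFields and msdClass are hypothetical helpers.

```haskell
-- Hypothetical helper: split a string on a delimiter, keeping empty fields.
splitOnChar :: Char -> String -> [String]
splitOnChar d s = case break (== d) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnChar d rest

-- Assumption: ':' separates categories and '.' separates alternative values.
msdFields :: String -> [[String]]
msdFields = map (splitOnChar '.') . splitOnChar ':'

-- The grammatical class is the first field of the tag.
msdClass :: String -> String
msdClass = takeWhile (/= ':')
```

For instance, msdFields "praet:sg:m1.m2.m3:imperf" yields [["praet"],["sg"],["m1","m2","m3"],["imperf"]].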

Sentence analysis

type KeepSpaces = Bool

Whether to keep spaces in the analysis output.

analyse :: KeepSpaces -> Text -> DAG Token

Analyse the input sentence and return the result as a DAG of tokens.
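A common use of analyse is lemmatization. Given one segmentation, such as head (paths (analyse False s)) from the example at the top, the sketch below takes the base of the first interpretation of each token and falls back to the orth for unknown tokens. It uses local String-based stand-ins for Token and Interp; lemmas is a hypothetical helper, and taking the first interpretation is a simplification, not disambiguation.

```haskell
-- Local stand-ins for the documented records (String instead of Text).
data Interp = Interp { base :: String, msd :: String }
  deriving (Show, Eq)

data Token = Token { orth :: String, interps :: [Interp] }
  deriving (Show, Eq)

-- Hypothetical helper: one lemma per token. Uses the base form of the first
-- interpretation (no disambiguation) and keeps the orth of unknown tokens.
lemmas :: [Token] -> [String]
lemmas = map lemma
  where
    lemma t = case interps t of
      (i : _) -> base i
      []      -> orth t
```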

Utilities

paths :: DAG a -> [[a]]

Retrieve all paths from the DAG root to its leaves.
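One way such path enumeration can be done is sketched below; this is not the library's implementation. It treats nodes without incoming edges as roots and nodes without outgoing edges as leaves, and walks the graph depth-first, collecting edge labels. It uses a local copy of the Edge record; pathsSketch is a hypothetical name, and the order of the resulting paths may differ from the library's paths (in the example at the top, the library lists the single-token path first).

```haskell
import Data.List (nub)

-- Local copy of the documented Edge record and DAG type.
data Edge a = Edge { from :: Int, to :: Int, label :: a }
  deriving (Show, Eq)

type DAG a = [Edge a]

-- All label sequences along paths from a root (no incoming edge)
-- to a leaf (no outgoing edge), found by depth-first search.
pathsSketch :: DAG a -> [[a]]
pathsSketch dag = concatMap walk roots
  where
    sources = nub (map from dag)
    targets = nub (map to dag)
    roots   = filter (`notElem` targets) sources
    walk n = case filter ((== n) . from) dag of
      []  -> [[]]  -- leaf: the path ends here
      out -> [ label e : rest | e <- out, rest <- walk (to e) ]
```

On the "miałem" DAG this returns the two segmentations miał + em and miałem, in that order.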