Safe Haskell	None
Language	Haskell2010

Text.Earley

Contents

Context-free grammars
Derived operators
Parsing
Recognition
Language generation

Description

Parsing all context-free grammars using Earley's algorithm.

Synopsis

data Prod r e t a
terminal :: (t -> Maybe a) -> Prod r e t a
(<?>) :: Prod r e t a -> e -> Prod r e t a
data Grammar r a
rule :: Prod r e t a -> Grammar r (Prod r e t a)
satisfy :: (t -> Bool) -> Prod r e t t
token :: Eq t => t -> Prod r e t t
namedToken :: Eq t => t -> Prod r t t t
list :: Eq t => [t] -> Prod r e t [t]
listLike :: (Eq t, ListLike i t) => i -> Prod r e t i
data Report e i = Report {
- position :: Int
- expected :: [e]
- unconsumed :: i
}
data Result s e i a
- = Ended (Report e i)
- | Parsed (ST s [a]) Int i (ST s (Result s e i a))
type Parser e i a = forall s. i -> ST s (Result s e i a)
parser :: ListLike i t => (forall r. Grammar r (Prod r e t a)) -> Parser e i a
allParses :: Parser e i a -> i -> ([(a, Int)], Report e i)
fullParses :: ListLike i t => Parser e i a -> i -> ([a], Report e i)
report :: Parser e i a -> i -> Report e i
type Generator t a = forall s. ST s (Result s t a)
generator :: (forall r. Grammar r (Prod r e t a)) -> [t] -> Generator t a
language :: Generator t a -> [(a, [t])]
upTo :: Int -> Generator t a -> [(a, [t])]
exactly :: Int -> Generator t a -> [(a, [t])]

Context-free grammars

data Prod r e t a Source #

A production.

The type parameters are:

a: The return type of the production.

t for terminal: The type of the terminals that the production operates on.

e for expected: The type of names, used for example to report expected tokens.

r for rule: The type of a non-terminal. This plays a role similar to the s in the type ST s a. Since the parser function expects the r to be universally quantified, there is not much to do with this parameter other than leaving it universally quantified.

As an example, Prod r String Char Int is the type of a production that returns an Int, operates on (lists of) characters and reports String names.

Most of the functionality of Prods is obtained through its instances, e.g. Functor, Applicative, and Alternative.

Instances

Functor (Prod r e t) Source #
Instance details Defined in Text.Earley.Grammar Methods fmap :: (a -> b) -> Prod r e t a -> Prod r e t b # (<$) :: a -> Prod r e t b -> Prod r e t a #
Applicative (Prod r e t) Source #
Instance details Defined in Text.Earley.Grammar Methods pure :: a -> Prod r e t a # (<>) :: Prod r e t (a -> b) -> Prod r e t a -> Prod r e t b # liftA2 :: (a -> b -> c) -> Prod r e t a -> Prod r e t b -> Prod r e t c # (>) :: Prod r e t a -> Prod r e t b -> Prod r e t b # (<*) :: Prod r e t a -> Prod r e t b -> Prod r e t a #
Alternative (Prod r e t) Source #
Instance details Defined in Text.Earley.Grammar Methods empty :: Prod r e t a # (<\|>) :: Prod r e t a -> Prod r e t a -> Prod r e t a # some :: Prod r e t a -> Prod r e t [a] # many :: Prod r e t a -> Prod r e t [a] #
(IsString t, Eq t, a ~ t) => IsString (Prod r e t a) Source #	String literals can be interpreted as `Terminal`s that match that string. `>>>` `:set -XOverloadedStrings``>>>` `import Data.Text (Text)``>>>` `let determiner = "the" <\|> "a" <\|> "an" :: Prod r e Text Text`
Instance details Defined in Text.Earley.Grammar Methods fromString :: String -> Prod r e t a #
Semigroup a => Semigroup (Prod r e t a) Source #	Lifted instance: `(<>) = liftA2 (<>)`
Instance details Defined in Text.Earley.Grammar Methods (<>) :: Prod r e t a -> Prod r e t a -> Prod r e t a # sconcat :: NonEmpty (Prod r e t a) -> Prod r e t a # stimes :: Integral b => b -> Prod r e t a -> Prod r e t a #
Monoid a => Monoid (Prod r e t a) Source #	Lifted instance: `mempty = pure mempty`
Instance details Defined in Text.Earley.Grammar Methods mempty :: Prod r e t a # mappend :: Prod r e t a -> Prod r e t a -> Prod r e t a # mconcat :: [Prod r e t a] -> Prod r e t a #

terminal :: (t -> Maybe a) -> Prod r e t a Source #

Match a token for which the given predicate returns Just a, and return the a.

(<?>) :: Prod r e t a -> e -> Prod r e t a infixr 0 Source #

A named production (used for reporting expected things).

data Grammar r a Source #

A context-free grammar.

The type parameters are:

a: The return type of the grammar (often a Prod).

r for rule: The type of a non-terminal. This plays a role similar to the s in the type ST s a. Since the parser function expects the r to be universally quantified, there is not much to do with this parameter other than leaving it universally quantified.

Most of the functionality of Grammars is obtained through its instances, e.g. Monad and MonadFix. Note that GHC has syntactic sugar for MonadFix: use {-# LANGUAGE RecursiveDo #-} and mdo instead of do.

Instances

Monad (Grammar r) Source #
Instance details Defined in Text.Earley.Grammar Methods (>>=) :: Grammar r a -> (a -> Grammar r b) -> Grammar r b # (>>) :: Grammar r a -> Grammar r b -> Grammar r b # return :: a -> Grammar r a # fail :: String -> Grammar r a #
Functor (Grammar r) Source #
Instance details Defined in Text.Earley.Grammar Methods fmap :: (a -> b) -> Grammar r a -> Grammar r b # (<$) :: a -> Grammar r b -> Grammar r a #
MonadFix (Grammar r) Source #
Instance details Defined in Text.Earley.Grammar Methods mfix :: (a -> Grammar r a) -> Grammar r a #
Applicative (Grammar r) Source #
Instance details Defined in Text.Earley.Grammar Methods pure :: a -> Grammar r a # (<>) :: Grammar r (a -> b) -> Grammar r a -> Grammar r b # liftA2 :: (a -> b -> c) -> Grammar r a -> Grammar r b -> Grammar r c # (>) :: Grammar r a -> Grammar r b -> Grammar r b # (<*) :: Grammar r a -> Grammar r b -> Grammar r a #

rule :: Prod r e t a -> Grammar r (Prod r e t a) Source #

Create a new non-terminal by giving its production.

Derived operators

satisfy :: (t -> Bool) -> Prod r e t t Source #

Match a token that satisfies the given predicate. Returns the matched token.

token :: Eq t => t -> Prod r e t t Source #

Match a single token.

namedToken :: Eq t => t -> Prod r t t t Source #

Match a single token and give it the name of the token.

list :: Eq t => [t] -> Prod r e t [t] Source #

Match a list of tokens in sequence.

listLike :: (Eq t, ListLike i t) => i -> Prod r e t i Source #

Match a ListLike of tokens in sequence.

Parsing

data Report e i Source #

A parsing report, which contains fields that are useful for presenting errors to the user if a parse is deemed a failure. Note however that we get a report even when we successfully parse something.

Constructors

Report
Fields position :: Int The final position in the input (0-based) that the parser reached. expected :: [e] The named productions processed at the final position. unconsumed :: i The part of the input string that was not consumed, which may be empty.

Instances

(Eq e, Eq i) => Eq (Report e i) Source #
Instance details Defined in Text.Earley.Parser.Internal Methods (==) :: Report e i -> Report e i -> Bool # (/=) :: Report e i -> Report e i -> Bool #
(Ord e, Ord i) => Ord (Report e i) Source #
Instance details Defined in Text.Earley.Parser.Internal Methods compare :: Report e i -> Report e i -> Ordering # (<) :: Report e i -> Report e i -> Bool # (<=) :: Report e i -> Report e i -> Bool # (>) :: Report e i -> Report e i -> Bool # (>=) :: Report e i -> Report e i -> Bool # max :: Report e i -> Report e i -> Report e i # min :: Report e i -> Report e i -> Report e i #
(Read e, Read i) => Read (Report e i) Source #
Instance details Defined in Text.Earley.Parser.Internal Methods readsPrec :: Int -> ReadS (Report e i) # readList :: ReadS [Report e i] # readPrec :: ReadPrec (Report e i) # readListPrec :: ReadPrec [Report e i] #
(Show e, Show i) => Show (Report e i) Source #
Instance details Defined in Text.Earley.Parser.Internal Methods showsPrec :: Int -> Report e i -> ShowS # show :: Report e i -> String # showList :: [Report e i] -> ShowS #

data Result s e i a Source #

The result of a parse.

Constructors

Ended (Report e i)	The parser ended.
Parsed (ST s [a]) Int i (ST s (Result s e i a))	The parser parsed a number of `a`s. These are given as a computation, `ST s [a]` that constructs the `a`s when run. We can thus save some work by ignoring this computation if we do not care about the results. The `Int` is the position in the input where these results were obtained, the `i` the rest of the input, and the last component is the continuation.

Instances

Functor (Result s e i) Source #
Instance details Defined in Text.Earley.Parser.Internal Methods fmap :: (a -> b) -> Result s e i a -> Result s e i b # (<$) :: a -> Result s e i b -> Result s e i a #

type Parser e i a = forall s. i -> ST s (Result s e i a) Source #

parser :: ListLike i t => (forall r. Grammar r (Prod r e t a)) -> Parser e i a Source #

Create a parser from the given grammar.

allParses :: Parser e i a -> i -> ([(a, Int)], Report e i) Source #

Return all parses from the result of a given parser. The result may contain partial parses. The Ints are the position at which a result was produced.

The elements of the returned list of results are sorted by their position in ascending order. If there are multiple results at the same position they are returned in an unspecified order.

fullParses :: ListLike i t => Parser e i a -> i -> ([a], Report e i) Source #

Return all parses that reached the end of the input from the result of a given parser.

If there are multiple results they are returned in an unspecified order.

Recognition

report :: Parser e i a -> i -> Report e i Source #

See e.g. how far the parser is able to parse the input string before it fails. This can be much faster than getting the parse results for highly ambiguous grammars.

Language generation

type Generator t a = forall s. ST s (Result s t a) Source #

generator :: (forall r. Grammar r (Prod r e t a)) -> [t] -> Generator t a Source #

Create a language generator for given grammar and list of allowed tokens.

language :: Generator t a -> [(a, [t])] Source #

Run a generator, returning all members of the language.

The members are returned as parse results paired with the list of tokens used to produce the result. The elements of the returned list of results are sorted by their length in ascending order. If there are multiple results of the same length they are returned in an unspecified order.

upTo :: Int -> Generator t a -> [(a, [t])] Source #

upTo n gen runs the generator gen, returning all members of the language that are of length less than or equal to n.

The members are returned as parse results paired with the list of tokens used to produce the result. The elements of the returned list of results are sorted by their length in ascending order. If there are multiple results of the same length they are returned in an unspecified order.

exactly :: Int -> Generator t a -> [(a, [t])] Source #

exactly n gen runs the generator gen, returning all members of the language that are of length equal to n.

The members are returned as parse results paired with the list of tokens used to produce the result. If there are multiple results they are returned in an unspecified order.