Safe Haskell	None
Language	Haskell2010

Pinchot

Contents

Intervals
Non-empty
Production rules
- Errors
Qualifiers
Creating data types corresponding to grammars
- Wrappers and optics
Creating Earley grammars
Terminalizers
Locations
Running parsers with locations

Description

Pinchot provides a simple language that you use to write a Haskell values that describes a context-free grammar. You can then use Template Haskell to take this value and generate a series of data types that correspond to your context-free grammar. You can also use Template Haskell to create an Earley parser that will parse all strings in the context-free language. Other handy utilities generate functions that will return all the terminal characters from a parsed production rule. It is also possible to easily determine the location (line, column, and position) of any parsed production or character.

Everything you typically need should be in this module.

For examples, please consult Pinchot.Examples.

You should also look at the BNF Converter.

http://bnfc.digitalgrammars.com

Primary differences between BNFC and this library:

the BNF Converter works as a standalone binary that parses text BNF files. With Pinchot you specify your grammar in Haskell.
the BNF Converter currently generates many more outputs, such as LaTeX. It also generates code for many languages. Pinchot only works in Haskell.
the BNF Converter generates input for parser generators like Happy and Bison. Pinchot currently only generates input for the Haskell Earley library.
Pinchot integrates seamlessly into Haskell using Template Haskell.
the BNF Converter is GPL. Pinchot is BSD3.

Pinchot grows and harvests syntax trees, so it is named after Gifford Pinchot, first chief of the United States Forest Service.

Synopsis

Intervals

data Intervals a Source

Groups of terminals. Create an Intervals using include, exclude, solo and pariah. Combine Intervals using mappend, which will combine both the included and excluded terminal symbols from each operand.

Instances

Functor Intervals Source
Eq a => Eq (Intervals a) Source
Ord a => Ord (Intervals a) Source
Show a => Show (Intervals a) Source
Monoid (Intervals a) Source
Lift a => Lift (Intervals a) Source

include :: a -> a -> Intervals a Source

Include a range of symbols in the Intervals. For instance, to include the characters a, b, and c, use include a c. Example: rLetter.

exclude :: a -> a -> Intervals a Source

Exclude a range of symbols in the Intervals. Each symbol that is excluded is not included in the Intervals, even if it is also included.

solo :: a -> Intervals a Source

Include a single symbol. Example: rNorth.

pariah :: a -> Intervals a Source

Exclude a single symbol.

Non-empty

data NonEmpty a Source

A non-empty sequence.

Constructors

NonEmpty
Fields _front :: a The first item _rest :: Seq a All remaining items

Instances

Monad NonEmpty Source
Functor NonEmpty Source
Applicative NonEmpty Source
Foldable NonEmpty Source
Traversable NonEmpty Source
Eq a => Eq (NonEmpty a) Source
Ord a => Ord (NonEmpty a) Source
Show a => Show (NonEmpty a) Source

front :: forall a. Lens' (NonEmpty a) a Source

rest :: forall a. Lens' (NonEmpty a) (Seq a) Source

flatten :: NonEmpty a -> Seq a Source

Convert a NonEmpty to a Seq.

seqToNonEmpty :: Seq a -> Maybe (NonEmpty a) Source

Converts a non-empty Seq to a NonEmpty; Nothing if the Seq is empty.

prependSeq :: Seq a -> NonEmpty a -> NonEmpty a Source

Prepends a Seq to a NonEmpty.

appendSeq :: NonEmpty a -> Seq a -> NonEmpty a Source

Appends a Seq to a NonEmpty.

append :: NonEmpty a -> NonEmpty a -> NonEmpty a Source

Associative operation that appends to NonEmpty.

singleton :: a -> NonEmpty a Source

Place a single item at the head of the NonEmpty.

Production rules

type RuleName = String Source

Type synonym for the name of a production rule. This will be the name of the type constructor for the corresponding type that will be created, so this must be a valid Haskell type constructor name. Typically each context-free grammar that you write will have several production rules; you will want to make sure that every RuleName that you create for a single context-free grammar is unique. However, Pinchot currently does not check for uniqueness. If you use names that are not unique, GHC will give an error message when you try to splice the resulting code, as the data types will not have unique names.

data Rule t Source

A single production rule.

Instances

Eq t => Eq (Rule t) Source
Ord t => Ord (Rule t) Source
Show t => Show (Rule t) Source

type BranchName = String Source

Type synonym the the name of an alternative in a nonTerminal. This name must not conflict with any other data constructor in your grammar.

terminal Source

Arguments

:: RuleName
-> Intervals t	Valid terminal symbols
-> Rule t

Creates a terminal production rule. Example: rLetter.

nonTerminal Source

Arguments

:: RuleName	Will be used for the name of the resulting type
-> Seq (BranchName, Seq (Rule t))	Branches of the non-terminal production rule. This `Seq` must have at least one element; otherwise, an error will result.
-> Rule t

Creates a non-terminal production rule. This is the most flexible way to create non-terminals. You can even create a non-terminal that depends on itself. Example: rLetters.

union Source

Arguments

:: RuleName	Will be used for the name of the resulting type
-> Seq (Rule t)	List of branches. There must be at least one branch; otherwise a compile-time error will occur.
-> Rule t

Creates a non-terminal production rule where each branch has only one production. This function ultimately uses nonTerminal. Each branch is assigned a BranchName that is

RULE_NAME'PRODUCTION_NAME

where RULE_NAME is the name of the rule itself, and PRODUCTION_NAME is the rule name for what is being produced.

Example: rDirection.

terminals Source

Arguments

:: RuleName	Will be used for the name of the resulting type, and for the name of the sole data constructor
-> String
-> Rule Char

Creates a production for a sequence of terminals. Useful for parsing specific words. Ultimately this is simply a function that creates a Rule using the record function.

In terminals n s, For each Char in the String, a Rule is created whose RuleName is n followed by an apostrophe followed by the index of the position of the Char.

Examples: rBoulevard.

wrap Source

Arguments

:: RuleName	Will be used for the name of the resulting data type, and for the name of the sole data constructor
-> Rule t	The resulting `Rule` simply wraps this `Rule`.
-> Rule t

Creates a newtype wrapper. Example: rCity.

record Source

Arguments

:: RuleName	The name of this rule, which is used both as the type name and for the name of the sole data constructor
-> Seq (Rule t)	The right-hand side of this rule. This sequence can be empty, which results in an epsilon production.
-> Rule t

Creates a new non-terminal production rule with only one alternative where each field has a record name. The name of each record is:

_r'RULE_NAME'INDEX'FIELD_TYPE

where RULE_NAME is the name of this rule, INDEX is the index number for this field (starting with 0), and FIELD_TYPE is the type of the field itself.

Currently there is no way to change the names of the record fields.

Example: rAddress.

opt :: Rule t -> Rule t Source

Creates a rule for a production that optionally produces another rule. The name for the created Rule is the name of the Rule to which this function is applied, with 'Opt appended to the end.

Example: rOptNewline.

star :: Rule t -> Rule t Source

Creates a rule for the production of a sequence of other rules. The name for the created Rule is the name of the Rule to which this function is applied, with 'Star appended.

Example: rPreSpacedWord.

plus :: Rule t -> Rule t Source

Creates a rule for a production that appears at least once. The name for the created Rule is the name of the Rule to which this function is applied, with 'Plus appended.

Example: rDigits.

Errors

label :: Rule t -> String -> Rule t Source

Name a Rule for use in error messages. If you do not name a rule using this combinator, the rule's type name will be used in error messages.

(<?>) :: Rule t -> String -> Rule t infixr 0 Source

Infix synonym for label. Example: rDigit.

Qualifiers

type Qualifier = String Source

Many functions take an argument that holds the name qualifier for the module that contains the data types created by applying a function such as syntaxTrees or earleyProduct.

You will have to make sure that these data types are in scope. The spliced Template Haskell code has to know where to look for these data types. If you did an unqualified import or if the types are in the same module as the function that takes a Qualifier argument, just pass the empty string here. If you did a qualified import, use the appropriate qualifier here.

For example, if you used import qualified MyAst, pass "MyAst" here. If you used import qualified Data.MyLibrary.MyAst as MyLibrary.MyAst, pass "MyLibrary.MyAst" here.

I recommend that you always create a new module and that all you do in that module is apply syntaxTrees or earleyProduct, and that you then perform an import qualified to bring those names into scope in the module in which you use a function that takes a Qualifier argument. This avoids unlikely, but possible, issues that could otherwise arise due to naming conflicts.

Creating data types corresponding to grammars

syntaxTrees Source

Arguments

:: Name	Name of terminal type. Typically you will get this from the Template Haskell quoting mechanism, e.g. `''Char`.
-> [Name]	What to derive, e.g. `[''Eq, ''Ord, ''Show]`
-> Seq (Rule t)
-> DecsQ

Makes the top-level declarations for each given Rule and for all ancestors of the given Rules. Since ancestors are included, you can get the entire tree of types that you need by applying this function to a single start symbol. Example: Pinchot.Examples.SyntaxTrees.

allRulesRecord Source

Arguments

:: Qualifier	Qualifier for data types corresponding to those created from the `Rule`s
-> Name	Name of terminal type. Typically you will get this through the Template Haskell quoting mechanism, such as `''Char`.
-> Seq (Rule t)	A record is created that holds a value for each `Rule` in the `Seq`, as well as for every ancestor of these `Rule`s.
-> DecsQ	When spliced, this will create a single declaration that is a record with the name `Productions`. a'NAME where `NAME` is the name of the type. Don't count on these records being in any particular order.

Creates a record data type that holds a value of type

Prod r String (t, a) (p t a)

where

r is left universally quantified
t is the token type (often Char)
a is any additional information about each token (often Loc)
p is the type of the particular production

This always creates a single product type whose name is Productions; currently the name cannot be configured.

Example: Pinchot.Examples.SyntaxTrees.

Wrappers and optics

wrappedInstances :: Seq (Rule t) -> DecsQ Source

Creates a Wrapped instance for each Rule and its ancestors, if there is an instance. Only terminal, wrap, opt, star, and plus get instances of Wrapped.

This must be spliced in the same module in which the syntax tree types are created; this way, no orphans are created. Since ancestors are included, you can get the entire tree of types that you need by applying this function to a single start symbol.

Example: Pinchot.Examples.SyntaxTrees.

rulesToOptics Source

Arguments

:: Lift t
=> Qualifier	Qualifier for module containing the data types that will get optics
-> Name	Type name for the terminal
-> Seq (Rule t)
-> Q [Dec]

Creates optics declarations for a Rule, if optics can be made for the Rule:

terminal gets a single Prism
nonTerminal gets a Prism for each constructor
record gets a single Lens
wrap, opt, star, and plus do not get optics.

Each rule in the sequence of Rule, as well as all ancestors of those Rules, will be handled.

Example: Pinchot.Examples.RulesToOptics.

Creating Earley grammars

earleyGrammarFromRule Source

Arguments

:: Lift t
=> Qualifier	Module prefix holding the data types created with `syntaxTrees`
-> Rule t	Create a grammar for this `Rule`
-> Q Exp

Creates an expression that has type

Grammar r (Prod r String (c, a) (p c a))

where r is left universally quantified; c is the terminal type (often Char), a is arbitrary metadata about each token (often Loc) and p is the data type corresponding to the given Rule.

Example: addressGrammar.

earleyProduct Source

Arguments

:: Lift t
=> Qualifier	Qualifier for data types corresponding to those created from the `Rule`s
-> Qualifier	Qualifier for the type created with `allRulesRecord`
-> Seq (Rule t)	Creates an Earley grammar that contains a `Prod` for each `Rule` in this `Seq`, as well as all the ancestors of these `Rule`s.
-> ExpQ	When spliced, `earleyProduct` creates an expression whose type is `Grammar r (Productions r t a)`, where `Productions` is the type created by `allRulesRecord`; `r` is left universally quantified; `t` is the token type (often `Char`), and `a` is any additional information about each token (often `Loc`).

Creates a Grammar that contains a Prod for every given Rule and its ancestors. Example: addressAllProductions.

Terminalizers

terminalizeRuleExp :: Qualifier -> Rule t -> Q Exp Source

For the given rule, returns an expression that has type of either

Production a -> Seq (t, a)

or

Production a -> NonEmpty (t, a)

where Production is the production corresponding to the given Rule, and t is the terminal token type. NonEmpty is returned for productions that must always contain at least one terminal symbol; for those that can be empty, Seq is returned.

Example: terminalizeAddress.

terminalizers Source

Arguments

:: Qualifier	Qualifier for the module containing the data types created from the `Rule`s
-> Name	Name of terminal type. Typically you will get this through the Template Haskell quoting mechanism, such as `''Char`.
-> Seq (Rule t)
-> Q [Dec]

For all the given rules and their ancestors, creates declarations that reduce the rule and all its ancestors to terminal symbols. Each rule gets a declaration named t'RULE_NAME where RULE_NAME is the name of the rule. The type of the declaration is either

Production a -> Seq (t, a)

or

Production a -> NonEmpty (t, a)

where Production is the production corresponding to the given Rule, t is the terminal token type (often Char), and a is arbitrary metadata about each token (often Loc). NonEmpty is returned for productions that must always contain at least one terminal symbol; for those that can be empty, Seq is returned.

Example: Pinchot.Examples.Terminalize.

Locations

data Loc Source

A location.

Constructors

Loc
Fields _line :: Int _col :: Int _pos :: Int

Instances

Eq Loc Source
Ord Loc Source
Read Loc Source
Show Loc Source

line :: Lens' Loc Int Source

col :: Lens' Loc Int Source

pos :: Lens' Loc Int Source

locations :: FoldableLL full Char => full -> Seq (Char, Loc) Source

Takes any ListLike value based on Char (Seq, Text, String, etc.) and creates a Seq which pairs each Char with its location. Example: locatedFullParses.

noLocations :: FoldableLL full item => full -> Seq (item, ()) Source

Breaks a ListLike into a Seq but does not assign locations.

Running parsers with locations

locatedFullParses Source

Arguments

:: FoldableLL full Char
=> (forall r. Grammar r (Prod r String (Char, Loc) (p Char Loc)))	Earley grammar with production that you want to parse.
-> full	Source text, e.g. `String`, `Text`, etc.
-> ([p Char Loc], Report String (Seq (Char, Loc)))	A list of successful parses that when to the end of the source string, along with the Earley report showing possible errors.

Obtains all full Earley parses from a given input string, after assigning a location to every Char. Example: address.