pinchot-0.16.0.0: Write grammars, not parsers

Safe HaskellNone
LanguageHaskell2010

Pinchot

Contents

Description

Pinchot provides a simple language that you use to write a Haskell values that describes a context-free grammar. You can then use Template Haskell to take this value and generate a series of data types that correspond to your context-free grammar. You can also use Template Haskell to create an Earley parser that will parse all strings in the context-free language. Other handy utilities generate functions that will return all the terminal characters from a parsed production rule. It is also possible to easily determine the location (line, column, and position) of any parsed production or character.

Everything you typically need should be in this module.

For examples, please consult Pinchot.Examples.

You should also look at the BNF Converter.

http://bnfc.digitalgrammars.com

Primary differences between BNFC and this library:

  • the BNF Converter works as a standalone binary that parses text BNF files. With Pinchot you specify your grammar in Haskell.
  • the BNF Converter currently generates many more outputs, such as LaTeX. It also generates code for many languages. Pinchot only works in Haskell.
  • the BNF Converter generates input for parser generators like Happy and Bison. Pinchot currently only generates input for the Haskell Earley library.
  • Pinchot integrates seamlessly into Haskell using Template Haskell.
  • the BNF Converter is GPL. Pinchot is BSD3.

Pinchot grows and harvests syntax trees, so it is named after Gifford Pinchot, first chief of the United States Forest Service.

Synopsis

Intervals

data Intervals a Source

Groups of terminals. Create an Intervals using include, exclude, solo and pariah. Combine Intervals using mappend, which will combine both the included and excluded terminal symbols from each operand.

include :: a -> a -> Intervals a Source

Include a range of symbols in the Intervals. For instance, to include the characters a, b, and c, use include a c. Example: rLetter.

exclude :: a -> a -> Intervals a Source

Exclude a range of symbols in the Intervals. Each symbol that is excluded is not included in the Intervals, even if it is also included.

solo :: a -> Intervals a Source

Include a single symbol. Example: rNorth.

pariah :: a -> Intervals a Source

Exclude a single symbol.

Non-empty

data NonEmpty a Source

A non-empty sequence.

Constructors

NonEmpty 

Fields

_front :: a

The first item

_rest :: Seq a

All remaining items

front :: forall a. Lens' (NonEmpty a) a Source

rest :: forall a. Lens' (NonEmpty a) (Seq a) Source

flatten :: NonEmpty a -> Seq a Source

Convert a NonEmpty to a Seq.

seqToNonEmpty :: Seq a -> Maybe (NonEmpty a) Source

Converts a non-empty Seq to a NonEmpty; Nothing if the Seq is empty.

prependSeq :: Seq a -> NonEmpty a -> NonEmpty a Source

Prepends a Seq to a NonEmpty.

appendSeq :: NonEmpty a -> Seq a -> NonEmpty a Source

Appends a Seq to a NonEmpty.

append :: NonEmpty a -> NonEmpty a -> NonEmpty a Source

Associative operation that appends to NonEmpty.

singleton :: a -> NonEmpty a Source

Place a single item at the head of the NonEmpty.

Production rules

type RuleName = String Source

Type synonym for the name of a production rule. This will be the name of the type constructor for the corresponding type that will be created, so this must be a valid Haskell type constructor name. Typically each context-free grammar that you write will have several production rules; you will want to make sure that every RuleName that you create for a single context-free grammar is unique. However, Pinchot currently does not check for uniqueness. If you use names that are not unique, GHC will give an error message when you try to splice the resulting code, as the data types will not have unique names.

data Rule t Source

A single production rule.

Instances

Eq t => Eq (Rule t) Source 
Ord t => Ord (Rule t) Source 
Show t => Show (Rule t) Source 

type BranchName = String Source

Type synonym the the name of an alternative in a nonTerminal. This name must not conflict with any other data constructor in your grammar.

terminal Source

Arguments

:: RuleName 
-> Intervals t

Valid terminal symbols

-> Rule t 

Creates a terminal production rule. Example: rLetter.

nonTerminal Source

Arguments

:: RuleName

Will be used for the name of the resulting type

-> Seq (BranchName, Seq (Rule t))

Branches of the non-terminal production rule. This Seq must have at least one element; otherwise, an error will result.

-> Rule t 

Creates a non-terminal production rule. This is the most flexible way to create non-terminals. You can even create a non-terminal that depends on itself. Example: rLetters.

union Source

Arguments

:: RuleName

Will be used for the name of the resulting type

-> Seq (Rule t)

List of branches. There must be at least one branch; otherwise a compile-time error will occur.

-> Rule t 

Creates a non-terminal production rule where each branch has only one production. This function ultimately uses nonTerminal. Each branch is assigned a BranchName that is

RULE_NAME'PRODUCTION_NAME

where RULE_NAME is the name of the rule itself, and PRODUCTION_NAME is the rule name for what is being produced.

Example: rDirection.

terminals Source

Arguments

:: RuleName

Will be used for the name of the resulting type, and for the name of the sole data constructor

-> String 
-> Rule Char 

Creates a production for a sequence of terminals. Useful for parsing specific words. Ultimately this is simply a function that creates a Rule using the record function.

In terminals n s, For each Char in the String, a Rule is created whose RuleName is n followed by an apostrophe followed by the index of the position of the Char.

Examples: rBoulevard.

wrap Source

Arguments

:: RuleName

Will be used for the name of the resulting data type, and for the name of the sole data constructor

-> Rule t

The resulting Rule simply wraps this Rule.

-> Rule t 

Creates a newtype wrapper. Example: rCity.

record Source

Arguments

:: RuleName

The name of this rule, which is used both as the type name and for the name of the sole data constructor

-> Seq (Rule t)

The right-hand side of this rule. This sequence can be empty, which results in an epsilon production.

-> Rule t 

Creates a new non-terminal production rule with only one alternative where each field has a record name. The name of each record is:

_r'RULE_NAME'INDEX'FIELD_TYPE

where RULE_NAME is the name of this rule, INDEX is the index number for this field (starting with 0), and FIELD_TYPE is the type of the field itself.

Currently there is no way to change the names of the record fields.

Example: rAddress.

opt :: Rule t -> Rule t Source

Creates a rule for a production that optionally produces another rule. The name for the created Rule is the name of the Rule to which this function is applied, with 'Opt appended to the end.

Example: rOptNewline.

star :: Rule t -> Rule t Source

Creates a rule for the production of a sequence of other rules. The name for the created Rule is the name of the Rule to which this function is applied, with 'Star appended.

Example: rPreSpacedWord.

plus :: Rule t -> Rule t Source

Creates a rule for a production that appears at least once. The name for the created Rule is the name of the Rule to which this function is applied, with 'Plus appended.

Example: rDigits.

Errors

label :: Rule t -> String -> Rule t Source

Name a Rule for use in error messages. If you do not name a rule using this combinator, the rule's type name will be used in error messages.

(<?>) :: Rule t -> String -> Rule t infixr 0 Source

Infix synonym for label. Example: rDigit.

Qualifiers

type Qualifier = String Source

Many functions take an argument that holds the name qualifier for the module that contains the data types created by applying a function such as syntaxTrees or earleyProduct.

You will have to make sure that these data types are in scope. The spliced Template Haskell code has to know where to look for these data types. If you did an unqualified import or if the types are in the same module as the function that takes a Qualifier argument, just pass the empty string here. If you did a qualified import, use the appropriate qualifier here.

For example, if you used import qualified MyAst, pass "MyAst" here. If you used import qualified Data.MyLibrary.MyAst as MyLibrary.MyAst, pass "MyLibrary.MyAst" here.

I recommend that you always create a new module and that all you do in that module is apply syntaxTrees or earleyProduct, and that you then perform an import qualified to bring those names into scope in the module in which you use a function that takes a Qualifier argument. This avoids unlikely, but possible, issues that could otherwise arise due to naming conflicts.

Creating data types corresponding to grammars

syntaxTrees Source

Arguments

:: Name

Name of terminal type. Typically you will get this from the Template Haskell quoting mechanism, e.g. ''Char.

-> [Name]

What to derive, e.g. [''Eq, ''Ord, ''Show]

-> Seq (Rule t) 
-> DecsQ 

Makes the top-level declarations for each given Rule and for all ancestors of the given Rules. Since ancestors are included, you can get the entire tree of types that you need by applying this function to a single start symbol. Example: Pinchot.Examples.SyntaxTrees.

allRulesRecord Source

Arguments

:: Qualifier

Qualifier for data types corresponding to those created from the Rules

-> Name

Name of terminal type. Typically you will get this through the Template Haskell quoting mechanism, such as ''Char.

-> Seq (Rule t)

A record is created that holds a value for each Rule in the Seq, as well as for every ancestor of these Rules.

-> DecsQ

When spliced, this will create a single declaration that is a record with the name Productions.

a'NAME

where NAME is the name of the type. Don't count on these records being in any particular order.

Creates a record data type that holds a value of type

Prod r String (t, a) (p t a)

where

  • r is left universally quantified
  • t is the token type (often Char)
  • a is any additional information about each token (often Loc)
  • p is the type of the particular production

This always creates a single product type whose name is Productions; currently the name cannot be configured.

Example: Pinchot.Examples.SyntaxTrees.

Wrappers and optics

wrappedInstances :: Seq (Rule t) -> DecsQ Source

Creates a Wrapped instance for each Rule and its ancestors, if there is an instance. Only terminal, wrap, opt, star, and plus get instances of Wrapped.

This must be spliced in the same module in which the syntax tree types are created; this way, no orphans are created. Since ancestors are included, you can get the entire tree of types that you need by applying this function to a single start symbol.

Example: Pinchot.Examples.SyntaxTrees.

rulesToOptics Source

Arguments

:: Lift t 
=> Qualifier

Qualifier for module containing the data types that will get optics

-> Name

Type name for the terminal

-> Seq (Rule t) 
-> Q [Dec] 

Creates optics declarations for a Rule, if optics can be made for the Rule:

Each rule in the sequence of Rule, as well as all ancestors of those Rules, will be handled.

Example: Pinchot.Examples.RulesToOptics.

Creating Earley grammars

earleyGrammarFromRule Source

Arguments

:: Lift t 
=> Qualifier

Module prefix holding the data types created with syntaxTrees

-> Rule t

Create a grammar for this Rule

-> Q Exp 

Creates an expression that has type

Grammar r (Prod r String (c, a) (p c a))

where r is left universally quantified; c is the terminal type (often Char), a is arbitrary metadata about each token (often Loc) and p is the data type corresponding to the given Rule.

Example: addressGrammar.

earleyProduct Source

Arguments

:: Lift t 
=> Qualifier

Qualifier for data types corresponding to those created from the Rules

-> Qualifier

Qualifier for the type created with allRulesRecord

-> Seq (Rule t)

Creates an Earley grammar that contains a Prod for each Rule in this Seq, as well as all the ancestors of these Rules.

-> ExpQ

When spliced, earleyProduct creates an expression whose type is Grammar r (Productions r t a), where Productions is the type created by allRulesRecord; r is left universally quantified; t is the token type (often Char), and a is any additional information about each token (often Loc).

Creates a Grammar that contains a Prod for every given Rule and its ancestors. Example: addressAllProductions.

Terminalizers

terminalizeRuleExp :: Qualifier -> Rule t -> Q Exp Source

For the given rule, returns an expression that has type of either

Production a -> Seq (t, a)

or

Production a -> NonEmpty (t, a)

where Production is the production corresponding to the given Rule, and t is the terminal token type. NonEmpty is returned for productions that must always contain at least one terminal symbol; for those that can be empty, Seq is returned.

Example: terminalizeAddress.

terminalizers Source

Arguments

:: Qualifier

Qualifier for the module containing the data types created from the Rules

-> Name

Name of terminal type. Typically you will get this through the Template Haskell quoting mechanism, such as ''Char.

-> Seq (Rule t) 
-> Q [Dec] 

For all the given rules and their ancestors, creates declarations that reduce the rule and all its ancestors to terminal symbols. Each rule gets a declaration named t'RULE_NAME where RULE_NAME is the name of the rule. The type of the declaration is either

Production a -> Seq (t, a)

or

Production a -> NonEmpty (t, a)

where Production is the production corresponding to the given Rule, t is the terminal token type (often Char), and a is arbitrary metadata about each token (often Loc). NonEmpty is returned for productions that must always contain at least one terminal symbol; for those that can be empty, Seq is returned.

Example: Pinchot.Examples.Terminalize.

Locations

data Loc Source

A location.

Constructors

Loc 

Fields

_line :: Int
 
_col :: Int
 
_pos :: Int
 

locations :: FoldableLL full Char => full -> Seq (Char, Loc) Source

Takes any ListLike value based on Char (Seq, Text, String, etc.) and creates a Seq which pairs each Char with its location. Example: locatedFullParses.

noLocations :: FoldableLL full item => full -> Seq (item, ()) Source

Breaks a ListLike into a Seq but does not assign locations.

Running parsers with locations

locatedFullParses Source

Arguments

:: FoldableLL full Char 
=> (forall r. Grammar r (Prod r String (Char, Loc) (p Char Loc)))

Earley grammar with production that you want to parse.

-> full

Source text, e.g. String, Text, etc.

-> ([p Char Loc], Report String (Seq (Char, Loc)))

A list of successful parses that when to the end of the source string, along with the Earley report showing possible errors.

Obtains all full Earley parses from a given input string, after assigning a location to every Char. Example: address.