pinchot-0.14.0.0: Build parsers and ASTs for context-free grammars

Safe HaskellNone
LanguageHaskell2010

Pinchot.Internal

Description

Pinchot internals. Ordinarily the Pinchot module should have everything you need.

Synopsis

Documentation

type RuleName = String Source

Type synonym for the name of a production rule. This will be the name of the type constructor for the corresponding type that will be created, so this must be a valid Haskell type constructor name.

If you are creating a terminal, option, list, list1, or wrap, the RuleName will also be used for the name of the single data construtor. If you are creating a nonTerminal, you will specify the name of each data constructor with AlternativeName.

type AlternativeName = String Source

Type synonym the the name of an alternative in a nonTerminal. This name must not conflict with any other data constructor, either one specified as an AlternativeName or one that was created using terminal, option, list, or list1.

data Branch t Source

A branch in a sum rule. In Branch s ls, s is the name of the data constructor, and ls is the list of rules that this branch produces.

Constructors

Branch String (Seq (Rule t)) 

Instances

Eq t => Eq (Branch t) Source 
Ord t => Ord (Branch t) Source 
Show t => Show (Branch t) Source 

data RuleType t Source

Constructors

RTerminal (Intervals t) 
RBranch (Branch t, Seq (Branch t)) 
RUnion (Rule t, Seq (Rule t)) 
RSeqTerm (Seq t) 
ROptional (Rule t) 
RList (Rule t) 
RList1 (Rule t) 
RWrap (Rule t) 
RRecord (Seq (Rule t)) 

Instances

Eq t => Eq (RuleType t) Source 
Ord t => Ord (RuleType t) Source 
Show t => Show (RuleType t) Source 

data Rule t Source

A single production rule. It may be a terminal or a non-terminal.

Constructors

Rule String (Maybe String) (RuleType t) 

Instances

Eq t => Eq (Rule t) Source 
Ord t => Ord (Rule t) Source 
Show t => Show (Rule t) Source 

label :: String -> Rule t -> Rule t Source

Name a Rule for use in error messages. If you do not name a rule using this combinator, the rule's type name will be used in error messages.

(<?>) :: Pinchot t (Rule t) -> String -> Pinchot t (Rule t) infixr 0 Source

Infix form of label for use in a Pinchot; handy for use in do or mdo notation.

data Names t Source

Constructors

Names 

Instances

Eq t => Eq (Names t) Source 
Ord t => Ord (Names t) Source 
Show t => Show (Names t) Source 

data Error Source

Errors that may arise when constructing an AST.

Constructors

InvalidName String

A name was invalid. The field is the invalid name. The name might be invalid because it was already used, or because it does not begin with a capital letter.

EmptyNonTerminal String

A non-terminal must have at least one summand. The field is the name of the empty non-terminal.

newtype Pinchot t a Source

Constructs new Rules. t is the type of the token; often this will be Char.

Pinchot is a Monad and an Applicative so you can combine computations using the usual methods of those classes. Also, Pinchot is a MonadFix. This allows you to construct a Rule that depends on itself, and to construct sets of Rules that have mutually recursive dependencies. MonadFix also allows you to use the GHC RecursiveDo extension. Put

{-# LANGUAGE RecursiveDo #-}

at the top of your module, then use mdo instead of do. Because an mdo block is recursive, you can use a binding before it is defined, just as you can in a set of let bindings.

Constructors

Pinchot 

Fields

runPinchot :: ExceptT Error (State (Names t)) a
 

goPinchot :: Pinchot t a -> Q (Names t, a) Source

Runs a Pinchot with a starting empty state. Fails in the Q monad if the grammar is bad.

terminal Source

Arguments

:: RuleName 
-> Intervals t

Valid terminal symbols

-> Pinchot t (Rule t) 

Creates a terminal production rule.

terminalSeq Source

Arguments

:: RuleName 
-> Seq t

Sequence of terminal symbols to recognize

-> Pinchot t (Rule t) 

Creates a production for a sequence of terminals. Useful for parsing specific words.

nonTerminal Source

Arguments

:: RuleName 
-> Seq (AlternativeName, Seq (Rule t))

Alternatives. There must be at least one alternative; otherwise, an error will result. In each pair (a, b), a will be the data constructor, so this must be a valid Haskell data constructor name. b is the sequence of production rules, which can be empty (this is how to create an epsilon production).

-> Pinchot t (Rule t) 

Creates a new non-terminal production rule.

unionBranchName Source

Arguments

:: RuleName

Name of the parent rule

-> RuleName

Name of the branch rule

-> AlternativeName 

union Source

Arguments

:: RuleName 
-> Seq (Rule t)

List of alternatives. There must be at least one alternative; otherwise a compile-time error will occur.

-> Pinchot t (Rule t) 

Creates a new non-terminal production rule where each alternative produces only one rule. The constructor name for each alternative is

RULE_NAME'PRODUCTION_NAME

where RULE_NAME is the name of the rule itself, and PRODUCTION_NAME is the rule name for what is being produced. For an example, see Suffix.

Currently there is no way to change the names of the constructors; however, you can use nonTerminal, which is more flexible.

record Source

Arguments

:: RuleName

The name of this rule, which is used both as the type name and the name of the sole data constructor.

-> Seq (Rule t)

The right-hand side of this rule. This sequence can be empty, which results in an epsilon production.

-> Pinchot t (Rule t) 

Creates a new non-terminal production rule with only one alternative where each field has a record name. The name of each record is:

_r'RULE_NAME'INDEX'FIELD_TYPE

where RULE_NAME is the name of this rule, INDEX is the index number for this field (starting with 0), and FIELD_TYPE is the type of the field itself. For an example, see Address.

Currently there is no way to change the names of the record fields.

list Source

Arguments

:: Rule t

The resulting Rule is a sequence of productions of this Rule; that is, this Rule may appear zero or more times.

-> Pinchot t (Rule t) 

Creates a rule for the production of a sequence of other rules. The name for the created Rule is the name of the Rule to which this function is applied, with 'Seq appended.

list1 Source

Arguments

:: Rule t

The resulting Rule produces this Rule at least once.

-> Pinchot t (Rule t) 

Creates a rule for a production that appears at least once. The name for the created Rule is the name of the Rule to which this function is applied, with 'Seq1 appended.

option Source

Arguments

:: Rule t

The resulting Rule optionally produces this Rule; that is, this Rule may appear once or not at all.

-> Pinchot t (Rule t) 

Creates a rule for a production that optionally produces another rule. The name for the created Rule is the name of the Rule to which this function is applied, with 'Maybe appended to the end.

wrap Source

Arguments

:: RuleName 
-> Rule t

The resulting Rule simply wraps this Rule.

-> Pinchot t (Rule t) 

Creates a newtype wrapper.

getAncestors :: Rule t -> State (Set String) (Seq (Rule t)) Source

Gets all ancestor Rules. Skips duplicates.

ruleAndAncestors :: Rule t -> Seq (Rule t) Source

Returns both this Rule and any Rules that are ancestors.

rulesDemandedBeforeDefined :: Foldable f => f (Rule t) -> Set Name Source

Given a sequence of Rule, determine which rules are on a right-hand side before they are defined.

thUnionBranch Source

Arguments

:: RuleName

Parent rule name

-> Rule t

Child rule

-> ConQ 

thRule Source

Arguments

:: Lift t 
=> Bool

If True, make lenses.

-> Name

Name of terminal type

-> Seq Name

What to derive

-> Rule t 
-> Q [Dec] 

makeType Source

Arguments

:: Name

Name of terminal type

-> Seq Name

What to derive

-> String

Name of rule

-> RuleType t 
-> Q Dec 

fieldName Source

Arguments

:: Int

Index

-> String

Parent type name

-> String

Inner type name

-> String 

Field name - without a leading underscore

thAllRules Source

Arguments

:: Lift t 
=> Bool

If True, make optics as well.

-> Name

Terminal type constructor name

-> Seq Name

What to derive

-> Map Int (Rule t) 
-> DecsQ 

makeWrapped Source

Arguments

:: Type

Name of wrapped type

-> String

Name of wrapper type

-> Dec 

dynP :: String -> PatQ Source

TH helper like dyn but for patterns

seqTermToOptics Source

Arguments

:: Lift t 
=> Name

Terminal type name

-> String

Rule name

-> Seq t 
-> Q [Dec] 

terminalToOptics Source

Arguments

:: Lift t 
=> Name

Terminal type name

-> String

Rule name

-> Intervals t 
-> Q [Dec] 

Creates a prism for a terminal type. Although a newtype wraps each terminal, do not make a Wrapped or an Iso, because the relationship between the outer type and the type that it wraps typically is not isometric. Thus, use a Prism instead, which captures this relationship properly.

optionalToOptics Source

Arguments

:: String

Wrapped rule name

-> String

Wrapping Rule name

-> Dec 

many1ToOptics Source

Arguments

:: String

Wrapped rule name

-> String

Wrapping Rule name

-> Dec 

manyToOptics Source

Arguments

:: String

Wrapped rule name

-> String

Wrapping Rule name

-> Dec 

wrapToOptics Source

Arguments

:: String

Wrapped rule name

-> String

Wrapping Rule name

-> Dec 

terminalSeqToOptics Source

Arguments

:: Name

Terminal type name

-> String

Rule name

-> Dec 

branchesToOptics Source

Arguments

:: String

Rule name

-> Branch t 
-> Seq (Branch t) 
-> [Dec] 

unionToOptics Source

Arguments

:: String

Rule name

-> Rule t

First rule

-> Seq (Rule t)

Remaining rules

-> DecsQ 

recordsToOptics Source

Arguments

:: String

Rule name

-> Seq (Rule t) 
-> [Dec] 

ruleToOptics Source

Arguments

:: Lift t 
=> Name

Terminal type name

-> String

Rule name

-> RuleType t 
-> DecsQ 

type MakeOptics = Bool Source

Should optics be made?

makeOptics :: MakeOptics Source

Creates optics.

If you use this option, you will need {-# LANGUAGE TypeFamilies #-}

at the top of the module into which you splice in the declarations, because you will get instances of Wrapped.

Creates the listed optics for each kind of Rule, as follows:

  • terminal: Prism' a b, where a is the type of the terminal token (often Char) and b is the type of this particular production. For an example, see _Comma.
>>> ',' ^? _Comma
Just (Comma ',')
>>> 'a' ^? _Comma
Nothing
>>> Comma ',' ^. re _Comma
','

Thus this gives you a safe way to insert tokens into types made with terminal (useful if you want to construct a syntax tree.)

noOptics :: MakeOptics Source

Do not make any optics.

allRulesToTypes Source

Arguments

:: Lift t 
=> MakeOptics 
-> Name

Terminal type constructor name. Typically you will use the Template Haskell quoting mechanism to get this.

-> Seq Name

What to derive. For instance, you might use Eq, Ord, and Show here. Each created data type will derive these instances.

-> Pinchot t a

The return value from the Pinchot is ignored.

-> DecsQ 

Creates data types for every Rule created in the Pinchot. The data types are created in the same order in which they were created in the Pinchot. When spliced, the DecsQ is a list of declarations, each of which is an appropriate data or newtype. For an example use of allRulesToTypes, see Pinchot.Examples.PostalAstAllRules.

Also creates bindings whose names are prefixed with t'. Each of these is a function that, when given a particular production, reduces it to a sequence of terminal symbols.

ruleTreeToTypes Source

Arguments

:: Lift t 
=> MakeOptics 
-> Name

Terminal type constructor name. Typically you will use the Template Haskell quoting mechanism to get this.

-> Seq Name

What to derive. For instance, you might use Eq, Ord, and Show here. Each created data type will derive these instances.

-> Pinchot t (Rule t)

A data type is created for the Rule that the Pinchot returns, and for the ancestors of the Rule.

-> DecsQ 

Creates data types only for the Rule returned from the Pinchot, and for its ancestors.

Also creates bindings whose names are prefixed with t'. Each of these is a function that, when given a particular production, reduces it to a sequence of terminal symbols.

ruleToParser Source

Arguments

:: Lift t 
=> String

Module prefix

-> Rule t 
-> [(Name, ExpQ)] 

constructorName Source

Arguments

:: String

Module prefix

-> String

Name of constructor

-> ExpQ 

branchToParser Source

Arguments

:: Lift t 
=> String

Module prefix

-> Branch t 
-> ExpQ 

lazyPattern :: Foldable c => c Name -> Q Pat Source

Creates a lazy pattern for all the given names. Adds an empty pattern onto the front. This is the counterpart of bigTuple. All of the given names are bound. In addition, a single, wildcard pattern is bound to the front.

For example, lazyPattern (map mkName ["x", "y", "z"]) gives a pattern that looks like

~(_, (x, (y, (z, ()))))

The idea is that the named patterns are needed so that the recursive do notation works, and that the wildcard pattern is the return value, which is not needed here.

bigTuple Source

Arguments

:: Foldable c 
=> ExpQ

This expression will be the first one in the tuple.

-> c ExpQ

Remaining expressions in the tuple.

-> ExpQ 

Creates a big tuple. It is nested in the second element, such as (1, (2, (3, (4, ())))). Thus, the big tuple is terminated with a unit value. It resembles a list where each tuple is a cons cell and the terminator is unit.

earleyGrammar Source

Arguments

:: Lift t 
=> Qualifier

Qualifier for data types crated with ruleTreeToTypes or allRulesToTypes

-> Pinchot t (Rule t)

Creates an Earley parser for the Rule that the Pinchot returns.

-> Q Exp

When spliced, this expression has type Grammar r (Prod r String t a)

where

r is left universally quantified

t is the type of the token (usually Char)

a is the type defined by the Rule.

Creates an Earley grammar for a given Rule. For examples of how to use this, see the source code for Pinchot.Examples.PostalAstRuleTree and for Pinchot.Examples.PostalAstAllRules.

recursiveDo Source

Arguments

:: [(Name, ExpQ)]

Binding statements

-> ExpQ

Final return value from do block. The type of this ExpQ must be in the same monad as the do block; it must not be a pure value.

-> ExpQ

Returns an expression whose value is the final return value from the do block.

Builds a recursive do expression (because TH has no support for mdo notation).

earleyGrammarFromRule Source

Arguments

:: Lift t 
=> String

Module prefix

-> Rule t 
-> Q Exp 

allEarleyGrammars Source

Arguments

:: Lift t 
=> Qualifier

Qualifier for data types created with ruleTreeToTypes or allRulesToTypes

-> Name

Name for the terminal type; often this is Char. Typically you will use the Template Haskell quoting mechanism--for example, ''Char.

-> Pinchot t a

Creates an Earley grammar for each Rule created in the Pinchot. The return value of the Pinchot computation is ignored.

-> DecsQ

When spliced, this is a list of declarations. Each declaration has type Grammar r (Prod r String t a)

where

r is left universally quantified

t is the type of the token (usually Char)

a is the type defined by the Rule.

The name of each declaration is g'TYPE_NAME

where TYPE_NAME is the name of the type defined in the corresponding Rule.

Creates an Earley grammar for each Rule created in a Pinchot. For a Pinchot with a large number of Rules, this can create a large number of declarations that can take a long time to compile--sometimes several minutes. For lower compilation times, try earleyProduct.

addIndices :: Foldable c => c a -> [(Int, a)] Source

productionDecl Source

Arguments

:: String

Rule name

-> Name

Name of terminal type

-> RuleType t 
-> DecsQ 

Creates a production declaration for a Rule.

type Qualifier = String Source

Many functions take an argument that holds the name qualifier for the module that contains the data types created by applying ruleTreeToTypes or allRulesToTypes to the 'Pinchot.'

You have to make sure that the data types you created with ruleTreeToTypes, allRulesToTypes, or allRulesRecord are in scope. The spliced Template Haskell code has to know where to look for these data types. If you did an unqualified import or if the types are in the same module as is the splice of earleyParser, just pass the empty string here. If you did a qualified import, use the appropriate qualifier here.

For example, if you used import qualified MyAst, pass "MyAst" here. If you used import qualified Data.MyLibrary.MyAst as MyLibrary.MyAst, pass "MyLibrary.MyAst" here.

I recommend that you always create a new module and that all you do in that module is apply ruleTreeToTypes or allRulesToTypes, and that you then perform an import qualified to bring those names into scope in the module in which you use a function that takes a Qualifier argument. This avoids unlikely, but possible, issues that could otherwise arise due to naming conflicts.

allRulesRecord Source

Arguments

:: Qualifier

Qualifier for data types created with ruleTreeToTypes or allRulesToTypes

-> Name

Name of terminal type. Typically you will get this through the Template Haskell quoting mechanism, such as ''Char.

-> Pinchot t a

A record is created that holds a value for each Rule created in the Pinchot; the return value of the Pinchot is ignored.

-> DecsQ

When spliced, this will create a single declaration that is a record with the name Productions. It will have one type variable, r. Each record in the declaration will have a name like so:

a'NAME

where NAME is the name of the type. Don't count on these records being in any particular order.

Creates a record data type that holds a value of type

Prod r String t a

for every rule created in the Pinchot. r is left universally quantified, t is the token type (typically Char) and a is the type of the rule.

This always creates a single product type whose name is Productions; currently the name cannot be configured.

For an example of the use of allRulesRecord, please see Pinchot.Examples.AllRulesRecord.

earleyProduct Source

Arguments

:: Lift t 
=> Qualifier

Qualifier for data types created with ruleTreeToTypes or allRulesToTypes

-> Qualifier

Module prefix for the type created with allRulesRecord

-> Pinchot t a

Creates an Earley grammar that contains a Prod for each Rule in the Pinchot. The return value from the Pinchot is ignored.

-> ExpQ

When spliced, earleyProduct creates an expression whose type is Grammar r (Productions r), where Productions is the type created by allRulesRecord.

Creates a Grammar that contains a Prod for every Rule created in the Pinchot.