luthor-0.0.2: Tools for lexing and utilizing lexemes that integrate with Parsec.

Safe HaskellSafe
LanguageHaskell98

Text.Luthor.Combinator

Contents

Description

Parsec combinators are not composable by default. The idea is that this increases efficiency by eliminating unnecessary backtracking. The practical problem with this approach is that it is difficult to determine exactly where backtracking really is necessary and insert the needed try combinators. Any mistakes are almost always subtle and difficult to diagnose and fix.

Compute power is cheap and programmers are expensive, so it only makes sense to make it easy on the programmer first and on the computer second. This module is mostly full of composable drop-in replacements for Parsec combinators. There's also some renaming, to make the API more idiomatic. Also, some additional combinators are exported.

When migrating to this API, it is recommended to import Parsec modules qualified so that your code will be composable by default. Where efficiency really is needed, you can carefully but easily fall back to non-composable Parsec combinators by namespacing the appropriate combinators.

One place where we could not provide a replacement was the <|> operator. This is exported by the Alternative typeclass, which we really don't want to mess with. The composable alternative is spelled <||>.

The re-named parsers are option, optional, optional_, many_, and many1_. It may not be easiest to switch old habits, but these names are more idiomatic and also reduce the amount of typing necessary. Also, we've altered the semantics of manyTill to make room for the new manyThru combinator. This gives the user an easy choice whether to consume the terminating element. Finally, we've changed the type of notFollowedBy to allow writing x `notfollowedBy` y in place of x <* notFollowedBy y.

Below are some selected examples where this library is more intuitive:

  • In Parsec, string "aaa" <|> string "aa", will fail on input "aa". Using this module's <||> will succeed.
  • In Parsec, (char 'a' `sepBy` char ' ') *> optional (char ' ') *> eof will fail on input "a a ". Using this module's sepBy will succeed. Similar results hold for sepBy1, chainl, chainl1, chainr, and chainr1.
  • In Parsec, anyChar `manyTill` string "-->" *> eof will fail on input "part1 -- part2-->". Using this module's manyThru will succeed with the same semantics. This module's manyTill will not consume the "-->".
  • In Parsec, many (char a <* char ',') <* char a <* eof will fail on input a,a. Using this module's many will succeed. Similar results hold for many1, many_, many1_, endBy and endby1.

While we're at it, we've also re-exported applicative parsing functions and defined some of our own combinators that have been found useful. Applicative parsing is recommended over monadic parsing where it will suffice, so we'd rather eliminate the extra Control.Applicative import. Among the additional combinators defined here are dispatch, atLeast, atMost, manyNM, chomp, and between2.

Synopsis

Applicative Parsing

(<$>) :: Functor f => (a -> b) -> f a -> f b infixl 4

An infix synonym for fmap.

Examples

Convert from a Maybe Int to a Maybe String using show:

>>> show <$> Nothing
Nothing
>>> show <$> Just 3
Just "3"

Convert from an Either Int Int to an Either Int String using show:

>>> show <$> Left 17
Left 17
>>> show <$> Right 17
Right "17"

Double each element of a list:

>>> (*2) <$> [1,2,3]
[2,4,6]

Apply even to the second element of a pair:

>>> even <$> (2,2)
(2,True)

(<$$>) :: Functor f => f a -> (a -> b) -> f b infixl 4 Source

Flipped <$>.

Great for parsing infixes, e.g. addExpr = expr <$$> (+) <*> expr.

(<*>) :: Applicative f => forall a b. f (a -> b) -> f a -> f b

Sequential application.

(*>) :: Applicative f => forall a b. f a -> f b -> f b

Sequence actions, discarding the value of the first argument.

(<*) :: Applicative f => forall a b. f a -> f b -> f a

Sequence actions, discarding the value of the second argument.

(<**>) :: Applicative f => f a -> f (a -> b) -> f b infixl 4

A variant of <*> with the arguments reversed.

(<$) :: Functor f => forall a b. a -> f b -> f a

Replace all locations in the input with the same value. The default definition is fmap . const, but this may be overridden with a more efficient version.

pure :: Applicative f => forall a. a -> f a

Lift a value.

void :: Functor f => f a -> f ()

void value discards or ignores the result of evaluation, such as the return value of an IO action.

Examples

Replace the contents of a Maybe Int with unit:

>>> void Nothing
Nothing
>>> void (Just 3)
Just ()

Replace the contents of an Either Int Int with unit, resulting in an Either Int '()':

>>> void (Left 8675309)
Left 8675309
>>> void (Right 8675309)
Right ()

Replace every element of a list with unit:

>>> void [1,2,3]
[(),(),()]

Replace the second element of a pair with unit:

>>> void (1,2)
(1,())

Discard the result of an IO action:

>>> mapM print [1,2]
1
2
[(),()]
>>> void $ mapM print [1,2]
1
2

Choices

(<||>) :: Stream s m t => ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a infixl 3 Source

p <||> q tries to parse p, but if it fails, parses q.

Unlike the Alternative instance for ParsecT, backtracking will occur if p fails. That is, a parser such as string "flange" <||> string "fly" will succeed on the input "fly", whereas its Parsec counterpart will unintuitively fail.

choice :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m a Source

choice ps tries to apply the parsers in the list ps in order, until one of them succeeds. Returns the value of the succeeding parser. Unlike the Parsec version, this one ensures that parsers do not consume input if they fail.

dispatch :: Stream s m t => [(ParsecT s u m a, ParsecT s u m b)] -> ParsecT s u m b Source

Given a map of parsers to parsers, attempt each key parser until one succeeds, then perform the value parser. Return the result of the value parser.

longestOf :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m a Source

Attempt all of the passed parsers under the current conditions and return the value of the parser which makes it furthest into the input stream (and updates the parser's internals as if that were the only parser parsed).

longestOf [string "do", string "don't"]

Zero or One

option :: Stream s m t => a -> ParsecT s u m a -> ParsecT s u m a Source

option x p tries to apply parser p. If p fails, no input is consumed and x is returned.

optional :: Stream s m t => ParsecT s u m a -> ParsecT s u m (Maybe a) Source

optional p tries to parse p, but does not fail or consume input of p fails.

This is like optionMaybe, but is easier to type. See optional_.

optional_ :: Stream s m t => ParsecT s u m a -> ParsecT s u m () Source

optional_ p tries to parse p, but does not fail or consume input if p fails.

This is like Parsec's Text.Parsec.Combinator.optional', but the use of underscore is more idiomatic for actions whose results are ignored.

Many

many :: Stream s m t => ParsecT s u m a -> ParsecT s u m [a] Source

many p applies the parser p zero or more times and accumulates the result.

many1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m [a] Source

many p applies the parser p one or more times and accumulates the result.

many_ :: Stream s m t => ParsecT s u m a -> ParsecT s u m () Source

many1_ p applies the parser p zero or more times, skipping its result.

many1_ :: Stream s m t => ParsecT s u m a -> ParsecT s u m () Source

many1_ p applies the parser p one or more times, skipping its result.

count :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a]

count n p parses n occurrences of p. If n is smaller or equal to zero, the parser equals to return []. Returns a list of n values returned by p.

atLeast :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a] Source

atLeast n p applies the parser p n or more times.

atMost :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a] Source

atMost n p applies the parser p up to n times.

manyNM :: Stream s m t => Int -> Int -> ParsecT s u m a -> ParsecT s u m [a] Source

manyNM n m p applies the parser p n or more times up to m times.

manyOf :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m [a] Source

Parse zero or more of any mix of the passed parsers.

manyOf_ :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m () Source

As manyOf, but ignoring the results.

Common Structures

Terminate

manyTill :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a] Source

manyTill p end applies parser p zero or more times until parser end succeeds. Returns the list of values returned by p. The end parser does not consume input, c.f. manyThru. This parser can be used to scan comments:

 simpleComment = do { string "//"
                    ; anyChar `manyTill` char '\n'
                    }

Note that despite the overlapping parsers anyChar and char '\n', there is never a need to add a try: the end parser does not consume input on failure.

manyThru :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a] Source

manyThru p end applies parser p zero or more times until parser end succeeds. Returns the list of values returned by p. The end parser does consume input, c.f. manyTill, but is not included in the result. This parser can be used to scan comments:

 simpleComment = do { string "<!--"
                    ; anyChar `manyThru` string "-->"
                    }

Note that despite the overlapping parsers anyChar and string "-->", there is no need to add a try: the end parser does not consume input on failure.

chomp :: Stream s m t => ParsecT s u m a -> ParsecT s u m trash -> ParsecT s u m a Source

chomp p x will parse p, then, provided x succeeds, discard a subsequent parse of x. This combinator will only fail when p fails, not when x does.

Surround

between :: Stream s m t => ParsecT s u m open -> ParsecT s u m close -> ParsecT s u m a -> ParsecT s u m a

between open close p parses open, followed by p and close. Returns the value returned by p.

 braces  = between (symbol "{") (symbol "}")

between2 :: Stream s m t => ParsecT s u m around -> ParsecT s u m a -> ParsecT s u m a Source

between2 p q is equivalent to between p p q

double_quoted = between2 (char '"')

Intercalate

sepBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

sepBy p sep parses zero or more occurrences of p, separated by sep. Returns a list of values returned by p.

 commaSep p  = p `sepBy` (symbol ",")

sepBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

sepBy1 p sep parses one or more occurrences of p, separated by sep. Returns a list of values returned by p.

sepEndBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

sepEndBy p sep parses zero or more occurrences of p, separated and optionally ended by sep.

 haskellStatements  = haskellStatement `sepEndBy` semi

sepEndBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

sepEndBy1 p sep parses one or more occurrences of p, separated and optionally ended by sep.

endBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

endBy p sep parses zero or more occurrences of p, seperated and ended by sep.

  cStatements  = cStatement `endBy` semi

endBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

endBy1 p sep parses one or more occurrences of p, seperated and ended by sep.

sepAroundBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

sepAroundBy p sep parses zero or more occurrences of p, separated and optionally starting with and ended by sep.

sepAroundBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

sepAroundBy1 p sep parses one or more occurrences of p, separated and optionally starting with and ended by sep.

Chaining

chainl :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a Source

chainl p op z parses zero or more occurrences of p, separated by op. Returns a value obtained by a left associative application of all functions returned by op to the values returned by p. If there are zero occurrences of p, the value z is returned. C.f. chainr.

chainl1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a Source

chainl1 p op parses one or more occurrences of p, separated by op Returns a value obtained by a left associative application of all functions returned by op to the values returned by p. This parser can for example be used to eliminate left recursion which typically occurs in expression grammars.

 expr    = term   `chainl1` mulop
 term    = factor `chainl1` addop
 factor  = parens expr <|> integer

 mulop   =   do{ symbol "*"; return (*)   }
         <|> do{ symbol "/"; return (div) }

 addop   =   do{ symbol "+"; return (+) }
         <|> do{ symbol "-"; return (-) }

C.f. chainr1.

chainr :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a Source

chainr p op z parses zero or more occurrences of p, separated by op Returns a value obtained by a right associative application of all functions returned by op to the values returned by p. If there are no occurrences of p, the value z is returned. C.f. chainl.

chainr1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a Source

chainr1 p op parses one or more occurrences of p, separated by op Returns a value obtained by a right associative application of all functions returned by op to the values returned by p. C.f. chainl1.

Lookahead

lookAhead :: Stream s m t => ParsecT s u m a -> ParsecT s u m a Source

lookAhead p parses p without consuming any input, even if p fails.

lookAhead_ :: Stream s m t => ParsecT s u m a -> ParsecT s u m () Source

As lookAhead, but throw away result.

notFollowedBy :: (Stream s m t, Show trash) => ParsecT s u m a -> ParsecT s u m trash -> ParsecT s u m a Source

notFollowedBy p q parses p, but only when q will fail immediately after parsing p. Parsing q never consumes input, and if this combinator fails, no input is consumed.

This combinator can be used to implement the 'longest match' rule. For example, when recognizing keywords (for example let), we want to make sure that a keyword is not followed by a legal identifier character, in which case the keyword is actually an identifier (for example lets). We can program this behavior as follows:

 keywordLet = string "let" `notFollowedBy` alphaNum

atEndOfInput :: (Stream s m t, Show t) => ParsecT s u m Bool Source

Returns True if there is no input left, False if there is.

endOfInput :: (Stream s m t, Show t) => ParsecT s u m () Source

Succeed only when at the end of the input stream.

Input Stream

allInput :: (Stream s m t, Show t) => ParsecT s u m a -> ParsecT s u m a Source

Uses the passed parser, but succeeds only if it consumes all of the input.

withRemainingInput :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a, s) Source

Parse using the passed parser, but also return the input that was not consumed.

Additional Data

(<?>) :: ParsecT s u m a -> String -> ParsecT s u m a infix 0

The parser p <?> msg behaves as parser p, but whenever the parser p fails without consuming any input, it replaces expect error messages with the expect error message msg.

This is normally used at the end of a set alternatives where we want to return an error message in terms of a higher level construct rather than returning all possible characters. For example, if the expr parser from the try example would fail, the error message is: '...: expecting expression'. Without the (<?>) combinator, the message would be like '...: expecting "let" or letter', which is less friendly.

expect :: Stream s m t => String -> ParsecT s u m a -> ParsecT s u m a Source

Flipped <?>.

withPosition :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a, SourcePos) Source

Annotate the return value of the passed parser with the position just before parsing.

withPositionEnd :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a, SourcePos) Source

Annotate the return value of the passed parser with the position just after parsing.

withPositions :: Stream s m t => ParsecT s u m a -> ParsecT s u m (SourcePos, a, SourcePos) Source

Annotate the return value of the passed parser with the position just before and after parsing respectively.

Re-exports

try :: ParsecT s u m a -> ParsecT s u m a

The parser try p behaves like parser p, except that it pretends that it hasn't consumed any input when an error occurs.

This combinator is used whenever arbitrary look ahead is needed. Since it pretends that it hasn't consumed any input when p fails, the (<|>) combinator will try its second alternative even when the first parser failed while consuming input.

The try combinator can for example be used to distinguish identifiers and reserved words. Both reserved words and identifiers are a sequence of letters. Whenever we expect a certain reserved word where we can also expect an identifier we have to use the try combinator. Suppose we write:

 expr        = letExpr <|> identifier <?> "expression"

 letExpr     = do{ string "let"; ... }
 identifier  = many1 letter

If the user writes "lexical", the parser fails with: unexpected 'x', expecting 't' in "let". Indeed, since the (<|>) combinator only tries alternatives when the first alternative hasn't consumed input, the identifier parser is never tried (because the prefix "le" of the string "let" parser is already consumed). The right behaviour can be obtained by adding the try combinator:

 expr        = letExpr <|> identifier <?> "expression"

 letExpr     = do{ try (string "let"); ... }
 identifier  = many1 letter

(<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a infixr 1

This combinator implements choice. The parser p <|> q first applies p. If it succeeds, the value of p is returned. If p fails without consuming any input, parser q is tried. This combinator is defined equal to the mplus member of the MonadPlus class and the (<|>) member of Alternative.

The parser is called predictive since q is only tried when parser p didn't consume any input (i.e.. the look ahead is 1). This non-backtracking behaviour allows for both an efficient implementation of the parser combinators and the generation of good error messages.

unexpected :: Stream s m t => String -> ParsecT s u m a

The parser unexpected msg always fails with an unexpected error message msg without consuming any input.

The parsers fail, (<?>) and unexpected are the three parsers used to generate error messages. Of these, only (<?>) is commonly used. For an example of the use of unexpected, see the definition of notFollowedBy.