Safe Haskell | Safe |
---|---|
Language | Haskell98 |
Parser combinator framework specialized to lexical analysis. Tokens can be specified via simple fold functions, and we include baked in source location handling.
If you want to parse expressions instead of performing lexical
analysis then try the parsec
or attoparsec
packages, which
have more general purpose combinators.
Matchers for standard tokens like comments and strings are in the Text.Lexer.Inchworm.Char module.
No dependencies other than the Haskell base
library.
Minimal example
The following code demonstrates how to perform lexical analysis of a simple LISP-like language. We use two separate name classes, one for variables that start with a lower-case letter, and one for constructors that start with an upper case letter.
Integers are scanned using the scanInteger
function from the
Text.Lexer.Inchworm.Char module.
The result of scanStringIO
contains the list of leftover input
characters that could not be parsed. In a real lexer you should
check that this is empty to ensure there has not been a lexical
error.
import Text.Lexer.Inchworm.Char import qualified Data.Char as Char -- | A source token. data Token = KBra | KKet | KVar String | KCon String | KInt Integer deriving Show -- | A thing with attached location information. data Located a = Located FilePath Location a deriving Show -- | Scanner for a lispy language. scanner :: FilePath -> Scanner IO Location [Char] (Located Token) scanner fileName = skip Char.isSpace $ alts [ fmap (stamp id) $ accept '(' KBra , fmap (stamp id) $ accept ')' KKet , fmap (stamp KInt) $ scanInteger , fmap (stamp KVar) $ munchWord (\ix c -> if ix == 0 then Char.isLower c else Char.isAlpha c) , fmap (stamp KCon) $ munchWord (\ix c -> if ix == 0 then Char.isUpper c else Char.isAlpha c) ] where -- Stamp a token with source location information. stamp k (l, t) = Located fileName l (k t) main = do let fileName = "Source.lispy" let source = "(some (Lispy like) 26 Program 93 (for you))" toks <- scanStringIO source (scanner fileName) print toks
Synopsis
- data Source m loc input
- data Scanner m loc input a
- scanListIO :: Eq i => loc -> (i -> loc -> loc) -> [i] -> Scanner IO loc [i] a -> IO ([a], loc, [i])
- makeListSourceIO :: forall i loc. Eq i => loc -> (i -> loc -> loc) -> [i] -> IO (Source IO loc [i])
- scanSourceToList :: Monad m => Source m loc [i] -> Scanner m loc [i] a -> m ([a], loc, [i])
- satisfies :: Monad m => (Elem input -> Bool) -> Scanner m loc input (loc, Elem input)
- skip :: Monad m => (Elem input -> Bool) -> Scanner m loc input a -> Scanner m loc input a
- accept :: (Monad m, Eq (Elem input)) => Elem input -> a -> Scanner m loc input (loc, a)
- accepts :: (Monad m, Sequence input, Eq input) => input -> a -> Scanner m loc input (loc, a)
- from :: Monad m => (Elem input -> Maybe a) -> Scanner m loc input (loc, a)
- froms :: Monad m => Maybe Int -> (input -> Maybe a) -> Scanner m loc input (loc, a)
- alt :: Monad m => Scanner m loc input a -> Scanner m loc input a -> Scanner m loc input a
- alts :: Monad m => [Scanner m loc input a] -> Scanner m loc input a
- munchPred :: Monad m => Maybe Int -> (Int -> Elem input -> Bool) -> (input -> Maybe a) -> Scanner m loc input (loc, a)
- munchWord :: Monad m => (Int -> Elem input -> Bool) -> Scanner m loc input (loc, input)
- munchFold :: Monad m => Maybe Int -> (Int -> Elem input -> state -> Maybe state) -> state -> (input -> Maybe a) -> Scanner m loc input (loc, a)
Basic Types
data Source m loc input Source #
An abstract source of input tokens that we want to perform lexical analysis on.
Each token is associated with a source location loc
.
A a sequence of tokens has type input
, and a single token type (Elem
input).
data Scanner m loc input a Source #
Scanner of input tokens that produces a result value
of type a
when successful.
Instances
Monad m => Monad (Scanner m loc input) Source # | |
Monad m => Functor (Scanner m loc input) Source # | |
Monad m => Applicative (Scanner m loc input) Source # | |
Defined in Text.Lexer.Inchworm.Scanner pure :: a -> Scanner m loc input a # (<*>) :: Scanner m loc input (a -> b) -> Scanner m loc input a -> Scanner m loc input b # liftA2 :: (a -> b -> c) -> Scanner m loc input a -> Scanner m loc input b -> Scanner m loc input c # (*>) :: Scanner m loc input a -> Scanner m loc input b -> Scanner m loc input b # (<*) :: Scanner m loc input a -> Scanner m loc input b -> Scanner m loc input a # |
Generic Scanning
:: Eq i | |
=> loc | Starting source location. |
-> (i -> loc -> loc) | Function to bump the current location by one input token. |
-> [i] | List of input tokens. |
-> Scanner IO loc [i] a | Scanner to apply. |
-> IO ([a], loc, [i]) |
Scan a list of generic input tokens in the IO monad, returning the source location of the final input token, along with the remaining input.
NOTE: If you just want to scan a String
of characters
use scanStringIO
from Text.Lexer.Inchworm.Char
Source Construction
:: Eq i | |
=> loc | Starting source location. |
-> (i -> loc -> loc) | Function to bump the current location by one input token. |
-> [i] | List of input tokens. |
-> IO (Source IO loc [i]) |
Make a source from a list of input tokens, maintaining the state in the IO monad.
Scanner Evaluation
scanSourceToList :: Monad m => Source m loc [i] -> Scanner m loc [i] a -> m ([a], loc, [i]) Source #
Apply a scanner to a source of input tokens, where the tokens are represented as a lazy list.
The result values are also produced in a lazy list.
Combinators
Basic
satisfies :: Monad m => (Elem input -> Bool) -> Scanner m loc input (loc, Elem input) Source #
Accept the next token if it matches the given predicate, returning that token as the result.
skip :: Monad m => (Elem input -> Bool) -> Scanner m loc input a -> Scanner m loc input a Source #
Skip tokens that match the given predicate, before applying the given argument scanner.
When lexing most source languages you can use this to skip whitespace.
Accept
accept :: (Monad m, Eq (Elem input)) => Elem input -> a -> Scanner m loc input (loc, a) Source #
Accept the next input token if it is equal to the given one,
and return a result of type a
.
accepts :: (Monad m, Sequence input, Eq input) => input -> a -> Scanner m loc input (loc, a) Source #
Accept a fixed length sequence of tokens that match the
given sequence, and return a result of type a
.
From
from :: Monad m => (Elem input -> Maybe a) -> Scanner m loc input (loc, a) Source #
Use the given function to check whether to accept the next token, returning the result it produces.
froms :: Monad m => Maybe Int -> (input -> Maybe a) -> Scanner m loc input (loc, a) Source #
Use the given function to check whether to accept a fixed length sequence of tokens, returning the result it produces.
Alternation
alt :: Monad m => Scanner m loc input a -> Scanner m loc input a -> Scanner m loc input a Source #
Combine two argument scanners into a result scanner, where the first argument scanner is tried before the second.
alts :: Monad m => [Scanner m loc input a] -> Scanner m loc input a Source #
Combine a list of argumenet scanners a result scanner, where each argument scanner is tried in turn until we find one that matches (or not).
Munching
:: Monad m | |
=> Maybe Int | Maximum number of tokens to consider,
or |
-> (Int -> Elem input -> Bool) | Predicate to decide whether to consider the next input token, also passed the index of the token in the prefix. |
-> (input -> Maybe a) | Take the prefix of input tokens and decide whether to produce a result value. |
-> Scanner m loc input (loc, a) | Scan a prefix of tokens of type |
Munch input tokens, using a predicate to select the prefix to consider.
Given munch (Just n) match accept
, we select a contiguous sequence
of tokens up to length n using the predicate match
, then pass
that sequence to accept
to make a result value out of it.
If match
selects no tokens, or accept
returns Nothing
then the scanner fails and no tokens are consumed from the source.
For example, to scan natural numbers use:
scanNat :: Monad m => Scanner m loc [Char] (loc, Integer) scanNat = munchPred Nothing match accept where match _ c = isDigit c accept cs = Just (read cs)
To match Haskell style constructor names use:
scanCon :: Monad m => Scanner m loc [Char] (loc, String) scanCon = munchPred Nothing match accept where match 0 c = isUpper c match _ c = isAlphaNum c accept cs = Just cs
If you want to detect built-in constructor names like Int
and Float
then you can do it in the accept
function and produce a different
result constructor for each one.
:: Monad m | |
=> (Int -> Elem input -> Bool) | Predicate to decide whether to accept the next input token, also passed the index of the token in the prefix. |
-> Scanner m loc input (loc, input) |
Like munchPred
, but we accept prefixes of any length,
and always accept the input tokens that match.
:: Monad m | |
=> Maybe Int | Maximum number of tokens to consider,
or |
-> (Int -> Elem input -> state -> Maybe state) | Fold function to decide whether to consider the next
input token. The next token will be considered if
the function produces a |
-> state | Initial state for the fold. |
-> (input -> Maybe a) | Take the prefix of input tokens and decide whether to produce a result value. |
-> Scanner m loc input (loc, a) | Scan a prefix of tokens of type |
Like munchPred
, but we can use a fold function to select the
prefix of tokens to consider. This is useful when lexing comments,
and string literals where consecutive tokens can have special meaning
(ie escaped quote characters).
See the source of scanHaskellChar
in the Text.Lexer.Inchworm.Char,
module for an example of its usage.