Copyright | (c) Sebastian Tee 2023 |
---|---|
License | MIT |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Synopsis
- type Grammar token = [GrammarRule token]
- data GrammarRule token
- type Lexer token = String -> Either LexException [token]
- data LexException
- hlex :: Grammar token -> Lexer token
Example
Here is an example module for a simple language.
module ExampleLang ( MyToken(..) -- Export the language's tokens and the lexer , myLexer ) where import Hlex data MyToken = Ident String -- String identifier token | Number Float -- Number token and numeric value | Assign -- Assignment operator token deriving(Show) myGrammar :: Grammar MyToken myGrammar = [ Error ""[^"]*n" "Can't have a new line in a string" -- Return Exception when a new line occurs in a string , Tokenize ""[^"]*"" $ Str . init . tail -- Encode string and strip the containing quotes , JustToken "=" Assign -- "=" Operator becomes the assign token , Tokenize "[a-zA-Z]+" (match -> Ident match) -- Identifier token with string , Tokenize "[0-9]+(\.[0-9]+)?" (match -> Number (read match) -- Number token with the parsed numeric value stored as a Float , Skip "[ \n\r\t]+" -- Skip whitespace ] myLexer :: Lexer MyToken myLexer = hlex myGrammar -- hlex turns a Grammar into a Lexer
Here is the lexer being used on a simple program.
>>>
lexer "x = 1.2"
Right [Ident "x", Assign, Number 1.2]
Here is the lexer being used on an program with a syntax error.
>>>
lexer "x = \"a\nb\""
Left (MatchedException 1 5 "\"a\n" "Can't have a new line in a string")
The lexer uses Either
. Right means the lexer successfully parsed the program to a list of MyTokens.
If Left was returned it would be a LexException
.
Types
type Grammar token = [GrammarRule token] Source #
Lexical grammar made up of GrammarRule
s.
The order is important. The Lexer
will apply each GrammarRule
rule in the order listed.
data GrammarRule token Source #
These are the individual rules that make up a Grammar
.
Takes a POSIX regular expression then converts it to a token or skips it.
type Lexer token = String -> Either LexException [token] Source #
Converts a string into a list of tokens.
If the string does not follow the Lexer's Grammar
a LexException
will be returned.
Exceptions
data LexException Source #
Exception thrown when a Lexer
encounters an error when lexxing a string.
UnmatchedException | Exception thrown when a substring cannot be matched. |
MatchedException | Exception thrown when a macth is found on the |
Instances
Read LexException Source # | |
Show LexException Source # | |
Eq LexException Source # | |
Defined in Hlex (==) :: LexException -> LexException -> Bool Source # (/=) :: LexException -> LexException -> Bool Source # |