Copyright	(c) Daan Leijen 1999-2001, (c) Paolo Martini 2007
License	BSD-style (see the LICENSE file)
Maintainer	derek.a.elkins@gmail.com
Stability	provisional
Portability	non-portable (uses local universal quantification: PolymorphicComponents)
Safe Haskell	Safe-Inferred
Language	Haskell98

Text.Parsec.Token

Description

A helper module to parse lexical elements (tokens). See makeTokenParser for a description of how to use it.

Documentation

type LanguageDef st = GenLanguageDef String st Identity

data GenLanguageDef s u m

The GenLanguageDef type is a record that contains all parameterizable features of the Text.Parsec.Token module. The module Text.Parsec.Language contains some default definitions.

Constructors

LanguageDef

Fields

commentStart :: String: Describes the start of a block comment. Use the empty string if the language doesn't support block comments. For example "/*".
commentEnd :: String: Describes the end of a block comment. Use the empty string if the language doesn't support block comments. For example "*/".
commentLine :: String: Describes the start of a line comment. Use the empty string if the language doesn't support line comments. For example "//".
nestedComments :: Bool: Set to True if the language supports nested block comments.
identStart :: ParsecT s u m Char: This parser should accept any start characters of identifiers. For example letter <|> char '_'.
identLetter :: ParsecT s u m Char: This parser should accept any legal tail characters of identifiers. For example alphaNum <|> char '_'.
opStart :: ParsecT s u m Char: This parser should accept any start characters of operators. For example oneOf ":!#$%&*+./<=>?@\\^|-~".
opLetter :: ParsecT s u m Char: This parser should accept any legal tail characters of operators. Note that this parser should be defined even if the language doesn't support user-defined operators; otherwise the reservedOp parser won't work correctly.
reservedNames :: [String]: The list of reserved identifiers.
reservedOpNames :: [String]: The list of reserved operators.
caseSensitive :: Bool: Set to True if the language is case sensitive.
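
The fields above can be filled in by record update on one of the defaults from Text.Parsec.Language. The following is a minimal sketch, assuming a hypothetical C-like toy language; the name toyDef, the operator characters and the reserved words are illustrative choices, not part of the library.

```haskell
import Text.Parsec (alphaNum, char, letter, oneOf, (<|>))
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as P

-- Hypothetical language definition for a small C-like language.
-- Every concrete value below is an assumption made for illustration.
toyDef :: P.LanguageDef st
toyDef = emptyDef
  { P.commentStart    = "/*"
  , P.commentEnd      = "*/"
  , P.commentLine     = "//"
  , P.nestedComments  = False
  , P.identStart      = letter <|> char '_'
  , P.identLetter     = alphaNum <|> char '_'
  , P.opStart         = oneOf "+-*/=<>"
  , P.opLetter        = oneOf "+-*/=<>"
  , P.reservedNames   = ["if", "then", "else", "let"]
  , P.reservedOpNames = ["=", "+", "-", "*", "/"]
  , P.caseSensitive   = True
  }
```

Starting from emptyDef keeps the record complete even if only a few fields are overridden.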

type TokenParser st = GenTokenParser String st Identity

data GenTokenParser s u m

The type of the record that holds lexical parsers that work on s streams with state u over a monad m.

Constructors

TokenParser

Fields

identifier :: ParsecT s u m String

This lexeme parser parses a legal identifier. Returns the identifier string. This parser will fail on identifiers that are reserved words. Legal identifier (start) characters and reserved words are defined in the LanguageDef that is passed to makeTokenParser. An identifier is treated as a single token using try.

reserved :: String -> ParsecT s u m ()

The lexeme parser reserved name parses symbol name, but it also checks that the name is not a prefix of a valid identifier. A reserved word is treated as a single token using try.
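
The interplay between identifier and reserved can be illustrated with a small sketch. It assumes a language whose only reserved word is "let"; the helper accepts is hypothetical and exists only to make the outcomes easy to inspect.

```haskell
import Text.Parsec (eof, parse)
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

-- Assumption: a language whose only reserved word is "let".
lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef { P.reservedNames = ["let"] }

identifier :: Parser String
identifier = P.identifier lexer

reserved :: String -> Parser ()
reserved = P.reserved lexer

-- Hypothetical helper: True when the parser succeeds and consumes all input.
accepts :: Parser a -> String -> Bool
accepts p input = either (const False) (const True) (parse (p <* eof) "" input)

-- accepts identifier       "letter"  -- True:  not a reserved word
-- accepts identifier       "let"     -- False: reserved words are rejected
-- accepts (reserved "let") "let  "   -- True:  trailing white space is skipped
-- accepts (reserved "let") "letter"  -- False: "let" is only a prefix here
```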

operator :: ParsecT s u m String

This lexeme parser parses a legal operator. Returns the name of the operator. This parser will fail on any operators that are reserved operators. Legal operator (start) characters and reserved operators are defined in the LanguageDef that is passed to makeTokenParser. An operator is treated as a single token using try.

reservedOp :: String -> ParsecT s u m ()

The lexeme parser reservedOp name parses symbol name, but it also checks that the name is not a prefix of a valid operator. A reservedOp is treated as a single token using try.
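
The same pattern applies to operators. The sketch below assumes "=" is the only reserved operator and relies on the opStart and opLetter defaults of emptyDef; accepts is again a hypothetical helper.

```haskell
import Text.Parsec (eof, parse)
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

-- Assumption: "=" is the only reserved operator.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef { P.reservedOpNames = ["="] }

operator :: Parser String
operator = P.operator lexer

reservedOp :: String -> Parser ()
reservedOp = P.reservedOp lexer

-- Hypothetical helper: True when the parser succeeds and consumes all input.
accepts :: Parser a -> String -> Bool
accepts p input = either (const False) (const True) (parse (p <* eof) "" input)

-- accepts operator         "=="  -- True:  an ordinary operator
-- accepts operator         "="   -- False: reserved operators are rejected
-- accepts (reservedOp "=") "= "  -- True
-- accepts (reservedOp "=") "=="  -- False: "=" is only a prefix of "=="
```

The last case is why opLetter has to be defined even when the language has no user-defined operators: reservedOp uses it to detect the end of the operator.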

charLiteral :: ParsecT s u m Char

This lexeme parser parses a single literal character. Returns the literal character value. This parser deals correctly with escape sequences. The literal character is parsed according to the grammar rules defined in the Haskell report (which matches most programming languages quite closely).

stringLiteral :: ParsecT s u m String

This lexeme parser parses a literal string. Returns the literal string value. This parser deals correctly with escape sequences and gaps. The literal string is parsed according to the grammar rules defined in the Haskell report (which matches most programming languages quite closely).
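
A short sketch of how the literal parsers treat escape sequences and string gaps, assuming the emptyDef language definition. The helper run is hypothetical; it discards the error message and keeps only the result.

```haskell
import Text.Parsec (parse)
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

stringLiteral :: Parser String
stringLiteral = P.stringLiteral lexer

charLiteral :: Parser Char
charLiteral = P.charLiteral lexer

-- Hypothetical helper: Just the result on success, Nothing on a parse error.
run :: Parser a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "" input)

-- The inputs are written as Haskell strings, so each backslash is doubled.
-- run stringLiteral "\"a\\tb\""       -- Just "a\tb"
-- run stringLiteral "\"ab\\  \\cd\""  -- Just "abcd" (the gap is dropped)
-- run charLiteral   "'\\n'"           -- Just '\n'
```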

natural :: ParsecT s u m Integer

This lexeme parser parses a natural number (a non-negative whole number). Returns the value of the number. The number can be specified in decimal, hexadecimal or octal. The number is parsed according to the grammar rules in the Haskell report.

integer :: ParsecT s u m Integer

This lexeme parser parses an integer (a whole number). This parser is like natural except that it can be prefixed with a sign (i.e. '-' or '+'). Returns the value of the number. The number can be specified in decimal, hexadecimal or octal. The number is parsed according to the grammar rules in the Haskell report.

float :: ParsecT s u m Double

This lexeme parser parses a floating point value. Returns the value of the number. The number is parsed according to the grammar rules defined in the Haskell report.

naturalOrFloat :: ParsecT s u m (Either Integer Double)

This lexeme parser parses either a natural or a float. Returns the value of the number. This parser deals with any overlap in the grammar rules for naturals and floats. The number is parsed according to the grammar rules defined in the Haskell report.
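
The numeric lexeme parsers can be compared side by side. This is a sketch assuming the emptyDef language definition; run is a hypothetical helper that keeps only the result.

```haskell
import Text.Parsec (parse)
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

natural, integer :: Parser Integer
natural = P.natural lexer
integer = P.integer lexer

float :: Parser Double
float = P.float lexer

naturalOrFloat :: Parser (Either Integer Double)
naturalOrFloat = P.naturalOrFloat lexer

-- Hypothetical helper: Just the result on success, Nothing on a parse error.
run :: Parser a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "" input)

-- run natural        "0x1F"  -- Just 31
-- run natural        "0o17"  -- Just 15
-- run integer        "-7"    -- Just (-7)
-- run float          "2.5"   -- Just 2.5
-- run naturalOrFloat "42"    -- Just (Left 42)
-- run naturalOrFloat "2.5"   -- Just (Right 2.5)
```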

decimal :: ParsecT s u m Integer

Parses a non-negative whole number in the decimal system. Returns the value of the number.

hexadecimal :: ParsecT s u m Integer

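Parses a non-negative whole number in the hexadecimal system. The number should be prefixed with "x" or "X"; the leading '0' of a literal such as "0x1F" is consumed by natural before this parser is applied. Returns the value of the number.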

octal :: ParsecT s u m Integer

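Parses a non-negative whole number in the octal system. The number should be prefixed with "o" or "O"; the leading '0' of a literal such as "0o17" is consumed by natural before this parser is applied. Returns the value of the number.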

symbol :: String -> ParsecT s u m String

Lexeme parser symbol s parses string s and skips trailing white space.

lexeme :: forall a. ParsecT s u m a -> ParsecT s u m a

lexeme p first applies parser p and then the whiteSpace parser, returning the value of p. Every lexical token (lexeme) is defined using lexeme, so every parse starts at a point without white space. Parsers that use lexeme are called lexeme parsers in this document.

The only point where the whiteSpace parser should be called explicitly is the start of the main parser in order to skip any leading white space.

   -- digitToInt is imported from Data.Char
   mainParser  = do{ whiteSpace
                    ; ds <- many (lexeme digit)
                    ; eof
                    ; return (sum (map digitToInt ds))
                    }

whiteSpace :: ParsecT s u m ()

Parses any white space. White space consists of zero or more occurrences of a space, a line comment or a block (multi line) comment. Block comments may be nested if nestedComments is set to True. How comments are started and ended is defined in the LanguageDef that is passed to makeTokenParser.
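
A sketch of whiteSpace skipping comments between tokens. The comment delimiters are an assumption chosen for illustration (C-style, with nesting enabled), and numbers is a hypothetical parser.

```haskell
import Text.Parsec (eof, many, parse)
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

-- Assumption: C-style comment delimiters with nested block comments.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef
  { P.commentStart   = "/*"
  , P.commentEnd     = "*/"
  , P.commentLine    = "//"
  , P.nestedComments = True
  }

-- Skip leading white space once; every natural skips what follows it.
numbers :: Parser [Integer]
numbers = P.whiteSpace lexer *> many (P.natural lexer) <* eof

-- Hypothetical helper: Just the result on success, Nothing on a parse error.
run :: Parser a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "" input)

-- run numbers "  /* a /* nested */ b */ 1 // x\n 2"  -- Just [1,2]
```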

parens :: forall a. ParsecT s u m a -> ParsecT s u m a

Lexeme parser parens p parses p enclosed in parentheses ('(' and ')'), returning the value of p.

braces :: forall a. ParsecT s u m a -> ParsecT s u m a

Lexeme parser braces p parses p enclosed in braces ('{' and '}'), returning the value of p.

angles :: forall a. ParsecT s u m a -> ParsecT s u m a

Lexeme parser angles p parses p enclosed in angle brackets ('<' and '>'), returning the value of p.

brackets :: forall a. ParsecT s u m a -> ParsecT s u m a

Lexeme parser brackets p parses p enclosed in brackets ('[' and ']'), returning the value of p.

squares :: forall a. ParsecT s u m a -> ParsecT s u m a

DEPRECATED: Use brackets.

semi :: ParsecT s u m String

Lexeme parser semi parses the character ';' and skips any trailing white space. Returns the string ";".

comma :: ParsecT s u m String

Lexeme parser comma parses the character ',' and skips any trailing white space. Returns the string ",".

colon :: ParsecT s u m String

Lexeme parser colon parses the character ':' and skips any trailing white space. Returns the string ":".

dot :: ParsecT s u m String

Lexeme parser dot parses the character '.' and skips any trailing white space. Returns the string ".".

semiSep :: forall a. ParsecT s u m a -> ParsecT s u m [a]

Lexeme parser semiSep p parses zero or more occurrences of p separated by semi. Returns a list of values returned by p.

semiSep1 :: forall a. ParsecT s u m a -> ParsecT s u m [a]

Lexeme parser semiSep1 p parses one or more occurrences of p separated by semi. Returns a list of values returned by p.

commaSep :: forall a. ParsecT s u m a -> ParsecT s u m [a]

Lexeme parser commaSep p parses zero or more occurrences of p separated by comma. Returns a list of values returned by p.

commaSep1 :: forall a. ParsecT s u m a -> ParsecT s u m [a]

Lexeme parser commaSep1 p parses one or more occurrences of p separated by comma. Returns a list of values returned by p.
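
The bracketing and separator combinators compose naturally. The sketch below assumes the emptyDef language definition; intList, names and run are hypothetical names introduced for the example.

```haskell
import Text.Parsec (parse)
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- A bracketed, comma-separated list of integers; may be empty.
intList :: Parser [Integer]
intList = P.brackets lexer (P.commaSep lexer (P.integer lexer))

-- One or more identifiers separated by semicolons.
names :: Parser [String]
names = P.semiSep1 lexer (P.identifier lexer)

-- Hypothetical helper: Just the result on success, Nothing on a parse error.
run :: Parser a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "" input)

-- run intList "[1, -2 , 3]"  -- Just [1,-2,3]
-- run intList "[ ]"          -- Just []
-- run names   "a; b;c"       -- Just ["a","b","c"]
-- run names   ""             -- Nothing (semiSep1 needs at least one item)
```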

makeTokenParser :: Stream s m Char => GenLanguageDef s u m -> GenTokenParser s u m

The expression makeTokenParser language creates a GenTokenParser record that contains lexical parsers that are defined using the definitions in the language record.

The use of this function is quite stylized: one imports the appropriate language definition and selects the lexical parsers that are needed from the resulting GenTokenParser.

 module Main where

 import Text.Parsec
 import qualified Text.Parsec.Token as P
 import Text.Parsec.Language (haskellDef)

 -- The parser
 ...

 expr  =   parens expr
       <|> identifier
       <|> ...
      

 -- The lexer
 lexer       = P.makeTokenParser haskellDef    
     
 parens      = P.parens lexer
 braces      = P.braces lexer
 identifier  = P.identifier lexer
 reserved    = P.reserved lexer
 ...
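
For reference, one way the skeleton above might be completed into a module that loads as written. The grammar (an identifier wrapped in any number of parentheses) and the name parseExpr are assumptions made for the example; the elided parts of the original are not reconstructed here.

```haskell
import Text.Parsec
import Text.Parsec.Language (haskellDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

-- The lexer. The type signature fixes the user state to ().
lexer :: P.TokenParser ()
lexer = P.makeTokenParser haskellDef

parens :: Parser a -> Parser a
parens = P.parens lexer

identifier :: Parser String
identifier = P.identifier lexer

whiteSpace :: Parser ()
whiteSpace = P.whiteSpace lexer

-- The parser. Hypothetical grammar: an identifier in nested parentheses.
expr :: Parser String
expr = parens expr <|> identifier

-- Skip leading white space once, then require the whole input to be consumed.
parseExpr :: String -> Either ParseError String
parseExpr = parse (whiteSpace *> expr <* eof) "<input>"

-- parseExpr " ( ( foo ) ) -- done"  succeeds with "foo"
-- parseExpr "let"                   fails: "let" is reserved in haskellDef
```

Giving lexer and the selected parsers explicit type signatures avoids ambiguity errors caused by the monomorphism restriction.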