| Safe Haskell | Safe | 
|---|---|
| Language | Haskell98 | 
Text.XML.HaXml.Lex
Description
You don't normally need to use this Lex module directly - it is called automatically by the parser. (This interface is only exposed for debugging purposes.)
This is a hand-written lexer for tokenising the text of an XML
 document so that it is ready for parsing.  It attaches position
 information in (line,column) format to every token.  The main
 entry point is xmlLex.  A secondary entry point, xmlReLex, is
 provided for when the parser needs to stuff a string back onto
 the front of the text and re-tokenise it (typically when expanding
 macros).
As one would expect, the lexer is essentially a small finite state machine.
Synopsis
- xmlLex :: String -> String -> [Token]
- xmlReLex :: Posn -> String -> [Token]
- reLexEntityValue :: (String -> Maybe String) -> Posn -> String -> [Token]
- type Token = (Posn, TokenT)
- data TokenT- = TokCommentOpen
- | TokCommentClose
- | TokPIOpen
- | TokPIClose
- | TokSectionOpen
- | TokSectionClose
- | TokSection Section
- | TokSpecialOpen
- | TokSpecial Special
- | TokEndOpen
- | TokEndClose
- | TokAnyOpen
- | TokAnyClose
- | TokSqOpen
- | TokSqClose
- | TokEqual
- | TokQuery
- | TokStar
- | TokPlus
- | TokAmp
- | TokSemi
- | TokHash
- | TokBraOpen
- | TokBraClose
- | TokPipe
- | TokPercent
- | TokComma
- | TokQuote
- | TokName String
- | TokFreeText String
- | TokNull
- | TokError String
 
- data Special
- data Section
Entry points to the lexer
xmlLex :: String -> String -> [Token] Source #
The first argument to xmlLex is the filename (used for source positions,
   especially in error messages), and the second is the string content of
   the XML file.
xmlReLex :: Posn -> String -> [Token] Source #
xmlReLex is used when the parser expands a macro (PE reference).
    The expansion of the macro must be re-lexed as if for the first time.
reLexEntityValue :: (String -> Maybe String) -> Posn -> String -> [Token] Source #
reLexEntityValue is used solely within parsing an entityvalue.
   Normally, a PERef is logically separated from its surroundings by
   whitespace.  But in an entityvalue, a PERef can be juxtaposed to
   an identifier, so the expansion forms a new identifier.
   Thus the need to rescan the whole text for possible PERefs.
Token types
type Token = (Posn, TokenT) Source #
All tokens are paired up with a source position.
   Lexical errors are passed back as a special TokenT value.
The basic token type.
Constructors
| TokCommentOpen | <!-- | 
| TokCommentClose | 
 | 
| TokPIOpen | <? | 
| TokPIClose | ?> | 
| TokSectionOpen | <![ | 
| TokSectionClose | ]]> | 
| TokSection Section | CDATA INCLUDE IGNORE etc | 
| TokSpecialOpen | <! | 
| TokSpecial Special | DOCTYPE ELEMENT ATTLIST etc | 
| TokEndOpen | </ | 
| TokEndClose | /> | 
| TokAnyOpen | < | 
| TokAnyClose | |
| TokSqOpen | [ | 
| TokSqClose | ] | 
| TokEqual | = | 
| TokQuery | ? | 
| TokStar | * | 
| TokPlus | + | 
| TokAmp | & | 
| TokSemi | ; | 
| TokHash | # | 
| TokBraOpen | ( | 
| TokBraClose | ) | 
| TokPipe | | | 
| TokPercent | % | 
| TokComma | , | 
| TokQuote | '' or "" | 
| TokName String | begins with letter, no spaces | 
| TokFreeText String | any character data | 
| TokNull | fake token | 
| TokError String | lexical error |