Copyright | (c) Alec Theriault 2017-2018 |
---|---|
License | BSD-style |
Maintainer | alec.theriault@gmail.com |
Stability | experimental |
Portability | portable |
Safe Haskell | None |
Language | Haskell2010 |
As much as possible, this follows Rust's choices for tokenization, including punting some things to
the parser. For instance, the last two `>` in `Vec<Option<i32>>` are lexed as a single
`GreaterGreater` token while the last two tokens of `Vec<Option<Option<i32>>>` are
`GreaterGreater` and `Greater`.
Yet weirder (but very useful in parsing for dealing with conflicts and precedences of logical and,
bitwise and, and unary reference), `&&&x&&&y` lexes into `AmpersandAmpersand`, `Ampersand`,
`IdentTok "x"`, `AmpersandAmpersand`, `Ampersand`, `IdentTok "y"`. Although the parser sometimes
needs to "break apart" tokens, it never has to think about putting them together. That means it can
easily figure out that `&&&x&&&y` parses as `&(&(&x)) && (&y)` and not `&(&(&x)) & (&(&y))` even if
bitwise conjunctions bind more tightly than logical conjunctions.
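
The greedy, longest-match behaviour described above can be illustrated with a small standalone sketch. It is not the library's lexer: `Tok`, `operators`, and `lexToy` are hypothetical names, the token type is a local toy that only borrows constructor names from this module, and only a handful of operators are covered.

```haskell
import Data.Char (isAlpha, isAlphaNum)
import Data.List (isPrefixOf)

-- Toy token type (an assumption for illustration, not the library's Token).
data Tok
  = Greater | GreaterGreater | GreaterEqual | GreaterGreaterEqual
  | Less | LessLess
  | Ampersand | AmpersandAmpersand
  | Ident String
  deriving (Eq, Show)

-- Operator spellings, longest first, so the first match is the longest one.
operators :: [(String, Tok)]
operators =
  [ (">>=", GreaterGreaterEqual)
  , (">>", GreaterGreater), (">=", GreaterEqual)
  , ("<<", LessLess), ("&&", AmpersandAmpersand)
  , (">", Greater), ("<", Less), ("&", Ampersand)
  ]

-- Lex greedily from left to right; Nothing signals an unexpected character.
lexToy :: String -> Maybe [Tok]
lexToy [] = Just []
lexToy s@(c:cs)
  | c == ' ' = lexToy cs
  | isAlpha c || c == '_' =
      let (name, rest) = span (\x -> isAlphaNum x || x == '_') s
      in (Ident name :) <$> lexToy rest
  | otherwise =
      case [ (tok, drop (length op) s) | (op, tok) <- operators, op `isPrefixOf` s ] of
        ((tok, rest):_) -> (tok :) <$> lexToy rest
        []              -> Nothing
```

With this sketch, `lexToy "&&&x&&&y"` yields `AmpersandAmpersand`, `Ampersand`, `Ident "x"`, `AmpersandAmpersand`, `Ampersand`, `Ident "y"`, matching the token sequence described above.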
This sort of ambiguity, where one token needs to be broken up by the parser, occurs for
- `&&` in patterns like `&&mut x`
- `||` in closures with no arguments like `|| x`
- `<<` in qualified type paths like `FromIterator<<A as IntoIterator>::Item>`
- `>>` in qualified paths like `<Self as Foo<T>>::Bar`
- `>=` in equality predicates like `F<A>=i32`
- `>>=` in equality predicates like `F<G<A>>=i32`
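
One way a parser might break such tokens apart is sketched below. This is an assumption about the general technique, not a description of how this package's parser does it: `Tok`, `splitTok`, and `expect` are hypothetical names, and the token stream is modelled as a plain list.

```haskell
-- Toy token type (an assumption for illustration, not the library's Token).
data Tok
  = Equal | Less | Greater | Ampersand | Pipe
  | AmpersandAmpersand | PipePipe | LessLess | GreaterGreater
  | GreaterEqual | GreaterGreaterEqual
  deriving (Eq, Show)

-- Split a compound token into the token for its first character and the
-- token for whatever is left over.
splitTok :: Tok -> Maybe (Tok, Tok)
splitTok AmpersandAmpersand  = Just (Ampersand, Ampersand)
splitTok PipePipe            = Just (Pipe, Pipe)
splitTok LessLess            = Just (Less, Less)
splitTok GreaterGreater      = Just (Greater, Greater)
splitTok GreaterEqual        = Just (Greater, Equal)
splitTok GreaterGreaterEqual = Just (Greater, GreaterEqual)
splitTok _                   = Nothing

-- Consume one expected token from the front of the stream. If the head is a
-- compound token that starts with the expected one, split it and push the
-- remainder back onto the stream.
expect :: Tok -> [Tok] -> Maybe [Tok]
expect want (t:ts)
  | t == want = Just ts
  | Just (first, rest) <- splitTok t, first == want = Just (rest : ts)
expect _ _ = Nothing
```

For `F<G<A>>=i32`, closing the two generic argument lists amounts to `expect Greater [GreaterGreaterEqual] >>= expect Greater`, which leaves `[Equal]` for the equality predicate.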
- lexToken :: P (Spanned Token)
- lexNonSpace :: P (Spanned Token)
- lexTokens :: P (Spanned Token) -> P [Spanned Token]
- lexShebangLine :: P (Maybe String)
- data Token
- = Equal
- | Less
- | Greater
- | Ampersand
- | Pipe
- | Exclamation
- | Tilde
- | Plus
- | Minus
- | Star
- | Slash
- | Percent
- | Caret
- | GreaterEqual
- | GreaterGreaterEqual
- | AmpersandAmpersand
- | PipePipe
- | LessLess
- | GreaterGreater
- | EqualEqual
- | NotEqual
- | LessEqual
- | LessLessEqual
- | MinusEqual
- | AmpersandEqual
- | PipeEqual
- | PlusEqual
- | StarEqual
- | SlashEqual
- | CaretEqual
- | PercentEqual
- | At
- | Dot
- | DotDot
- | DotDotEqual
- | DotDotDot
- | Comma
- | Semicolon
- | Colon
- | ModSep
- | RArrow
- | LArrow
- | FatArrow
- | Pound
- | Dollar
- | Question
- | OpenDelim !Delim
- | CloseDelim !Delim
- | LiteralTok LitTok (Maybe Name)
- | IdentTok Ident
- | Underscore
- | LifetimeTok Ident
- | Space Space Name
- | Doc String !AttrStyle !Bool
- | Shebang
- | Eof
- | Interpolated (Nonterminal Span)
- lexicalError :: P a
Lexing
lexToken :: P (Spanned Token)
Lexer for one `Token`. The only token this cannot produce is `Interpolated`.
lexNonSpace :: P (Spanned Token)
Lexer for one non-whitespace `Token`. The only tokens this cannot produce are `Interpolated`
and `Space` (which includes comments that aren't doc comments).
lexTokens :: P (Spanned Token) -> P [Spanned Token]
Apply the given lexer repeatedly until (but not including) the `Eof` token. Meant for debugging
purposes - in general this defeats the point of a threaded lexer.
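
The "repeat until `Eof`" behaviour can be pictured with the following standalone sketch. It does not use the library's `P` monad: `Tok`, `Lexer`, `lexAll`, and `nextWord` are hypothetical names, and a lexer is modelled as a plain function from the remaining input to a token plus the rest. Collecting every token up front like this is exactly what a threaded lexer avoids, since the parser would otherwise ask for tokens one at a time.

```haskell
-- Toy token type (an assumption for illustration, not the library's Token).
data Tok = Word String | Eof
  deriving (Eq, Show)

-- A one-token lexer: consumes some input, returns the token and the rest.
type Lexer = String -> (Tok, String)

-- Run the one-token lexer until it yields Eof; the Eof is not included.
lexAll :: Lexer -> String -> [Tok]
lexAll next input =
  case next input of
    (Eof, _)    -> []
    (tok, rest) -> tok : lexAll next rest

-- Hypothetical one-token lexer: space-separated words.
nextWord :: Lexer
nextWord input =
  case break (== ' ') (dropWhile (== ' ') input) of
    ("", _)      -> (Eof, "")
    (word, rest) -> (Word word, rest)
```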
lexShebangLine :: P (Maybe String)
Lex the first line, if it immediately starts with `#!` (but not `#![` - that should be an
inner attribute). If this fails to find a shebang line, it consumes no input.
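
The rule above can be sketched as a pure function on strings. This is only an illustration: `shebangLine` is a hypothetical name, and whether the real `lexShebangLine` includes the leading `#!` in its result or consumes the trailing newline is not stated here, so the sketch assumes it drops the `#!` and leaves the newline in place.

```haskell
import Data.List (isPrefixOf)

-- Returns the contents of the shebang line (without the leading "#!") and
-- the remaining input. When there is no shebang line, the input is returned
-- untouched, mirroring "consumes no input".
shebangLine :: String -> (Maybe String, String)
shebangLine input
  | "#![" `isPrefixOf` input = (Nothing, input)  -- inner attribute, not a shebang
  | "#!" `isPrefixOf` input =
      let (line, rest) = break (== '\n') (drop 2 input)
      in (Just line, rest)
  | otherwise = (Nothing, input)
```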
Tokens
A general token (based on `syntax::parse::token::Token`).
Unlike its `libsyntax` counterpart, `Token` has folded in `syntax::parse::token::BinOpToken`
and `syntax::parse::token::BinOpEqToken` as regular tokens.
Constructor | Description |
---|---|
Equal | `=` |
Less | `<` |
Greater | `>` |
Ampersand | `&` |
Pipe | `\|` |
Exclamation | `!` |
Tilde | `~` |
Plus | `+` |
Minus | `-` |
Star | `*` |
Slash | `/` |
Percent | `%` |
Caret | `^` |
GreaterEqual | `>=` |
GreaterGreaterEqual | `>>=` |
AmpersandAmpersand | `&&` |
PipePipe | `\|\|` |
LessLess | `<<` |
GreaterGreater | `>>` |
EqualEqual | `==` |
NotEqual | `!=` |
LessEqual | `<=` |
LessLessEqual | `<<=` |
MinusEqual | `-=` |
AmpersandEqual | `&=` |
PipeEqual | `\|=` |
PlusEqual | `+=` |
StarEqual | `*=` |
SlashEqual | `/=` |
CaretEqual | `^=` |
PercentEqual | `%=` |
At | `@` |
Dot | `.` |
DotDot | `..` |
DotDotEqual | `..=` |
DotDotDot | `...` |
Comma | `,` |
Semicolon | `;` |
Colon | `:` |
ModSep | `::` |
RArrow | `->` |
LArrow | `<-` |
FatArrow | `=>` |
Pound | `#` |
Dollar | `$` |
Question | `?` |
OpenDelim !Delim | One of `(`, `[`, `{` |
CloseDelim !Delim | One of `)`, `]`, `}` |
LiteralTok LitTok (Maybe Name) | a literal token with an optional suffix |
IdentTok Ident | an arbitrary identifier |
Underscore | `_` |
LifetimeTok Ident | a lifetime |
Space Space Name | whitespace |
Doc String !AttrStyle !Bool | doc comment with its contents, whether it is outer/inner, and whether it is inline or not |
Shebang | `#!` |
Eof | end of file token |
Interpolated (Nonterminal Span) | can be expanded into several tokens in macro-expansion |
Error reporting
lexicalError :: P a
Signal a lexical error.