symbol-parser
Type-level string (Symbol
) parser combinators.
Features
- Define parsers compositionally, largely as you would on the term level.
- Pretty parse errors.
- Hopefully decent performance.
- No runtime cost (you shall find no term-level code here).
Examples
ghci> import Data.Type.Symbol.Parser
ghci> :k! Run (Drop 3 :*>: Isolate 2 NatDec :<*>: (Drop 3 :*>: NatHex)) "___10___FF"
...
= Right '( '(10, 255), "")
Why?
Via GHC.Generics
, we may inspect Haskell data types on the type level.
Constructor names are Symbols
. Ever reify these, then perform some sort of
checking or parsing on the term level? symbol-parser does the parsing on the
term level instead. Catch bugs earlier, get faster runtime.
Design
Parser basics
(Note that I may omit "type-level" when referring to type-level operations.)
Until GHC 9.2, Symbol
s were largely opaque. You could get them, but you
couldn't decompose them like you could String
s with uncons :: [a] -> Maybe (a, [a])
. Some Haskellers took this as a challenge (see
symbols). But realistically, we needed a bit more compiler
support.
As of GHC 9.2, Symbol
s may be decomposed via UnconsSymbol :: Symbol -> Maybe (Char, Symbol)
. We thus design a Char
-based parser:
type ParserCh s r = Char -> s -> Result s r
data Result s r = Cont s | Done r | Err E
A parser is a function which takes a Char
, the current state s
, and returns
some result:
Cont s
: keep going, here's the next state s
Done r
: parse successful with value r
Err E
: parse error, details in the E
(a structured error)
Run
handles calling the parser Char
by Char
, threading the state through,
and does some bookkeeping for nice errors.
This is a good first step, but we have some outstanding issues:
- What do we do when we reach the end of the string?
- How do we initialize parser state?
- How do we pass parsers around? (As of 2024-04-20, GHC HEAD does not support
unsaturated type families.)
As always, types are our salvation.
type ParserEnd s r = s -> Either E r
type Parser s r = (ParserChSym s r, ParserEndSym s r, s)
type ParserChSym s r = Char ~> s ~> Result s r
type ParserEndSym s r = s ~> Either E r
We define a parser as a tuple of a character parser, an end handler, and an
initial state. The types ending in Sym
are defunctionalization symbols,
which enable us to pass our parsers around as type-level functions. The plumbing
is provided Oleg's fantastic library defun-core.
Example: NatDec
Let's write a parser that parses a natural decimal.
type NatDec = '(NatDecChSym, NatDecEndSym, 0)
type NatDecCh ch acc = NatDecCh' acc (ParseDecimalDigitSym @@ ch)
type family NatDecCh' acc mDigit where
NatDecCh' acc (Just digit) = Cont (acc * 10 + digit)
NatDecCh' acc Nothing = Err -- ...
type NatDecEnd acc = Right acc
-- boring defunctionalization symbol definitions
At each Char
, we attempt to parse as a digit. If it's valid, we multiply the
current accumulator by 10 (a left shift) and add the digit value. At the end of
the string, we simply emit the current accumulator.
It can be that easy to define a parser with symbol-parser! But it isn't always.
Combinators get weird thanks to state handling. Take a look at Isolate
, then
Then
.
Pitfall: Character parsers always consume
There is no backtracking or lookahead, that you do not implement yourself. This
keeps the parser execution extremely simple, but breaks null parsers such as
Drop 0
, so these must be handled specially (unless you don't getting mind
stuck type families on misuse).
For concrete examples, see the implementation of Drop
and Literal
.
Pitfall: Not all parsers are definable
- No changing parser state. Thus, parsers such as
Try :: Parser s r -> Parser (s, [Char]) r
are not definable. Parsers may not backtrack.
- Combinators such as
<|>
can emulate backtracking, but they are complex and
hard to reason about (they may have bugs!).
Contributing
I would gladly accept further combinators or other suggestions. Please add an
issue or pull request, or contact me via email or whatever (I'm raehik
everywhere).
License
Provided under the MIT license. See LICENSE
for license text.