symparsec: Type level string parser combinators

[ data, library, mit, types ] [ Propose Tags ] [ Report a vulnerability ]
Versions [RSS] 0.4.0, 1.0.0, 1.0.1, 1.1.0, 1.1.1
Change log CHANGELOG.md
Dependencies base (>=4.16 && <5), defun-core (>=0.1 && <0.2) [details]
Tested with ghc ==9.8, ghc ==9.6, ghc ==9.4, ghc ==9.2
License MIT
Author Ben Orchard
Maintainer Ben Orchard <thefirstmuffinman@gmail.com>
Category Types, Data
Home page https://github.com/raehik/symparsec#readme
Bug tracker https://github.com/raehik/symparsec/issues
Source repo head: git clone https://github.com/raehik/symparsec
Uploaded by raehik at 2024-05-12T16:40:36Z
Distributions LTSHaskell:1.1.1, NixOS:1.1.1
Reverse Dependencies 1 direct, 0 indirect [details]
Downloads 178 total (9 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for symparsec-0.4.0

[back to package description]

Symparsec

Type level string (Symbol) parser combinators.

It's a Parsec for Symbols; thus, Symparsec.

Previously named symbol-parser.

Features

  • Define parsers compositionally, largely as you would on the term level.
  • Pretty parse errors.
  • Hopefully decent performance.
  • No runtime cost (you shall find no term level code here).

Examples

ghci> import Symparsec
ghci> :k! Run (Drop 3 :*>: Isolate 2 NatDec :<*>: (Drop 3 :*>: NatHex)) "___10___FF"
...
= Right '( '(10, 255), "")

Why?

Via GHC.Generics, we may inspect Haskell data types on the type level. Constructor names are Symbols. Ever reify these, then perform some sort of checking or parsing on the term level? Symparsec does the parsing on the type level instead. Catch bugs earlier, get faster runtime.

Design

Parser basics

(Note that I may omit "type-level" when referring to type-level operations.)

Until GHC 9.2, Symbols were largely opaque. You could get them, but you couldn't decompose them like you could Strings with uncons :: [a] -> Maybe (a, [a]). Some Haskellers took this as a challenge (see symbols). But realistically, we needed a bit more compiler support.

As of GHC 9.2, Symbols may be decomposed via UnconsSymbol :: Symbol -> Maybe (Char, Symbol). We thus design a Char-based parser:

type ParserCh s r = Char -> s -> Result s r
data Result   s r = Cont s | Done r | Err E

A parser is a function which takes a Char, the current state s, and returns some result:

  • Cont s: keep going, here's the next state s
  • Done r: parse successful with value r
  • Err E: parse error, details in the E (a structured error)

Run handles calling the parser Char by Char, threading the state through, and does some bookkeeping for nice errors.

This is a good first step, but we have some outstanding issues:

  • What do we do when we reach the end of the string?
  • How do we initialize parser state?
  • How do we pass parsers around? (As of 2024-04-20, GHC HEAD does not support unsaturated type families.)

As always, types are our salvation.

type ParserEnd s r = s -> Either E r
type Parser s r = (ParserChSym s r, ParserEndSym s r, s)

type ParserChSym s r = Char ~> s ~> Result s r
type ParserEndSym s r = s ~> Either E r

We define a parser as a tuple of a character parser, an end handler, and an initial state. The types ending in Sym are defunctionalization symbols, which enable us to pass our parsers around as type-level functions. The plumbing is provided Oleg's fantastic library defun-core.

Example: NatDec

Let's write a parser that parses a natural decimal.

type NatDec = '(NatDecChSym, NatDecEndSym, 0)

type NatDecCh ch acc = NatDecCh' acc (ParseDecimalDigitSym @@ ch)
type family NatDecCh' acc mDigit where
    NatDecCh' acc (Just digit) = Cont (acc * 10 + digit)
    NatDecCh' acc Nothing      = Err -- ...

type NatDecEnd acc = Right acc

-- boring defunctionalization symbol definitions

At each Char, we attempt to parse as a digit. If it's valid, we multiply the current accumulator by 10 (a left shift) and add the digit value. At the end of the string, we simply emit the current accumulator.

It can be that easy to define a parser with Symparsec! But it isn't always. Combinators get weird thanks to state handling. Take a look at Isolate, then Then.

Pitfall: Character parsers always consume

There is no backtracking or lookahead, that you do not implement yourself. This keeps the parser execution extremely simple, but breaks null parsers such as Drop 0, so these must be handled specially (unless you don't getting mind stuck type families on misuse).

For concrete examples, see the implementation of Drop and Literal.

Pitfall: Not all parsers are definable

  • No changing parser state. Thus, parsers such as Try :: Parser s r -> Parser (s, [Char]) r are not definable. Parsers may not backtrack.
    • Combinators such as <|> can emulate backtracking, but they are complex and hard to reason about (they may have bugs!).

Contributing

I would gladly accept further combinators or other suggestions. Please add an issue or pull request, or contact me via email or whatever (I'm raehik everywhere).

License

Provided under the MIT license. See LICENSE for license text.