S-expresso
S-expresso is a Haskell library designed to help you parse and print
data or source code encoded as an S-expression. It provides a very
flexible parser and (for now) a flat printer.
What is an S-expression
Basically, an S-expression is a special form of tree structured
data. An S-expression object is either an atom or a list of atoms and other S-expressions.
This datatype is the definition of an S-expression for
S-expresso.
data SExpr b a = SList b [SExpr b a]
| SAtom a
The parameter a allows you to specify the datatype of atoms and the
parameter b is usefull for keeping metadata about S-expression like
source position for example.
SExpr is not equivalent to [a] because the later cannot
distinguish between an atom (SAtom _) and a tree containing only one
atom (SList _ [SAtom _]). SExpr is also not equivalent to Tree a
from Data.Tree because the later cannot encode the empty tree
(SList _ []) and does not enforce that atoms are at the leaves.
The Sexp type
If you are only interested by the atoms, you can use the type alias
Sexp that is a variant of the more general 'SExpr' data type with no
data for the 'SList' constructor.
type Sexp a = SExpr () a
This type also comes with a bidirectional pattern synonym also named
Sexp for object of the form SExpr () _.
x = Sexp [A 3] <-> x = SList () [SAtom 3]
foo (Sexp xs) <-> foo (SList () xs)
foo (Sexp (Sexp ys : A x : xs)) <-> foo (SList () (SList () ys : SAtom x : xs))
Pattern synonyms
S-expresso defines four pattern synonyms to ease your programming with
SExpr. The patterns L helps you match the SList constructor and only
its sublist, disregarding the b field. The pattern ::: and Nil helps
you specify the shape of the sublist of an SList constructor and
finally the pattern A is a shorthand for SAtom.
Together they make working with SExpr a little easier.
a = A 3 <-> a = SAtom 3
foo (A x) <-> foo (SAtom x)
foo (A x1 ::: A x2 ::: Nil) <-> foo (SList _ [SAtom x1, SAtom x2])
foo (A x ::: L xs)) <-> foo (SList _ (SAtom x : xs))
foo (L ys ::: A x ::: L xs)) <-> foo (SList _ (SList _ ys : SAtom x : xs))
foo (L x) <-> foo (SList _ x)
Notice that you need to end the pattern ::: with Nil for the empty
list or L xs for matching the remainder of the list. Indeed, if you write
foo (x ::: xs) = ...
this is equivalent to :
foo (SList b (x : rest)) = let xs = SList b rest
in ...
You can refer to the documentation of the ::: constructor for more information.
Parsing S-expressions
The parsing is based on
megaparsec. S-expresso
allows you to customize the following :
- The parser for atoms
- The opening tag (usually "("), the closing tag (usually ")") and a
possible dependency of the closing tag on the opening one.
- If some space is required or optional between any pair of atoms.
- How to parse space (ex: treat comments as whitespace)
The library offers amoung others the decodeOne and decode
functions. The former only reads one S-expression while the other
parses many S-expressions. Both functions creates a megaparsec
parser from a SExprParser argument.
The SExprParser is the data type that defines how to read an
S-expression. The easiest way to create a SExprParser is to use the
function plainSExprParser with your own custom atom parser. This
will create a parser where S-expression starts with "(", ends with ")"
and space is mandatory between atoms.
Import Data.Void
Import qualified Data.Text as T
Import Text.Megaparsec
Import Text.Megaparsec.Char
Import qualified Text.Megaparser.Char.Lexer as L
atom = some letter
sexp = decode $ plainSExprParser atom
-- Returns (SList () [SAtom "hello", SAtom "world"])
ex1 = parse sexp "" "(hello world)"
-- Returns (SList () [SAtom "hello", SAtom "world", SList () [SAtom "bonjour"]])
ex2 = parse sexp "" " (hello world(bonjour)) "
-- Returns SAtom "hola"
ex2 = parse sexp "" "hola"
Customizing the SExprParser
S-expresso provides many functions to modify the behavior of the
parser. For example, you can use the functions setTags,
setTagsFromList, setSpace and setSpacingRule to modify the
behavior of the parser. Following on the preceding example:
-- setTags
data MyType = List | Vector
listOrVector =
let sTag = (char '(' >> return List) <|> (string "#(" >> return Vector)
eTag = \t -> char ')' >> return t
p = setTags sTag eTag $
plainSExprParser atom
in decode p
-- Returns (SList List [SList Vector [SAtom "a", SAtom "b"], SAtom "c"])
ex3 = parse listOrVector "" "(#(a b) c)"
-- setTagsFromList
listOrVector2 = decode $
setTagsFromList [("(",")",List),("#(",")",Vector)] $
plainSExprParser atom
-- Returns (SList List [SList Vector [SAtom "a", SAtom "b"], SAtom "c"])
ex4 = parse listOrVector2 "" "(#(a b) c)"
-- setSpace
withComments = decode $
-- See megaparsec Space in Megaparsec.Char.Lexer
setSpace (L.Space Space1 (skipLineComment ";") empty) $
plainSExprParser atom
-- Returns (SList () [SAtom "hello", SList () [SAtom "bonjour"]])
ex5 = parse withComments "" "(hello ;world\n (bonjour))"
-- setSpacingRule
optionalSpace = decode $
setSpacingRule spaceIsOptional $
plainSExprParser (some letter <|> some digitChar)
-- Returns (SList () [SAtom "hello", SAtom "1234", SAtom "world"])
ex5 = parse optionalSpace "" "(hello1234world)"
You can also directly build a custom SExprParser with the constructor SExprParser.
Adding Source Location
If you need the source position of the atoms and s-expression, the
function withLocation transforms an SExprParser b a into
SExprParser (Located b) (Located a). The Located datatype is
defined
here.