sexpresso: A flexible library for parsing and printing S-expressions

Please see the README on GitHub at https://github.com/archambaultv/sexpresso#readme

Properties

Versions	1.0.0.2, 1.1.0.0, 1.2.0.0, 1.2.1.0, 1.2.2.0, 1.2.3.0, 1.2.4.0, 1.2.5.0
Change log	ChangeLog.md
Dependencies	base (>=4.7 && <5), containers (>=0.5 && <0.7), megaparsec (>=7.0 && <8.0), text (>=0.2 && <1.3) [details]
License	LicenseRef-OtherLicense
Copyright	Vincent Archambault-Bouffard
Author	Vincent Archambault-Bouffard
Maintainer	archambault.v@gmail.com
Category	Data
Home page	https://github.com/archambaultv/sexpresso#readme
Bug tracker	https://github.com/archambaultv/sexpresso/issues
Source repo	head: git clone https://github.com/archambaultv/sexpresso
Uploaded	by VincentArchambault at 2019-11-07T16:15:44Z

Modules

Data
- SExpresso

Downloads

sexpresso-1.0.0.2.tar.gz [browse] (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

OliverCharles, VincentArchambault

Readme for sexpresso-1.0.0.2

S-expresso

S-expresso is a Haskell library designed to help you parse and print data or source code encoded as an S-expression. It provides a very flexible parser and (for now) a flat printer.

What is an S-expression

An S-expression is a special form of tree-structured data. An S-expression is either an atom or a list of atoms and other S-expressions.

This datatype is the definition of an S-expression for S-expresso.

data SExpr b a = SList b [SExpr b a]
               | SAtom a

The parameter a lets you specify the datatype of atoms, and the parameter b is useful for keeping metadata about an S-expression, such as its source position.

SExpr is not equivalent to [a] because the latter cannot distinguish between an atom (SAtom _) and a tree containing only one atom (SList _ [SAtom _]). SExpr is also not equivalent to Tree a from Data.Tree because the latter cannot encode the empty tree (SList _ []) and does not enforce that atoms are at the leaves.
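
The distinctions above can be made concrete with a small, self-contained sketch. It redeclares SExpr locally so that it runs without the package, and the helpers atoms and render are hypothetical names chosen for this illustration, not functions provided by S-expresso.

```haskell
-- Local copy of the datatype shown above, with derived instances
-- added for this illustration only.
data SExpr b a = SList b [SExpr b a]
               | SAtom a
               deriving (Eq, Show)

-- Hypothetical helper: collect the atoms from left to right,
-- discarding the tree structure and the metadata.
atoms :: SExpr b a -> [a]
atoms (SAtom a)    = [a]
atoms (SList _ xs) = concatMap atoms xs

-- Hypothetical helper: render an S-expression of strings on one line.
render :: SExpr b String -> String
render (SAtom a)    = a
render (SList _ xs) = "(" ++ unwords (map render xs) ++ ")"

-- Three values that a plain list of atoms cannot tell apart properly.
single, wrapped, emptyTree :: SExpr () String
single    = SAtom "x"
wrapped   = SList () [SAtom "x"]
emptyTree = SList () []
```

Here atoms single and atoms wrapped both give ["x"], even though the two values are different, while render keeps them apart as "x", "(x)" and "()" for the three values.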

The Sexp type

If you are only interested in the atoms, you can use the type alias Sexp, a variant of the more general SExpr datatype that carries no data in the SList constructor.

type Sexp a = SExpr () a

This type also comes with a bidirectional pattern synonym, also named Sexp, for values of the form SList () _.

x = Sexp [A 3]                   <-> x = SList () [SAtom 3]
foo (Sexp xs)                    <-> foo (SList () xs)
foo (Sexp (Sexp ys : A x : xs))  <-> foo (SList () (SList () ys : SAtom x : xs))
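
As a rough illustration of how such a bidirectional synonym can be used both to build and to match values, here is a self-contained sketch. It declares SExpr, Sexp and A locally with the PatternSynonyms extension; these local definitions are an assumption about the general shape and are not copied from the library source. The names nested and firstAtom are hypothetical.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Local copies for illustration; not the library's own definitions.
data SExpr b a = SList b [SExpr b a]
               | SAtom a
               deriving (Eq, Show)

type Sexp a = SExpr () a

-- Implicitly bidirectional: usable as a pattern and as an expression.
pattern Sexp :: [Sexp a] -> Sexp a
pattern Sexp xs = SList () xs

pattern A :: a -> SExpr b a
pattern A x = SAtom x

-- Construction with the synonyms.
nested :: Sexp Int
nested = Sexp [A 1, Sexp [A 2]]

-- Matching with the synonyms: the first element, if it is an atom.
firstAtom :: Sexp a -> Maybe a
firstAtom (Sexp (A x : _)) = Just x
firstAtom _                = Nothing
```

The value nested is the same as SList () [SAtom 1, SList () [SAtom 2]], and firstAtom nested evaluates to Just 1.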

Pattern synonyms

S-expresso defines four pattern synonyms to ease your programming with SExpr. The pattern L helps you match the SList constructor and only its sublist, disregarding the b field. The patterns ::: and Nil help you specify the shape of the sublist of an SList constructor. Finally, the pattern A is a shorthand for SAtom.

Together they make working with SExpr a little easier.

a = A 3                      <-> a = SAtom 3
foo (A x)                    <-> foo (SAtom x)
foo (A x1 ::: A x2 ::: Nil)  <-> foo (SList _ [SAtom x1, SAtom x2])
foo (A x ::: L xs)           <-> foo (SList _ (SAtom x : xs))
foo (L ys ::: A x ::: L xs)  <-> foo (SList _ (SList _ ys : SAtom x : xs))
foo (L x)                    <-> foo (SList _ x)

Notice that you need to end the pattern ::: with Nil for the empty list or L xs for matching the remainder of the list. Indeed, if you write

foo (x ::: xs) = ...

this is equivalent to:

foo (SList b (x : rest)) = let xs = SList b rest
                           in ...

You can refer to the documentation of the ::: pattern for more information.
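
The behaviour described above, where the right-hand side of ::: is itself an SList carrying the same b, can be reproduced with a view pattern. The following is a minimal sketch under that assumption: the definitions are local stand-ins written for this example, and unconsS, sumPair and rest are hypothetical names, not part of S-expresso.

```haskell
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns #-}

-- Local copy for illustration; not the library's own definitions.
data SExpr b a = SList b [SExpr b a]
               | SAtom a
               deriving (Eq, Show)

-- Split a non-empty SList into its head and an SList holding the
-- tail, reusing the same metadata b for the tail.
unconsS :: SExpr b a -> Maybe (SExpr b a, SExpr b a)
unconsS (SList b (x : rest')) = Just (x, SList b rest')
unconsS _                     = Nothing

infixr 5 :::
pattern (:::) :: SExpr b a -> SExpr b a -> SExpr b a
pattern x ::: xs <- (unconsS -> Just (x, xs))

pattern Nil :: SExpr b a
pattern Nil <- SList _ []

pattern L :: [SExpr b a] -> SExpr b a
pattern L xs <- SList _ xs

pattern A :: a -> SExpr b a
pattern A x = SAtom x

-- Matches a list of exactly two atoms, thanks to the closing Nil.
sumPair :: SExpr b Int -> Maybe Int
sumPair (A x ::: A y ::: Nil) = Just (x + y)
sumPair _                     = Nothing

-- Without Nil or L, the tail is bound as an SList, not as a Haskell list.
rest :: SExpr b a -> Maybe (SExpr b a)
rest (_ ::: xs) = Just xs
rest _          = Nothing
```

For example, rest (SList "meta" [SAtom 'a', SAtom 'b']) gives Just (SList "meta" [SAtom 'b']): the tail is wrapped again with the original metadata, which is why the pattern must end with Nil or L xs to reach the underlying list.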

Parsing S-expressions

Parsing is based on megaparsec. S-expresso allows you to customize the following:

The parser for atoms
The opening tag (usually "("), the closing tag (usually ")") and a possible dependency of the closing tag on the opening one.
Whether space is required or optional between any pair of atoms.
How to parse space (e.g., treat comments as whitespace)

The library offers, among others, the decodeOne and decode functions. The former reads only one S-expression while the latter parses many S-expressions. Both functions create a megaparsec parser from an SExprParser argument.

SExprParser is the data type that defines how to read an S-expression. The easiest way to create an SExprParser is to use the function plainSExprParser with your own custom atom parser. This creates a parser where an S-expression starts with "(", ends with ")" and space is mandatory between atoms.

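import Data.Void
import qualified Data.Text as T
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
-- Assumed module names for the S-expresso API; check the package's module index
import Data.SExpresso.SExpr
import Data.SExpresso.Parse

-- Concrete parser type so that the examples type-check
type Parser = Parsec Void String

atom :: Parser String
atom = some letter

-- decodeOne reads exactly one S-expression (decode would return a list)
sexp = decodeOne $ plainSExprParser atom

-- Returns Right (SList () [SAtom "hello", SAtom "world"])
ex1 = parse sexp "" "(hello world)"

-- Returns Right (SList () [SAtom "hello", SAtom "world", SList () [SAtom "bonjour"]])
ex2 = parse sexp "" "  (hello world(bonjour))  "

-- Returns Right (SAtom "hola")
ex2b = parse sexp "" "hola"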
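
To show what the plain grammar above accepts without installing anything, here is a dependency-free sketch that uses ReadP from base in place of megaparsec and S-expresso. It is not the library's implementation: atomP, sexprP and parseOne are hypothetical names, and the sketch only mimics the "(", ")" and whitespace conventions for letter-only atoms.

```haskell
import Data.Char (isLetter)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, many, munch1, readP_to_S, skipSpaces, (<++))

-- Local copy of the datatype for illustration.
data SExpr b a = SList b [SExpr b a]
               | SAtom a
               deriving (Eq, Show)

-- An atom is a maximal run of letters, similar to `some letter`.
atomP :: ReadP String
atomP = munch1 isLetter

-- One S-expression: an atom, or "(" items ")" with optional
-- whitespace after the opening tag and after each item.
sexprP :: ReadP (SExpr () String)
sexprP = (SAtom <$> atomP) <++ listP
  where
    listP = SList () <$> (char '(' *> skipSpaces *> many item <* char ')')
    item  = sexprP <* skipSpaces

-- Parse exactly one S-expression, allowing surrounding whitespace.
parseOne :: String -> Maybe (SExpr () String)
parseOne input =
  case readP_to_S (skipSpaces *> sexprP <* skipSpaces <* eof) input of
    [(result, "")] -> Just result
    _              -> Nothing
```

With this sketch, parseOne "(hello world)" gives Just (SList () [SAtom "hello", SAtom "world"]) and parseOne "hola" gives Just (SAtom "hola"). Because atoms are maximal runs of letters, two atoms can only be told apart when whitespace or a tag separates them, which is the situation the mandatory spacing rule describes.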

Customizing the SExprParser

S-expresso provides functions to modify the behavior of the parser, such as setTags, setTagsFromList, setSpace and setSpacingRule. Continuing the preceding example:

-- setTags
data MyType = List | Vector
  deriving (Eq, Show)

listOrVector =
  let sTag = (char '(' >> return List) <|> (string "#(" >> return Vector)
      -- The closing tag receives the value returned by the opening tag
      eTag = \t -> char ')' >> return t
      p = setTags sTag eTag $
          plainSExprParser atom
  in decodeOne p

-- Returns Right (SList List [SList Vector [SAtom "a", SAtom "b"], SAtom "c"])
ex3 = parse listOrVector "" "(#(a b) c)"

-- setTagsFromList
listOrVector2 = decodeOne $
                setTagsFromList [("(",")",List),("#(",")",Vector)] $
                plainSExprParser atom

-- Returns Right (SList List [SList Vector [SAtom "a", SAtom "b"], SAtom "c"])
ex4 = parse listOrVector2 "" "(#(a b) c)"

-- setSpace
withComments = decodeOne $
               -- See space and skipLineComment in Text.Megaparsec.Char.Lexer
               setSpace (L.space space1 (L.skipLineComment ";") empty) $
               plainSExprParser atom

-- Returns Right (SList () [SAtom "hello", SList () [SAtom "bonjour"]])
ex5 = parse withComments "" "(hello ;world\n (bonjour))"

-- setSpacingRule
optionalSpace = decodeOne $
                setSpacingRule spaceIsOptional $
                plainSExprParser (some letter <|> some digitChar)

-- Returns Right (SList () [SAtom "hello", SAtom "1234", SAtom "world"])
ex6 = parse optionalSpace "" "(hello1234world)"

You can also directly build a custom SExprParser with the constructor SExprParser.

Adding Source Location

If you need the source position of the atoms and S-expressions, the function withLocation transforms an SExprParser b a into an SExprParser (Located b) (Located a). See the library documentation for the definition of the Located datatype.