inchworm: Simple parser combinators for lexical analysis.


Parser combinator framework specialized to lexical analysis. Tokens are specified via simple fold functions, and source location handling is baked in. Comes with matchers for standard lexemes like integers, comments, and Haskell-style strings with escape handling. No dependencies other than the Haskell base library. If you want to parse expressions instead of tokens then try the parsec or attoparsec packages, which have more general-purpose combinators.



Properties

Versions 1.0.0.1, 1.0.1.1, 1.0.2.1, 1.0.2.2, 1.0.2.3, 1.0.2.4, 1.1.1.1, 1.1.1.2
Change log Changelog.md
Dependencies base (>=4.8 && <4.13) [details]
License MIT
Author The Inchworm Development Team
Maintainer Ben Lippmeier <benl@ouroborus.net>
Category Parsing
Home page https://github.com/discus-lang/inchworm
Source repo head: git clone https://github.com/discus-lang/inchworm.git
Uploaded by BenLippmeier at 2019-01-02T02:15:46Z


Readme for inchworm-1.1.1.1


Inchworm

Inchworm is a simple parser combinator framework specialized to lexical analysis. Tokens are specified via simple fold functions, and source location handling is baked in.

If you want to parse expressions instead of performing lexical analysis then try the parsec or attoparsec packages, which have more general-purpose combinators.

Matchers for standard tokens like comments and strings are in the Text.Lexer.Inchworm.Char module.

No dependencies other than the Haskell base library.

Minimal example

The following code demonstrates how to perform lexical analysis of a simple LISP-like language. We use two separate name classes: one for variables that start with a lower-case letter, and one for constructors that start with an upper-case letter.

Integers are scanned using the scanInteger function from the Text.Lexer.Inchworm.Char module.

The result of scanStringIO contains the list of leftover input characters that could not be parsed. In a real lexer you should check that this is empty to ensure there has not been a lexical error.

import Text.Lexer.Inchworm.Char
import qualified Data.Char as Char

-- | A source token.
data Token 
        = KBra | KKet | KVar String | KCon String | KInt Integer
        deriving Show

-- | A thing with attached location information.
data Located a
        = Located FilePath (Range Location) a
        deriving Show

-- | Scanner for a lispy language.
scanner :: FilePath
        -> Scanner IO Location [Char] (Located Token)
scanner fileName
 = skip Char.isSpace
 $ alts [ fmap (stamp id)   $ accept '(' KBra
        , fmap (stamp id)   $ accept ')' KKet
        , fmap (stamp KInt) $ scanInteger 
        , fmap (stamp KVar)
          $ munchWord (\ix c -> if ix == 0 then Char.isLower c
                                           else Char.isAlpha c) 
        , fmap (stamp KCon) 
          $ munchWord (\ix c -> if ix == 0 then Char.isUpper c
                                           else Char.isAlpha c)
        ]
 where  -- Stamp a token with source location information.
        stamp k (range, t) 
          = Located fileName range (k t)

main :: IO ()
main 
 = do   let fileName = "Source.lispy"
        let source   = "(some (Lispy like) 26 Program 93 (for you))"
        toks    <- scanStringIO source (scanner fileName)
        print toks
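
As suggested above, a real lexer should check the leftover input. The following sketch extends main to do so, reusing the scanner defined in the example. It assumes scanStringIO returns the scanned tokens together with the final source location and the remaining input string, as in the published Text.Lexer.Inchworm API; the die import and the 10-character error snippet are illustrative choices, not part of the library.

```haskell
import Text.Lexer.Inchworm.Char
import System.Exit            (die)

main :: IO ()
main
 = do   let fileName = "Source.lispy"
        let source   = "(some (Lispy like) 26 Program 93 (for you))"

        -- The result also carries the final location and leftover input.
        (toks, _locFinal, rest) <- scanStringIO source (scanner fileName)

        -- Reject input that was not fully consumed, which signals
        -- a lexical error at the start of the leftover string.
        if null rest
         then print toks
         else die ("lexical error at: " ++ show (take 10 rest))
```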