parser-regex: Regex based parsers

[ bsd3, library, parsing ]

Regex based parsers. See:

- Regex.Text: To work with Text from the text library.
- Regex.List: To work with Strings or lists.
- Regex.Base: To work with other sequences.


Modules


Data
- Data.CharSet
Regex

Downloads

parser-regex-0.3.0.0.tar.gz [browse] (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

meooow


Versions [RSS]	0.1.0.0, 0.2.0.0, 0.2.0.1, 0.2.0.2, 0.3.0.0
Change log	CHANGELOG.md
Dependencies	base (>=4.15 && <5.0), containers (>=0.6.4 && <0.9), deepseq (>=1.4.5 && <1.6), ghc-bignum (>=1.1 && <1.4), primitive (>=0.7.3 && <0.10), text (>=2.0.1 && <2.2), transformers (>=0.5.6 && <0.7) [details]
Tested with	ghc ==9.0.2, ghc ==9.2.8, ghc ==9.4.8, ghc ==9.6.6, ghc ==9.8.4, ghc ==9.10.1, ghc ==9.12.1
License	BSD-3-Clause
Author	Soumik Sarkar
Maintainer	soumiksarkar.3120@gmail.com
Category	Parsing
Home page	https://github.com/meooow25/parser-regex
Bug tracker	https://github.com/meooow25/parser-regex/issues
Source repo	head: git clone https://github.com/meooow25/parser-regex.git
Uploaded	by meooow at 2025-04-19T07:34:07Z
Distributions	Stackage:0.3.0.0
Reverse Dependencies	1 direct, 12 indirect [details]
Downloads	157 total (2 in the last 30 days)
Rating	2.0 (votes: 1) [estimated by Bayesian average]
Status	Docs available [build log] Last success reported on 2025-04-19 [all 1 reports]

Readme for parser-regex-0.3.0.0


parser-regex

Regex based parsers

Features

- Parsers based on regular expressions, capable of parsing regular languages. Note that there are no extra features to make parsing non-regular languages possible.
- Regexes are composed using combinators.
- Resumable parsing of sequences of any type containing values of any type.
- Special support for Text and String in the form of convenient combinators and operations like find and replace.
- Parsing runtime is linear in the length of the sequence being parsed. No exponential backtracking.
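
As a rough illustration of the features above, here is a minimal sketch that composes a small regex from combinators and runs it on Text. It uses only names that also appear in the examples in this README (R.reParse, R.someTextOf, R.manyTextOf, R.char, CS.asciiLower, CS.not). The name keyValueRE and the key=value format are hypothetical, chosen only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

import Regex.Text (REText)
import qualified Regex.Text as R
import qualified Data.CharSet as CS

-- Hypothetical example: a lowercase ASCII key, an '=' sign,
-- then a value made of any characters other than ';'.
keyValueRE :: REText (Text, Text)
keyValueRE = (,)
  <$> R.someTextOf CS.asciiLower  -- one or more lowercase letters
  <*  R.char '='                  -- separator, result discarded
  <*> R.manyTextOf (CS.not ";")   -- zero or more non-';' characters

main :: IO ()
main = do
  -- The whole input must match for the parse to succeed.
  print (R.reParse keyValueRE "color=orange")  -- Just ("color","orange")
  print (R.reParse keyValueRE "=orange")       -- Nothing (the key is empty)
```

The regex is an ordinary Applicative value, so it can be reused as a building block inside larger regexes.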

Examples

Versus regex patterns

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

Can you guess what this matches?

This is a non-validating regex to extract parts of a URI, from RFC 3986. It can be translated as follows.

{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (optional)
import Data.Text (Text)

import Regex.Text (REText)
import qualified Regex.Text as R
import qualified Data.CharSet as CS

data URI = URI
  { scheme    :: Maybe Text
  , authority :: Maybe Text
  , path      :: Text
  , query     :: Maybe Text
  , fragment  :: Maybe Text
  } deriving Show

uriRE :: REText URI
uriRE = URI
  <$> optional (R.someTextOf (CS.not ":/?#") <* R.char ':')
  <*> optional (R.text "//" *> R.manyTextOf (CS.not "/?#"))
  <*> R.manyTextOf (CS.not "?#")
  <*> optional (R.char '?' *> R.manyTextOf (CS.not "#"))
  <*> optional (R.char '#' *> R.manyText)

>>> R.reParse uriRE "https://github.com/meooow25/parser-regex?tab=readme-ov-file#parser-regex"
Just (URI { scheme = Just "https"
          , authority = Just "github.com"
          , path = "/meooow25/parser-regex"
          , query = Just "tab=readme-ov-file"
          , fragment = Just "parser-regex" })

Parsing

Parsing is straightforward, even for tasks which may be impractical with submatch extraction typically offered by regex libraries.

import Control.Applicative ((<|>))
import Data.Text (Text)

import Regex.Text (REText)
import qualified Regex.Text as R
import qualified Data.CharSet as CS

data Expr
  = Var Text
  | Expr :+ Expr
  | Expr :- Expr
  | Expr :* Expr
  deriving Show

exprRE :: REText Expr
exprRE = var `R.chainl1` mul `R.chainl1` (add <|> sub)
  where
    var = Var <$> R.someTextOf CS.asciiLower
    add = (:+) <$ R.char '+'
    sub = (:-) <$ R.char '-'
    mul = (:*) <$ R.char '*'

>>> :set -XOverloadedStrings
>>> import qualified Regex.Text as R
>>> R.reParse exprRE "a+b-c*d*e+f"
Just (((Var "a" :+ Var "b") :- ((Var "c" :* Var "d") :* Var "e")) :+ Var "f")

Find and replace

Find and replace using regexes is supported for Text and lists.

>>> :set -XOverloadedStrings
>>> import Control.Applicative ((<|>))
>>> import qualified Data.Text as T
>>> import qualified Regex.Text as R
>>>
>>> data Color = Blue | Orange deriving Show
>>> let re = Blue <$ R.text "blue" <|> Orange <$ R.text "orange"
>>> R.find re "color: orange"
Just Orange
>>>
>>> let re = T.toUpper <$> (R.text "cat" <|> R.text "dog" <|> R.text "fish")
>>> R.replaceAll re "locate selfish hotdog"
"loCATe selFISH hotDOG"
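
The session above works on Text. A list-based version might look like the sketch below. It assumes that Regex.List exports find and replaceAll analogous to those in Regex.Text, and a combinator R.list that matches a literal list. Check these names against the Regex.List documentation before relying on them.

```haskell
import Control.Applicative ((<|>))
import Data.Char (toUpper)

-- Assumption: Regex.List provides list, find and replaceAll.
import qualified Regex.List as R

data Color = Blue | Orange deriving (Eq, Show)

main :: IO ()
main = do
  -- Find the first match anywhere in the String.
  let colorRE = Blue <$ R.list "blue" <|> Orange <$ R.list "orange"
  print (R.find colorRE "color: orange")

  -- Replace every non-overlapping match with its upper-cased form.
  let animalRE = map toUpper <$> (R.list "cat" <|> R.list "dog" <|> R.list "fish")
  putStrLn (R.replaceAll animalRE "locate selfish hotdog")
```

If the assumptions hold, this should print Just Orange followed by loCATe selFISH hotDOG, mirroring the Text session.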

Parse any sequence

Parsing is not restricted to text. One can parse a vector, a conduit, or any other sequence one might have.

import qualified Regex.Base as R
import qualified Data.Vector.Generic as VG -- from vector
import qualified Conduit as C -- from conduit

parseVector :: VG.Vector v c => R.Parser c a -> v c -> Maybe a
parseVector = R.parseFoldr VG.foldr

parseConduit :: Monad m => R.Parser c a -> C.ConduitT c x m (Maybe a)
parseConduit p = R.parseNext p C.await <* C.sinkNull

>>> import Control.Applicative (many)
>>> import qualified Regex.Base as R
>>> :{
let evenOddP :: R.Parser Int [(Int, Int)]
    evenOddP = R.compile $ many ((,) <$> R.satisfy even <*> R.satisfy odd)
:}
>>>
>>> import qualified Data.Vector as V
>>> parseVector evenOddP (V.fromList [6,1,2,5,4,3])
Just [(6,1),(2,5),(4,3)]
>>> parseVector evenOddP (V.fromList [4,3,1,2])
Nothing
>>>
>>> import Conduit ((.|))
>>> import qualified Conduit as C
>>> C.runConduit $ C.yieldMany [0..3] .| C.iterMC print .| parseConduit evenOddP
0
1
2
3
Just [(0,1),(2,3)]
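
The same approach should carry over to other containers. The following sketch parses a Data.Sequence.Seq from the containers package by passing the Foldable foldr to R.parseFoldr, in the same way VG.foldr is used above. The helper name parseSeq is hypothetical.

```haskell
import Control.Applicative (many)
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq -- from containers

import qualified Regex.Base as R

-- Hypothetical helper: run a compiled parser over a Seq.
-- foldr here is the Foldable method, specialised to Seq.
parseSeq :: R.Parser c a -> Seq c -> Maybe a
parseSeq = R.parseFoldr foldr

-- Same parser as in the session above: pairs of (even, odd) numbers.
evenOddP :: R.Parser Int [(Int, Int)]
evenOddP = R.compile $ many ((,) <$> R.satisfy even <*> R.satisfy odd)

main :: IO ()
main = do
  print (parseSeq evenOddP (Seq.fromList [6,1,2,5]))  -- Just [(6,1),(2,5)]
  print (parseSeq evenOddP (Seq.fromList [6,1,2]))    -- Nothing (incomplete pair)
```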

Documentation

Documentation is available on Hackage: parser-regex

Already familiar with regex patterns? See the Regex pattern cheat sheet.

Alternatives

`regex-applicative`

regex-applicative is the primary inspiration for this library, and is similar in many ways.

parser-regex attempts to be a more efficient and featureful library built on the ideas of regex-applicative, though it does not aim to provide a superset of regex-applicative's API.

Traditional regex libraries

These libraries use regex patterns.

Consider using these if:

- The terseness of regex patterns is well-suited for your use case.
- You need something very fast. regex-pcre, regex-pcre-builtin, pcre-light, and pcre-heavy are faster than parser-regex for typical use cases, but there are trade-offs, such as losing Unicode support and a risk of ReDoS.

Use parser-regex instead if:

- You prefer parser combinators over regex patterns.
- You need more powerful parsing capabilities than just submatch extraction.
- You need to parse a sequence that is not supported by the above libraries.

For a detailed comparison of regex libraries, see here.

Other options

If you are not restricted to regexes, there are many other parsing libraries you may use, too many to list here. See the "Parsing" category on Hackage for a start.

Contributing

Questions, bug reports, documentation improvements, and code contributions are welcome! Please open an issue as the first step.