willow-0.1.0.0: An implementation of the web Document Object Model, and its rendering.
Copyright(c) 2020 Sam May
LicenseMPL-2.0
Maintainerag.eitilt@gmail.com
Stabilityexperimental
Portabilityportable
Safe HaskellSafe-Inferred
LanguageHaskell98

Web.Willow.Common.Parser

Description

The existing parsing libraries are wonderful, but backtracking parsers have a bad habit of being strict in their output; sure, you might be able to operate over Data.ByteString.Lazy, but they all expect to consume their entire input before handing you their result. Data.Attoparsec's continuations fully lean into that---even though you don't have to provide all the input in one block, you can't get a value before closing it out. Text.Megaparsec does provide a reentrant form in runParser', but it also comes with comparatively heavyweight error and pretty-printing features.

For complicated formats, those all can indeed be desirable. However, the HTML algorithms have been optimized for minimal lookahead and certainly no output revocation---once something is shipped out, it's not going to be called back. Not taking advantage of that by using a lazy output type means that parsing would always be subject to the whims of slow or unreliable network connections. Moreover, the entire complexity of the parsing algorithm is built around never reaching a fatal failure condition, so error handling and especially recovery are unnecessary overhead.

And so, a custom parsing framework must be defined.

Synopsis

Concrete types

type Parser stream = ParserT stream Maybe Source #

Unlike most monad transformers, a Parser is built around the concept of success and failure, so its "default" form is better structured over Maybe than over Identity.

runParser :: Parser stream out -> stream -> Maybe (out, stream) Source #

Set the constructed parser loose on a given input. Returns both the resulting value and the remaining contents of the Stream.

newtype ParserT stream gather out Source #

Encapsulation of an operation for transforming the head of a Stream into some other value. Standard usage, with similar behaviour to other Text.Parsec-derived parsers, ("accept the first which matches") may be obtained by instantiating gather with Maybe, or non-deterministic parsing ("accept any of these") through [].

Notably, this implementation is designed to allow laziness in both input and output. For the best usage, therefore, consume as little input at a time as possible, and so call runParser often).

As part of this simplification, all Text.Parsec-style integrated state (use StateT) and Text.Megaparsec-style error pretty-printing (build your position tracking into the stream, and/or wrap the output in Either) has been stripped out.

Constructors

ParserT 

Fields

Instances

Instances details
Monad gather => MonadState stream (ParserT stream gather) Source #

Operates over the input that has not yet been processed.

Note that this therefore provides the means for forcing an early end-of-stream:

put mempty
Instance details

Defined in Web.Willow.Common.Parser

Methods

get :: ParserT stream gather stream #

put :: stream -> ParserT stream gather () #

state :: (stream -> (a, stream)) -> ParserT stream gather a #

Monad gather => MonadReader stream (ParserT stream gather) Source #

Performs an action on the current input without consuming it; i.e. ask is identical to get.

Instance details

Defined in Web.Willow.Common.Parser

Methods

ask :: ParserT stream gather stream #

local :: (stream -> stream) -> ParserT stream gather a -> ParserT stream gather a #

reader :: (stream -> a) -> ParserT stream gather a #

MonadError err gather => MonadError err (ParserT stream gather) Source #

throwError is a ParserT which fails without consuming any input. catchError runs the recovery parser over the same input as was passed to the original (failing) parser.

Instance details

Defined in Web.Willow.Common.Parser

Methods

throwError :: err -> ParserT stream gather a #

catchError :: ParserT stream gather a -> (err -> ParserT stream gather a) -> ParserT stream gather a #

MonadTrans (ParserT stream) Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lift :: Monad m => m a -> ParserT stream m a #

Monad gather => Monad (ParserT stream gather) Source #

(>>=) runs the ParserT resulting from the right (generation function) argument over the remaining input after the left (value) ParserT returns.

Instance details

Defined in Web.Willow.Common.Parser

Methods

(>>=) :: ParserT stream gather a -> (a -> ParserT stream gather b) -> ParserT stream gather b #

(>>) :: ParserT stream gather a -> ParserT stream gather b -> ParserT stream gather b #

return :: a -> ParserT stream gather a #

Functor gather => Functor (ParserT stream gather) Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

fmap :: (a -> b) -> ParserT stream gather a -> ParserT stream gather b #

(<$) :: a -> ParserT stream gather b -> ParserT stream gather a #

Monad gather => MonadFix (ParserT stream gather) Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

mfix :: (a -> ParserT stream gather a) -> ParserT stream gather a #

MonadFail gather => MonadFail (ParserT stream gather) Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

fail :: String -> ParserT stream gather a #

Monad gather => Applicative (ParserT stream gather) Source #

pure is a ParserT which succeeds without consuming any input. (<*>) and the other sequencing functions run the right ParserT over the remaining input after the left ParserT returns.

Instance details

Defined in Web.Willow.Common.Parser

Methods

pure :: a -> ParserT stream gather a #

(<*>) :: ParserT stream gather (a -> b) -> ParserT stream gather a -> ParserT stream gather b #

liftA2 :: (a -> b -> c) -> ParserT stream gather a -> ParserT stream gather b -> ParserT stream gather c #

(*>) :: ParserT stream gather a -> ParserT stream gather b -> ParserT stream gather b #

(<*) :: ParserT stream gather a -> ParserT stream gather b -> ParserT stream gather a #

(Alternative gather, Monad gather) => Alternative (ParserT stream gather) Source #

empty is a ParserT which fails without consuming any input. (<|>) applies both ParserTs to the same input ("automatically backtracks").

Instance details

Defined in Web.Willow.Common.Parser

Methods

empty :: ParserT stream gather a #

(<|>) :: ParserT stream gather a -> ParserT stream gather a -> ParserT stream gather a #

some :: ParserT stream gather a -> ParserT stream gather [a] #

many :: ParserT stream gather a -> ParserT stream gather [a] #

(Alternative gather, Monad gather) => MonadPlus (ParserT stream gather) Source #

mzero is a ParserT which fails without consuming input, while mplus applies both to the same input, modulo the semantics of the Alternative gather instance.

Instance details

Defined in Web.Willow.Common.Parser

Methods

mzero :: ParserT stream gather a #

mplus :: ParserT stream gather a -> ParserT stream gather a -> ParserT stream gather a #

MonadIO gather => MonadIO (ParserT stream gather) Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

liftIO :: IO a -> ParserT stream gather a #

MonadCont gather => MonadCont (ParserT stream gather) Source #

The parser the inner function generates is run over the remaining input after the argument function runs (thus generating the inner function).

Instance details

Defined in Web.Willow.Common.Parser

Methods

callCC :: ((a -> ParserT stream gather b) -> ParserT stream gather a) -> ParserT stream gather a #

(Alternative gather, Monad gather, Stream stream token, Monoid stream) => MonadParser (ParserT stream gather) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: ParserT stream gather out -> ParserT stream gather out Source #

avoiding :: ParserT stream gather out -> ParserT stream gather () Source #

next :: ParserT stream gather token Source #

nextChunk :: Word -> ParserT stream gather stream Source #

push :: token -> ParserT stream gather () Source #

pushChunk :: stream -> ParserT stream gather () Source #

abridge :: ParserT stream gather stream Source #

(Monad gather, Semigroup out) => Semigroup (ParserT stream gather out) Source #

(<>) runs the right ParserT over the remaining input after the left ParserT returns.

Instance details

Defined in Web.Willow.Common.Parser

Methods

(<>) :: ParserT stream gather out -> ParserT stream gather out -> ParserT stream gather out #

sconcat :: NonEmpty (ParserT stream gather out) -> ParserT stream gather out #

stimes :: Integral b => b -> ParserT stream gather out -> ParserT stream gather out #

(Monad gather, Monoid out) => Monoid (ParserT stream gather out) Source #

mempty is a ParserT which succeeds without consuming any input; it is therefore identical to pure mempty and not empty or mzero.

Instance details

Defined in Web.Willow.Common.Parser

Methods

mempty :: ParserT stream gather out #

mappend :: ParserT stream gather out -> ParserT stream gather out -> ParserT stream gather out #

mconcat :: [ParserT stream gather out] -> ParserT stream gather out #

type StateParser state stream = StateT state (Parser stream) Source #

Purely a convenience of the package rather than the module, the state machines described by the HTML standard all involve some degree of persistence, and so are built over a deeper monad stack. This could easily one of the most common transformers to add, anyway, no matter what input is being parsed.

Parsing combinators

class (Alternative m, Monad m, Stream stream token, Monoid stream) => MonadParser m stream token | m -> stream where Source #

Generalize the transformation of an input Stream into a more meaningful value. This class provides the basic building blocks from which more expressive such parsers may be constructed.

See also the description of ParserT for some of the design decisions.

Methods

lookAhead :: m out -> m out Source #

Runs the argument parser on the current input, without consuming any of it; these are identical semantics to saving and restoring the input after running the computation, assuming the MonadState instance runs over the input stream (see ParserT):

input <- get
a <- parser
put input
a <- lookAhead parser

avoiding :: m out -> m () Source #

Succeeds if and only if the argument parser fails (the input is not consumed).

next :: m token Source #

Retrieve the next token in the stream, whatever it may be. Identical to uncons in all but type.

nextChunk :: Word -> m stream Source #

Retrieve the next several tokens in the stream. Identical to count (with a safer index type) in the case that gather is a list [token].

If fewer tokens are in the input stream than asked for, returns what does remain in the input stream.

push :: token -> m () Source #

Prepend a token to the input stream to be processed next. Identical to operating on the stream directly through MonadState, if that instance also exists.

stream <- get
put $ cons tok stream
push tok

pushChunk :: stream -> m () Source #

Concatenate the given sequence with the existing input, processing the argument before the older stream.

abridge :: m stream Source #

Drop the remainder of the input, simulating an early end-of-stream. Can be emulated through appropriate MonadState and Monoid instances:

stream <- get
put mempty
return stream
abridge

Instances

Instances details
MonadParser trans stream token => MonadParser (MaybeT trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: MaybeT trans out -> MaybeT trans out Source #

avoiding :: MaybeT trans out -> MaybeT trans () Source #

next :: MaybeT trans token Source #

nextChunk :: Word -> MaybeT trans stream Source #

push :: token -> MaybeT trans () Source #

pushChunk :: stream -> MaybeT trans () Source #

abridge :: MaybeT trans stream Source #

MonadParser trans stream token => MonadParser (IdentityT trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: IdentityT trans out -> IdentityT trans out Source #

avoiding :: IdentityT trans out -> IdentityT trans () Source #

next :: IdentityT trans token Source #

nextChunk :: Word -> IdentityT trans stream Source #

push :: token -> IdentityT trans () Source #

pushChunk :: stream -> IdentityT trans () Source #

abridge :: IdentityT trans stream Source #

(MonadParser trans stream token, Monoid except) => MonadParser (ExceptT except trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: ExceptT except trans out -> ExceptT except trans out Source #

avoiding :: ExceptT except trans out -> ExceptT except trans () Source #

next :: ExceptT except trans token Source #

nextChunk :: Word -> ExceptT except trans stream Source #

push :: token -> ExceptT except trans () Source #

pushChunk :: stream -> ExceptT except trans () Source #

abridge :: ExceptT except trans stream Source #

MonadParser trans stream token => MonadParser (ReaderT reader trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: ReaderT reader trans out -> ReaderT reader trans out Source #

avoiding :: ReaderT reader trans out -> ReaderT reader trans () Source #

next :: ReaderT reader trans token Source #

nextChunk :: Word -> ReaderT reader trans stream Source #

push :: token -> ReaderT reader trans () Source #

pushChunk :: stream -> ReaderT reader trans () Source #

abridge :: ReaderT reader trans stream Source #

(MonadParser trans stream token, MonadPlus trans) => MonadParser (StateT state trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: StateT state trans out -> StateT state trans out Source #

avoiding :: StateT state trans out -> StateT state trans () Source #

next :: StateT state trans token Source #

nextChunk :: Word -> StateT state trans stream Source #

push :: token -> StateT state trans () Source #

pushChunk :: stream -> StateT state trans () Source #

abridge :: StateT state trans stream Source #

(MonadParser trans stream token, MonadPlus trans) => MonadParser (StateT state trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: StateT state trans out -> StateT state trans out Source #

avoiding :: StateT state trans out -> StateT state trans () Source #

next :: StateT state trans token Source #

nextChunk :: Word -> StateT state trans stream Source #

push :: token -> StateT state trans () Source #

pushChunk :: stream -> StateT state trans () Source #

abridge :: StateT state trans stream Source #

(MonadParser trans stream token, Monoid writer) => MonadParser (WriterT writer trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: WriterT writer trans out -> WriterT writer trans out Source #

avoiding :: WriterT writer trans out -> WriterT writer trans () Source #

next :: WriterT writer trans token Source #

nextChunk :: Word -> WriterT writer trans stream Source #

push :: token -> WriterT writer trans () Source #

pushChunk :: stream -> WriterT writer trans () Source #

abridge :: WriterT writer trans stream Source #

(MonadParser trans stream token, Monoid writer) => MonadParser (WriterT writer trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: WriterT writer trans out -> WriterT writer trans out Source #

avoiding :: WriterT writer trans out -> WriterT writer trans () Source #

next :: WriterT writer trans token Source #

nextChunk :: Word -> WriterT writer trans stream Source #

push :: token -> WriterT writer trans () Source #

pushChunk :: stream -> WriterT writer trans () Source #

abridge :: WriterT writer trans stream Source #

(MonadParser trans stream token, Monoid accum, MonadPlus trans) => MonadParser (AccumT accum trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: AccumT accum trans out -> AccumT accum trans out Source #

avoiding :: AccumT accum trans out -> AccumT accum trans () Source #

next :: AccumT accum trans token Source #

nextChunk :: Word -> AccumT accum trans stream Source #

push :: token -> AccumT accum trans () Source #

pushChunk :: stream -> AccumT accum trans () Source #

abridge :: AccumT accum trans stream Source #

(Alternative gather, Monad gather, Stream stream token, Monoid stream) => MonadParser (ParserT stream gather) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: ParserT stream gather out -> ParserT stream gather out Source #

avoiding :: ParserT stream gather out -> ParserT stream gather () Source #

next :: ParserT stream gather token Source #

nextChunk :: Word -> ParserT stream gather stream Source #

push :: token -> ParserT stream gather () Source #

pushChunk :: stream -> ParserT stream gather () Source #

abridge :: ParserT stream gather stream Source #

(MonadParser trans stream token, Monoid writer, MonadPlus trans) => MonadParser (RWST reader writer state trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: RWST reader writer state trans out -> RWST reader writer state trans out Source #

avoiding :: RWST reader writer state trans out -> RWST reader writer state trans () Source #

next :: RWST reader writer state trans token Source #

nextChunk :: Word -> RWST reader writer state trans stream Source #

push :: token -> RWST reader writer state trans () Source #

pushChunk :: stream -> RWST reader writer state trans () Source #

abridge :: RWST reader writer state trans stream Source #

(MonadParser trans stream token, Monoid writer, MonadPlus trans) => MonadParser (RWST reader writer state trans) stream token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

lookAhead :: RWST reader writer state trans out -> RWST reader writer state trans out Source #

avoiding :: RWST reader writer state trans out -> RWST reader writer state trans () Source #

next :: RWST reader writer state trans token Source #

nextChunk :: Word -> RWST reader writer state trans stream Source #

push :: token -> RWST reader writer state trans () Source #

pushChunk :: stream -> RWST reader writer state trans () Source #

abridge :: RWST reader writer state trans stream Source #

end :: MonadParser trans stream token => trans () Source #

Succeeds if and only if the input is empty.

satisfying :: MonadParser trans stream token => (out -> Bool) -> out -> trans out Source #

Succeeds if and only if the value parsed by the argument parser satisfies the predicate. No further input is consumed.

token :: (MonadParser trans stream token, Eq token) => token -> trans token Source #

Expect a specific token from the Stream, and fail if a different token is found instead. Identical to running satisfying with equality in the (by far most likely) case that gather is a Monad in addition to an Alternative:

tok <- next >>= satisfying (== desired)
tok <- token desired

chunk :: (MonadParser trans stream token, Eq stream) => stream -> trans stream Source #

Expect a specific sequence of tokens from the Stream, and fail if anything else is found instead, or if the Stream doesn't have enough characters before its end. Identical to running satisfying with equality over nextChunk in the case that stream is an Eq (which all provided instances are) and can easily provide a length (which they do, unless the sequence to test against also needs to be lazy).

stream <- nextChunk (length desired) >>= satisfying (== desired)
stream <- chunk desired

Supporting typeclasses

class Monoid stream => Stream stream token | stream -> token where Source #

A sequence of values which may be processed via a MonadParser. This class is essentially just a unification of the various list-like interfaces (uncons == head, etc.) as Haskell's abstractions are slightly lacking in that area.

>>> Just (tok, str) == uncons (cons tok str)
True

Minimal complete definition

cons, uncons

Methods

cons :: token -> stream -> stream Source #

Prepend a token to the stream for proximate processing, before everything already in it.

consChunk :: stream -> stream -> stream Source #

As cons, but append multiple tokens at once.

uncons :: stream -> Maybe (token, stream) Source #

Retrieve the next token from the stream.

This should only return Nothing if the stream is actually empty---if the next value is not available yet due to slow IO or other computation, uncons waits until it is.

unconsChunk :: Word -> stream -> (stream, stream) Source #

Retrieve the next several tokens from the stream.

If fewer tokens are in the input stream than asked for, the left side of the return value is the (shorter than requested) entire input stream and the right is mempty.

chunkLen :: stream -> Word Source #

The number of tokens remaining in the stream.

Instances

Instances details
Stream ByteString Word8 Source # 
Instance details

Defined in Web.Willow.Common.Parser

Stream ByteString Word8 Source # 
Instance details

Defined in Web.Willow.Common.Parser

Stream Text Char Source # 
Instance details

Defined in Web.Willow.Common.Parser

Stream Text Char Source # 
Instance details

Defined in Web.Willow.Common.Parser

Stream [token] token Source # 
Instance details

Defined in Web.Willow.Common.Parser

Methods

cons :: token -> [token] -> [token] Source #

consChunk :: [token] -> [token] -> [token] Source #

uncons :: [token] -> Maybe (token, [token]) Source #

unconsChunk :: Word -> [token] -> ([token], [token]) Source #

chunkLen :: [token] -> Word Source #