mangrove: A parser for web documents according to the HTML5 specification.

[ library, mpl, web ]

mangrove provides HTML parsing for the Willow web browser suite. It was not necessarily written with a broader audience in mind, but the resulting data structures should be generic enough to serve as a general parsing library if you need HTML5 compatibility (most likely for its codified error-recovery algorithms). If you do use it in other projects, please share any issues, or even just discomforts, that broader usage reveals. Note that mangrove makes no attempt to parse CSS or JavaScript, or to access linked files; it leaves those tasks to other parts of the suite and only generates a simple document tree from the markup.


Flags

Manual Flags

Name | Description | Default
dev | Trigger stricter behaviour for development | Disabled

Automatic Flags

Name | Description | Default
html5lib | Enable the html5lib tests, which require manually downloading the test data. | Disabled
wpt | Enable integration with the web-platform-tests suite, greatly expanding the test surface at the expense of requiring the manual download of the (large) test data repository. | Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.
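
As an alternative to passing flags on the command line, a project that depends on mangrove could set them in its cabal.project configuration. The following is a minimal sketch, assuming a cabal-install project; placing it in cabal.project.local is an assumption for illustration, not something the package prescribes.

```cabal
-- cabal.project.local (hypothetical placement; any cabal.project file works)
-- Enable the manual `dev` flag for mangrove; write `-dev` to disable it.
package mangrove
  flags: +dev
```

The html5lib and wpt flags can be toggled the same way, but they still require the test data to be downloaded manually, as described in the flag table.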

Versions: 0.1.0.0
Change log: CHANGELOG.md
Dependencies: aeson (<1.6), base (>=4.10 && <4.15), bytestring (<0.11), containers (<0.7), filepath (<1.5), text (<1.3), transformers (<0.6), unordered-containers (<0.3), utility-ht (<0.1), vector (<0.13), willow
License: MPL-2.0
Copyright: © 2020-2021 Sam May
Author: Sam May
Maintainer: sam@eitilt.life
Category: Web
Home page: https://ag.eitilt.life/willow
Bug tracker: mailto:ag@eitilt.life
Source repo:
  head: darcs get https://darcs.eitilt.life/willow (mangrove)
  this: darcs get https://darcs.eitilt.life/willow --tag v/html/0.1.0.0. (mangrove)
  head: darcs get https://hub.darcs.net/ag.eitilt/willow (mangrove)
  this: darcs get https://hub.darcs.net/ag.eitilt/willow --tag v/html/0.1.0.0. (mangrove)
Uploaded: by ageitilt at 2021-04-16T23:54:27Z
Downloads: 182 total (8 in the last 30 days)
Rating: no votes yet
Status: Docs uploaded by user
Build status: unknown (no reports yet)

Readme for mangrove-0.1.0.0

About

'mangrove' provides an HTML5-compatible parser for web documents, implemented in Haskell. In keeping with the language's immutable-data paradigm, the implementation emphasizes avoiding side effects and mutable structures over strictly following the official algorithms. The resulting document tree can be returned to willow to be styled and rendered.

This readme is rather sparse because it was written for a subfolder of the complete repository. For full information on the project, see the primary readme in this directory, its parent, or the online host, whichever of those links works.

Coverage reporting

Unfortunately, the invocation of hpc by cabal-install <= 3.4.0.0 doesn't work properly when multiple packages are developed as part of the same project.
Until the next version is released, I recommend that you don't enable coverage reports for mangrove, so that the tests themselves run correctly.
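
If coverage is enabled elsewhere in your configuration, one way to follow this recommendation is to switch it off for mangrove explicitly. The fragment below is a sketch under the assumption that the project is built through a cabal.project file; the file name shown is illustrative, and whether other packages in the project can keep coverage enabled has not been verified here.

```cabal
-- cabal.project.local (hypothetical placement)
-- Keep coverage reporting off for mangrove so its test suites still run.
package mangrove
  coverage: False
```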