html-parse: A high-performance HTML tokenizer

[ bsd3, library, text ]

This package provides a fast and reasonably robust HTML5 tokenizer built upon the attoparsec library. The parsing strategy is based upon the HTML5 parsing specification with few deviations.

The package targets similar use-cases to the venerable tagsoup library, but is significantly more efficient, achieving parsing speeds of over 50 megabytes per second on modern hardware with typical web documents.
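As a rough illustration of tagsoup-style usage, the sketch below tokenizes a small document and collects the href of every opening a tag. It assumes the package's Text.HTML.Parser module exports parseTokens together with the Token constructor TagOpen and the Attr constructor; those names are not given in the description above, so check them against the documentation for the version you install. The hrefs helper and the sample document are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Data.Text (Text)
import qualified Data.Text.IO as T
-- Assumed exports of html-parse; verify against the installed version.
import           Text.HTML.Parser (Attr (..), Token (..), parseTokens)

-- | Hypothetical helper: collect the value of the href attribute
-- from every opening <a> tag in a token stream.
hrefs :: [Token] -> [Text]
hrefs tokens =
  [ value
  | TagOpen "a" attrs <- tokens   -- keep only opening <a> tags
  , Attr "href" value <- attrs    -- keep only their href attributes
  ]

main :: IO ()
main = mapM_ T.putStrLn (hrefs (parseTokens doc))
  where
    -- Sample input, made up for this example.
    doc :: Text
    doc = "<p>See <a href=\"http://example.com/\">this page</a>.</p>"
```

Because the tokenizer returns a flat list of tokens rather than a tree, simple extraction tasks like this one reduce to ordinary list processing.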

Versions 0.1.0.0, 0.2.0.0, 0.2.0.1, 0.2.0.2, 0.2.1.0
Dependencies attoparsec (>=0.13 && <0.14), base (>=4.8 && <4.10), deepseq (>=1.4 && <1.5), text (>=1.2 && <1.3)
License BSD-3-Clause
Copyright (c) 2016 Ben Gamari
Author Ben Gamari
Maintainer ben@smart-cactus.org
Category Text
Home page http://github.com/bgamari/html-parse
Source repo head: git clone git://github.com/bgamari/html-parse
Uploaded by BenGamari at 2016-04-13T21:06:14Z
Distributions Arch:0.2.1.0
Reverse Dependencies 3 direct, 0 indirect
Downloads 3645 total (32 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs uploaded by user
Build status unknown [no reports yet]