html-parse: A high-performance HTML tokenizer
This package provides a fast and reasonably robust HTML5 tokenizer built
upon the attoparsec library. The parsing strategy is based upon the HTML5
parsing specification, with few deviations.
The package targets similar use cases to the venerable tagsoup library,
but is significantly more efficient, achieving parsing speeds of over 50
megabytes per second on modern hardware with typical web documents.
For instance,
>>> parseTokens "<div><h1 class=widget>Hello World</h1><br/>"
[TagOpen "div" [],TagOpen "h1" [Attr "class" "widget"], ContentText "Hello World",TagClose "h1",TagSelfClose "br" []]
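The token list above can be consumed with ordinary pattern matching. The following is a minimal sketch, not part of the package: it assumes the tokenizer is exported from a module named `Text.HTML.Parser` (the module name is not shown on this page), and the helpers `textChunks` and `attrValues` are hypothetical names introduced only for illustration. It matches only on the constructors that appear in the output above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import qualified Data.Text.IO as T
-- Assumption: the tokenizer lives in Text.HTML.Parser.
import Text.HTML.Parser (Attr (..), Token (..), parseTokens)

-- | Hypothetical helper: the text of every ContentText token, in document order.
textChunks :: [Token] -> [Text]
textChunks toks = [t | ContentText t <- toks]

-- | Hypothetical helper: the values of every attribute with the given name,
-- taken from opening and self-closing tags.
attrValues :: Text -> [Token] -> [Text]
attrValues name toks =
  [v | tok <- toks, Attr k v <- attrsOf tok, k == name]
  where
    attrsOf (TagOpen _ attrs)      = attrs
    attrsOf (TagSelfClose _ attrs) = attrs
    attrsOf _                      = []

main :: IO ()
main = do
  let toks = parseTokens "<div><h1 class=widget>Hello World</h1><br/>"
  mapM_ T.putStrLn (textChunks toks)          -- text content
  mapM_ T.putStrLn (attrValues "class" toks)  -- class attribute values
```

For the token list shown above, this should print `Hello World` followed by `widget`. Any token that does not match the listed constructors is simply skipped.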
Properties
Versions | 0.1.0.0, 0.2.0.0, 0.2.0.1, 0.2.0.2, 0.2.1.0 |
---|---|
Change log | None available |
Dependencies | attoparsec (>=0.13 && <0.14), base (>=4.7 && <4.11), containers (>=0.5 && <0.6), deepseq (>=1.4 && <1.5), text (>=1.2 && <1.3) |
License | BSD-3-Clause |
Copyright | (c) 2016 Ben Gamari |
Author | Ben Gamari |
Maintainer | ben@smart-cactus.org |
Category | Text |
Home page | http://github.com/bgamari/html-parse |
Source repo | head: git clone git://github.com/bgamari/html-parse |
Uploaded | by BenGamari at 2017-08-10T03:52:34Z |
Modules
- Text
Downloads
- html-parse-0.2.0.1.tar.gz (Cabal source package)
- Package description (as included in the package)