html-parse-0.1.0.0: A high-performance HTML tokenizer

Safe Haskell: Safe
Language: Haskell2010

Text.HTML.Parser

Description

This is a performance-oriented HTML tokenizer aimed at web-crawling applications. It follows the HTML5 parsing specification quite closely, so it behaves reasonably well on ill-formed documents from the open Web.

Documentation

data Token

An HTML token.

Constructors

TagOpen !TagName [Attr]

An opening tag. Attribute ordering is arbitrary.

TagClose !TagName

A closing tag.

ContentChar !Char

The content between tags.

ContentText !Text

Comment !Builder

Contents of a comment.

Doctype !Text

A doctype declaration.

data Attr

An attribute of a tag.

Constructors

Attr !AttrName !AttrValue 

token :: Parser Token

Parse a single Token.

tagStream :: Text -> [Token]

Produce a lazy list of tokens.
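
Because the result is an ordinary lazy list, a consumer can pattern-match on the Token constructors and stop as soon as it has what it needs. The following is a minimal sketch of that idea, not code from the package: it declares local stand-ins for Token and Attr that mirror the constructors documented above, so it compiles without html-parse installed. Treating TagName, AttrName and AttrValue as synonyms for strict Text is an assumption, and linkTargets and sampleTokens are hypothetical names introduced only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Lazy.Builder (Builder)

-- Local stand-ins mirroring the documented constructors, so this sketch
-- compiles without the html-parse package. Treating the three name types
-- as synonyms for strict Text is an assumption.
type TagName   = Text
type AttrName  = Text
type AttrValue = Text

data Attr = Attr !AttrName !AttrValue

data Token
  = TagOpen !TagName [Attr]
  | TagClose !TagName
  | ContentChar !Char
  | ContentText !Text
  | Comment !Builder
  | Doctype !Text

-- | Collect the href value of every opening <a> tag, in document order.
-- Attribute ordering is arbitrary, so each attribute list is searched in
-- full. Names are lower-cased defensively before comparison.
linkTargets :: [Token] -> [AttrValue]
linkTargets toks =
  [ value
  | TagOpen name attrs <- toks
  , T.toLower name == "a"
  , Attr key value <- attrs
  , T.toLower key == "href"
  ]

-- | Hand-written sample stream (hypothetical; not produced by the library).
sampleTokens :: [Token]
sampleTokens =
  [ Doctype "html"
  , TagOpen "p" []
  , ContentText "See "
  , TagOpen "a" [Attr "class" "ext", Attr "href" "https://example.org/"]
  , ContentText "this page"
  , TagClose "a"
  , TagClose "p"
  , TagOpen "A" [Attr "HREF" "/about"]
  , TagClose "A"
  ]
```

Loaded in GHCi, `linkTargets sampleTokens` yields the two href values in order. `take 1 (linkTargets (cycle sampleTokens))` also terminates even though the input is infinite, since only the prefix up to the first link is ever inspected. With the real package, the same function should apply to the output of tagStream once the stand-in declarations are replaced by an import of Text.HTML.Parser.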