This module is for working with HTML/XML. It deals with both well-formed XML and malformed HTML from the web. It features:
- A lazy parser, based on the HTML 5 specification - see parseTags.
- A renderer that can write out HTML/XML - see renderTags.
- Utilities for extracting information from a document - see ~==, sections and partitions.
The standard practice is to parse a String to [Tag String] using parseTags, then operate upon it to extract the necessary information.
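As a minimal sketch of that parse-then-extract workflow, the following collects the target of every link in a document. The helper name linkTargets and the sample markup are assumptions made for illustration; only parseTags, isTagOpenName and fromAttrib come from this module.

```haskell
import Text.HTML.TagSoup

-- Hypothetical helper: collect the href of every <a> tag in a document.
linkTargets :: String -> [String]
linkTargets html =
  [ fromAttrib "href" tag
  | tag <- parseTags html
  , isTagOpenName "a" tag  -- keeps only TagOpen "a" ..., so fromAttrib cannot crash
  ]

main :: IO ()
main = mapM_ putStrLn (linkTargets "<p><a href=\"/one\">1</a> and <a href=\"/two\">2</a></p>")
```

Filtering with isTagOpenName before calling fromAttrib matters, because fromAttrib crashes on anything that is not a TagOpen.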
- data Tag str
- = TagOpen str [Attribute str]
- | TagClose str
- | TagText str
- | TagComment str
- | TagWarning str
- | TagPosition !Row !Column
- type Row = Int
- type Column = Int
- type Attribute str = (str, str)
- parseTags :: StringLike str => str -> [Tag str]
- parseTagsOptions :: StringLike str => ParseOptions str -> str -> [Tag str]
- data ParseOptions str = ParseOptions {
- optTagPosition :: Bool
- optTagWarning :: Bool
- optEntityData :: str -> [Tag str]
- optEntityAttrib :: (str, Bool) -> (str, [Tag str])
- optTagTextMerge :: Bool
- }
- parseOptions :: StringLike str => ParseOptions str
- parseOptionsFast :: StringLike str => ParseOptions str
- renderTags :: StringLike str => [Tag str] -> str
- renderTagsOptions :: StringLike str => RenderOptions str -> [Tag str] -> str
- data RenderOptions str = RenderOptions {
- optEscape :: str -> str
- optMinimize :: str -> Bool
- }
- renderOptions :: StringLike str => RenderOptions str
- canonicalizeTags :: StringLike str => [Tag str] -> [Tag str]
- isTagOpen :: Tag str -> Bool
- isTagClose :: Tag str -> Bool
- isTagText :: Tag str -> Bool
- isTagWarning :: Tag str -> Bool
- isTagPosition :: Tag str -> Bool
- isTagOpenName :: Eq str => str -> Tag str -> Bool
- isTagCloseName :: Eq str => str -> Tag str -> Bool
- fromTagText :: Show str => Tag str -> str
- fromAttrib :: (Show str, Eq str, StringLike str) => str -> Tag str -> str
- maybeTagText :: Tag str -> Maybe str
- maybeTagWarning :: Tag str -> Maybe str
- innerText :: StringLike str => [Tag str] -> str
- sections :: (a -> Bool) -> [a] -> [[a]]
- partitions :: (a -> Bool) -> [a] -> [[a]]
- class TagRep a
- (~==) :: (StringLike str, TagRep t) => Tag str -> t -> Bool
- (~/=) :: (StringLike str, TagRep t) => Tag str -> t -> Bool
Data structures and parsing
A single HTML element. A whole document is represented by a list of Tag.
There is no requirement for TagOpen and TagClose to match.
TagOpen str [Attribute str] | An open tag with Attributes in their original order |
TagClose str | A closing tag |
TagText str | A text node, guaranteed not to be the empty string |
TagComment str | A comment |
TagWarning str | Meta: A syntax error in the input file |
TagPosition !Row !Column | Meta: The position of a parsed element |
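Because a document is just a list of Tag values, it can be built by hand and taken apart with ordinary pattern matching. The sketch below uses a hypothetical document (doc) whose open and close tags deliberately do not match, to show that nothing in the representation enforces pairing; the helper names are likewise illustrative.

```haskell
import Text.HTML.TagSoup

-- A hand-built document whose open and close tags deliberately do not match.
doc :: [Tag String]
doc = [TagOpen "p" [("class", "intro")], TagText "hi", TagClose "div"]

-- Names of the opening and closing tags, in document order.
openNames, closeNames :: [Tag String] -> [String]
openNames  tags = [name | TagOpen name _ <- tags]
closeNames tags = [name | TagClose name <- tags]

main :: IO ()
main = do
  print (openNames doc)   -- ["p"]
  print (closeNames doc)  -- ["div"]
```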
parseTags :: StringLike str => str -> [Tag str]
Parse a string to a list of tags, using an HTML 5 compliant parser.
parseTags "<hello>my&amp;</world>" == [TagOpen "hello" [],TagText "my&",TagClose "world"]
parseTagsOptions :: StringLike str => ParseOptions str -> str -> [Tag str]
Parse a string to a list of tags, using settings supplied by the ParseOptions parameter, e.g. to output position information:
parseTagsOptions parseOptions{optTagPosition = True} "<hello>my&amp;</world>" == [TagPosition 1 1,TagOpen "hello" [],TagPosition 1 8,TagText "my&",TagPosition 1 15,TagClose "world"]
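data ParseOptions str
These options control how parseTags works.
ParseOptions | Constructor; its fields are optTagPosition, optTagWarning, optEntityData, optEntityAttrib and optTagTextMerge |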
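Options are normally changed with record-update syntax on parseOptions, as in the optTagPosition example above. The sketch below applies the same idea to optTagWarning and keeps only the warning messages. The helper name warningsOf is an assumption, and no expected output is shown because the exact warning text depends on the parser.

```haskell
import Data.Maybe (mapMaybe)
import Text.HTML.TagSoup

-- Hypothetical helper: parse with warnings enabled and keep only the messages.
warningsOf :: String -> [String]
warningsOf = mapMaybe maybeTagWarning . parseTagsOptions parseOptions{optTagWarning = True}

main :: IO ()
main = mapM_ putStrLn (warningsOf "<hello>my&amp;</world>")
```

With the default options used by parseTags, the same input yields no TagWarning values, as the parseTags example shows.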
parseOptions :: StringLike str => ParseOptions str
The default parse options value, described in ParseOptions.
parseOptionsFast :: StringLike str => ParseOptions str
A ParseOptions structure optimised for speed, following the fast options.
renderTags :: StringLike str => [Tag str] -> str
Show a list of tags, as they might have been parsed, using the default settings given in RenderOptions.
renderTags [TagOpen "hello" [],TagText "my&",TagClose "world"] == "<hello>my&amp;</world>"
renderTagsOptions :: StringLike str => RenderOptions str -> [Tag str] -> str
Show a list of tags using settings supplied by the RenderOptions parameter, e.g. to avoid escaping any characters one could do:
renderTagsOptions renderOptions{optEscape = id} [TagText "my&"] == "my&"
data RenderOptions str
These options control how renderTags works.
The strange quirk of only minimizing <br> tags is due to Internet Explorer treating <br></br> as <br><br>.
RenderOptions | Constructor; its fields are optEscape and optMinimize |
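Both fields can be overridden with record-update syntax on renderOptions. The sketch below defines two illustrative renderers (the names renderRaw and renderMinimizeAll are assumptions): one that disables escaping, and one that asks for every element name to be eligible for minimizing rather than only br. The exact text produced for a minimized tag is not shown here because it was not checked.

```haskell
import Text.HTML.TagSoup

-- Render without escaping any characters.
renderRaw :: [Tag String] -> String
renderRaw = renderTagsOptions renderOptions{optEscape = id}

-- Render with every element name eligible for minimizing, not only <br>.
renderMinimizeAll :: [Tag String] -> String
renderMinimizeAll = renderTagsOptions renderOptions{optMinimize = const True}

main :: IO ()
main = do
  putStrLn (renderRaw [TagText "my&"])  -- my&
  putStrLn (renderMinimizeAll [TagOpen "hr" [], TagClose "hr"])
```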
renderOptions :: StringLike str => RenderOptions str
The default render options value, described in RenderOptions.
canonicalizeTags :: StringLike str => [Tag str] -> [Tag str]
Turns all tag names and attributes to lower case and converts DOCTYPE to upper case.
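A small illustration of the lower-casing, using a hand-built tag list (the name shouty is hypothetical). The attribute value is already lower case, so the example only relies on tag names and attribute names being normalised.

```haskell
import Text.HTML.TagSoup

-- Hypothetical input with upper-case tag and attribute names.
shouty :: [Tag String]
shouty = [TagOpen "DIV" [("CLASS", "big")], TagText "Hi", TagClose "DIV"]

main :: IO ()
main = print (canonicalizeTags shouty)
-- [TagOpen "div" [("class","big")],TagText "Hi",TagClose "div"]
```

Canonicalizing before matching makes comparisons such as isTagOpenName "div" independent of how the source document capitalised its markup.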
Tag identification
isTagWarning :: Tag str -> Bool
Test if a Tag is a TagWarning.
isTagPosition :: Tag str -> Bool
Test if a Tag is a TagPosition.
isTagOpenName :: Eq str => str -> Tag str -> Bool
isTagCloseName :: Eq str => str -> Tag str -> Bool
Extraction
fromTagText :: Show str => Tag str -> str
fromAttrib :: (Show str, Eq str, StringLike str) => str -> Tag str -> str
Extract an attribute; crashes if not a TagOpen. Returns "" if no attribute is present.
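The two behaviours described above can be seen on a hand-built tag. The wrapper safeAttrib is a hypothetical helper, not part of the module; it is one way to avoid the crash by checking isTagOpen first.

```haskell
import Text.HTML.TagSoup

link :: Tag String
link = TagOpen "a" [("href", "/x")]

-- Hypothetical total wrapper: Nothing for anything that is not a TagOpen.
safeAttrib :: String -> Tag String -> Maybe String
safeAttrib name tag
  | isTagOpen tag = Just (fromAttrib name tag)
  | otherwise     = Nothing

main :: IO ()
main = do
  print (fromAttrib "href" link)              -- "/x"
  print (fromAttrib "id" link)                -- ""
  print (safeAttrib "href" (TagText "text"))  -- Nothing
```

Note that a missing attribute and an attribute whose value is the empty string both come back as "", so fromAttrib alone cannot tell them apart.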
maybeTagWarning :: Tag str -> Maybe str
Extract the string from within TagWarning, otherwise Nothing.
innerText :: StringLike str => [Tag str] -> str
Extract all text content from tags (similar to Verbatim found in HaXml)
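For instance, on a hand-built fragment (the name fragment is illustrative), innerText keeps the text nodes and drops the markup around them:

```haskell
import Text.HTML.TagSoup

-- Hypothetical fragment: <b>Hello, </b>world
fragment :: [Tag String]
fragment = [TagOpen "b" [], TagText "Hello, ", TagClose "b", TagText "world"]

main :: IO ()
main = putStrLn (innerText fragment)  -- Hello, world
```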
Utility
sections :: (a -> Bool) -> [a] -> [[a]]
This function takes a list, and returns all suffixes whose first item matches the predicate.
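Since sections works on any list, its behaviour is easiest to see on numbers first. The second definition sketches a typical use on tags: taking the text that directly follows each h2. The names evenSuffixes, headings and page are assumptions made for this example.

```haskell
import Text.HTML.TagSoup

-- On a plain list: every suffix that starts with an even number.
evenSuffixes :: [[Int]]
evenSuffixes = sections even [1, 2, 3, 4]  -- [[2,3,4],[4]]

-- On tags: the text node that directly follows each <h2>, if there is one.
headings :: [Tag String] -> [String]
headings tags =
  [ text
  | (_ : TagText text : _) <- sections (isTagOpenName "h2") tags
  ]

-- Hypothetical hand-built page with two headings.
page :: [Tag String]
page =
  [ TagOpen "h2" [], TagText "One", TagClose "h2"
  , TagOpen "p" [], TagText "body", TagClose "p"
  , TagOpen "h2" [], TagText "Two", TagClose "h2"
  ]

main :: IO ()
main = do
  print evenSuffixes
  print (headings page)  -- ["One","Two"]
```

Each section runs to the end of the list, so sections overlap; the pattern match only inspects the first two elements of each.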
partitions :: (a -> Bool) -> [a] -> [[a]]
This function is similar to sections, but splits the list so no element appears in any two partitions.
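The sketch below shows the non-overlapping split on a plain list, then uses it to pull the text out of each table row. The names evenChunks, rowTexts and table are illustrative assumptions. Elements before the first match are dropped.

```haskell
import Text.HTML.TagSoup

-- On a plain list: a new chunk starts at each even number; the leading 1 is dropped.
evenChunks :: [[Int]]
evenChunks = partitions even [1, 2, 3, 4, 5]  -- [[2,3],[4,5]]

-- On tags: one chunk per <tr>, reduced to its text content.
rowTexts :: [Tag String] -> [String]
rowTexts = map innerText . partitions (isTagOpenName "tr")

-- Hypothetical hand-built table with two rows.
table :: [Tag String]
table =
  [ TagOpen "table" []
  , TagOpen "tr" [], TagText "a", TagClose "tr"
  , TagOpen "tr" [], TagText "b", TagClose "tr"
  , TagClose "table"
  ]

main :: IO ()
main = do
  print evenChunks
  print (rowTexts table)  -- ["a","b"]
```

The last chunk runs to the end of the list, so it also contains the closing table tag; that is harmless here because innerText ignores everything except text.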
Combinators
A class that allows either a String or a Tag str to be used as a match pattern.
TagRep String | |
StringLike str => TagRep (Tag str) |
(~==) :: (StringLike str, TagRep t) => Tag str -> t -> Bool
Performs an inexact match; the first item should be the thing to match. If the second item is a blank string, that is considered to match anything. For example:
(TagText "test" ~== TagText "" ) == True
(TagText "test" ~== TagText "test") == True
(TagText "test" ~== TagText "soup") == False
For TagOpen, missing attributes on the right are allowed.
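The attribute rule for TagOpen can be sketched with a hand-built tag (link is a hypothetical value). The pattern on the right may list fewer attributes than the tag on the left, and it may be given either as a Tag or, through the TagRep String instance, as a string of markup.

```haskell
import Text.HTML.TagSoup

link :: Tag String
link = TagOpen "a" [("href", "/x"), ("class", "nav")]

main :: IO ()
main = do
  print (link ~== TagOpen "a" [("class", "nav")])  -- True: href is simply not asked for
  print (link ~== TagOpen "a" [("id", "top")])     -- False: the tag has no such attribute
  print (link ~== "<a>")                           -- True: String pattern, no attributes required
  print (link ~/= "<a>")                           -- False: (~/=) is the negation
```

The match is not symmetric: attributes missing from the pattern are ignored, while attributes listed in the pattern must be present on the tag.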