tagsoup-0.14.1: Parsing and extracting information from (possibly malformed) HTML/XML documents

Safe Haskell: None
Language: Haskell2010

Text.HTML.TagSoup.Tree

Description

NOTE: This module is preliminary and may change at a future date.

This module is intended to help convert a list of tags into a tree of tags.

Documentation

data TagTree str

A tree of Tag values.

Constructors

TagBranch str [Attribute str] [TagTree str]

A TagOpen/TagClose pair with the Tag values in between.

TagLeaf (Tag str)

Any leaf node: text, a comment, or any other tag that is not part of a matched TagOpen/TagClose pair.

Instances

Functor TagTree

Methods

fmap :: (a -> b) -> TagTree a -> TagTree b

(<$) :: a -> TagTree b -> TagTree a

Eq str => Eq (TagTree str)

Methods

(==) :: TagTree str -> TagTree str -> Bool

(/=) :: TagTree str -> TagTree str -> Bool

Ord str => Ord (TagTree str)

Methods

compare :: TagTree str -> TagTree str -> Ordering

(<) :: TagTree str -> TagTree str -> Bool

(<=) :: TagTree str -> TagTree str -> Bool

(>) :: TagTree str -> TagTree str -> Bool

(>=) :: TagTree str -> TagTree str -> Bool

max :: TagTree str -> TagTree str -> TagTree str

min :: TagTree str -> TagTree str -> TagTree str

Show str => Show (TagTree str)

Methods

showsPrec :: Int -> TagTree str -> ShowS

show :: TagTree str -> String

showList :: [TagTree str] -> ShowS

tagTree :: Eq str => [Tag str] -> [TagTree str]

Convert a list of tags into a tree. This version is not lazy at all; a lazy version is saved for version 2.
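The matching idea behind tagTree can be illustrated with a minimal, self-contained sketch. It re-declares reduced stand-ins for Tag (only TagOpen, TagClose and TagText) and TagTree so that it compiles without tagsoup; toTree and splitTrees are hypothetical names, and this is not presented as the library's implementation. An open tag is closed by the first remaining TagClose after its contents when the names match; otherwise the open tag becomes a TagLeaf, as does any stray close tag.

```haskell
-- Reduced stand-ins for tagsoup's types (assumption: the real Tag has more
-- constructors, e.g. for comments, warnings and positions).
type Attribute str = (str, str)

data Tag str = TagOpen str [Attribute str] | TagClose str | TagText str
  deriving (Eq, Show)

data TagTree str
  = TagBranch str [Attribute str] [TagTree str]
  | TagLeaf (Tag str)
  deriving (Eq, Show)

-- Hypothetical: build a forest from a flat tag list. A close tag that no
-- open tag consumed is kept as a TagLeaf, and processing resumes after it.
toTree :: Eq str => [Tag str] -> [TagTree str]
toTree [] = []
toTree xs = trees ++ map TagLeaf (take 1 rest) ++ toTree (drop 1 rest)
  where
    (trees, rest) = splitTrees xs

-- Returns the trees parsed so far and the unconsumed input, which is
-- either empty or starts with a TagClose.
splitTrees :: Eq str => [Tag str] -> ([TagTree str], [Tag str])
splitTrees [] = ([], [])
splitTrees (TagClose n : xs) = ([], TagClose n : xs)
splitTrees (TagOpen n atts : xs) =
  case splitTrees xs of
    (inner, TagClose m : ys)
      | m == n ->
          -- Matching close: wrap the contents and keep parsing siblings.
          let (siblings, rest) = splitTrees ys
          in (TagBranch n atts inner : siblings, rest)
    -- No matching close: the open tag becomes a leaf and its would-be
    -- contents become its following siblings.
    (inner, rest) -> (TagLeaf (TagOpen n atts) : inner, rest)
splitTrees (t : xs) =
  let (ts, rest) = splitTrees xs
  in (TagLeaf t : ts, rest)
```

For input shaped like <a><b></a>, the inner b has no matching close, so in this sketch it ends up as a TagLeaf inside the a branch.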

parseTree :: StringLike str => str -> [TagTree str]

Build a TagTree from a string.

parseTreeOptions :: StringLike str => ParseOptions str -> str -> [TagTree str]

Build a TagTree from a string, specifying the ParseOptions.

data ParseOptions str

These options control how parseTags works. A ParseOptions value is usually created with one of parseOptions, parseOptionsFast or parseOptionsEntities, after which selected fields may be overridden.

The options optTagPosition and optTagWarning specify whether to generate TagPosition or TagWarning elements respectively. Usually these options should be set to False to simplify later processing, unless you rely on position information or want to report warnings about malformed HTML to the end user.

The options optEntityData and optEntityAttrib control how entities, for example &nbsp;, are handled. Both take a pair of the entity text (without the leading &) and a boolean, where True indicates that the entity ended with a semicolon (;). Inside normal text, optEntityData is called and its results are inserted into the tag stream. Inside a tag attribute, optEntityAttrib is called: the first component of its result is used in the attribute value, and the second component is appended after the TagOpen value (usually the second component is []). For example, to leave all entities undecoded, pass:

parseOptions
    {optEntityData=\(str,b) -> [TagText $ "&" ++ str ++ [';' | b]]
    ,optEntityAttrib=\(str,b) -> ("&" ++ str ++ [';' | b], [])
    }
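To make the (str, Bool) convention concrete, here is a self-contained sketch of the two callbacks from the example above. It uses a reduced stand-in for Tag (only TagText) so it runs without tagsoup; rawEntity, entityData and entityAttrib are hypothetical names chosen for illustration.

```haskell
-- Reduced stand-in for tagsoup's Tag (assumption: only TagText is needed here).
newtype Tag = TagText String
  deriving (Eq, Show)

-- Rebuild the entity exactly as written in the source: the Bool records
-- whether a terminating ';' was present, so [';' | b] is ";" or "".
rawEntity :: (String, Bool) -> String
rawEntity (str, b) = "&" ++ str ++ [';' | b]

-- Shape of an optEntityData callback: entity in text -> tags to insert.
entityData :: (String, Bool) -> [Tag]
entityData e = [TagText (rawEntity e)]

-- Shape of an optEntityAttrib callback: (text used in the attribute value,
-- extra tags appended after the TagOpen; here none).
entityAttrib :: (String, Bool) -> (String, [Tag])
entityAttrib e = (rawEntity e, [])
```

With these callbacks, an entity written without its semicolon, such as &amp, is passed through as "&amp" rather than being completed to "&amp;".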

Constructors

ParseOptions 

Fields

flattenTree :: [TagTree str] -> [Tag str]

Flatten a TagTree back to a list of Tag.
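The inverse direction can be sketched in a few lines. This is a self-contained illustration with reduced stand-in types, not tagsoup's own code; flatten is a hypothetical name. Each branch is emitted as its TagOpen, the flattened children, and a TagClose with the same name, while leaves are emitted unchanged.

```haskell
-- Reduced stand-ins for tagsoup's types (assumption: three Tag constructors).
type Attribute str = (str, str)

data Tag str = TagOpen str [Attribute str] | TagClose str | TagText str
  deriving (Eq, Show)

data TagTree str
  = TagBranch str [Attribute str] [TagTree str]
  | TagLeaf (Tag str)
  deriving (Eq, Show)

-- Hypothetical flattening: a branch becomes open tag, children, close tag;
-- a leaf is emitted as the single tag it wraps.
flatten :: [TagTree str] -> [Tag str]
flatten = concatMap go
  where
    go (TagBranch n atts inner) = TagOpen n atts : flatten inner ++ [TagClose n]
    go (TagLeaf t) = [t]
```

Because an unmatched open tag is stored as a TagLeaf, flattening in this sketch does not invent a close tag for it.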

renderTree :: StringLike str => [TagTree str] -> str

Render a TagTree.

renderTreeOptions :: StringLike str => RenderOptions str -> [TagTree str] -> str

Render a TagTree with some RenderOptions.

data RenderOptions str

These options control how renderTags works.

The strange quirk of only minimizing <br> tags is due to Internet Explorer treating <br></br> as <br><br>.

Constructors

RenderOptions 

Fields

  • optEscape :: str -> str

    Escape a piece of text (default = escape the four characters &"<>)

  • optMinimize :: str -> Bool

    Minimise <b></b> -> <b/> (default = minimise only <br> tags)

  • optRawTag :: str -> Bool

    Whether the contents of a tag should be output with no escaping (default = true only for script)
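The documented defaults of the three fields can be illustrated with small stand-in functions. These are hypothetical definitions written for this note, not tagsoup's own; the exact entity spellings (&amp;, &quot;, &lt;, &gt;) and the case-sensitive name comparisons are assumptions. renderEmpty shows how minimising only br avoids the <br></br> quirk while other empty elements are still written as an explicit pair.

```haskell
-- Hypothetical stand-in for the default optEscape: replace the four
-- characters & " < > (entity spellings are an assumption).
escapeText :: String -> String
escapeText = concatMap esc
  where
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc c   = [c]

-- Hypothetical stand-in for the default optMinimize: only br is minimised.
minimizeTag :: String -> Bool
minimizeTag name = name == "br"

-- Hypothetical stand-in for the default optRawTag: only script is raw.
rawTag :: String -> Bool
rawTag name = name == "script"

-- Render an element with no children, using the <x/> form only when
-- minimizeTag allows it.
renderEmpty :: String -> String
renderEmpty name
  | minimizeTag name = "<" ++ name ++ "/>"
  | otherwise        = "<" ++ name ++ "></" ++ name ++ ">"
```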

transformTree :: (TagTree str -> [TagTree str]) -> [TagTree str] -> [TagTree str]

This operation is based on the Uniplate transform function. Given a list of trees, it applies the function to every tree in a bottom-up manner and concatenates the results, so the function can delete a tree (by returning []) or replace it with several trees. This operation is useful for manipulating a tree, for example to make all tag names upper case:

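import Data.Char (toUpper)

upperCase :: [TagTree String] -> [TagTree String]
upperCase = transformTree f
  where f (TagBranch name atts inner) = [TagBranch (map toUpper name) atts inner]
        f x = [x]  -- leaves are returned unchanged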
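The bottom-up traversal, and the effect of returning a list, can be seen in a self-contained sketch. It re-declares reduced stand-in types so it runs without tagsoup; transformForest, dropScripts and unwrapSpans are hypothetical names, and transformForest is an illustration of the behaviour described above rather than the library's definition.

```haskell
-- Reduced stand-ins for tagsoup's types (assumption: three Tag constructors).
type Attribute str = (str, str)

data Tag str = TagOpen str [Attribute str] | TagClose str | TagText str
  deriving (Eq, Show)

data TagTree str
  = TagBranch str [Attribute str] [TagTree str]
  | TagLeaf (Tag str)
  deriving (Eq, Show)

-- Hypothetical bottom-up transform: children are rewritten first, then the
-- rebuilt parent is passed to act; all result lists are concatenated.
transformForest :: (TagTree str -> [TagTree str]) -> [TagTree str] -> [TagTree str]
transformForest act = concatMap go
  where
    go (TagBranch n atts inner) = act (TagBranch n atts (transformForest act inner))
    go leaf = act leaf

-- Returning [] deletes a node: remove every script element.
dropScripts :: [TagTree String] -> [TagTree String]
dropScripts = transformForest f
  where
    f (TagBranch "script" _ _) = []
    f x = [x]

-- Returning several trees splices them in: replace each span by its
-- (already transformed) children.
unwrapSpans :: [TagTree String] -> [TagTree String]
unwrapSpans = transformForest f
  where
    f (TagBranch "span" _ inner) = inner
    f x = [x]
```

Because children are processed before their parent, nested spans are unwrapped from the inside out, and the outer span sees children that are already flat.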

universeTree :: [TagTree str] -> [TagTree str]

This operation is based on the Uniplate universe function. Given a list of trees, it returns those trees, and all the children trees at any level. For example:

universeTree
   [TagBranch "a" [("href","url")] [TagBranch "b" [] [TagLeaf (TagText "text")]]]
== [TagBranch "a" [("href","url")] [TagBranch "b" [] [TagLeaf (TagText "text")]]
   ,TagBranch "b" [] [TagLeaf (TagText "text")]
   ,TagLeaf (TagText "text")]

This operation is particularly useful for queries. To collect all "a" tags in a tree, simply do:

[x | x@(TagBranch "a" _ _) <- universeTree tree]
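A self-contained sketch of the same idea, using reduced stand-in types so it runs without tagsoup: universeForest is a hypothetical pre-order version of the traversal described above (each tree, then everything below it, leaves included), and hrefs is a hypothetical query that collects the href attribute of every a element at any depth.

```haskell
-- Reduced stand-ins for tagsoup's types (assumption: three Tag constructors).
type Attribute str = (str, str)

data Tag str = TagOpen str [Attribute str] | TagClose str | TagText str
  deriving (Eq, Show)

data TagTree str
  = TagBranch str [Attribute str] [TagTree str]
  | TagLeaf (Tag str)
  deriving (Eq, Show)

-- Hypothetical universe: each tree followed by all of its descendants.
universeForest :: [TagTree str] -> [TagTree str]
universeForest = concatMap go
  where
    go t@(TagBranch _ _ inner) = t : universeForest inner
    go leaf = [leaf]

-- Query: pattern-match failures in a list comprehension simply skip the
-- element, so only a branches with an href attribute contribute.
hrefs :: [TagTree String] -> [String]
hrefs tree = [v | TagBranch "a" atts _ <- universeForest tree, ("href", v) <- atts]
```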