Safe Haskell | None |
---|---|
Language | Haskell98 |
This module uses HXT to traverse an HTML document using CSS selectors.
The most important function here is findBySelector: it takes a ByteString containing the HTML to look into and a CSS query, and it returns a list of the HTML fragments that matched the given query.
Only a subset of the CSS spec is currently supported:
- By tag name: table td a
- By class names: .container .content
- By Id: #oneId
- By attribute: [hasIt], [exact=match], [contains*=text], [starts^=with], [ends$=with]
- Union: a, span, p
- Immediate children: div > p
- Get jiggy with it: div[data-attr=yeah] > .mon, .foo.bar div, #oneThing
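As a rough illustration of what each selector form in the list above tests, here is a minimal sketch. The `Element` record and the `matches` function are hypothetical stand-ins written for this example, not part of this module; only the `Selector` constructors are taken from the synopsis below. The attribute operators are assumed to follow the usual CSS meaning (`*=` substring, `^=` prefix, `$=` suffix).

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical element: a tag name plus its attributes (not the library's type).
data Element = Element { elTag :: Text, elAttrs :: [(Text, Text)] }

-- Same constructors as the Selector type documented in this module.
data Selector
  = ById Text
  | ByClass Text
  | ByTagName Text
  | ByAttrExists Text
  | ByAttrEquals Text Text
  | ByAttrContains Text Text
  | ByAttrStarts Text Text
  | ByAttrEnds Text Text

-- Does a single selector accept a single element?
matches :: Selector -> Element -> Bool
matches sel el = case sel of
  ById i             -> attr "id" (== i)
  ByClass c          -> attr "class" (elem c . T.words)  -- class is a space-separated list
  ByTagName t        -> elTag el == t
  ByAttrExists k     -> any ((== k) . fst) (elAttrs el)  -- [hasIt]
  ByAttrEquals k v   -> attr k (== v)                    -- [exact=match]
  ByAttrContains k v -> attr k (v `T.isInfixOf`)         -- [contains*=text]
  ByAttrStarts k v   -> attr k (v `T.isPrefixOf`)        -- [starts^=with]
  ByAttrEnds k v     -> attr k (v `T.isSuffixOf`)        -- [ends$=with]
  where
    -- Apply a predicate to an attribute's value; a missing attribute never matches.
    attr k p = maybe False p (lookup k (elAttrs el))
```

A compound selector such as `.foo.bar` would then accept an element only when every one of its parts does, for example `all (`matches` el) [ByClass "foo", ByClass "bar"]`.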
- findBySelector :: HtmlLBS -> Query -> Either String [String]
- type HtmlLBS = ByteString
- type Query = Text
- parseQuery :: Text -> Either String [[SelectorGroup]]
- runQuery :: Cursor -> [[SelectorGroup]] -> [Cursor]
- data Selector
- = ById Text
- | ByClass Text
- | ByTagName Text
- | ByAttrExists Text
- | ByAttrEquals Text Text
- | ByAttrContains Text Text
- | ByAttrStarts Text Text
- | ByAttrEnds Text Text
- data SelectorGroup
Documentation
findBySelector :: HtmlLBS -> Query -> Either String [String]
Perform a CSS Query on Html. Returns Either:
- Left: Query parse error.
- Right: List of matching Html fragments.
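Since the result is an Either, callers need to handle both the parse error and the (possibly empty) list of fragments. The sketch below shows one way to do that; `describeResult` is a hypothetical helper written for this example, and it only assumes the result type given in the signature above.

```haskell
-- Turn a result of the shape returned by findBySelector into a short report.
describeResult :: Either String [String] -> String
describeResult (Left err)        = "query did not parse: " ++ err
describeResult (Right [])        = "no matches"
describeResult (Right fragments) =
  show (length fragments) ++ " match(es):\n" ++ unlines fragments
```

Note that an empty list is a successful result: the query parsed, but nothing in the document matched it.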
type HtmlLBS = ByteString
For HXT hackers
These functions expose some low level details that you can blissfully ignore.
parseQuery :: Text -> Either String [[SelectorGroup]]
Parses a query into an intermediate format which is easy to feed to HXT
- The top-level lists represent the top level comma separated queries.
- SelectorGroup is a group of qualifiers which are separated with spaces or > like these three: table.main.odd tr.even > td.big
- A SelectorGroup holds a list of Selector items; in the example above, the selectors in the first group are: table, .main and .odd
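To make the nesting concrete, here is a simplified sketch of how a query could be broken down into that shape. It is not this module's parser: `compound` and `groupsOf` are hypothetical names, only tag names, classes and ids are handled, and a plain list of selectors stands in for SelectorGroup, so the difference between a space and `>` is deliberately thrown away.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Subset of the Selector constructors documented in this module.
data Selector = ById Text | ByClass Text | ByTagName Text
  deriving (Eq, Show)

-- Split one compound selector such as "table.main.odd" into its parts.
-- Attribute selectors are not handled in this sketch.
compound :: Text -> [Selector]
compound txt
  | T.null txt = []
  | otherwise  = case T.uncons txt of
      Just ('.', rest) -> named ByClass rest
      Just ('#', rest) -> named ById rest
      _                -> named ByTagName txt
  where
    -- Read a name up to the next '.' or '#', then continue with the remainder.
    named ctor t =
      let (name, more) = T.break isMark t
      in ctor name : compound more
    isMark c = c == '.' || c == '#'

-- Outer list: comma-separated queries. Middle list: groups separated by
-- spaces or '>'. Inner list: the selectors of one group.
groupsOf :: Text -> [[[Selector]]]
groupsOf = map (map compound . T.words . T.replace ">" " ") . T.splitOn ","
```

For the example query `table.main.odd tr.even > td.big`, `groupsOf` yields a single top-level entry containing three groups, the first of which is `[ByTagName "table", ByClass "main", ByClass "odd"]`.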
runQuery :: Cursor -> [[SelectorGroup]] -> [Cursor]
data Selector
Constructors
- ById Text
- ByClass Text
- ByTagName Text
- ByAttrExists Text
- ByAttrEquals Text Text
- ByAttrContains Text Text
- ByAttrStarts Text Text
- ByAttrEnds Text Text