readability-0.0.1.0: Extracts the text of the main article from an HTML document

Safe Haskell: None
Language: Haskell2010

Readability.Clean


Documentation

sanitizeNode :: Settings -> Scores -> Node -> Maybe Node

Remove elements that do not contribute to the article.

Removes:

  • headings that are part of ads, menus, widgets, etc.
  • forms, textareas
  • iframes
  • block elements (div, section, table, etc.) without enough textual content
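The heading rule above can be illustrated with a small predicate. This is a hypothetical sketch, not the library's code: this page does not say how ads, menus or widgets are recognised, so the marker words (ad, menu, widget, sidebar, nav), the idea of matching them against class/id tokens, and the helpers isNoiseHeading and tokens are all assumptions made for illustration.

```haskell
import Data.Char (toLower)

-- Hypothetical marker words; the library's real heuristics are not
-- documented here.
noiseMarkers :: [String]
noiseMarkers = ["ad", "menu", "widget", "sidebar", "nav"]

-- h1 .. h6, compared case-insensitively by the caller.
isHeading :: String -> Bool
isHeading tag = tag `elem` ["h1", "h2", "h3", "h4", "h5", "h6"]

-- Split a class/id value into lowercase tokens on whitespace, '-' and '_'.
-- Matching whole tokens avoids false hits such as "ad" inside "header".
tokens :: String -> [String]
tokens = words . map normalise
  where
    normalise c
      | c == '-' || c == '_' = ' '
      | otherwise = toLower c

-- True when the element is a heading whose class/id marks it as noise.
isNoiseHeading :: String -> String -> Bool
isNoiseHeading tag classAndId =
  isHeading (map toLower tag) && any (`elem` noiseMarkers) (tokens classAndId)
```

For example, isNoiseHeading "h2" "sidebar-widget title" is True, while isNoiseHeading "h2" "article-title" is False.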

Preserves:

  • textual content
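The remove/preserve rules above can be sketched as a bottom-up tree pruning. This is a minimal sketch under stated assumptions: the Node and Settings types below are simplified, hypothetical stand-ins (the library's own types are not shown on this page), the threshold field minBlockText and the function sanitize are invented for illustration, and the Scores argument of the real sanitizeNode is not modelled at all.

```haskell
import Data.Char (isSpace)
import Data.Maybe (mapMaybe)

-- Hypothetical, simplified DOM; not the library's Node type.
data Node
  = Element String [Node] -- tag name and children
  | Text String           -- character data
  deriving (Eq, Show)

-- Hypothetical stand-in for Settings: minimum number of non-whitespace
-- characters a block element must contain to be kept.
newtype Settings = Settings { minBlockText :: Int }

-- Elements dropped regardless of content.
alwaysDropped :: [String]
alwaysDropped = ["form", "textarea", "iframe"]

-- Block elements kept only if they still carry enough text.
blockTags :: [String]
blockTags = ["div", "section", "table"]

-- All text in a subtree, concatenated.
textOf :: Node -> String
textOf (Text t) = t
textOf (Element _ cs) = concatMap textOf cs

-- Number of non-whitespace characters in a subtree.
textLength :: Node -> Int
textLength = length . filter (not . isSpace) . textOf

-- Prune bottom-up: children are sanitized first, then the element is
-- judged on what remains. A text node is never dropped by itself; it
-- disappears only when an enclosing element is removed.
sanitize :: Settings -> Node -> Maybe Node
sanitize _ t@(Text _) = Just t
sanitize s (Element tag cs)
  | tag `elem` alwaysDropped = Nothing
  | tag `elem` blockTags && textLength kept < minBlockText s = Nothing
  | otherwise = Just kept
  where
    kept = Element tag (mapMaybe (sanitize s) cs)
```

Because children are pruned before the parent is measured, a div whose only text sat inside a form is dropped as well: sanitize (Settings 10) (Element "div" [Element "form" [Text "a long enough sentence"]]) evaluates to Nothing.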