{-| Description: Basic data structures for building markup trees. Copyright: (c) 2020-2021 Sam May License: MPL-2.0 Maintainer: ag.eitilt@gmail.com Stability: experimental Portability: portable In lieu of a fully-featured __[DOM](https://dom.spec.whatwg.org/)__ implementation ---and even, for that matter, a styled tree--- this module provides bare-bones data structures to temporarily contain the minimal data currently returned by tree parsing. Eventually this will be padded out into a fully-featured DOM implementation, but doing so now would be creating much more work than necessary. -} module Web.Willow.DOM ( -- * Structure Tree ( .. ) , emptyTree , Node ( .. ) , NodeType ( .. ) , nodeType -- * Data -- ** Document , QuirksMode ( .. ) -- ** Elements , ElementParams ( .. ) , emptyElementParams , ElementName , ElementPrefix -- *** Attribute list , AttributeMap , fromAttrList , toAttrList , insertAttribute -- ** Attributes , BasicAttribute , AttributeParams ( .. ) , emptyAttributeParams , AttributeName , AttributeValue , AttributePrefix -- ** Document type declarations , DocumentTypeParams ( .. ) , emptyDocumentTypeParams , DoctypeName , DoctypePublicId , DoctypeSystemId -- ** Namespaces , Namespace , htmlNamespace , mathMLNamespace , svgNamespace , xlinkNamespace , xmlNamespace , xmlnsNamespace ) where import qualified Data.HashMap.Strict as M import qualified Data.Text as T -- | __DOM:__ -- @[tree] -- (https://dom.spec.whatwg.org/#concept-tree)@ -- -- The core concept underlying HTML and related languages: a nested collection -- of data and metadata marked up according to several broad categories. -- Values may be easily instantiated as updates to 'emptyTree'. data Tree = Tree { node :: Node -- ^ The atomic portion of the tree at the current location. , children :: [Tree] -- ^ All parts of the tree nested below the current location. } deriving ( Eq, Show, Read ) -- | A sane default collection for easy record initialization; namely, a -- 'Document' without any 'children'. emptyTree :: Tree emptyTree = Tree { node = Document NoQuirks , children = [] } -- | __DOM:__ -- @[node] -- (https://dom.spec.whatwg.org/#concept-node)@ -- -- The sum type of all different classes of behaviour a particular point of -- data may fill. data Node = Text T.Text -- ^ __DOM:__ -- @[Text] -- (https://dom.spec.whatwg.org/#interface-text)@ -- -- A simple character string to be rendered to the output or to be -- processed further, according to which 'Element's enclose it. | Comment T.Text -- ^ __DOM:__ -- @[Comment] -- (https://dom.spec.whatwg.org/#interface-comment)@ -- -- An author's aside, not intended to be shown to the end user. | DocumentType DocumentTypeParams -- ^ __DOM:__ -- @[DocumentType] -- (https://dom.spec.whatwg.org/#interface-documenttype)@ -- -- Largely vestigial in HTML5, but used in previous versions and -- related languages to specify the semantics of 'Element's used in the -- document. | Element ElementParams -- ^ __DOM:__ -- @[Element] -- (https://dom.spec.whatwg.org/#interface-element)@ -- -- Markup instructions directing the behaviour or classifying a portion -- of the document's content. | Attribute AttributeParams -- ^ __DOM:__ -- @[Attr] -- (https://dom.spec.whatwg.org/#interface-attr)@ -- -- Metadata allowing finer customization and description of the heavier -- 'Element's. | DocumentFragment -- ^ __DOM:__ -- @[DocumentType] -- (https://dom.spec.whatwg.org/#interface-documenttype)@ -- -- As like 'Document', but requiring less precise structure in its -- 'children' and generally only containing a small slice of a larger -- document. | Document QuirksMode -- ^ __DOM:__ -- @[Document] -- (https://dom.spec.whatwg.org/#interface-document)@ -- -- The root of a 'Tree', typically imposing a principled structure. deriving ( Eq, Show, Read ) -- | __DOM:__ -- @[nodeType] -- (https://dom.spec.whatwg.org/#dom-node-nodetype)@ -- -- Simplify the algebraic data type to a one-dimensional 'Enum' to allow -- equality testing rather than requiring pattern matching. nodeType :: Node -> Maybe NodeType nodeType Text{} = Just TextNode nodeType Comment{} = Just CommentNode nodeType DocumentType{} = Just DocumentTypeNode nodeType Element{} = Just ElementNode nodeType Attribute{} = Just AttributeNode nodeType DocumentFragment{} = Just DocumentFragmentNode nodeType Document{} = Just DocumentNode -- | Through the long history of HTML browsers, many unique and/or buggy -- behaviours have become enshrined due to the simple fact that website authors -- used them. As the standards and the parse engines have continued to -- develop, three separated degrees of emulation have emerged for that -- backwards compatibility. data QuirksMode = NoQuirks -- ^ __DOM:__ -- @[no-quirks mode] -- (https://dom.spec.whatwg.org/#concept-document-no-quirks)@ -- -- Fully compliant with the modern standard. | LimitedQuirks -- ^ __DOM:__ -- @[limited-quirks mode] -- (https://dom.spec.whatwg.org/#concept-document-limited-quirks)@ -- -- Largely compliant with the standard, except for a couple height -- calculations. | FullQuirks -- ^ __DOM:__ -- @[quirks mode] -- (https://dom.spec.whatwg.org/#concept-document-quirks)@ -- -- Backwards compatibility with 1990's-era technology. deriving ( Eq, Ord, Enum, Bounded, Show, Read ) -- | A simple key-value representation of an attribute on an HTML tag, before -- any namespace processing. type BasicAttribute = (AttributeName, AttributeValue) -- | Type-level clarification for the short namespace reference classifying a -- supplemental point of metadata. type AttributePrefix = T.Text -- | Type-level clarification for the key of a supplemental point of metadata. type AttributeName = T.Text -- | Type-level clarification for the value of a supplemental point of -- metadata. type AttributeValue = T.Text -- | __DOM:__ -- @[Attr] -- (https://dom.spec.whatwg.org/#attr)@ -- -- A more complete representation of an attribute, including extensions beyond -- the 'BasicAttribute' to support more structured (XML-like) markup languages. -- Values may be easily instantiated as updates to 'emptyAttributeParams'. data AttributeParams = AttributeParams { attrPrefix :: Maybe AttributePrefix -- ^ The variable fragment used to represent the 'attrNamespace' in the -- original source. , attrName :: AttributeName -- ^ The key defining what role the metadata value point at 'attrValue' -- is meant to represent, as defined by the 'attrNamespace'. , attrNamespace :: Maybe Namespace -- ^ The scope defining the language by which the attribute -- participates in the document. , attrValue :: AttributeValue -- ^ A point of metadata further describing rendering behaviour or -- adding other information. } deriving ( Eq, Show, Read ) -- | A sane default collection for easy record initialization; namely, -- 'Nothing's and 'T.empty's. emptyAttributeParams :: AttributeParams emptyAttributeParams = AttributeParams { attrPrefix = Nothing , attrName = T.empty , attrNamespace = Nothing , attrValue = T.empty } -- | Type-level clarification for the short namespace reference classifying a -- markup tag. type ElementPrefix = T.Text -- | Type-level clarification for the name of a markup tag. type ElementName = T.Text -- | __DOM:__ -- @[Element] -- (https://dom.spec.whatwg.org/#element)@ -- -- The collection of metadata identifying and describing a markup tag used to -- associate text or other data with its broader role in the document, or to -- indicate a preferred rendering. Values may be easily instantiated as -- updates to 'emptyElementParams'. data ElementParams = ElementParams { elementPrefix :: Maybe ElementPrefix -- ^ The variable fragment used to represent the 'elementNamespace' in -- the original source. , elementName :: ElementName -- ^ The key defining what role the markup tag is meant to represent, -- as defined by the 'elementNamespace'. , elementNamespace :: Maybe Namespace -- ^ The scope defining the language by which the elementibute -- participates in the document. , elementAttributes :: AttributeMap -- ^ The points of metadata further describing rendering behaviour or -- adding other information. } deriving ( Eq, Show, Read ) -- | A sane default collection for easy record initialization. emptyElementParams :: ElementParams emptyElementParams = ElementParams { elementPrefix = Nothing , elementName = T.empty , elementNamespace = Nothing , elementAttributes = M.empty } -- | __DOM:__ -- @[NamedNodeMap] -- (https://dom.spec.whatwg.org/#interface-namednodemap)@ -- -- Type-level clarification for the collection of key-value points of -- supplemental metadata attached to an 'Element'. Note that, while an -- 'Attribute'\'s prefix is used to determine the associated namespace (and -- needs to be tracked for round-trip serialization), it doesn't factor into -- testing equality or in lookups. type AttributeMap = M.HashMap (Maybe Namespace, AttributeName) (Maybe AttributePrefix, AttributeValue) -- | Helper function to transform key-value metadata from the indexable form -- stored by an 'AttributeMap' into more structured data. packAttr :: (Maybe Namespace, AttributeName) -> (Maybe AttributePrefix, AttributeValue) -> AttributeParams packAttr (ns, n) (p, v) = emptyAttributeParams { attrNamespace = ns , attrName = n , attrPrefix = p , attrValue = v } -- | Helper function to transform structured key-value metadata into the -- indexable form stored by an 'AttributeMap'. unpackAttr :: AttributeParams -> ((Maybe Namespace, AttributeName), (Maybe AttributePrefix, AttributeValue)) unpackAttr d = ((attrNamespace d, attrName d), (attrPrefix d, attrValue d)) -- | Extract the key-value metadata pairs from a indexed collection into an -- iterable form. The order of elements is unspecified. toAttrList :: AttributeMap -> [AttributeParams] toAttrList = map (uncurry packAttr) . M.toList -- | Pack a list of key-value metadata pairs into a form better optimized for -- random lookup. fromAttrList :: [AttributeParams] -> AttributeMap fromAttrList = M.fromList . map unpackAttr -- | As 'M.insert', performing the required data reordering for the -- less-comfortable internal type representation. insertAttribute :: AttributeParams -> AttributeMap -> AttributeMap insertAttribute d = uncurry M.insert $ unpackAttr d -- | Type-level clarification for the language used in the document or, -- equivalently, the name of the root node. type DoctypeName = T.Text -- | Type-level clarification for a registered or otherwise globally-unique -- reference to a description of the language used in the document. type DoctypePublicId = T.Text -- | Type-level clarification for a reference to the description of the -- language used in the document, dependant on the state of the system (and/or -- the internet). type DoctypeSystemId = T.Text -- | __DOM:__ -- @[DocumentType] -- (https://dom.spec.whatwg.org/#documenttype)@ -- -- The collection of metadata representing a document type declaration -- describing the markup language used in a document; of vestigal use in HTML, -- but important for related languages. Values may be easily instantiated as -- updates to 'emptyDocumentTypeParams'. data DocumentTypeParams = DocumentTypeParams { documentTypeName :: DoctypeName -- ^ The root element of the document, which may also identify the -- primary language used. , documentTypePublicId :: DoctypePublicId -- ^ A globally-unique reference to the definition of the language. , documentTypeSystemId :: DoctypeSystemId -- ^ A system-dependant (but perhaps easier to access) reference to the -- definition of the language. } deriving ( Eq, Show, Read ) -- | A sane default collection for easy record initialization; namely, -- 'T.empty's. emptyDocumentTypeParams :: DocumentTypeParams emptyDocumentTypeParams = DocumentTypeParams { documentTypeName = T.empty , documentTypePublicId = T.empty , documentTypeSystemId = T.empty } -- | A simplified view of the 'Node' constructors, for use in testing via -- 'nodeType'. data NodeType = ElementNode -- ^ __DOM:__ -- @[ELEMENT_NODE] -- (https://dom.spec.whatwg.org/#dom-node-element_node)@ -- -- 'Element' | AttributeNode -- ^ __DOM:__ -- @[ATTRIBUTE_NODE] -- (https://dom.spec.whatwg.org/#dom-node-attribute_node)@ -- -- 'Attribute' | TextNode -- ^ __DOM:__ -- @[TEXT_NODE] -- (https://dom.spec.whatwg.org/#dom-node-text_node)@ -- -- 'Text' | CDataSectionNode -- ^ __DOM:__ -- @[CDATA_SECTION_NODE] -- (https://dom.spec.whatwg.org/#dom-node-cdata_section_node)@ | EntityReferenceNode -- ^ __DOM:__ -- @[ENTITY_REFERENCE_NODE] -- (https://dom.spec.whatwg.org/#node)@ | EntityNode -- ^ __DOM:__ -- @[ENTITY_NODE] -- (https://dom.spec.whatwg.org/#node)@ | ProcessingInstructionNode -- ^ __DOM:__ -- @[PROCESSING_INSTRUCTION_NODE] -- (https://dom.spec.whatwg.org/#dom-node-processing-instruction_node)@ | CommentNode -- ^ __DOM:__ -- @[COMMENT_NODE] -- (https://dom.spec.whatwg.org/#dom-node-comment_node)@ -- -- 'Comment' | DocumentNode -- ^ __DOM:__ -- @[DOCUMENT_NODE] -- (https://dom.spec.whatwg.org/#dom-node-document_node)@ -- -- 'Document' | DocumentTypeNode -- ^ __DOM:__ -- @[DOCUMENT_TYPE_NODE] -- (https://dom.spec.whatwg.org/#dom-node-document_type_node)@ -- -- 'DocumentType' | DocumentFragmentNode -- ^ __DOM:__ -- @[DOCUMENT_FRAGMENT_NODE] -- (https://dom.spec.whatwg.org/#dom-node-document_fragment_node)@ -- -- 'DocumentFragment' | NotationNode -- ^ __DOM:__ -- @[NOTATION_NODE] -- (https://dom.spec.whatwg.org/#node)@ deriving ( Eq, Ord, Bounded, Show, Read ) instance Enum NodeType where fromEnum ElementNode = 1 fromEnum AttributeNode = 2 fromEnum TextNode = 3 fromEnum CDataSectionNode = 4 fromEnum EntityReferenceNode = 5 fromEnum EntityNode = 6 fromEnum ProcessingInstructionNode = 7 fromEnum CommentNode = 8 fromEnum DocumentNode = 9 fromEnum DocumentTypeNode = 10 fromEnum DocumentFragmentNode = 11 fromEnum NotationNode = 12 toEnum 1 = ElementNode toEnum 2 = AttributeNode toEnum 3 = TextNode toEnum 4 = CDataSectionNode toEnum 5 = EntityReferenceNode toEnum 6 = EntityNode toEnum 7 = ProcessingInstructionNode toEnum 8 = CommentNode toEnum 9 = DocumentNode toEnum 10 = DocumentTypeNode toEnum 11 = DocumentFragmentNode toEnum 12 = NotationNode toEnum _ = error "Web.Willow.DOM.NodeType.toEnum: invalid index" {-# DEPRECATED EntityReferenceNode, EntityNode, NotationNode "historical" #-} -- | __XML-NAMES:__ -- @[XML namespace] -- (https://www.w3.org/TR/xml-names/#sec-namespaces)@ -- -- An identifier (theoretically) pointing to a reference defining a particular -- element or attribute ---though not necessarily in machine-readable form--- -- and so providing a scope for differentiating multiple elements with the same -- local name but different semantics. type Namespace = T.Text -- | __Infra:__ -- @[HTML namespace] -- (https://infra.spec.whatwg.org/#html-namespace)@ -- -- The canonical scope value for elements and attributes defined by the HTML -- standard when used in XML or XML-compatible documents. htmlNamespace :: Namespace htmlNamespace = T.pack "http://www.w3.org/1999/xhtml" -- | __Infra:__ -- @[MathML namespace] -- (https://infra.spec.whatwg.org/#mathml-namespace)@ -- -- The canonical scope value for elements and attributes defined by the MathML -- standard. mathMLNamespace :: Namespace mathMLNamespace = T.pack "http://www.w3.org/1998/Math/MathML" -- | __Infra:__ -- @[SVG namespace] -- (https://infra.spec.whatwg.org/#svg-namespace)@ -- -- The canonical scope value for elements and attributes defined by the SVG -- standard. svgNamespace :: Namespace svgNamespace = T.pack "http://www.w3.org/2000/svg" -- | __Infra:__ -- @[XLink namespace] -- (https://infra.spec.whatwg.org/#xlink-namespace)@ -- -- The canonical scope value for elements and attributes defined by the XLink -- standard. xlinkNamespace :: Namespace xlinkNamespace = T.pack "http://www.w3.org/1999/xlink" -- | __Infra:__ -- @[XML namespace] -- (https://infra.spec.whatwg.org/#xml-namespace)@ -- -- The canonical scope value for elements and attributes defined by the XML -- standard. xmlNamespace :: Namespace xmlNamespace = T.pack "http://www.w3.org/XML/1998/namespace" -- | __Infra:__ -- @[XMLNS namespace] -- (https://infra.spec.whatwg.org/#xmlns-namespace)@ -- -- The canonical scope value for elements and attributes defined by the XMLNS -- standard. xmlnsNamespace :: Namespace xmlnsNamespace = T.pack "http://www.w3.org/2000/xmlns/"