Safe Haskell | Safe |
---|---|
Language | Haskell2010 |
This module provides conversions between various different markup formats. In principle, it provides four different conversions:
- Converting the Superdoc markup language to HTML.
- Converting ASCII-armored Unicode to HTML.
- Converting Unicode streams to ASCII-armor.
- Converting Unicode streams to HTML.
Conversions 1 and 2 are combined into a single parser for the
Superdoc markup language, which is exposed by the function
markup
. This is used by the post-Haddock hook.
Conversion 3 is provided by the to_armor
function. Within the
Superdoc workflow, this is used by the superdoc-armor
preprocessor, which is in turns run by the Haddock hook. It makes
sense to keep conversions 2 and 3 in a single module, because
they jointly define the format for the ASCII armor.
Conversion 4 is provided by the to_html
function. It is used by
the post-HsColour hook.
- type Filter a = String -> (String, a)
- filter_id :: Filter ()
- filter_handles :: Filter a -> Handle -> Handle -> IO a
- filter_file :: Filter a -> FilePath -> FilePath -> IO a
- filter_files :: Filter a -> [FilePath] -> IO [a]
- markup :: Filter (Set FilePath)
- markup_top :: ReadP (String, Set FilePath)
- lift :: ReadP String -> ReadP (String, Set FilePath)
- markup_nested :: ReadP (String, Set FilePath)
- markup_bracketed :: ReadP (String, Set FilePath)
- markup_underscore :: ReadP String
- markup_nonbracket :: ReadP String
- markup_other :: ReadP String
- markup_char :: ReadP String
- markup_tag :: ReadP (String, Set FilePath)
- markup_keyword :: ReadP String
- markup_uni :: ReadP String
- markup_uni_upper :: ReadP String
- markup_uni_lower :: ReadP String
- markup_uni_symbol :: ReadP String
- markup_symbol_digit :: ReadP Char
- markup_literal :: ReadP String
- markup_bracketed_literal :: ReadP String
- markup_body :: String -> ReadP (String, Set FilePath)
- to_html :: [Token] -> String
- to_armor :: [Token] -> String
- encode :: String -> String
Format definitions
The Superdoc markup language provides tags for superscripts, subscripts, and more. The ASCII armor format has been designed to hide Unicode characters from tools that do not understand them. The following markup is recognized:
Markup:
- [super text]: superscript.
- [sup text]: superscript. A synonym for [super text].
- [sub text]: subscript.
- [exp text]: exponential function.
- [bold text]: bold.
- [center text]: centered.
- [nobr text]: inhibit line breaks.
- [image filename]: insert image.
- [uni nnnn]: Unicode character.
- [literal text]: literal text. Brackets '[' and ']' may only occur in nested pairs.
- uni_x_nnnn_x_: armored Unicode lower-case character.
- UNI_x_nnnn_x_: armored Unicode upper-case character.
- ==|ssss|==: armored Unicode symbol and punctuation.
Here, nnnn is a decimal number representing a Unicode code point. Also ssss is an encoding of a decimal number representing a Unicode code point, using the following symbols for digits:
! = 1 ^ = 6 ? = 2 + = 7 ~ = 3 * = 8 $ = 4 - = 9 % = 5 . = 0
Filters
type Filter a = String -> (String, a) Source #
A filter is basically a function from strings to strings. Ideally a filter is lazy, so that the input string is consumed incrementally; however, this is not strictly necessary. A filter may also return another result in addition to a string.
filter_handles :: Filter a -> Handle -> Handle -> IO a Source #
Run a filter by reading from one handle and writing to another. The handles are set to binary mode.
filter_file :: Filter a -> FilePath -> FilePath -> IO a Source #
Run a filter by reading from a file and writing to another file. We do not assume that the two files are necessarily distinct, so special care is taken not to overwrite the output file until after the input file has been read.
filter_files :: Filter a -> [FilePath] -> IO [a] Source #
Run a filter on a number of files, overwriting each file in place.
Markup parser
This section defines a simple grammar and parser for the Superdoc markup language, translating it to HTML. In addition, the parser also converts ASCII-armored Unicode to HTML. This is used to post-process Haddock's output.
Top-level function
markup :: Filter (Set FilePath) Source #
The top-level parser for Superdoc markup and ASCII armor, expressed as a filter. In addition to producing HTML output, this filter also returns the set of all image files that were linked to.
Grammar definition
markup_top :: ReadP (String, Set FilePath) Source #
Top-level parser for Superdoc markup and ASCII armor.
top ::= (other | tag | uni | char)*.
lift :: ReadP String -> ReadP (String, Set FilePath) Source #
Lift a parser returning a string to a parser returning a string and an empty set.
markup_nested :: ReadP (String, Set FilePath) Source #
Like markup
, but only permit brackets "[" and "]" to occur
in nested pairs.
nested ::= (other | tag | uni | bracketed | underscore)*.
markup_bracketed :: ReadP (String, Set FilePath) Source #
Parse bracketed text.
bracketed ::= "[" nested "]".
markup_underscore :: ReadP String Source #
Parse a single underscore '_'.
underscore ::= "_".
markup_nonbracket :: ReadP String Source #
Parse any single character except '[' and ']'.
nonbracket ::= any character besides '[', ']'.
markup_other :: ReadP String Source #
Parse any sequence of non-special characters: anything but '[', 'u', 'U', '=', and ']'.
other ::= (any character besides '[', 'u', 'U', '=', ']')+.
markup_char :: ReadP String Source #
Parse any one character.
char ::= any character.
markup_keyword :: ReadP String Source #
Parse a keyword.
keyword ::= "sup" | "super" | "sub" | "exp" | "bold" | "center" | "nobr" | "image" | "uni" | "literal".
markup_uni :: ReadP String Source #
Parse an armored Unicode character.
markup_uni_upper :: ReadP String Source #
Parse an upper-case Unicode letter.
uni_upper ::= "UNI_x_" digit+ "_x_".
markup_uni_lower :: ReadP String Source #
Parse a lower-case Unicode letter.
uni_lower ::= "uni_x_" digit+ "_x_".
markup_uni_symbol :: ReadP String Source #
Parse a Unicode operator symbol.
uni_symbol ::= "==|" symbol_digit+ "|==".
markup_symbol_digit :: ReadP Char Source #
Parse a symbol encoding a decimal digit. See to_armor
for the
encoding used.
markup_literal :: ReadP String Source #
Parse any text with balanced brackets.
literal ::= (nonbracket | bracketed_literal)*.
markup_bracketed_literal :: ReadP String Source #
Parse any bracketed text with balanced brackets.
bracketed_literal ::= "[" literal "]".
markup_body :: String -> ReadP (String, Set FilePath) Source #
Parse a tag's body. What to do depends on the tag name.
body ::= nested (for keyword = sup, super, sub, exp, bold, center, nobr),
body ::= filename (for keyword = image),
body ::= digit+ (for keyword = uni).
body ::= literal (for keyword = literal).
Unicode to HTML conversion
to_html :: [Token] -> String Source #
Convert a tokenized Unicode stream into HTML entities. Non-ASCII
characters are represented as HTML entities of the form &#
nnnn;
.
Any invalid characters are simply copied to the output.
Unicode to ASCII armor conversion
to_armor :: [Token] -> String Source #
Convert a tokenized Unicode stream to ASCII armor.
The armor is designed to preserve lexical validity: thus, the armored version of a valid Haskell lower-case identifier, upper-case identifier, or operator is again a valid identifier or operator of the same kind. This makes it possible to use armored Unicode in source code as well as documentation comments.
The armoring is further designed to use only symbols that will not confuse GHC or Haddock. See ASCII armor for a description of the format.