Portability | GHC |
---|---|
Stability | unstable |
Maintainer | stephen.tetley@gmail.com |
Extended character code handling.
Wumpus uses an escaping style derived from SVG to embed character codes and PostScript glyph names in regular strings.
"regular ascii text &#38; more ascii text"
i.e. character codes are delimited by &# on the left and ; on the right.
Glyph names are delimited by & on the left and ; on the right.
"regular ascii text &ampersand; more ascii text"
Note that glyph names **should always** correspond to PostScript glyph names, not SVG / HTML glyph names.
In Wumpus both glyph names and character codes can be embedded in strings (e.g. &egrave; or &#232;), although glyph names are preferred for PostScript (see below).
Character codes can also be expressed as octal or hexadecimal numbers:
myst&#0o350;re
myst&#0xE8;re
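The escaping rules above can be illustrated with a small stand-alone parser. This is only a sketch and not Wumpus-Core's own implementation: the type `Esc`, its constructors, and the functions `parseEscapes`, `escape`, `number` and `readAll` are hypothetical names introduced here. It also assumes that a malformed escape is kept as literal text and that glyph names contain only ASCII letters.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)
import Numeric (readDec, readHex, readOct)

-- Hypothetical stand-in for an escaped character (not the library's type).
data Esc
  = Lit Char       -- a regular character
  | Code Int       -- a numeric character code, e.g. &#232;
  | Glyph String   -- a PostScript glyph name, e.g. &egrave;
  deriving (Eq, Show)

-- Split a string into literals and escapes.
-- Assumption: an '&' that does not begin a well-formed escape stays literal.
parseEscapes :: String -> [Esc]
parseEscapes [] = []
parseEscapes ('&' : rest) =
  case break (== ';') rest of
    (body, ';' : more) | Just e <- escape body -> e : parseEscapes more
    _                                          -> Lit '&' : parseEscapes rest
parseEscapes (c : rest) = Lit c : parseEscapes rest

-- Interpret the text found between '&' and ';'.
escape :: String -> Maybe Esc
escape ('#' : num) = Code <$> number num
escape name
  | not (null name) && all isGlyphChar name = Just (Glyph name)
  | otherwise                               = Nothing
  where
    isGlyphChar c = isAsciiLower c || isAsciiUpper c

-- Decimal by default; a 0o or 0x prefix selects octal or hexadecimal.
number :: String -> Maybe Int
number ('0' : 'o' : ds) = readAll readOct ds
number ('0' : 'x' : ds) = readAll readHex ds
number ds               = readAll readDec ds

-- Accept a reading only if it consumes the whole input.
readAll :: ReadS Int -> String -> Maybe Int
readAll reader ds =
  case reader ds of
    [(n, "")] -> Just n
    _         -> Nothing
```

With these definitions, `parseEscapes "myst&#0o350;re"` and `parseEscapes "myst&#0xE8;re"` both yield the same list, containing `Code 232` between the literals for "myst" and "re".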
In the generated PostScript, Wumpus uses the character name, e.g.:
(myst) show /egrave glyphshow (re) show
The generated SVG uses the numeric code, e.g.:
myst&#232;re
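The shape of the PostScript output can be sketched as follows: runs of regular characters become one `(...) show`, and each glyph name becomes `/name glyphshow`. This is an illustration under assumptions, not the code Wumpus-Core uses. The type `Esc` and the functions `psShow` and `psEscape` are hypothetical, and numeric character codes are left out because resolving them to glyph names needs an encoding vector.

```haskell
-- Hypothetical stand-in for an escaped character (numeric codes omitted).
data Esc
  = Lit Char       -- a regular character
  | Glyph String   -- a PostScript glyph name

-- Render a sequence as PostScript show / glyphshow commands.
psShow :: [Esc] -> String
psShow = unwords . go
  where
    go []               = []
    go (Glyph n : rest) = ("/" ++ n ++ " glyphshow") : go rest
    go xs               =
      -- xs starts with a literal here, so the run is never empty
      let (run, rest) = spanLits xs
      in ("(" ++ concatMap psEscape run ++ ") show") : go rest

    -- Collect the leading run of literal characters.
    spanLits (Lit c : rest) = let (cs, r) = spanLits rest in (c : cs, r)
    spanLits rest           = ([], rest)

-- Parentheses and backslashes must be escaped inside PostScript strings.
psEscape :: Char -> String
psEscape c
  | c `elem` "()\\" = ['\\', c]
  | otherwise       = [c]
```

For example, `psShow (map Lit "myst" ++ [Glyph "egrave"] ++ map Lit "re")` evaluates to `"(myst) show /egrave glyphshow (re) show"`.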
Unless you are generating only SVG, you should favour glyph names over character codes, as Wumpus interprets glyph names unambiguously. The meaning of a character code depends on the encoding of the font used to render the text. Standard fonts (e.g. Helvetica, Times, Courier) use the Standard Encoding, which differs in places from the common Latin1 character set.
Unfortunately, if a glyph is not present in a font it cannot be rendered in PostScript. Wumpus-Core is oblivious to the contents of fonts: it does not warn about missing glyphs or attempt to substitute them.
- data EscapedText
- data EscapedChar
- type EncodingVector = IntMap String
- escapeString :: String -> EscapedText
- wrapEscChar :: EscapedChar -> EscapedText
- destrEscapedText :: ([EscapedChar] -> a) -> EscapedText -> a
- textLength :: EscapedText -> Int
Documentation
data EscapedText
Internal string representation for Wumpus-Core.
EscapedText is a list of characters, where each character may be either a regular character, an integer representing a Unicode code-point or a PostScript glyph name.
data EscapedChar
Internal character representation for Wumpus-Core.
An EscapedChar may be either a regular character, an integer representing a Unicode code-point or a PostScript glyph name.
PostScript glyph names are generally made up only of chars [a-zA-Z].
type EncodingVector = IntMap String
EncodingVector - a map from code point to PostScript glyph name.
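As an illustration of the type, the sketch below builds a three-entry table and looks up a glyph name by code point. The names `sampleVector` and `glyphName` are hypothetical and the table is deliberately tiny; it is not one of the encoding vectors supplied with Wumpus.

```haskell
import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap

-- Same shape as the type synonym documented here.
type EncodingVector = IntMap String

-- A tiny illustrative table, not a complete encoding vector.
sampleVector :: EncodingVector
sampleVector = IntMap.fromList
  [ (38,  "ampersand")
  , (232, "egrave")
  , (233, "eacute")
  ]

-- Look up the glyph name for a code point, if the vector has one.
glyphName :: EncodingVector -> Int -> Maybe String
glyphName ev code = IntMap.lookup code ev
```

Here `glyphName sampleVector 232` gives `Just "egrave"`, while a code point missing from the table gives `Nothing`.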
escapeString :: String -> EscapedText
The input to escapeString is regular text with escaped glyph names or decimal character codes. Escaping in the input string should follow the SVG convention: the escape sequence starts with & (ampersand) for glyph names or &# (ampersand hash) for char codes, and ends with ; (semicolon).
Escaped characters are output to PostScript as their respective glyph names:
/egrave glyphshow
Escaped characters are output to SVG as an escaped decimal, e.g.:
&#232;
Note - for SVG output, Wumpus automatically escapes characters where the char code is above 128. This is the convention used by the Text.XHtml library.
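The automatic escaping for SVG can be sketched as a plain function on strings. The name `svgEscape` is hypothetical, and the exact boundary is an assumption: this sketch escapes every character outside 7-bit ASCII (code 128 and up). It does not handle XML special characters such as < or &.

```haskell
import Data.Char (ord)

-- Replace each non-ASCII character with a decimal escape.
-- Assumption: the cut-off is code 128 (anything outside 7-bit ASCII).
svgEscape :: String -> String
svgEscape = concatMap step
  where
    step c
      | ord c >= 128 = "&#" ++ show (ord c) ++ ";"
      | otherwise    = [c]
```

For example, applying `svgEscape` to the seven-character string "mystère" gives `"myst&#232;re"`.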
wrapEscChar :: EscapedChar -> EscapedText
Build an EscapedText from a single EscapedChar.
destrEscapedText :: ([EscapedChar] -> a) -> EscapedText -> a
Destructor for EscapedText.
textLength :: EscapedText -> Int
Get the character count of an EscapedText string.