unbeliever-0.7.3.0: Opinionated Haskell Interoperability

Safe HaskellNone
LanguageHaskell2010

Core.Text.Rope

Contents

Description

If you're accustomed to working with text in almost any other programming language, you'd be aware that a "string" typically refers to an in-memory array of characters. Traditionally this was a single ASCII byte per character; more recently UTF-8 variable byte encodings which dramatically complicates finding offsets but which gives efficient support for the entire Unicode character space. In Haskell, the original text type, String, is implemented as a list of Char which, because a Haskell list is implemented as a linked-list of boxed values, is wildly inefficient at any kind of scale.

In modern Haskell there are two primary ways to represent text.

First is via the [rather poorly named] ByteString from the bytestring package (which is an array of bytes in pinned memory). The Data.ByteString.Char8 submodule gives you ways to manipulate those arrays as if they were ASCII characters. Confusingly there are both strict (Data.ByteString) and lazy (Data.ByteString.Lazy) variants which are often hard to tell the difference between when reading function signatures or haddock documentation. The performance problem an immutable array backed data type runs into is that appending a character (that is, ASCII byte) or concatonating a string (that is, another array of ASCII bytes) is very expensive and requires allocating a new larger array and copying the whole thing into it. This led to the development of "builders" which amortize this reallocation cost over time, but it can be cumbersome to switch between Builder, the lazy ByteString that results, and then having to inevitably convert to a strict ByteString because that's what the next function in your sequence requires.

The second way is through the opaque Text type of Data.Text from the text package, which is well tuned and high-performing but suffers from the same design; it is likewise backed by arrays. Rather surprisingly, the storage backing Text objects are encoded in UTF-16, meaning every time you want to work with unicode characters that came in from anywhere else and which inevitably are UTF-8 encoded you have to convert to UTF-16 and copy into a new array, wasting time and memory.

In this package we introduce Rope, a text type backed by the 2-3 FingerTree data structure from the fingertree package. This is not an uncommon solution in many languages as finger trees support exceptionally efficient appending to either end and good performance inserting anywhere else (you often find them as the backing data type underneath text editors for this reason). Rather than Char the pieces of the rope are ShortText from the text-short package, which are UTF-8 encoded and in normal memory managed by the Haskell runtime. Conversion from other Haskell text types is not O(1) (UTF-8 validity must be checked, or UTF-16 decoded, or...), but in our benchmarking the performance has been comparable to the established types and you may find the resultant interface for combining chunks is comparable to using a Builder, without being forced to use a Builder.

Rope is used as the text type throughout this library. If you use the functions within this package (rather than converting to other text types) operations are quite efficient. When you do need to convert to another type you can use fromRope or intoRope from the Textual typeclass.

Note that we haven't tried to cover the entire gamut of operations or customary convenience functions you would find in the other libraries; so far Rope is concentrated on aiding interoperation, being good at appending (lots of) small pieces, and then efficiently taking the resultant text object out to a file handle, be that the terminal console, a file, or a network socket.

Synopsis

Rope type

data Rope Source #

A type for textual data. A rope is text backed by a tree data structure, rather than a single large continguous array, as is the case for strings.

There are three use cases:

Referencing externally sourced data

Often we interpret large blocks of data sourced from external systems as text. Ideally we would hold onto this without copying the memory, but (as in the case of ByteString which is the most common source of data) before we can treat it as text we have to validate the UTF-8 content. Safety first. We also copy it out of pinned memory, allowing the Haskell runtime to manage the storage.

Interoperating with other libraries

The only constant of the Haskell universe is that you won't have the right combination of {strict, lazy} × {Text, ByteString, String, [Word8], etc} you need for the next function call. The Textual typeclass provides for moving between different text representations. To convert between Rope and something else use fromRope; to construct a Rope from textual content in another type use intoRope.

You can get at the underlying finger tree with the unRope function.

Assembling text to go out

This involves considerable appending of data, very very occaisionally inserting it. Often the pieces are tiny. To add text to a Rope use the append method as below or (<>) from Data.Monoid (like you would have with a Builder).

Output to a Handle can be done efficiently with hWrite.

Instances
Eq Rope Source # 
Instance details

Defined in Core.Text.Rope

Methods

(==) :: Rope -> Rope -> Bool #

(/=) :: Rope -> Rope -> Bool #

Ord Rope Source # 
Instance details

Defined in Core.Text.Rope

Methods

compare :: Rope -> Rope -> Ordering #

(<) :: Rope -> Rope -> Bool #

(<=) :: Rope -> Rope -> Bool #

(>) :: Rope -> Rope -> Bool #

(>=) :: Rope -> Rope -> Bool #

max :: Rope -> Rope -> Rope #

min :: Rope -> Rope -> Rope #

Show Rope Source # 
Instance details

Defined in Core.Text.Rope

Methods

showsPrec :: Int -> Rope -> ShowS #

show :: Rope -> String #

showList :: [Rope] -> ShowS #

IsString Rope Source # 
Instance details

Defined in Core.Text.Rope

Methods

fromString :: String -> Rope #

Generic Rope Source # 
Instance details

Defined in Core.Text.Rope

Associated Types

type Rep Rope :: Type -> Type #

Methods

from :: Rope -> Rep Rope x #

to :: Rep Rope x -> Rope #

Semigroup Rope Source # 
Instance details

Defined in Core.Text.Rope

Methods

(<>) :: Rope -> Rope -> Rope #

sconcat :: NonEmpty Rope -> Rope #

stimes :: Integral b => b -> Rope -> Rope #

Monoid Rope Source # 
Instance details

Defined in Core.Text.Rope

Methods

mempty :: Rope #

mappend :: Rope -> Rope -> Rope #

mconcat :: [Rope] -> Rope #

NFData Rope Source # 
Instance details

Defined in Core.Text.Rope

Methods

rnf :: Rope -> () #

Hashable Rope Source # 
Instance details

Defined in Core.Text.Rope

Methods

hashWithSalt :: Int -> Rope -> Int #

hash :: Rope -> Int #

ToJSON Rope Source # 
Instance details

Defined in Core.Encoding.Json

Pretty Rope Source # 
Instance details

Defined in Core.Text.Rope

Methods

pretty :: Rope -> Doc ann #

prettyList :: [Rope] -> Doc ann #

Textual Rope Source # 
Instance details

Defined in Core.Text.Rope

Render Rope Source # 
Instance details

Defined in Core.Text.Utilities

Associated Types

type Token Rope :: Type Source #

Key Rope Source # 
Instance details

Defined in Core.Data.Structures

type Rep Rope Source # 
Instance details

Defined in Core.Text.Rope

type Rep Rope = D1 (MetaData "Rope" "Core.Text.Rope" "unbeliever-0.7.3.0-3b0ma8uLQlp9wydwBiHNN0" False) (C1 (MetaCons "Rope" PrefixI False) (S1 (MetaSel (Nothing :: Maybe Symbol) NoSourceUnpackedness NoSourceStrictness DecidedStrict) (Rec0 (FingerTree Width ShortText))))
type Token Rope Source # 
Instance details

Defined in Core.Text.Utilities

type Token Rope = ()

width :: Rope -> Int Source #

Get the length of this text, in characters.

split :: Int -> Rope -> (Rope, Rope) Source #

Break the text into two pieces at the specified offset.

Examples:

λ> split 0 "abcdef"
("", "abcdef")
λ> split 3 "abcdef"
("abc", "def")
λ> split 6 "abcdef"
("abcdef","")

Going off either end behaves sensibly:

λ> split 7 "abcdef"
("abcdef","")
λ> split (-1) "abcdef"
("", "abcdef")

insert :: Int -> Rope -> Rope -> Rope Source #

Insert a new piece of text into an existing Rope at the specified offset.

Examples:

λ> insert 3 "Con" "Def 1"
"DefCon 1"
λ> insert 0 "United " "Nations"
"United Nations"

contains :: Char -> Rope -> Bool Source #

Does the text contain this character?

We've used it to ask whether there are newlines present in a Rope, for example:

    if contains '\n' text
        then handleComplexCase
        else keepItSimple

Interoperation and Output

class Textual α where Source #

Machinery to interpret a type as containing valid Unicode that can be represented as a Rope object.

Implementation notes

Given that Rope is backed by a finger tree, append is relatively inexpensive, plus whatever the cost of conversion is. There is a subtle trap, however: if adding small fragments of that were obtained by slicing (for example) a large ByteString we would end up holding on to a reference to the entire underlying block of memory. This module is optimized to reduce heap fragmentation by letting the Haskell runtime and garbage collector manage the memory, so instances are expected to copy these substrings out of pinned memory.

The ByteString instance requires that its content be valid UTF-8. If not an empty Rope will be returned.

Several of the fromRope implementations are expensive and involve a lot of intermiate allocation and copying. If you're ultimately writing to a handle prefer hWrite which will write directly to the output buffer.

Minimal complete definition

fromRope, intoRope

Methods

fromRope :: Rope -> α Source #

Convert a Rope into another text-like type.

intoRope :: α -> Rope Source #

Take another text-like type and convert it to a Rope.

append :: α -> Rope -> Rope Source #

Append some text to this Rope. The default implementation is basically a convenience wrapper around calling intoRope and mappending it to your text (which will work just fine, but for some types more efficient implementations are possible).

Instances
Textual ByteString Source #

from Data.ByteString.Lazy

Instance details

Defined in Core.Text.Rope

Textual ByteString Source #

from Data.ByteString Strict

Instance details

Defined in Core.Text.Rope

Textual Text Source #

from Data.Text Strict

Instance details

Defined in Core.Text.Rope

Textual Text Source #

from Data.Text.Lazy

Instance details

Defined in Core.Text.Rope

Textual ShortText Source #

from Data.Text.Short

Instance details

Defined in Core.Text.Rope

Textual Rope Source # 
Instance details

Defined in Core.Text.Rope

Textual JsonKey Source # 
Instance details

Defined in Core.Encoding.Json

Textual [Char] Source #

from Data.String

Instance details

Defined in Core.Text.Rope

Textual (FingerTree Width ShortText) Source # 
Instance details

Defined in Core.Text.Rope

unsafeIntoRope :: ByteString -> Rope Source #

If you know the input bytes are valid UTF-8 encoded characters, then you can use this function to convert to a piece of Rope.

hWrite :: Handle -> Rope -> IO () Source #

Write the Rope to the given Handle.

import Core.Text
import Core.System -- re-exports stdout

main :: IO ()
main =
  let
    text :: Rope
    text = "Hello World"
  in
    hWrite stdout text

because it's tradition.

Uses hPutBuilder internally which saves all kinds of intermediate allocation and copying because we can go from the ShortTexts in the finger tree to ShortByteString to Builder to the Handle's output buffer in one go.

If you're working in the Program monad, then write provides an efficient way to write a Rope to stdout.

Internals

unRope :: Rope -> FingerTree Width ShortText Source #

Access the finger tree underlying the Rope. You'll want the following imports:

import qualified Data.FingerTree as F  -- from the fingertree package
import qualified Data.Text.Short as S  -- from the text-short package

newtype Width Source #

The length of the Rope, in characters. This is the monoid used to structure the finger tree underlying the Rope.

Constructors

Width Int 
Instances
Eq Width Source # 
Instance details

Defined in Core.Text.Rope

Methods

(==) :: Width -> Width -> Bool #

(/=) :: Width -> Width -> Bool #

Num Width Source # 
Instance details

Defined in Core.Text.Rope

Ord Width Source # 
Instance details

Defined in Core.Text.Rope

Methods

compare :: Width -> Width -> Ordering #

(<) :: Width -> Width -> Bool #

(<=) :: Width -> Width -> Bool #

(>) :: Width -> Width -> Bool #

(>=) :: Width -> Width -> Bool #

max :: Width -> Width -> Width #

min :: Width -> Width -> Width #

Show Width Source # 
Instance details

Defined in Core.Text.Rope

Methods

showsPrec :: Int -> Width -> ShowS #

show :: Width -> String #

showList :: [Width] -> ShowS #

Generic Width Source # 
Instance details

Defined in Core.Text.Rope

Associated Types

type Rep Width :: Type -> Type #

Methods

from :: Width -> Rep Width x #

to :: Rep Width x -> Width #

Semigroup Width Source # 
Instance details

Defined in Core.Text.Rope

Methods

(<>) :: Width -> Width -> Width #

sconcat :: NonEmpty Width -> Width #

stimes :: Integral b => b -> Width -> Width #

Monoid Width Source # 
Instance details

Defined in Core.Text.Rope

Methods

mempty :: Width #

mappend :: Width -> Width -> Width #

mconcat :: [Width] -> Width #

Measured Width ShortText Source # 
Instance details

Defined in Core.Text.Rope

Methods

measure :: ShortText -> Width #

Textual (FingerTree Width ShortText) Source # 
Instance details

Defined in Core.Text.Rope

type Rep Width Source # 
Instance details

Defined in Core.Text.Rope

type Rep Width = D1 (MetaData "Width" "Core.Text.Rope" "unbeliever-0.7.3.0-3b0ma8uLQlp9wydwBiHNN0" True) (C1 (MetaCons "Width" PrefixI False) (S1 (MetaSel (Nothing :: Maybe Symbol) NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Int)))

Orphan instances

Measured Width ShortText Source # 
Instance details

Methods

measure :: ShortText -> Width #