yi-rope-0.5.0.0: A rope data structure used by Yi

Portabilityportable
Stabilityexperimental
Maintaineryi-devel@googlegroups.com
Safe HaskellNone

Yi.Rope

Contents

Description

This module defines a rope data structure for use in Yi. This specific implementation uses a fingertree over Text.

In contrast to our old implementation, we can now reap all the benefits of Text: automatic unicode handling and blazing fast implementation on underlying strings. This frees us from a lot of book-keeping. We don't lose out on not using ByteString directly because the old implementation encoded it into UTF8 anyway, making it unsuitable for storing anything but text.

Synopsis

Documentation

data YiString Source

A YiString is a FingerTree with cached column and line counts over chunks of Text.

Instances

Eq YiString

Two YiStrings are equal if their underlying text is.

Implementation note: This just uses Text equality as there is no real opportunity for optimisation here except for a cached length check first. We could unroll the trees and mess around with matching prefixes but the overhead would be higher than a simple conversion and relying on GHC optimisation.

The derived Eq implementation for the underlying tree only passes the equality check if the chunks are the same too which is not what we want.

Ord YiString 
Show YiString 
Typeable YiString 
IsString YiString 
Default YiString 
Monoid YiString 
NFData YiString 
Binary YiString

To serialise a YiString, we turn it into a regular String first.

Conversions to YiString

fromText :: Text -> YiStringSource

Converts a Text into a YiString using defaultChunkSize-sized chunks for the underlying tree.

fromString' :: Int -> String -> YiStringSource

See fromText'.

fromText' :: Int -> Text -> YiStringSource

This is like fromText but it allows the user to specify the chunk size to be used. Uses defaultChunkSize if the given size is <= 0.

Conversions from YiString

toString :: YiString -> StringSource

See toText.

toReverseString :: YiString -> StringSource

See toReverseText.

Note that it is actually ~4.5 times faster to use toReverseText and unpack the result than to convert to String and use reverse.

toText :: YiString -> TextSource

Consider whether you really need to use this!

toReverseText :: YiString -> TextSource

Spits out the underlying string, reversed.

Note that this is actually slightly faster than manually unrolling the tree from the end, reverseing each chunk and concating, at least with -O2 which you really need to be using with Text anyway.

Functions over content

null :: YiString -> BoolSource

Checks if the given YiString is actually empty.

empty :: YiStringSource

Creates an empty YiString.

take :: Int -> YiString -> YiStringSource

Takes the first n given characters.

drop :: Int -> YiString -> YiStringSource

Drops the first n characters.

length :: YiString -> IntSource

Length of the whole underlying string.

Amortized constant time.

reverse :: YiString -> YiStringSource

Reverse the whole underlying string.

This involves reversing the order of the chunks as well as content of the chunks. We use a little optimisation here that re-uses the content of each chunk but this exposes a potential problem: after many transformations, our chunks size might become quite varied (but never more than the default size), perhaps we should periodically rechunk the tree to recover nice sizes?

countNewLines :: YiString -> IntSource

Count the number of newlines in the underlying string. This is actually amortized constant time as we cache this information in the underlying tree.

lines :: YiString -> [YiString]Source

This is like lines' but it does *not* preserve newlines.

Specifically, we just strip the newlines from the result of lines'.

This behaves slightly differently than the old split: the number of resulting strings here is equal to the number of newline characters in the underlying string. This is much more consistent than the old behaviour which blindly used ByteStrings split and stitched the result back together which was inconsistent with the rest of the interface which worked with number of newlines.

lines' :: YiString -> [YiString]Source

Splits the YiString into a list of YiString each containing a line.

Note that in old implementation this allowed an arbitrary character to split on. If you want to do that, manually convert toText and use splitOn to suit your needs. This case is optimised for newlines only which seems to have been the only use of the original function.

The newlines are preserved so this should hold:

 'toText' . 'concat' . 'lines'' ≡ 'toText'

but the underlying structure might change: notably, chunks will most likely change sizes.

unlines :: [YiString] -> YiStringSource

Joins up lines by a newline character. It does not leave a newline after the last line. If you want a more classical unlines behaviour, use map along with snoc.

splitAt :: Int -> YiString -> (YiString, YiString)Source

Splits the string at given character position.

If position <= 0 then the left string is empty and the right string contains everything else.

If position >= length of the string then the left string contains everything and the right string is empty.

Implementation note: the way this works is by splitting the underlying finger at a closest chunk that goes *over* the given position (see split). This either results in a perfect split at which point we're done or more commonly, it leaves as few characters short and we need to take few characters from the first chunk of the right side of the split. We do precisely that.

All together, this split is only as expensive as underlying split, the cost of splitting a chunk into two, the cost of one cons and one cons of a chunk and lastly the cost of splitAt of the underlying Text. It turns out to be fairly fast all together.

splitAtLine :: Int -> YiString -> (YiString, YiString)Source

Splits the underlying string before the given line number. Zero-indexed lines.

Splitting at line <= 0 gives you an empty string. Splitting at n > 0 gives you the first n lines.

Also see splitAtLine'.

cons :: Char -> YiString -> YiStringSource

Add a Char in front of a YiString.

We add the character to the front of the first chunk. This does mean that a lot of cons might result in an abnormally large first chunk so if you have to do that, consider using append instead..

snoc :: YiString -> Char -> YiStringSource

Add a Char in the back of a YiString.

We add the character to the end of the last chunk. This does mean that a lot of snoc might result in an abnormally large last chunk so if you have to do that, consider using append instead..

singleton :: Char -> YiStringSource

Single character YiString. Consider whether it's worth creating this, maybe you can use cons or snoc instead?

head :: YiString -> Maybe CharSource

Take the first character of the underlying string if possible.

last :: YiString -> Maybe CharSource

Take the last character of the underlying string if possible.

append :: YiString -> YiString -> YiStringSource

Append two YiStrings.

We take the extra time to optimise this append for many small insertions. With naive append of the inner fingertree with ><, it is often the case that we end up with a large collection of tiny chunks. This function instead tries to join the underlying trees at outermost chunks up to defaultChunkSize which while slower, should improve memory usage.

I suspect that this pays for itself as we'd spend more time computing over all the little chunks than few large ones anyway.

concat :: [YiString] -> YiStringSource

Concat a list of YiStrings.

any :: (Char -> Bool) -> YiString -> BoolSource

YiString specialised any.

Implementation note: this currently just does any by doing ‘TX.Text’ conversions upon consecutive chunks. We should be able to speed it up by running it in parallel over multiple chunks.

all :: (Char -> Bool) -> YiString -> BoolSource

YiString specialised all.

See the implementation note for any.

dropWhile :: (Char -> Bool) -> YiString -> YiStringSource

The usual dropWhile optimised for YiStrings.

takeWhile :: (Char -> Bool) -> YiString -> YiStringSource

The usual takeWhile optimised for YiStrings.

dropWhileEnd :: (Char -> Bool) -> YiString -> YiStringSource

As dropWhile but drops from the end instead.

takeWhileEnd :: (Char -> Bool) -> YiString -> YiStringSource

Like takeWhile but takes from the end instead.

intercalate :: YiString -> [YiString] -> YiStringSource

Concatenates the list of YiStrings after inserting the user-provided YiString between the elements.

Empty YiStrings are not ignored and will end up as strings of length 1. If you don't want this, it's up to you to pre-process the list. Just as with intersperse, it is up to the user to pre-process the list.

intersperse :: Char -> [YiString] -> YiStringSource

Intersperses the given character between the YiStrings. This is useful when you have a bunch of strings you just want to separate something with, comma or a dash. Note that it only inserts the character between the elements.

What's more, the result is a single YiString. You can easily achieve a version that blindly inserts elements to the back by mapping over the list instead of using this function.

You can think of it as a specialised version of intercalate. Note that what this does __not__ do is intersperse characters into the underlying text, you should convert and use intersperse for that instead.

filter :: (Char -> Bool) -> YiString -> YiStringSource

Filters the characters from the underlying string.

>>> filter (/= 'a') "bac"
"bc"

map :: (Char -> Char) -> YiString -> YiStringSource

Maps the characters over the underlying string.

words :: YiString -> [YiString]Source

Splits the given YiString into a list of words, where spaces are determined by isSpace. No empty strings are in the result list.

unwords :: [YiString] -> YiStringSource

Join given YiStrings with a space. Empty lines will be filtered out first.

split :: (Char -> Bool) -> YiString -> [YiString]Source

Splits the YiString on characters matching the predicate, like split.

For splitting on newlines use lines or lines' instead.

Implementation note: GHC actually makes this naive implementation about as fast and in cases with lots of splits, faster, as a hand-rolled version on chunks with appends which is quite amazing in itself.

init :: YiString -> Maybe YiStringSource

Takes every character but the last one: returns Nothing on empty string.

tail :: YiString -> Maybe YiStringSource

Takes the tail of the underlying string. If the string is empty to begin with, returns Nothing.

span :: (Char -> Bool) -> YiString -> (YiString, YiString)Source

Returns a pair whose first element is the longest prefix (possibly empty) of t of elements that satisfy p, and whose second is the remainder of the string. See also span.

This implementation uses splitAt which actually is just as fast as hand-unrolling the tree. GHC sure is great!

break :: (Char -> Bool) -> YiString -> (YiString, YiString)Source

Just like span but with the predicate negated.

foldl' :: (a -> Char -> a) -> a -> YiString -> aSource

replicate :: Int -> YiString -> YiStringSource

Replicate the given YiString set number of times, concatenating the results. Also see replicateChar.

replicateChar :: Int -> Char -> YiStringSource

Replicate the given character set number of times and pack the result into a YiString.

>>> replicateChar 4 ' '
"    "

IO

data ConverterName Source

ConverterName is used to convey information about the underlying Converter used within the buffer to encode and decode text. It is mostly here due to the lack of Binary instance for Converter itself.

Instances

Eq ConverterName 
Ord ConverterName 
Read ConverterName 
Show ConverterName 
Typeable ConverterName 
Binary ConverterName

Simply 'put's/'get's the underlying String.

readFile :: FilePath -> IO (Either Text (YiString, ConverterName))Source

Reads file into the rope, also returning the ConverterName that was used for decoding. You should resupply this to writeFile if you're aiming to preserve the original encoding.

If we fail to guess the encoding used, error message is given instead.

It is up to the user to handle exceptions not directly related to character decoding.

writeFile :: FilePath -> YiString -> ConverterName -> IO ()Source

Writes the given YiString to the given file, according to the Converter specified by ConverterName. You can obtain a ConverterName through readFile. If you have a Converter, use writeFileWithConverter instead.

If you don't care about the encoding used such as when creating a brand new file, you can use writeFileUsingText.

It's up to the user to handle exceptions.

writeFileUsingText :: FilePath -> YiString -> IO ()Source

Write a YiString into the given file. This function uses writeFile to do the writing: if you have special needs for preserving encoding/decoding, use writeFile instead.

It's up to the user to handle exceptions.

writeFileWithConverter :: FilePath -> YiString -> Converter -> IO ()Source

As writeFile but using the provided Converter rather than creating one from a ConverterName.

It's up to the user to handle exceptions.

Escape latches to underlying content. Note that these are safe

unsafeWithText :: (Text -> Text) -> YiString -> YiStringSource

Maps over each __chunk__ which means this function is UNSAFE! If you use this with functions which don't preserve Size, that is the chunk length and number of newlines, things will break really, really badly. You should not need to use this.

Also see unsafeFmap