Z-Data-0.6.1.0: Array, vector and text
Copyright(c) Dong Han 2017-2020
LicenseBSD
Maintainerwinterland1989@gmail.com
Stabilityexperimental
Portabilitynon-portable
Safe HaskellNone
LanguageHaskell2010

Z.Data.Text

Description

A Text wrap a Bytes which will be interpreted using UTF-8 encoding. User should always use validate / validateMaybe to construt a Text (instead of using construtor directly or coercing), otherwise illegal UTF-8 encoded codepoints will cause undefined behaviours.

This library also provide simple unicode processing based on utf8rewind, see normalize, caseFold (current using unicode 13 databases).

Synopsis

Text type

data Text Source #

Text represented as UTF-8 encoded Bytes

Instances

Instances details
IsList Text Source # 
Instance details

Defined in Z.Data.Text.Base

Associated Types

type Item Text #

Methods

fromList :: [Item Text] -> Text #

fromListN :: Int -> [Item Text] -> Text #

toList :: Text -> [Item Text] #

Eq Text Source # 
Instance details

Defined in Z.Data.Text.Base

Methods

(==) :: Text -> Text -> Bool #

(/=) :: Text -> Text -> Bool #

Ord Text Source # 
Instance details

Defined in Z.Data.Text.Base

Methods

compare :: Text -> Text -> Ordering #

(<) :: Text -> Text -> Bool #

(<=) :: Text -> Text -> Bool #

(>) :: Text -> Text -> Bool #

(>=) :: Text -> Text -> Bool #

max :: Text -> Text -> Text #

min :: Text -> Text -> Text #

Read Text Source #

Accepted syntax and escaping rules are same with String, which is different from Show instance.

Instance details

Defined in Z.Data.Text.Base

Show Text Source #

The escaping rules is different from String 's Show instance, see "Z.Data.Text.Builder.escapeTextJSON"

Instance details

Defined in Z.Data.Text.Base

Methods

showsPrec :: Int -> Text -> ShowS #

show :: Text -> String #

showList :: [Text] -> ShowS #

IsString Text Source # 
Instance details

Defined in Z.Data.Text.Base

Methods

fromString :: String -> Text #

Semigroup Text Source # 
Instance details

Defined in Z.Data.Text.Base

Methods

(<>) :: Text -> Text -> Text #

sconcat :: NonEmpty Text -> Text #

stimes :: Integral b => b -> Text -> Text #

Monoid Text Source # 
Instance details

Defined in Z.Data.Text.Base

Methods

mempty :: Text #

mappend :: Text -> Text -> Text #

mconcat :: [Text] -> Text #

Arbitrary Text Source # 
Instance details

Defined in Z.Data.Text.Base

Methods

arbitrary :: Gen Text #

shrink :: Text -> [Text] #

CoArbitrary Text Source # 
Instance details

Defined in Z.Data.Text.Base

Methods

coarbitrary :: Text -> Gen b -> Gen b #

FoldCase Text Source #

case fold with default locale.

Instance details

Defined in Z.Data.Text.Base

Methods

foldCase :: Text -> Text #

foldCaseList :: [Text] -> [Text]

NFData Text Source # 
Instance details

Defined in Z.Data.Text.Base

Methods

rnf :: Text -> () #

Hashable Text Source # 
Instance details

Defined in Z.Data.Text.Base

Methods

hashWithSalt :: Int -> Text -> Int #

hash :: Text -> Int #

Print Text Source #

The escaping rules is same with Show instance: we reuse JSON escaping rules here, so it will be faster.

Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Text -> Builder () Source #

JSON Text Source # 
Instance details

Defined in Z.Data.JSON.Base

JSON a => JSON (Map Text a) Source # 
Instance details

Defined in Z.Data.JSON.Base

JSON a => JSON (HashMap Text a) Source #

default instance prefer later key

Instance details

Defined in Z.Data.JSON.Base

JSON a => JSON (FlatMap Text a) Source #

default instance prefer later key

Instance details

Defined in Z.Data.JSON.Base

type Item Text Source # 
Instance details

Defined in Z.Data.Text.Base

type Item Text = Char

getUTF8Bytes :: Text -> Bytes Source #

Extract UTF-8 encoded Bytes from Text

validate :: HasCallStack => Bytes -> Text Source #

O(n) Validate a sequence of bytes is UTF-8 encoded.

Throw InvalidUTF8Exception in case of invalid codepoint.

validateASCII :: HasCallStack => Bytes -> Text Source #

O(n) Validate a sequence of bytes is all ascii char byte(<128).

Throw InvalidASCIIException in case of invalid byte, It's not always faster than validate, use it only if you want to validate ASCII char sequences.

validateMaybe :: Bytes -> Maybe Text Source #

O(n) Validate a sequence of bytes is UTF-8 encoded.

Return Nothing in case of invalid codepoint.

validateASCIIMaybe :: Bytes -> Maybe Text Source #

O(n) Validate a sequence of bytes is all ascii char byte(<128).

Return Nothing in case of invalid byte.

index :: HasCallStack => Text -> Int -> Char Source #

O(n) Get the nth codepoint from Text, throw IndexOutOfTextRange when out of bound.

indexMaybe :: Text -> Int -> Maybe Char Source #

O(n) Get the nth codepoint from Text.

indexR :: HasCallStack => Text -> Int -> Char Source #

O(n) Get the nth codepoint from Text counting from the end, throw IndexOutOfVectorRange n callStack when out of bound.

indexMaybeR :: Text -> Int -> Maybe Char Source #

O(n) Get the nth codepoint from Text counting from the end.

Basic creating

empty :: Text Source #

O(1). Empty text.

singleton :: Char -> Text Source #

O(1). Single char text.

copy :: Text -> Text Source #

O(n). Copy a text from slice.

replicate :: Int -> Char -> Text Source #

O(n) replicate char n time.

cycleN :: Int -> Text -> Text Source #

O(n*m) cycleN a text n times.

Conversion between list

pack :: String -> Text Source #

O(n) Convert a string into a text

Alias for packN defaultInitSize, will be rewritten to a memcpy if possible.

packN :: Int -> String -> Text Source #

O(n) Convert a list into a text with an approximate size(in bytes, not codepoints).

If the encoded bytes length is larger than the size given, we simply double the buffer size and continue building.

This function is a good consumer in the sense of build/foldr fusion.

packR :: String -> Text Source #

O(n) Alias for packRN defaultInitSize.

packRN :: Int -> String -> Text Source #

O(n) packN in reverse order.

This function is a good consumer in the sense of build/foldr fusion.

unpack :: Text -> String Source #

O(n) Convert text to a char list.

Unpacking is done lazily. i.e. we will retain reference to the array until all element are consumed.

This function is a good producer in the sense of build/foldr fusion.

unpackR :: Text -> String Source #

O(n) Convert text to a list in reverse order.

This function is a good producer in the sense of build/foldr fusion.

Conversion between codepoint vector

fromVector :: PrimVector Char -> Text Source #

O(n) convert from a char vector.

toVector :: Text -> PrimVector Char Source #

O(n) convert to a char vector.

Print class

class Print a where Source #

A class similar to Show, serving the purpose that quickly convert a data type to a Text value.

You can use newtype or generic deriving to implement instance of this class quickly:

 {-# LANGUAGE GeneralizedNewtypeDeriving #-}
 {-# LANGUAGE DeriveAnyClass #-}
 {-# LANGUAGE DeriveGeneric #-}
 {-# LANGUAGE DerivingStrategies #-}

 import GHC.Generics

 newtype FooInt = FooInt Int deriving (Generic)
                           deriving anyclass Print

> toText (FooInt 3)
> "FooInt 3"

 newtype FooInt = FooInt Int deriving (Generic)
                           deriving newtype Print

> toText (FooInt 3)
> "3"

Minimal complete definition

Nothing

Methods

toUTF8BuilderP :: Int -> a -> Builder () Source #

Convert data to Builder with precendence.

You should return a Builder writing in UTF8 encoding only, i.e.

Z.Data.Text.validateMaybe (Z.Data.Builder.buildBytes (toUTF8BuilderP p a)) /= Nothing

default toUTF8BuilderP :: (Generic a, GToText (Rep a)) => Int -> a -> Builder () Source #

Instances

Instances details
Print Bool Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Bool -> Builder () Source #

Print Char Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Char -> Builder () Source #

Print Double Source # 
Instance details

Defined in Z.Data.Text.Print

Print Float Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Float -> Builder () Source #

Print Int Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Int -> Builder () Source #

Print Int8 Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Int8 -> Builder () Source #

Print Int16 Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Int16 -> Builder () Source #

Print Int32 Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Int32 -> Builder () Source #

Print Int64 Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Int64 -> Builder () Source #

Print Integer Source # 
Instance details

Defined in Z.Data.Text.Print

Print Natural Source # 
Instance details

Defined in Z.Data.Text.Print

Print Ordering Source # 
Instance details

Defined in Z.Data.Text.Print

Print Word Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Word -> Builder () Source #

Print Word8 Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Word8 -> Builder () Source #

Print Word16 Source # 
Instance details

Defined in Z.Data.Text.Print

Print Word32 Source # 
Instance details

Defined in Z.Data.Text.Print

Print Word64 Source # 
Instance details

Defined in Z.Data.Text.Print

Print CallStack Source # 
Instance details

Defined in Z.Data.Text.Print

Print () Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> () -> Builder () Source #

Print Version Source # 
Instance details

Defined in Z.Data.Text.Print

Print ExitCode Source # 
Instance details

Defined in Z.Data.Text.Print

Print CChar Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> CChar -> Builder () Source #

Print CSChar Source # 
Instance details

Defined in Z.Data.Text.Print

Print CUChar Source # 
Instance details

Defined in Z.Data.Text.Print

Print CShort Source # 
Instance details

Defined in Z.Data.Text.Print

Print CUShort Source # 
Instance details

Defined in Z.Data.Text.Print

Print CInt Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> CInt -> Builder () Source #

Print CUInt Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> CUInt -> Builder () Source #

Print CLong Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> CLong -> Builder () Source #

Print CULong Source # 
Instance details

Defined in Z.Data.Text.Print

Print CLLong Source # 
Instance details

Defined in Z.Data.Text.Print

Print CULLong Source # 
Instance details

Defined in Z.Data.Text.Print

Print CBool Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> CBool -> Builder () Source #

Print CFloat Source # 
Instance details

Defined in Z.Data.Text.Print

Print CDouble Source # 
Instance details

Defined in Z.Data.Text.Print

Print CPtrdiff Source # 
Instance details

Defined in Z.Data.Text.Print

Print CSize Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> CSize -> Builder () Source #

Print CWchar Source # 
Instance details

Defined in Z.Data.Text.Print

Print CSigAtomic Source # 
Instance details

Defined in Z.Data.Text.Print

Print CClock Source # 
Instance details

Defined in Z.Data.Text.Print

Print CTime Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> CTime -> Builder () Source #

Print CUSeconds Source # 
Instance details

Defined in Z.Data.Text.Print

Print CSUSeconds Source # 
Instance details

Defined in Z.Data.Text.Print

Print CIntPtr Source # 
Instance details

Defined in Z.Data.Text.Print

Print CUIntPtr Source # 
Instance details

Defined in Z.Data.Text.Print

Print CIntMax Source # 
Instance details

Defined in Z.Data.Text.Print

Print CUIntMax Source # 
Instance details

Defined in Z.Data.Text.Print

Print Scientific Source # 
Instance details

Defined in Z.Data.Text.Print

Print Text Source #

The escaping rules is same with Show instance: we reuse JSON escaping rules here, so it will be faster.

Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Text -> Builder () Source #

Print FlatIntSet Source # 
Instance details

Defined in Z.Data.Vector.FlatIntSet

Print RegexOpts Source # 
Instance details

Defined in Z.Data.Text.Regex

Print Regex Source # 
Instance details

Defined in Z.Data.Text.Regex

Methods

toUTF8BuilderP :: Int -> Regex -> Builder () Source #

Print Value Source # 
Instance details

Defined in Z.Data.JSON.Value

Methods

toUTF8BuilderP :: Int -> Value -> Builder () Source #

Print ConvertError Source # 
Instance details

Defined in Z.Data.JSON.Converter

Print HexBytes Source # 
Instance details

Defined in Z.Data.Vector.Hex

Print CBytes Source #

This instance provide UTF8 guarantee, illegal codepoints will be written as replacementChars.

Escaping rule is same with String.

Instance details

Defined in Z.Data.CBytes

Print a => Print [a] Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> [a] -> Builder () Source #

Print a => Print (Maybe a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Maybe a -> Builder () Source #

(Print a, Integral a) => Print (Ratio a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Ratio a -> Builder () Source #

Print (Ptr a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Ptr a -> Builder () Source #

Print (ForeignPtr a) Source # 
Instance details

Defined in Z.Data.Text.Print

Print a => Print (Min a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Min a -> Builder () Source #

Print a => Print (Max a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Max a -> Builder () Source #

Print a => Print (First a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> First a -> Builder () Source #

Print a => Print (Last a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Last a -> Builder () Source #

Print a => Print (WrappedMonoid a) Source # 
Instance details

Defined in Z.Data.Text.Print

Print a => Print (Identity a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Identity a -> Builder () Source #

Print a => Print (First a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> First a -> Builder () Source #

Print a => Print (Last a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Last a -> Builder () Source #

Print a => Print (Dual a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Dual a -> Builder () Source #

Print a => Print (NonEmpty a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> NonEmpty a -> Builder () Source #

(Prim a, Print a) => Print (PrimArray a) Source # 
Instance details

Defined in Z.Data.Text.Print

Print a => Print (SmallArray a) Source # 
Instance details

Defined in Z.Data.Text.Print

Print a => Print (Array a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Array a -> Builder () Source #

(Prim a, Print a) => Print (PrimVector a) Source # 
Instance details

Defined in Z.Data.Text.Print

Print a => Print (Vector a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Vector a -> Builder () Source #

Print v => Print (FlatSet v) Source # 
Instance details

Defined in Z.Data.Vector.FlatSet

Methods

toUTF8BuilderP :: Int -> FlatSet v -> Builder () Source #

Print v => Print (FlatIntMap v) Source # 
Instance details

Defined in Z.Data.Vector.FlatIntMap

(Print a, Print b) => Print (Either a b) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Either a b -> Builder () Source #

(Print a, Print b) => Print (a, b) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> (a, b) -> Builder () Source #

HasResolution a => Print (Fixed a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Fixed a -> Builder () Source #

Print (Proxy a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Proxy a -> Builder () Source #

(PrimUnlifted a, Print a) => Print (UnliftedArray a) Source # 
Instance details

Defined in Z.Data.Text.Print

(Print k, Print v) => Print (FlatMap k v) Source # 
Instance details

Defined in Z.Data.Vector.FlatMap

Methods

toUTF8BuilderP :: Int -> FlatMap k v -> Builder () Source #

(Print a, Print b, Print c) => Print (a, b, c) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> (a, b, c) -> Builder () Source #

Print a => Print (Const a b) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Const a b -> Builder () Source #

Print b => Print (Tagged a b) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Tagged a b -> Builder () Source #

(Print a, Print b, Print c, Print d) => Print (a, b, c, d) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> (a, b, c, d) -> Builder () Source #

(Print (f a), Print (g a)) => Print (Product f g a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Product f g a -> Builder () Source #

(Print (f a), Print (g a), Print a) => Print (Sum f g a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Sum f g a -> Builder () Source #

(Print a, Print b, Print c, Print d, Print e) => Print (a, b, c, d, e) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> (a, b, c, d, e) -> Builder () Source #

Print (f (g a)) => Print (Compose f g a) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> Compose f g a -> Builder () Source #

(Print a, Print b, Print c, Print d, Print e, Print f) => Print (a, b, c, d, e, f) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> (a, b, c, d, e, f) -> Builder () Source #

(Print a, Print b, Print c, Print d, Print e, Print f, Print g) => Print (a, b, c, d, e, f, g) Source # 
Instance details

Defined in Z.Data.Text.Print

Methods

toUTF8BuilderP :: Int -> (a, b, c, d, e, f, g) -> Builder () Source #

toText :: Print a => a -> Text Source #

Convert data to Text.

toString :: Print a => a -> String Source #

Convert data to String, faster show replacement.

toUTF8Builder :: Print a => a -> Builder () Source #

Convert data to Builder.

toUTF8Bytes :: Print a => a -> Bytes Source #

Convert data to Bytes in UTF8 encoding.

Basic interface

null :: Text -> Bool Source #

O(1) Test whether a text is empty.

length :: Text -> Int Source #

O(n) The char length of a text.

append :: Text -> Text -> Text Source #

O(m+n)

There's no need to guard empty vector because we guard them for you, so appending empty text are no-ops.

map' :: (Char -> Char) -> Text -> Text Source #

O(n) map f t is the Text obtained by applying f to each char of t. Performs replacement on invalid scalar values.

imap' :: (Int -> Char -> Char) -> Text -> Text Source #

Strict mapping with index.

foldl' :: (b -> Char -> b) -> b -> Text -> b Source #

Strict left to right fold.

ifoldl' :: (b -> Int -> Char -> b) -> b -> Text -> b Source #

Strict left to right fold with index.

foldr' :: (Char -> b -> b) -> b -> Text -> b Source #

Strict right to left fold

ifoldr' :: (Int -> Char -> b -> b) -> b -> Text -> b Source #

Strict right to left fold with index

NOTE: the index is counting from 0, not backwards

concat :: [Text] -> Text Source #

O(n) Concatenate a list of text.

Note: concat have to force the entire list to filter out empty text and calculate the length for allocation.

concatMap :: (Char -> Text) -> Text -> Text Source #

Map a function over a text and concatenate the results

Special folds

count :: Char -> Text -> Int Source #

O(n) count returns count of an element from a text.

all :: (Char -> Bool) -> Text -> Bool Source #

O(n) Applied to a predicate and text, all determines if all chars of the text satisfy the predicate.

any :: (Char -> Bool) -> Text -> Bool Source #

O(n) Applied to a predicate and a text, any determines if any chars of the text satisfy the predicate.

Text display width

displayWidth :: Text -> Int Source #

Get the display width of a piece of text.

You shouldn't pass texts with control characters(<0x20, \DEL), which are counted with -1 width.

>>> displayWidth "你好世界!"
>>> 10
>>> displayWidth "hello world!"
>>> 12

Slice manipulation

cons :: Char -> Text -> Text Source #

O(n) cons is analogous to (:) for lists, but of different complexity, as it requires making a copy.

snoc :: Text -> Char -> Text Source #

O(n) Append a char to the end of a text.

uncons :: Text -> Maybe (Char, Text) Source #

O(1) Extract the head and tail of a text, return Nothing if it is empty.

unsnoc :: Text -> Maybe (Text, Char) Source #

O(1) Extract the init and last of a text, return Nothing if text is empty.

headMaybe :: Text -> Maybe Char Source #

O(1) Extract the first char of a text.

tailMayEmpty :: Text -> Text Source #

O(1) Extract the chars after the head of a text.

NOTE: tailMayEmpty return empty text in the case of an empty text.

lastMaybe :: Text -> Maybe Char Source #

O(1) Extract the last char of a text.

initMayEmpty :: Text -> Text Source #

O(1) Extract the chars before of the last one.

NOTE: initMayEmpty return empty text in the case of an empty text.

head :: Text -> Char Source #

O(1) Extract the first char of a text.

Throw EmptyText if text is empty.

tail :: Text -> Text Source #

O(1) Extract the chars after the head of a text.

Throw EmptyText if text is empty.

last :: Text -> Char Source #

O(1) Extract the last char of a text.

Throw EmptyText if text is empty.

init :: Text -> Text Source #

O(1) Extract the chars before of the last one.

Throw EmptyText if text is empty.

inits :: Text -> [Text] Source #

O(n) Return all initial segments of the given text, empty first.

tails :: Text -> [Text] Source #

O(n) Return all final segments of the given text, whole text first.

take :: Int -> Text -> Text Source #

O(1) take n, applied to a text xs, returns the prefix of xs of length n, or xs itself if n > length xs.

drop :: Int -> Text -> Text Source #

O(1) drop n xs returns the suffix of xs after the first n char, or [] if n > length xs.

takeR :: Int -> Text -> Text Source #

O(1) takeR n, applied to a text xs, returns the suffix of xs of length n, or xs itself if n > length xs.

dropR :: Int -> Text -> Text Source #

O(1) dropR n xs returns the prefix of xs before the last n char, or [] if n > length xs.

slice :: Int -> Int -> Text -> Text Source #

O(1) Extract a sub-range text with give start index and length.

This function is a total function just like 'takedrop', indexlength exceeds range will be ingored, e.g.

slice 1 3 "hello"   == "ell"
slice -1 -1 "hello" == ""
slice -2 2 "hello"  == ""
slice 2 10 "hello"  == "llo"

This holds for all x y: slice x y vs == drop x . take (x+y) vs

splitAt :: Int -> Text -> (Text, Text) Source #

O(n) splitAt n xs is equivalent to (take n xs, drop n xs).

takeWhile :: (Char -> Bool) -> Text -> Text Source #

O(n) Applied to a predicate p and a text t, returns the longest prefix (possibly empty) of t of elements that satisfy p.

takeWhileR :: (Char -> Bool) -> Text -> Text Source #

O(n) Applied to a predicate p and a text t, returns the longest suffix (possibly empty) of t of elements that satisfy p.

dropWhile :: (Char -> Bool) -> Text -> Text Source #

O(n) Applied to a predicate p and a text vs, returns the suffix (possibly empty) remaining after takeWhile p vs.

dropWhileR :: (Char -> Bool) -> Text -> Text Source #

O(n) Applied to a predicate p and a text vs, returns the prefix (possibly empty) remaining before takeWhileR p vs.

dropAround :: (Char -> Bool) -> Text -> Text Source #

O(n) dropAround f = dropWhile f . dropWhileR f

break :: (Char -> Bool) -> Text -> (Text, Text) Source #

O(n) Split the text into the longest prefix of elements that do not satisfy the predicate and the rest without copying.

span :: (Char -> Bool) -> Text -> (Text, Text) Source #

O(n) Split the text into the longest prefix of elements that satisfy the predicate and the rest without copying.

breakR :: (Char -> Bool) -> Text -> (Text, Text) Source #

breakR behaves like break but from the end of the text.

breakR p == spanR (not.p)

spanR :: (Char -> Bool) -> Text -> (Text, Text) Source #

spanR behaves like span but from the end of the text.

breakOn :: Text -> Text -> (Text, Text) Source #

Break a text on a subtext, returning a pair of the part of the text prior to the match, and the rest of the text, e.g.

break "wor" "hello, world" = ("hello, ", "world")

breakOnAll Source #

Arguments

:: Text

needle to search for

-> Text

haystack in which to search

-> [(Text, Text)] 

O(n+m) Find all non-overlapping instances of needle in haystack. Each element of the returned list consists of a pair:

  • The entire string prior to the kth match (i.e. the prefix)
  • The kth match, followed by the remainder of the string

Examples:

breakOnAll "::" ""
==> []
breakOnAll "" "abc"
==> [("a", "bc"), ("ab", "c"), ("abc", "/")]

The result list is lazy, search is performed when you force the list.

group :: Text -> [Text] Source #

The group function takes a text and returns a list of texts such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,

group Mississippi = [M,"i","ss","i","ss","i","pp","i"]

It is a special case of groupBy, which allows the programmer to supply their own equality test.

groupBy :: (Char -> Char -> Bool) -> Text -> [Text] Source #

The groupBy function is the non-overloaded version of group.

stripPrefix :: Text -> Text -> Maybe Text Source #

O(n) The stripPrefix function takes two texts and returns Just the remainder of the second iff the first is its prefix, and otherwise Nothing.

stripSuffix :: Text -> Text -> Maybe Text Source #

O(n) The stripSuffix function takes two texts and returns Just the remainder of the second iff the first is its suffix, and otherwise Nothing.

split :: Char -> Text -> [Text] Source #

O(n) Break a text into pieces separated by the delimiter element consuming the delimiter. I.e.

split '\n' "a\nb\nd\ne" == ["a","b","d","e"]
split 'a'  "aXaXaXa"    == ["","X","X","X",""]
split 'x'  "x"          == ["",""]

and

intercalate [c] . split c == id
split == splitWith . (==)

NOTE, this function behavior different with bytestring's. see #56.

splitWith :: (Char -> Bool) -> Text -> [Text] Source #

O(n) Splits a text into components delimited by separators, where the predicate returns True for a separator char. The resulting components do not contain the separators. Two adjacent separators result in an empty component in the output. eg.

splitWith (=='a') "aabbaca" == ["","","bb","c",""]
splitWith (=='a') []        == [""]

splitOn :: Text -> Text -> [Text] Source #

O(m+n) Break haystack into pieces separated by needle.

Note: An empty needle will essentially split haystack element by element.

Examples:

>>> splitOn "\r\n" "a\r\nb\r\nd\r\ne"
["a","b","d","e"]
>>> splitOn "aaa"  "aaaXaaaXaaaXaaa"
["","X","X","X",""]
>>> splitOn "x"  "x"
["",""]

and

intercalate s . splitOn s         == id
splitOn (singleton c)             == split (==c)

isPrefixOf :: Text -> Text -> Bool Source #

The isPrefix function returns True if the first argument is a prefix of the second.

isSuffixOf :: Text -> Text -> Bool Source #

O(n) The isSuffixOf function takes two text and returns True if the first is a suffix of the second.

isInfixOf :: Text -> Text -> Bool Source #

Check whether one text is a subtext of another.

needle isInfixOf haystack === null haystack || indices needle haystake /= [].

commonPrefix :: Text -> Text -> (Text, Text, Text) Source #

O(n) Find the longest non-empty common prefix of two strings and return it, along with the suffixes of each string at which they no longer match. e.g.

>>> commonPrefix "foobar" "fooquux"
("foo","bar","quux")
>>> commonPrefix "veeble" "fetzer"
("","veeble","fetzer")

words :: Text -> [Text] Source #

O(n) Breaks a Bytes up into a list of words, delimited by unicode space.

lines :: Text -> [Text] Source #

O(n) Breaks a text up into a list of lines, delimited by ascii n.

unwords :: [Text] -> Text Source #

O(n) Joins words with ascii space.

unlines :: [Text] -> Text Source #

O(n) Joins lines with ascii n.

NOTE: This functions is different from unlines, it DOES NOT add a trailing n.

padLeft :: Int -> Char -> Text -> Text Source #

Add padding to the left so that the whole text's length is at least n.

padRight :: Int -> Char -> Text -> Text Source #

Add padding to the right so that the whole text's length is at least n.

Transform

reverse :: Text -> Text Source #

O(n) Reverse the characters of a string.

intersperse :: Char -> Text -> Text Source #

O(n) The intersperse function takes a character and places it between the characters of a Text. Performs replacement on invalid scalar values.

intercalate :: Text -> [Text] -> Text Source #

O(n) The intercalate function takes a Text and a list of Texts and concatenates the list after interspersing the first argument between each element of the list.

transpose :: [Text] -> [Text] Source #

The transpose function transposes the rows and columns of its text argument.

Search

searching by equality

elem :: Char -> Text -> Bool Source #

O(n) elem test if given char is in given text.

notElem :: Char -> Text -> Bool Source #

O(n) not . elem

element-wise search

find Source #

Arguments

:: (Char -> Bool) 
-> Text 
-> (Int, Maybe Char)

(char index, matching char)

O(n) find the first char matching the predicate in a text from left to right, if there isn't one, return the text length.

findR Source #

Arguments

:: (Char -> Bool) 
-> Text 
-> (Int, Maybe Char)

(char index(counting backwards), matching char)

O(n) find the first char matching the predicate in a text from right to left.

filter :: (Char -> Bool) -> Text -> Text Source #

O(n) filter, applied to a predicate and a text, returns a text containing those chars that satisfy the predicate.

partition :: (Char -> Bool) -> Text -> (Text, Text) Source #

O(n) The partition function takes a predicate, a text, returns a pair of text with codepoints which do and do not satisfy the predicate, respectively; i.e.,

partition p txt == (filter p txt, filter (not . p) txt)

Unicode processing

normalization

data NormalizationResult Source #

Instances

Instances details
Eq NormalizationResult Source # 
Instance details

Defined in Z.Data.Text.UTF8Rewind

Ord NormalizationResult Source # 
Instance details

Defined in Z.Data.Text.UTF8Rewind

Show NormalizationResult Source # 
Instance details

Defined in Z.Data.Text.UTF8Rewind

Generic NormalizationResult Source # 
Instance details

Defined in Z.Data.Text.UTF8Rewind

Associated Types

type Rep NormalizationResult :: Type -> Type #

type Rep NormalizationResult Source # 
Instance details

Defined in Z.Data.Text.UTF8Rewind

type Rep NormalizationResult = D1 ('MetaData "NormalizationResult" "Z.Data.Text.UTF8Rewind" "Z-Data-0.6.1.0-JeXatJjxwtL6zC0T0lLei4" 'False) (C1 ('MetaCons "NormalizedYes" 'PrefixI 'False) (U1 :: Type -> Type) :+: (C1 ('MetaCons "NormalizedMaybe" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "NormalizedNo" 'PrefixI 'False) (U1 :: Type -> Type)))

data NormalizeMode Source #

These are the Unicode Normalization Forms:

Form                         | Description
---------------------------- | ---------------------------------------------
Normalization Form D (NFD)   | Canonical decomposition
Normalization Form C (NFC)   | Canonical decomposition, followed by canonical composition
Normalization Form KD (NFKD) | Compatibility decomposition
Normalization Form KC (NFKC) | Compatibility decomposition, followed by canonical composition

Constructors

NFC 
NFKC 
NFD 
NFKD 

Instances

Instances details
Eq NormalizeMode Source # 
Instance details

Defined in Z.Data.Text.UTF8Rewind

Ord NormalizeMode Source # 
Instance details

Defined in Z.Data.Text.UTF8Rewind

Show NormalizeMode Source # 
Instance details

Defined in Z.Data.Text.UTF8Rewind

Generic NormalizeMode Source # 
Instance details

Defined in Z.Data.Text.UTF8Rewind

Associated Types

type Rep NormalizeMode :: Type -> Type #

type Rep NormalizeMode Source # 
Instance details

Defined in Z.Data.Text.UTF8Rewind

type Rep NormalizeMode = D1 ('MetaData "NormalizeMode" "Z.Data.Text.UTF8Rewind" "Z-Data-0.6.1.0-JeXatJjxwtL6zC0T0lLei4" 'False) ((C1 ('MetaCons "NFC" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "NFKC" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "NFD" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "NFKD" 'PrefixI 'False) (U1 :: Type -> Type)))

isNormalized :: Text -> NormalizationResult Source #

Check if a string is stable in the NFC (Normalization Form C).

isNormalizedTo :: NormalizeMode -> Text -> NormalizationResult Source #

Check if a string is stable in the specified Unicode Normalization Form.

This function can be used as a preprocessing step, before attempting to normalize a string. Normalization is a very expensive process, it is often cheaper to first determine if the string is unstable in the requested normalization form.

The result of the check will be YES if the string is stable and MAYBE or NO if it is unstable. If the result is MAYBE, the string does not necessarily have to be normalized.

For more information, please review <http://www.unicode.org/reports/tr15/ Unicode Standard Annex #15 - Unicode Normalization Forms>.

normalize :: Text -> Text Source #

Normalize a string to NFC (Normalization Form C).

normalizeTo :: NormalizeMode -> Text -> Text Source #

Normalize a string to the specified Unicode Normalization Form.

The Unicode standard defines two standards for equivalence between characters: canonical and compatibility equivalence. Canonically equivalent characters and sequence represent the same abstract character and must be rendered with the same appearance and behavior. Compatibility equivalent characters have a weaker equivalence and may be rendered differently.

Unicode Normalization Forms are formally defined standards that can be used to test whether any two strings of characters are equivalent to each other. This equivalence may be canonical or compatibility.

The algorithm puts all combining marks into a specified order and uses the rules for decomposition and composition to transform the string into one of four Unicode Normalization Forms. A binary comparison can then be used to determine equivalence.

Case conversion

envLocale :: IO Locale Source #

Get environment locale

caseFold :: Text -> Text Source #

Remove case distinction from UTF-8 encoded text with default locale.

caseFoldWith :: Locale -> Text -> Text Source #

Remove case distinction from UTF-8 encoded text.

Case folding is the process of eliminating differences between code points concerning case mapping. It is most commonly used for comparing strings in a case-insensitive manner. Conversion is fully compliant with the Unicode 7.0 standard.

Although similar to lowercasing text, there are significant differences. For one, case folding does _not_ take locale into account when converting. In some cases, case folding can be up to 20% faster than lowercasing the same text, but the result cannot be treated as correct lowercased text.

Only two locale-specific exception are made when case folding text. In Turkish, U+0049 LATIN CAPITAL LETTER I maps to U+0131 LATIN SMALL LETTER DOTLESS I and U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE maps to U+0069 LATIN SMALL LETTER I.

Although most code points can be case folded without changing length, there are notable exceptions. For example, U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) maps to "U+0069 U+0307" (LATIN SMALL LETTER I and COMBINING DOT ABOVE) when converted to lowercase.

Only a handful of scripts make a distinction between upper- and lowercase. In addition to modern scripts, such as Latin, Greek, Armenian and Cyrillic, a few historic or archaic scripts have case. The vast majority of scripts do not have case distinctions.

toLower :: Text -> Text Source #

Convert UTF-8 encoded text to lowercase with default locale.

toLowerWith :: Locale -> Text -> Text Source #

Convert UTF-8 encoded text to lowercase.

This function allows conversion of UTF-8 encoded strings to lowercase without first changing the encoding to UTF-32. Conversion is fully compliant with the Unicode 7.0 standard.

Although most code points can be converted to lowercase with changing length, there are notable exceptions. For example, U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) maps to "U+0069 U+0307" (LATIN SMALL LETTER I and COMBINING DOT ABOVE) when converted to lowercase.

Only a handful of scripts make a distinction between upper- and lowercase. In addition to modern scripts, such as Latin, Greek, Armenian and Cyrillic, a few historic or archaic scripts have case. The vast majority of scripts do not have case distinctions.

Case mapping is not reversible. That is, toUpper(toLower(x)) != toLower(toUpper(x)).

Certain code points (or combinations of code points) apply rules based on the locale. For more information about these exceptional code points, please refer to the Unicode standard: ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt

toUpper :: Text -> Text Source #

Convert UTF-8 encoded text to uppercase with default locale.

toUpperWith :: Locale -> Text -> Text Source #

Convert UTF-8 encoded text to uppercase.

Conversion is fully compliant with the Unicode 7.0 standard.

Although most code points can be converted without changing length, there are notable exceptions. For example, U+00DF (LATIN SMALL LETTER SHARP S) maps to "U+0053 U+0053" (LATIN CAPITAL LETTER S and LATIN CAPITAL LETTER S) when converted to uppercase.

Only a handful of scripts make a distinction between upper and lowercase. In addition to modern scripts, such as Latin, Greek, Armenian and Cyrillic, a few historic or archaic scripts have case. The vast majority of scripts do not have case distinctions.

Case mapping is not reversible. That is, toUpper(toLower(x)) != toLower(toUpper(x)).

Certain code points (or combinations of code points) apply rules based on the locale. For more information about these exceptional code points, please refer to the Unicode standard: ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt

toTitle :: Text -> Text Source #

Convert UTF-8 encoded text to titlecase with default locale.

toTitleWith :: Locale -> Text -> Text Source #

Convert UTF-8 encoded text to titlecase.

This function allows conversion of UTF-8 encoded strings to titlecase. Conversion is fully compliant with the Unicode 7.0 standard.

Titlecase requires a bit more explanation than uppercase and lowercase, because it is not a common text transformation. Titlecase uses uppercase for the first letter of each word and lowercase for the rest. Words are defined as "collections of code points with general category Lu, Ll, Lt, Lm or Lo according to the Unicode database".

Effectively, any type of punctuation can break up a word, even if this is not grammatically valid. This happens because the titlecasing algorithm does not and cannot take grammar rules into account.

Text                                 | Titlecase
-------------------------------------|-------------------------------------
The running man                      | The Running Man
NATO Alliance                        | Nato Alliance
You're amazing at building libraries | You'Re Amazing At Building Libraries

Although most code points can be converted to titlecase without changing length, there are notable exceptions. For example, U+00DF (LATIN SMALL LETTER SHARP S) maps to "U+0053 U+0073" (LATIN CAPITAL LETTER S and LATIN SMALL LETTER S) when converted to titlecase.

Certain code points (or combinations of code points) apply rules based on the locale. For more information about these exceptional code points, please refer to the Unicode standard: ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt

Unicode category

isCategory :: Category -> Text -> Bool Source #

Check if the input string conforms to the category specified by the flags.

This function can be used to check if the code points in a string are part of a category. Valid flags are members of the "list of categories". The category for a code point is defined as part of the entry in UnicodeData.txt, the data file for the Unicode code point database.

By default, the function will treat grapheme clusters as a single code point. This means that the following string:

Code point | Canonical combining class | General category      | Name
---------- | ------------------------- | --------------------- | ----------------------
U+0045     | 0                         | Lu (Uppercase letter) | LATIN CAPITAL LETTER E
U+0300     | 230                       | Mn (Non-spacing mark) | COMBINING GRAVE ACCENT

Will match with CategoryLetterUppercase in its entirety, because the COMBINING GRAVE ACCENT is treated as part of the grapheme cluster. This is useful when e.g. creating a text parser, because you do not have to normalize the text first.

If this is undesired behavior, specify the CategoryIgnoreGraphemeCluster flag.

In order to maintain backwards compatibility with POSIX functions like isdigit and isspace, compatibility flags have been provided. Note, however, that the result is only guaranteed to be correct for code points in the Basic Latin range, between U+0000 and 0+007F. Combining a compatibility flag with a regular category flag will result in undefined behavior.

spanCategory :: Category -> Text -> (Text, Text) Source #

Try to match as many code points with the matching category flags as possible and return the prefix and suffix.

Constants

Locale

type Locale = CSize Source #

Locale for case mapping.

Category

type Category = CSize Source #

Unicode categories.

See isCategory, you can combine categories with bitwise or.