text-2.1: An efficient packed Unicode text type.
Safe Haskell: Safe-Inferred
Language: Haskell2010

Data.Text.Internal.Validate

Description

Tests whether a sequence of bytes is valid UTF-8.

The GHC Haskell ecosystem has several representations of byte sequences, but ByteString is the only one that the stable text API concerns itself with. Part of ByteString-to-Text decoding is isValidUtf8ByteString, a high-performance UTF-8 validation routine written in C++ with fallbacks for various platforms.

The C++ code backing this routine is nontrivial, so in the interest of reuse this module also exports functions that work on the GC-managed ByteArray type. These ByteArray functions are not used anywhere else in text; they exist for library and application authors who do not use ByteString but still need to interoperate with text.

ByteString

isValidUtf8ByteString :: ByteString -> Bool

Is the ByteString a valid UTF-8 byte sequence?
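
As a minimal usage sketch, the following program checks three hand-built byte sequences. The names wellFormed, truncated and invalidByte are illustrative only and are not part of text.

```haskell
import qualified Data.ByteString as B
import Data.Text.Internal.Validate (isValidUtf8ByteString)

-- "hé" in UTF-8: 'h' is 0x68, 'é' is the two-byte sequence 0xC3 0xA9.
wellFormed :: B.ByteString
wellFormed = B.pack [0x68, 0xC3, 0xA9]

-- The lead byte 0xC3 announces a continuation byte that never arrives.
truncated :: B.ByteString
truncated = B.pack [0x68, 0xC3]

-- 0xFF cannot occur anywhere in a UTF-8 byte sequence.
invalidByte :: B.ByteString
invalidByte = B.pack [0xFF]

main :: IO ()
main = do
  print (isValidUtf8ByteString wellFormed)   -- True
  print (isValidUtf8ByteString truncated)    -- False
  print (isValidUtf8ByteString invalidByte)  -- False
```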

ByteArray

Is the slice of a byte array a valid UTF-8 byte sequence? These functions all accept an offset and a length.

isValidUtf8ByteArray
  :: ByteArray  -- ^ Bytes
  -> Int        -- ^ Offset
  -> Int        -- ^ Length
  -> Bool

For pinned byte arrays larger than 128 KiB, this switches to the safe FFI so that the call does not block GC. This threshold (128 KiB) was chosen somewhat arbitrarily and may change in the future.
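
The offset and length arguments select a slice of the array, which the sketch below illustrates. It assumes that ByteArray is the type from Data.Array.Byte (shipped with recent base, or the data-array-byte compatibility package) and that its IsList instance is available for building a test value; buffer is an illustrative name.

```haskell
import Data.Array.Byte (ByteArray)
import Data.Text.Internal.Validate (isValidUtf8ByteArray)
import GHC.Exts (fromList)

-- Index:  0     1     2     3     4     5
-- Byte:   0xFF  0x68  0x69  0xC3  0xA9  0xC3
-- Bytes 1..4 spell "hié"; byte 0 is never valid in UTF-8 and
-- byte 5 is a lead byte with no continuation byte after it.
buffer :: ByteArray
buffer = fromList [0xFF, 0x68, 0x69, 0xC3, 0xA9, 0xC3]

main :: IO ()
main = do
  print (isValidUtf8ByteArray buffer 1 4)  -- True: exactly "hié"
  print (isValidUtf8ByteArray buffer 0 6)  -- False: slice includes 0xFF
  print (isValidUtf8ByteArray buffer 1 5)  -- False: slice ends on a lead byte
  print (isValidUtf8ByteArray buffer 4 1)  -- False: lone continuation byte
```

The same array can therefore be valid or invalid depending on the slice, so callers should pass the exact bounds of the text they intend to decode.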

isValidUtf8ByteArrayUnpinned
  :: ByteArray  -- ^ Bytes
  -> Int        -- ^ Offset
  -> Int        -- ^ Length
  -> Bool

This uses the unsafe FFI. GC waits for all unsafe FFI calls to complete before starting. Consequently, an unsafe FFI call does not run concurrently with GC and is not interrupted by GC. Since relocation cannot happen concurrently with an unsafe FFI call, it is safe to call this function with an unpinned byte array argument. It is also safe to call this with a pinned ByteArray argument.
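
This page does not say whether the offset and length are range-checked, so a cautious caller may want to check them before the FFI call. The wrapper below is a sketch under that assumption; validSlice is a hypothetical helper, not part of text, and it assumes ByteArray is the type from Data.Array.Byte.

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Array.Byte (ByteArray (ByteArray))
import Data.Text.Internal.Validate (isValidUtf8ByteArrayUnpinned)
import GHC.Exts (Int (I#), fromList, sizeofByteArray#)

-- Hypothetical helper: return Nothing when the slice falls outside the
-- array, otherwise validate the slice through the unsafe-FFI variant.
validSlice :: ByteArray -> Int -> Int -> Maybe Bool
validSlice arr@(ByteArray b) off len
  | off < 0 || len < 0 || off > size - len = Nothing
  | otherwise = Just (isValidUtf8ByteArrayUnpinned arr off len)
  where
    size = I# (sizeofByteArray# b)

main :: IO ()
main = do
  -- The euro sign U+20AC is the three-byte sequence 0xE2 0x82 0xAC.
  let euro = fromList [0xE2, 0x82, 0xAC] :: ByteArray
  print (validSlice euro 0 3)  -- Just True
  print (validSlice euro 0 2)  -- Just False: sequence cut short
  print (validSlice euro 1 3)  -- Nothing: slice runs past the end
```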

isValidUtf8ByteArrayPinned
  :: ByteArray  -- ^ Bytes
  -> Int        -- ^ Offset
  -> Int        -- ^ Length
  -> Bool

This uses the safe FFI. GC may run concurrently with safe FFI calls. Consequently, unpinned objects may be relocated while a safe FFI call is executing. The byte array argument must be pinned, and the calling context is responsible for enforcing this. If the byte array is not pinned, this function's behavior is undefined.
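
One way for a calling context to enforce the pinnedness requirement is to test the array with the isByteArrayPinned# primop from GHC.Exts before making the call. The sketch below does that; isPinned and validatePinned are hypothetical helpers, not part of text, and ByteArray is again assumed to be the type from Data.Array.Byte.

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Array.Byte (ByteArray (ByteArray))
import Data.Text.Internal.Validate (isValidUtf8ByteArrayPinned)
import GHC.Exts (fromList, isByteArrayPinned#, isTrue#)

-- Hypothetical helper: ask the runtime whether the array is pinned.
isPinned :: ByteArray -> Bool
isPinned (ByteArray b) = isTrue# (isByteArrayPinned# b)

-- Hypothetical helper: only reach the safe-FFI variant when the array is
-- known to be pinned; otherwise refuse rather than risk undefined behavior.
validatePinned :: ByteArray -> Int -> Int -> Maybe Bool
validatePinned arr off len
  | isPinned arr = Just (isValidUtf8ByteArrayPinned arr off len)
  | otherwise    = Nothing

main :: IO ()
main = do
  let small = fromList [0x68, 0x69] :: ByteArray
  print (isPinned small)
  print (validatePinned small 0 2)
```

A small array built with fromList is normally allocated unpinned, in which case validatePinned refuses it. A caller who does not know how an array was allocated can use isValidUtf8ByteArray instead.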