zenacy-unicode: Unicode utilities for Haskell

Tags: library, mit, web

Zenacy Unicode includes tools for checking byte order marks (BOM) and cleaning data to remove invalid bytes. These tools can help ensure that data pulled from the web can be parsed and converted to text.


Versions: 1.0.0, 1.0.1, 1.0.2
Change log: CHANGES.md
Dependencies: base (>=4 && <5), bytestring (>=0.10.6.0 && <0.12), vector (>=0.11 && <0.14), word8 (>=0.1.2 && <0.2)
License: MIT
Copyright: Copyright (C) 2015-2021 Michael P Williams
Author: Michael Williams <mlcfp@icloud.com>
Maintainer: Michael Williams <mlcfp@icloud.com>
Category: Web
Home page: https://github.com/mlcfp/zenacy-unicode
Source repo: head: git clone https://github.com/mlcfp/zenacy-unicode.git
Uploaded: by mlcfp at 2022-11-24T17:38:34Z
Distributions: LTSHaskell:1.0.2, NixOS:1.0.2
Downloads: 589 total (23 in the last 30 days)
Status: Docs available
Last success reported on 2022-11-24

Readme for zenacy-unicode-1.0.2

Zenacy Unicode

Zenacy Unicode includes tools for checking byte order marks (BOM) and cleaning data to remove invalid bytes. These tools can help ensure that data pulled from the web can be parsed and converted to text.
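
To illustrate what checking a byte order mark involves, here is a minimal sketch that uses only the bytestring package. It is not the library's implementation: the names Encoding, boms, and sniffBom are hypothetical and exist only for this example. The sketch matches the input against the standard BOM byte sequences and returns the detected encoding together with the remaining bytes.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B

-- Hypothetical type for this sketch; the library defines its own BOM type.
data Encoding = Utf8 | Utf16BE | Utf16LE | Utf32BE | Utf32LE
  deriving (Eq, Show)

-- Standard BOM byte sequences, longest matches first.
-- UTF-32 LE (FF FE 00 00) is listed before UTF-16 LE (FF FE) because the
-- former begins with the latter. This ordering is a simplifying assumption:
-- the same four bytes could also be a UTF-16 LE BOM followed by a NUL.
boms :: [(ByteString, Encoding)]
boms =
  [ (B.pack [0xEF, 0xBB, 0xBF],       Utf8)
  , (B.pack [0x00, 0x00, 0xFE, 0xFF], Utf32BE)
  , (B.pack [0xFF, 0xFE, 0x00, 0x00], Utf32LE)
  , (B.pack [0xFE, 0xFF],             Utf16BE)
  , (B.pack [0xFF, 0xFE],             Utf16LE)
  ]

-- Return the first matching encoding and the input with its BOM removed,
-- or Nothing and the untouched input when no BOM is present.
sniffBom :: ByteString -> (Maybe Encoding, ByteString)
sniffBom bytes =
  case [ (enc, rest)
       | (mark, enc) <- boms
       , Just rest <- [B.stripPrefix mark bytes]
       ] of
    ((enc, rest) : _) -> (Just enc, rest)
    []                -> (Nothing, bytes)

main :: IO ()
main = do
  -- A UTF-8 BOM followed by the ASCII bytes for "hi".
  print (sniffBom (B.pack [0xEF, 0xBB, 0xBF, 0x68, 0x69]))
  -- The same payload without a BOM.
  print (sniffBom (B.pack [0x68, 0x69]))
```

A result of Nothing means only that no BOM was found; the caller still has to pick a default encoding for the remaining bytes.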

The following example converts data of uncertain encoding to Text.

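import Data.ByteString (ByteString)
import Data.Text (Text)
import qualified Data.Text.Encoding as T
-- Module name assumed to be Zenacy.Unicode; check the package's module list.
import Zenacy.Unicode (BOM(..), bomStrip, unicodeCleanUTF8)

-- Strip any BOM, then decode according to the encoding it indicates.
textDecode :: ByteString -> Text
textDecode b =
  case bomStrip b of
    (Nothing, s)           -> T.decodeUtf8 $ unicodeCleanUTF8 s -- Assume UTF8
    (Just BOM_UTF8, s)     -> T.decodeUtf8 $ unicodeCleanUTF8 s
    (Just BOM_UTF16_BE, s) -> T.decodeUtf16BE s
    (Just BOM_UTF16_LE, s) -> T.decodeUtf16LE s
    (Just BOM_UTF32_BE, s) -> T.decodeUtf32BE s
    (Just BOM_UTF32_LE, s) -> T.decodeUtf32LE s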