sv-0.1: Encode and decode separated values (CSV, PSV, ...)

Copyright: (C) CSIRO 2017-2018
License: BSD3
Maintainer: George Wilson <george.wilson@data61.csiro.au>
Stability: experimental
Portability: non-portable
Safe Haskell: None
Language: Haskell2010

Data.Sv.Parse


Documentation

parseSv :: ParseOptions ByteString -> ByteString -> Either ByteString (Sv ByteString)

Parse a ByteString as an Sv.

This version uses Trifecta, hence it assumes its input is UTF-8 encoded.

parseSv' :: SvParser s -> ParseOptions s -> s -> Either s (Sv s)

Parse some text as an Sv.

This version lets you choose which parsing library to use by providing an SvParser. Common selections are trifecta and attoparsecByteString.
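As a rough illustration of choosing a backend, the sketch below runs the same input through parseSv' with two of the provided SvParsers. It assumes that defaultParseOptions is exported from Data.Sv.Parse.Options with type ParseOptions ByteString; that module is not documented in this section, so treat the import as an assumption. The helper parsesWith and the value sample are hypothetical names introduced only for this example.

```haskell
import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as BC
import           Data.Sv.Parse         (SvParser, attoparsecByteString, parseSv', trifecta)
-- Assumption: defaultParseOptions lives in this module (not shown in this section).
import           Data.Sv.Parse.Options (defaultParseOptions)

-- Hypothetical helper: True when the chosen backend parses the input.
-- The Left case carries the parser's error message, which is discarded here.
parsesWith :: SvParser ByteString -> ByteString -> Bool
parsesWith backend input =
  case parseSv' backend defaultParseOptions input of
    Left _  -> False
    Right _ -> True

-- Hypothetical sample document: a header row followed by one record, ASCII only.
sample :: ByteString
sample = BC.pack "name,age\nalice,30\n"
```

In GHCi, `parsesWith trifecta sample` and `parsesWith attoparsecByteString sample` exercise the two backends on the same ASCII input. Keeping the Left value instead of discarding it is where the difference in error-message quality would show up.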

parseSvFromFile :: MonadIO m => ParseOptions ByteString -> FilePath -> m (Either ByteString (Sv ByteString))

Load a file and parse it as an Sv.

This version uses Trifecta, hence it assumes its input is UTF-8 encoded.

parseSvFromFile' :: MonadIO m => SvParser s -> ParseOptions s -> FilePath -> m (Either s (Sv s))

Load a file and parse it as an Sv.

This version lets you choose which parsing library to use by providing an SvParser. Common selections are trifecta and attoparsecByteString.

separatedValues :: CharParsing m => ParseOptions s -> m (Sv s)

Parse an Sv.

data SvParser s

Which parsing library should be used to parse the document?

The parser is written in terms of the parsers library, meaning it can be instantiated to several different parsing libraries. By default, we use trifecta, because Text.Trifecta's error messages are so helpful. attoparsecByteString is faster, though, if your input is ASCII and you care a lot about speed.

It is worth noting that Trifecta assumes UTF-8 encoding of the input data. UTF-8 is backwards-compatible with 7-bit ASCII, so this will work for many documents. However, not all documents are ASCII or UTF-8. For example, our species.csv test file is Windows-1252, which is a non-ISO extension of Latin-1 (ISO 8859-1), itself an 8-bit superset of ASCII. For documents encoded as Windows-1252, Trifecta's assumption is invalid and parse errors result. Attoparsec works fine for this character encoding, but it wouldn't work well on a UTF-8 encoded document including non-ASCII characters.
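The encoding mismatch described above can be seen without any parser at all. The following sketch uses only the bytestring and text packages, not sv itself, and all names in it are hypothetical. It encodes the word "café" both ways, checks which byte strings are valid UTF-8, and uses decodeLatin1 as a stand-in for a one-byte-per-character reading of the input.

```haskell
import qualified Data.ByteString    as BS
import           Data.Either        (isRight)
import qualified Data.Text          as T
import           Data.Text.Encoding (decodeLatin1, decodeUtf8')

-- "café" in Windows-1252: the é is the single byte 0xE9.
cafeWindows1252 :: BS.ByteString
cafeWindows1252 = BS.pack [0x63, 0x61, 0x66, 0xE9]

-- "café" in UTF-8: the é is the two-byte sequence 0xC3 0xA9.
cafeUtf8 :: BS.ByteString
cafeUtf8 = BS.pack [0x63, 0x61, 0x66, 0xC3, 0xA9]

-- True when the bytes form a valid UTF-8 sequence.
isValidUtf8 :: BS.ByteString -> Bool
isValidUtf8 = isRight . decodeUtf8'

-- Number of characters seen when every byte is read as one character.
bytewiseLength :: BS.ByteString -> Int
bytewiseLength = T.length . decodeLatin1
```

Here `isValidUtf8 cafeWindows1252` is False, because 0xE9 announces a multi-byte sequence that never arrives, which is the kind of input a UTF-8 parser rejects. Conversely, `bytewiseLength cafeUtf8` is 5 rather than 4, since a byte-oriented reading splits the é into two separate characters.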

Constructors

SvParser 

trifecta :: SvParser ByteString

An SvParser that uses Text.Trifecta. Trifecta assumes its input is UTF-8, and provides helpful clang-style error messages.

attoparsecByteString :: SvParser ByteString

An SvParser that uses Data.Attoparsec.ByteString. This is the fastest provided SvParser, but it has poorer error messages.

attoparsecText :: SvParser Text

An SvParser that uses Data.Attoparsec.Text. This is helpful if your input is in the form of Text.