Safe Haskell | Safe-Infered |
---|
This module implements encoding and decoding of CSV data. The implementation is RFC 4180 compliant, with the following extensions:
- Empty lines are ignored.
- Non-escaped fields may contain any characters except double-quotes, commas, carriage returns, and newlines
- Escaped fields may contain any characters (but double-quotes need to be escaped).
- decode :: FromRecord a => ByteString -> Either String (Vector a)
- decodeByName :: FromNamedRecord a => ByteString -> Either String (Header, Vector a)
- encode :: ToRecord a => Vector a -> ByteString
- encodeByName :: ToNamedRecord a => Header -> Vector a -> ByteString
- data DecodeOptions = DecodeOptions {
- decDelimiter :: !Word8
- defaultDecodeOptions :: DecodeOptions
- decodeWith :: FromRecord a => DecodeOptions -> ByteString -> Either String (Vector a)
- decodeByNameWith :: FromNamedRecord a => DecodeOptions -> ByteString -> Either String (Header, Vector a)
- data EncodeOptions = EncodeOptions {
- encDelimiter :: !Word8
- defaultEncodeOptions :: EncodeOptions
- encodeWith :: ToRecord a => EncodeOptions -> Vector a -> ByteString
- encodeByNameWith :: ToNamedRecord a => EncodeOptions -> Header -> Vector a -> ByteString
- type Csv = Vector Record
- type Record = Vector Field
- type Field = ByteString
- type Header = Vector Name
- type Name = ByteString
- type NamedRecord = HashMap ByteString ByteString
- class FromRecord a where
- parseRecord :: Record -> Parser a
- data Parser a
- (.!) :: FromField a => Record -> Int -> Parser a
- class ToRecord a where
- record :: [ByteString] -> Record
- newtype Only a = Only {
- fromOnly :: a
- class FromNamedRecord a where
- parseNamedRecord :: NamedRecord -> Parser a
- (.:) :: FromField a => NamedRecord -> ByteString -> Parser a
- class ToNamedRecord a where
- toNamedRecord :: a -> NamedRecord
- namedRecord :: [(ByteString, ByteString)] -> NamedRecord
- (.=) :: ToField a => ByteString -> a -> (ByteString, ByteString)
- class FromField a where
- parseField :: Field -> Parser a
- class ToField a where
Usage example
A short encoding usage example:
>>> encode
$ fromList [("John" :: Text, 27), ("Jane", 28)]
Chunk "John,27\r\nJane,28\r\n" Empty
Since string literals are overloaded we have to supply a type
signature as the compiler couldn't deduce which string type (i.e.
String
or Text
) we want to use. In most cases type inference
will infer the type from the context and you can omit type
signatures.
A short decoding usage example:
>>> decode
"John,27\r\nJane,28\r\n" :: Either String (Vector (Text, Int))
Right (fromList [("John",27),("Jane",28)])
Encoding and decoding
Encoding and decoding is a two step process. To encode a value, it
is first converted to a generic representation, using either
ToRecord
or ToNamedRecord
. The generic representation is then
encoded as CSV data. To decode a value the process is reversed and
either FromRecord
or FromNamedRecord
is used instead. Both
these steps are combined in the encode
and decode
functions.
decode :: FromRecord a => ByteString -> Either String (Vector a)Source
Efficiently deserialize CSV records from a lazy ByteString
.
If this fails due to incomplete or invalid input,
is
returned. Equivalent to Left
msg
.
decodeWith
defaultDecodeOptions
decodeByName :: FromNamedRecord a => ByteString -> Either String (Header, Vector a)Source
Efficiently deserialize CSV records from a lazy ByteString
.
If this fails due to incomplete or invalid input,
is
returned. The data is assumed to be preceeded by a header.
Equivalent to Left
msg
.
decodeByNameWith
defaultDecodeOptions
encode :: ToRecord a => Vector a -> ByteStringSource
Efficiently serialize CVS records as a lazy ByteString
.
encodeByName :: ToNamedRecord a => Header -> Vector a -> ByteStringSource
Efficiently serialize CVS records as a lazy ByteString
. The
header is written before any records and dictates the field order.
Encoding and decoding options
These functions can be used to control how data is encoded and decoded. For example, they can be used to encode data in a tab-separated format instead of in a comma-separated format.
data DecodeOptions Source
Options that controls how data is decoded. These options can be used to e.g. decode tab-separated data instead of comma-separated data.
DecodeOptions | |
|
defaultDecodeOptions :: DecodeOptionsSource
Decoding options for parsing CSV files.
decodeWith :: FromRecord a => DecodeOptions -> ByteString -> Either String (Vector a)Source
Like decode
, but lets you customize how the CSV data is parsed.
decodeByNameWith :: FromNamedRecord a => DecodeOptions -> ByteString -> Either String (Header, Vector a)Source
Like decodeByName
, but lets you customize how the CSV data is
parsed.
data EncodeOptions Source
Options that controls how data is encoded. These options can be used to e.g. encode data in a tab-separated format instead of in a comma-separated format.
EncodeOptions | |
|
defaultEncodeOptions :: EncodeOptionsSource
Encoding options for CSV files.
encodeWith :: ToRecord a => EncodeOptions -> Vector a -> ByteStringSource
Like encode
, but lets you customize how the CSV data is
encoded.
encodeByNameWith :: ToNamedRecord a => EncodeOptions -> Header -> Vector a -> ByteStringSource
Like encodeByName
, but lets you customize how the CSV data is
encoded.
Core CSV types
type Field = ByteStringSource
A single field within a record.
type Header = Vector NameSource
The header corresponds to the first line a CSV file. Not all CSV files have a header.
type Name = ByteStringSource
A header has one or more names, describing the data in the column following the name.
type NamedRecord = HashMap ByteString ByteStringSource
A record corresponds to a single line in a CSV file, indexed by the column name rather than the column index.
Type conversion
There are two ways to convert CSV records to and from and user-defined data types: index-based conversion and name-based conversion.
Index-based record conversion
Index-based conversion lets you convert CSV records to and from user-defined data types by referring to a field's position (its index) in the record. The first column in a CSV file is given index 0, the second index 1, and so on.
class FromRecord a whereSource
A type that can be converted from a single CSV record, with the possibility of failure.
When writing an instance, use empty
, mzero
, or fail
to make a
conversion fail, e.g. if a Record
has the wrong number of
columns.
Given this example data:
John,56 Jane,55
here's an example type and instance:
data Person = Person { name :: Text, age :: Int } instance FromRecord Person where parseRecord v |length
v == 2 = Person<$>
v.!
0<*>
v.!
1 | otherwise = mzero
parseRecord :: Record -> Parser aSource
FromField a => FromRecord [a] | |
FromField a => FromRecord (Vector a) | |
FromField a => FromRecord (Only a) | |
(FromField a, FromField b) => FromRecord (a, b) | |
(FromField a, FromField b, FromField c) => FromRecord (a, b, c) | |
(FromField a, FromField b, FromField c, FromField d) => FromRecord (a, b, c, d) | |
(FromField a, FromField b, FromField c, FromField d, FromField e) => FromRecord (a, b, c, d, e) | |
(FromField a, FromField b, FromField c, FromField d, FromField e, FromField f) => FromRecord (a, b, c, d, e, f) | |
(FromField a, FromField b, FromField c, FromField d, FromField e, FromField f, FromField g) => FromRecord (a, b, c, d, e, f, g) |
Conversion of a field to a value might fail e.g. if the field is
malformed. This possibility is captured by the Parser
type, which
lets you compose several field conversions together in such a way
that if any of them fail, the whole record conversion fails.
(.!) :: FromField a => Record -> Int -> Parser aSource
Retrieve the nth field in the given record. The result is
empty
if the value cannot be converted to the desired type.
Raises an exception if the index is out of bounds.
A type that can be converted to a single CSV record.
An example type and instance:
data Person = Person { name :: Text, age :: Int } instance ToRecord Person where toRecord (Person name age) =record
[toField
name,toField
age]
Outputs data on this form:
John,56 Jane,55
ToField a => ToRecord [a] | |
ToField a => ToRecord (Vector a) | |
ToField a => ToRecord (Only a) | |
(ToField a, ToField b) => ToRecord (a, b) | |
(ToField a, ToField b, ToField c) => ToRecord (a, b, c) | |
(ToField a, ToField b, ToField c, ToField d) => ToRecord (a, b, c, d) | |
(ToField a, ToField b, ToField c, ToField d, ToField e) => ToRecord (a, b, c, d, e) | |
(ToField a, ToField b, ToField c, ToField d, ToField e, ToField f) => ToRecord (a, b, c, d, e, f) | |
(ToField a, ToField b, ToField c, ToField d, ToField e, ToField f, ToField g) => ToRecord (a, b, c, d, e, f, g) |
record :: [ByteString] -> RecordSource
Construct a record from a list of ByteString
s. Use toField
to convert values to ByteString
s for use with record
.
Haskell lacks a single-element tuple type, so if you CSV data
with just one column you can use the Only
type to represent a
single-column result.
Name-based record conversion
Name-based conversion lets you convert CSV records to and from user-defined data types by referring to a field's name. The names of the fields are defined by the first line in the file, also known as the header. Name-based conversion is more robust to changes in the file structure e.g. to reording or addition of columns, but can be a bit slower.
class FromNamedRecord a whereSource
A type that can be converted from a single CSV record, with the possibility of failure.
When writing an instance, use empty
, mzero
, or fail
to make a
conversion fail, e.g. if a Record
has the wrong number of
columns.
Given this example data:
name,age John,56 Jane,55
here's an example type and instance:
{-# LANGUAGE OverloadedStrings #-} data Person = Person { name :: Text, age :: Int } instance FromRecord Person where parseNamedRecord m = Person<$>
m.:
"name"<*>
m.:
"age"
Note the use of the OverloadedStrings
language extension which
enables ByteString
values to be written as string literals.
parseNamedRecord :: NamedRecord -> Parser aSource
FromField a => FromNamedRecord (Map ByteString a) | |
FromField a => FromNamedRecord (HashMap ByteString a) |
(.:) :: FromField a => NamedRecord -> ByteString -> Parser aSource
Retrieve a field in the given record by name. The result is
empty
if the field is missing or if the value cannot be converted
to the desired type.
class ToNamedRecord a whereSource
A type that can be converted to a single CSV record.
An example type and instance:
data Person = Person { name :: Text, age :: Int } instance ToRecord Person where toNamedRecord (Person name age) =namedRecord
[ "name".=
name, "age".=
age]
toNamedRecord :: a -> NamedRecordSource
ToField a => ToNamedRecord (Map ByteString a) | |
ToField a => ToNamedRecord (HashMap ByteString a) |
namedRecord :: [(ByteString, ByteString)] -> NamedRecordSource
Construct a named record from a list of name-value ByteString
pairs. Use .=
to construct such a pair from a name and a value.
(.=) :: ToField a => ByteString -> a -> (ByteString, ByteString)Source
Construct a pair from a name and a value. For use with
namedRecord
.
Field conversion
A type that can be converted from a single CSV field, with the possibility of failure.
When writing an instance, use empty
, mzero
, or fail
to make a
conversion fail, e.g. if a Field
can't be converted to the given
type.
Example type and instance:
{-# LANGUAGE OverloadedStrings #-} data Color = Red | Green | Blue instance FromField Color where parseField s | s == "R" = pure Red | s == "G" = pure Green | s == "B" = pure Blue | otherwise = mzero
parseField :: Field -> Parser aSource
FromField Char | |
FromField Double | |
FromField Float | |
FromField Int | |
FromField Int8 | |
FromField Int16 | |
FromField Int32 | |
FromField Int64 | |
FromField Integer | |
FromField Word | |
FromField Word8 | |
FromField Word16 | |
FromField Word32 | |
FromField Word64 | |
FromField ByteString | |
FromField ByteString | |
FromField Text | Assumes UTF-8 encoding. |
FromField Text | Assumes UTF-8 encoding. |
FromField [Char] | Assumes UTF-8 encoding. |
A type that can be converted to a single CSV field.
Example type and instance:
{-# LANGUAGE OverloadedStrings #-} data Color = Red | Green | Blue instance ToField Color where toField Red = "R" toField Green = "G" toField Blue = "B"
ToField Char | |
ToField Double | |
ToField Float | |
ToField Int | |
ToField Int8 | |
ToField Int16 | |
ToField Int32 | |
ToField Int64 | |
ToField Integer | |
ToField Word | |
ToField Word8 | |
ToField Word16 | |
ToField Word32 | |
ToField Word64 | |
ToField ByteString | |
ToField ByteString | |
ToField Text | Uses UTF-8 encoding. |
ToField Text | Uses UTF-8 encoding. |
ToField [Char] | Uses UTF-8 encoding. |