cassava-0.1.0.0: A CSV parsing and encoding library

Safe HaskellSafe-Infered

Data.Csv

Contents

Description

This module implements encoding and decoding of CSV data. The implementation is RFC 4180 compliant, with the following extensions:

  • Empty lines are ignored.
  • Non-escaped fields may contain any characters except double-quotes, commas, carriage returns, and newlines
  • Escaped fields may contain any characters (but double-quotes need to be escaped).

Synopsis

Usage example

A short encoding usage example:

 >>> encode $ fromList [("John" :: Text, 27), ("Jane", 28)]
 Chunk "John,27\r\nJane,28\r\n" Empty

Since string literals are overloaded we have to supply a type signature as the compiler couldn't deduce which string type (i.e. String or Text) we want to use. In most cases type inference will infer the type from the context and you can omit type signatures.

A short decoding usage example:

 >>> decode "John,27\r\nJane,28\r\n" :: Either String (Vector (Text, Int))
 Right (fromList [("John",27),("Jane",28)])

Encoding and decoding

Encoding and decoding is a two step process. To encode a value, it is first converted to a generic representation, using either ToRecord or ToNamedRecord. The generic representation is then encoded as CSV data. To decode a value the process is reversed and either FromRecord or FromNamedRecord is used instead. Both these steps are combined in the encode and decode functions.

decode :: FromRecord a => ByteString -> Either String (Vector a)Source

Efficiently deserialize CSV records from a lazy ByteString. If this fails due to incomplete or invalid input, Left msg is returned. Equivalent to decodeWith defaultDecodeOptions.

decodeByName :: FromNamedRecord a => ByteString -> Either String (Header, Vector a)Source

Efficiently deserialize CSV records from a lazy ByteString. If this fails due to incomplete or invalid input, Left msg is returned. The data is assumed to be preceeded by a header. Equivalent to decodeByNameWith defaultDecodeOptions.

encode :: ToRecord a => Vector a -> ByteStringSource

Efficiently serialize CVS records as a lazy ByteString.

encodeByName :: ToNamedRecord a => Header -> Vector a -> ByteStringSource

Efficiently serialize CVS records as a lazy ByteString. The header is written before any records and dictates the field order.

Encoding and decoding options

These functions can be used to control how data is encoded and decoded. For example, they can be used to encode data in a tab-separated format instead of in a comma-separated format.

data DecodeOptions Source

Options that controls how data is decoded. These options can be used to e.g. decode tab-separated data instead of comma-separated data.

Constructors

DecodeOptions 

Fields

decDelimiter :: !Word8

Field delimiter.

defaultDecodeOptions :: DecodeOptionsSource

Decoding options for parsing CSV files.

decodeWith :: FromRecord a => DecodeOptions -> ByteString -> Either String (Vector a)Source

Like decode, but lets you customize how the CSV data is parsed.

decodeByNameWith :: FromNamedRecord a => DecodeOptions -> ByteString -> Either String (Header, Vector a)Source

Like decodeByName, but lets you customize how the CSV data is parsed.

data EncodeOptions Source

Options that controls how data is encoded. These options can be used to e.g. encode data in a tab-separated format instead of in a comma-separated format.

Constructors

EncodeOptions 

Fields

encDelimiter :: !Word8

Field delimiter.

defaultEncodeOptions :: EncodeOptionsSource

Encoding options for CSV files.

encodeWith :: ToRecord a => EncodeOptions -> Vector a -> ByteStringSource

Like encode, but lets you customize how the CSV data is encoded.

encodeByNameWith :: ToNamedRecord a => EncodeOptions -> Header -> Vector a -> ByteStringSource

Like encodeByName, but lets you customize how the CSV data is encoded.

Core CSV types

type Csv = Vector RecordSource

CSV data represented as a Haskell vector of vector of bytestrings.

type Record = Vector FieldSource

A record corresponds to a single line in a CSV file.

type Field = ByteStringSource

A single field within a record.

type Header = Vector NameSource

The header corresponds to the first line a CSV file. Not all CSV files have a header.

type Name = ByteStringSource

A header has one or more names, describing the data in the column following the name.

type NamedRecord = HashMap ByteString ByteStringSource

A record corresponds to a single line in a CSV file, indexed by the column name rather than the column index.

Type conversion

There are two ways to convert CSV records to and from and user-defined data types: index-based conversion and name-based conversion.

Index-based record conversion

Index-based conversion lets you convert CSV records to and from user-defined data types by referring to a field's position (its index) in the record. The first column in a CSV file is given index 0, the second index 1, and so on.

class FromRecord a whereSource

A type that can be converted from a single CSV record, with the possibility of failure.

When writing an instance, use empty, mzero, or fail to make a conversion fail, e.g. if a Record has the wrong number of columns.

Given this example data:

 John,56
 Jane,55

here's an example type and instance:

data Person = Person { name :: Text, age :: Int }

instance FromRecord Person where
     parseRecord v
         | length v == 2 = Person <$>
                           v .! 0 <*>
                           v .! 1
         | otherwise     = mzero

Instances

FromField a => FromRecord [a] 
FromField a => FromRecord (Vector a) 
FromField a => FromRecord (Only a) 
(FromField a, FromField b) => FromRecord (a, b) 
(FromField a, FromField b, FromField c) => FromRecord (a, b, c) 
(FromField a, FromField b, FromField c, FromField d) => FromRecord (a, b, c, d) 
(FromField a, FromField b, FromField c, FromField d, FromField e) => FromRecord (a, b, c, d, e) 
(FromField a, FromField b, FromField c, FromField d, FromField e, FromField f) => FromRecord (a, b, c, d, e, f) 
(FromField a, FromField b, FromField c, FromField d, FromField e, FromField f, FromField g) => FromRecord (a, b, c, d, e, f, g) 

data Parser a Source

Conversion of a field to a value might fail e.g. if the field is malformed. This possibility is captured by the Parser type, which lets you compose several field conversions together in such a way that if any of them fail, the whole record conversion fails.

(.!) :: FromField a => Record -> Int -> Parser aSource

Retrieve the nth field in the given record. The result is empty if the value cannot be converted to the desired type. Raises an exception if the index is out of bounds.

class ToRecord a whereSource

A type that can be converted to a single CSV record.

An example type and instance:

data Person = Person { name :: Text, age :: Int }

instance ToRecord Person where
     toRecord (Person name age) = record [
        toField name, toField age]

Outputs data on this form:

 John,56
 Jane,55

Methods

toRecord :: a -> RecordSource

Instances

ToField a => ToRecord [a] 
ToField a => ToRecord (Vector a) 
ToField a => ToRecord (Only a) 
(ToField a, ToField b) => ToRecord (a, b) 
(ToField a, ToField b, ToField c) => ToRecord (a, b, c) 
(ToField a, ToField b, ToField c, ToField d) => ToRecord (a, b, c, d) 
(ToField a, ToField b, ToField c, ToField d, ToField e) => ToRecord (a, b, c, d, e) 
(ToField a, ToField b, ToField c, ToField d, ToField e, ToField f) => ToRecord (a, b, c, d, e, f) 
(ToField a, ToField b, ToField c, ToField d, ToField e, ToField f, ToField g) => ToRecord (a, b, c, d, e, f, g) 

record :: [ByteString] -> RecordSource

Construct a record from a list of ByteStrings. Use toField to convert values to ByteStrings for use with record.

newtype Only a Source

Haskell lacks a single-element tuple type, so if you CSV data with just one column you can use the Only type to represent a single-column result.

Constructors

Only 

Fields

fromOnly :: a
 

Instances

Eq a => Eq (Only a) 
Ord a => Ord (Only a) 
Read a => Read (Only a) 
Show a => Show (Only a) 
ToField a => ToRecord (Only a) 
FromField a => FromRecord (Only a) 

Name-based record conversion

Name-based conversion lets you convert CSV records to and from user-defined data types by referring to a field's name. The names of the fields are defined by the first line in the file, also known as the header. Name-based conversion is more robust to changes in the file structure e.g. to reording or addition of columns, but can be a bit slower.

class FromNamedRecord a whereSource

A type that can be converted from a single CSV record, with the possibility of failure.

When writing an instance, use empty, mzero, or fail to make a conversion fail, e.g. if a Record has the wrong number of columns.

Given this example data:

 name,age
 John,56
 Jane,55

here's an example type and instance:

{-# LANGUAGE OverloadedStrings #-}

data Person = Person { name :: Text, age :: Int }

instance FromRecord Person where
     parseNamedRecord m = Person <$>
                          m .: "name" <*>
                          m .: "age"

Note the use of the OverloadedStrings language extension which enables ByteString values to be written as string literals.

(.:) :: FromField a => NamedRecord -> ByteString -> Parser aSource

Retrieve a field in the given record by name. The result is empty if the field is missing or if the value cannot be converted to the desired type.

class ToNamedRecord a whereSource

A type that can be converted to a single CSV record.

An example type and instance:

data Person = Person { name :: Text, age :: Int }

instance ToRecord Person where
     toNamedRecord (Person name age) = namedRecord [
        "name" .= name, "age" .= age]

namedRecord :: [(ByteString, ByteString)] -> NamedRecordSource

Construct a named record from a list of name-value ByteString pairs. Use .= to construct such a pair from a name and a value.

(.=) :: ToField a => ByteString -> a -> (ByteString, ByteString)Source

Construct a pair from a name and a value. For use with namedRecord.

Field conversion

class FromField a whereSource

A type that can be converted from a single CSV field, with the possibility of failure.

When writing an instance, use empty, mzero, or fail to make a conversion fail, e.g. if a Field can't be converted to the given type.

Example type and instance:

{-# LANGUAGE OverloadedStrings #-}

data Color = Red | Green | Blue

instance FromField Color where
     parseField s
         | s == "R"  = pure Red
         | s == "G"  = pure Green
         | s == "B"  = pure Blue
         | otherwise = mzero

class ToField a whereSource

A type that can be converted to a single CSV field.

Example type and instance:

{-# LANGUAGE OverloadedStrings #-}

data Color = Red | Green | Blue

instance ToField Color where
     toField Red   = "R"
     toField Green = "G"
     toField Blue  = "B"

Methods

toField :: a -> FieldSource