Copyright	(c) Dominik Schrempf 2017
License	GPLv3
Maintainer	dominik.schrempf@gmail.com
Stability	unstable
Portability	non-portable (not tested)
Safe Haskell	None
Language	Haskell2010

ELynx.Export.Sequence.CountsFile

Description

TODO: Import.

The Counts Format

The input of PoMo is allele frequency data. When populations contain many individuals, it is preferable to store the number of times each base is observed at each position. This decreases file size and speeds up parsing.
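
To illustrate what "counting the number of bases at each position" means, here is a minimal sketch. It assumes that the bases observed at one site across all individuals of one population are given as a plain String; the function name countBases is hypothetical and not part of this module.

```haskell
-- | Count the A, C, G and T bases observed at one site across all
-- individuals of one population. Any other character (gaps, N,
-- lowercase letters) is ignored in this sketch.
countBases :: String -> (Int, Int, Int, Int)
countBases column = (count 'A', count 'C', count 'G', count 'T')
  where
    -- Number of occurrences of base b in the column.
    count b = length (filter (== b) column)
```

For example, five individuals that all carry G at a site give the counts (0, 0, 5, 0), which a counts file would store as 0,0,5,0.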

Counts files contain:

One header line that identifies the file as a counts file and states the number of populations as well as the number of sites (separated by white space).
A second header line with white space separated headers: CHROM (chromosome), POS (position) and sequence names.
Many data lines, each giving a chromosome, a position, and the counts of A, C, G and T bases at that position.

Comments:

Lines starting with # before the first header line are treated as comments.
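
The header and comment rules above can be sketched as follows. This is only an illustration operating on String lines, not the parser used by the library; the names dropComments and parseHeader are hypothetical, and the field order (NPOP before NSITES) is taken from the toy example.

```haskell
import Data.List (isPrefixOf)

-- | Drop the comment lines (starting with '#') that precede the
-- first header line.
dropComments :: [String] -> [String]
dropComments = dropWhile ("#" `isPrefixOf`)

-- | Parse the first header line, for example
-- "COUNTSFILE NPOP 5 NSITES 2", into
-- (number of populations, number of sites).
parseHeader :: String -> Maybe (Int, Int)
parseHeader line = case words line of
  ["COUNTSFILE", "NPOP", p, "NSITES", s] -> (,) <$> readInt p <*> readInt s
  _ -> Nothing
  where
    -- Accept a field only if it is a complete integer.
    readInt :: String -> Maybe Int
    readInt x = case reads x of
      [(n, "")] -> Just n
      _ -> Nothing
```

Because the line is split with words, any amount of white space between the fields is accepted.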

A toy example:

    COUNTSFILE  NPOP 5   NSITES N
    CHROM  POS  Sheep    BlackSheep  RedSheep  Wolf     RedWolf
    1      1    0,0,1,0  0,0,1,0     0,0,1,0   0,0,5,0  0,0,0,1
    1      2    0,0,0,1  0,0,0,1     0,0,0,1   0,0,0,5  0,0,0,1
    .
    .
    .
    9      8373 0,0,0,1  1,0,0,0     0,1,0,0   0,1,4,0  0,0,1,0
    .
    .
    .
    Y      9999 0,0,0,1  0,1,0,0     0,1,0,0   0,5,0,0  0,0,1,0
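
The layout of the second header line and of the data lines in the toy example can be reproduced with a few lines of code. The sketch below uses String and a hypothetical Counts type (a list of four integers for A, C, G and T) instead of the ByteString and State types of this module; headerNames and dataLine are illustrative names, and columns are separated by single spaces rather than aligned.

```haskell
import Data.List (intercalate)

-- | Hypothetical stand-in for the counts of A, C, G and T bases of
-- one population at one site.
type Counts = [Int]

-- | Second header line: CHROM, POS, then the population names.
headerNames :: [String] -> String
headerNames names = unwords ("CHROM" : "POS" : names)

-- | One data line: chromosome, position, and one comma-separated
-- count entry per population.
dataLine :: String -> Int -> [Counts] -> String
dataLine chrom pos counts = unwords (chrom : show pos : map showCounts counts)
  where
    showCounts = intercalate "," . map show
```

Applied to the first site of the toy example, dataLine "1" 1 [[0,0,1,0],[0,0,5,0]] yields the line for two of the populations, Sheep and Wolf.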

Synopsis

type Chrom = ByteString
type Pos = Int
type DataOneSite = [State]
type PopulationNames = [ByteString]
toCountsFile :: PopulationNames -> [(Maybe Chrom, Maybe Pos, DataOneSite)] -> ByteString

Documentation

type Chrom = ByteString

The chromosome name.

type Pos = Int

The position on the chromosome.

type DataOneSite = [State]

The set of boundary states for one site.

type PopulationNames = [ByteString]

The names of the populations.

toCountsFile :: PopulationNames -> [(Maybe Chrom, Maybe Pos, DataOneSite)] -> ByteString

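Convert data to a counts file. Takes the population names and, for each site, an optional chromosome name, an optional position, and the data of that site.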