elynx-seq-0.0.1: Handle molecular sequences

Copyright     (c) Dominik Schrempf 2017
License       GPLv3
Maintainer    dominik.schrempf@gmail.com
Stability     unstable
Portability   non-portable (not tested)
Safe Haskell  None
Language      Haskell2010

ELynx.Export.Sequence.CountsFile

Description

TODO: Import.

  • The Counts Format

The input of PoMo is allele frequency data. Especially when populations have many individuals, it is preferable to count the number of bases at each position. This decreases file size and speeds up the parser.

Counts files contain:

  • One header line that identifies the file as a counts file and states the number of populations as well as the number of sites (separated by white space).
  • A second header line with white-space-separated column headers: CHROM (chromosome), POS (position), and the sequence names.
  • Many lines with counts of A, C, G and T bases and their respective positions.
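
The two header lines described above can be sketched as follows. This is a minimal illustration, not the module's implementation: `headerLine1` and `headerLine2` are hypothetical helper names, and fields are joined with single spaces, whereas the toy example pads columns for readability.

```haskell
import qualified Data.ByteString.Char8 as B

-- | First header line: file type, number of populations, number of sites.
-- Hypothetical helper; not part of the documented API.
headerLine1 :: Int -> Int -> B.ByteString
headerLine1 nPop nSites =
  B.unwords
    [ B.pack "COUNTSFILE"
    , B.pack "NPOP"
    , B.pack (show nPop)
    , B.pack "NSITES"
    , B.pack (show nSites)
    ]

-- | Second header line: CHROM, POS, then one column per population name.
-- Hypothetical helper; not part of the documented API.
headerLine2 :: [B.ByteString] -> B.ByteString
headerLine2 names = B.unwords (B.pack "CHROM" : B.pack "POS" : names)
```

Since the format only requires white space between fields, a single space is the simplest separator that any conforming parser should accept.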

Comments:

  • Lines starting with # before the first header line are treated as comments.
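
A reader of the format could skip such comments as sketched below. The module documented here only exports counts files (import is still marked as TODO), so `dropLeadingComments` is a hypothetical helper shown purely to illustrate the rule.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Drop the comment lines (starting with '#') that precede the first
-- header line. Lines after the first non-comment line are left untouched.
-- Hypothetical helper; not part of the documented API.
dropLeadingComments :: [B.ByteString] -> [B.ByteString]
dropLeadingComments = dropWhile (B.isPrefixOf (B.pack "#"))
```

For example, `dropLeadingComments (B.lines contents)` would return the lines of a file starting at the first header line.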

A toy example:

    COUNTSFILE  NPOP 5   NSITES N
    CHROM  POS  Sheep    BlackSheep  RedSheep  Wolf     RedWolf
    1      1    0,0,1,0  0,0,1,0     0,0,1,0   0,0,5,0  0,0,0,1
    1      2    0,0,0,1  0,0,0,1     0,0,0,1   0,0,0,5  0,0,0,1
    .
    .
    .
    9      8373 0,0,0,1  1,0,0,0     0,1,0,0   0,1,4,0  0,0,1,0
    .
    .
    .
    Y      9999 0,0,0,1  0,1,0,0     0,1,0,0   0,5,0,0  0,0,1,0
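
Each population column in the toy example holds four comma-separated counts for A, C, G and T. As a rough sketch of how one such cell could be read back, assuming exactly four integer fields per cell, consider the hypothetical function `parseCounts` below (it is not part of this module, which only handles export).

```haskell
import qualified Data.ByteString.Char8 as B

-- | Parse one cell such as "0,0,5,0" into counts of A, C, G and T.
-- Returns Nothing unless the cell has exactly four integer fields.
-- Hypothetical helper; not part of the documented API.
parseCounts :: B.ByteString -> Maybe (Int, Int, Int, Int)
parseCounts cell = case mapM readWhole (B.split ',' cell) of
  Just [a, c, g, t] -> Just (a, c, g, t)
  _                 -> Nothing
  where
    -- Accept a field only if it is an integer with nothing left over.
    readWhole :: B.ByteString -> Maybe Int
    readWhole field = case B.readInt field of
      Just (n, rest) | B.null rest -> Just n
      _                            -> Nothing
```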

Documentation

type Chrom = ByteString

The chromosome name.

type Pos = Int

The position on the chromosome.

type DataOneSite = [State]

The set of boundary states for one site.

type PopulationNames = [ByteString]

The names of the populations.

toCountsFile :: PopulationNames -> [(Maybe Chrom, Maybe Pos, DataOneSite)] -> ByteString

Convert data to a counts file.
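
To illustrate the shape of one output line, the sketch below renders a single site with stand-in types. It makes several assumptions that the documentation does not confirm: `Counts` replaces the real `State` type (whose definition is not shown here), `renderSite` is a hypothetical name, and a missing chromosome or position is written as `NA`. The actual behaviour of `toCountsFile` may differ.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Maybe (fromMaybe)

-- | Stand-in for the per-population data at one site: counts of A, C, G, T.
-- Assumption of this sketch; the real module uses 'State'.
type Counts = [Int]

-- | Render one data line: chromosome, position, one cell per population.
-- Missing values are written as "NA" (an assumption of this sketch).
renderSite :: (Maybe B.ByteString, Maybe Int, [Counts]) -> B.ByteString
renderSite (mChrom, mPos, cells) =
  B.unwords (chrom : pos : map renderCell cells)
  where
    chrom = fromMaybe (B.pack "NA") mChrom
    pos = maybe (B.pack "NA") (B.pack . show) mPos
    -- Join the counts of one population with commas, e.g. "0,0,1,0".
    renderCell = B.intercalate (B.pack ",") . map (B.pack . show)
```

A complete file would then consist of the two header lines followed by one such line per site.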