Copyright | (c) Dominik Schrempf 2017 |
---|---|
License | GPLv3 |
Maintainer | dominik.schrempf@gmail.com |
Stability | unstable |
Portability | non-portable (not tested) |
Safe Haskell | None |
Language | Haskell2010 |
TODO: Import.
The Counts Format
The input of PoMo is allele frequency data. Especially when populations have many individuals, it is preferable to store, for each position, the number of times each base was observed. This decreases the file size and speeds up parsing.
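As a small illustration of what counting means, the following minimal Haskell sketch turns the bases sampled from the individuals of one population at one site into the comma-separated A, C, G, T field of a counts file. The names `countBases` and `renderCounts` are hypothetical and not part of this module; the sketch assumes each individual contributes one uppercase base and ignores any other character.

```haskell
import Data.List (intercalate)

-- Hypothetical helper (not part of this module): count the bases sampled from
-- the individuals of one population at one site, in the order A, C, G, T.
-- Any other character (lowercase bases, N, gaps) is ignored in this sketch.
countBases :: String -> [Int]
countBases bases = [length (filter (== b) bases) | b <- "ACGT"]

-- Render one population's counts as a comma-separated counts-file field.
renderCounts :: [Int] -> String
renderCounts = intercalate "," . map show
```

For instance, five individuals that all carry G give `renderCounts (countBases "GGGGG") == "0,0,5,0"`, which is the Wolf entry at the first site of the toy example.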
Counts files contain:
- One header line that identifies the file as a counts file and states the number of populations as well as the number of sites (separated by white space).
- A second header line with white-space-separated column headers: CHROM (chromosome), POS (position) and the sequence (population) names.
- Many lines, one per site, each giving the chromosome, the position and, for every population, the counts of A, C, G and T bases (comma-separated, in that order).
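The two header lines could be read roughly as follows. This is a minimal sketch, not the parser used by PoMo: `parseHeaders` is a hypothetical name, and the sketch assumes that the keywords appear exactly as in the toy example and that NSITES is followed by an actual number (the N in the toy example is a placeholder). It also checks that the number of names agrees with NPOP.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS

-- Hypothetical parser for the two header lines (not this module's API).
-- Returns the number of sites and the population names.
parseHeaders :: BS.ByteString -> BS.ByteString -> Either String (Int, [BS.ByteString])
parseHeaders first second = do
  (nPop, nSites) <- case BS.words first of
    ["COUNTSFILE", "NPOP", np, "NSITES", ns] -> (,) <$> readNum np <*> readNum ns
    _ -> Left "first header line must read: COUNTSFILE NPOP <n> NSITES <n>"
  names <- case BS.words second of
    "CHROM" : "POS" : ns -> Right ns
    _ -> Left "second header line must start with: CHROM POS"
  -- Consistency check: exactly one name per population.
  if length names == nPop
    then Right (nSites, names)
    else Left "number of population names does not match NPOP"
  where
    -- Read a whole field as a decimal integer; trailing characters are an error.
    readNum s = case BS.readInt s of
      Just (n, rest) | BS.null rest -> Right n
      _ -> Left ("not a number: " ++ BS.unpack s)
```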
Comments:
- Lines starting with # before the first header line are treated as comments.
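Handling the comment rule and separating the header lines from the data could look as follows. This is a hypothetical helper, `splitCountsFile`, sketched under the assumption that the whole file is read as a strict `ByteString` with Unix line endings.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS

-- Hypothetical helper (not this module's API): drop the comment lines that may
-- precede the first header line, then split the remainder into the two header
-- lines and the data lines. Only leading lines starting with '#' are skipped.
splitCountsFile :: BS.ByteString -> Maybe (BS.ByteString, BS.ByteString, [BS.ByteString])
splitCountsFile input =
  case dropWhile (BS.isPrefixOf "#") (BS.lines input) of
    first : second : rest -> Just (first, second, rest)
    _ -> Nothing -- fewer than two header lines
```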
A toy example:

    COUNTSFILE NPOP 5 NSITES N
    CHROM POS  Sheep    BlackSheep RedSheep Wolf     RedWolf
    1     1    0,0,1,0  0,0,1,0    0,0,1,0  0,0,5,0  0,0,0,1
    1     2    0,0,0,1  0,0,0,1    0,0,0,1  0,0,0,5  0,0,0,1
    .
    .
    .
    9     8373 0,0,0,1  1,0,0,0    0,1,0,0  0,1,4,0  0,0,1,0
    .
    .
    .
    Y     9999 0,0,0,1  0,1,0,0    0,1,0,0  0,5,0,0  0,0,1,0
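A data line can be split into its chromosome, position and per-population counts as sketched below. `parseRow` and the `Row` tuple are hypothetical; the sketch assumes each population field holds exactly four comma-separated integers in the order A, C, G, T.

```haskell
import qualified Data.ByteString.Char8 as BS

-- Hypothetical representation of one data line (not this module's types):
-- chromosome, position, and the A, C, G, T counts of each population.
type Row = (BS.ByteString, Int, [[Int]])

-- Parse a data line such as "1 1 0,0,1,0 0,0,5,0".
parseRow :: BS.ByteString -> Either String Row
parseRow line = case BS.words line of
  chrom : pos : fields -> do
    p <- readNum pos
    counts <- mapM parseField fields
    Right (chrom, p, counts)
  _ -> Left ("too few columns: " ++ BS.unpack line)
  where
    -- Each population field must contain exactly four comma-separated counts.
    parseField f = do
      cs <- mapM readNum (BS.split ',' f)
      if length cs == 4
        then Right cs
        else Left ("expected four counts in field: " ++ BS.unpack f)
    -- Read a whole field as a decimal integer; trailing characters are an error.
    readNum s = case BS.readInt s of
      Just (n, rest) | BS.null rest -> Right n
      _ -> Left ("not a number: " ++ BS.unpack s)
```

Keeping the chromosome as a `ByteString` rather than a number matters because names such as Y occur, which is consistent with `type Chrom = ByteString`.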
Synopsis
- type Chrom = ByteString
- type Pos = Int
- type DataOneSite = [State]
- type PopulationNames = [ByteString]
- toCountsFile :: PopulationNames -> [(Maybe Chrom, Maybe Pos, DataOneSite)] -> ByteString
Documentation
type Chrom = ByteString
The chromosome name.
type DataOneSite = [State]
The set of boundary states for one site.
type PopulationNames = [ByteString]
The names of the populations.
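toCountsFile :: PopulationNames -> [(Maybe Chrom, Maybe Pos, DataOneSite)] -> ByteString
Convert the population names and the per-site data, each site with an optional chromosome name and position, to the contents of a counts file.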
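`DataOneSite` is a list of PoMo `State` values, whose definition is not shown in this section, so the sketch below does not reproduce `toCountsFile`. It is a hypothetical `renderCountsFile` that writes the same layout from plain A, C, G, T counts, assuming that every site has a chromosome name and a position and that NSITES equals the number of rows passed in. How `toCountsFile` handles missing chromosomes or positions is not modelled.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS

-- Hypothetical writer in the spirit of toCountsFile, NOT its implementation:
-- it takes plain A, C, G, T counts per population instead of PoMo states and
-- requires every site to have a chromosome name and a position.
renderCountsFile :: [BS.ByteString] -> [(BS.ByteString, Int, [[Int]])] -> BS.ByteString
renderCountsFile names rows = BS.unlines (header1 : header2 : map renderRow rows)
  where
    -- First header line: number of populations and number of sites.
    header1 = BS.unwords
      ["COUNTSFILE", "NPOP", showBS (length names), "NSITES", showBS (length rows)]
    -- Second header line: column headers followed by the population names.
    header2 = BS.unwords ("CHROM" : "POS" : names)
    -- One data line per site; each population's counts are comma-separated.
    renderRow (chrom, pos, counts) =
      BS.unwords (chrom : showBS pos : map (BS.intercalate "," . map showBS) counts)
    showBS :: Int -> BS.ByteString
    showBS = BS.pack . show
```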