biohazard-0.6.5: bioinformatics support library

Safe Haskell: None



Parsers and Printers for BAM and SAM. We employ an Iteratee interface, and we strive to support everything possible in BAM. So far, the implementation of the nucleotides is somewhat lacking: we do not have support for ambiguity codes, and the "=" symbol is not understood.



data BamRaw

BAM record in its native encoding, along with its virtual address.
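For orientation, here is a minimal sketch of what a "virtual address" usually means for BGZF-compressed BAM files: the SAM/BAM specification packs the file offset of a compressed block into the upper 48 bits and the offset inside the uncompressed block into the lower 16 bits. Whether FileOffset in this library uses exactly this layout is an assumption, and makeVirtualOffset and splitVirtualOffset are hypothetical helpers, not part of this module.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

-- Hypothetical helpers, not part of biohazard's API.

-- | Combine the file offset of a compressed block with an offset inside
-- the uncompressed block into one BGZF virtual offset.
makeVirtualOffset :: Word64 -> Word64 -> Word64
makeVirtualOffset blockStart withinBlock =
    (blockStart `shiftL` 16) .|. (withinBlock .&. 0xFFFF)

-- | Split a virtual offset back into (block start, offset within block).
splitVirtualOffset :: Word64 -> (Word64, Word64)
splitVirtualOffset v = (v `shiftR` 16, v .&. 0xFFFF)
```

Keeping both parts in a single integer means that virtual offsets compare in the same order as the records appear in the file.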

bamRaw :: FileOffset -> ByteString -> BamRaw

Smart constructor. Makes sure we got at least a full record.

data BamRec

Internal representation of a BAM record.



data Cigar

Cigar line in BAM coding. BAM encodes an operation and a length into a single integer; we keep those integers in an array.
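As an illustration of that encoding, the sketch below follows the SAM/BAM specification, where each cigar element is a 32-bit integer holding the length in the upper 28 bits and the operation code in the lower 4 bits. The Op type and the functions encodeCigar and decodeCigar are hypothetical stand-ins and do not reflect this library's CigOp or its internals.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word32)

-- Hypothetical stand-in for CigOp. The constructor order matches the
-- operation codes in the BAM specification: M I D N S H P = X.
data Op = OpM | OpI | OpD | OpN | OpS | OpH | OpP | OpEq | OpX
    deriving (Show, Eq, Enum, Bounded)

-- | Pack an operation and a length into one integer: length << 4 | op.
encodeCigar :: Op -> Int -> Word32
encodeCigar op len =
    (fromIntegral len `shiftL` 4) .|. fromIntegral (fromEnum op)

-- | Unpack one integer; codes outside the nine known operations yield Nothing.
decodeCigar :: Word32 -> Maybe (Op, Int)
decodeCigar w
    | code <= fromEnum (maxBound :: Op) =
        Just (toEnum code, fromIntegral (w `shiftR` 4))
    | otherwise = Nothing
  where
    code = fromIntegral (w .&. 0xF) :: Int
```

Because every element is a fixed-width integer, a whole cigar line fits naturally into an unboxed array.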


!CigOp :* !Int infix 9 

alignedLength :: Vector v Cigar => v Cigar -> Int

Extracts the aligned length from a cigar line. This gives the length of an alignment as measured on the reference, which is different from the length on the query or the length of the alignment.
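The difference between the reference length and the query length can be sketched as follows, using the SAM specification's rule for which operations consume the reference (M, D, N, =, X) and which consume the query (M, I, S, =, X). The Op type and both functions are hypothetical and work on plain lists; which operations the library's alignedLength actually counts is an assumption here.

```haskell
-- Hypothetical stand-in for the library's cigar operations.
data Op = OpM | OpI | OpD | OpN | OpS | OpH | OpP | OpEq | OpX
    deriving (Show, Eq)

-- | Operations that advance the position on the reference.
consumesReference :: Op -> Bool
consumesReference op = op `elem` [OpM, OpD, OpN, OpEq, OpX]

-- | Operations that advance the position in the query sequence.
consumesQuery :: Op -> Bool
consumesQuery op = op `elem` [OpM, OpI, OpS, OpEq, OpX]

-- | Length of the alignment as measured on the reference.
referenceLength :: [(Op, Int)] -> Int
referenceLength cig = sum [ len | (op, len) <- cig, consumesReference op ]

-- | Length of the alignment as measured on the query.
queryLength :: [(Op, Int)] -> Int
queryLength cig = sum [ len | (op, len) <- cig, consumesQuery op ]
```

For the cigar line 5S10M2I3D20M, referenceLength gives 33 while queryLength gives 37.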

newtype Nucleotides

A nucleotide base in an alignment. Experience says we're dealing with Ns and gaps all the time, so purity be damned, they are included as if they were real bases.

To allow Nucleotides to be unpacked and incorporated into containers, we choose to represent them the same way as the BAM file format: as a 4-bit-wide field. Gaps are encoded as 0 where they make sense, N is 15. The contained Word8 is guaranteed to be 0..15.
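A minimal sketch of such a 4-bit representation, assuming the one-bit-per-base layout of the BAM specification (A = 1, C = 2, G = 4, T = 8) together with the gap and N codes stated above. Nuc, mkNuc, encodeNuc and decodeNuc are hypothetical names and are not this module's API.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical stand-in for the library's Nucleotides newtype.
newtype Nuc = Nuc Word8 deriving (Eq, Show)

-- | Smart constructor: mask to the low four bits so the payload is
-- always in the range 0..15.
mkNuc :: Word8 -> Nuc
mkNuc w = Nuc (w .&. 0x0F)

-- | Encode the symbols discussed above; anything else is rejected.
encodeNuc :: Char -> Maybe Nuc
encodeNuc c = case c of
    '-' -> Just (Nuc 0)
    'A' -> Just (Nuc 1)
    'C' -> Just (Nuc 2)
    'G' -> Just (Nuc 4)
    'T' -> Just (Nuc 8)
    'N' -> Just (Nuc 15)
    _   -> Nothing

-- | Decode back to a character for the codes handled by encodeNuc.
decodeNuc :: Nuc -> Maybe Char
decodeNuc (Nuc w) =
    lookup w [(0, '-'), (1, 'A'), (2, 'C'), (4, 'G'), (8, 'T'), (15, 'N')]
```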




unNs :: Word8

data Vector_Nucs_half a

A vector that packs two Nucleotides into one byte, just like BAM does.
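The packing itself can be sketched on plain lists of Word8. In the BAM specification the first base of each pair occupies the high nibble and the second the low nibble; the sketch assumes that layout. packNibbles and unpackNibbles are hypothetical helpers and say nothing about how Vector_Nucs_half is implemented.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- | Pack 4-bit codes two per byte, first code in the high nibble.
-- An odd trailing code is padded with a zero low nibble.
packNibbles :: [Word8] -> [Word8]
packNibbles (a : b : rest) =
    (((a .&. 0xF) `shiftL` 4) .|. (b .&. 0xF)) : packNibbles rest
packNibbles [a] = [(a .&. 0xF) `shiftL` 4]
packNibbles []  = []

-- | Unpack n codes from packed bytes. The count is needed because an
-- odd-length sequence leaves a padding nibble in the last byte.
unpackNibbles :: Int -> [Word8] -> [Word8]
unpackNibbles n bytes =
    take n (concatMap (\b -> [b `shiftR` 4, b .&. 0xF]) bytes)
```

Storing the element count separately from the packed bytes is what makes odd-length sequences round-trip correctly.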

type Extensions = [(BamKey, Ext)]

A collection of extension fields. The key is actually only two Chars, but that proved impractical. (Hmm... we could introduce a Key type that is a 16 bit int, then give it an instance IsString... practical?)

deleteE :: BamKey -> Extensions -> Extensions

Deletes all occurrences of some extension field.

insertE :: BamKey -> Ext -> Extensions -> Extensions

Blindly inserts an extension field. This can create duplicates (and there is no telling how other tools react to that).

updateE :: BamKey -> Ext -> Extensions -> Extensions

Deletes all occurrences of an extension field, then inserts it with a new value. This is safer than insertE, but also more expensive.

adjustE :: (Ext -> Ext) -> BamKey -> Extensions -> Extensions

Adjusts a named extension by applying a function.
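The behaviour of these four functions can be illustrated on a plain association list. This is a sketch under assumptions, not the library's code: Key and Val stand in for BamKey and Ext, all function names are hypothetical, and whether adjustE touches every matching entry or only the first is not stated above (the sketch adjusts every match).

```haskell
-- Hypothetical stand-ins for BamKey, Ext and Extensions.
type Key  = String
type Val  = Int
type Exts = [(Key, Val)]

-- | Remove every entry with the given key.
deleteAll :: Key -> Exts -> Exts
deleteAll k = filter ((/= k) . fst)

-- | Prepend an entry without checking for an existing one;
-- this is cheap but may create duplicates.
insertBlind :: Key -> Val -> Exts -> Exts
insertBlind k v exts = (k, v) : exts

-- | Delete all old entries first, then insert: one extra pass over
-- the list, but the key is unique afterwards.
update :: Key -> Val -> Exts -> Exts
update k v = insertBlind k v . deleteAll k

-- | Apply a function to the value of every entry with the given key.
adjust :: (Val -> Val) -> Key -> Exts -> Exts
adjust f k = map step
  where
    step (k', v)
        | k' == k   = (k', f v)
        | otherwise = (k', v)
```

With the list [("NM", 0), ("XA", 7)], insertBlind "NM" 1 leaves two NM entries, whereas update "NM" 1 leaves exactly one.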

progressBam :: MonadIO m => String -> (String -> IO ()) -> Refs -> Enumeratee [BamRaw] [BamRaw] m a

A simple progress indicator that prints sequence id and position.