.TH DAMAGE-PATTERNS "1" "November 2010" "damage-patterns" "User Commands"
.SH NAME
damage-patterns \- plot or calculate substitution patterns
.SH SYNOPSIS
.BI "damage-patterns [" option ...] " file " ...
.SH DESCRIPTION
.I damage-patterns
reads BAM or SAM files containing alignments, accumulates statistics
about substitution patterns, and either prints them as tables or draws
them as color plots.
.PP
Six plots are created: the base composition around the 5' and the 3'
end, all possible substitutions near the 5' and the 3' end, and all
possible substitutions at CpG motifs near the 5' and the 3' end.
The base composition is that of the reference.
Substitutions use the position in the read as the coordinate, which
matters when the alignment contains indels.
Values are plotted as lines, with 95% confidence intervals as shaded
regions.
A plot is skipped when the input lacks the kind of data it requires.
.SH OPTIONS
.SS Common Options
.IP "\fB\-V, \-\-version\fR"
Output the version number and exit.
.IP "\fB\-h, \-\-help\fR"
Display usage information and exit.
.SS General Options
.IP "\fB\-o, \-\-output\fR PATH"
Sets the prefix of output files to
.IR PATH .
The default is './', i.e. output goes to the current directory and no
prefix is prepended.
.IP "\fB\-\-leeHom\fR"
Causes
.I damage-patterns
to treat every unpaired read as the result of adapter trimming.
If your data was processed with
.IR leeHom ,
you likely want to set this option.
.IP "\fB\-C, \-\-context\fR NUM"
Displays the base composition for an additional
.I NUM
bases of context around either end of the aligned query.
Context requires that a genome file is given (see
.BR \-\-genome ).
.IP "\fB\-G, \-\-genome\fR FILE"
Declares that the reference genome is available in
.I FILE
in
.I 2bit
format.
If a genome is given, context becomes available and the input does not
need to have a valid
.I MD
field.
.IP "\fB\-x, \-\-xrange\fR NUM"
Sets the range of the x axis to
.IR [0..NUM] .
Plots are scaled accordingly, and tables contain the corresponding
lines.
The default is 80.
.IP "\fB\-f, \-\-fractions\fR"
Causes numbers in tables to be shown as common fractions (instead of
decimal fractions) where applicable.
Common fractions preserve the underlying counts and therefore allow an
estimate of confidence intervals; decimal fractions don't.
.SS Graphical Only Options
The following options are only applicable if
.I damage-patterns
was compiled with support for plotting.
A version without plotting capability will not accept these and
supports only the 'txt' output format.
.IP "\fB\-y, \-\-yrange\fR NUM"
Sets the range of the y axis of substitution plots to
.IR [0..NUM] .
Note that
.I NUM
is a floating point number, not a percentage.
The default is to adjust automatically.
.IP "\fB\-Y, \-\-yrange1\fR NUM"
Sets the range of the y axis for base composition plots to
.IR [0.25\-NUM..0.25+NUM] ,
that is, symmetrical around one quarter.
The default is to adjust automatically.
.IP "\fB\-t, \-\-title\fR TEXT"
Sets the title of this data set to
.IR TEXT .
Plots will mention the data set title in their heading.
.IP "\fB\-F, \-\-format\fR FORMAT"
Sets the output format of plots to
.IR FORMAT .
Legal values are
.IR svg ", " ps ", " pdf ", " png " and " txt .
The
.I txt
format will not actually produce a plot, but a table with the values
that would have been plotted.
The default is
.IR svg .
.IP "\fB\-W, \-\-width\fR NUM"
Sets the width of plots to
.I NUM
pixels.
.IP "\fB\-H, \-\-height\fR NUM"
Sets the height of plots to
.I NUM
pixels.
.SH "FILES"
.IP BAM
is the most standard format for aligned reads, chiefly from high
throughput sequencing platforms.
If possible, use BAM as input.
See
.I http://samtools.github.io
for more information.
.I damage-patterns
will read the reference sequence from the
.I 2bit
file if given.
Otherwise it will reconstruct the reference from the
.I MD
field.
BAM files with strangely miscoded
.I MD
fields have been spotted in the wild.
If you get errors about inconsistent
.I MD
fields, either fix them by running
.I samtools calmd
or simply supply a
.I 2bit
file.
.IP SAM
is effectively the text form of BAM, and is pretty much equivalent to
it, but bloated.
Use BAM if you can, but SAM is fine if and only if it saves a needless
conversion step.
.IP 2bit
is a compact format for genomes used by
.I BLAT
and other software by UCSC.
See
.I https://genome.ucsc.edu/goldenpath/help/blatSpec.html
for details.
(The alternative to
.I 2bit
used by
.I samtools
would be indexed
.IR FastA .
Clearly,
.I 2bit
is superior in every conceivable way.)
.SH "AUTHOR"
Written by Udo Stenzel.
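.SH EXAMPLES
The following invocations are sketches assembled from the options
documented on this page, not commands taken from the author.
All file names
.RI ( sample.bam ", " genome.2bit )
and the output prefix
.I out/sample_
are placeholders.
.PP
Plot substitution patterns in the default
.I svg
format, with a data set title and a prefix for the output files
(requires a build with plotting support):
.PP
.nf
.RS
damage\-patterns \-o out/sample_ \-t "Sample 1" sample.bam
.RE
.fi
.PP
Produce tables instead of plots, show counts as common fractions and
limit the x range to
.IR [0..30] :
.PP
.nf
.RS
damage\-patterns \-F txt \-f \-x 30 sample.bam
.RE
.fi
.PP
Supply a
.I 2bit
genome, so that a valid
.I MD
field is not required, and show 10 additional bases of context:
.PP
.nf
.RS
damage\-patterns \-G genome.2bit \-C 10 \-o out/sample_ sample.bam
.RE
.fi
.PP
Treat every unpaired read as adapter trimmed, as suggested for input
processed with
.IR leeHom :
.PP
.nf
.RS
damage\-patterns \-\-leeHom \-o out/sample_ sample.bam
.RE
.fi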