Glossary

alignment variant

The result of aligning a BAM file to a rotated reference. Rotating a reference presupposes that it has a circular topology (unless the rotation angle is 0). If the rotation angle is 0 degrees (or 0 radians), i.e. no rotation is applied to the reference, the resulting alignment is called straight in PacBio Data Processing. If a rotation of 180 degrees (or π radians) is applied to the reference, the resulting alignment is called pi-shifted, or π-shifted.
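As an illustration of what rotating a circular reference means, here is a minimal sketch. The function name `rotate_reference` and the rounding rule for lengths that do not divide evenly are assumptions made for this example; they are not taken from PacBio Data Processing itself.

```python
def rotate_reference(sequence: str, angle_degrees: float) -> str:
    """Return `sequence` rotated by the given angle, treating it as circular."""
    if not sequence:
        return sequence
    # A full turn (360 degrees) corresponds to the whole sequence length.
    # Rounding to the nearest base is an assumption of this sketch.
    offset = round(len(sequence) * (angle_degrees % 360) / 360) % len(sequence)
    # Move the origin: bases before the offset wrap around to the end.
    return sequence[offset:] + sequence[:offset]


reference = "AACCGGTT"
straight = rotate_reference(reference, 0)      # unchanged: "AACCGGTT"
pi_shifted = rotate_reference(reference, 180)  # origin moved halfway: "GGTTAACC"
```

Aligning against both sequences gives the straight and the π-shifted variants; reads that span the origin of one sequence fall in the interior of the other.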

Command Line Interface (CLI)

An interface between a system and its user based on the command line, i.e. the system’s behaviour is controlled by instructions passed to it as text through the keyboard. See Command Line Interface (CLI).

Command Line Option

A flag that can be used in a Command Line Interface (CLI) to customize the behaviour of the program. In Unix, a command line option typically begins with either - for short option names, e.g. -h, or -- for long option names, e.g. --help. Depending on its nature, a command line option might accept a value, e.g. -N 3.
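The following sketch shows how short and long options, with and without values, might be declared using Python's standard argparse module. The program name `example-tool` and the option names `--num-workers` and `--verbose` are hypothetical and chosen only for illustration; they are not options of any PacBio Data Processing program.

```python
import argparse

# Hypothetical program and option names, for illustration only.
parser = argparse.ArgumentParser(prog="example-tool")
# An option that accepts a value, usable as "-N 3" or "--num-workers 3".
parser.add_argument("-N", "--num-workers", type=int, default=1,
                    help="number of workers (hypothetical option)")
# An option without a value: its presence alone switches it on.
parser.add_argument("-v", "--verbose", action="store_true",
                    help="print more details (hypothetical option)")

# Parse an explicit argument list instead of sys.argv to keep the example
# self-contained. argparse also adds -h/--help automatically.
args = parser.parse_args(["-N", "3", "--verbose"])
```

After parsing, `args.num_workers` holds the integer 3 and `args.verbose` is `True`.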

CSV file

A Comma Separated Values file. As its name suggests, the file is structured in a table-like fashion. Interestingly, the separator need not be a comma, although the comma is a very common choice. A common definition of the CSV format is given in RFC 4180.
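To illustrate that the separator is a choice rather than a fixed character, this short sketch reads a semicolon-separated table with Python's standard csv module. The column names and values are made up for the example.

```python
import csv
import io

# Made-up table that uses ";" instead of "," as the separator.
text = "molecule;length\n101;2500\n102;3100\n"

# The delimiter is passed explicitly; csv.reader defaults to a comma.
rows = list(csv.reader(io.StringIO(text), delimiter=";"))
header, records = rows[0], rows[1:]
```

Here `header` is `["molecule", "length"]` and every value in `records` is read as a string; any conversion to numbers is left to the caller.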

FASTA

Text-based file format to store sequences of DNA or, more generally, of nucleotides or amino acids. See the Wikipedia page on FASTA format, and references therein.
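A FASTA file consists of header lines starting with > followed by one or more lines of sequence. The sketch below is a deliberately minimal reader written for illustration; the function name `parse_fasta` is hypothetical, and real data is better handled with an established library.

```python
def parse_fasta(text: str) -> dict:
    """Map each header (without the leading '>') to its joined sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence lines belong to the most recent header.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


example = ">ref1 example\nAACC\nGGTT\n>ref2\nACGT\n"
sequences = parse_fasta(example)
```

With this input, `sequences` maps `"ref1 example"` to `"AACCGGTT"` and `"ref2"` to `"ACGT"`.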

GFF

A file format to encode genetic features. See the GFF3 definition.

Graphical User Interface (GUI)

An interface between a system and its user based on graphical icons, where the mouse is typically involved. See Graphical User Interface (GUI).

MD5 checksum

A checksum based on the MD5 algorithm. In PacBio Data Processing it is used only as a mechanism to detect unintentional corruption of the data.
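As a sketch of how such a checksum can reveal accidental changes, the example below computes MD5 digests with Python's standard hashlib module. The helper name `md5_checksum` is hypothetical, and this is not necessarily how PacBio Data Processing computes or stores its checksums.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


original = b"AACCGGTT"
stored = md5_checksum(original)  # kept alongside the data

# Later: recompute and compare to detect accidental changes.
unchanged = md5_checksum(b"AACCGGTT") == stored
corrupted = md5_checksum(b"AACCGGTA") != stored
```

A matching digest indicates that the data was very likely not altered by accident. MD5 does not protect against deliberate tampering.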

molecule

In the context of PacBio Data Processing, molecule refers to a fragment of DNA that was captured in a hole, also known as a zero-mode waveguide (ZMW), in the sequencing machine. Each molecule in a BAM file is identified by a positive integer and typically spans several subreads.

reference

A DNA sequence, stored as a file in the FASTA format, that is used as the reference for the single molecule analysis.

subread

A single line in the BAM file. Each subread belongs to one molecule.
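The relation between subreads and molecules can be sketched as a simple grouping. The example assumes that subread names follow the PacBio convention movie/hole/start_end and that the hole number serves as the molecule identifier; both the function name `group_by_molecule` and the sample names are hypothetical.

```python
from collections import defaultdict


def group_by_molecule(subread_names):
    """Group subread names by the hole number embedded in each name."""
    groups = defaultdict(list)
    for name in subread_names:
        # Assumed layout: "<movie>/<hole number>/<start>_<end>".
        _movie, hole, _span = name.split("/")
        groups[int(hole)].append(name)
    return dict(groups)


names = ["movie1/23/0_500", "movie1/23/550_1040", "movie1/57/0_800"]
groups = group_by_molecule(names)
```

In this made-up input, molecule 23 spans two subreads and molecule 57 spans one.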

summary report

An HTML file created by the sm-analysis program (see Single molecule analysis with the sm-analysis program). It contains basic statistics about the input BAM file, the input reference and the output that sm-analysis produces during its analysis. It also includes some intermediate details of the process and selected plots that help visualize certain quantities or distributions.

variant

See alignment variant.