Glossary
- alignment variant
The result of aligning a BAM file using a rotated reference. The word rotated implies that the reference is considered to have a circular topology (unless, of course, the rotation angle is 0). If the rotation angle is 0 degrees/radians, i.e. no rotation is applied to the reference, the result of the alignment is called straight in PacBio Data Processing. If a rotation angle of 180 degrees (or π radians) is applied to the reference, the resulting alignment is called pi-shifted, or π-shifted.
- Bash
The default shell of a GNU operating system, as its documentation declares. If the target OS is Linux, Bash is probably the shell, or command line interface, that you are using to enter commands.
- Command Line Interface (CLI)
An interface between a system and its user based on the command line, i.e. the system's behaviour is controlled by instructions passed to it as text through the keyboard. See Command Line Interface (CLI).
- Command Line Option
A flag that can be used in a Command Line Interface (CLI) to customize the behaviour of the program. In Unix, a command line option typically begins with either - for short option names, e.g. -h, or -- for long option names, e.g. --help. Some command line options accept a value, e.g. -N 3; that depends on the nature of the option.
- CSV file
A Comma-Separated Values file: a plain-text file structured like a table, with one record per line and the fields of each record separated by a delimiter. Despite the name, the separator need not be a comma, although the comma is a very common choice. The CSV format is documented in RFC 4180.
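As a minimal sketch of a non-comma separator, Python's standard csv module accepts any single-character delimiter. The semicolon-separated content and its column names below are hypothetical and do not describe any particular PacBio Data Processing output.

```python
import csv
import io

# Hypothetical semicolon-separated table (illustrative column names only).
text = "molecule;subreads\n23;5\n45;12\n"

# csv.reader splits each line on the given delimiter instead of ",".
rows = list(csv.reader(io.StringIO(text), delimiter=";"))
print(rows)  # [['molecule', 'subreads'], ['23', '5'], ['45', '12']]
```

Reading a real file works the same way, passing the open file object (opened with newline="") instead of io.StringIO.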
- exit status
The exit status of an executed command is the value it returns (more precisely, the value that its parent process obtains through a waitpid system call or an equivalent function). In the shell, the $? variable holds the exit status of the last executed command: type echo $? right after the command you are interested in terminates to find out its exit status. Exit statuses are integers in the range 0-255. A value of 0 means success; non-zero values indicate failure.
- FASTA
A text-based file format to store DNA sequences or, more generally, nucleotide or amino acid sequences. See the Wikipedia page on FASTA format, and references therein.
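To make the format concrete, here is a minimal Python sketch of a FASTA reader: a line starting with > opens a new record whose name is the first word of that header line, and the following lines are concatenated into its sequence. The function read_fasta and the example records are hypothetical; for real work a dedicated parser such as Biopython's Bio.SeqIO is more robust.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {record name: sequence} mapping."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Record name = first word of the header; the rest is a description.
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = """>chr1 example reference
ACGTACGT
ACGT
>plasmid
GGCC
"""
print(read_fasta(example.splitlines()))
# {'chr1': 'ACGTACGTACGT', 'plasmid': 'GGCC'}
```

A file can be passed directly, e.g. read_fasta(open(path)), since iterating over a file yields its lines.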
- GFF
A file format to encode genetic features. See the GFF3 definition.
- Graphical User Interface (GUI)
An interface between a system and its user based on graphical icons, where the mouse is typically involved. See Graphical User Interface (GUI).
- MD5 checksum
A checksum based on the MD5 algorithm. In PacBio Data Processing it is used only as a mechanism to protect data integrity against unintentional corruption.
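A minimal Python sketch of computing an MD5 checksum with the standard hashlib module, reading the file in chunks so that large BAM files need not fit in memory. The function name md5sum and the file name in the usage line are hypothetical; this is not necessarily how PacBio Data Processing computes or stores its checksums.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (hypothetical file name): compare against a previously stored value.
# print(md5sum("input.bam"))
```

Because MD5 is not collision-resistant, such a comparison only detects accidental corruption, which matches the limited use described above.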
- molecule
In the context of PacBio Data Processing, molecule refers to a fragment of DNA that was captured in a hole, aka ZMW, in the sequencing machine. Each molecule in a BAM file is identified by a positive integer and typically spans several subreads.
- PATH
An environment variable that contains the search path for commands. It is a colon-separated list of directories in which the shell looks for commands. Type man bash or info bash in your shell for more details.
- reference
A DNA sequence, stored as a file in the FASTA format, used as a reference for the single molecule analysis.
- subread
A single line in the BAM file. Each subread belongs to one molecule.
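Since every subread belongs to exactly one molecule, grouping subreads by molecule is a natural first step. The Python sketch below assumes the usual PacBio read-name convention movie/hole/qStart_qEnd, where the hole (ZMW) number identifies the molecule; the function group_by_molecule and the example names are hypothetical. In a real BAM file the names could be obtained with a library such as pysam (AlignedSegment.query_name).

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_molecule(names: Iterable[str]) -> Dict[int, List[str]]:
    """Group subread names by molecule (ZMW hole number).

    Assumes names of the form "<movie>/<hole>/<qStart>_<qEnd>".
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for name in names:
        _movie, hole, _span = name.split("/")
        groups[int(hole)].append(name)  # the hole number identifies the molecule
    return dict(groups)


names = ["m1/23/0_100", "m1/23/150_260", "m1/45/0_90"]
print(group_by_molecule(names))
# {23: ['m1/23/0_100', 'm1/23/150_260'], 45: ['m1/45/0_90']}
```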
- summary report
An HTML file created by the sm-analysis program during Single molecule analysis. It contains basic statistics about the input BAM, the input reference and the output that sm-analysis produces during its analysis. It also includes some intermediate details of the process and selected plots that visualize certain quantities and distributions.
- variant
See alignment variant.
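The rotation described under alignment variant can be illustrated on a circular sequence with a short Python sketch. It assumes, as a simplification, that a rotation by an angle θ (in radians) moves the start of a sequence of length n by floor(θ / 2π · n) bases; the function rotate_reference is hypothetical and is not taken from PacBio Data Processing. An angle of 0 leaves the reference unchanged (straight), while an angle of π moves the start to the middle of the sequence (π-shifted).

```python
import math


def rotate_reference(seq: str, angle: float) -> str:
    """Rotate a circular sequence by `angle` radians (illustrative mapping).

    The angle is converted to a shift of floor(angle / (2*pi) * len(seq))
    bases; the bases before the new start are moved to the end.
    """
    n = len(seq)
    if n == 0:
        return seq
    shift = math.floor(angle / (2 * math.pi) * n) % n
    return seq[shift:] + seq[:shift]


print(rotate_reference("ACGTTGCA", 0))        # ACGTTGCA (straight)
print(rotate_reference("ACGTTGCA", math.pi))  # TGCAACGT (pi-shifted)
```

Because the topology is circular, rotating by 2π brings the sequence back to its original start.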