Quickstart on a cluster
This document briefly describes how to install and use the PacBio Data Processing software on a cluster. For more details, please consult the references.
Starting from a PacBio sequencing file (BAM file) and a reference sequence (FASTA file), you can generate a dataframe (CSV file) whose columns contain properties of each molecule that passed the quality filters.
In addition, a summary report is generated with information about the input and output files of each process.
Open a cluster access account (see Using PacBio Data Processing on a cluster).
Open a terminal and log in to the cluster (see Using PacBio Data Processing on a cluster).
Install Python 3.9 on the cluster (see the Installation document).
Create a virtual environment (see the Installation document).
Install the external dependencies pbindex, blasr and ccs (see the Using PacBio Data Processing on a cluster document).
Install PacBio Data Processing (see the Installation document).
Transfer the input files to the cluster. Assuming you want to process a file called pbsequencing.bam and your reference is stored in a file called reference.fasta (with its companion index reference.fasta.fai), run the following command in a terminal:
scp pbsequencing.bam reference.fasta{,.fai} velazquez@goethe.hhlr-gu.de:/home/fuchs/darmstadt/velazquez
The path will change depending on your account name and the desired destination directory.
Run a job (see Using PacBio Data Processing on a cluster).
Transfer the output files to your personal computer:
scp velazquez@goethe.hhlr-gu.de:/home/fuchs/darmstadt/velazquez/[file to transfer] .
where the trailing . (dot) means the current working directory and can be replaced by any other local path.
Alternatively, you can synchronize the remote location with your current working directory:
rsync -av velazquez@goethe.hhlr-gu.de:/home/fuchs/darmstadt/velazquez/ ./
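The two transfer steps above can be sketched as a small dry-run helper that only prints the commands, so you can check them before touching the cluster. This is a minimal sketch, not part of PacBio Data Processing: the function names upload_cmd and download_cmd are hypothetical, and the account, host and directory are the example values used in this document, which you will need to replace with your own. Because the {,.fai} brace expansion is a bash feature, the sketch spells out both file names so that it also works in a plain POSIX shell.

```shell
#!/bin/sh
# Example values taken from this document; replace them with your own
# account name, cluster host and destination directory.
REMOTE_USER="velazquez"
REMOTE_HOST="goethe.hhlr-gu.de"
REMOTE_DIR="/home/fuchs/darmstadt/velazquez"
REMOTE="${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_DIR}"

# Hypothetical helper: print the scp command that uploads a BAM file,
# a FASTA reference and its .fai index to the remote directory.
# $1 = BAM file, $2 = FASTA file
upload_cmd() {
    printf 'scp %s %s %s.fai %s\n' "$1" "$2" "$2" "$REMOTE"
}

# Hypothetical helper: print the rsync command that copies the remote
# directory into a local directory (default: current working directory).
# $1 = local destination (optional)
download_cmd() {
    printf 'rsync -av %s/ %s\n' "$REMOTE" "${1:-./}"
}

upload_cmd pbsequencing.bam reference.fasta
download_cmd
```

Once the printed commands look right, they can be copied into the terminal and run. To preview what rsync would transfer without copying anything, add its -n (--dry-run) option to the printed command.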