About
MAVIS is a pipeline to merge and validate input from different structural variant callers into a single report. The pipeline consists of five main steps.
Getting started
There are 3 major steps to setting up and installing MAVIS:
- Install non-python dependencies
Before MAVIS itself can be installed through pip, its non-python dependencies must be installed. These are an aligner (blat or bwa mem) and samtools.
- Install MAVIS
The easiest way to install MAVIS is through pip, the python package manager:
pip install git+https://github.com/bcgsc/mavis.git@vX.X.X#egg=mavis-X.X.X
where X.X.X is the version number (for example, 1.3.0). This installs MAVIS and its python dependencies.
- Build reference files
After MAVIS is installed, the reference files must be generated (or downloaded) before it can be run.
Once the above 3 steps are complete, MAVIS is ready to be run. See running the pipeline.
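The shell helper below illustrates the version substitution in the pip command. It fills the same version number into both the git tag and the egg name. The helper is hypothetical: mavis_pip_spec is not part of MAVIS. It assumes that releases are tagged as vX.X.X, which is what the install pattern implies.

```shell
# Hypothetical helper, not part of MAVIS: builds the pip requirement string
# from the documented install pattern, using one version for both the tag and the egg name.
mavis_pip_spec() {
  local version="$1"
  printf 'git+https://github.com/bcgsc/mavis.git@v%s#egg=mavis-%s\n' "$version" "$version"
}

# Example usage (left commented so nothing is installed when this is sourced):
# pip install "$(mavis_pip_spec 1.3.0)"
```

Quoting the command substitution keeps the # in the URL from being treated as a comment by the shell.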
Help
If you have a question or issue that is not answered in the FAQs, please submit an issue on our GitHub page or contact us by email at mavis@bcgsc.ca.
Non-python dependencies
Aligner
In addition to its python package dependencies, MAVIS requires an aligner. Currently the only supported aligners are blat and bwa mem. For MAVIS to run successfully, the aligner must be installed and accessible on the PATH. If you have a non-standard installation, you may need to add the aligner's directory to the PATH environment variable. For example:
export PATH=/path/to/directory/containing/blat/binary:$PATH
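Before running MAVIS, you can confirm that the required binaries resolve on the PATH. The sketch below is not part of MAVIS. It assumes the executables are named blat, bwa (which provides the bwa mem subcommand) and samtools. It looks each one up with the POSIX command -v builtin.

```shell
# Sketch, not part of MAVIS: report whether each named executable is on the PATH.
# Prints found/missing for every argument and returns non-zero if any is missing.
check_on_path() {
  local rc=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool ($(command -v "$tool"))"
    else
      echo "missing: $tool" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: check the default aligner (blat) and samtools.
# check_on_path blat samtools
```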
Blat is the default aligner. To make bwa mem the default instead, set the MAVIS environment variables. Both the aligner and the aligner reference settings should be specified:
export MAVIS_ALIGNER='bwa mem'
export MAVIS_ALIGNER_REFERENCE=/path/to/mem/fasta/ref/file
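You can check that both settings are specified together before launching a run. The function below is a hypothetical sketch, and check_aligner_env is not a MAVIS command. It treats an unset MAVIS_ALIGNER as blat, which is the documented default. It only checks that MAVIS_ALIGNER_REFERENCE is non-empty when bwa mem is selected. It does not check that the file is a valid reference.

```shell
# Hypothetical pre-flight check, not part of MAVIS.
# An unset MAVIS_ALIGNER is treated as blat, the documented default.
check_aligner_env() {
  local aligner="${MAVIS_ALIGNER:-blat}"
  case "$aligner" in
    blat)
      ;;
    'bwa mem')
      # bwa mem needs the reference setting alongside the aligner setting
      if [ -z "${MAVIS_ALIGNER_REFERENCE:-}" ]; then
        echo "MAVIS_ALIGNER_REFERENCE must be set when MAVIS_ALIGNER='bwa mem'" >&2
        return 1
      fi
      ;;
    *)
      echo "unsupported aligner: $aligner (expected blat or 'bwa mem')" >&2
      return 1
      ;;
  esac
  echo "aligner: $aligner"
}
```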
Note
Although MAVIS does attempt to standardize alignments, the coordinates in the final call set will still differ somewhat depending on the aligner used to align putative contigs. The aligner used to produce the input bam has an even larger impact. It affects which reads are collected as well as the coordinates of all non-contig calls.
Samtools
Samtools is used only to sort and index the intermediary output bams. Eventually, this will hopefully be done through pysam alone.
Resource Requirements
MAVIS has been tested on both unix and linux systems. For the standard pipeline, the validation stage is the most computationally expensive. Its cost varies with the size of your input bam file and the number of input events to be validated. A number of settings can be adjusted to reduce memory and CPU requirements, depending on what the user is trying to analyze.
Uninformative Filter
For example, if the user is only interested in events in genes, the uninformative_filter can be used. It drops all events that are not within a given distance (max_proximity) of any annotation in the annotations reference file. These events are dropped before the validation stage, which can result in a significant speed-up.
The filter can be enabled using the environment variable:
export MAVIS_UNINFORMATIVE_FILTER=True
or in the pipeline config file:
[cluster]
uninformative_filter = True
or as a command line argument to the cluster stage:
mavis cluster --uninformative_filter True ....
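The deliberately simplified awk sketch below illustrates what "within max_proximity of an annotation" means. It is not MAVIS's implementation. It assumes hypothetical tab-separated inputs. Each annotation is a chrom/start/end line, and each event is reduced to a single chrom/position line. MAVIS itself works with its own annotation and event formats.

```shell
# Simplified sketch, not MAVIS's implementation.
# annotations file: chrom<TAB>start<TAB>end   events file: chrom<TAB>position
# Prints only the events whose position lies within max_proximity bp of some annotation.
filter_uninformative() {
  local annotations="$1" events="$2" max_proximity="$3"
  awk -F'\t' -v d="$max_proximity" '
    # First file: load every annotation interval into parallel arrays.
    FILENAME == ARGV[1] { n++; ann_chrom[n] = $1; ann_start[n] = $2; ann_end[n] = $3; next }
    # Second file: keep the event if it falls inside any interval widened by d.
    {
      for (i = 1; i <= n; i++) {
        if (ann_chrom[i] == $1 && $2 >= ann_start[i] - d && $2 <= ann_end[i] + d) { print; next }
      }
    }' "$annotations" "$events"
}
```

This linear scan checks every annotation for every event, which is fine for illustration. A real filter over a genome-wide annotation set would use an interval index.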
Splitting Validation into Cluster Jobs
MAVIS chooses the number of jobs to split the validate/annotate stages into based on two settings: max_files and min_clusters_per_file.
For example, suppose you have 1000 clusters, max_files=10, and min_clusters_per_file=10. Then MAVIS will set up 10 validation jobs, each with 100 clusters.
However, if min_clusters_per_file=500, MAVIS would only set up 2 jobs, each with 500 clusters. This is because min_clusters_per_file takes precedence over max_files.
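The worked example is consistent with a simple rule. Use as many jobs as the clusters can fill at min_clusters_per_file each, cap that number at max_files, and never use fewer than one. The helper below encodes that inferred rule. It is a hypothetical sketch that reproduces the two cases above, not MAVIS's actual code. Rounding down when the division is uneven is an assumption.

```shell
# Hypothetical sketch of the job-count rule inferred from the example above;
# not taken from the MAVIS source.
validation_job_count() {
  local clusters="$1" max_files="$2" min_per_file="$3"
  # Largest number of jobs that still gives every job at least min_per_file clusters.
  local jobs=$(( clusters / min_per_file ))
  # Never exceed max_files, and always create at least one job.
  if [ "$jobs" -gt "$max_files" ]; then jobs="$max_files"; fi
  if [ "$jobs" -lt 1 ]; then jobs=1; fi
  echo "$jobs"
}
```

Here, validation_job_count 1000 10 10 prints 10 (100 clusters per job), and validation_job_count 1000 10 500 prints 2 (500 clusters per job).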