About



MAVIS is a pipeline to merge and validate input from different structural variant callers into a single report. The pipeline consists of five main steps.

Getting started

There are 3 major steps to setting up and installing MAVIS:

  1. Install non-python dependencies

Before MAVIS can be installed, its non-python dependencies must be installed: an aligner (blat or bwa mem) and samtools. Once these are in place, MAVIS itself can be installed through pip.

  2. Install MAVIS

The easiest way to install MAVIS is through the python package manager, pip:

pip install git+https://github.com/bcgsc/mavis.git@vX.X.X#egg=mavis-X.X.X

where X.X.X is the version number (for example, 1.3.0). This installs MAVIS and its python dependencies.

  3. Build reference files

After MAVIS is installed the reference files must be generated (or downloaded) before it can be run.

Once the above 3 steps are complete, MAVIS is ready to be run. See running the pipeline.

Help

If you have a question or issue that is not answered in the FAQs, please submit an issue on our GitHub page or contact us by email at mavis@bcgsc.ca.

Non-python dependencies

Aligner

In addition to the python package dependencies, MAVIS requires an aligner. Currently the only supported aligners are blat and bwa mem. For MAVIS to run successfully, the aligner must be installed and accessible on the PATH. If you have a non-standard install, you may find it useful to edit the PATH environment variable. For example:

export PATH=/path/to/directory/containing/blat/binary:$PATH
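To confirm that the aligner and samtools can actually be found after editing PATH, a quick check like the following may help. This is a minimal sketch; `check_deps` is a hypothetical helper written for illustration, not part of MAVIS. Note that bwa mem is a subcommand of the `bwa` binary, so the executable to look for is `bwa`.

```bash
#!/usr/bin/env bash
# check_deps is a hypothetical helper (not part of MAVIS): it reports whether
# each named executable can be found on PATH and returns 1 if any are missing.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd ($(command -v "$cmd"))"
    else
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Only one aligner is needed; check the one you intend to use, e.g.:
#   check_deps blat samtools
#   check_deps bwa samtools
```

A non-zero return status means at least one dependency is missing, so the helper can also guard a setup script (`check_deps blat samtools || exit 1`).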

Blat is the default aligner. To configure MAVIS to use bwa mem as the default instead, use the MAVIS environment variables. Both the aligner and the aligner reference settings should be specified:

export MAVIS_ALIGNER='bwa mem'
export MAVIS_ALIGNER_REFERENCE=/path/to/mem/fasta/ref/file
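bwa mem aligns against a pre-built index, and `bwa index ref.fa` writes its index files next to the FASTA as `ref.fa.amb`, `.ann`, `.bwt`, `.pac` and `.sa`. Assuming the file given in MAVIS_ALIGNER_REFERENCE is the FASTA handed to bwa mem, a sketch like the one below can check that the index exists before a run. `bwa_index_present` is a hypothetical helper, not a MAVIS function.

```bash
#!/usr/bin/env bash
# bwa_index_present is a hypothetical helper (not part of MAVIS).
# It checks for the files that `bwa index <ref>` writes alongside <ref>;
# bwa mem cannot align against a reference that lacks them.
bwa_index_present() {
  local ref=$1 ext
  for ext in amb ann bwt pac sa; do
    if [[ ! -f "$ref.$ext" ]]; then
      echo "missing $ref.$ext; build the index with: bwa index $ref" >&2
      return 1
    fi
  done
  return 0
}

# Example: validate the configured reference before running MAVIS.
#   bwa_index_present "$MAVIS_ALIGNER_REFERENCE"
```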

Note

Although MAVIS does attempt to standardize alignments, there will still be some differences in the coordinates of the final call set depending on the aligner used to align putative contigs. Additionally, the aligner used on the input bam will have a more significant impact, as it affects both the reads collected and the coordinates of all non-contig calls.

Samtools

Samtools is used only to sort and index the intermediate output bams. Eventually, this will hopefully be handled by pysam alone.

Resource Requirements

MAVIS has been tested on both unix and linux systems. For the standard pipeline, the validation stage is the most computationally expensive. Its cost varies with the size of your input bam file and the number of input events to be validated. A number of settings can be adjusted to reduce memory and CPU requirements, depending on what the user is trying to analyze.

Uninformative Filter

For example, if the user is only interested in events in genes, the uninformative_filter can be used. This drops all events that are not within a certain distance (max_proximity) of any annotation in the annotations reference file. These events are dropped before the validation stage, which results in a significant speed-up.

This can be set using the environment variable:

export MAVIS_UNINFORMATIVE_FILTER=True

or in the pipeline config file:

[cluster]
uninformative_filter = True

or as a command line argument to the cluster stage:

mavis cluster --uninformative_filter True ....
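The filtering rule itself can be illustrated with a small awk sketch. It is a simplified, hypothetical model, not MAVIS's implementation: it assumes whitespace-separated files where annotations are `<chrom> <start> <end>` and each event is reduced to a single `<chrom> <position>` (real MAVIS events have two breakpoints and a different column layout). An event is kept only if it lies within max_proximity of some annotation on the same chromosome.

```bash
#!/usr/bin/env bash
# filter_events is a hypothetical, simplified illustration of the
# uninformative filter; the file formats below are assumptions, not MAVIS's.
#   annotations: <chrom> <start> <end>          one annotation per line
#   events:      <chrom> <position> [other...]  one position per event
# Events farther than max_proximity from every annotation are dropped.
filter_events() {
  local max_proximity=$1 annotations=$2 events=$3
  awk -v maxp="$max_proximity" '
    # First file: store each annotation interval, padded by max_proximity.
    FILENAME == ARGV[1] { n++; chr[n] = $1; lo[n] = $2 - maxp; hi[n] = $3 + maxp; next }
    # Second file: print the event line if it falls in any padded interval.
    {
      for (i = 1; i <= n; i++) {
        if ($1 == chr[i] && $2 >= lo[i] && $2 <= hi[i]) { print; break }
      }
    }
  ' "$annotations" "$events"
}
```

For example, `filter_events 1000 annotations.tsv events.tsv > kept.tsv` keeps only events within 1000 bp of an annotation. The linear scan over annotations is chosen for readability; a real implementation would likely use an interval index.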

Splitting Validation into Cluster Jobs

MAVIS chooses the number of jobs to split validate/annotate stages into based on two settings: max_files and min_clusters_per_file.

For example, suppose you have 1000 clusters, max_files=10, and min_clusters_per_file=10. MAVIS will then set up 10 validation jobs, each with 100 events.

However, if min_clusters_per_file=500, MAVIS would set up only 2 jobs, each with 500 events, because min_clusters_per_file takes precedence over max_files.
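The arithmetic in these two examples can be reproduced with a short sketch. `plan_jobs` is a hypothetical helper, not taken from the MAVIS source; it only encodes the rule as described here (spread clusters over max_files, but never fewer than min_clusters_per_file per job), and the real implementation may distribute remainders differently.

```bash
#!/usr/bin/env bash
# plan_jobs is a hypothetical helper, not MAVIS code. It approximates the
# job-splitting rule described above:
#   1. spread the clusters evenly over max_files jobs (rounding up);
#   2. if that gives fewer than min_clusters_per_file per job, raise the
#      per-job count to min_clusters_per_file (the minimum takes precedence);
#   3. recompute the number of jobs from the per-job count (rounding up).
# Prints "<jobs> <clusters_per_job>"; the last job may hold fewer clusters.
plan_jobs() {
  local clusters=$1 max_files=$2 min_per_file=$3
  local per_file=$(( (clusters + max_files - 1) / max_files ))
  if (( per_file < min_per_file )); then
    per_file=$min_per_file
  fi
  (( per_file > 0 )) || per_file=1   # avoid dividing by zero when clusters=0
  local jobs=$(( (clusters + per_file - 1) / per_file ))
  echo "$jobs $per_file"
}

plan_jobs 1000 10 10    # prints "10 100" (first example above)
plan_jobs 1000 10 500   # prints "2 500" (second example above)
```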