OmicsPipe on the Amazon Cloud (AWS EC2) Tutorial

OmicsPipe on AWS uses a custom StarCluster image, created with docker.io, which installs docker.io, environment-modules, and easybuild on an AWS EC2 cluster. All you have to do is get the Docker image, upload your data, launch the Amazon cluster, and run a single command to analyze all of your data according to published, best-practice methods.

Step 1: Create an AWS Account

  1. Create an AWS account by following the instructions at Amazon-AWS
  2. Note your AWS ACCESS KEY ID, AWS SECRET ACCESS KEY and AWS USER ID

Step 2: Load the OmicsPipe on AWS Docker image on your machine

Note

If you are on a local Ubuntu installation, skip this step and install the StarCluster client directly.

  1. Download docker.io following the instructions at Get-Docker

Note

If you are using Windows, please update your BIOS settings to enable virtualization before installing Docker.

  2. From inside the Docker environment, run the command:

    docker run -i -t omicspipe/aws_readymade /bin/bash
    

Note

If you want to share a file from your local computer with the Docker container, follow the instructions for Docker Folder Sharing, put your desired file within the shared folder, and run the command below (this is recommended for saving your ~/.starcluster/config file from the next step):

    docker run -it --volumes-from NameofSharedDataFolder omicspipe/aws_readymade /bin/bash

Step 3: Configure StarCluster

  1. After running the omicspipe/aws_readymade Docker container, run the command below to edit the StarCluster configuration file:

        nano ~/.starcluster/config
    
    Or, if you prefer vim:
    
        vim ~/.starcluster/config
    
  2. Enter your “AWS ACCESS KEY ID”, “AWS SECRET ACCESS KEY”, and “AWS USER ID”

  3. If you are not in the AWS us-west region, change the AWS REGION NAME and AWS REGION HOST variables to match your region (see AWS Regions).

  4. Select your desired pre-configured cluster by editing the “DEFAULT_TEMPLATE” variable or creating a custom cluster. The default is a test cluster with 5 c3.large nodes.

  5. Save the edited file (Instructions for Nano and for Vim)

  6. Run the command:

    starcluster createkey omicspipe -o ~/.ssh/omicspipe.rsa
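
The region settings in item 3 usually follow a regular pattern. As a minimal sketch, assuming the common EC2 naming scheme (an availability zone is a region name plus a one-letter suffix, and the regional endpoint is ec2.<region>.amazonaws.com) and assuming the configuration file spells the variables with underscores, the two values can be derived from a zone name. The function names zone_to_region and region_host are hypothetical helpers, not part of StarCluster.

```shell
# Derive the region name from an availability zone, e.g. us-west-1b -> us-west-1.
# Assumption: the zone is a region name followed by a single lowercase letter.
zone_to_region() {
    printf '%s\n' "$1" | sed 's/[a-z]$//'
}

# Build the EC2 endpoint host name for a region (assumed standard naming).
region_host() {
    printf 'ec2.%s.amazonaws.com\n' "$1"
}

# Print the two lines as they might appear in ~/.starcluster/config.
region=$(zone_to_region "us-west-1b")
echo "AWS_REGION_NAME = $region"
echo "AWS_REGION_HOST = $(region_host "$region")"
```

Check the printed values against the AWS Regions page before copying them into your configuration file.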
    

Note

For more information on editing the StarCluster configuration file, see the StarCluster website.
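
Before launching anything, it can help to confirm that the three credential fields from item 2 were actually filled in. The sketch below is one way to do that, assuming the fields are written with underscores (AWS_ACCESS_KEY_ID and so on) and that an unset field is either empty or still holds a placeholder starting with "#". The function name check_aws_fields is hypothetical.

```shell
# Check that the three credential fields in a StarCluster config are non-empty.
# Usage: check_aws_fields [path-to-config]   (default: ~/.starcluster/config)
# Returns 0 if all fields are set, 1 if any is missing, 2 if the file is absent.
check_aws_fields() {
    config="${1:-$HOME/.starcluster/config}"
    if [ ! -f "$config" ]; then
        echo "config not found: $config" >&2
        return 2
    fi
    missing=0
    for key in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_USER_ID; do
        # A field counts as set when an uncommented "KEY = value" line exists
        # and the value does not start with "#" (assumed placeholder marker).
        if grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]#]" "$config"; then
            echo "ok: $key"
        else
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_aws_fields with no argument inspects ~/.starcluster/config and prints one line per field. It only checks that a value is present, not that AWS accepts it.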

Step 4: Create AWS Volumes

  1. Create AWS volumes to store the raw data and results of your analyses. From within the Docker environment, run:

    starcluster createvolume --name=data -i ami-52112317 -d -s <volume size in GB> us-west-1b
    
    starcluster createvolume --name=results -i ami-52112317 -d -s <volume size in GB> us-west-1b
    

Note

  • Specify the <volume size in GB> as a number large enough to accommodate all of your raw data and ~4x that size for your results folder
  • Change us-west-1b to your region as described in AWS Regions.
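
The sizing rule in the note above can be turned into a quick calculation. This is a rough sketch, assuming your raw data sits in a single local folder. It rounds the folder size up to whole gigabytes and multiplies by four for the results volume. The function name suggest_volume_sizes is hypothetical.

```shell
# Suggest volume sizes (in GB) from the size of a local raw-data folder.
# Usage: suggest_volume_sizes <raw-data-dir>
suggest_volume_sizes() {
    raw_dir="$1"
    # du -sk reports the folder size in kilobytes.
    raw_kb=$(du -sk "$raw_dir" | awk '{print $1}')
    # Round up to whole GB (1 GB is taken as 1048576 KB here).
    data_gb=$(( ($raw_kb + 1048575) / 1048576 ))
    if [ "$data_gb" -lt 1 ]; then
        data_gb=1
    fi
    # The results volume is sized at about 4x the raw data, per the note above.
    results_gb=$(( $data_gb * 4 ))
    echo "data=${data_gb} results=${results_gb}"
}
```

The two numbers are candidates for the -s option of the createvolume commands. Treat them as a lower bound and add headroom if you expect more data later.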
  2. Make a snapshot of the provided reference databases (currently only supports H. sapiens)
    • Go to the AWS-Console
    • Click on the EC2 option
    • Click on Volumes
    • Click on “Create Volume”
    • Set availability zone
    • In Snapshot ID search for “omicspipe_db” and click on the resulting Snapshot ID
    • Click Create
    • From the Volumes tab, note the “VOLUME_ID” of the database snapshot
  3. Edit your StarCluster configuration file to add your volume IDs. Run the command below and edit the VOLUME_ID variables for Data and Results at the top of the file, then scroll down to [volume database], about two-thirds of the way down, and enter the Database snapshot VOLUME_ID.

    nano ~/.starcluster/config
    

    # Edit the fields below:

       [volume results]
       VOLUME_ID =
       MOUNT_PATH = /data/results
    
       [volume data]
       VOLUME_ID =
       MOUNT_PATH = /data/data
    
    # Scroll down to edit this
       [volume database]
       VOLUME_ID =
       MOUNT_PATH = /data/database
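
Because the three VOLUME_ID fields sit in different parts of the file, it is easy to leave one blank. The sketch below reports, for each [volume ...] section, whether its VOLUME_ID has a value. It assumes the section layout shown above. The function name check_volume_ids is hypothetical, and the check does not confirm that the IDs exist in your AWS account.

```shell
# Report whether each [volume NAME] section has a non-empty VOLUME_ID.
# Usage: check_volume_ids [path-to-config]   (default: ~/.starcluster/config)
# Returns 1 if any VOLUME_ID is empty, 0 otherwise.
check_volume_ids() {
    config="${1:-$HOME/.starcluster/config}"
    awk '
        BEGIN { bad = 0; name = "" }
        # Entering a [volume NAME] section: remember NAME.
        /^[ \t]*\[volume[ \t]+/ {
            name = $0
            sub(/^[ \t]*\[volume[ \t]+/, "", name)
            sub(/\].*$/, "", name)
            next
        }
        # Any other section header ends the current volume section.
        /^[ \t]*\[/ { name = ""; next }
        # Inside a volume section: inspect the VOLUME_ID value.
        name != "" && /^[ \t]*VOLUME_ID[ \t]*=/ {
            value = $0
            sub(/^[^=]*=[ \t]*/, "", value)
            sub(/[ \t]+$/, "", value)
            if (value == "") { state = "missing"; bad = 1 } else { state = "ok" }
            print name ": " state
        }
        END { exit bad }
    ' "$config"
}
```

With all three fields filled in, the function prints an "ok" line for results, data, and database, in the order they appear in the file.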
    

Step 5: Launch the Cluster

  1. From the Docker container, run the command below to start a new cluster with the name “mypipe”:

    starcluster start mypipe
    
  2. Optional but Recommended: To load balance the cluster, type the command below:

    starcluster loadbalance mypipe

  3. SSH into the cluster by running the command below:

    starcluster sshmaster mypipe
    

Step 6: Upload data to the cluster

Now that you are in your cluster, you can use it like any other cluster. Before running OmicsPipe on your own data, you will want to upload your data (you can skip this step if you are running the test data). There are two options to upload your data:

  1. Upload data from your local machine or cluster using StarCluster put:

    starcluster put mypipe <myfile> /data/raw
    

Note

  • If you are using the Docker container, you must have your files loaded into the Docker container through the Shared Folder option above
  2. Use Webmin to transfer files from your local system to the cluster (recommended for small files only, like parameter files).

    • In the AWS Management Console go to “Security Groups”
    • Select the “StarCluster-0_95_5” group associated with your cluster’s name
    • On the Inbound tab click on “Edit”
    • Click on “Add Rule” and a new “Custom TCP Rule” will appear. On “Port Range” enter “10000” and on “Source” select “My IP”
    • Hit “Save”
    • Select Instances in the AWS Management Console and note the “Public IP” of your instance
    • In a Web browser, enter https://the_public_ip:10000. Type in the Login info when prompted: user: root password: sulab
    • This will take a few seconds to load, and once it does, you can navigate your cluster’s file structure with the tabs on the left
    • To upload a file from your local file system, click “upload” and specify the directory /data/data to upload your data.
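
For the first option (StarCluster put), uploading many files one at a time gets tedious. A small loop can issue one put per file, as sketched below. The function name put_all and the DRY_RUN switch are hypothetical. By default the function only prints the commands so you can review them, and it runs them only when DRY_RUN=0 is set.

```shell
# Print (or run) one "starcluster put" command per regular file in a folder.
# Usage: put_all <cluster-name> <local-dir> <remote-dir>
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to actually upload.
put_all() {
    cluster="$1"
    src_dir="$2"
    dest="$3"
    for f in "$src_dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty folder.
        [ -f "$f" ] || continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "starcluster put $cluster $f $dest"
        else
            starcluster put "$cluster" "$f" "$dest"
        fi
    done
}
```

For example, `put_all mypipe /path/to/fastq /data/raw` lists the uploads for the "mypipe" cluster, using the same destination as the put command above. Run it from wherever the StarCluster client is installed.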

To build your own docker image using the Dockerfile

  1. Download docker.io following the instructions at Get-Docker

  2. Run the command:

    docker build -t <Repository Name> https://bitbucket.org/sulab/omics_pipe/downloads/Dockerfile_AWS_custombuild
    

This will store the dockercluster image in the Repository Name of your choice.