BanzaiDB v0.2.0 - Database tool for the Banzai NGS pipeline (http://github.com/mscook/BanzaiDB)
Create the CLI parser
Returns: a parser with subparsers: init, populate, update & query
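Here is a minimal sketch of how such a parser could be built with the standard library's argparse subparsers. The subcommand names come from the description above. The argument names (--force, run_type, run_path) are assumptions modelled on the argparse arguments mentioned elsewhere in these docs, not BanzaiDB's actual flags.

```python
import argparse


def build_parser():
    """Build a CLI parser with init, populate, update & query subcommands.

    Argument names below are hypothetical, chosen to mirror the args
    attributes (force, run_type, run_path) referenced in these docs.
    """
    parser = argparse.ArgumentParser(prog="BanzaiDB")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # a subcommand must be given

    init = subparsers.add_parser("init", help="create and initialise the database")
    init.add_argument("--force", action="store_true",
                      help="hypothetical: reinitialise even if the database exists")

    populate = subparsers.add_parser("populate", help="populate the database")
    populate.add_argument("run_type", help="type of Banzai run, e.g. mapping")
    populate.add_argument("run_path", help="full path to the Banzai run")

    subparsers.add_parser("update", help="update the database")
    subparsers.add_parser("query", help="query the database")
    return parser
```

The `dest="command"` attribute records which subcommand was chosen, so a main function can branch on `args.command`.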
Create a new RethinkDB database and initialise (default) tables
Parameters: args – an argparse argument (force)
Main function - essentially calls the CLI parser & directs execution
Populate the RethinkDB
This is essentially a placeholder that directs the input data to its specific populate method.
Parameters: args – an argparse argument (run_type)
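The dispatch described above can be sketched as a dictionary that maps each run_type to its handler. `populate_mapping` and `POPULATE_METHODS` are hypothetical names. A real handler would load a Nesoni mapping run into RethinkDB rather than return a string.

```python
import argparse


def populate_mapping(args):
    """Hypothetical stand-in for the Nesoni mapping-run loader."""
    return "mapping run at " + args.run_path


# Hypothetical table mapping each supported run_type to its populate method.
POPULATE_METHODS = {
    "mapping": populate_mapping,
}


def populate(args):
    """Direct the input data to the populate method registered for args.run_type."""
    handler = POPULATE_METHODS.get(args.run_type)
    if handler is None:
        raise ValueError("Unsupported run_type: %r" % args.run_type)
    return handler(args)


if __name__ == "__main__":
    example = argparse.Namespace(run_type="mapping", run_path="/projects/map/run")
    print(populate(example))
```

Adding support for a new run type (e.g. BWA) then only needs a new entry in the table.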
Populate the database with a mapping run. Only Nesoni is supported at the moment.
TODO: This should also handle BWA. It will need to differentiate between Nesoni & BWA runs and handle VCF files.
Parameters: args – an argparse argument (run_path), the full path as a string to the Banzai run (inclusive of $PROJECTBASE). For example: /$PROJECTBASE/map/$REF.2014-04-28-mon-16-41-51
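The example run_path encodes the reference name and a timestamp in its final directory name. The following hypothetical helper splits them apart. It assumes the directory name always has the form `<reference>.<YYYY-MM-DD-day-HH-MM-SS>`.

```python
import os
from datetime import datetime


def parse_run_path(run_path):
    """Split a mapping-run path such as .../map/$REF.2014-04-28-mon-16-41-51
    into (reference, start time). Hypothetical helper; the name format is assumed.
    """
    # normpath drops any trailing slash so basename returns the run directory.
    run_dir = os.path.basename(os.path.normpath(run_path))
    # Split on the LAST dot so references containing dots stay intact.
    reference, stamp = run_dir.rsplit(".", 1)
    year, month, day, _weekday, hour, minute, second = stamp.split("-")
    started = datetime(int(year), int(month), int(day),
                       int(hour), int(minute), int(second))
    return reference, started
```

The weekday token is ignored, which avoids depending on locale-specific day names.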
Converts a single JSON element to CSV
Note
This will not handle nested JSON. Something like https://github.com/evidens/json2csv would be needed to achieve this.
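Below is a sketch of a single-element JSON to CSV conversion using the standard csv module. It raises errors for the two unsupported inputs described by the exception classes further down: a list of elements and nested JSON. The function and exception names are hypothetical.

```python
import csv
import io


class ListInputError(Exception):
    """Hypothetical: raised when given a list of JSON elements."""


class NestedJSONError(Exception):
    """Hypothetical: raised when a value is itself a JSON object or array."""


def json_element_to_csv(element):
    """Convert one flat JSON element (a dict) to a CSV header line plus a data line."""
    if isinstance(element, list):
        raise ListInputError("expected a single JSON element, not a list")
    if any(isinstance(value, (dict, list)) for value in element.values()):
        raise NestedJSONError("nested JSON is not supported")
    buffer = io.StringIO()
    # Column order follows the element's key order (dicts preserve insertion order).
    writer = csv.DictWriter(buffer, fieldnames=list(element), lineterminator="\n")
    writer.writeheader()
    writer.writerow(element)
    return buffer.getvalue()
```

Values containing commas or quotes are escaped by the csv module, so no manual quoting is needed.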
Extracts out the data from a consequences line
NOTE: This was originally the core of Nesoni_report_to_JSON. However, v_class is singular, but substitution states are also observed within deletion states (and other similar cases), so this method was refactored out.
Returns: a data list (containing a controlled set of results)
Convert a Nesoni nway.any file that has been reportified to JSON
See tables.rst for information on what is stored in RethinkDB
Parameters: reportified – the reportified nway.any file (i.e. one that has been through nway_reportify()). This is essentially a list of tuples
Returns: a list of JSON
Convert an nway.any file to something similar to report.txt
The nway.any file contains richer information (i.e. N calls) than report.txt. This method converts it into something similar to report.txt
TODO: Add a simple example of input vs output of this method.
ref_id, position, strains, ref_base, v_class, changes, evidence, consequences
Parameters: nway_any_file (string) – the full path as a string to the nway.any file
Returns: a list of tuples. Each list element refers to a variant position, while the tuple contains the states of each strain
From a genome reference (GBK format), convert CDS, gene & RNA features to JSON
Note
Also see tables.rst for a detailed description of the JSON schema
Warning
This probably does not handle misc_features
Parameters: genome_file – the full path as a string to the GenBank file
Returns: a JSON representing the reference and a list of JSON containing information on the features
Make a connection to the RethinkDB database
Pulls settings (host, port, database name & auth_key) from BanzaiDBConfig()
Returns: a connection context manager
Bases: exceptions.Exception
The conversion only takes a single JSON element, not a list of elements
Bases: exceptions.Exception
RethinkDB only accepts database names that match “^[a-zA-Z0-9_]+$”
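A minimal check against that pattern is sketched below. It uses `re.fullmatch` rather than `match` with `^...$`, because `$` also matches just before a trailing newline, so a name like "abc\n" would otherwise be accepted. `check_db_name` and `InvalidDatabaseNameError` are hypothetical names.

```python
import re

# Valid RethinkDB database names, per the pattern quoted above (anchors implied by fullmatch).
_DB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_]+")


class InvalidDatabaseNameError(Exception):
    """Hypothetical: raised for names RethinkDB would reject."""


def check_db_name(name):
    """Return name unchanged if it is a valid database name, else raise."""
    # fullmatch requires the whole string to match; ^...$ with match() would
    # also accept a name followed by a trailing newline.
    if _DB_NAME_PATTERN.fullmatch(name) is None:
        raise InvalidDatabaseNameError("invalid RethinkDB database name: %r" % name)
    return name
```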
Bases: exceptions.Exception
The conversion of JSON to CSV does not support nested JSON
Generate a list (of length number) of distinct “good” random colors
Parameters: number (int) – the number of colors to generate
Return type: a list of lists in the form: [[243, 137, 121], [232, 121, 243], [216, 121, 243]]
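One common way to get distinct, “good” colors is to space hues evenly around the HSV color wheel from a random starting point, while holding saturation and value fixed. This sketches that technique and is not necessarily what BanzaiDB does. The defaults saturation=0.5 and value=0.95 were chosen to roughly resemble the example output above, and `random_colors` is a hypothetical name.

```python
import colorsys
import random


def random_colors(number, saturation=0.5, value=0.95, seed=None):
    """Return `number` RGB colors as [[r, g, b], ...] with integer channels 0-255.

    Hues are evenly spaced (1/number apart) from a random start so that
    colors stay distinguishable; saturation and value are held fixed.
    """
    rng = random.Random(seed)
    start = rng.random()
    colors = []
    for i in range(number):
        hue = (start + i / number) % 1.0
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors.append([int(round(channel * 255)) for channel in (r, g, b)])
    return colors
```

For small values of number the colors are clearly distinct. For very large values, rounding to 0-255 integers can eventually produce duplicates.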
Functions to parse a nesoni report .txt file
From an evidence string/element, return a dictionary of obs/counts
Updated to handle 0 coverage in an ‘N’ call. In this case N is set to -1
Parameters: evidence – an evidence string. It looks something like this: Ax27 AGCAx1 AGCAATTAATTAAAATAAx
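Here is a sketch of parsing the space-separated `<obs>x<count>` tokens shown above into a dictionary. How a 0-coverage ‘N’ call appears in the evidence string is not documented here, so this sketch assumes it shows up as an empty (or whitespace-only) string. `parse_evidence` is a hypothetical name.

```python
def parse_evidence(evidence):
    """Return a dict mapping each observed sequence to its count.

    Example: "Ax27 AGCAx1" -> {"A": 27, "AGCA": 1}.
    Assumption: an empty evidence string means 0 coverage, recorded as {"N": -1}.
    """
    tokens = evidence.split()
    if not tokens:
        return {"N": -1}
    counts = {}
    for token in tokens:
        # Split on the LAST 'x': the observed bases never contain 'x'.
        obs, count = token.rsplit("x", 1)
        counts[obs] = counts.get(obs, 0) + int(count)
    return counts
```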
Return fields for syn, non-syn or correlated