Module xi_covutils.blast_api
Makes BLAST API calls.
From the BLAST API documentation:
Parameter | Definition | Type | CMD | Allowed values |
---|---|---|---|---|
QUERY | Search query | String | Put * | Accession, GI, or FASTA. |
DATABASE | BLAST database | String | Put * | Database from appendix 2 or one uploaded to blastdb_custom (see appendix 4) |
PROGRAM | BLAST program | String | Put * | One of blastn, blastp, blastx, tblastn, tblastx. To enable megablast, use PROGRAM=blastn&MEGABLAST=on. |
FILTER | Low complexity filtering | String | Put | F to disable. T or L to enable. Prepend “m” for mask at lookup (e.g., mL) |
FORMAT_TYPE | Report type | String | Put, Get | HTML, Text, XML, XML2, JSON2, or Tabular. HTML is the default. |
EXPECT | Expect value | Double | Put | Number greater than zero. |
NUCL_REWARD | Reward for matching bases (BLASTN and megaBLAST) | Integer | Put | Integer greater than zero. |
NUCL_PENALTY | Cost for mismatched bases (BLASTN and megaBLAST) | Integer | Put | Integer less than zero. |
GAPCOSTS | Gap existence and extension costs | String | Put | Pair of positive integers separated by a space such as “11 1”. |
MATRIX | Scoring matrix name | String | Put | One of BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80, BLOSUM90, PAM250, PAM30 or PAM70. Default: BLOSUM62 for all applicable programs |
HITLIST_SIZE | Number of database sequences to keep | Integer | Put, Get | Integer greater than zero. |
DESCRIPTIONS | Number of descriptions to print (applies to HTML and Text) | Integer | Put, Get | Integer greater than zero. |
ALIGNMENTS | Number of alignments to print (applies to HTML and Text) | Integer | Put, Get | Integer greater than zero. |
NCBI_GI | Show NCBI GIs in report | String | Put, Get | T or F |
RID | BLAST search request identifier | String | Get *, Delete * | The Request ID (RID) returned when the search was submitted |
THRESHOLD | Neighboring score for initial words | Integer | Put | Positive integer (BLASTP default is 11). Does not apply to BLASTN or MegaBLAST. |
WORD_SIZE | Size of word for initial matches | Integer | Put | Positive integer. |
COMPOSITION_BASED_STATISTICS | Composition based statistics algorithm to use | Integer | Put | One of 0, 1, 2, or 3. See comp_based_stats command line option in the BLAST+ user manual for details. |
FORMAT_OBJECT | Object type | String | Get | SearchInfo (status check) or Alignment (report formatting) |
NUM_THREADS | Number of virtual CPUs to use | Integer | Put | Integer greater than zero and less than the maximum number of cores on the instance (default is the maximum number of cores on the instance). Supported only on the cloud. |
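The parameters in the table above are sent as plain key/value fields. The following is a minimal sketch of how a `CMD=Put` parameter set could be assembled and URL-encoded without touching the network. `build_put_params` is a hypothetical helper that is not part of xi_covutils, and the accession is only an example value.

```python
from urllib.parse import urlencode


def build_put_params(query, database, program, megablast=False, **optional):
    """Hypothetical helper: collect the fields of a CMD=Put submission."""
    params = {
        "CMD": "Put",
        "QUERY": query,
        "DATABASE": database,
        "PROGRAM": program,
    }
    if megablast:
        # The table states that megablast is enabled through blastn.
        if program != "blastn":
            raise ValueError("MEGABLAST=on requires PROGRAM=blastn")
        params["MEGABLAST"] = "on"
    # Optional fields such as EXPECT, HITLIST_SIZE or FORMAT_TYPE.
    params.update({key.upper(): str(value) for key, value in optional.items()})
    return params


params = build_put_params("NM_000518", "nt", "blastn", megablast=True, expect=1e-5)
encoded = urlencode(params)
print(encoded)
```

Keeping the parameter assembly separate from the HTTP call makes the validation rules easy to check in isolation.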
Expand source code
"""
Makes Blast API calls
From Blast Api documentation
| variable | value |
| -------- | ----------------- |
| Parameter | QUERY |
| Definition | Search query |
| Type | String |
| CMD | Put * |
| Allowed values | "Accession, GI, or FASTA." |
| Parameter | DATABASE |
| Definition | BLAST database |
| Type | String |
| CMD | Put * |
| Allowed values | Database from appendix 2 or one uploaded to blastdb_custom |
| | (see appendix 4) |
| Parameter | PROGRAM |
| Definition | BLAST program |
| Type | String |
| CMD | Put * |
| Allowed values | One of blastn, blastp, blastx, tblastn, tblastx. To |
| | enable megablast, use PROGRAM=blastn&MEGABLAST=on. |
| Parameter | FILTER |
| Definition | Low complexity filtering |
| Type | String |
| CMD | Put |
| Allowed values | F to disable. T or L to enable. Prepend “m” for mask at |
| | lookup (e.g., mL) |
| Parameter | FORMAT_TYPE |
| Definition | Report type |
| Type | String |
| CMD | "Put, Get" |
| Allowed values | HTML, Text, XML, XML2, JSON2, or Tabular. HTML is the |
| | default. |
| Parameter | EXPECT |
| Definition | Expect value |
| Type | Double |
| CMD | Put |
| Allowed values | Number greater than zero. |
| Parameter | NUCL_REWARD |
| Definition | Reward for matching bases (BLASTN and megaBLAST) |
| Type | Integer |
| CMD | Put |
| Allowed values | Integer greater than zero. |
| Parameter | NUCL_PENALTY |
| Definition | Cost for mismatched bases (BLASTN and megaBLAST) |
| Type | Integer |
| CMD | Put |
| Allowed values | Integer less than zero. |
| Parameter | GAPCOSTS |
| Definition | Gap existence and extension costs |
| Type | String |
| CMD | Put |
| Allowed values | Pair of positive integers separated by a space such as |
| | “11 1”. |
| Parameter | MATRIX |
| Definition | Scoring matrix name |
| Type | String |
| CMD | Put |
| Allowed values | One of BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80,BLOSUM90, |
| | PAM250, PAM30 or PAM70. Default: BLOSUM62 for all |
| | applicable programs |
| Parameter | HITLIST_SIZE |
| Definition | Number of database sequences to keep |
| Type | Integer |
| CMD | "Put,Get" |
| Allowed values | Integer greater than zero. |
| Parameter | DESCRIPTIONS |
| Definition | Number of descriptions to print (applies to HTML and Text) |
| Type | Integer |
| CMD | "Put,Get" |
| Allowed values | Integer greater than zero. |
| Parameter | ALIGNMENTS |
| Definition | Number of alignments to print (applies to HTML and Text) |
| Type | Integer |
| CMD | "Put,Get" |
| Allowed values | Integer greater than zero. |
| Parameter | NCBI_GI |
| Definition | Show NCBI GIs in report |
| Type | String |
| CMD | "Put, Get" |
| Allowed values | T or F |
| Parameter | RID |
| Definition | BLAST search request identifier |
| Type | String |
| CMD | "Get *, Delete *" |
| Allowed values | The Request ID (RID) returned when the search was |
| | submitted |
| Parameter | THRESHOLD |
| Definition | Neighboring score for initial words |
| Type | Integer |
| CMD | Put |
| Allowed values | Positive integer (BLASTP default is 11). Does not apply to |
| | BLASTN or MegaBLAST. |
| Parameter | WORD_SIZE |
| Definition | Size of word for initial matches |
| Type | Integer |
| CMD | Put |
| Allowed values | Positive integer. |
| Parameter | COMPOSITION_BASED_STATISTICS |
| Definition | Composition based statistics algorithm to use |
| Type | Integer |
| CMD | Put |
| Allowed values | One of 0, 1, 2, or 3. See comp_based_stats command line |
| | option in the BLAST+ user manual for details. |
| Parameter | FORMAT_OBJECT |
| Definition | Object type |
| Type | String |
| CMD | Get |
| Allowed values | SearchInfo (status check) or Alignment (report formatting) |
| Parameter | NUM_THREADS |
| Definition | Number of virtual CPUs to use |
| Type | Integer |
| CMD | Put |
| Allowed values | Integer greater than zero and less than the maximum number |
| | of cores on the instance (default is the maximum number of |
| | cores on the instance). Supported only on the cloud. |
"""
from dataclasses import dataclass
import json
import time
from enum import Enum
import re
from typing import Optional, Dict
import requests
from requests.models import Response
qblastinfo_pattern: re.Pattern = re.compile(
    r".*(QBlastInfoBegin.*QBlastInfoEnd).*",
    re.IGNORECASE | re.MULTILINE | re.DOTALL
)


def get_qblast_info(resp: Response) -> Optional[dict[str, str]]:
    """
    Gets the content of a QBlastInfo block in an HTTP response.

    Args:
        resp (Response): A Response instance after calling the NCBI web server.

    Returns:
        Optional[dict[str, str]]: A dictionary with the content of the
        QBlastInfo block, or None if no block is found.
    """
    match = re.match(
        qblastinfo_pattern,
        resp.text
    )
    if match:
        lines = match.group(1).split("\n")
        lines = [line.split("=") for line in lines]
        # Keep only well-formed "key = value" lines.
        lines = [x for x in lines if len(x) == 2]
        result = {
            x[0].strip(): x[1].strip()
            for x in lines
        }
        return result
    return None
class NcbiProgram(Enum):
    """
    The available NCBI programs.
    """
    BLASTN = "blastn"
    BLASTP = "blastp"
    BLASTX = "blastx"
    TBLASTN = "tblastn"
    TBLASTX = "tblastx"


class NcbiDatabase(Enum):
    """
    Available NCBI databases.
    """
    NT = "nt"
    NR = "nr"
    REFSEQ_RNA = "refseq_rna"
    REFSEQ_PROTEIN = "refseq_protein"
    SWISSPROT = "swissprot"
    PDBAA = "pdbaa"
    PDBNT = "pdbnt"
class NcbiBlast:
    """
    The NCBI Blast class to call the API.
    """
    url = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

    def __init__(self) -> None:
        self.megablast: bool = False
        self.program: NcbiProgram = NcbiProgram.BLASTN
        self.database: Optional[NcbiDatabase] = None
        self.rid: Optional[str] = None
        self.built: bool = False
        self.last_response_time: Optional[float] = None
        self.output_buffer: Optional[bytes] = None

    def set_program(self, program: NcbiProgram) -> 'NcbiBlast':
        """
        Sets the NCBI program.

        Args:
            program (NcbiProgram): A program name.

        Returns:
            'NcbiBlast': Returns self, part of the builder pattern.
        """
        self.program = program
        return self

    def set_megablast(self) -> 'NcbiBlast':
        """
        Sets the megablast option. Also sets the program to blastn.

        Returns:
            'NcbiBlast': Returns self, part of the builder pattern.
        """
        self.program = NcbiProgram.BLASTN
        self.megablast = True
        return self

    def unset_megablast(self) -> 'NcbiBlast':
        """
        Unsets the megablast option.

        Returns:
            'NcbiBlast': Returns self, part of the builder pattern.
        """
        self.megablast = False
        return self

    def set_database(self, database: NcbiDatabase) -> 'NcbiBlast':
        """
        Sets the database for the search.

        Args:
            database (NcbiDatabase): An NCBI database.

        Returns:
            'NcbiBlast': Returns self, part of the builder pattern.
        """
        self.database = database
        return self
    def build(self) -> 'NcbiBlast':
        """
        Builds a NcbiBlast object with all the information set.

        Returns:
            'NcbiBlast': Returns self after validation, part of the builder
            pattern.

        Throws:
            ValueError: If no database is set, or if megablast is combined
            with a program other than blastn.
        """
        if not self.database:
            raise ValueError("No database specified")
        if self.megablast and self.program != NcbiProgram.BLASTN:
            raise ValueError("Megablast can only be set with blastn program")
        self.built = True
        return self
    def query(self, query: str) -> tuple[str, str]:
        """
        Submits a query to the NCBI server and returns the job id and RTOE value.

        Args:
            query (str): A GI, accession or sequence.

        Returns:
            tuple[str, str]: The RID (job id) and RTOE (estimated time to job
            completion).

        Throws:
            ValueError: If the database is not set or no job data is available.
            HTTPError: If the request could not be processed.
        """
        if not self.database:
            raise ValueError("Database not set")
        pars = {
            "PROGRAM": self.program.value,
            "DATABASE": self.database.value,
            "CMD": "Put",
            "QUERY": query
        }
        if self.megablast:
            pars["MEGABLAST"] = "on"
        resp = requests.put(NcbiBlast.url, params=pars, timeout=120)
        resp.raise_for_status()
        job_data = get_qblast_info(resp)
        if not job_data:
            raise ValueError("There was no job data available in NCBI response")
        self.rid = job_data["RID"]
        self.last_response_time = time.time()
        return (job_data["RID"], job_data["RTOE"])
    @staticmethod
    def job_is_ready(rid: str) -> bool:
        """
        Checks if a job is ready.

        Args:
            rid (str): The job id.

        Returns:
            bool: True if the job is ready.
        """
        pars = {
            "CMD": "Get",
            "FORMAT_OBJECT": "SearchInfo",
            "RID": rid
        }
        resp = requests.get(NcbiBlast.url, pars, timeout=120)
        job_data = get_qblast_info(resp)
        if not job_data:
            return False
        # Only the first key/value pair of the QBlastInfo block is inspected.
        status_key, status_value = next(iter(job_data.items()))
        status_key = status_key.upper()
        status_value = status_value.upper()
        result = False
        if status_key == "STATUS" and status_value == "WAITING":
            result = False
        if status_key == "STATUS" and status_value == "UNKNOWN":
            result = False
        if status_key == "STATUS" and status_value == "READY":
            result = True
        if status_key == "THEREAREHITS" and status_value == 'YES':
            result = True
        if status_key == "THEREAREHITS" and status_value == 'NO':
            result = False
        return result
    def fetch_results(self) -> bool:
        """
        Fetches the results of the completed job.
        Does not check that the job has actually finished.

        Returns:
            bool: True if the request was successful.
        """
        pars = {
            "CMD": "GET",
            "RID": self.rid,
            "FORMAT_TYPE": "JSON2"
        }
        resp = requests.get(
            NcbiBlast.url,
            params=pars,
            timeout=120
        )
        try:
            resp.raise_for_status()
        except requests.HTTPError:
            return False
        self.output_buffer = resp.content
        return True
    def wait_until_finnish(self, max_time: float = 5) -> bool:
        """
        Waits until the job is done or has failed.
        Checks whether the job has completed every 60 seconds.

        Args:
            max_time (float): Maximum waiting time, in minutes.

        Returns:
            bool: True if the job is complete. False if there was an error or
            the maximum waiting time was reached.
        """
        if not self.last_response_time or not self.rid:
            return False
        max_time = max_time * 60
        initial_time = self.last_response_time
        while True:
            current_time = time.time()
            if current_time - initial_time > max_time:
                return False
            # Seconds left until 60 s have passed since the last request.
            next_time = self.last_response_time + 60.0 - current_time
            if next_time > 0:
                print(next_time)
                time.sleep(next_time)
            self.last_response_time = time.time()
            if NcbiBlast.job_is_ready(self.rid):
                return self.fetch_results()
    def get_output_buffer(self) -> bytes:
        """
        Returns the buffer with the downloaded output.

        Returns:
            bytes: The output buffer data if there is any, otherwise an empty
            bytes object.
        """
        if not self.output_buffer:
            return bytes()
        return self.output_buffer

    def write_results(self, outfile: str) -> int:
        """
        Writes the output buffer to disk.

        Args:
            outfile (str): Output file to write.

        Returns:
            int: The number of bytes written.

        Throws:
            ValueError: If the buffer is empty.
            OSError: If the file is not writable.
        """
        if not self.output_buffer:
            raise ValueError("No data in output buffer")
        with open(outfile, 'wb') as f_out:
            return f_out.write(self.output_buffer)
@dataclass
class BlastHit:
    """
    Blast hit class. Contains information about individual blast hit results.
    """
    def __init__(self, hit_data: Dict):
        self.num = hit_data["num"]
        self.description = [
            BlastDescription(desc) for desc in hit_data["description"]
        ]
        self.len = hit_data["len"]
        self.hsps = [BlastHsp(hsp) for hsp in hit_data["hsps"]]


@dataclass
class BlastDescription:
    """
    Blast Description class. Contains information about a sequence in a
    blast hit.
    """
    def __init__(self, desc_data: Dict):
        self.identifier = desc_data["id"]
        self.accession = desc_data["accession"]
        self.title = desc_data["title"]
        self.taxid = desc_data["taxid"]
        self.sciname = desc_data["sciname"]


@dataclass
class BlastHsp:
    """
    Blast High Scoring Pair. Contains information about the exact match between
    the query sequence and the subject sequence.
    """
    #pylint: disable=too-many-instance-attributes
    def __init__(self, hsp_data: Dict):
        self.num = hsp_data["num"]
        self.bit_score = hsp_data["bit_score"]
        self.score = hsp_data["score"]
        self.evalue = hsp_data["evalue"]
        self.identity = hsp_data["identity"]
        self.query_from = hsp_data["query_from"]
        self.query_to = hsp_data["query_to"]
        self.query_strand = hsp_data["query_strand"]
        self.hit_from = hsp_data["hit_from"]
        self.hit_to = hsp_data["hit_to"]
        self.hit_strand = hsp_data["hit_strand"]
        self.align_len = hsp_data["align_len"]
        self.gaps = hsp_data["gaps"]
        self.qseq = hsp_data["qseq"]
        self.hseq = hsp_data["hseq"]
        self.midline = hsp_data["midline"]
@dataclass
class BlastStats:
    """
    Blast Stats class. Contains information about general statistics of a
    blast search.
    """
    def __init__(self, stats_data: Dict):
        self.db_num = stats_data["db_num"]
        self.db_len = stats_data["db_len"]
        self.hsp_len = stats_data["hsp_len"]
        self.eff_space = stats_data["eff_space"]
        self.kappa = stats_data["kappa"]
        self.lambda_ = stats_data["lambda"]
        self.entropy = stats_data["entropy"]


@dataclass
class BlastResults:
    """
    Blast Results class. Contains information about the results of a blast
    search, including statistics, hits and HSPs.
    """
    def __init__(self, search_data: Dict):
        self.query_id = search_data["query_id"]
        self.query_len = search_data["query_len"]
        self.hits = [BlastHit(hit) for hit in search_data["hits"]]
        self.stat = BlastStats(search_data["stat"])


@dataclass
class BlastParams:
    """
    Blast Parameters class. Contains information about specific
    search parameters used in the blast search.
    """
    def __init__(self, params_data: Dict):
        self.expect = params_data["expect"]
        self.sc_match = params_data["sc_match"]
        self.sc_mismatch = params_data["sc_mismatch"]
        self.gap_open = params_data["gap_open"]
        self.gap_extend = params_data["gap_extend"]
        self.filter = params_data["filter"]


@dataclass
class BlastReport:
    """
    Blast Report class. Contains general information about the blast program and
    the database used for the blast search.
    """
    def __init__(self, report_data: Dict):
        self.program = report_data["program"]
        self.version = report_data["version"]
        self.reference = report_data["reference"]
        self.search_target = report_data["search_target"]
        self.params = BlastParams(report_data["params"])
@dataclass
class BlastOutput:
    """
    Blast output class. The main class to navigate a blast output.
    """
    def __init__(
        self,
        report: BlastReport,
        params: BlastParams,
        results: BlastResults
    ):
        self.report = report
        self.params = params
        self.results = results

    @staticmethod
    def from_json(filename: str) -> 'BlastOutput':
        """
        Creates a blast output object from a JSON file.

        Args:
            filename (str): The input JSON file.

        Returns:
            BlastOutput: A blast output object.
        """
        with open(filename, 'r', encoding="utf-8") as f_in:
            data = json.load(f_in)
        report_data = data['BlastOutput2']['report']
        params_data = report_data['params']
        results_data = report_data['results']['search']
        report = BlastReport(report_data)
        params = BlastParams(params_data)
        results = BlastResults(results_data)
        return BlastOutput(report, params, results)
Functions
def get_qblast_info(resp: requests.models.Response) ‑> Optional[dict]
-
Gets the content of a QBlastInfo block in an HTTP response.
Args
resp
:Response
- A Response instance after calling the NCBI web server.
Returns
Optional[dict[str, str]]
- A dictionary with the content of the QBlastInfo Block.
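To illustrate the kind of parsing described here, the following is a minimal, network-free sketch that extracts `key = value` pairs from a QBlastInfo block. The response text, RID and RTOE values are synthetic, and `parse_qblast_info` is a hypothetical stand-in rather than the module's own function. It splits on the first `=` only, which differs slightly from the source.

```python
import re

# Synthetic response body; the RID and RTOE values are made up.
SAMPLE = """<!--QBlastInfoBegin
    RID = ABC123XYZ
    RTOE = 25
QBlastInfoEnd
-->"""

# Non-greedy capture of everything between the two markers.
BLOCK = re.compile(r"QBlastInfoBegin(.*?)QBlastInfoEnd", re.DOTALL)


def parse_qblast_info(text):
    """Return the key/value pairs of the first QBlastInfo block, or None."""
    match = BLOCK.search(text)
    if match is None:
        return None
    pairs = (line.split("=", 1) for line in match.group(1).splitlines())
    # Lines without "=" (for example blank lines) are skipped.
    return {p[0].strip(): p[1].strip() for p in pairs if len(p) == 2}


info = parse_qblast_info(SAMPLE)
print(info)
```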
Classes
class BlastDescription (desc_data: Dict)
-
Blast Description class. Contains information about a sequence in a blast hit.
class BlastHit (hit_data: Dict)
-
Blast hit class. Contains information about individual blast hit results.
class BlastHsp (hsp_data: Dict)
-
Blast High Scoring Pair. Contains information about the exact match between the query sequence and the subject sequence.
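Two quantities commonly derived from an HSP are the percent identity and the length of the query span. A minimal sketch using the field names read by BlastHsp follows. The record is synthetic, the helper names are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
# Synthetic HSP record using field names read by BlastHsp.
hsp = {
    "identity": 95,
    "align_len": 100,
    "gaps": 2,
    "query_from": 11,
    "query_to": 108,
}


def percent_identity(hsp_data):
    """Identical positions as a percentage of the alignment length."""
    return 100.0 * hsp_data["identity"] / hsp_data["align_len"]


def query_span(hsp_data):
    """Number of query positions covered (assumes 1-based, inclusive ends)."""
    # abs() keeps the result positive if the coordinates are reversed.
    return abs(hsp_data["query_to"] - hsp_data["query_from"]) + 1


print(percent_identity(hsp), query_span(hsp))
```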
class BlastOutput (report: BlastReport, params: BlastParams, results: BlastResults)
-
Blast output class. The main class to navigate a blast output.
Static methods
def from_json(filename: str) ‑> BlastOutput
-
Creates a blast output object from a JSON file.
Args
filename
:str
- The input JSON file.
Returns
BlastOutput
- A blast output object.
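As a rough illustration of the layout that from_json walks, the sketch below follows the same key path (`BlastOutput2` → `report` → `results` → `search`) on a synthetic, heavily trimmed document. All values are invented, and real reports contain many more fields.

```python
import json

# Synthetic, trimmed document following the key path used by from_json.
DOC = json.loads("""
{"BlastOutput2": {"report": {
  "program": "blastn",
  "params": {"expect": 10},
  "results": {"search": {
    "query_id": "Query_1", "query_len": 120,
    "hits": [{"num": 1, "len": 500,
              "description": [{"accession": "X00001", "title": "example"}],
              "hsps": [{"num": 1, "evalue": 1e-20, "bit_score": 180.5}]}]
  }}
}}}
""")

search = DOC["BlastOutput2"]["report"]["results"]["search"]

# For each hit, pair the first accession with the best (lowest) e-value.
best = [
    (hit["description"][0]["accession"], min(h["evalue"] for h in hit["hsps"]))
    for hit in search["hits"]
]
print(search["query_id"], best)
```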
class BlastParams (params_data: Dict)
-
Blast Parameters class. Contains information about specific search parameters used in the blast search.
class BlastReport (report_data: Dict)
-
Blast Report class. Contains general information about the blast program and the database used for the blast search.
class BlastResults (search_data: Dict)
-
Blast Results class. Contains information about the results of a blast search, including statistics, hits and HSPs.
class BlastStats (stats_data: Dict)
-
Blast Stats class. Contains information about general statistics of a blast search.
class NcbiBlast
-
The NCBI Blast Class to call the API.
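The setter methods of this class return self, so calls can be chained before build() validates the configuration. The following is a minimal stand-in sketch of that flow, assuming plain strings instead of the NcbiProgram and NcbiDatabase enums. `SearchBuilder` is a hypothetical name and is not part of xi_covutils.

```python
from typing import Optional


class SearchBuilder:
    """Hypothetical stand-in mirroring the setter/build flow of NcbiBlast."""

    def __init__(self) -> None:
        self.program = "blastn"
        self.database: Optional[str] = None
        self.megablast = False
        self.built = False

    def set_program(self, program: str) -> "SearchBuilder":
        self.program = program
        return self  # returning self is what allows chaining

    def set_database(self, database: str) -> "SearchBuilder":
        self.database = database
        return self

    def set_megablast(self) -> "SearchBuilder":
        self.program = "blastn"  # megablast implies blastn
        self.megablast = True
        return self

    def build(self) -> "SearchBuilder":
        # Validate the accumulated settings before marking the object usable.
        if not self.database:
            raise ValueError("No database specified")
        if self.megablast and self.program != "blastn":
            raise ValueError("Megablast can only be set with blastn program")
        self.built = True
        return self


search = SearchBuilder().set_megablast().set_database("nt").build()
print(search.program, search.database, search.built)
```

Deferring validation to build() lets the setters be called in any order, at the cost of errors surfacing only at the end of the chain.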
Class variables
var url
Static methods
def job_is_ready(rid: str) ‑> bool
-
Check if a job is ready.
Args
rid
:str
- The Job Id.
Returns
bool
- True if the job is ready.
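The readiness check can be sketched on an already parsed QBlastInfo dictionary. The variant below is an assumption-laden alternative, not the module's implementation: it looks up the `Status` and `ThereAreHits` keys by name instead of inspecting only the first entry, and `job_state` is a hypothetical name.

```python
def job_state(info):
    """Return (is_ready, has_hits) for a parsed QBlastInfo dictionary."""
    # Normalise keys and values so the comparison is case-insensitive.
    normalized = {
        key.strip().upper(): value.strip().upper()
        for key, value in (info or {}).items()
    }
    status = normalized.get("STATUS", "UNKNOWN")
    has_hits = normalized.get("THEREAREHITS") == "YES"
    return status == "READY", has_hits


print(job_state({"Status": "READY", "ThereAreHits": "yes"}))
print(job_state({"Status": "WAITING"}))
```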
Methods
def build(self) ‑> NcbiBlast
-
Builds a NcbiBlast object with all the information set.
Returns
'NcbiBlast': Returns self after validation, part of the builder pattern.
Throws
ValueError: If no database is set, or if megablast is combined with a program other than blastn.
def fetch_results(self) ‑> bool
-
Fetches the results of the completed job. Does not check that the job has actually finished.
Returns
bool
- True if the request was successful.
Expand source code
def fetch_results(self) -> bool:
    """
    Fetches the results of a completed job. Does not check that the job has
    actually finished.

    Returns:
        bool: True if the request was successful.
    """
    pars = {
        "CMD": "GET",
        "RID": self.rid,
        "FORMAT_TYPE": "JSON2"
    }
    resp = requests.get(
        NcbiBlast.url,
        params=pars,
        timeout=120
    )
    try:
        resp.raise_for_status()
    except requests.HTTPError:
        return False
    self.output_buffer = resp.content
    return True
def get_output_buffer(self) ‑> bytes
-
Returns the buffer with the downloaded output.
Returns
bytes
- The output buffer data if there is any.
Expand source code
def get_output_buffer(self) -> bytes:
    """
    Returns the buffer with the downloaded output.

    Returns:
        bytes: The output buffer data if there is any.
    """
    if not self.output_buffer:
        return bytes()
    return self.output_buffer
def query(self, query: str) ‑> tuple
-
Submits a query to NCBI server and returns the job id and rtoe value.
Args
query
:str
- A GI, Accession or sequence.
Returns
tuple[str, str]
- The RID (job id) and RTOE (estimated time to job completion)
Throws
ValueError: If the database is not set or no job data is available.
HTTPError: If the request could not be processed.
Expand source code
def query(self, query:str) -> tuple[str, str]:
    """
    Submits a query to NCBI server and returns the job id and rtoe value.

    Args:
        query (str): A GI, Accession or sequence.

    Returns:
        tuple[str, str]: The RID (job id) and RTOE (estimated time to job
            completion)

    Throws:
        ValueError: If the database is not set or no job data is available.
        HTTPError: If the request could not be processed.
    """
    if not self.database:
        raise ValueError("Database not set")
    pars = {
        "PROGRAM": self.program.value,
        "DATABASE": self.database.value,
        "CMD": "Put",
        "QUERY": query
    }
    if self.megablast:
        pars["MEGABLAST"] = "on"
    resp = requests.put(NcbiBlast.url, params=pars, timeout=120)
    resp.raise_for_status()
    job_data = get_qblast_info(resp)
    if not job_data:
        raise ValueError("There was no job data available in NCBI response")
    self.rid = job_data["RID"]
    self.last_response_time = time.time()
    return (job_data["RID"], job_data["RTOE"])
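The request parameters assembled in query() can be inspected without sending anything. The sketch below rebuilds the same dictionary in a standalone function and URL-encodes it with the standard library. The function name `build_put_params` and the short nucleotide string used as the query are illustrative assumptions.

```python
from urllib.parse import urlencode


def build_put_params(program: str, database: str, query: str,
                     megablast: bool = False) -> dict[str, str]:
    """Assemble the parameters of a 'Put' request, as query() does."""
    pars = {
        "PROGRAM": program,
        "DATABASE": database,
        "CMD": "Put",
        "QUERY": query,
    }
    # Megablast is an extra flag on top of PROGRAM=blastn.
    if megablast:
        pars["MEGABLAST"] = "on"
    return pars


pars = build_put_params("blastn", "nt", "ACGTACGTACGT", megablast=True)
print(urlencode(pars))
# PROGRAM=blastn&DATABASE=nt&CMD=Put&QUERY=ACGTACGTACGT&MEGABLAST=on
```

The encoded string matches the form given in the parameter table, where megablast is enabled with PROGRAM=blastn&MEGABLAST=on.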
def set_database(self, database: NcbiDatabase) ‑> NcbiBlast
-
Set the database for the search.
Args
database
:NcbiDatabase
- An Ncbi database.
Returns
'NcbiBlast': Returns self, part of the builder pattern.
Expand source code
def set_database(self, database:NcbiDatabase) -> 'NcbiBlast':
    """
    Set the database for the search.

    Args:
        database (NcbiDatabase): An Ncbi database.

    Returns:
        'NcbiBlast': Returns self, part of the builder pattern.
    """
    self.database = database
    return self
def set_megablast(self) ‑> NcbiBlast
-
Sets megablast option.
Returns
'NcbiBlast': Returns self, part of the builder pattern.
Expand source code
def set_megablast(self) -> 'NcbiBlast':
    """
    Sets megablast option.

    Returns:
        'NcbiBlast': Returns self, part of the builder pattern.
    """
    self.program = NcbiProgram.BLASTN
    self.megablast = True
    return self
def set_program(self, program: NcbiProgram) ‑> NcbiBlast
-
Sets the NCBI program.
Args
program
:NcbiProgram
- A program name.
Returns
'NcbiBlast': Returns self, part of the builder pattern.
Expand source code
def set_program(self, program: NcbiProgram) -> 'NcbiBlast':
    """
    Sets the NCBI program.

    Args:
        program (NcbiProgram): A program name.

    Returns:
        'NcbiBlast': Returns self, part of the builder pattern.
    """
    self.program = program
    return self
def unset_megablast(self) ‑> NcbiBlast
-
Unsets the megablast option.
Returns
'NcbiBlast': Returns self, part of the builder pattern.
Expand source code
def unset_megablast(self) -> 'NcbiBlast':
    """
    Unsets the megablast option.

    Returns:
        'NcbiBlast': Returns self, part of the builder pattern.
    """
    self.megablast = False
    return self
def wait_until_finnish(self, max_time: float = 5) ‑> bool
-
Waits until the job is done or has failed. Checks every 60 seconds whether the job has completed.
Args
max_time
:float
- Max time of waiting, in minutes.
Returns
bool
- True if the job is complete. False if there was an error or the maximum waiting time was reached.
Expand source code
def wait_until_finnish(self, max_time:float=5) -> bool:
    """
    Waits until the job is done or has failed. Checks every 60 seconds
    whether the job has completed.

    Args:
        max_time (float): Max time of waiting, in minutes.

    Returns:
        bool: True if the job is complete. False if there was an error or
            the maximum waiting time was reached.
    """
    if not self.last_response_time or not self.rid:
        return False
    # Convert the limit from minutes to seconds.
    max_time = max_time * 60
    initial_time = self.last_response_time
    while True:
        current_time = time.time()
        if current_time - initial_time > max_time:
            return False
        # Seconds left until 60 seconds have passed since the last check.
        next_time = self.last_response_time + 60.0 - current_time
        if next_time > 0:
            print(next_time)
            time.sleep(next_time)
        self.last_response_time = time.time()
        if NcbiBlast.job_is_ready(self.rid):
            return self.fetch_results()
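The polling loop above is easier to reason about when the clock and the sleep function are injectable. The sketch below is a simplified, hypothetical variant (`poll_until` is not part of the module). It assumes the timer starts when the function is called rather than at the last server response, and it returns the readiness flag directly instead of calling fetch_results.

```python
import time
from typing import Callable


def poll_until(is_ready: Callable[[], bool],
               max_minutes: float = 5,
               interval: float = 60.0,
               clock: Callable[[], float] = time.time,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready every `interval` seconds for up to `max_minutes`."""
    start = clock()
    last_check = start
    while True:
        now = clock()
        # Give up once the overall time budget is exhausted.
        if now - start > max_minutes * 60:
            return False
        # Sleep only for the remainder of the current interval.
        wait = last_check + interval - now
        if wait > 0:
            sleep(wait)
        last_check = clock()
        if is_ready():
            return True
```

Passing a fake clock and a fake sleep function lets the timing logic run instantly in a test, with no real waiting and no network access.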
def write_results(self, outfile: str) ‑> int
-
Writes output buffer to disk.
Args
outfile
:str
- Output file to write.
Throws
ValueError: If buffer is empty.
OSError: If file is not writable.
Expand source code
def write_results(self, outfile:str) -> int:
    """
    Writes output buffer to disk.

    Args:
        outfile (str): Output file to write.

    Throws:
        ValueError: If buffer is empty.
        OSError: If file is not writable.
    """
    if not self.output_buffer:
        raise ValueError("No data in output buffer")
    with open(outfile, 'wb') as f_out:
        return f_out.write(self.output_buffer)
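The integer returned by write_results comes from the binary file object's write call, which reports the number of bytes written. A minimal standalone sketch of the same behaviour follows; `write_buffer` is a hypothetical helper and the byte string is made-up sample data.

```python
import os
import tempfile


def write_buffer(buffer: bytes, outfile: str) -> int:
    """Write a bytes buffer to disk and return the number of bytes written."""
    if not buffer:
        raise ValueError("No data in output buffer")
    with open(outfile, "wb") as f_out:
        return f_out.write(buffer)


# Write sample data to a temporary directory and compare sizes.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.json")
    written = write_buffer(b'{"sample": true}', path)
    print(written == os.path.getsize(path))
```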
class NcbiDatabase (value, names=None, *, module=None, qualname=None, type=None, start=1)
-
Available NCBI databases.
Expand source code
class NcbiDatabase(Enum):
    """
    Available NCBI databases.
    """
    NT = "nt"
    NR = "nr"
    REFSEQ_RNA = "refseq_rna"
    REFSEQ_PROTEIN = "refseq_protein"
    SWISSPROT = "swissprot"
    PDBAA = "pdbaa"
    PDBNT = "pdbnt"
Ancestors
- enum.Enum
Class variables
var NR
var NT
var PDBAA
var PDBNT
var REFSEQ_PROTEIN
var REFSEQ_RNA
var SWISSPROT
class NcbiProgram (value, names=None, *, module=None, qualname=None, type=None, start=1)
-
The available NCBI programs.
Expand source code
class NcbiProgram(Enum):
    """
    The available NCBI programs.
    """
    BLASTN = "blastn"
    BLASTP = "blastp"
    BLASTX = "blastx"
    TBLASTN = "tblastn"
    TBLASTX = "tblastx"
Ancestors
- enum.Enum
Class variables
var BLASTN
var BLASTP
var BLASTX
var TBLASTN
var TBLASTX
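Because both enums carry the string sent to the server as their value, a member can be looked up from user-supplied text with the standard Enum call syntax. The sketch below redeclares NcbiProgram so that it runs on its own. The helper `program_from_name` is hypothetical and not part of the module.

```python
from enum import Enum


class NcbiProgram(Enum):
    """Local copy of the enum, repeated here to keep the sketch standalone."""
    BLASTN = "blastn"
    BLASTP = "blastp"
    BLASTX = "blastx"
    TBLASTN = "tblastn"
    TBLASTX = "tblastx"


def program_from_name(name: str) -> NcbiProgram:
    """Map free-form text such as ' BLASTP ' to an NcbiProgram member."""
    try:
        # Enum lookup by value raises ValueError for unknown names.
        return NcbiProgram(name.strip().lower())
    except ValueError as err:
        valid = ", ".join(p.value for p in NcbiProgram)
        raise ValueError(
            f"Unknown program {name!r}; expected one of: {valid}"
        ) from err
```

The same approach applies to NcbiDatabase, for example when the database name comes from a command-line argument.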