Demo module
- demo.fetch_api_data(request_url, params=None)[source]
Fetches data from an API.
- Parameters:
request_url (str) – The URL of the API endpoint.
params (dict, optional) – Parameters to be sent with the request. Defaults to None.
- Returns:
JSON response from the API.
- Return type:
dict
- Raises:
Exception – If there’s an issue with the request or parsing the response.
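As an illustration of the contract described above (URL plus optional parameters in, decoded JSON out, an exception on request or parsing problems), here is a minimal standard-library sketch. The name `fetch_json_sketch`, the `timeout` argument, and the use of `urllib` are assumptions made for this example; the actual `demo.fetch_api_data` implementation may differ.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json_sketch(request_url, params=None, timeout=30):
    """Hypothetical helper: GET a URL and decode the body as JSON."""
    if params:
        # Append the parameters as a query string, respecting an existing one.
        separator = "&" if "?" in request_url else "?"
        request_url = f"{request_url}{separator}{urlencode(params)}"
    try:
        with urlopen(request_url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as err:
        # URLError is an OSError; json.JSONDecodeError is a ValueError.
        raise RuntimeError(f"Could not fetch or parse {request_url}") from err
```

Wrapping both network and decoding errors in a single exception type mirrors the single `Exception` entry in the Raises section.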
- demo.get_api_params(demo_number=0, use_gene_list=False, gene_list=None, custom_params=None, api_url=None)[source]
Generates API request parameters and URL based on the specified demo number, gene list usage, and custom parameters.
- Parameters:
demo_number (int) – The demo number specifying which example query to use.
use_gene_list (bool) – Whether to use a predefined gene list for the query.
gene_list (list, optional) – List of genes to use for the query. Defaults to None.
custom_params (dict, optional) – Custom API parameters to override or supplement defaults. Defaults to None.
api_url (str, optional) – Custom API URL to fetch the data from. Defaults to None.
- Returns:
A tuple containing the API parameters and request URL.
- Return type:
tuple
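The following sketch shows one way a `(params, url)` tuple of this kind could be assembled, with custom parameters overriding defaults. The function name, the default parameter values, and the default endpoint (taken from the example URL in the `run_demo` examples) are assumptions for illustration only; the real demo queries are not listed in this reference.

```python
from urllib.parse import urlencode

# Hypothetical defaults, not taken from the GraphPack source.
DEFAULT_API_URL = "https://string-db.org/api/json/interaction_partners"
DEFAULT_PARAMS = {"required_score": 400}


def build_request_sketch(gene_list, custom_params=None, api_url=None):
    """Return (params, url); custom_params override the defaults."""
    params = dict(DEFAULT_PARAMS)
    # STRING separates multiple identifiers with a carriage return (%0d).
    params["identifiers"] = "\r".join(gene_list)
    if custom_params:
        params.update(custom_params)
    return params, api_url or DEFAULT_API_URL


params, url = build_request_sketch(["TP53", "BRCA1"], {"required_score": 990})
query_string = urlencode(params)  # percent-encodes the carriage return
```

Copying the defaults before updating them keeps the module-level dictionary unchanged between calls.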
- demo.get_example_graph(demo_number=0, use_gene_list=False, gene_list=None, custom_params=None, api_url=None)[source]
Retrieves and parses an example graph from the STRING database based on the specified demo number, gene list usage, and additional API parameters.
- Parameters:
demo_number (int) – The demo number specifying which example query to use.
use_gene_list (bool) – Whether to use a predefined gene list for the query.
gene_list (list, optional) – List of genes to use for the query. Defaults to None.
custom_params (dict, optional) – Custom API parameters to override or supplement defaults. Defaults to None.
api_url (str, optional) – Custom API URL to fetch the data from. Defaults to None.
- Returns:
A NetworkX graph representing the retrieved data.
- Return type:
nx.Graph
- demo.parse_args()[source]
Parse command-line arguments for the graph compression demo script.
Command-line arguments:
- Parameters:
--demo (int) – Demo number to run (0-6). Specifies which example query to use for fetching data from the STRING database.
--use-gene-list (bool) – Flag to use a predefined gene list for the query.
--gene-list (list of str) – List of genes to use for the query. Ignored if --use-gene-list is specified.
--custom-params (list of str) – Custom parameters to be sent with the API request. Only applicable when --use-gene-list is set. Should be in the format ‘key1=value1 key2=value2’.
--api-url (str) – Custom API URL to fetch the data from. If not specified, the default STRING API URL is used.
--input (str) – Path to the input file. Default is “data/input/simple_graph.txt”.
--output (str) – Path to the output folder. Default is “data/output”.
--output-format (str) – Output file format. Options: ‘.edgelist’, ‘.txt’, ‘.csv’, ‘.tsv’, ‘.json’, ‘.gpickle’, ‘.gml’, ‘.graphml’, ‘.net’, ‘.pajek’, ‘.gexf’, ‘.yaml’, ‘.yml’. Default is ‘txt’.
--resolution (float) – Resolution parameter for Louvain and greedy methods.
--k (int) – Number of clusters for clustering methods. Only applicable for methods requiring a cluster count (e.g., ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘nmf’).
--is-weighted (bool) – Flag to indicate the use of a weighted graph.
--is-gene-network (bool) – Flag to indicate the use of a gene network.
--is-lossless (bool) – Flag to indicate the use of lossless compression.
--no-plot (bool) – Flag to disable plotting the original and compressed graphs. Default is False.
--no-interactive-plot (bool) – Flag to disable plotting the graph in interactive mode.
--plot-disconnected (bool) – Flag to plot all nodes in a disconnected graph, not just the largest connected component.
--method (list of str) – List of compression methods to run. Options: ‘louvain’, ‘greedy’ (default), ‘label_propagation’, ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘cpm’, ‘nmf’. If ‘all’, all the implemented compression methods will be used.
- Returns:
Parsed command-line arguments.
- Return type:
args (argparse.Namespace)
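To make the flag semantics above concrete, here is a reduced `argparse` sketch covering a handful of the listed options, plus a helper that turns the `key1=value1 key2=value2` form of `--custom-params` into a dictionary. Both function names are hypothetical, and the parser is only a guess at how these options might be declared, not the script's actual parser.

```python
import argparse


def build_parser_sketch():
    """Hypothetical parser covering a subset of the documented flags."""
    parser = argparse.ArgumentParser(description="Graph compression demo (sketch)")
    parser.add_argument("--demo", type=int, choices=range(0, 7), default=None)
    parser.add_argument("--use-gene-list", action="store_true")
    parser.add_argument("--gene-list", nargs="+", default=None)
    parser.add_argument("--custom-params", nargs="+", default=None)
    parser.add_argument("--k", type=int, default=None)
    parser.add_argument("--method", nargs="+", default=["greedy"])
    return parser


def params_to_dict(pairs):
    """Turn ['key1=value1', 'key2=value2'] into {'key1': 'value1', ...}."""
    result = {}
    for pair in pairs or []:
        key, separator, value = pair.partition("=")
        if not separator or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key] = value  # values stay strings; convert as needed
    return result
```

For example, `--custom-params required_score=100 limit=5` would yield `{"required_score": "100", "limit": "5"}` with this helper.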
- demo.parse_graph_data(json_data)[source]
Parses data from the STRING API response into a NetworkX graph.
- Parameters:
json_data (list) – List of dictionaries containing data from the API response.
- Returns:
A NetworkX graph representing the parsed data.
- Return type:
nx.Graph
- Raises:
ValueError – If the JSON data is invalid or missing required fields.
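The parsing step can be sketched without any dependency by first turning the API records into weighted edge tuples. The field names `preferredName_A`, `preferredName_B`, and `score` are assumed here to match the STRING JSON output, and `records_to_edges_sketch` is a hypothetical name; the actual `demo.parse_graph_data` may read different fields or attach more attributes.

```python
# Assumed field names for each STRING interaction record.
REQUIRED_FIELDS = ("preferredName_A", "preferredName_B", "score")


def records_to_edges_sketch(json_data):
    """Validate API records and return (node_a, node_b, weight) tuples."""
    if not isinstance(json_data, list):
        raise ValueError("expected a list of interaction records")
    edges = []
    for index, record in enumerate(json_data):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not a dictionary")
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record {index} is missing fields: {missing}")
        edges.append(
            (record["preferredName_A"], record["preferredName_B"], float(record["score"]))
        )
    return edges
```

The resulting list has the shape accepted by NetworkX's `Graph.add_weighted_edges_from`, so building the graph from it is a single call on an empty `nx.Graph`.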
- demo.run_demo(demo=None, use_gene_list=False, gene_list=None, custom_params=None, api_url=None, input='data/input/simple_graph.txt', output='data/output', output_format='txt', resolution=None, k=None, is_weighted=False, is_gene_network=False, is_lossless=False, no_plot=False, no_interactive_plot=False, plot_disconnected=False, separate_communities=False, method=['greedy'])[source]
Main function for the GraphPack tool demo script.
This script demonstrates the usage of the GraphPack tool by fetching data from the STRING database, parsing it into a NetworkX graph, and performing various compression methods on the graph.
- Parameters:
demo (int, optional) – Demo number to run (0-6). Specifies which example query to use for fetching data from the STRING database.
use_gene_list (bool, optional) – Flag to use a predefined gene list for the query.
gene_list (list of str, optional) – List of genes to use for the query.
custom_params (dict, optional) – Custom parameters to be sent with the API request. Should be in the format ‘{“key1”: “value1”, “key2”: “value2”}’.
api_url (str, optional) – Custom API URL to fetch the data from.
input (str, optional) – Path to the input file. Default is “data/input/simple_graph.txt”.
output (str, optional) – Path to the output folder. Default is “data/output”.
output_format (str) – File format to save the network files. Options: ‘.edgelist’, ‘.txt’ (default), ‘.csv’, ‘.tsv’, ‘.json’, ‘.gpickle’, ‘.gml’, ‘.graphml’, ‘.net’, ‘.pajek’, ‘.gexf’, ‘.yaml’, ‘.yml’.
resolution (float, optional) – Resolution parameter for Louvain and greedy methods.
k (int, optional) – Number of clusters for clustering methods. Only applicable for methods requiring a cluster count (e.g., ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘nmf’).
is_weighted (bool, optional) – Flag to indicate the use of a weighted graph.
is_gene_network (bool, optional) – Flag to indicate the use of a gene network.
is_lossless (bool, optional) – Flag to indicate the use of lossless compression.
no_plot (bool, optional) – Flag to disable plotting the original and compressed graphs. Default is False.
no_interactive_plot (bool, optional) – Flag to disable plotting the graph in interactive mode.
plot_disconnected (bool, optional) – Flag to plot all nodes in a disconnected graph, not just the largest connected component.
separate_communities (bool, optional) – Flag to enforce separation of identified communities in the plots.
method (list of str, optional) – List of compression methods to run. Options: ‘louvain’, ‘greedy’ (default), ‘label_propagation’, ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘cpm’, ‘nmf’.
- Returns:
None
Examples
>>> from graphpack.demo.demo import *
>>> # Run a default demo, greedy method
>>> run_demo(demo=5, input='./', output='results')
>>> # Run a demo with predefined gene list, weighted, non-default method and parameters
>>> run_demo(demo=3, use_gene_list=True, method=['hclust'], k=11, is_gene_network=True, is_weighted=True, input='./', output='results')
>>> # Run demo pipeline with custom gene list, multiple methods
>>> run_demo(use_gene_list=True, gene_list=['TP53', 'BRCA1', 'EGFR', 'MYC', 'AKT1', 'PIK3CA', 'PTEN', 'RB1', 'KRAS', 'MAPK1'], is_gene_network=True, method=['louvain', 'greedy', 'hclust'], resolution=1.0, k=4, input='./', output='results')
>>> # Run demo pipeline with custom gene network, suppress plots
>>> run_demo(input="path/to/edgelist.txt", is_gene_network=True, is_weighted=True, no_plot=True, output='results')
>>> # Run demo pipeline with custom API URL and non-default output format
>>> run_demo(api_url="https://string-db.org/api/json/interaction_partners?identifiers=TP53%0dBRCA1%0dEGFR%0dMYC&required_score=990", output_format="json", input='./', output='results')
>>> # Run demo pipeline with predefined gene list, custom parameters
>>> run_demo(demo=1, use_gene_list=True, custom_params={"required_score": 100}, input='./', output='results')
>>> # Run demo pipeline with custom gene list, custom parameters
>>> run_demo(demo=1, use_gene_list=True, custom_params={"required_score": 100}, input='./', output='results')
Sankey module
- sankey.create_sankey_plot(transitions, min_size, method, input_graph, output_folder)[source]
Create and save the Sankey plot.
- Parameters:
transitions (list) – List of transitions between clusters.
min_size (int) – Minimum cluster size to be considered significant.
parameter (str) – Clustering parameter name.
method (str) – Clustering method used.
input_graph (str) – Knowledge graph identifier.
output_folder (str) – Path to the output folder.
- Returns:
None
- sankey.lighten_color(color, amount=0.5)[source]
Lighten the given color by the specified amount.
- Parameters:
color (str) – A color name or RGBA string.
amount (float, optional) – The amount to lighten the color (0.0 to 1.0).
- Returns:
The lightened color as an RGBA string.
- Return type:
str
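One common way to lighten a color is to blend each channel toward white by the given amount. The sketch below does this for `rgba(r, g, b, a)` strings only; the function name is hypothetical, color names are not handled, and the blending rule itself is an assumption rather than the documented behavior of `sankey.lighten_color`.

```python
import re

_RGBA = re.compile(
    r"rgba\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*([0-9.]+)\s*\)"
)


def lighten_rgba_sketch(color, amount=0.5):
    """Blend an 'rgba(r, g, b, a)' string toward white by `amount` (0.0 to 1.0)."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0.0 and 1.0")
    match = _RGBA.fullmatch(color.strip())
    if match is None:
        raise ValueError(f"unsupported color string: {color!r}")
    channels = [int(match.group(i)) for i in (1, 2, 3)]
    alpha = float(match.group(4))
    # amount=0 leaves the channel unchanged; amount=1 gives pure white.
    red, green, blue = (round(c + (255 - c) * amount) for c in channels)
    return f"rgba({red}, {green}, {blue}, {alpha})"
```

The alpha channel is passed through unchanged, so only the hue brightness is affected.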
- sankey.load_data(input_path, graph, method, parameter, parameters)[source]
Load compression mappings and group labels for each parameter.
- Parameters:
input_path (str) – Path to the input files.
graph (str) – Input graph identifier.
method (str) – Clustering method used.
parameter (str) – Clustering parameter name.
parameters (list) – List of parameters to be analyzed.
- Returns:
Two dictionaries, one for compression mappings and one for group labels, and the (possibly updated) list of parameter values.
- Return type:
tuple
- sankey.map_transitions(compression_mappings, groups, parameters, min_size)[source]
Map transitions between consecutive parameters.
- Parameters:
compression_mappings (dict) – Compression mappings for each parameter.
groups (dict) – Group labels for each parameter.
parameters (list) – List of parameters to be analyzed.
min_size (int) – Minimum cluster size to be considered significant.
- Returns:
List of transitions between clusters.
- Return type:
list
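The idea of mapping transitions between consecutive parameters can be sketched as counting how many nodes move from each cluster at one parameter value to each cluster at the next, ignoring clusters below `min_size`. The data layout used here (one `{node: cluster_id}` dictionary per parameter value) and the output tuple shape are assumptions for illustration; the real `sankey.map_transitions` also takes group labels and may structure its result differently.

```python
from collections import Counter


def map_transitions_sketch(mappings, parameters, min_size):
    """Count node flows between clusters of consecutive parameter values.

    mappings: {parameter_value: {node: cluster_id}} (assumed layout).
    Returns a list of (left_param, left_cluster, right_param, right_cluster, count).
    """
    transitions = []
    for left, right in zip(parameters, parameters[1:]):
        left_map, right_map = mappings[left], mappings[right]
        left_sizes = Counter(left_map.values())
        right_sizes = Counter(right_map.values())
        flows = Counter()
        for node, left_cluster in left_map.items():
            if node not in right_map:
                continue  # node absent at the next parameter value
            right_cluster = right_map[node]
            # Keep only flows between clusters that are large enough.
            if (left_sizes[left_cluster] >= min_size
                    and right_sizes[right_cluster] >= min_size):
                flows[(left_cluster, right_cluster)] += 1
        # Sorting assumes cluster ids are mutually comparable.
        for (left_cluster, right_cluster), count in sorted(flows.items()):
            transitions.append((left, left_cluster, right, right_cluster, count))
    return transitions
```

Each returned tuple corresponds to one link of a Sankey diagram, with `count` as the link width.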
- sankey.parse_args()[source]
Parse command-line arguments for the Sankey plot script.
Command-line arguments:
- Parameters:
--graph (str) – Input graph identifier. Required argument.
--input-path (str) – Path to the input files. Default is “data/output”.
--output-folder (str) – Path to the output folder. Default is “sankey”.
--min-size (int) – Minimum cluster size to be considered significant. Default is 100.
--method (str) – Clustering method used. Default is “louvain”.
--parameter (str) – Clustering parameter name. Default is “r”.
--parameters (list of float) – List of parameters to be analyzed. Default is [1.25, 3.0, 5.0, 10.0, 20.0, 30.0].
- Returns:
Parsed command-line arguments.
- Return type:
args (argparse.Namespace)
- sankey.produce_sankey(graph, input_path='data/output', output_folder='sankey', min_size=100, method='louvain', parameter='r', values=[1.25, 3.0, 5.0, 10.0, 20.0, 30.0])[source]
Main function for the GraphPack tool Sankey plot script.
This script generates a Sankey plot to visualize gene community membership transitions across different clustering resolutions for a given network.
- Parameters:
graph (str) – Input graph identifier. Required parameter.
input_path (str, optional) – Path to the input files. Default is “data/output”.
output_folder (str, optional) – Path to the output folder. Default is “sankey”.
min_size (int, optional) – Minimum cluster size to be considered significant. Default is 100.
method (str, optional) – Clustering method used. Default is “louvain”.
parameter (str, optional) – Clustering parameter name, as it appears in the subfolders’ names. Default is “r”.
values (list of float, optional) – List of parameters to be analyzed. Default is [1.25, 3.0, 5.0, 10.0, 20.0, 30.0].
Examples
>>> from graphpack.demo.sankey import *
>>> produce_sankey(input_path="./", output_folder="results", min_size=50, graph='simple_graph', method='hclust', parameter='k', values=[10, 50, 100, 250])