Demo module

demo.fetch_api_data(request_url, params=None)

Fetches data from an API.

Parameters:
  • request_url (str) – The URL of the API endpoint.

  • params (dict, optional) – Parameters to be sent with the request. Defaults to None.

Returns:

JSON response from the API.

Return type:

dict

Raises:

Exception – If the request fails or the response cannot be parsed.
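
The entry above does not show how the request is made. As a minimal sketch using only the standard library (the actual fetch_api_data may rely on a different HTTP client, such as requests), a fetch-and-decode helper could look like the following. build_url and fetch_json are hypothetical names, and wrapping every failure in RuntimeError is an assumption chosen to match the single "Exception" documented above.

```python
import json
import urllib.parse
import urllib.request


def build_url(request_url, params=None):
    """Append URL-encoded query parameters to request_url, if any are given."""
    if not params:
        return request_url
    # Use '&' when the URL already carries a query string, '?' otherwise.
    separator = "&" if urllib.parse.urlparse(request_url).query else "?"
    return request_url + separator + urllib.parse.urlencode(params)


def fetch_json(request_url, params=None, timeout=30):
    """Fetch request_url and decode the response body as JSON.

    Network errors (OSError, which includes urllib.error.URLError) and
    malformed URLs or JSON (ValueError) are re-raised as RuntimeError.
    """
    url = build_url(request_url, params)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        raise RuntimeError(f"Failed to fetch or parse {url}: {exc}") from exc
```

For example, fetch_json("https://string-db.org/api/json/network", {"identifiers": "TP53", "species": 9606}) would request the STRING network endpoint for a single human gene; the exact endpoint and defaults used by the demo are not stated in this reference.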

demo.get_api_params(demo_number=0, use_gene_list=False, gene_list=None, custom_params=None, api_url=None)

Generates API request parameters and URL based on the specified demo number, gene list usage, and custom parameters.

Parameters:
  • demo_number (int) – The demo number specifying which example query to use.

  • use_gene_list (bool) – Whether to use a predefined gene list for the query.

  • gene_list (list, optional) – List of genes to use for the query. Defaults to None.

  • custom_params (dict, optional) – Custom API parameters to override or supplement defaults. Defaults to None.

  • api_url (str, optional) – Custom API URL to fetch the data from. Defaults to None.

Returns:

A tuple containing the API parameters and request URL.

Return type:

tuple
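
The override-or-supplement behaviour of custom_params can be sketched as a dict merge. This is a minimal sketch assuming the STRING network endpoint and its identifiers, species and required_score query parameters; the demo's predefined queries and real default URL are not shown in this reference, and build_string_request and STRING_NETWORK_URL are hypothetical names.

```python
# Assumed default endpoint; the demo module's real default URL may differ.
STRING_NETWORK_URL = "https://string-db.org/api/json/network"


def build_string_request(gene_list, species=9606, custom_params=None, api_url=None):
    """Return (params, url) for a STRING network query (hypothetical helper)."""
    params = {
        # STRING expects multiple identifiers separated by a carriage return (%0d).
        "identifiers": "\r".join(gene_list),
        # NCBI taxonomy ID; 9606 is Homo sapiens.
        "species": species,
    }
    if custom_params:
        # Custom values override matching defaults and add any new keys.
        params.update(custom_params)
    # A caller-supplied URL takes precedence over the default endpoint.
    return params, api_url or STRING_NETWORK_URL
```

When an HTTP client URL-encodes this dict, each "\r" becomes %0D, the same separator that appears in the custom API URL used in the run_demo examples.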

demo.get_example_graph(demo_number=0, use_gene_list=False, gene_list=None, custom_params=None, api_url=None)

Retrieves and parses an example graph from the STRING database based on the specified demo number, gene list usage, and additional API parameters.

Parameters:
  • demo_number (int) – The demo number specifying which example query to use.

  • use_gene_list (bool) – Whether to use a predefined gene list for the query.

  • gene_list (list, optional) – List of genes to use for the query. Defaults to None.

  • custom_params (dict, optional) – Custom API parameters to override or supplement defaults. Defaults to None.

  • api_url (str, optional) – Custom API URL to fetch the data from. Defaults to None.

Returns:

A NetworkX graph representing the retrieved data.

Return type:

nx.Graph

demo.main()

demo.parse_args()

Parse command-line arguments for the graph compression demo script.

Command-line arguments:
  • --demo (int) – Demo number to run (0-6). Specifies which example query to use for fetching data from the STRING database.

  • --use-gene-list (bool) – Flag to use a predefined gene list for the query.

  • --gene-list (list of str) – List of genes to use for the query. Ignored if --use-gene-list is specified.

  • --custom-params (list of str) – Custom parameters to be sent with the API request. Only applicable when --use-gene-list is set. Should be in the format ‘key1=value1 key2=value2’.

  • --api-url (str) – Custom API URL to fetch the data from. If not specified, the default STRING API URL is used.

  • --input (str) – Path to the input file. Default is “data/input/simple_graph.txt”.

  • --output (str) – Path to the output folder. Default is “data/output”.

  • --output-format (str) – Output file format. Options: ‘.edgelist’, ‘.txt’, ‘.csv’, ‘.tsv’, ‘.json’, ‘.gpickle’, ‘.gml’, ‘.graphml’, ‘.net’, ‘.pajek’, ‘.gexf’, ‘.yaml’, ‘.yml’. Default is ‘txt’.

  • --resolution (float) – Resolution parameter for Louvain and greedy methods.

  • --k (int) – Number of clusters for clustering methods. Only applicable for methods requiring a cluster count (e.g., ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘nmf’).

  • --is-weighted (bool) – Flag to indicate the use of a weighted graph.

  • --is-gene-network (bool) – Flag to indicate the use of a gene network.

  • --is-lossless (bool) – Flag to indicate the use of lossless compression.

  • --no-plot (bool) – Flag to disable plotting the original and compressed graphs. Default is False.

  • --no-interactive-plot (bool) – Flag to disable plotting the graph in interactive mode.

  • --plot-disconnected (bool) – Flag to plot all nodes in a disconnected graph, not just the largest connected component.

  • --method (list of str) – List of compression methods to run. Options: ‘louvain’, ‘greedy’ (default), ‘label_propagation’, ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘cpm’, ‘nmf’. Pass ‘all’ to run every implemented compression method.

Returns:

Parsed command-line arguments.

Return type:

argparse.Namespace
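
A minimal argparse sketch covering a subset of the flags above. It assumes that --custom-params takes space-separated key=value tokens that are folded into a dict (values stay strings) and that ‘all’ expands to the full method list. parse_cli, key_value and ALL_METHODS are hypothetical names, and the real demo.parse_args may be structured differently.

```python
import argparse

# Method names copied from the --method option description.
ALL_METHODS = [
    "louvain", "greedy", "label_propagation", "asyn_fluidc", "spectral",
    "hclust", "node2vec", "deepwalk", "cpm", "nmf",
]


def key_value(text):
    """argparse type: turn 'key=value' into a (key, value) pair."""
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected key=value, got {text!r}")
    return key, value


def parse_cli(argv=None):
    parser = argparse.ArgumentParser(description="Graph compression demo (sketch)")
    parser.add_argument("--demo", type=int, choices=range(7))
    parser.add_argument("--use-gene-list", action="store_true")
    parser.add_argument("--gene-list", nargs="+")
    parser.add_argument("--custom-params", nargs="+", type=key_value, default=[])
    parser.add_argument("--method", nargs="+", default=["greedy"])
    parser.add_argument("--no-plot", action="store_true")
    args = parser.parse_args(argv)
    # Fold the (key, value) pairs into the dict form that run_demo accepts.
    args.custom_params = dict(args.custom_params)
    # Expand the 'all' shortcut into every listed method.
    if "all" in args.method:
        args.method = list(ALL_METHODS)
    return args
```

A malformed token such as "oops" for --custom-params makes argparse print a usage error and exit. The interaction between --gene-list and --use-gene-list described above is not modelled here.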

demo.parse_graph_data(json_data)

Parses data from the STRING API response into a NetworkX graph.

Parameters:

json_data (list) – List of dictionaries containing data from the API response.

Returns:

A NetworkX graph representing the parsed data.

Return type:

nx.Graph

Raises:

ValueError – If the JSON data is invalid or missing required fields.
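
The validation step can be sketched with the standard library alone. This assumes each record carries the preferredName_A, preferredName_B and score fields of STRING's JSON network output; the fields the real parse_graph_data reads, and any extra edge attributes it keeps, are not documented here. string_json_to_edges and REQUIRED_FIELDS are hypothetical names.

```python
# Field names assumed from STRING's JSON network output.
REQUIRED_FIELDS = ("preferredName_A", "preferredName_B", "score")


def string_json_to_edges(json_data):
    """Convert STRING interaction records into (node_a, node_b, score) tuples.

    Raises ValueError if the payload is not a list of dicts or if a record
    lacks one of REQUIRED_FIELDS.
    """
    if not isinstance(json_data, list):
        raise ValueError("expected a list of interaction records")
    edges = []
    for i, record in enumerate(json_data):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not a dict")
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
        edges.append(
            (record["preferredName_A"], record["preferredName_B"], float(record["score"]))
        )
    return edges
```

The resulting tuples can be loaded into NetworkX with G = nx.Graph() followed by G.add_weighted_edges_from(edges), which stores each score under the "weight" edge attribute.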

demo.run_demo(demo=None, use_gene_list=False, gene_list=None, custom_params=None, api_url=None, input='data/input/simple_graph.txt', output='data/output', output_format='txt', resolution=None, k=None, is_weighted=False, is_gene_network=False, is_lossless=False, no_plot=False, no_interactive_plot=False, plot_disconnected=False, separate_communities=False, method=['greedy'])

Main function for the GraphPack tool demo script.

This function demonstrates the GraphPack tool by fetching data from the STRING database (or reading a local input file), parsing it into a NetworkX graph, and applying one or more compression methods to the graph.

Parameters:
  • demo (int, optional) – Demo number to run (0-6). Specifies which example query to use for fetching data from the STRING database.

  • use_gene_list (bool, optional) – Flag to use a predefined gene list for the query.

  • gene_list (list of str, optional) – List of genes to use for the query.

  • custom_params (dict, optional) – Custom parameters to be sent with the API request, given as a dict such as {"key1": "value1", "key2": "value2"}.

  • api_url (str, optional) – Custom API URL to fetch the data from.

  • input (str, optional) – Path to the input file. Default is “data/input/simple_graph.txt”.

  • output (str, optional) – Path to the output folder. Default is “data/output”.

  • output_format (str, optional) – File format in which to save the network files. Options: ‘.edgelist’, ‘.txt’ (default), ‘.csv’, ‘.tsv’, ‘.json’, ‘.gpickle’, ‘.gml’, ‘.graphml’, ‘.net’, ‘.pajek’, ‘.gexf’, ‘.yaml’, ‘.yml’.

  • resolution (float, optional) – Resolution parameter for Louvain and greedy methods.

  • k (int, optional) – Number of clusters for clustering methods. Only applicable for methods requiring a cluster count (e.g., ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘nmf’).

  • is_weighted (bool, optional) – Flag to indicate the use of a weighted graph.

  • is_gene_network (bool, optional) – Flag to indicate the use of a gene network.

  • is_lossless (bool, optional) – Flag to indicate the use of lossless compression.

  • no_plot (bool, optional) – Flag to disable plotting the original and compressed graphs. Default is False.

  • no_interactive_plot (bool, optional) – Flag to disable plotting the graph in interactive mode.

  • plot_disconnected (bool, optional) – Flag to plot all nodes in a disconnected graph, not just the largest connected component.

  • separate_communities (bool, optional) – Flag to enforce separation of identified communities in the plots.

  • method (list of str, optional) – List of compression methods to run. Options: ‘louvain’, ‘greedy’ (default), ‘label_propagation’, ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘cpm’, ‘nmf’.

Returns:

None

Examples

>>> from graphpack.demo.demo import *
>>> # Run a default demo, greedy method
>>> run_demo(demo=5, input='./', output='results')
>>> # Run a demo with predefined gene list, weighted, non-default method and parameters
>>> run_demo(demo=3, use_gene_list=True, method=['hclust'], k=11, is_gene_network=True, is_weighted=True, input='./', output='results')
>>> # Run demo pipeline with custom gene list, multiple methods
>>> run_demo(use_gene_list=True, gene_list=['TP53', 'BRCA1', 'EGFR', 'MYC', 'AKT1', 'PIK3CA', 'PTEN', 'RB1', 'KRAS', 'MAPK1'], is_gene_network=True, method=['louvain', 'greedy', 'hclust'], resolution=1.0, k=4, input='./', output='results')
>>> # Run demo pipeline with custom gene network, suppress plots
>>> run_demo(input="path/to/edgelist.txt", is_gene_network=True, is_weighted=True, no_plot=True, output='results')
>>> # Run demo pipeline with custom API URL and non-default output format
>>> run_demo(api_url="https://string-db.org/api/json/interaction_partners?identifiers=TP53%0dBRCA1%0dEGFR%0dMYC&required_score=990", output_format="json", input='./', output='results')
>>> # Run demo pipeline with predefined gene list, custom parameters
>>> run_demo(demo=1, use_gene_list=True, custom_params={"required_score": 100}, input='./', output='results')
>>> # Run demo pipeline with custom gene list, custom parameters
>>> run_demo(use_gene_list=True, gene_list=['TP53', 'BRCA1', 'EGFR', 'MYC'], custom_params={"required_score": 100}, input='./', output='results')
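
The parameter list above states that resolution applies to the Louvain and greedy methods and that k applies only to methods requiring a cluster count. A hypothetical pre-flight check that flags options the selected methods would ignore could be sketched as follows; check_method_options is not part of GraphPack, and the method sets are copied from the parameter descriptions rather than from the implementation.

```python
# Method groups copied from the run_demo parameter descriptions.
K_METHODS = {"asyn_fluidc", "spectral", "hclust", "node2vec", "deepwalk", "nmf"}
RESOLUTION_METHODS = {"louvain", "greedy"}


def check_method_options(method, k=None, resolution=None):
    """Return warnings for options that none of the selected methods use."""
    warnings = []
    if k is not None and not K_METHODS.intersection(method):
        warnings.append("k is set but no selected method takes a cluster count")
    if resolution is not None and not RESOLUTION_METHODS.intersection(method):
        warnings.append("resolution is set but neither 'louvain' nor 'greedy' is selected")
    return warnings
```

With the multi-method example above (method=['louvain', 'greedy', 'hclust'], resolution=1.0, k=4), both options are used by at least one method, so the sketch returns an empty list.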

Sankey module

sankey.create_sankey_plot(transitions, min_size, method, input_graph, output_folder)

Create and save the Sankey plot.

Parameters:
  • transitions (list) – List of transitions between clusters.

  • min_size (int) – Minimum cluster size to be considered significant.

  • method (str) – Clustering method used.

  • input_graph (str) – Knowledge graph identifier.

  • output_folder (str) – Path to the output folder.

Returns:

None
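
Sankey renderers typically take nodes as a label list and links as parallel lists of source index, target index and value. Assuming each transition is a (source_label, target_label, count) tuple (the real transition format is not documented here), the conversion can be sketched as below; transitions_to_sankey is a hypothetical name.

```python
def transitions_to_sankey(transitions):
    """Turn (source, target, count) tuples into node labels and link index lists."""
    labels = []
    index = {}

    def node_id(label):
        # Assign each distinct label the next free integer index.
        if label not in index:
            index[label] = len(labels)
            labels.append(label)
        return index[label]

    sources, targets, values = [], [], []
    for src, dst, count in transitions:
        sources.append(node_id(src))
        targets.append(node_id(dst))
        values.append(count)
    return labels, {"source": sources, "target": targets, "value": values}
```

With Plotly, the result could be passed as plotly.graph_objects.Sankey(node=dict(label=labels), link=link) and saved via Figure.write_html. Prefixing each cluster label with its parameter value keeps cluster 0 at one value distinct from cluster 0 at the next.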

sankey.lighten_color(color, amount=0.5)

Lighten the given color by the specified amount.

Parameters:
  • color (str) – A color name or RGBA string.

  • amount (float, optional) – The amount to lighten the color (0.0 to 1.0). Defaults to 0.5.

Returns:

The lightened color as an RGBA string.

Return type:

str
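
One common way to lighten a color is to move each RGB channel a fraction of the way toward white. The sketch below assumes that approach; the module's actual method (for example, adjusting HLS lightness) may differ. It handles only "#rrggbb" and "rgb(...)"/"rgba(...)" strings; color names would first need resolving, for instance with matplotlib.colors.to_rgba. lighten_rgba is a hypothetical name.

```python
import re

# Matches 'rgb(r, g, b)' or 'rgba(r, g, b, a)' with optional whitespace.
_RGBA = re.compile(
    r"rgba?\(\s*([\d.]+)\s*,\s*([\d.]+)\s*,\s*([\d.]+)\s*(?:,\s*([\d.]+)\s*)?\)"
)


def lighten_rgba(color, amount=0.5):
    """Blend a '#rrggbb' or 'rgba(r, g, b, a)' color toward white by `amount`."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0.0 and 1.0")
    if color.startswith("#") and len(color) == 7:
        r, g, b = (int(color[i:i + 2], 16) for i in (1, 3, 5))
        a = 1.0
    else:
        m = _RGBA.fullmatch(color.strip())
        if m is None:
            raise ValueError(f"unsupported color: {color!r}")
        r, g, b = (float(x) for x in m.groups()[:3])
        a = float(m.group(4)) if m.group(4) is not None else 1.0
    # Move each channel a fraction `amount` of the way toward 255 (white).
    r, g, b = (round(c + (255 - c) * amount) for c in (r, g, b))
    return f"rgba({r}, {g}, {b}, {a})"
```

An amount of 0.0 leaves the color unchanged, while 1.0 yields white with the original alpha.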

sankey.load_data(input_path, graph, method, parameter, parameters)

Load compression mappings and group labels for each parameter value.

Parameters:
  • input_path (str) – Path to the input files.

  • graph (str) – Input graph identifier.

  • method (str) – Clustering method used.

  • parameter (str) – Clustering parameter name.

  • parameters (list) – List of parameters to be analyzed.

Returns:

A tuple of three items: a dict of compression mappings, a dict of group labels, and the (possibly updated) list of parameter values.

Return type:

tuple

sankey.main()

sankey.map_transitions(compression_mappings, groups, parameters, min_size)

Map transitions between clusters at consecutive parameter values.

Parameters:
  • compression_mappings (dict) – Compression mappings for each parameter.

  • groups (dict) – Group labels for each parameter.

  • parameters (list) – List of parameters to be analyzed.

  • min_size (int) – Minimum cluster size to be considered significant.

Returns:

List of transitions between clusters.

Return type:

list
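
The mapping step can be sketched as counting how many nodes move from each cluster at one parameter value to each cluster at the next. This assumes groups[value] maps node to cluster label and that min_size filters clusters at both ends of a transition; the real function also receives compression_mappings, whose role is not modelled here. map_transitions_sketch is a hypothetical name.

```python
from collections import Counter


def map_transitions_sketch(groups, parameters, min_size):
    """Count nodes moving between clusters at consecutive parameter values.

    groups[value] is assumed to map node -> cluster label at that value.
    Transitions touching a cluster smaller than min_size are skipped.
    """
    transitions = []
    for prev, curr in zip(parameters, parameters[1:]):
        before, after = groups[prev], groups[curr]
        size_before = Counter(before.values())
        size_after = Counter(after.values())
        # Only nodes present at both parameter values can be tracked.
        flows = Counter(
            (before[node], after[node]) for node in before.keys() & after.keys()
        )
        # Sort for a deterministic order (labels must be mutually comparable).
        for (src, dst), count in sorted(flows.items()):
            if size_before[src] >= min_size and size_after[dst] >= min_size:
                transitions.append((f"{prev}:{src}", f"{curr}:{dst}", count))
    return transitions
```

Prefixing each label with its parameter value keeps the same cluster ID at different values distinct when the transitions are later drawn as Sankey nodes.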

sankey.parse_args()

Parse command-line arguments for the Sankey plot script.

Command-line arguments:
  • --graph (str) – Input graph identifier. Required argument.

  • --input-path (str) – Path to the input files. Default is “data/output”.

  • --output-folder (str) – Path to the output folder. Default is “sankey”.

  • --min-size (int) – Minimum cluster size to be considered significant. Default is 100.

  • --method (str) – Clustering method used. Default is “louvain”.

  • --parameter (str) – Clustering parameter name. Default is “r”.

  • --parameters (list of float) – List of parameters to be analyzed. Default is [1.25, 3.0, 5.0, 10.0, 20.0, 30.0].

Returns:

Parsed command-line arguments.

Return type:

argparse.Namespace

sankey.produce_sankey(graph, input_path='data/output', output_folder='sankey', min_size=100, method='louvain', parameter='r', values=[1.25, 3.0, 5.0, 10.0, 20.0, 30.0])

Main function for the GraphPack tool Sankey plot script.

This function generates a Sankey plot that visualizes how genes move between communities across different values of a clustering parameter (for example, the Louvain resolution or the number of clusters k) for a given network.

Parameters:
  • graph (str) – Input graph identifier. Required parameter.

  • input_path (str, optional) – Path to the input files. Default is “data/output”.

  • output_folder (str, optional) – Path to the output folder. Default is “sankey”.

  • min_size (int, optional) – Minimum cluster size to be considered significant. Default is 100.

  • method (str, optional) – Clustering method used. Default is “louvain”.

  • parameter (str, optional) – Clustering parameter name, as it appears in the subfolders’ names. Default is “r”.

  • values (list of float, optional) – List of parameter values to analyze. Default is [1.25, 3.0, 5.0, 10.0, 20.0, 30.0].

Examples

>>> from graphpack.demo.sankey import *
>>> produce_sankey(input_path="./", output_folder="results", min_size=50, graph='simple_graph', method='hclust', parameter='k', values=[10, 50, 100, 250])