Graph Compression module
- compression.calculate_compression_efficacy(output_folder, is_lossless=True, original_network_file='original_network.txt', compressed_network_file='compressed_network.txt', compression_mapping_file='compression_mapping.msgpack', decompression_mapping_file='decompression_mapping.msgpack', edges_to_remove_file='edges_to_remove.txt')[source]
Calculates and prints the efficacy of the network compression.
- Parameters:
output_folder (str) – The folder path containing the network files.
is_lossless (bool, optional) – Flag indicating whether the compression is lossless (default: True).
original_network_file (str, optional) – The file name of the original network (default: ‘original_network.txt’).
compressed_network_file (str, optional) – The file name of the compressed network (default: ‘compressed_network.txt’).
compression_mapping_file (str, optional) – The file name of the compression mapping (default: ‘compression_mapping.msgpack’).
decompression_mapping_file (str, optional) – The file name of the decompression mapping (default: ‘decompression_mapping.msgpack’).
edges_to_remove_file (str, optional) – The file name containing the edges to remove (default: ‘edges_to_remove.txt’).
- Returns:
None
- Raises:
ValueError – If any required file is missing or cannot be read.
- compression.compress_graph(graph, method='louvain', seed=123, resolution=1.25, k=3, is_weighted=True, is_gene_network=True)[source]
Compresses a graph using the specified method and generates a new graph where nodes represent communities.
- Parameters:
graph (nx.Graph) – The original network graph.
method (str, optional) – The compression method to use. Options: ‘louvain’ (default), ‘greedy’, ‘label_propagation’, ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘cpm’, ‘nmf’.
seed (int, optional) – Random seed for reproducibility. Defaults to 123.
resolution (float, optional) – Resolution parameter for Louvain and greedy methods. Defaults to 1.25.
k (int, optional) – Number of communities/clusters/components for the ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, and ‘deepwalk’ methods. For ‘cpm’, k is the size of the smallest clique. Defaults to 3.
is_weighted (bool, optional) – Whether to consider edge weights. Defaults to True.
is_gene_network (bool, optional) – Whether the network is a gene network to perform GSEA. Defaults to True.
- Returns:
- A tuple containing:
compressed_graph (nx.Graph): The compressed network graph where nodes represent communities.
compression_mapping (dict): A mapping of original nodes to their corresponding community nodes.
decompression_mapping (dict): A mapping of community nodes to the original nodes they represent.
- Return type:
tuple
- Raises:
ValueError – If the specified method is not supported or if the graph is not connected and connectivity is required.
Example
>>> import networkx as nx
>>> from graphpack.compression import compress_graph, detect_communities, compress_graph_partition_based
>>> # Example: Compress a weighted graph (gene network) using the Louvain method
>>> G = nx.Graph()
>>> edges = [("HIF1A", "EGFR", 0.934), ("HIF1A", "JAK2", 0.784), ("HIF1A", "IGF1R", 0.752), ("EGFR", "IGF1R", 0.989), ("JAK2", "IGF1R", 0.981)]
>>> G.add_weighted_edges_from(edges)
>>> compressed_graph, compression_mapping, decompression_mapping = compress_graph(G, method='louvain', is_weighted=True, is_gene_network=True)
>>> print(compressed_graph.edges(data=True))
[(1, 0, {'weight': 2.525})]
>>> print(compression_mapping)
{1: ['HIF1A', 'EGFR'], 0: ['JAK2', 'IGF1R']}
>>> print(decompression_mapping)
{'HIF1A': 1, 'EGFR': 1, 'JAK2': 0, 'IGF1R': 0}
- compression.compress_graph_partition_based(graph, partition, is_weighted=True, is_gene_network=True)[source]
Compresses a network based on detected communities.
- Parameters:
graph (nx.Graph) – The original network graph.
partition (dict) – A dictionary mapping nodes to community IDs.
is_weighted (bool, optional) – Whether to consider edge weights. Defaults to True.
is_gene_network (bool, optional) – Whether the network is a gene network to perform GSEA. Defaults to True.
- Returns:
- A tuple containing:
community_graph (nx.Graph): The compressed network graph where nodes represent communities.
compression_mapping (dict): A mapping of original nodes to their corresponding community nodes.
decompression_mapping (dict): A mapping of community nodes to the original nodes they represent.
- Return type:
tuple
- Raises:
ValueError – If the partition is invalid or missing nodes.
Example
>>> import networkx as nx
>>> import random
>>> random.seed(123)
>>> from networkx.algorithms.community import louvain_communities
>>> from graphpack.compression import compress_graph_partition_based
>>> # Example 1: Compress an unweighted graph based on detected communities
>>> G = nx.Graph()
>>> G.add_edges_from([(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 7), (7, 8), (8, 6)])
>>> communities = louvain_communities(G)
>>> partition = {node: cid for cid, community in enumerate(communities) for node in community}
>>> compressed_graph, _, _ = compress_graph_partition_based(G, partition, is_weighted=False)
>>> print(compressed_graph.nodes(data=True))
[(1, {'nodes': [1, 2, 3], 'label': [[1, 2, 3]], 'size': 3}), (0, {'nodes': [4, 5], 'label': [[4, 5]], 'size': 2}), (2, {'nodes': [6, 7, 8], 'label': [[6, 7, 8]], 'size': 3})]
>>> print(compressed_graph.edges(data=True))
[(0, 2, {'weight': 1})]
>>> # Example 2: Compress a weighted gene network based on detected communities
>>> weights = {edge: random.random() for edge in G.edges()}
>>> nx.set_edge_attributes(G, weights, 'weight')
>>> compressed_graph, compression_mapping, decompression_mapping = compress_graph_partition_based(G, partition)
>>> print(compressed_graph.edges(data=True))
[(0, 2, {'weight': 0.9011988779516946})]
>>> print(compression_mapping)
{1: [1, 2, 3], 0: [4, 5], 2: [6, 7, 8]}
>>> print(decompression_mapping)
{1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 2, 7: 2, 8: 2}
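The collapse step behind partition-based compression can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `compress_by_partition` is hypothetical, and it assumes that the weight of a community-level edge is the sum of the weights of the original edges crossing between the two communities, which is consistent with the outputs shown in the examples above.

```python
from collections import defaultdict


def compress_by_partition(edges, partition):
    """Collapse each community into a single node (hypothetical helper)."""
    # Group the original nodes by community ID.
    members = defaultdict(list)
    for node, community in partition.items():
        members[community].append(node)

    # Assumption: sum the weights of edges whose endpoints lie in
    # different communities; edges inside a community are dropped.
    community_edges = defaultdict(float)
    for u, v, weight in edges:
        cu, cv = partition[u], partition[v]
        if cu != cv:
            community_edges[(min(cu, cv), max(cu, cv))] += weight
    return dict(community_edges), dict(members)


# Same toy graph as Example 1, with every edge given weight 1.
edge_list = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 7), (7, 8), (8, 6)]
edges = [(u, v, 1.0) for u, v in edge_list]
partition = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 2, 7: 2, 8: 2}

community_edges, members = compress_by_partition(edges, partition)
print(community_edges)
print(members)
```

Only the edge (5, 6) crosses a community boundary here, so the sketch yields a single community-level edge between communities 0 and 2.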
- compression.compute_adjacency_matrix(graph)[source]
Computes the adjacency matrix of a graph.
- Parameters:
graph – The input graph.
- Returns:
The adjacency matrix of the input graph.
- Return type:
np.ndarray
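For readers unfamiliar with the representation, the following sketch builds a symmetric adjacency matrix for an undirected, weighted graph using nested lists. It is only an illustration under stated assumptions: the helper `adjacency_matrix` and its `(nodes, edges)` inputs are hypothetical, whereas the documented function takes a graph object and returns an np.ndarray.

```python
def adjacency_matrix(nodes, edges):
    """Build a dense adjacency matrix as nested lists (hypothetical helper)."""
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for u, v, weight in edges:
        i, j = index[u], index[v]
        # Undirected graph: mirror every entry across the diagonal.
        matrix[i][j] = weight
        matrix[j][i] = weight
    return matrix


nodes = ["A", "B", "C"]
edges = [("A", "B", 1.0), ("B", "C", 2.0)]
matrix = adjacency_matrix(nodes, edges)
for row in matrix:
    print(row)
```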
- compression.compute_and_save_edges_to_remove(graph, community_graph, output_folder, edges_to_remove_file='edges_to_remove.txt')[source]
Computes and saves edges to remove from the original network based on the compressed network.
- Parameters:
graph (nx.Graph) – The original network graph.
community_graph (nx.Graph) – The compressed network graph where nodes represent communities.
output_folder (str) – The folder path to save the output file.
edges_to_remove_file (str) – The name of the file to save the edges to remove (default is ‘edges_to_remove.txt’).
- Returns:
None
- Raises:
ValueError – If the output folder does not exist or is not a directory.
- compression.detect_communities(graph, method='louvain', seed=123, resolution=1.25, k=3)[source]
Detects communities in a graph using various partitioning methods.
- Parameters:
graph (nx.Graph) – The graph to detect communities in.
method (str, optional) – The partitioning method to use. Options: ‘louvain’ (default), ‘greedy’, ‘label_propagation’, ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘cpm’, ‘nmf’.
seed (int, optional) – Random seed for reproducibility. Defaults to 123.
resolution (float, optional) – Resolution parameter for Louvain and greedy methods. Defaults to 1.25.
k (int, optional) – Number of communities/clusters/components for the ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, and ‘nmf’ methods. For ‘cpm’, k is the size of the smallest clique. Defaults to 3.
- Returns:
A dictionary mapping nodes to community IDs.
- Return type:
dict
- Raises:
ValueError – If the specified method is not supported or if the graph is not connected.
Example
>>> import networkx as nx
>>> from graphpack.compression import detect_communities
>>> # Example: Louvain method
>>> G = nx.Graph()
>>> G.add_edges_from([(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 7), (7, 8), (8, 6)])
>>> partition = detect_communities(G, method='louvain')
>>> print(partition)
{1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 2, 7: 2, 8: 2}
- compression.parse_args()[source]
Parse command-line arguments for graph compression.
Command-line arguments:
- Parameters:
--input (str) – Path to the input graph file. This is a required argument.
--output (str) – Path to the output folder where results will be saved. Default is ‘data/output’.
--output-format (str) – Output file format. Options: ‘.edgelist’, ‘.txt’, ‘.csv’, ‘.tsv’, ‘.json’, ‘.gpickle’, ‘.gml’, ‘.graphml’, ‘.net’, ‘.pajek’, ‘.gexf’, ‘.yaml’, ‘.yml’. Default is ‘txt’.
--method (str) – Community detection method to use for graph compression. Options: ‘louvain’, ‘greedy’ (default), ‘label_propagation’, ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘cpm’, ‘nmf’.
--resolution (float) – Resolution parameter for Louvain and greedy methods. Default is 1.25.
--k (int) – Number of clusters for clustering methods. Only applicable for methods requiring a cluster count (e.g., ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘nmf’). Default is 3.
--seed (int) – Random seed for reproducibility. Default is 123.
--is-weighted (bool) – Flag to indicate if the graph should consider edge weights. Use this flag if the graph is weighted.
--is-gene-network (bool) – Flag to assign biologically meaningful labels to communities. Use this flag for gene networks.
--is-lossless (bool) – Flag to perform lossless compression. Use this flag if lossless compression is required.
--plot (bool) – Flag to plot the original and compressed graphs. Default is False.
--is-interactive (bool) – Flag to produce interactive plots in HTML format. Use this flag to enable interactive plots.
--plot-disconnected (bool) – Flag to plot all nodes in a disconnected graph, not just the largest connected component.
--title (str) – Title for the graph plot. Default is ‘’.
--verbosity (int) – Verbosity level for logging information (0: minimal, 1: moderate, 2: detailed). Default is 2.
- Returns:
args: Parsed command-line arguments.
- Return type:
argparse.Namespace
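A subset of the options listed above could be declared with the standard argparse module roughly as follows. This is a sketch, not the module's actual parser: the function name `build_parser` is hypothetical, and treating the boolean options as `store_true` switches is an assumption based on their description as flags.

```python
import argparse


def build_parser():
    """Declare a few of the documented options (illustrative subset)."""
    parser = argparse.ArgumentParser(description="Graph compression (sketch)")
    parser.add_argument("--input", type=str, required=True,
                        help="Path to the input graph file.")
    parser.add_argument("--output", type=str, default="data/output",
                        help="Folder where results are saved.")
    parser.add_argument("--resolution", type=float, default=1.25)
    parser.add_argument("--k", type=int, default=3)
    parser.add_argument("--seed", type=int, default=123)
    # Assumption: boolean options are switches that default to False.
    parser.add_argument("--is-weighted", action="store_true")
    parser.add_argument("--is-lossless", action="store_true")
    parser.add_argument("--verbosity", type=int, choices=[0, 1, 2], default=2)
    return parser


# Passing an explicit list avoids reading sys.argv in this example.
args = build_parser().parse_args(["--input", "graph.txt", "--is-weighted"])
print(args)
```

argparse converts the dashes in option names to underscores, so `--is-weighted` is read back as `args.is_weighted`.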
- compression.perform_compression(input, output='data/output', output_format='txt', method='louvain', resolution=1.25, k=3, seed=123, is_weighted=False, is_gene_network=False, is_lossless=False, plot=False, is_interactive=False, plot_disconnected=False, separate_communities=False, title='', verbosity=2)[source]
Suggested pipeline for the GraphPack tool.
This function performs graph compression using various community detection methods. It reads an input graph file, compresses the graph, and saves the compressed version along with optional visualizations and statistical summaries.
- Parameters:
input (str) – Path to the input graph file. This is a required argument.
output (str) – Path to the output folder where results will be saved. Default is ‘data/output’.
output_format (str) – File format to save the network files. Options: ‘edgelist’, ‘txt’ (default), ‘csv’, ‘tsv’, ‘json’, ‘gpickle’, ‘gml’, ‘graphml’, ‘net’, ‘pajek’, ‘gexf’, ‘yaml’, ‘yml’.
method (str) – Community detection method to use for graph compression. Options: ‘louvain’ (default), ‘greedy’, ‘label_propagation’, ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘cpm’, ‘nmf’.
resolution (float) – Resolution parameter for Louvain and greedy methods. Default is 1.25.
k (int) – Number of clusters for clustering methods. Only applicable for methods requiring a cluster count (e.g., ‘asyn_fluidc’, ‘spectral’, ‘hclust’, ‘node2vec’, ‘deepwalk’, ‘nmf’). Default is 3.
seed (int) – Random seed for reproducibility. Default is 123.
is_weighted (bool) – Flag to indicate if the graph should consider edge weights. Use this flag if the graph is weighted. Default is False.
is_gene_network (bool) – Flag to assign biologically meaningful labels to communities. Use this flag for gene networks. Default is False.
is_lossless (bool) – Flag to perform lossless compression. Use this flag if lossless compression is required. Default is False.
plot (bool) – Flag to plot the original and compressed graphs. Default is False.
is_interactive (bool) – Flag to produce interactive plots in HTML format. Use this flag to enable interactive plots. Default is False.
plot_disconnected (bool) – Flag to plot all nodes in a disconnected graph, not just the largest connected component. Default is False.
separate_communities (bool) – Flag to separate communities in the graph plot. Default is False.
title (str) – Title for the graph plot. Default is ‘’.
verbosity (int) – Verbosity level for logging information (0: minimal, 1: moderate, 2: detailed). Default is 2.
- Returns:
None
Examples
>>> from graphpack.compression import *
>>> import networkx as nx
>>> G = nx.Graph()
>>> G.add_edges_from([(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 7), (5, 8), (4, 8)])
>>> input_graph = 'simple_graph.txt'
>>> save_graph(G, input_graph)
>>> perform_compression(input_graph, output='results', method='greedy', plot=True, is_interactive=True, plot_disconnected=True, separate_communities=True, title='Greedy')
- compression.reconstruct_original_network(output_folder, compressed_network_file='compressed_network.txt', compression_mapping_file='compression_mapping.msgpack', edges_to_remove_file='edges_to_remove.txt', output_file='reconstructed_network.txt')[source]
Reconstructs the original network from compressed data and removes specified edges.
- Parameters:
output_folder (str) – The folder path to read input files from and save the output file.
compressed_network_file (str) – The file name of the compressed network (default is ‘compressed_network.txt’).
compression_mapping_file (str) – The file name of the compression mapping (default is ‘compression_mapping.msgpack’).
edges_to_remove_file (str) – The file name of the edges to remove (default is ‘edges_to_remove.txt’).
output_file (str) – The name of the file to save the reconstructed network (default is ‘reconstructed_network.txt’).
- Returns:
None
- Raises:
ValueError – If any of the required files are missing in the specified folder.
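The reference above does not spell out how the compressed network is expanded before the listed edges are removed. The sketch below shows one way such a lossless round trip can work, under a loudly stated assumption: expansion connects every pair of nodes inside a community and every pair of nodes across two linked communities, and the edges to remove are exactly the expanded edges that were absent from the original graph. The helper `expand` and the in-memory data are hypothetical; the documented functions operate on files.

```python
from itertools import combinations, product


def expand(community_edges, members):
    """Expand a community graph back to node level (assumed expansion rule)."""
    expanded = set()
    # Assumption: every community becomes a clique of its member nodes.
    for nodes in members.values():
        expanded.update(frozenset(pair) for pair in combinations(nodes, 2))
    # Assumption: linked communities become fully connected to each other.
    for cu, cv in community_edges:
        expanded.update(frozenset(pair) for pair in product(members[cu], members[cv]))
    return expanded


original = {frozenset(edge) for edge in
            [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 7), (7, 8), (8, 6)]}
members = {1: [1, 2, 3], 0: [4, 5], 2: [6, 7, 8]}
community_edges = [(0, 2)]

expanded = expand(community_edges, members)
edges_to_remove = expanded - original       # what would be stored on disk
reconstructed = expanded - edges_to_remove  # what reconstruction recovers

print(sorted(tuple(sorted(edge)) for edge in edges_to_remove))
print(reconstructed == original)
```

Under this assumption the scheme is lossless whenever every original edge appears in the expansion, which holds in this toy graph.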
- compression.remove_unnecessary_files(output_folder_method)[source]
Removes unnecessary files from the output folder.
- Parameters:
output_folder_method (str) – The folder path containing the output files.
- Returns:
None
- compression.save_network_files(graph, community_graph, compression_mapping, decompression_mapping, output_folder, compression_mapping_filename='compression_mapping', decompression_mapping_filename='decompression_mapping', labels_mapping=None, save_data=False, file_format='txt')[source]
Saves network files and mappings to the specified output folder.
- Parameters:
graph (nx.Graph) – The original network graph.
community_graph (nx.Graph) – The compressed network graph where nodes represent communities.
compression_mapping (dict) – A mapping of original nodes to their corresponding community nodes.
decompression_mapping (dict) – A mapping of community nodes to the original nodes they represent.
output_folder (str) – The folder path to save the output files.
compression_mapping_filename (str) – The base name for the compression mapping files (default is ‘compression_mapping’).
decompression_mapping_filename (str) – The base name for the decompression mapping files (default is ‘decompression_mapping’).
labels_mapping (dict, optional) – A mapping of node labels (default is None).
save_data (bool, optional) – Whether to save edge data (default is False).
file_format (str, optional) – The file format to save the network files (default is ‘txt’).
- Returns:
None
- Raises:
ValueError – If the output folder does not exist or cannot be created.
Utilities module
- class utils.CustomArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True, exit_on_error=True)[source]
Bases: ArgumentParser
Custom ArgumentParser class to modify the help message and usage format.
- Parameters:
argparse.ArgumentParser – The ArgumentParser class to inherit from.
- Returns:
A custom ArgumentParser class with modified help message and usage format.
- Return type:
- __init__(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True, exit_on_error=True)
- add_argument(dest, ..., name=value, ...)
- add_argument(option_string, option_string, ..., name=value, ...) None
- add_argument_group(*args, **kwargs)
- add_mutually_exclusive_group(**kwargs)
- add_subparsers(**kwargs)
- convert_arg_line_to_args(arg_line)
- error(message: string)
Prints a usage message incorporating the message to stderr and exits.
If you override this in a subclass, it should not return – it should either exit or raise an exception.
- exit(status=0, message=None)
- format_usage()
- get_default(dest)
- parse_args(args=None, namespace=None)
- parse_intermixed_args(args=None, namespace=None)
- parse_known_args(args=None, namespace=None)
- parse_known_intermixed_args(args=None, namespace=None)
- print_help(file=None)
- print_usage(file=None)
- register(registry_name, value, object)
- set_defaults(**kwargs)
- utils.assign_community_colors(graph, compressed_graph, decompression_mapping, labels=None)[source]
Assigns colors to nodes in the original graph based on community labels of the compressed graph.
- Parameters:
graph (nx.Graph) – The original graph.
compressed_graph (nx.Graph) – The compressed graph.
decompression_mapping (dict) – Mapping from nodes in compressed graph to nodes in original graph.
labels (dict) – Community labels for nodes in the compressed graph. Defaults to None.
- Returns:
- A tuple containing:
partition_colors (list): List of colors assigned to communities in the compressed graph.
node_colors (list): List of colors assigned to nodes in the original graph.
color_map (dict): Mapping from community labels to colors.
- Return type:
tuple
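The three return values can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: the helper `assign_colors`, the fixed palette, and the use of community IDs as labels are hypothetical, and the real function works on graph objects and may choose colors differently.

```python
def assign_colors(nodes, members, palette=("tab:blue", "tab:orange", "tab:green")):
    """Give every community a color and propagate it to its nodes (sketch)."""
    communities = sorted(members)
    # Cycle through the palette if there are more communities than colors.
    color_map = {cid: palette[i % len(palette)] for i, cid in enumerate(communities)}
    node_to_community = {node: cid for cid, group in members.items() for node in group}
    partition_colors = [color_map[cid] for cid in communities]
    node_colors = [color_map[node_to_community[node]] for node in nodes]
    return partition_colors, node_colors, color_map


# Community ID -> original nodes it represents.
members = {1: [1, 2, 3], 0: [4, 5], 2: [6, 7, 8]}
nodes = [1, 2, 3, 4, 5, 6, 7, 8]

partition_colors, node_colors, color_map = assign_colors(nodes, members)
print(color_map)
print(node_colors)
```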
- utils.cluster_graph_embeddings(graph, model, k, seed=123)[source]
Generate embeddings for the graph nodes and perform spectral clustering.
- Parameters:
graph (nx.Graph) – The input graph.
model (gensim.models.Word2Vec) – The trained embedding model.
k (int) – The number of clusters.
seed (int, optional) – The random seed for reproducibility. Defaults to 123.
- Returns:
A dictionary where keys are node IDs and values are cluster labels.
- Return type:
dict
Examples
>>> import networkx as nx
>>> from gensim.models import Word2Vec
>>> from sklearn.cluster import SpectralClustering
>>> import numpy as np
>>> from graphpack.utils import cluster_graph_embeddings
>>> # Example 1: Simple graph clustering
>>> G = nx.karate_club_graph()
>>> model = Word2Vec(sentences=[[str(node) for node in G.neighbors(n)] for n in G.nodes()], vector_size=16, window=5, min_count=1, sg=1)
>>> partition = cluster_graph_embeddings(G, model, k=2)
>>> print(partition)
{0: 1, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1, 6: 1, 7: 0, 8: 0, 9: 0, 10: 1, 11: 0, 12: 1, 13: 1, 14: 0, 15: 0, 16: 0, 17: 0, 18: 1, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 1, 26: 1, 27: 0, 28: 1, 29: 1, 30: 1, 31: 0, 32: 1, 33: 0}
>>> # Example 2: Clustering with different number of clusters
>>> partition = cluster_graph_embeddings(G, model, k=4)
>>> print(partition)
{0: 0, 1: 1, 2: 0, 3: 1, 4: 2, 5: 0, 6: 2, 7: 3, 8: 2, 9: 2, 10: 0, 11: 1, 12: 0, 13: 0, 14: 3, 15: 3, 16: 1, 17: 1, 18: 0, 19: 3, 20: 3, 21: 1, 22: 3, 23: 1, 24: 2, 25: 2, 26: 3, 27: 3, 28: 2, 29: 0, 30: 0, 31: 3, 32: 2, 33: 1}
- utils.deepwalk_embedding(graph, num_walks, walk_length)[source]
Generates node embeddings using the DeepWalk algorithm.
This function performs random walks on the input graph and learns node embeddings using the Word2Vec algorithm.
- Parameters:
graph (nx.Graph) – The input graph.
num_walks (int) – Number of random walks to perform per node.
walk_length (int) – Length of each random walk.
- Returns:
A Word2Vec model trained on the generated random walks.
- Return type:
gensim.models.word2vec.Word2Vec
- utils.draw_graph(graph, labels=None, edge_thickness=None, node_color=None, color_map=None, title=None, file_path=None, is_interactive=False, plot_disconnected_components=False, separate_communities=False, **kwargs)[source]
Draws a graph using NetworkX and Matplotlib and saves it to an image file.
- Parameters:
graph (nx.Graph) – The graph to be drawn.
labels (dict, optional) – Node labels. Defaults to None.
edge_thickness (list, optional) – List of edge thicknesses. Defaults to None.
node_color (list, optional) – Node colors. Defaults to None.
color_map (dict, optional) – Color mapping for legend. Defaults to None.
title (str, optional) – Title of the graph plot. Defaults to None.
file_path (str, optional) – File path to save the plot as an image. Defaults to None.
is_interactive (bool, optional) – Whether to draw an interactive plot using pyvis. Defaults to False.
plot_disconnected_components (bool, optional) – How to handle disconnected graphs. If True, draw the entire graph, including disconnected nodes; if False, draw only the largest connected component. Defaults to False.
separate_communities (bool, optional) – Whether to separate communities in the layout. Defaults to False.
**kwargs – Additional keyword arguments to be passed to plt.figure.
- Returns:
None
Examples
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from graphpack.utils import draw_graph, draw_interactive_graph_pyvis
>>> # Example 1: Simple graph with default settings
>>> G = nx.Graph()
>>> G.add_edges_from([(1, 2), (2, 3), (3, 1)])
>>> draw_graph(G, title="Simple Graph", file_path="simple_graph")
>>> # Example 2: Graph with specified node colors and edge thickness
>>> node_colors = ['blue', 'green', 'red']
>>> edge_thickness = [2.0, 1.5, 2.5]
>>> draw_graph(G, node_color=node_colors, edge_thickness=edge_thickness, title="Graph with Custom Colors and Thickness", file_path="simple_graph")
>>> # Example 3: Interactive graph using pyvis
>>> draw_graph(G, is_interactive=True, title="Interactive Graph", file_path="simple_graph")
>>> # Example 4: Graph with communities and color mapping
>>> communities = {1: 'blue', 2: 'blue', 3: 'red'}
>>> draw_graph(G, node_color=[communities[node] for node in G.nodes()], color_map={'Blue Nodes': 'blue', 'Red Nodes': 'red'}, title="Graph with Communities", file_path="simple_graph")
>>> # Example 5: Handling disconnected components
>>> G.add_edges_from([(4, 5), (5, 6)])
>>> draw_graph(G, plot_disconnected_components=True, title="Graph with Disconnected Components", file_path="simple_graph")
>>> # Final example
>>> G = nx.Graph()
>>> G.add_edges_from([(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 7), (7, 8), (8, 6)])
>>> communities = {1: 'blue', 2: 'blue', 3: 'yellow', 4: 'green', 5: 'green', 6: 'red', 7: 'red', 8: 'red'}
>>> edge_thickness = [2.0, 1.5, 2.5, 1.0, 2.0, 1.5, 2.5, 5.0]
>>> color_map = {'Blue Nodes': 'blue', 'Red Nodes': 'red', 'Green Nodes': 'green', 'Yellow Nodes': 'yellow'}
>>> draw_graph(G, labels=None, edge_thickness=edge_thickness, node_color=[communities[node] for node in G.nodes()], color_map=color_map, title="Example Graph with All Options Set", file_path="simple_graph", is_interactive=True, plot_disconnected_components=True, separate_communities=True)
- utils.draw_interactive_graph_pyvis(graph, labels=None, title=None, file_path=None, node_color=None, node_sizes=None, color_map=None)[source]
Draws an interactive graph using pyvis and saves it to an HTML file.
- Parameters:
graph (nx.Graph) – The graph to be drawn.
labels (dict, optional) – Node labels. Defaults to None.
title (str, optional) – Title of the graph plot. Defaults to None.
file_path (str, optional) – File path to save the plot as an HTML file. If None, the plot is saved in the current directory with a default name. Defaults to None.
node_color (list, optional) – Node colors. Defaults to None.
node_sizes (list, optional) – Node sizes. Defaults to None.
color_map (dict, optional) – Color mapping for legend. Defaults to None.
- Returns:
None
Examples
>>> import networkx as nx
>>> from graphpack.utils import draw_interactive_graph_pyvis
>>> # Example 1: Draw a basic graph
>>> G = nx.Graph()
>>> G.add_edge('A', 'B')
>>> G.add_edge('B', 'C')
>>> draw_interactive_graph_pyvis(G, title='Basic Graph', file_path='simple_graph')
>>> # Example 2: Draw a graph with custom node labels
>>> labels = {'A': 'Node A', 'B': 'Node B', 'C': 'Node C'}
>>> draw_interactive_graph_pyvis(G, labels=labels, title='Graph with Labels', file_path='simple_graph')
>>> # Example 3: Draw a graph with custom node colors
>>> node_colors = ['red', 'green', 'blue']
>>> draw_interactive_graph_pyvis(G, node_color=node_colors, title='Graph with Colors', file_path='simple_graph')
>>> # Example 4: Draw a graph with custom node sizes
>>> node_sizes = [10, 20, 30]
>>> draw_interactive_graph_pyvis(G, node_sizes=node_sizes, title='Graph with Sizes', file_path='simple_graph')
- utils.generate_figure_filepath(title=None, file_path=None, fig_ext='png')[source]
Generate a file path for a figure based on title and file_path parameters.
- Parameters:
title (str or None) – Title of the figure. If None, a default name ‘fig’ will be used.
file_path (str or None) – Base directory where the figure file should be saved. If None, current directory will be used.
fig_ext (str) – Extension of the figure file (e.g., ‘png’, ‘jpg’).
- Returns:
file_path_ext: Generated file path including the extension.
- Return type:
str
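The fallback behaviour described in the parameters (a default name of ‘fig’ and the current directory) can be sketched as follows. The helper `figure_filepath` is hypothetical, and using the title verbatim as the file name is an assumption; the real function may normalise the title differently.

```python
import os


def figure_filepath(title=None, file_path=None, fig_ext="png"):
    """Combine directory, name and extension with the documented fallbacks."""
    name = title if title else "fig"            # documented default name
    directory = file_path if file_path else "."  # current directory fallback
    # Assumption: the title is used as the file name without further changes.
    return os.path.join(directory, f"{name}.{fig_ext}")


default_path = figure_filepath()
custom_path = figure_filepath(title="greedy", file_path="results", fig_ext="jpg")
print(default_path)
print(custom_path)
```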
- utils.generate_random_walks(graph, num_walks, walk_length)[source]
Generates random walks from each node in the graph.
- Parameters:
graph (nx.Graph) – The input graph.
num_walks (int) – Number of random walks to perform per node.
walk_length (int) – Length of each random walk.
- Returns:
A list of random walks.
- Return type:
list of list of str
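A plain-Python sketch of uniform random walks may help make the parameters concrete. The names `sketch_random_walk` and `sketch_generate_walks`, the adjacency-dict input, and the `seed` argument are all hypothetical, and the sketch assumes that `walk_length` counts the nodes in a walk; the library's own functions take a NetworkX graph and may differ in these details.

```python
import random


def sketch_random_walk(adjacency, start_node, walk_length, rng):
    """Walk from start_node, choosing a uniformly random neighbor each step."""
    walk = [start_node]
    while len(walk) < walk_length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


def sketch_generate_walks(adjacency, num_walks, walk_length, seed=123):
    """Start num_walks walks from every node; nodes are returned as strings."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # vary the starting order between passes
        for node in nodes:
            walk = sketch_random_walk(adjacency, node, walk_length, rng)
            walks.append([str(step) for step in walk])
    return walks


adjacency = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
walks = sketch_generate_walks(adjacency, num_walks=2, walk_length=4)
print(len(walks))
print(walks[0])
```

Converting nodes to strings matches the documented return type and suits Word2Vec-style models, which expect sequences of tokens.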
- utils.perform_gsea(gene_list, organism='human', gene_sets='KEGG_2019_Human', k=5)[source]
Performs Gene Set Enrichment Analysis (GSEA) on a list of nodes.
- Parameters:
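gene_list (list) – A list of genes to perform GSEA on.
organism (str) – Organism the genes belong to (for example ‘human’ or ‘mouse’). Default is ‘human’.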
gene_sets (str) – Name of the gene sets to use for enrichment analysis. Default is ‘KEGG_2019_Human’.
k (int) – Number of top enriched terms to return. Default is 5.
- Returns:
A list of labels representing the top k enriched gene sets.
- Return type:
list
- Raises:
ValueError – If the gene list is empty.
Examples
>>> from graphpack.utils import perform_gsea
>>> # Example 1: Basic GSEA with default settings
>>> gene_list = ['TP53', 'BRCA1', 'EGFR', 'MYC', 'MTOR']
>>> perform_gsea(gene_list)
['Breast cancer', 'Central carbon metabolism in cancer', 'MicroRNAs in cancer', 'Colorectal cancer', 'PI3K-Akt signaling pathway']
>>> # Example 2: GSEA with a different organism and gene set
>>> perform_gsea(gene_list, organism='mouse', gene_sets='KEGG_2019_Mouse')
['Breast cancer', 'MicroRNAs in cancer', 'Central carbon metabolism in cancer', 'PI3K-Akt signaling pathway', 'ErbB signaling pathway']
>>> # Example 3: GSEA with a custom number of top enriched terms
>>> perform_gsea(gene_list, k=3)
['Breast cancer', 'Central carbon metabolism in cancer', 'MicroRNAs in cancer']
>>> # Example 4: Handling gene list with fewer than 4 genes
>>> perform_gsea(['TP53'])
['TP53']
>>> perform_gsea(['TP53', 'BRCA1'])
[['TP53', 'BRCA1']]
- utils.random_walk(graph, start_node, walk_length)[source]
Performs a random walk starting from the given node.
- Parameters:
graph (nx.Graph) – The input graph.
start_node (str or int) – The starting node for the random walk.
walk_length (int) – Length of the random walk.
- Returns:
A random walk.
- Return type:
list of str or int
- utils.read_graph(file_path)[source]
Reads a graph from a file. Supports edgelist, JSON, and various other formats.
- Parameters:
file_path (str) – The path to the file.
- Returns:
A NetworkX graph created from the file data.
- Return type:
nx.Graph
- Raises:
IOError – If the file cannot be read.
ValueError – If the file extension is not supported.
json.JSONDecodeError – If the JSON file is not valid.
KeyError – If the JSON data does not contain the expected ‘edges’ key.
Examples
>>> import networkx as nx
>>> from graphpack.utils import read_graph
>>> # Example 1: Reading an edgelist file
>>> G = read_graph('path/to/edgelist.txt')
>>> isinstance(G, nx.Graph)
True
>>> len(G.nodes) > 0  # Ensure the graph has nodes
True
>>> # Example 2: Reading a JSON file
>>> G = read_graph('path/to/graph.json')
>>> isinstance(G, nx.Graph)
True
>>> len(G.edges) > 0  # Ensure the graph has edges
True
>>> # Example 3: Reading a GML file
>>> G = read_graph('path/to/graph.gml')
>>> isinstance(G, nx.Graph)
True
>>> len(G.nodes) > 0  # Ensure the graph has nodes
True
- utils.read_graph_from_edge_list(file_path, file_extension)[source]
Reads a graph from an edgelist, CSV, or TSV file.
- Parameters:
file_path (str) – The path to the file.
file_extension (str) – The extension of the file.
- Returns:
A NetworkX graph created from the file data.
- Return type:
nx.Graph
- Raises:
IOError – If the file cannot be read.
Examples
>>> import networkx as nx
>>> from graphpack.utils import read_graph_from_edge_list
>>> # Example 1: Reading an edgelist file
>>> # Contents of 'graph.edgelist':
>>> # A B
>>> # B C
>>> G = read_graph_from_edge_list('path/to/graph.edgelist', '.edgelist')
>>> isinstance(G, nx.Graph)
True
>>> len(G.nodes) == 3  # Ensure the graph has 3 nodes
True
>>> # Example 2: Reading a CSV file, unweighted graph
>>> # Contents of 'graph.csv':
>>> # source,target
>>> # A,B
>>> # B,C
>>> G = read_graph_from_edge_list('path/to/graph.csv', '.csv')
>>> isinstance(G, nx.Graph)
True
>>> len(G.edges) == 2  # Ensure the graph has 2 edges
True
>>> # Example 3: Reading a TSV file, weighted graph
>>> # Contents of 'graph.tsv':
>>> # source target weight
>>> # A B 1.0
>>> # B C 2.0
>>> G = read_graph_from_edge_list('path/to/graph.tsv', '.tsv')
>>> isinstance(G, nx.Graph)
True
>>> len(G.edges) == 2  # Ensure the graph has 2 edges
True
>>> G['A']['B']['weight'] == 1.0  # Ensure the graph has edge weights
True
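The file layouts in the examples above (a `source`, `target` header with an optional `weight` column) can be parsed with the standard csv module. The sketch below is an illustration only: `parse_edge_table` is a hypothetical helper that reads from a string and returns a nested adjacency dict, whereas the documented function reads from a path and returns an nx.Graph.

```python
import csv
import io


def parse_edge_table(text, delimiter=","):
    """Parse delimited edge rows into {node: {neighbor: attributes}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    adjacency = {}
    for row in reader:
        u, v = row["source"], row["target"]
        weight = row.get("weight")  # None when the column is absent
        attrs = {"weight": float(weight)} if weight not in (None, "") else {}
        # Undirected graph: store the edge in both directions.
        adjacency.setdefault(u, {})[v] = attrs
        adjacency.setdefault(v, {})[u] = attrs
    return adjacency


tsv_text = "source\ttarget\tweight\nA\tB\t1.0\nB\tC\t2.0\n"
csv_text = "source,target\nA,B\nB,C\n"

weighted = parse_edge_table(tsv_text, delimiter="\t")
unweighted = parse_edge_table(csv_text)
print(weighted["A"]["B"])
print(sorted(unweighted))
```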
- utils.read_graph_from_json(file_path)[source]
Reads a graph from a JSON file.
- Parameters:
file_path (str) – The path to the JSON file.
- Returns:
A NetworkX graph created from the JSON data.
- Return type:
nx.Graph
- Raises:
IOError – If the file cannot be read.
json.JSONDecodeError – If the file is not valid JSON.
KeyError – If the JSON data does not contain the expected ‘edges’ key.
Examples
>>> import networkx as nx
>>> from graphpack.utils import read_graph_from_json
>>> # Example 1: Reading a JSON file with weighted edges
>>> # Contents of 'graph.json':
>>> # {
>>> #   "nodes": ["A", "B", "C"],
>>> #   "edges": [
>>> #     {"source": "A", "target": "B", "weight": 1.0},
>>> #     {"source": "B", "target": "C", "weight": 2.0}
>>> #   ]
>>> # }
>>> G = read_graph_from_json('path/to/graph.json')
>>> isinstance(G, nx.Graph)
True
>>> len(G.nodes) == 3  # Ensure the graph has nodes
True
>>> len(G.edges) == 2  # Ensure the graph has edges
True
>>> G['A']['B']['weight'] == 1.0  # Ensure the graph has weights
True
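The JSON layout shown in the example, together with the documented KeyError for a missing ‘edges’ key, can be sketched with the standard json module. The helper `graph_from_json_text` is hypothetical: it parses a string into a nested adjacency dict rather than reading a file into an nx.Graph, and treating the ‘nodes’ list as optional is an assumption.

```python
import json


def graph_from_json_text(text):
    """Parse the nodes/edges layout into {node: {neighbor: attributes}}."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    # Assumption: 'nodes' is optional; isolated nodes are kept when listed.
    adjacency = {node: {} for node in data.get("nodes", [])}
    for edge in data["edges"]:  # raises KeyError when 'edges' is missing
        u, v = edge["source"], edge["target"]
        attrs = {key: value for key, value in edge.items()
                 if key not in ("source", "target")}
        adjacency.setdefault(u, {})[v] = attrs
        adjacency.setdefault(v, {})[u] = attrs
    return adjacency


json_text = """
{
  "nodes": ["A", "B", "C"],
  "edges": [
    {"source": "A", "target": "B", "weight": 1.0},
    {"source": "B", "target": "C", "weight": 2.0}
  ]
}
"""
graph = graph_from_json_text(json_text)
print(graph["A"]["B"])
```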
- utils.save_graph(graph, file_path, save_data=False)[source]
Saves a NetworkX graph to a file in the format specified by the file extension.
- Parameters:
graph (nx.Graph) – The NetworkX graph to save.
file_path (str) – The path to the file to save the graph to.
save_data (bool) – Whether to save edge data (e.g., weights) to the file. Defaults to False.
- Raises:
ValueError – If the file extension is not supported.
Examples
>>> import networkx as nx
>>> from graphpack.utils import save_graph
>>> # Example 1: Save a graph to a JSON file
>>> G = nx.Graph()
>>> G.add_edge('A', 'B', weight=1.0)
>>> G.add_edge('B', 'C', weight=2.0)
>>> save_graph(G, 'path/to/graph.json', save_data=True)
>>> # Example 2: Save a graph to an edgelist file
>>> save_graph(G, 'path/to/graph.edgelist')
>>> # Example 3: Save a graph to a CSV file
>>> save_graph(G, 'path/to/graph.csv')