spacr.utils

Module Contents

spacr.utils.filepaths_to_database(img_paths, settings, source_folder, crop_mode)[source]
spacr.utils.activation_maps_to_database(img_paths, source_folder, settings)[source]
spacr.utils.activation_correlations_to_database(df, img_paths, source_folder, settings)[source]
spacr.utils.calculate_activation_correlations(inputs, activation_maps, file_names, manders_thresholds=[15, 50, 75])[source]

Calculates Pearson and Manders correlations between input image channels and activation map channels.

Parameters:
  • inputs – A batch of input images, Tensor of shape (batch_size, channels, height, width)

  • activation_maps – A batch of activation maps, Tensor of shape (batch_size, channels, height, width)

  • file_names – List of file names corresponding to each image in the batch.

  • manders_thresholds – List of intensity percentiles to calculate Manders correlation.

Returns:

A DataFrame (df_correlations) with columns for pairwise correlations (Pearson and Manders) between input channels and activation map channels.

Return type:

pandas.DataFrame
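
As a rough illustration of the two metrics named above, the following sketch computes a Pearson coefficient and a percentile-thresholded Manders coefficient for two flattened channels in plain Python. The helper names (percentile, pearson, manders) and the exact Manders definition used here (the fraction of one channel's intensity found where the other channel exceeds its percentile cutoff) are assumptions for illustration, not spacr's implementation.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0

def manders(a, b, threshold_percentile):
    """Fraction of the total intensity in `a` located where `b` exceeds its cutoff."""
    cutoff = percentile(b, threshold_percentile)
    overlap = sum(x for x, y in zip(a, b) if y > cutoff)
    total = sum(a)
    return overlap / total if total else 0.0

# Hypothetical flattened pixel values for one input channel and one activation channel.
input_channel = [0, 1, 2, 3, 4, 5, 6, 7]
activation = [0, 2, 4, 6, 8, 10, 12, 14]

r = pearson(input_channel, activation)
m50 = manders(input_channel, activation, 50)
```

In the documented function the same pair of metrics would be evaluated for every input-channel and activation-channel combination, once per entry in manders_thresholds.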

spacr.utils.load_settings(csv_file_path, show=False, setting_key='setting_key', setting_value='setting_value')[source]

Convert a CSV file with ‘setting_key’ and ‘setting_value’ columns into a dictionary. Handles special cases where values are lists, tuples, booleans, None, integers, floats, and nested dictionaries.

Parameters:
  • csv_file_path (str) – The path to the CSV file.

  • show (bool) – Whether to display the dataframe (for debugging).

  • setting_key (str) – The name of the column that contains the setting keys.

  • setting_value (str) – The name of the column that contains the setting values.

Returns:

A dictionary whose keys come from the ‘setting_key’ column and whose values come from the ‘setting_value’ column.

Return type:

dict
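
One way to get the type handling described above is to pass each cell through ast.literal_eval and fall back to the raw string when the cell is not a Python literal. The sketch below assumes that approach; settings_from_csv, parse_value, and the example setting names are hypothetical and are not taken from spacr.

```python
import ast
import csv
import io

def parse_value(text):
    """Turn a CSV cell into a Python object, falling back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text

def settings_from_csv(handle, setting_key="setting_key", setting_value="setting_value"):
    """Read key/value rows from an open CSV file into a dict."""
    reader = csv.DictReader(handle)
    return {row[setting_key]: parse_value(row[setting_value]) for row in reader}

# An in-memory CSV stands in for a file on disk; the setting names are made up.
example = io.StringIO(
    "setting_key,setting_value\n"
    "channels,\"[0, 1, 2]\"\n"
    "plot,True\n"
    "src,path/to/images\n"
    "timelapse,None\n"
    "magnification,20\n"
)
settings = settings_from_csv(example)
```

Values that are not valid literals, such as the path, stay as strings, while lists, booleans, None, and numbers come back as the corresponding Python objects.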

spacr.utils.save_settings(settings, name='settings', show=False)[source]
spacr.utils.print_progress(files_processed, files_to_process, n_jobs, time_ls=None, batch_size=None, operation_type='')[source]
spacr.utils.reset_mp()[source]
spacr.utils.is_multiprocessing_process(process)[source]

Check if the process is a multiprocessing process.

spacr.utils.close_file_descriptors()[source]

Close file descriptors and shared memory objects.

spacr.utils.close_multiprocessing_processes()[source]

Close all multiprocessing processes.

spacr.utils.check_mask_folder(src, mask_fldr)[source]
spacr.utils.smooth_hull_lines(cluster_data)[source]
spacr.utils.mask_object_count(mask)[source]

Counts the number of objects in a given mask.

Parameters:
  • mask (numpy.ndarray) – The mask containing object labels.

Returns:

The number of objects in the mask.

Return type:

int
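
A minimal sketch of the idea, assuming that label 0 marks background and that every other distinct value is one object. A nested list stands in for the numpy.ndarray so the example has no dependencies; count_labels is a hypothetical name.

```python
def count_labels(mask):
    """Count distinct non-zero labels in a 2D mask given as nested lists."""
    return len({value for row in mask for value in row if value != 0})

mask = [
    [0, 1, 1, 0],
    [0, 0, 2, 2],
    [3, 0, 0, 2],
]
n_objects = count_labels(mask)
```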

spacr.utils.is_list_of_lists(var)[source]
spacr.utils.normalize_to_dtype(array, p1=2, p2=98, percentile_list=None, new_dtype=None)[source]

Normalize each image in the stack to its own percentiles.

Parameters:
  • array (numpy array) – The input stack to be normalized.

  • p1 (int, optional) – The lower percentile value for normalization. Defaults to 2.

  • p2 (int, optional) – The upper percentile value for normalization. Defaults to 98.

  • percentile_list (list, optional) – A list of pre-calculated percentiles for each image in the stack. Defaults to None.

Returns:

The normalized stack with the same shape as the input stack.

Return type:

numpy array
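
The sketch below shows one plausible reading of per-image percentile normalization: clip each image to its own [p1, p2] percentile range, then rescale that range to the output range. The nearest-rank percentile, the 0..255 output range, and the function names are assumptions; the documented function may interpolate percentiles and choose the range from new_dtype instead.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def normalize_image(pixels, p1=2, p2=98, out_max=255):
    """Clip to the [p1, p2] percentile range and rescale to 0..out_max."""
    low, high = percentile(pixels, p1), percentile(pixels, p2)
    if high == low:
        # A flat image has no range to stretch.
        return [0 for _ in pixels]
    scaled = []
    for value in pixels:
        clipped = min(max(value, low), high)
        scaled.append(round((clipped - low) / (high - low) * out_max))
    return scaled

# Two flattened images with very different intensity ranges.
stack = [[10, 20, 30, 40, 50], [100, 100, 300, 500, 900]]
normalized = [normalize_image(image, p1=0, p2=100) for image in stack]
```

Because each image uses its own percentiles, both images end up spanning the full output range even though their raw intensities differ by an order of magnitude.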

spacr.utils.annotate_conditions(df, cells=None, cell_loc=None, pathogens=None, pathogen_loc=None, treatments=None, treatment_loc=None)[source]

Annotates conditions in a DataFrame based on specified criteria and combines them into a ‘condition’ column. NaN is used for missing values, and they are excluded from the ‘condition’ column.

Parameters:
  • df (pandas.DataFrame) – The DataFrame to annotate.

  • cells (list/str, optional) – Host cell types. Defaults to None.

  • cell_loc (list of lists, optional) – Values for each host cell type. Defaults to None.

  • pathogens (list/str, optional) – Pathogens. Defaults to None.

  • pathogen_loc (list of lists, optional) – Values for each pathogen. Defaults to None.

  • treatments (list/str, optional) – Treatments. Defaults to None.

  • treatment_loc (list of lists, optional) – Values for each treatment. Defaults to None.

Returns:

Annotated DataFrame with a combined ‘condition’ column.

Return type:

pandas.DataFrame

class spacr.utils.Cache(max_size)[source]

A class representing a cache with a maximum size.

max_size[source]

The maximum size of the cache.

Type:

int

cache[source]

The cache data structure.

Type:

OrderedDict

get(key)[source]
put(key, value)[source]
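
A size-bounded cache built on an OrderedDict is commonly implemented with least-recently-used eviction. The sketch below assumes that policy; the documentation above does not state which entry is evicted, and SketchCache is a hypothetical stand-in rather than the spacr class.

```python
from collections import OrderedDict

class SketchCache:
    """Size-bounded cache that evicts the least recently used entry."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return None
        # Mark the entry as most recently used.
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.max_size:
            # Drop the oldest (least recently used) entry.
            self.cache.popitem(last=False)

cache = SketchCache(max_size=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" becomes the most recently used entry
cache.put("c", 3)   # exceeds max_size, so "b" is evicted
```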
class spacr.utils.ScaledDotProductAttention(d_k)[source]

Bases: torch.nn.Module

Scaled Dot-Product Attention module.

Parameters:

d_k (int) – The dimension of the key and query vectors.

d_k[source]

The dimension of the key and query vectors.

Type:

int

forward(Q, K, V)[source]

Performs the forward pass of the attention mechanism.

Parameters:
  • Q (torch.Tensor) – The query tensor of shape (batch_size, seq_len_q, d_k).

  • K (torch.Tensor) – The key tensor of shape (batch_size, seq_len_k, d_k).

  • V (torch.Tensor) – The value tensor of shape (batch_size, seq_len_v, d_k).

Returns:

The output tensor of shape (batch_size, seq_len_q, d_k).

Return type:

torch.Tensor
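
For reference, standard scaled dot-product attention computes softmax(Q Kᵀ / sqrt(d_k)) V. The plain-Python sketch below works on one unbatched example with nested lists; it illustrates the standard formula only, and whether the spacr module adds masking or dropout is not stated above.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single (unbatched) example."""
    d_k = len(K[0])
    output = []
    for q_row in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        output.append([sum(w * v_row[j] for w, v_row in zip(weights, V)) for j in range(len(V[0]))])
    return output

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [1.0, 0.0]]
V = [[2.0, 0.0], [4.0, 0.0]]
out = attention(Q, K, V)
```

Both keys match the query equally here, so the attention weights are uniform and the output is the mean of the two value rows.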

class spacr.utils.SelfAttention(in_channels, d_k)[source]

Bases: torch.nn.Module

Self-Attention module that applies scaled dot-product attention mechanism.

Parameters:
  • in_channels (int) – Number of input channels.

  • d_k (int) – Dimensionality of the key and query vectors.

W_q[source]
W_k[source]
W_v[source]
attention[source]
forward(x)[source]

Forward pass of the SelfAttention module.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, in_channels).

Returns:

Output tensor of shape (batch_size, d_k).

Return type:

torch.Tensor

class spacr.utils.ScaledDotProductAttention(d_k)[source]

Bases: torch.nn.Module

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note

As per the example above, an __init__() call to the parent class must be made before assignment on the child.

Variables:

training (bool) – Boolean represents whether this module is in training or evaluation mode.

d_k[source]
forward(Q, K, V)[source]

Performs the forward pass of the ScaledDotProductAttention module.

Parameters:
  • Q (torch.Tensor) – The query tensor.

  • K (torch.Tensor) – The key tensor.

  • V (torch.Tensor) – The value tensor.

Returns:

The output tensor.

Return type:

torch.Tensor

class spacr.utils.SelfAttention(in_channels, d_k)[source]

Bases: torch.nn.Module

Self-Attention module that applies scaled dot-product attention mechanism.

Parameters:
  • in_channels (int) – Number of input channels.

  • d_k (int) – Dimensionality of the key and query vectors.

W_q[source]
W_k[source]
W_v[source]
attention[source]
forward(x)[source]

Forward pass of the SelfAttention module.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, in_channels).

Returns:

Output tensor after applying self-attention mechanism.

Return type:

torch.Tensor

class spacr.utils.EarlyFusion(in_channels)[source]

Bases: torch.nn.Module

Early Fusion module for image classification.

Parameters:

in_channels (int) – Number of input channels.

conv1[source]
forward(x)[source]

Forward pass of the Early Fusion module.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, height, width).

Returns:

Output tensor of shape (batch_size, 64, height, width).

Return type:

torch.Tensor

class spacr.utils.SpatialAttention(kernel_size=7)[source]

Bases: torch.nn.Module

conv1[source]
sigmoid[source]
forward(x)[source]

Performs forward pass of the SpatialAttention module.

Parameters:

x (torch.Tensor) – The input tensor.

Returns:

The output tensor after applying spatial attention.

Return type:

torch.Tensor

class spacr.utils.MultiScaleBlockWithAttention(in_channels, out_channels)[source]

Bases: torch.nn.Module

dilated_conv1[source]
spatial_attention[source]
custom_forward(x)[source]
forward(x)[source]
class spacr.utils.CustomCellClassifier(num_classes, pathogen_channel, use_attention, use_checkpoint, dropout_rate)[source]

Bases: torch.nn.Module

early_fusion[source]
multi_scale_block_1[source]
fc1[source]
use_checkpoint[source]
custom_forward(x)[source]
forward(x)[source]
class spacr.utils.TorchModel(model_name='resnet50', pretrained=True, dropout_rate=None, use_checkpoint=False)[source]

Bases: torch.nn.Module

model_name = 'resnet50'[source]
use_checkpoint = False[source]
base_model[source]
num_ftrs[source]
apply_dropout_rate(model, dropout_rate)[source]

Apply dropout rate to all dropout layers in the model.

init_base_model(pretrained)[source]

Initialize the base model from torchvision.models.

get_weight_choice()[source]

Get weight choice if it exists for the model.

get_num_ftrs()[source]

Determine the number of features output by the base model.

init_spacr_classifier(dropout_rate)[source]

Initialize the SPACR classifier.

forward(x)[source]

Define the forward pass of the model.

class spacr.utils.FocalLossWithLogits(alpha=1, gamma=2)[source]

Bases: torch.nn.Module

alpha = 1[source]
gamma = 2[source]
forward(logits, target)[source]
class spacr.utils.ResNet(resnet_type='resnet50', dropout_rate=None, use_checkpoint=False, init_weights='imagenet')[source]

Bases: torch.nn.Module

initialize_base(base_model_dict, dropout_rate, use_checkpoint, init_weights)[source]
forward(x)[source]
spacr.utils.split_my_dataset(dataset, split_ratio=0.1)[source]

Splits a dataset into training and validation subsets.

Parameters:
  • dataset (torch.utils.data.Dataset) – The dataset to be split.

  • split_ratio (float, optional) – The ratio of validation samples to total samples. Defaults to 0.1.

Returns:

A tuple containing the training dataset and validation dataset.

Return type:

tuple
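
The split can be pictured as partitioning sample indices, with split_ratio giving the validation share. The sketch below assumes a seeded random shuffle before the cut; the shuffling, the seed argument, and the name split_indices are assumptions, and the documented function returns dataset objects rather than index lists.

```python
import random

def split_indices(n_samples, split_ratio=0.1, seed=0):
    """Return (train_indices, val_indices) with about split_ratio of samples in validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * split_ratio)
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(50, split_ratio=0.1)
```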

spacr.utils.classification_metrics(all_labels, prediction_pos_probs, loss, epoch)[source]

Calculate classification metrics for binary classification.

Parameters:
  • all_labels (list) – List of true labels.

  • prediction_pos_probs (list) – List of predicted positive probabilities.

  • loader_name (str) – Name of the data loader.

  • loss (float) – Loss value.

  • epoch (int) – Epoch number.

Returns:

DataFrame containing the calculated metrics (data_df).

Return type:

DataFrame
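
To make the inputs concrete, the sketch below thresholds the positive-class probabilities at 0.5 and derives a few common binary metrics from the resulting confusion counts. Which metrics spacr actually reports is not listed above, so the set chosen here, the threshold, and the name binary_metrics are assumptions.

```python
def binary_metrics(labels, pos_probs, threshold=0.5):
    """Accuracy, precision, recall and F1 from true labels and positive-class probabilities."""
    preds = [1 if p >= threshold else 0 for p in pos_probs]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

labels = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
metrics = binary_metrics(labels, probs)
```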

spacr.utils.compute_irm_penalty(losses, dummy_w, device)[source]

Computes the Invariant Risk Minimization (IRM) penalty.

Parameters:
  • losses (list) – A list of losses.

  • dummy_w (torch.Tensor) – A dummy weight tensor.

  • device (torch.device) – The device to perform computations on.

Returns:

The computed IRM penalty.

Return type:

float

spacr.utils.choose_model(model_type, device, init_weights=True, dropout_rate=0, use_checkpoint=False, channels=3, height=224, width=224, chan_dict=None, num_classes=2, verbose=False)[source]

Choose a model for classification.

Parameters:
  • model_type (str) – The type of model to choose. Can be one of the pre-defined TorchVision models or ‘custom’ for a custom model.

  • device (str) – The device to use for model inference.

  • init_weights (bool, optional) – Whether to initialize the model with pre-trained weights. Defaults to True.

  • dropout_rate (float, optional) – The dropout rate to use in the model. Defaults to 0.

  • use_checkpoint (bool, optional) – Whether to use checkpointing during model training. Defaults to False.

  • channels (int, optional) – The number of input channels for the model. Defaults to 3.

  • height (int, optional) – The height of the input images for the model. Defaults to 224.

  • width (int, optional) – The width of the input images for the model. Defaults to 224.

  • chan_dict (dict, optional) – A dictionary containing channel information for custom models. Defaults to None.

  • num_classes (int, optional) – The number of output classes for the model. Defaults to 2.

Returns:

The chosen model.

Return type:

torch.nn.Module

spacr.utils.calculate_loss(output, target, loss_type='binary_cross_entropy_with_logits')[source]
spacr.utils.pick_best_model(src)[source]
spacr.utils.get_paths_from_db(df, png_df, image_type='cell_png')[source]
spacr.utils.save_file_lists(dst, data_set, ls)[source]
spacr.utils.augment_single_image(args)[source]
spacr.utils.augment_images(file_paths, dst)[source]
spacr.utils.augment_classes(dst, nc, pc, generate=True, move=True)[source]
spacr.utils.annotate_predictions(csv_loc)[source]
spacr.utils.initiate_counter(counter_, lock_)[source]
spacr.utils.add_images_to_tar(paths_chunk, tar_path, total_images)[source]
spacr.utils.generate_fraction_map(df, gene_column, min_frequency=0.0)[source]
spacr.utils.fishers_odds(df, threshold=0.5, phenotyp_col='mean_pred')[source]
spacr.utils.model_metrics(model)[source]
spacr.utils.check_multicollinearity(x)[source]

Checks multicollinearity of the predictors by computing the VIF.

spacr.utils.lasso_reg(merged_df, alpha_value=0.01, reg_type='lasso')[source]
spacr.utils.MLR(merged_df, refine_model)[source]
spacr.utils.get_files_from_dir(dir_path, file_extension='*')[source]
spacr.utils.create_circular_mask(h, w, center=None, radius=None)[source]
spacr.utils.apply_mask(image, output_value=0)[source]
spacr.utils.invert_image(image)[source]
spacr.utils.resize_images_and_labels(images, labels, target_height, target_width, show_example=True)[source]
spacr.utils.resize_labels_back(labels, orig_dims)[source]
spacr.utils.calculate_iou(mask1, mask2)[source]
spacr.utils.match_masks(true_masks, pred_masks, iou_threshold)[source]
spacr.utils.compute_average_precision(matches, num_true_masks, num_pred_masks)[source]
spacr.utils.pad_to_same_shape(mask1, mask2)[source]
spacr.utils.compute_ap_over_iou_thresholds(true_masks, pred_masks, iou_thresholds)[source]
spacr.utils.compute_segmentation_ap(true_masks, pred_masks, iou_thresholds=np.linspace(0.5, 0.95, 10))[source]
spacr.utils.jaccard_index(mask1, mask2)[source]
spacr.utils.dice_coefficient(mask1, mask2)[source]
spacr.utils.extract_boundaries(mask, dilation_radius=1)[source]
spacr.utils.boundary_f1_score(mask_true, mask_pred, dilation_radius=1)[source]
spacr.utils.merge_touching_objects(mask, threshold=0.25)[source]

Merges touching objects in a binary mask based on the percentage of their shared boundary.

Parameters:
  • mask (ndarray) – Binary mask representing objects.

  • threshold (float, optional) – Threshold value for merging objects. Defaults to 0.25.

Returns:

Merged mask.

Return type:

ndarray

spacr.utils.remove_intensity_objects(image, mask, intensity_threshold, mode)[source]

Removes objects from the mask based on their mean intensity in the original image.

Parameters:
  • image (ndarray) – The original image.

  • mask (ndarray) – The mask containing labeled objects.

  • intensity_threshold (float) – The threshold value for mean intensity.

  • mode (str) – The mode for intensity comparison. Can be ‘low’ or ‘high’.

Returns:

The updated mask with objects removed.

Return type:

ndarray

class spacr.utils.SelectChannels(channels)[source]
channels[source]
spacr.utils.preprocess_image(image_path, image_size=224, channels=[1, 2, 3], normalize=True)[source]
class spacr.utils.SaliencyMapGenerator(model)[source]
model[source]
compute_saliency_maps(X, y)[source]
compute_saliency_and_predictions(X)[source]
plot_activation_grid(X, saliency, predictions, overlay=True, normalize=False)[source]
percentile_normalize(img, lower_percentile=2, upper_percentile=98)[source]
class spacr.utils.GradCAMGenerator(model, target_layer, cam_type='gradcam')[source]
model[source]
target_layer[source]
cam_type = 'gradcam'[source]
gradients = None[source]
activations = None[source]
target_layer_module[source]
hook_layers()[source]
get_layer(model, target_layer)[source]
compute_gradcam_maps(X, y)[source]
compute_gradcam_and_predictions(X)[source]
plot_activation_grid(X, gradcam, predictions, overlay=True, normalize=False)[source]
percentile_normalize(img, lower_percentile=2, upper_percentile=98)[source]
spacr.utils.preprocess_image(image_path, normalize=True, image_size=224, channels=[1, 2, 3])[source]
spacr.utils.class_visualization(target_y, model_path, dtype, img_size=224, channels=[0, 1, 2], l2_reg=0.001, learning_rate=25, num_iterations=100, blur_every=10, max_jitter=16, show_every=25, class_names=['nc', 'pc'])[source]
spacr.utils.get_submodules(model, prefix='')[source]
class spacr.utils.GradCAM(model, target_layers=None, use_cuda=True)[source]
model[source]
target_layers = None[source]
cuda = True[source]
forward(input)[source]
spacr.utils.show_cam_on_image(img, mask)[source]
spacr.utils.recommend_target_layers(model)[source]
class spacr.utils.IntegratedGradients(model)[source]
model[source]
generate_integrated_gradients(input_tensor, target_label_idx, baseline=None, num_steps=50)[source]
spacr.utils.get_db_paths(src)[source]
spacr.utils.get_sequencing_paths(src)[source]
spacr.utils.load_image_paths(c, visualize)[source]
spacr.utils.merge_dataframes(df, image_paths_df, verbose)[source]
spacr.utils.remove_highly_correlated_columns(df, threshold)[source]
spacr.utils.filter_columns(df, filter_by)[source]
spacr.utils.reduction_and_clustering(numeric_data, n_neighbors, min_dist, metric, eps, min_samples, clustering, reduction_method='umap', verbose=False, embedding=None, n_jobs=-1, mode='fit', model=False)[source]

Perform dimensionality reduction and clustering on the given data.

Parameters:
  • numeric_data (np.ndarray) – Numeric data for embedding and clustering.

  • n_neighbors (int or float) – Number of neighbors for UMAP or perplexity for t-SNE.

  • min_dist (float) – Minimum distance for UMAP.

  • metric (str) – Metric for UMAP and DBSCAN.

  • eps (float) – Epsilon for DBSCAN.

  • min_samples (int) – Minimum samples for DBSCAN or number of clusters for KMeans.

  • clustering (str) – Clustering method (‘DBSCAN’ or ‘KMeans’).

  • reduction_method (str) – Dimensionality reduction method (‘UMAP’ or ‘tSNE’).

  • verbose (bool) – Whether to print verbose output.

  • embedding (np.ndarray, optional) – Precomputed embedding. Defaults to None.

  • return_model (bool) – Whether to return the reducer model. Defaults to False.

Returns:

embedding, labels (and optionally the reducer model).

Return type:

tuple

spacr.utils.remove_noise(embedding, labels)[source]
spacr.utils.plot_embedding(embedding, image_paths, labels, image_nr, img_zoom, colors, plot_by_cluster, plot_outlines, plot_points, plot_images, smooth_lines, black_background, figuresize, dot_size, remove_image_canvas, verbose)[source]
spacr.utils.generate_colors(num_clusters, black_background)[source]
spacr.utils.assign_colors(unique_labels, random_colors)[source]
spacr.utils.setup_plot(figuresize, black_background)[source]
spacr.utils.plot_clusters(ax, embedding, labels, colors, cluster_centers, plot_outlines, plot_points, smooth_lines, figuresize=10, dot_size=50, verbose=False)[source]
spacr.utils.plot_umap_images(ax, image_paths, embedding, labels, image_nr, img_zoom, colors, plot_by_cluster, remove_image_canvas, verbose)[source]
spacr.utils.plot_images_by_cluster(ax, image_paths, embedding, labels, image_nr, img_zoom, colors, cluster_indices, remove_image_canvas, verbose)[source]
spacr.utils.plot_image(ax, x, y, img, img_zoom, remove_image_canvas=True)[source]
spacr.utils.remove_canvas(img)[source]
spacr.utils.plot_clusters_grid(embedding, labels, image_nr, image_paths, colors, figuresize, black_background, verbose)[source]
spacr.utils.plot_grid(cluster_images, colors, figuresize, black_background, verbose)[source]
spacr.utils.generate_path_list_from_db(db_path, file_metadata)[source]
spacr.utils.correct_paths(df, base_path, folder='data')[source]
spacr.utils.delete_folder(folder_path)[source]
spacr.utils.measure_test_mode(settings)[source]
spacr.utils.preprocess_data(df, filter_by, remove_highly_correlated, log_data, exclude, column_list=False)[source]

Preprocesses the given dataframe by applying filtering, removing highly correlated columns, applying log transformation, filling NaN values, and scaling the numeric data.

Parameters:
  • df (pandas.DataFrame) – The input dataframe.

  • filter_by (str or None) – The channel of interest to filter the dataframe by.

  • remove_highly_correlated (bool or float) – Whether to remove highly correlated columns. If a float is provided, it represents the correlation threshold.

  • log_data (bool) – Whether to apply log transformation to the numeric data.

  • exclude (list or None) – List of features to exclude from the filtering process.

  • verbose (bool) – Whether to print verbose output during preprocessing.

Returns:

The preprocessed numeric data.

Return type:

numpy.ndarray

Raises:

ValueError – If no numeric columns are available after filtering.

spacr.utils.remove_low_variance_columns(df, threshold=0.01, verbose=False)[source]

Removes columns from the dataframe that have low variance.

Parameters:
  • df (pandas.DataFrame) – The DataFrame containing the data.

  • threshold (float) – The variance threshold below which columns will be removed.

Returns:

The DataFrame with low variance columns removed.

Return type:

pandas.DataFrame
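
The filter can be sketched without pandas by treating the table as a dict of column lists and keeping only the columns whose variance reaches the threshold. The use of sample variance, the dict representation, and the column names below are assumptions for illustration.

```python
import statistics

def drop_low_variance(columns, threshold=0.01):
    """Keep only the columns whose sample variance is at least `threshold`."""
    return {
        name: values
        for name, values in columns.items()
        if statistics.variance(values) >= threshold
    }

# Hypothetical feature columns.
columns = {
    "area": [10.0, 12.0, 9.0, 15.0],
    "constant": [1.0, 1.0, 1.0, 1.0],
    "near_constant": [0.50, 0.51, 0.50, 0.49],
}
kept = drop_low_variance(columns, threshold=0.01)
```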

spacr.utils.remove_highly_correlated_columns(df, threshold=0.95, verbose=False)[source]

Removes columns from the dataframe that are highly correlated with one another.

Parameters:
  • df (pandas.DataFrame) – The DataFrame containing the data.

  • threshold (float) – The correlation threshold above which columns will be removed.

Returns:

The DataFrame with highly correlated columns removed.

Return type:

pandas.DataFrame
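
A common way to apply such a threshold is greedy: walk the columns in order and drop any column whose absolute Pearson correlation with an already kept column exceeds the threshold. The sketch below assumes that strategy and a dict-of-lists table; which member of a correlated pair spacr drops is not documented above.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0

def drop_highly_correlated(columns, threshold=0.95):
    """Greedily keep columns that are not too correlated with any column kept so far."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept

# Hypothetical feature columns; "area_scaled" is an exact multiple of "area".
columns = {
    "area": [1, 2, 3, 4, 5],
    "area_scaled": [2, 4, 6, 8, 10],
    "intensity": [5, 1, 4, 2, 3],
}
kept = drop_highly_correlated(columns, threshold=0.95)
```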

spacr.utils.filter_dataframe_features(df, channel_of_interest, exclude=None, remove_low_variance_features=True, remove_highly_correlated_features=True, verbose=False)[source]

Filter the dataframe df based on the specified channel_of_interest and exclude parameters.

Parameters:
  • df (pandas.DataFrame) – The input dataframe to be filtered.

  • channel_of_interest (str, int, list, None) – The channel(s) of interest to filter the dataframe. If None, no filtering is applied. If ‘morphology’, only morphology features are included. If an integer, only the specified channel is included. If a list, only the specified channels are included. If a string, only the specified channel is included.

  • exclude (str, list, None) – The feature(s) to exclude from the filtered dataframe. If None, no features are excluded. If a string, the specified feature is excluded. If a list, the specified features are excluded.

Returns:
  • filtered_df (pandas.DataFrame) – The filtered dataframe based on the specified parameters.

  • features (list) – The list of selected features after filtering.

spacr.utils.check_overlap(current_position, other_positions, threshold)[source]
spacr.utils.find_non_overlapping_position(x, y, image_positions, threshold, max_attempts=100)[source]
spacr.utils.search_reduction_and_clustering(numeric_data, n_neighbors, min_dist, metric, eps, min_samples, clustering, reduction_method, verbose, reduction_param=None, embedding=None, n_jobs=-1)[source]

Perform dimensionality reduction and clustering on the given data.

Parameters:
  • numeric_data (np.array) – Numeric data to process.

  • n_neighbors (int) – Number of neighbors for UMAP or perplexity for tSNE.

  • min_dist (float) – Minimum distance for UMAP.

  • metric (str) – Metric for UMAP, tSNE, and DBSCAN.

  • eps (float) – Epsilon for DBSCAN clustering.

  • min_samples (int) – Minimum samples for DBSCAN or number of clusters for KMeans.

  • clustering (str) – Clustering method (‘DBSCAN’ or ‘KMeans’).

  • reduction_method (str) – Dimensionality reduction method (‘UMAP’ or ‘tSNE’).

  • verbose (bool) – Whether to print verbose output.

  • reduction_param (dict) – Additional parameters for the reduction method.

  • embedding (np.array, optional) – Precomputed embedding.

  • n_jobs (int) – Number of parallel jobs to run.

Returns:
  • embedding (np.array) – Embedding of the data.

  • labels (np.array) – Cluster labels.

spacr.utils.load_image(image_path)[source]

Load and preprocess an image.

spacr.utils.extract_features(image_paths, resnet=resnet50)[source]

Extract features from images using a pre-trained ResNet model.

spacr.utils.check_normality(series)[source]

Helper function to check if a feature is normally distributed.

spacr.utils.random_forest_feature_importance(all_df, cluster_col='cluster')[source]

Random Forest feature importance.

spacr.utils.perform_statistical_tests(all_df, cluster_col='cluster')[source]

Perform ANOVA or Kruskal-Wallis tests depending on normality of features.

spacr.utils.combine_results(rf_df, anova_df, kruskal_df)[source]

Combine the results into a single DataFrame.

spacr.utils.cluster_feature_analysis(all_df, cluster_col='cluster')[source]

Perform Random Forest feature importance, ANOVA for normally distributed features, and Kruskal-Wallis for non-normally distributed features. Combine results into a single DataFrame.

spacr.utils.adjust_cell_masks(parasite_folder, cell_folder, nuclei_folder, overlap_threshold=5, perimeter_threshold=30)[source]

Process all npy files in the given folders. Merge and relabel cells in cell masks based on parasite overlap and cell perimeter sharing conditions.

Parameters:
  • parasite_folder (str) – Path to the folder containing parasite masks.

  • cell_folder (str) – Path to the folder containing cell masks.

  • nuclei_folder (str) – Path to the folder containing nuclei masks.

  • overlap_threshold (float) – The percentage threshold for merging cells based on parasite overlap.

  • perimeter_threshold (float) – The percentage threshold for merging cells based on shared perimeter.

spacr.utils.process_masks(mask_folder, image_folder, channel, batch_size=50, n_clusters=2, plot=False)[source]
spacr.utils.merge_regression_res_with_metadata(results_file, metadata_file, name='_metadata')[source]
spacr.utils.process_vision_results(df, threshold=0.5)[source]
spacr.utils.get_ml_results_paths(src, model_type='xgboost', channel_of_interest=1)[source]
spacr.utils.augment_image(image)[source]

Perform data augmentation by rotating and reflecting the image.

Parameters:
  • image (PIL Image or numpy array) – The input image.

Returns:

A list of augmented images (augmented_images).

Return type:

list
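
Rotation-and-reflection augmentation can be illustrated on a small nested-list image. The sketch below generates the four 90-degree rotations, each with and without a horizontal flip, for eight variants in total; the number and order of variants spacr produces is not stated above, so treat both as assumptions.

```python
def rotate90(image):
    """Rotate a 2D nested-list image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def flip_horizontal(image):
    """Mirror a 2D nested-list image left to right."""
    return [row[::-1] for row in image]

def augment(image):
    """Return the four 90-degree rotations of `image`, each with and without a flip."""
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants

image = [[1, 2], [3, 4]]
augmented = augment(image)
```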

spacr.utils.augment_dataset(dataset, is_grayscale=False)[source]

Perform data augmentation on the entire dataset by rotating and reflecting the images.

Parameters:
  • dataset (list of tuples) – The input dataset; each entry is a tuple (image, label, filename).

  • is_grayscale (bool) – Flag indicating whether the images are grayscale.

Returns:

A dataset of augmented (image, label, filename) tuples.

Return type:

list of tuples

spacr.utils.convert_and_relabel_masks(folder_path)[source]

Converts all int64 .npy masks in a folder to uint16, relabeling each mask so that all labels are retained.

Parameters:

folder_path (str) – The path to the folder containing int64 npy mask files.

Returns:

None
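Relabeling matters because a plain cast from int64 to uint16 wraps any label above 65535, which can merge or lose objects. A minimal sketch of consecutive relabeling for a single in-memory mask follows; the helper name relabel_to_uint16 is hypothetical, and the folder traversal and file saving done by convert_and_relabel_masks are not shown.

```python
import numpy as np


def relabel_to_uint16(mask):
    """Map the non-zero labels of an integer mask to 1..N and return uint16.

    Hypothetical helper illustrating the idea; not the spacr implementation.
    """
    mask = np.asarray(mask)
    foreground = mask != 0
    labels = np.unique(mask[foreground])  # sorted, background excluded
    if labels.size > np.iinfo(np.uint16).max:
        raise ValueError("mask has more labels than uint16 can represent")
    relabeled = np.zeros(mask.shape, dtype=np.uint16)
    # Position of each label in the sorted list, shifted so labels start at 1.
    relabeled[foreground] = np.searchsorted(labels, mask[foreground]) + 1
    return relabeled
```

Background pixels (0) stay 0, and distinct input labels always map to distinct output labels.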

spacr.utils.correct_masks(src)[source]
spacr.utils.count_reads_in_fastq(fastq_file)[source]
spacr.utils.get_cuda_version()[source]
spacr.utils.all_elements_match(list1, list2)[source]
spacr.utils.prepare_batch_for_segmentation(batch)[source]
spacr.utils.check_index(df, elements=5, split_char='_')[source]
spacr.utils.map_condition(col_value, neg='c1', pos='c2', mix='c3')[source]
spacr.utils.download_models(repo_id='einarolafsson/models', retries=5, delay=5)[source]

Downloads all model files from Hugging Face and stores them in the resources/models directory within the installed spacr package.

Parameters:
  • repo_id (str) – The repository ID on Hugging Face (default is ‘einarolafsson/models’).

  • retries (int) – Number of retry attempts in case of failure.

  • delay (int) – Delay in seconds between retries.

Returns:

The local path to the downloaded models.

Return type:

str
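The retries and delay parameters describe a retry loop around the download. A generic sketch of that pattern is shown below; call_with_retries and its fetch argument are hypothetical stand-ins, and the actual Hugging Face call used by download_models is not shown on this page.

```python
import time


def call_with_retries(fetch, retries=5, delay=5, sleep=time.sleep):
    """Call fetch() up to `retries` times, waiting `delay` seconds between tries.

    `fetch` is a hypothetical zero-argument callable standing in for the
    real download step. `sleep` is injectable so the loop can be tested.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < retries:
                sleep(delay)
    raise RuntimeError(f"failed after {retries} attempts") from last_error
```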

spacr.utils.generate_cytoplasm_mask(nucleus_mask, cell_mask)[source]

Generates a cytoplasm mask from nucleus and cell masks.

Parameters:
  • nucleus_mask (np.array) – Binary or segmented mask of the nucleus (non-zero values represent nucleus).

  • cell_mask (np.array) – Binary or segmented mask of the whole cell (non-zero values represent cell).

Returns:

Mask for the cytoplasm (1 for cytoplasm, 0 for nucleus and pathogens).

Return type:

np.array
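Conceptually, the cytoplasm is the part of the cell that is not nucleus. A minimal sketch under that assumption is given below; the helper name cytoplasm_from_masks is hypothetical, and it returns a binary mask as described above rather than claiming to match the spacr implementation.

```python
import numpy as np


def cytoplasm_from_masks(nucleus_mask, cell_mask):
    """Return 1 where a pixel belongs to a cell but not to a nucleus, else 0.

    Hypothetical helper; both inputs may be binary or labeled masks.
    """
    nucleus = np.asarray(nucleus_mask) != 0
    cell = np.asarray(cell_mask) != 0
    return (cell & ~nucleus).astype(np.uint8)
```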

spacr.utils.add_column_to_database(settings)[source]

Adds a new column to the database table by matching on a common column from the DataFrame. If the column already exists in the database, it adds the column with a suffix. NaN values will remain as NULL in the database.

Parameters:

settings (dict) – A dictionary containing the following keys:
  • csv_path (str) – Path to the CSV file with the data to be added.

  • db_path (str) – Path to the SQLite database (or connection string for other databases).

  • table_name (str) – The name of the table in the database.

  • update_column (str) – The name of the new column in the DataFrame to add to the database.

  • match_column (str) – The common column used to match rows.

Returns:

None
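The matching logic can be sketched with the standard-library sqlite3 module. The helper below is hypothetical: it takes an open connection and a list of row dictionaries instead of the settings dictionary and CSV file, and the "_new" suffix is an assumed value, since the suffix used by add_column_to_database is not stated here.

```python
import sqlite3


def _null_if_nan(value):
    """Return None for float NaN so the value is stored as NULL."""
    return None if isinstance(value, float) and value != value else value


def add_column_by_match(conn, table, rows, update_column, match_column, suffix="_new"):
    """Add `update_column` to `table`, filling it by matching on `match_column`.

    Hypothetical helper. `rows` is a list of dicts holding both columns.
    Table and column names are interpolated into SQL, so they must be trusted.
    """
    existing = {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}
    target = update_column
    while target in existing:  # avoid clobbering an existing column
        target += suffix
    conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{target}"')
    conn.executemany(
        f'UPDATE "{table}" SET "{target}" = ? WHERE "{match_column}" = ?',
        [(_null_if_nan(row.get(update_column)), row[match_column]) for row in rows],
    )
    conn.commit()
    return target
```

Rows in the table with no match in `rows` keep NULL in the new column.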

spacr.utils.fill_holes_in_mask(mask)[source]

Fill holes in each object in the mask while keeping objects separated.

Parameters:

mask (np.ndarray) – A labeled mask where each object has a unique integer value.

Returns:

A mask with holes filled and original labels preserved.

Return type:

np.ndarray
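Filling holes per label, rather than on the whole binary image, is what keeps touching objects separate. The sketch below shows one way to do this for a 2D mask using a flood fill from the image border. The helper name fill_holes_per_label is hypothetical, and the choice to fill only background pixels (so that an object nested inside another is left alone) is an assumption. In practice scipy.ndimage.binary_fill_holes is the usual, much faster tool for the per-object step.

```python
from collections import deque

import numpy as np


def fill_holes_per_label(mask):
    """Fill background holes inside each labeled object of a 2D mask.

    Hypothetical helper: pure-Python flood fill, 4-connected background.
    """
    mask = np.asarray(mask)
    filled = mask.copy()
    for label in np.unique(mask):
        if label == 0:
            continue
        # Pad so that all outside background is connected to corner (0, 0).
        obj = np.pad(mask == label, 1, constant_values=False)
        outside = np.zeros_like(obj)
        outside[0, 0] = True
        queue = deque([(0, 0)])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                inside_image = 0 <= nr < obj.shape[0] and 0 <= nc < obj.shape[1]
                if inside_image and not obj[nr, nc] and not outside[nr, nc]:
                    outside[nr, nc] = True
                    queue.append((nr, nc))
        # A hole is a non-object pixel that the border flood fill never reached.
        holes = ~obj[1:-1, 1:-1] & ~outside[1:-1, 1:-1]
        filled[holes & (mask == 0)] = label
    return filled
```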

spacr.utils.correct_metadata_column_names(df)[source]
spacr.utils.control_filelist(folder, mode='columnID', values=['01', '02'])[source]
spacr.utils.rename_columns_in_db(db_path)[source]
spacr.utils.group_feature_class(df, feature_groups=['cell', 'cytoplasm', 'nucleus', 'pathogen'], name='compartment')[source]
spacr.utils.delete_intermedeate_files(settings)[source]
spacr.utils.filter_and_save_csv(input_csv, output_csv, column_name, upper_threshold, lower_threshold)[source]

Reads a CSV into a DataFrame, filters rows based on a column for values > upper_threshold and < lower_threshold, and saves the filtered DataFrame to a new CSV file.

Parameters:
  • input_csv (str) – Path to the input CSV file.

  • output_csv (str) – Path to save the filtered CSV file.

  • column_name (str) – Column name to apply the filters on.

  • upper_threshold (float) – Upper threshold for filtering (values greater than this are retained).

  • lower_threshold (float) – Lower threshold for filtering (values less than this are retained).

Returns:

None
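Going by the parameter descriptions, a row is kept when its value lies above upper_threshold or below lower_threshold, that is, outside the band between the two thresholds. The sketch below assumes the two conditions are combined with OR, which this page does not state explicitly, and uses the standard-library csv module in place of a DataFrame. The helper name filter_csv_outside_band is hypothetical.

```python
import csv


def filter_csv_outside_band(input_csv, output_csv, column_name,
                            upper_threshold, lower_threshold):
    """Keep rows whose value is > upper_threshold or < lower_threshold.

    Hypothetical helper; the OR combination is an assumption.
    Returns the number of rows written.
    """
    with open(input_csv, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        kept = []
        for row in reader:
            value = float(row[column_name])
            if value > upper_threshold or value < lower_threshold:
                kept.append(row)
    with open(output_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```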

spacr.utils.extract_tar_bz2_files(folder_path)[source]

Extracts all .tar.bz2 files in the given folder into subfolders with the same name as the tar file.

Parameters:

folder_path (str) – Path to the folder containing .tar.bz2 files.
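The behavior described above can be sketched with the standard-library tarfile module. The helper name extract_all_tar_bz2 is hypothetical, and returning the list of created folders is an addition for illustration.

```python
import os
import tarfile


def extract_all_tar_bz2(folder_path):
    """Extract every .tar.bz2 in folder_path into a subfolder named after it.

    Hypothetical helper; returns the list of folders that were created.
    """
    suffix = ".tar.bz2"
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    for name in sorted(os.listdir(folder_path)):
        if not name.endswith(suffix):
            continue
        target = os.path.join(folder_path, name[: -len(suffix)])
        os.makedirs(target, exist_ok=True)
        with tarfile.open(os.path.join(folder_path, name), "r:bz2") as archive:
            archive.extractall(target, **kwargs)
        extracted.append(target)
    return extracted
```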

spacr.utils.calculate_shortest_distance(df, object1, object2)[source]

Calculate the shortest edge-to-edge distance between two objects (e.g., pathogen and nucleus).

Parameters:
  • df (pandas.DataFrame) – DataFrame containing measurements.

  • object1 (str) – Name of the first object (e.g., “pathogen”).

  • object2 (str) – Name of the second object (e.g., “nucleus”).

Returns:

The DataFrame with a new column for the shortest edge-to-edge distance.

Return type:

pandas.DataFrame

spacr.utils.format_path_for_system(path)[source]

Takes a file path and reformats it to be compatible with the current operating system.

Parameters:

path (str) – The file path to be formatted.

Returns:

The formatted path for the current operating system.

Return type:

str
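One simple way to make a path portable is to replace the foreign separator with the native one. The sketch below does only that; the helper name to_native_separators and its sep parameter are hypothetical, and format_path_for_system may handle more cases (drive letters, for instance) than are shown here.

```python
import os


def to_native_separators(path, sep=os.sep):
    """Replace the non-native path separator in `path` with `sep`.

    Hypothetical helper; `sep` defaults to the current system's separator
    and is exposed only so the behavior can be shown for either system.
    """
    foreign = "/" if sep == "\\" else "\\"
    return path.replace(foreign, sep)
```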

spacr.utils.normalize_src_path(src)[source]

Ensures that the ‘src’ value is properly formatted as either a list of strings or a single string.

Parameters:

src (str or list) – The input source path(s).

Returns:

A correctly formatted list if the input was a list (or string representation of a list), otherwise a single string.

Return type:

list or str
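The "string representation of a list" case typically arises when settings are read back from a CSV file. A minimal sketch of the normalization using ast.literal_eval is shown below; the helper name normalize_src is hypothetical, and falling back to the original string when parsing fails is an assumption.

```python
import ast


def normalize_src(src):
    """Return a list of strings for list-like input, else the string itself.

    Hypothetical helper; not the spacr implementation.
    """
    if isinstance(src, list):
        return [str(item) for item in src]
    if isinstance(src, str):
        text = src.strip()
        if text.startswith("[") and text.endswith("]"):
            try:
                parsed = ast.literal_eval(text)  # safe: literals only
            except (ValueError, SyntaxError):
                return src
            if isinstance(parsed, list):
                return [str(item) for item in parsed]
        return src
    raise TypeError("src must be a str or a list of str")
```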

spacr.utils.generate_image_path_map(root_folder, valid_extensions=('tif', 'tiff', 'png', 'jpg', 'jpeg', 'bmp', 'czi', 'nd2', 'lif'))[source]

Recursively scans a folder and its subfolders for images, then creates a mapping of: {original_image_path: new_image_path}, where the new path includes all subfolder names.

Parameters:
  • root_folder (str) – The root directory to scan for images.

  • valid_extensions (tuple) – Tuple of valid image file extensions.

Returns:

A dictionary mapping original image paths to their new paths.

Return type:

dict
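The mapping can be sketched with os.walk. In the hypothetical helper below, subfolder names are joined to the file name with underscores and the new path is placed in a 'consolidated' folder under the root; both choices are assumptions, since this page only says that the new path includes all subfolder names.

```python
import os


def build_image_path_map(root_folder, valid_extensions=("tif", "tiff", "png")):
    """Map each image under root_folder to a flattened path that encodes
    its subfolders, e.g. plate1/well_A/img.tif -> plate1_well_A_img.tif.

    Hypothetical helper; the naming scheme is an assumption.
    """
    mapping = {}
    for dirpath, dirnames, filenames in os.walk(root_folder):
        if dirpath == root_folder:
            # Do not rescan a previously created output folder.
            dirnames[:] = [d for d in dirnames if d != "consolidated"]
        relative = os.path.relpath(dirpath, root_folder)
        parts = [] if relative == "." else relative.split(os.sep)
        for filename in sorted(filenames):
            extension = os.path.splitext(filename)[1].lstrip(".").lower()
            if extension not in valid_extensions:
                continue
            new_name = "_".join(parts + [filename])
            original = os.path.join(dirpath, filename)
            mapping[original] = os.path.join(root_folder, "consolidated", new_name)
    return mapping
```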

spacr.utils.copy_images_to_consolidated(image_path_map, root_folder)[source]

Copies images from their original locations to a ‘consolidated’ folder, renaming them according to the generated dictionary.

Parameters:
  • image_path_map (dict) – Dictionary mapping {original_path: new_path}.

  • root_folder (str) – The root directory where the ‘consolidated’ folder will be created.
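Given a mapping of that form, the copy step can be sketched with shutil. The helper name copy_to_consolidated is hypothetical, and using only the file name of each new path inside the 'consolidated' folder is an assumption.

```python
import os
import shutil


def copy_to_consolidated(image_path_map, root_folder):
    """Copy each original image into root_folder/consolidated under its new name.

    Hypothetical helper; returns the list of destination paths.
    """
    target_dir = os.path.join(root_folder, "consolidated")
    os.makedirs(target_dir, exist_ok=True)
    copied = []
    for original, new_path in image_path_map.items():
        destination = os.path.join(target_dir, os.path.basename(new_path))
        shutil.copy2(original, destination)  # copy2 also preserves timestamps
        copied.append(destination)
    return copied
```

The originals are left in place, so the step can be repeated safely.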

spacr.utils.correct_metadata(df)[source]