hcitools
The hcitools package provides tools for analyzing and visualizing data generated in high-content imaging experiments.
Installation
# Clone repository
git clone -b prod git@mygithub.gsk.com:to561778/hci-tools.git
# Install package
python -m pip install -e hci-tools
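After installing, one quick way to confirm that Python can find the package is to look it up without importing it. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical and is not part of hcitools.

```python
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if a top-level `package` can be found by the import system."""
    return find_spec(package) is not None


if __name__ == "__main__":
    # Should report True once the editable install has succeeded
    # in the currently active environment.
    print("hcitools installed:", is_installed("hcitools"))
```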
Usage
Package documentation is available here. See docs/examples for detailed guides for generating figures and performing various analysis steps.
Developer Instructions
Use the script below to set up a development environment for this package.
# Clone the repository
git clone -b dev git@mygithub.gsk.com:to561778/hci-tools.git
cd hci-tools
# Create conda environment
conda env create -f environment.yml
conda activate hcitools
# Install the package
python -m pip install -e .
Deploying Changes
Once changes have been made, use the scripts/deploy.sh script to rebuild the package wheel and update the documentation. This will also reinstall the package in the active environment.
Note: Only run deploy.sh from the top-level hci-tools directory.
Examples
Heatmaps
This example will show you how to generate various heatmaps and how to annotate them using the plotly library.
Datasets
This example uses two of the built-in datasets:
- covid: Protein expression data from a cohort of COVID-19 patients
- ros-mito: High-content imaging features from an experiment
# Import
from hcitools import datasets, plot
# Load datasets
covid = datasets.load_dataset('covid')
ros = datasets.load_dataset('ros-mito')
# Plotly renderer (pick the call that matches your environment)
plot.set_renderer('notebook')  # Use this when running in a notebook
plot.set_renderer('iframe_connected')  # Use this when rendering the docs
Protein Expression Heatmaps
Here, we'll create a heatmap to look at the expression of proteins in the patients' blood. We'll include colorbars for patient sex and mortality. We'll also look at how you could add annotations to highlight certain regions of the heatmap.
# Prepare data frame
data = (covid.copy()
        .filter(regex='^B-', axis=1))  # Keep only blood markers
metadata = covid[['Mortality', 'Sex']]
data.columns = [x[2:] for x in data.columns]  # Strip the 'B-' prefix
# Define groups for heatmap
row_groups = {
    k: list(v.values()) for k, v in metadata.to_dict(orient='index').items()
}
row_group_names = ['Mortality', 'Sex']
row_colors = {'Alive': '#38d652', 'Dead': '#d93e38',
              'Male': 'blue', 'Female': 'pink'}
# Create heatmap
fig = plot.heatmap(
    data=data,
    clust_rows=True,
    clust_cols=True,
    row_colors=row_colors,
    row_groups=row_groups,
    row_group_names=row_group_names
)
# Add a title and tweak the size
fig.update_layout(
    title='Blood Biomarkers',
    title_x=0.5,
    height=400,
    width=700
)
# Annotate highly expressed proteins
fig.add_shape(
    type='rect',
    x0='MCP-1', x1='EGF',
    y0=0, y1=88,
    row=1, col=3,
    line=dict(color='black')
)
fig.show()
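The `row_groups` dictionary built above maps each row label to its list of group values, and `row_colors` maps each group value to a color. The sketch below reproduces that structure with plain dictionaries so it runs without pandas. The patient IDs are made up, and the nested dictionary stands in for what `metadata.to_dict(orient='index')` returns. The final check reflects an assumption about what `plot.heatmap` expects, not documented behavior.

```python
# Stand-in for metadata.to_dict(orient='index'): {row label: {column: value}}
# (patient IDs and values are made up for illustration)
metadata_by_row = {
    "P001": {"Mortality": "Alive", "Sex": "Female"},
    "P002": {"Mortality": "Dead", "Sex": "Male"},
}

# Same comprehension as in the example: keep only the values, in column order
row_groups = {k: list(v.values()) for k, v in metadata_by_row.items()}

row_colors = {"Alive": "#38d652", "Dead": "#d93e38",
              "Male": "blue", "Female": "pink"}

# Assumed sanity check: collect group values that have no color assigned
missing = {g for groups in row_groups.values() for g in groups} - set(row_colors)

print(row_groups)
print("Groups without a color:", missing or "none")
```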
Correlation Maps
Here, we'll generate a heatmap to visualize the correlation of blood proteins with markers of clinical severity.
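# Prepare data frame
severity_vars = ['APACHE1h', 'APACHE24h', 'CCI']  # Avoids shadowing the built-in vars()
data = (covid.copy()
        .set_index(severity_vars)
        .filter(regex='^B-')      # Keep only blood markers
        .reset_index()
        .corr()
        .loc[severity_vars, :]    # Rows: clinical severity scores
        .drop(severity_vars, axis=1))  # Columns: blood proteins
data.columns = [x[2:] for x in data.columns]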
# Create heatmap
fig = plot.heatmap(
    data=data,
    clust_cols=True,
    clust_rows=True
)
fig.update_layout(
    title='Correlation with Clinical Severity',
    title_x=0.5,
    height=400,
    width=700
)
# Show ticks on the y axis (these are hidden by default)
fig.update_yaxes(
    showticklabels=True,
    tickfont_size=14
)
fig.show()
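Each cell of the correlation map is a pairwise correlation coefficient; pandas `DataFrame.corr()` computes the Pearson coefficient by default. As a reminder of what that number measures, here is a standard-library sketch with made-up values. The function and variable names are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up severity scores and protein levels for five patients
severity = [10, 14, 18, 22, 30]
protein_up = [1.0, 1.4, 1.8, 2.2, 3.0]    # rises linearly with severity
protein_down = [3.0, 2.6, 2.2, 1.8, 1.0]  # falls linearly with severity

print(round(pearson(severity, protein_up), 3))
print(round(pearson(severity, protein_down), 3))
```

Values near +1 or -1 show up at the extremes of the heatmap's color scale, while values near 0 indicate little linear relationship.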
Plate Map
Next, we'll generate an interactive heatmap that shows the value of a feature in each well of a 96-well (or 384-well) plate, using high-content imaging data.
fig = plot.plate_heatmap(
    data=ros,
    feature="Non-border cells - Number of Objects"
)
fig.update_layout(width=900, height=500)
fig.show()
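A plate heatmap places each well on a grid by its row letter and column number: a 96-well plate has 8 rows (A to H) and 12 columns, and a 384-well plate has 16 rows (A to P) and 24 columns. The sketch below shows that mapping for well labels such as `B07`. The function `well_to_index` is hypothetical, and this is not necessarily how `plot.plate_heatmap` is implemented.

```python
import string
from typing import Tuple

PLATE_SHAPES = {96: (8, 12), 384: (16, 24)}  # number of wells -> (rows, columns)


def well_to_index(well: str, plate_size: int = 96) -> Tuple[int, int]:
    """Convert a well label like 'B07' to zero-based (row, column) indices."""
    n_rows, n_cols = PLATE_SHAPES[plate_size]
    row = string.ascii_uppercase.index(well[0].upper())  # 'A' -> 0, 'B' -> 1, ...
    col = int(well[1:]) - 1                              # '07' -> 6
    if row >= n_rows or not 0 <= col < n_cols:
        raise ValueError(f"{well!r} is not a valid well on a {plate_size}-well plate")
    return row, col


print(well_to_index("B07"))
print(well_to_index("P24", plate_size=384))
```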
Clustering
This example will show you how to perform dimensionality reduction and visualize any resulting clusters. We will also show how certain preprocessing steps can be done using preprocess.clean_data.
Datasets
This example makes use of the ros-mito dataset, which contains features extracted from high-content images.
# Imports
from hcitools import datasets, plot, analysis, preprocess
# Load dataset
ros = datasets.load_dataset('ros-mito')
# Plotly renderer (pick the call that matches your environment)
plot.set_renderer('notebook')  # Use this when running in a notebook
plot.set_renderer('iframe_connected')  # Use this when rendering the docs
# Preprocessing
meta = ['Well', 'Row', 'Column', 'Timepoint', 'Compound', 'Conc']
df, dropped, LOG = preprocess.clean_data(
    data=ros,
    metacols=meta,
    dropna=True,
    drop_low_var=0.0,
    corr_thresh=0.9,
    verbose=True
)
df = df.set_index(meta)
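The source does not show what `preprocess.clean_data` does internally. Judging only from the argument names, `drop_low_var` and `corr_thresh` suggest that near-constant features and features highly correlated with another feature are removed. The sketch below illustrates that general idea with the standard library; the function `filter_features` and its exact rules are assumptions, not the hcitools implementation.

```python
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def filter_features(features, var_thresh=0.0, corr_thresh=0.9):
    """Split {name: values} into (kept, dropped) feature names.

    A feature is dropped if its variance is <= var_thresh, or if its absolute
    correlation with an already kept feature exceeds corr_thresh.
    """
    kept, dropped = [], []
    for name, values in features.items():
        if pvariance(values) <= var_thresh:
            dropped.append(name)  # (near-)constant feature
        elif any(abs(pearson(values, features[k])) > corr_thresh for k in kept):
            dropped.append(name)  # redundant with a feature already kept
        else:
            kept.append(name)
    return kept, dropped


# Made-up features for four wells
features = {
    "area": [1, 2, 3, 4],
    "area_px": [10, 20, 30, 41],  # almost a rescaled copy of "area"
    "constant": [5, 5, 5, 5],     # zero variance
    "intensity": [4, 1, 3, 2],
}
kept, dropped = filter_features(features)
print("kept:", kept)
print("dropped:", dropped)
```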
# Generate clusters with default arguments
proj, expvar = analysis.dim_reduction(data=df, method=['pca', 'tsne'])
# Plot PCA components
fig = plot.pca_comps(proj, expvar, n_comps=3)
fig.update_layout(width=700, height=400)
fig.show()
# Compare 2 compounds
fig = plot.clusters(proj, 'Sorafenib Tosylate', 'Imatinib mesylate', 'tsne')
fig.update_layout(width=750, height=450)
fig.show()
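The name `expvar` suggests that `analysis.dim_reduction` returns the variance explained by each principal component, although the source does not say so explicitly. For intuition, the sketch below computes explained variance ratios for the simplest case of two features, where the eigenvalues of the 2x2 covariance matrix have a closed form. The function name and data are made up for illustration.

```python
from math import sqrt


def explained_variance_2d(x, y):
    """Explained variance ratios of the two principal components of (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    center = (sxx + syy) / 2
    half_gap = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (center + half_gap, center - half_gap)
    total = sxx + syy  # trace = sum of eigenvalues = total variance
    return tuple(v / total for v in eigenvalues)


# Two perfectly correlated features: the first component carries all variance
first, second = explained_variance_2d([1, 2, 3, 4], [2, 4, 6, 8])
print(round(first, 3), round(second, 3))
```

When the first few ratios add up to most of the total, a low-dimensional scatter plot of those components is a reasonable summary of the data.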
"""
.. include:: ../README.md

# Examples
.. include:: ../docs/heatmaps.md
.. include:: ../docs/clustering.md
"""

import os

__all__ = ['preprocess', 'analysis', 'plot', 'datasets']

location = os.path.dirname(os.path.realpath(__file__))


class datasets:
    """
    Class for loading built-in datasets
    """

    _avail = {
        'caer': os.path.join(location, 'datasets', 'caer-timecourse.tsv'),
        'covid': os.path.join(location, 'datasets', 'covid-cohort.tsv'),
        'ros-mito': os.path.join(location, 'datasets', 'ros-mito-timecourse.tsv')
    }

    def list_datasets():
        """
        List available built-in datasets
        """

        print("Available Datasets:", *datasets._avail.keys(), sep='\n')


    def load_dataset(dataset):
        """
        Load a built-in dataset

        Parameters
        ----------
        `dataset` : str
            One of 'covid', 'caer' or 'ros-mito'

        Returns
        -------
        pd.DataFrame
            Desired dataset
        """

        assert dataset in ['caer', 'covid', 'ros-mito'], \
            "Unknown dataset. See datasets.list_datasets()"
        from pandas import read_csv

        return read_csv(datasets._avail[dataset], sep='\t')
class datasets:
Class for loading built-in datasets
def list_datasets():
List available built-in datasets
def load_dataset(dataset):
Load a built-in dataset
Parameters
- dataset (str): One of 'covid', 'caer' or 'ros-mito'
Returns
- pd.DataFrame: Desired dataset
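The loader validates the dataset name with `assert`, which Python skips when run with the `-O` flag. One possible alternative, sketched here with the standard library only, raises `ValueError` instead and reads the tab-separated file with `csv`. The function name `load_table` and its `registry` argument are hypothetical and are not part of hcitools.

```python
import csv
import os
import tempfile


def load_table(name, registry):
    """Read the tab-separated file registered under `name` into a list of dicts."""
    if name not in registry:
        known = ", ".join(sorted(registry))
        raise ValueError(f"Unknown dataset {name!r}. Available: {known}")
    with open(registry[name], newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Demo with a throwaway TSV file (contents are made up)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.tsv")
    with open(path, "w", newline="") as handle:
        handle.write("Well\tCompound\nA01\tDMSO\nA02\tSorafenib Tosylate\n")
    rows = load_table("toy", {"toy": path})

print(rows[0]["Compound"])
```

Because the check is an ordinary exception, callers can catch it and fall back to listing the available names.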