scNym/api Module

Classify cell identities using scNym

scnym_api() is the main API endpoint for users. It performs training and prediction by calling scnym_train() and scnym_predict(), respectively. Users will rarely need to call either of these functions directly.

get_pretrained_weights() is a wrapper function that downloads pretrained weights from our cloud storage bucket. atlas2target() downloads preprocessed reference datasets and concatenates them with a user-supplied target dataset.

scnym.api.scnym_api(adata, task='train', groupby=None, out_path='./scnym_outputs', trained_model=None, config='new_identity_discovery', key_added='scNym', copy=False)

scNym: Semi-supervised adversarial neural networks for single cell classification [Kimmel2020].

scNym is a cell identity classifier that transfers annotations from one single cell experiment to another. The model is implemented as a neural network that employs MixMatch semi-supervision and a domain adversary to take advantage of unlabeled data during training. scNym offers superior performance to many baseline single cell identity classification methods.
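To make the semi-supervision concrete, here is a minimal, framework-free sketch of two generic MixMatch ingredients: temperature sharpening of averaged predictions on unlabeled cells, and mixup between pairs of samples. It illustrates the published MixMatch technique and is not scNym's own code. The function names and the `temperature` and `alpha` defaults are illustrative assumptions, and the domain adversary is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sharpen(probs: Sequence[float], temperature: float = 0.5) -> List[float]:
    """Lower the entropy of a class distribution: p_i^(1/T) / sum_j p_j^(1/T)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def mixup(
    x1: Sequence[float],
    y1: Sequence[float],
    x2: Sequence[float],
    y2: Sequence[float],
    alpha: float = 0.3,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """Convex combination of two (features, label distribution) pairs.

    MixMatch draws lam ~ Beta(alpha, alpha) and keeps max(lam, 1 - lam),
    so the mixed sample stays closer to the first input.
    """
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Averaged prediction for an unlabeled cell -> sharpened pseudolabel.
guess = sharpen([0.6, 0.3, 0.1], temperature=0.5)
# Mix a labeled cell (one-hot label) with an unlabeled cell (sharpened guess).
x_mix, y_mix = mixup(
    [1.0, 0.0, 2.0], [1.0, 0.0, 0.0],
    [0.0, 3.0, 1.0], guess,
    rng=random.Random(0),
)
```

Sharpening keeps the most likely class while pushing the distribution toward a confident pseudolabel, and mixup produces training targets that are interpolations of labeled and pseudolabeled examples.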

Parameters
  • adata (AnnData) – Annotated data matrix used for training or prediction.

  • task (str) – Task to perform, either “train” or “predict”. If “train”, uses adata as labeled training data. If “predict”, uses trained_model to infer cell identities for observations in adata.

  • groupby (Optional[str]) – Column in adata.obs that contains cell identity annotations. Values of “Unlabeled” indicate that a given cell should be used only as unlabeled data during training.

  • out_path (str) – Path to a directory for saving scNym model weights and training logs.

  • trained_model (Optional[str]) – Used when task==“predict”. Path to the output directory of an scNym training run, or a string specifying a pretrained model. Pretrained model strings have the form f“pretrained_{species}”, where species is one of {“human”, “mouse”, “rat”}. Providing a pretrained model string downloads pretrained weights and predicts directly on the target data, without additional training.

  • config (Union[dict, str]) –

    Configuration name or dictionary of configuration parameters. Pre-defined configurations:

      • “new_identity_discovery” – Default. Employs pseudolabel thresholding to allow for discovery of new cell identities in the target dataset using scNym confidence scores.

      • “no_new_identity” – Assumes all cells in the target data belong to one of the classes in the training data. Recommended to improve performance when this assumption is valid.

  • key_added (str) – Key added to adata.obs with scNym predictions if task==“predict”.

  • copy (bool) – If True, copy the AnnData object before predicting cell types and return the modified copy; otherwise, update adata in place.
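The “new_identity_discovery” configuration is described as using pseudolabel thresholding. The general idea can be sketched on plain Python lists: only unlabeled cells whose top class probability clears a confidence threshold receive a pseudolabel, so ambiguous cells are not forced into a known class. The function name and the 0.9 threshold are hypothetical and do not reflect scNym's actual configuration keys or defaults.

```python
from typing import List, Sequence


def confident_pseudolabel_mask(
    probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[bool]:
    """True for unlabeled cells whose top class probability meets the threshold.

    Only these cells would contribute a pseudolabel term during training;
    low-confidence cells are left out rather than assigned a known class.
    """
    return [max(row) >= threshold for row in probs]


# Predicted class probabilities for three unlabeled cells over three known classes.
unlabeled_probs = [
    [0.95, 0.03, 0.02],  # confidently one known class
    [0.40, 0.35, 0.25],  # ambiguous: possibly a new identity
    [0.05, 0.91, 0.04],
]
mask = confident_pseudolabel_mask(unlabeled_probs, threshold=0.9)
```

With these inputs the first and third cells pass the threshold and the second does not.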

Return type

Optional[AnnData]

Returns

  • Depending on copy, returns or updates adata with the following fields.

  • `X_scnym` (ndarray, (obsm, shape=(n_samples, n_hidden), dtype float)) – scNym embedding coordinates of data.

  • `scNym` ((adata.obs, dtype str)) – scNym cell identity predictions for each observation, stored under key_added (default “scNym”).

  • `scNym_train_results` (dict, (uns)) – results of scNym model training.

Examples

>>> import numpy as np
>>> import scanpy as sc
>>> from scnym.api import scnym_api, atlas2target

Loading Data and preparing labels

>>> adata = sc.datasets.kang17()  # assumes a loader for the Kang et al. 2017 PBMC data with 'stim' and 'cell' columns in .obs
>>> target_bidx = adata.obs['stim']=='stim'
>>> adata.obs['cell'] = np.array(adata.obs['cell'])
>>> adata.obs.loc[target_bidx, 'cell'] = 'Unlabeled'

Train an scNym model

>>> scnym_api(
...   adata=adata,
...   task='train',
...   groupby='cell',
...   out_path='./scnym_outputs',
...   config='no_new_identity',
... )

Predict cell identities with the trained scNym model

>>> path_to_model = './scnym_outputs/'
>>> scnym_api(
...   adata=adata,
...   task='predict',
...   key_added='scNym',
...   trained_model=path_to_model,
...   config='no_new_identity',
... )

Predict cell identities with a pretrained scNym model

>>> scnym_api(
...   adata=adata,
...   task='predict',
...   key_added='scNym',
...   trained_model='pretrained_human',
...   config='no_new_identity',
... )

Perform semi-supervised training with an atlas

>>> joint_adata = atlas2target(
...   adata=adata,
...   species='human',
...   key_added='annotations',
... )
>>> scnym_api(
...   adata=joint_adata,
...   task='train',
...   groupby='annotations',
...   out_path='./scnym_outputs',
...   config='no_new_identity',
... )

scnym.api.scnym_train(adata, config)

Train an scNym model.

Parameters
  • adata (AnnData) – [Cells, Genes] experiment containing annotated cells to train on.

  • config (dict) – configuration options.

Return type

None

Returns

  • None.

  • Saves model outputs to config[“out_path”] and adds model results to adata.uns[“scnym_train_results”].

Notes

This method should only be directly called by advanced users. Most users should use scnym_api.

See also

scnym_api()

scnym.api.scnym_predict(adata, config)

Predict cell identities using an scNym model.

Parameters
  • adata (AnnData) – [Cells, Genes] experiment containing cells to classify.

  • config (dict) – configuration options.

Return type

None

Returns

  • None. Adds adata.obs[config[“key_added”]] and adata.obsm[“X_scnym”].
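Conceptually, the predictions written to adata.obs[config[“key_added”]] are one label per cell, and the confidence scores used for new identity discovery come from the same class probabilities. Here is a minimal sketch of turning a matrix of class probabilities into string labels plus a per-cell confidence, assuming the model outputs one probability row per cell. `probs_to_labels` is a hypothetical helper, not part of scnym.api.

```python
from typing import List, Sequence, Tuple


def probs_to_labels(
    probs: Sequence[Sequence[float]], class_names: Sequence[str]
) -> Tuple[List[str], List[float]]:
    """Map each row of class probabilities to (predicted label, confidence)."""
    labels: List[str] = []
    confidences: List[float] = []
    for row in probs:
        # Index of the most probable class (first one wins on ties).
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(class_names[best])
        confidences.append(row[best])
    return labels, confidences


class_names = ["B cell", "T cell", "Monocyte"]
probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.2, 0.3, 0.5]]
labels, conf = probs_to_labels(probs, class_names)
# In scNym, labels like these end up in a column such as adata.obs[config["key_added"]].
```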

Notes

This method should only be directly called by advanced users. Most users should use scnym_api.

See also

scnym_api()

scnym.api.get_pretrained_weights(trained_model, out_path)

Given the name of a set of pretrained model weights, fetch the weights from Google Cloud Storage (GCS), save them to out_path, and return the species parsed from the model name.

Parameters
  • trained_model (str) – the name of a pretrained model to use, formatted as “pretrained_{species}”. species should be one of {“human”, “mouse”, “rat”}.

  • out_path (str) – path for saving model weights and outputs.

Return type

str

Returns

  • species (str) – species parsed from the trained model name.

  • Saves “{out_path}/00_best_model_weights.pkl” and “{out_path}/scnym_train_results.pkl”.
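You can check the naming convention and output files described above before calling the function. The following minimal sketch uses only the standard library. `parse_pretrained_name` and `expected_output_files` are hypothetical helpers that mirror the documented “pretrained_{species}” convention and file names. They are not scnym's own parsing code.

```python
import os
from typing import Tuple

# Species documented for pretrained model strings.
SUPPORTED_SPECIES = {"human", "mouse", "rat"}


def parse_pretrained_name(trained_model: str) -> str:
    """Return the species from a name formatted as "pretrained_{species}"."""
    prefix = "pretrained_"
    if not trained_model.startswith(prefix):
        raise ValueError(f"{trained_model!r} does not start with {prefix!r}")
    species = trained_model[len(prefix):]
    if species not in SUPPORTED_SPECIES:
        raise ValueError(
            f"species must be one of {sorted(SUPPORTED_SPECIES)}, got {species!r}"
        )
    return species


def expected_output_files(out_path: str) -> Tuple[str, str]:
    """Paths of the files documented as written by get_pretrained_weights()."""
    return (
        os.path.join(out_path, "00_best_model_weights.pkl"),
        os.path.join(out_path, "scnym_train_results.pkl"),
    )


species = parse_pretrained_name("pretrained_mouse")
weights_path, results_path = expected_output_files("./scnym_outputs")
```

Validating the string up front gives a clear error for unsupported species before any download is attempted.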

Notes

Requires an internet connection to download pre-trained weights.

scnym.api.atlas2target(adata, species, key_added='annotations')

Download a preprocessed cell atlas dataset and append your new dataset as a target to allow for semi-supervised scNym training.

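Parameters
  • adata (anndata.AnnData) – [Cells, Features] experiment to use as a target dataset.

  • species (str) – Species of the target dataset; determines which reference atlas is downloaded (e.g. “human”).

  • key_added (str) – Column in joint_adata.obs that receives the atlas annotations. Defaults to “annotations”.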

Returns

joint_adata – [Cells, Features] experiment concatenated with a preprocessed cell atlas reference dataset. Annotations from the atlas are copied to .obs[key_added] and all cells in the target dataset adata are labeled with the special “Unlabeled” token.

Return type

anndata.AnnData
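The labeling scheme of joint_adata.obs[key_added] can be illustrated on plain lists. Here is a minimal sketch, assuming atlas cells come before target cells in the concatenation (the actual ordering used by atlas2target is not documented here). The helper names are hypothetical. In practice both steps operate on AnnData objects, and cells marked “Unlabeled” are used only as unlabeled data during training, consistent with the groupby convention of scnym_api().

```python
from typing import List, Sequence, Tuple

UNLABELED = "Unlabeled"  # special token documented for unlabeled cells


def join_atlas_and_target(
    atlas_labels: Sequence[str], n_target_cells: int
) -> List[str]:
    """Annotation column for [atlas cells, target cells], in that order."""
    return list(atlas_labels) + [UNLABELED] * n_target_cells


def split_labeled(labels: Sequence[str]) -> Tuple[List[int], List[int]]:
    """Indices of labeled (atlas) and unlabeled (target) cells."""
    labeled = [i for i, lab in enumerate(labels) if lab != UNLABELED]
    unlabeled = [i for i, lab in enumerate(labels) if lab == UNLABELED]
    return labeled, unlabeled


annotations = join_atlas_and_target(["T cell", "B cell", "NK cell"], n_target_cells=2)
labeled_idx, unlabeled_idx = split_labeled(annotations)
```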

Examples

>>> import scanpy as sc
>>> import scnym.api
>>> adata = sc.datasets.pbmc3k()
>>> joint_adata = scnym.api.atlas2target(
...     adata=adata,
...     species='human',
...     key_added='annotations',
... )

Notes

Requires an internet connection to download reference datasets.