scNym/api Module
Classify cell identities using scNym
scnym_api() is the main API endpoint for users. It handles both training and prediction by calling scnym_train() and scnym_predict(). Most users will rarely need to call those two functions directly.
get_pretrained_weights() is a wrapper function that downloads pretrained weights from our cloud storage bucket. atlas2target() downloads preprocessed reference datasets and concatenates them onto a user supplied target dataset.
-
scnym.api.scnym_api(adata, task='train', groupby=None, out_path='./scnym_outputs', trained_model=None, config='new_identity_discovery', key_added='scNym', copy=False)
scNym: Semi-supervised adversarial neural networks for single cell classification [Kimmel2020].
scNym is a cell identity classifier that transfers annotations from one single cell experiment to another. The model is implemented as a neural network that employs MixMatch semi-supervision and a domain adversary to take advantage of unlabeled data during training. scNym offers superior performance to many baseline single cell identity classification methods.
- Parameters
adata (AnnData) – Annotated data matrix used for training or prediction.
task (str) – Task to perform, either “train” or “predict”. If “train”, uses adata as labeled training data. If “predict”, uses trained_model to infer cell identities for observations in adata.
groupby (Optional[str]) – Column in adata.obs that contains cell identity annotations. Values of “Unlabeled” indicate that a given cell should be used only as unlabeled data during training.
out_path (str) – Path to a directory for saving scNym model weights and training logs.
trained_model (Optional[str]) – Used when task==“predict”. Path to the output directory of an scNym training run or a string specifying a pretrained model. Pretrained model strings are f“pretrained_{species}” where species is one of {“human”, “mouse”, “rat”}. Providing a pretrained model string will download pretrained weights and predict directly on the target data, without additional training.
config (Union[dict, str]) – Configuration name or dictionary of configuration parameters. Pre-defined configurations:
“new_identity_discovery” - Default. Employs pseudolabel thresholding to allow for discovery of new cell identities in the target dataset using scNym confidence scores.
“no_new_identity” - Assumes all cells in the target data belong to one of the classes in the training data. Recommended to improve performance when this assumption is valid.
key_added (str) – Key added to adata.obs with scNym predictions if task==“predict”.
copy (bool) – Copy the AnnData object before predicting cell types.
- Return type
Optional[AnnData]
- Returns
Depending on copy, returns or updates adata with the following fields.
`X_scnym` (ndarray, (obsm, shape=(n_samples, n_hidden), dtype float)) – scNym embedding coordinates of data.
`scNym` ((adata.obs, dtype str)) – scNym cell identity predictions for each observation.
`scNym_train_results` (dict, (uns)) – results of scNym model training.
Examples
>>> import numpy as np
>>> import scanpy as sc
>>> from scnym.api import scnym_api, atlas2target
Loading Data and preparing labels
>>> adata = sc.datasets.kang17()
>>> target_bidx = adata.obs['stim'] == 'stim'
>>> adata.obs['cell'] = np.array(adata.obs['cell'])
>>> adata.obs.loc[target_bidx, 'cell'] = 'Unlabeled'
Train an scNym model
>>> scnym_api(
...     adata=adata,
...     task='train',
...     groupby='cell',
...     out_path='./scnym_outputs',
...     config='no_new_identity',
... )
Predict cell identities with the trained scNym model
>>> path_to_model = './scnym_outputs/'
>>> scnym_api(
...     adata=adata,
...     task='predict',
...     groupby='scNym',
...     trained_model=path_to_model,
...     config='no_new_identity',
... )
Predict cell identities with a pretrained scNym model
>>> scnym_api(
...     adata=adata,
...     task='predict',
...     groupby='scNym',
...     trained_model='pretrained_human',
...     config='no_new_identity',
... )
Perform semi-supervised training with an atlas
>>> joint_adata = atlas2target(
...     adata=adata,
...     species='human',
...     key_added='annotations',
... )
>>> scnym_api(
...     adata=joint_adata,
...     task='train',
...     groupby='annotations',
...     out_path='./scnym_outputs',
...     config='no_new_identity',
... )
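The parameter descriptions above imply a few constraints on how task, groupby, and trained_model combine. The following is a minimal sketch of a pre-flight check a caller might run before invoking scnym_api. The helper check_scnym_args is hypothetical and not part of scnym, and the rules that training needs groupby and prediction needs trained_model are assumptions read from the parameter descriptions, not confirmed library behavior.

```python
from typing import Optional

# The two task values named in the parameter description.
VALID_TASKS = {"train", "predict"}


def check_scnym_args(
    task: str,
    groupby: Optional[str] = None,
    trained_model: Optional[str] = None,
) -> None:
    """Hypothetical helper (not part of scnym): fail early on bad arguments."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    # Assumption: training needs a label column in adata.obs.
    if task == "train" and groupby is None:
        raise ValueError("task='train' requires groupby")
    # Assumption: prediction needs a model directory or a pretrained model string.
    if task == "predict" and trained_model is None:
        raise ValueError("task='predict' requires trained_model")


# Both calls pass silently; an unknown task raises ValueError.
check_scnym_args("train", groupby="cell")
check_scnym_args("predict", trained_model="pretrained_human")
```

Running such a check first surfaces argument mistakes before any data loading or model download starts.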
-
scnym.api.scnym_train(adata, config)
Train an scNym model.
- Parameters
adata (AnnData) – [Cells, Genes] experiment containing annotated cells to train on.
config (dict) – configuration options.
- Return type
None
- Returns
None.
Saves model outputs to config[“out_path”] and adds model results to adata.uns[“scnym_train_results”].
Notes
This method should only be directly called by advanced users. Most users should use scnym_api.
-
scnym.api.scnym_predict(adata, config)
Predict cell identities using an scNym model.
- Parameters
adata (AnnData) – [Cells, Genes] experiment containing cells to classify.
config (dict) – configuration options.
- Return type
None
- Returns
None. Adds adata.obs[config[“key_added”]] and adata.obsm[“X_scnym”].
Notes
This method should only be directly called by advanced users. Most users should use scnym_api.
-
scnym.api.get_pretrained_weights(trained_model, out_path)
Given the name of a set of pretrained model weights, fetch weights from GCS and return the model state dict.
- Parameters
trained_model (str) – the name of a pretrained model to use, formatted as “pretrained_{species}”. species should be one of {“human”, “mouse”, “rat”}.
out_path (str) – path for saving model weights and outputs.
- Return type
str
- Returns
species (str) – species parsed from the trained model name.
Saves “{out_path}/00_best_model_weights.pkl” and “{out_path}/scnym_train_results.pkl”.
Notes
Requires an internet connection to download pre-trained weights.
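As an illustration of the naming convention described above, the sketch below parses a “pretrained_{species}” string and lists the two files the documentation says are saved under out_path. Both helpers, parse_pretrained_name and expected_output_files, are hypothetical names introduced here. They do not download anything and may not match how get_pretrained_weights is implemented internally.

```python
import os
from typing import List

# Species listed in the documentation for pretrained models.
KNOWN_SPECIES = {"human", "mouse", "rat"}
PREFIX = "pretrained_"


def parse_pretrained_name(trained_model: str) -> str:
    """Hypothetical helper: extract the species from 'pretrained_{species}'."""
    if not trained_model.startswith(PREFIX):
        raise ValueError(f"expected a name starting with {PREFIX!r}, got {trained_model!r}")
    species = trained_model[len(PREFIX):]
    if species not in KNOWN_SPECIES:
        raise ValueError(f"species must be one of {sorted(KNOWN_SPECIES)}, got {species!r}")
    return species


def expected_output_files(out_path: str) -> List[str]:
    """Hypothetical helper: paths of the two files documented as saved to out_path."""
    return [
        os.path.join(out_path, "00_best_model_weights.pkl"),
        os.path.join(out_path, "scnym_train_results.pkl"),
    ]


species = parse_pretrained_name("pretrained_mouse")
files = expected_output_files("./scnym_outputs")
```

Checking that the listed files exist after the call is one simple way to confirm the download completed.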
-
scnym.api.atlas2target(adata, species, key_added='annotations')
Download a preprocessed cell atlas dataset and append your new dataset as a target to allow for semi-supervised scNym training.
- Parameters
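adata (anndata.AnnData) – [Cells, Features] experiment to use as a target dataset.
species (str) – species of the reference atlas to download.
key_added (str) – name of the column in .obs that stores the atlas annotations.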
- Returns
joint_adata – [Cells, Features] experiment concatenated with a preprocessed cell atlas reference dataset. Annotations from the atlas are copied to .obs[key_added] and all cells in the target dataset adata are labeled with the special “Unlabeled” token.
- Return type
anndata.AnnData
Examples
>>> import scanpy as sc
>>> import scnym.api
>>> adata = sc.datasets.pbmc3k()
>>> joint_adata = scnym.api.atlas2target(
...     adata=adata,
...     species='human',
...     key_added='annotations',
... )
Notes
Requires an internet connection to download reference datasets.
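The labeling convention for the joint dataset can be sketched without any downloads. The example below builds the annotation column that the description above implies: atlas cells keep their labels and every target cell receives the “Unlabeled” token. The function joint_annotations is a hypothetical illustration using plain lists, and placing atlas cells before target cells is an assumption. The real function operates on AnnData objects.

```python
from typing import List, Sequence

# Special token the documentation uses for cells without annotations.
UNLABELED = "Unlabeled"


def joint_annotations(atlas_labels: Sequence[str], n_target_cells: int) -> List[str]:
    """Hypothetical sketch: label column for an atlas + target concatenation.

    Assumes atlas cells come first, followed by the target cells.
    """
    if n_target_cells < 0:
        raise ValueError("n_target_cells must be non-negative")
    return list(atlas_labels) + [UNLABELED] * n_target_cells


labels = joint_annotations(["B cell", "T cell"], n_target_cells=3)
# labels == ["B cell", "T cell", "Unlabeled", "Unlabeled", "Unlabeled"]
```

During training with groupby set to this column, the “Unlabeled” cells serve only as unlabeled data, as described for the groupby parameter of scnym_api.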