from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
The new interface for hyperparameter selection and modeling is "Trainers", which implement an sklearn-style interface that allows tuning with sklearn model-selection constructs. Import the trainers and some other useful packages.
from kladi.matrix_models.estimator import AccessibilityTrainer, ExpressionTrainer
from sklearn.model_selection import train_test_split, ShuffleSplit
import scanpy as sc
import anndata
import numpy as np
import logging
logging.basicConfig(level=logging.INFO)
rna_data = sc.read_10x_h5('./data/mouse_prostate/2021-06-01_mouse_prostate_raw_10x_features.h5')
rna_data.var_names = rna_data.var_names.str.upper()
rna_data.var_names_make_unique()
rna_data = rna_data[:, ~rna_data.var.index.str.startswith('GM')]
Cells with fewer than 400 counts tend to produce poor topics and embeddings, so this is a good threshold for removing low-quality cells before training.
sc.pp.filter_cells(rna_data, min_counts = 400)
sc.pp.filter_genes(rna_data, min_cells=15)
This dataset has a lot of mitochondrial reads, so filter cells with a high proportion of those.
rna_data.var['mt'] = rna_data.var_names.str.startswith('MT-') # annotate the group of mitochondrial genes as 'mt'
sc.pp.calculate_qc_metrics(rna_data, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
sc.pl.violin(rna_data, ['n_counts','n_genes_by_counts','total_counts'], jitter = 0.4, multi_panel = True)
sc.pl.scatter(rna_data, x='total_counts', y='pct_counts_mt')
sc.pl.scatter(rna_data, x='total_counts', y='n_genes_by_counts')
Subset cells by percent mitochondrial reads, and also remove cells with an unusually high number of genes, since these may be doublets or technical anomalies.
rna_data = rna_data[rna_data.obs.pct_counts_mt < 15, :]
rna_data = rna_data[rna_data.obs.n_genes_by_counts < 3000, :]
With QC complete, we perform feature selection to identify two groups of genes. The first group contains genes that meet a minimum mean-expression threshold. The second group is a subset of the first and contains highly dispersed genes. These genes are more likely to follow interesting patterns of dynamic response and regulation. scIPM will learn module relationships and impute expression patterns for all genes in the first group, but will use only the second group's expression when mapping cells to latent module compositions. This reduces training time and ensures that modules follow dynamic changes in expression instead of the basal trends that are more likely to arise in non-deviant genes.
Important: Freeze the state of the anndata here to preserve raw counts.
rna_data.raw = rna_data # first, freeze the anndata's state in raw counts
sc.pp.normalize_total(rna_data, target_sum=1e4) #normalize and log
sc.pp.log1p(rna_data)
Next, find genes that meet scanpy's default min_mean requirement, which is a reasonable default. Here, reduce the min_disp parameter so that genes that are expressed, but not necessarily variable, are included. All genes flagged "highly_variable" will be modeled by scIPM.
sc.pp.highly_variable_genes(rna_data, min_disp = -0.1)
With the first group of genes selected, choose a subset of them to be used as features for the encoder. For this, simply set a higher threshold for dispersion. Add an is_feature column to the .var dataframe which flags genes whose dispersions_norm is greater than 0.8. This particular threshold should be adjusted so that between 1000 and 2000 dispersed genes are used as features.
rna_data.var['is_feature'] = rna_data.var.dispersions_norm > 0.8
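The cutoff can be sanity-checked by counting how many genes a candidate threshold flags and adjusting until the count lands in the 1000-2000 range. A minimal sketch with a synthetic dispersions_norm column and a hypothetical count_features helper (in practice, apply it to rna_data.var):

```python
import numpy as np
import pandas as pd

# synthetic stand-in for rna_data.var, for illustration only
rng = np.random.default_rng(0)
var = pd.DataFrame({'dispersions_norm': rng.normal(0, 1, size=15000)})

def count_features(var, threshold):
    """Number of genes whose normalized dispersion exceeds the threshold."""
    return int((var.dispersions_norm > threshold).sum())

# scan candidate thresholds until 1000-2000 genes are flagged
for threshold in [0.5, 0.8, 1.0, 1.2, 1.5]:
    n = count_features(var, threshold)
    print(threshold, n)
    if 1000 <= n <= 2000:
        break
```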
Important: Restore the raw count matrix from the .raw attribute, then subset the counts to the genes flagged highly_variable, i.e. all genes that passed the minimum expression test.
rna_data.X = rna_data.raw.X
rna_data = rna_data[:, rna_data.var.highly_variable]
Now move on to hyperparameter tuning, which seeks the optimal model to describe your data. Kladi utilizes iterative Bayesian optimization with aggressive pruning to converge to the best model in the shortest possible time. The basic workflow is as follows.
First, separate cells into training and validation sets using sklearn.model_selection.train_test_split. The training- and test-set sizes should be adjusted to the size of the dataset: maximize the proportion of cells in the training set while still leaving enough for a reliable evaluation of model quality.
train_expr, test_expr = train_test_split(rna_data.X, train_size = 0.8, random_state = 2556)
Next, instantiate an ExpressionTrainer. Two arguments must be provided:
- features, a list or numpy array of gene names
- highly_variable, a boolean mask the same length as the features list, where highly-variable genes that will be used as features for the encoder are flagged with True
estimator = ExpressionTrainer(features = np.array(rna_data.var_names),
highly_variable = rna_data.var.is_feature.values)
After instantiating a trainer, run the tune_learning_rate_bounds function to identify the minimum and maximum learning rates appropriate for the dataset. This function gradually increases the learning rate of the optimizer while monitoring the loss. The optimal boundaries for the learning rate are at the start and end of the region of the loss curve with the steepest decreasing slope.
estimator.tune_learning_rate_bounds(train_expr, num_epochs = 5, eval_every = 10)
(0.0009325715837037386, 0.1430019603506025)
Plot the loss curve using the ExpressionTrainer.plot_learning_rate_bounds function. The trainer tries to detect good boundaries automatically, but if they need adjustment, use the ExpressionTrainer.trim_learning_rate_bounds function. The first and second arguments move the minimum and maximum boundaries inward, in orders of magnitude, respectively. In this example, I adjust the minimum rate upward by one order of magnitude (5e-4 --> 5e-3) to rest the boundary at the start of the decline. One may also set the learning rates directly with the ExpressionTrainer.set_learning_rates function.
estimator.trim_learning_rate_bounds(1, 0)
estimator.plot_learning_rate_bounds()
<AxesSubplot:xlabel='Learning Rate', ylabel='Loss'>
With the learning rate set, use Bayesian optimization to tune the remaining hyperparameters:
The tuner updates a histogram with the number of modules chosen for each trial. When the optimizer repeatedly chooses similar num_modules hyperparameters, this is a sign it is converging on the optimal model. The process may be stopped after a set number of iterations, or by interrupting the kernel (in Jupyter, press Esc, then I twice). This will end optimization and save results.
study = estimator.tune_hyperparameters(train_expr.copy())
Trials finished: 64 | Best trial: 43 | Best score: 2.1757e-01 Modules | Trials (number is #folds tested) 5 | 5 3 1 1 6 | 1 1 1 1 1 7 | 5 5 5 1 3 1 1 8 | 3 1 3 1 1 3 9 | 5 1 5 3 1 5 1 5 5 1 1 10 | 3 5 1 5 5 1 5 11 | 3 5 1 5 1 12 | 3 1 5 13 | 1 1 1 14 | 1 15 | 3 3 1 17 | 1 18 | 1 20 | 1 22 | 5 23 | 1 24 | 1 27 | 1 38 | 1 53 | 1 Trial Information: Trial #0 | completed, score: 2.2120e-01 | params: {'num_modules': 22, 'batch_size': 128, 'encoder_dropout': 0.08242138974805993, 'num_epochs': 24, 'seed': 1962878913} Trial #1 | completed, score: 2.1833e-01 | params: {'num_modules': 5, 'batch_size': 64, 'encoder_dropout': 0.2396864600494837, 'num_epochs': 31, 'seed': 2144373191} Trial #2 | completed, score: 2.1784e-01 | params: {'num_modules': 7, 'batch_size': 64, 'encoder_dropout': 0.026511986055858736, 'num_epochs': 22, 'seed': 2067730395} Trial #3 | pruned at step: 3 | params: {'num_modules': 5, 'batch_size': 32, 'encoder_dropout': 0.06010452622456072, 'num_epochs': 33, 'seed': 320028794} Trial #4 | pruned at step: 1 | params: {'num_modules': 38, 'batch_size': 32, 'encoder_dropout': 0.047441768632954184, 'num_epochs': 32, 'seed': 4148136245} Trial #5 | pruned at step: 1 | params: {'num_modules': 23, 'batch_size': 32, 'encoder_dropout': 0.142868561604981, 'num_epochs': 30, 'seed': 3724803561} Trial #6 | pruned at step: 1 | params: {'num_modules': 24, 'batch_size': 64, 'encoder_dropout': 0.019707029866746138, 'num_epochs': 32, 'seed': 1767928698} Trial #7 | pruned at step: 3 | params: {'num_modules': 15, 'batch_size': 64, 'encoder_dropout': 0.22492275542641288, 'num_epochs': 22, 'seed': 2403870987} Trial #8 | pruned at step: 3 | params: {'num_modules': 10, 'batch_size': 64, 'encoder_dropout': 0.0776732098379506, 'num_epochs': 24, 'seed': 3349825079} Trial #9 | pruned at step: 3 | params: {'num_modules': 11, 'batch_size': 32, 'encoder_dropout': 0.16735857155053852, 'num_epochs': 26, 'seed': 2532030062} Trial #10 | completed, score: 2.1788e-01 | params: {'num_modules': 7, 
'batch_size': 128, 'encoder_dropout': 0.2978836088039605, 'num_epochs': 39, 'seed': 2037411891} Trial #11 | completed, score: 2.1792e-01 | params: {'num_modules': 7, 'batch_size': 128, 'encoder_dropout': 0.29420112460401726, 'num_epochs': 40, 'seed': 1590795410} Trial #12 | pruned at step: 3 | params: {'num_modules': 8, 'batch_size': 128, 'encoder_dropout': 0.12321814252477711, 'num_epochs': 40, 'seed': 2054599470} Trial #13 | pruned at step: 1 | params: {'num_modules': 5, 'batch_size': 128, 'encoder_dropout': 0.2926976520040703, 'num_epochs': 20, 'seed': 1719701719} Trial #14 | pruned at step: 1 | params: {'num_modules': 7, 'batch_size': 64, 'encoder_dropout': 0.1917855909845501, 'num_epochs': 36, 'seed': 3938793071} Trial #15 | pruned at step: 1 | params: {'num_modules': 13, 'batch_size': 128, 'encoder_dropout': 0.26083623917457843, 'num_epochs': 20, 'seed': 747713223} Trial #16 | pruned at step: 1 | params: {'num_modules': 8, 'batch_size': 128, 'encoder_dropout': 0.011981932139393525, 'num_epochs': 27, 'seed': 3506106968} Trial #17 | pruned at step: 1 | params: {'num_modules': 6, 'batch_size': 64, 'encoder_dropout': 0.11112878394170587, 'num_epochs': 22, 'seed': 2960744749} Trial #18 | pruned at step: 1 | params: {'num_modules': 53, 'batch_size': 128, 'encoder_dropout': 0.184669829094548, 'num_epochs': 36, 'seed': 1782749767} Trial #19 | completed, score: 2.1775e-01 | params: {'num_modules': 10, 'batch_size': 64, 'encoder_dropout': 0.21046060872591305, 'num_epochs': 29, 'seed': 1495837260} Trial #20 | pruned at step: 1 | params: {'num_modules': 10, 'batch_size': 64, 'encoder_dropout': 0.2117743130969354, 'num_epochs': 29, 'seed': 959264911} Trial #21 | completed, score: 2.1767e-01 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.26554314946359686, 'num_epochs': 36, 'seed': 1496647099} Trial #22 | pruned at step: 1 | params: {'num_modules': 17, 'batch_size': 64, 'encoder_dropout': 0.2622569126404776, 'num_epochs': 35, 'seed': 1110122748} Trial 
#23 | pruned at step: 3 | params: {'num_modules': 12, 'batch_size': 64, 'encoder_dropout': 0.265470661528064, 'num_epochs': 25, 'seed': 1091083373} Trial #24 | pruned at step: 1 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.21343621250276396, 'num_epochs': 22, 'seed': 3031693428} Trial #25 | pruned at step: 3 | params: {'num_modules': 15, 'batch_size': 64, 'encoder_dropout': 0.2392143523602535, 'num_epochs': 27, 'seed': 1350900188} Trial #26 | pruned at step: 1 | params: {'num_modules': 6, 'batch_size': 64, 'encoder_dropout': 0.19401621514550912, 'num_epochs': 28, 'seed': 2297095598} Trial #27 | completed, score: 2.1762e-01 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.1615287147217766, 'num_epochs': 34, 'seed': 908241052} Trial #28 | pruned at step: 1 | params: {'num_modules': 18, 'batch_size': 64, 'encoder_dropout': 0.15386020433067363, 'num_epochs': 38, 'seed': 2552913309} Trial #29 | pruned at step: 1 | params: {'num_modules': 13, 'batch_size': 64, 'encoder_dropout': 0.1681008152516475, 'num_epochs': 34, 'seed': 4041406018} Trial #30 | pruned at step: 3 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.12581948057583062, 'num_epochs': 37, 'seed': 121826864} Trial #31 | pruned at step: 1 | params: {'num_modules': 6, 'batch_size': 64, 'encoder_dropout': 0.10495749985580072, 'num_epochs': 30, 'seed': 3152660040} Trial #32 | pruned at step: 3 | params: {'num_modules': 8, 'batch_size': 64, 'encoder_dropout': 0.2417994327774824, 'num_epochs': 34, 'seed': 3442712121} Trial #33 | completed, score: 2.1779e-01 | params: {'num_modules': 11, 'batch_size': 64, 'encoder_dropout': 0.27676593524195015, 'num_epochs': 24, 'seed': 803168545} Trial #34 | pruned at step: 1 | params: {'num_modules': 11, 'batch_size': 64, 'encoder_dropout': 0.27598362230438533, 'num_epochs': 31, 'seed': 978954468} Trial #35 | pruned at step: 1 | params: {'num_modules': 20, 'batch_size': 64, 'encoder_dropout': 0.24280844905082444, 
'num_epochs': 24, 'seed': 702015538} Trial #36 | pruned at step: 1 | params: {'num_modules': 13, 'batch_size': 64, 'encoder_dropout': 0.28033774225003666, 'num_epochs': 33, 'seed': 4158652377} Trial #37 | pruned at step: 1 | params: {'num_modules': 27, 'batch_size': 32, 'encoder_dropout': 0.20960680178719157, 'num_epochs': 23, 'seed': 624162381} Trial #38 | pruned at step: 1 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.22957066844842453, 'num_epochs': 21, 'seed': 2657151855} Trial #39 | completed, score: 2.1774e-01 | params: {'num_modules': 11, 'batch_size': 64, 'encoder_dropout': 0.17337405798169025, 'num_epochs': 31, 'seed': 1517504506} Trial #40 | pruned at step: 1 | params: {'num_modules': 15, 'batch_size': 32, 'encoder_dropout': 0.17978803454125372, 'num_epochs': 32, 'seed': 717211117} Trial #41 | pruned at step: 1 | params: {'num_modules': 11, 'batch_size': 64, 'encoder_dropout': 0.14438470841981077, 'num_epochs': 30, 'seed': 1129367176} Trial #42 | completed, score: 2.1775e-01 | params: {'num_modules': 10, 'batch_size': 64, 'encoder_dropout': 0.16466506203652256, 'num_epochs': 29, 'seed': 3066327037} Trial #43 | completed, score: 2.1757e-01 | params: {'num_modules': 10, 'batch_size': 64, 'encoder_dropout': 0.17186290932951004, 'num_epochs': 29, 'seed': 1392224450} Trial #44 | pruned at step: 1 | params: {'num_modules': 8, 'batch_size': 64, 'encoder_dropout': 0.20022863775736569, 'num_epochs': 31, 'seed': 3226035130} Trial #45 | completed, score: 2.1761e-01 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.1354554213520453, 'num_epochs': 28, 'seed': 60025027} Trial #46 | pruned at step: 1 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.1345044313200267, 'num_epochs': 28, 'seed': 3790860826} Trial #47 | pruned at step: 1 | params: {'num_modules': 5, 'batch_size': 64, 'encoder_dropout': 0.08967546630891743, 'num_epochs': 33, 'seed': 353389968} Trial #48 | pruned at step: 3 | params: {'num_modules': 7, 
'batch_size': 32, 'encoder_dropout': 0.05283646300373761, 'num_epochs': 26, 'seed': 4115769611} Trial #49 | pruned at step: 1 | params: {'num_modules': 12, 'batch_size': 64, 'encoder_dropout': 0.15468392608774412, 'num_epochs': 34, 'seed': 1671731908} Trial #50 | pruned at step: 1 | params: {'num_modules': 14, 'batch_size': 64, 'encoder_dropout': 0.17852418868145795, 'num_epochs': 32, 'seed': 3591213543} Trial #51 | pruned at step: 1 | params: {'num_modules': 10, 'batch_size': 64, 'encoder_dropout': 0.14586654757027778, 'num_epochs': 29, 'seed': 1602447721} Trial #52 | pruned at step: 1 | params: {'num_modules': 8, 'batch_size': 64, 'encoder_dropout': 0.16704765172291475, 'num_epochs': 27, 'seed': 361880847} Trial #53 | pruned at step: 1 | params: {'num_modules': 6, 'batch_size': 64, 'encoder_dropout': 0.11112936966046322, 'num_epochs': 30, 'seed': 881998067} Trial #54 | pruned at step: 1 | params: {'num_modules': 7, 'batch_size': 64, 'encoder_dropout': 0.13000187242707179, 'num_epochs': 26, 'seed': 4238689834} Trial #55 | completed, score: 2.1760e-01 | params: {'num_modules': 10, 'batch_size': 64, 'encoder_dropout': 0.1773476873677264, 'num_epochs': 31, 'seed': 2662312188} Trial #56 | completed, score: 2.1782e-01 | params: {'num_modules': 12, 'batch_size': 64, 'encoder_dropout': 0.15674095609337468, 'num_epochs': 35, 'seed': 2506156415} Trial #57 | completed, score: 2.1772e-01 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.176929365471849, 'num_epochs': 31, 'seed': 3127708157} Trial #58 | pruned at step: 1 | params: {'num_modules': 6, 'batch_size': 64, 'encoder_dropout': 0.18590635829814223, 'num_epochs': 32, 'seed': 3640202356} Trial #59 | pruned at step: 3 | params: {'num_modules': 8, 'batch_size': 128, 'encoder_dropout': 0.034590965961307674, 'num_epochs': 36, 'seed': 1146938458} Trial #60 | completed, score: 2.1764e-01 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.07157944422707452, 'num_epochs': 35, 'seed': 
805363870} Trial #61 | pruned at step: 1 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.07545847747972005, 'num_epochs': 37, 'seed': 3069757599} Trial #62 | pruned at step: 1 | params: {'num_modules': 7, 'batch_size': 64, 'encoder_dropout': 0.20060689152104288, 'num_epochs': 33, 'seed': 2467548189} Trial #63 | pruned at step: 1 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.08371876713311742, 'num_epochs': 39, 'seed': 1291970473}
WARNING:kladi.matrix_models.scipm_base:Interrupted training.
This function returns an optuna study; its methods and visualization utilities are documented here: Optuna Study
The best models trained in the procedure above must now be retrained with all the training data, and compared on how well they minimize test-set loss. The best model from this final comparison is then fit again using all data and returned to the user.
All of this is performed by the ExpressionTrainer.select_best_model function. The top_n_models argument controls how many models from the optimization procedure are compared on validation-set loss.
model = estimator.select_best_model(train_expr, test_expr, top_n_models = 5)
INFO:root:Training model with parameters: {'num_modules': 10, 'batch_size': 64, 'encoder_dropout': 0.17186290932951004, 'num_epochs': 29, 'seed': 1392224450} INFO:root:Score: 2.14799e-01 INFO:root:Training model with parameters: {'num_modules': 10, 'batch_size': 64, 'encoder_dropout': 0.1773476873677264, 'num_epochs': 31, 'seed': 2662312188} INFO:root:Score: 2.14877e-01 INFO:root:Training model with parameters: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.1354554213520453, 'num_epochs': 28, 'seed': 60025027} INFO:root:Score: 2.14850e-01 INFO:root:Training model with parameters: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.1615287147217766, 'num_epochs': 34, 'seed': 908241052} INFO:root:Score: 2.15023e-01 INFO:root:Training model with parameters: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.07157944422707452, 'num_epochs': 35, 'seed': 805363870} INFO:root:Score: 2.14888e-01 INFO:root:Set parameters to best combination: {'num_modules': 10, 'batch_size': 64, 'encoder_dropout': 0.17186290932951004, 'num_epochs': 29, 'seed': 1392224450} INFO:root:Training model with all data.
The result of this function is an ExpressionModel object, which contains the methods needed for downstream analysis of the discovered modules. Now is a good time to save the model. The trainer object implements a save function, which simply takes a file name and saves the parameters and weights needed to recreate the best model.
estimator.save('data/mouse_prostate/best_model.pth')
To reload the model later, simply use:
model = ExpressionTrainer.load('data/mouse_prostate/best_model.pth')
With the trained model, there are three main functions we can use to represent and visualize the data:
model = ExpressionTrainer.load('data/mouse_prostate/best_model.pth') # load model
rna_topic_names = ['rna_{}'.format(str(i)) for i in range(model.num_topics)] # get a column name for each topic
rna_data.obs[rna_topic_names] = model.predict(rna_data.X) # predict topic compositions and add them to "obs" columns
rna_data.layers['imputed'] = model.impute(model.predict(rna_data.X)) # impute gene expression to a new layer in the anndata
rna_data.obsm['umap_features'] = model.get_UMAP_features(rna_data.X) # get real-space embedding of cells
sc.pp.neighbors(rna_data, use_rep = 'umap_features') # run scanpy's UMAP
sc.tl.umap(rna_data, min_dist = 0.3, negative_sample_rate = 5)
sc.pl.umap(rna_data, color = rna_topic_names, color_map = 'viridis', frameon = False) # visualize topics
from kladi.matrix_models.estimator import AccessibilityTrainer, ExpressionTrainer
from sklearn.model_selection import train_test_split, ShuffleSplit
import scanpy as sc
import anndata
import numpy as np
import logging
logging.basicConfig(level=logging.INFO)
atac_data = anndata.read_h5ad('data/mouse_prostate/atac_data.h5ad')
atac_data.var[['chr','start','end']] = atac_data.var.id.str.split('_', expand = True) # make chrom, start, and end columns, alternatively, make a column of <chr>:<start>-<end>
Feature selection in scATAC-seq is not as well defined as in scRNA-seq, where dispersed genes are well established as good features for differentiating cell states. No such analogue is accepted in scATAC-seq, so there are two options for choosing features for the encoder network:
In this example, I do not subset the features of the accessibility model.
The procedure and interface for hyperparameter selection are identical, except we use AccessibilityTrainer instead of ExpressionTrainer. Instead of gene names, pass the peak locations to the features argument in the format:
[[chr, start, end], ...]
One can also pass a list of strings in the format:
["<chr>:<start>-<end>", ...]
train_acc, test_acc = train_test_split(atac_data.X, random_state = 2556, train_size = 0.8) #make train and validation set
atac_estimator = AccessibilityTrainer(features = atac_data.var[['chr','start','end']].values.tolist()) # instantiate estimator, leaving highly_variable argument as None
atac_estimator.tune_learning_rate_bounds(train_acc, eval_every = 10, num_epochs = 7) # increase the number of epochs of the test to get more sampled points
(4.261911782300387e-06, 0.08590683144559928)
atac_estimator.trim_learning_rate_bounds(5, 0.5) # trim the lower bound aggressively
atac_estimator.plot_learning_rate_bounds()
<AxesSubplot:xlabel='Learning Rate', ylabel='Loss'>
atac_study = atac_estimator.tune_hyperparameters(train_acc)
Trials finished: 161 | Best trial: 53 | Best score: 2.0560e-01 Modules | Trials (number is #folds tested) 5 | 1 1 6 | 1 8 | 0 0 9 | 5 0 5 10 | 1 0 11 | 0 1 0 12 | 1 1 3 1 3 13 | 0 1 0 1 3 1 5 14 | 1 3 5 3 1 3 1 1 3 1 15 | 5 0 5 1 5 5 1 0 1 0 1 1 1 0 3 3 16 | 5 3 5 5 0 5 1 1 1 3 3 3 1 5 1 1 1 17 | 1 1 5 5 5 5 5 1 3 5 5 1 1 1 3 1 1 3 5 18 | 1 1 1 3 1 5 1 5 1 1 3 5 1 1 1 19 | 5 1 1 1 5 1 3 1 3 5 3 1 1 1 20 | 1 0 5 3 1 1 1 1 1 5 1 1 21 | 5 5 3 1 3 0 1 22 | 1 3 1 1 1 3 23 | 3 1 1 24 | 1 0 1 25 | 0 1 26 | 3 27 | 1 1 29 | 1 1 30 | 0 32 | 1 34 | 1 38 | 0 39 | 1 49 | 1 52 | 0 Trial Information: Trial #0 | completed, score: 2.0596e-01 | params: {'num_modules': 9, 'batch_size': 32, 'encoder_dropout': 0.10389355044650528, 'num_epochs': 28, 'seed': 228295817} Trial #1 | pruned at step: 0 | params: {'num_modules': 9, 'batch_size': 64, 'encoder_dropout': 0.12254635139845542, 'num_epochs': 31, 'seed': 1218731039} Trial #2 | completed, score: 2.0586e-01 | params: {'num_modules': 9, 'batch_size': 32, 'encoder_dropout': 0.12588953117923532, 'num_epochs': 39, 'seed': 2515659408} Trial #3 | pruned at step: 0 | params: {'num_modules': 52, 'batch_size': 128, 'encoder_dropout': 0.2889427071470159, 'num_epochs': 34, 'seed': 2281583406} Trial #4 | pruned at step: 0 | params: {'num_modules': 8, 'batch_size': 64, 'encoder_dropout': 0.053870229594672754, 'num_epochs': 23, 'seed': 4016119507} Trial #5 | pruned at step: 1 | params: {'num_modules': 5, 'batch_size': 32, 'encoder_dropout': 0.1147990103961171, 'num_epochs': 31, 'seed': 1047013955} Trial #6 | completed, score: 2.0582e-01 | params: {'num_modules': 15, 'batch_size': 32, 'encoder_dropout': 0.08590792914318666, 'num_epochs': 22, 'seed': 4236641183} Trial #7 | pruned at step: 0 | params: {'num_modules': 8, 'batch_size': 128, 'encoder_dropout': 0.14189908852740007, 'num_epochs': 21, 'seed': 2350666489} Trial #8 | pruned at step: 0 | params: {'num_modules': 30, 'batch_size': 128, 'encoder_dropout': 0.02914314546204092, 'num_epochs': 25, 
'seed': 2572255210} Trial #9 | pruned at step: 0 | params: {'num_modules': 13, 'batch_size': 32, 'encoder_dropout': 0.0690087721622401, 'num_epochs': 40, 'seed': 1916803187} Trial #10 | pruned at step: 1 | params: {'num_modules': 24, 'batch_size': 32, 'encoder_dropout': 0.21028880515248, 'num_epochs': 20, 'seed': 2984090518} Trial #11 | completed, score: 2.0574e-01 | params: {'num_modules': 16, 'batch_size': 32, 'encoder_dropout': 0.18809300306736054, 'num_epochs': 38, 'seed': 3226306854} Trial #12 | pruned at step: 1 | params: {'num_modules': 20, 'batch_size': 32, 'encoder_dropout': 0.19125771111155032, 'num_epochs': 25, 'seed': 1213432305} Trial #13 | pruned at step: 3 | params: {'num_modules': 16, 'batch_size': 32, 'encoder_dropout': 0.23461689764921545, 'num_epochs': 36, 'seed': 1921601664} Trial #14 | pruned at step: 1 | params: {'num_modules': 39, 'batch_size': 32, 'encoder_dropout': 0.18322212861119333, 'num_epochs': 22, 'seed': 2360536277} Trial #15 | pruned at step: 0 | params: {'num_modules': 15, 'batch_size': 64, 'encoder_dropout': 0.2821041617847423, 'num_epochs': 27, 'seed': 2779389437} Trial #16 | pruned at step: 1 | params: {'num_modules': 12, 'batch_size': 32, 'encoder_dropout': 0.08083646658710551, 'num_epochs': 20, 'seed': 586201717} Trial #17 | completed, score: 2.0566e-01 | params: {'num_modules': 21, 'batch_size': 32, 'encoder_dropout': 0.011902156745160403, 'num_epochs': 31, 'seed': 2712611615} Trial #18 | pruned at step: 1 | params: {'num_modules': 27, 'batch_size': 32, 'encoder_dropout': 0.24159809078641048, 'num_epochs': 32, 'seed': 2546226694} Trial #19 | pruned at step: 0 | params: {'num_modules': 38, 'batch_size': 64, 'encoder_dropout': 0.011193537535368692, 'num_epochs': 36, 'seed': 1637843316} Trial #20 | pruned at step: 0 | params: {'num_modules': 20, 'batch_size': 128, 'encoder_dropout': 0.1644734066415834, 'num_epochs': 38, 'seed': 265030752} Trial #21 | completed, score: 2.0564e-01 | params: {'num_modules': 20, 'batch_size': 32, 
'encoder_dropout': 0.03639795648799181, 'num_epochs': 30, 'seed': 3812164452} Trial #22 | pruned at step: 3 | params: {'num_modules': 20, 'batch_size': 32, 'encoder_dropout': 0.0332788828163328, 'num_epochs': 30, 'seed': 2545295328} Trial #23 | pruned at step: 1 | params: {'num_modules': 34, 'batch_size': 32, 'encoder_dropout': 0.017274788184885034, 'num_epochs': 34, 'seed': 263028816} Trial #24 | pruned at step: 3 | params: {'num_modules': 23, 'batch_size': 32, 'encoder_dropout': 0.052774696355755116, 'num_epochs': 28, 'seed': 1274239121} Trial #25 | pruned at step: 1 | params: {'num_modules': 18, 'batch_size': 32, 'encoder_dropout': 0.16242947840462024, 'num_epochs': 29, 'seed': 3063600945} Trial #26 | pruned at step: 1 | params: {'num_modules': 12, 'batch_size': 32, 'encoder_dropout': 0.23311904729391858, 'num_epochs': 33, 'seed': 79202905} Trial #27 | pruned at step: 3 | params: {'num_modules': 26, 'batch_size': 32, 'encoder_dropout': 0.2620539161334563, 'num_epochs': 26, 'seed': 3981473747} Trial #28 | pruned at step: 1 | params: {'num_modules': 49, 'batch_size': 32, 'encoder_dropout': 0.19562217054331496, 'num_epochs': 36, 'seed': 716881755} Trial #29 | pruned at step: 1 | params: {'num_modules': 10, 'batch_size': 32, 'encoder_dropout': 0.09900315701693793, 'num_epochs': 29, 'seed': 2219496828} Trial #30 | pruned at step: 1 | params: {'num_modules': 6, 'batch_size': 32, 'encoder_dropout': 0.13893830237495347, 'num_epochs': 38, 'seed': 4259644331} Trial #31 | pruned at step: 1 | params: {'num_modules': 14, 'batch_size': 32, 'encoder_dropout': 0.08601381240214803, 'num_epochs': 24, 'seed': 3235129114} Trial #32 | pruned at step: 1 | params: {'num_modules': 17, 'batch_size': 32, 'encoder_dropout': 0.04650222697851749, 'num_epochs': 27, 'seed': 2072056352} Trial #33 | pruned at step: 1 | params: {'num_modules': 22, 'batch_size': 32, 'encoder_dropout': 0.02812604254365346, 'num_epochs': 31, 'seed': 2613656404} Trial #34 | pruned at step: 0 | params: 
Trial #35 | pruned at step: 1 | params: {'num_modules': 18, 'batch_size': 32, 'encoder_dropout': 0.011054405339309566, 'num_epochs': 29, 'seed': 3024465112}
[... hyperparameter search log truncated: trials #36–#159, most pruned at steps 0–3; best completed score: 2.0560e-01 at Trial #53, params: {'num_modules': 16, 'batch_size': 32, 'encoder_dropout': 0.023435233664846504, 'num_epochs': 33, 'seed': 2862597996} ...]
Trial #160 | pruned at step: 3 | params: {'num_modules': 15, 'batch_size': 32, 'encoder_dropout': 0.04040084039964536, 'num_epochs': 32, 'seed': 1489168809}
WARNING:kladi.matrix_models.scipm_base:Interrupted training.
atac_model = atac_estimator.select_best_model(train_acc, test_acc)
INFO:root:Training model with parameters: {'num_modules': 16, 'batch_size': 32, 'encoder_dropout': 0.023435233664846504, 'num_epochs': 33, 'seed': 2862597996}
INFO:root:Score: 2.01356e-01
INFO:root:Training model with parameters: {'num_modules': 17, 'batch_size': 32, 'encoder_dropout': 0.05037912250709933, 'num_epochs': 32, 'seed': 154455071}
INFO:root:Score: 2.01183e-01
INFO:root:Training model with parameters: {'num_modules': 20, 'batch_size': 32, 'encoder_dropout': 0.03362340108391326, 'num_epochs': 30, 'seed': 2136284932}
INFO:root:Score: 2.01352e-01
INFO:root:Training model with parameters: {'num_modules': 19, 'batch_size': 32, 'encoder_dropout': 0.04236474409721123, 'num_epochs': 37, 'seed': 3655390847}
INFO:root:Score: 2.01173e-01
INFO:root:Training model with parameters: {'num_modules': 18, 'batch_size': 32, 'encoder_dropout': 0.050939973516588494, 'num_epochs': 32, 'seed': 2269889261}
INFO:root:Score: 2.01172e-01
INFO:root:Set parameters to best combination: {'num_modules': 18, 'batch_size': 32, 'encoder_dropout': 0.050939973516588494, 'num_epochs': 32, 'seed': 2269889261}
INFO:root:Training model with all data.
atac_estimator.save('data/mouse_prostate/best_atac_model.pth')
The accessibility model exposes the same representation functions as the expression model, except `impute`,
which is not available due to the memory constraints of computing a dense probability for every peak in every cell.
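To see why dense imputation over all peaks is impractical, consider the memory footprint of the resulting matrix. The cell and peak counts below are hypothetical, chosen only to illustrate the order of magnitude; typical scATAC-seq datasets are in this range or larger.

```python
import numpy as np

# Hypothetical dataset dimensions (for illustration only, not the counts in this notebook)
n_cells, n_peaks = 10_000, 150_000

# A dense float32 probability for every peak in every cell would require:
dense_bytes = n_cells * n_peaks * np.dtype(np.float32).itemsize
print(f"{dense_bytes / 1e9:.1f} GB")  # 6.0 GB for this example alone
```

For genes (tens of thousands of features) this matrix is manageable, which is why `impute` is available on the expression model but not here.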
atac_model = AccessibilityTrainer.load('data/mouse_prostate/best_atac_model.pth')
atac_topic_names = ['atac_{}'.format(str(i)) for i in range(atac_model.num_topics)] # get a column name for each topic
atac_data.obs[atac_topic_names] = atac_model.predict(atac_data.X) # predict topic compositions and add them to "obs" columns
atac_data.obsm['umap_features'] = atac_model.get_UMAP_features(atac_data.X) # get real-space embedding of cells
100%|██████████| 10/10 [00:12<00:00, 1.26s/it]
sc.pp.neighbors(atac_data, use_rep = 'umap_features') # build the neighbor graph on the model's UMAP features
sc.tl.umap(atac_data, min_dist = 0.3, negative_sample_rate = 5)
sc.pl.umap(atac_data, color = atac_topic_names, color_map = 'viridis', frameon = False) # visualize topics
To create the joint representation, we need to find the cells that passed QC for both assays by intersecting their cell barcodes. Unfortunately, for this assay only 1,300 cells passed both the scRNA-seq and scATAC-seq QC thresholds.
overlapping_barcodes = np.intersect1d(rna_data.obs_names, atac_data.obs_names) # overlap barcodes of cells that passed both assays' QC
overlapping_barcodes.shape
(1300,)
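A useful property here is that `np.intersect1d` returns the common barcodes sorted and deduplicated, so indexing both AnnData objects with the same array puts their cells in identical order. A toy sketch with made-up barcodes:

```python
import numpy as np

# Made-up barcodes for illustration
rna_barcodes  = np.array(['AAAC-1', 'TTTG-1', 'CCGA-1', 'GGTA-1'])
atac_barcodes = np.array(['GGTA-1', 'AAAC-1', 'TCGA-1'])

# intersect1d returns the sorted, unique barcodes present in both assays
shared = np.intersect1d(rna_barcodes, atac_barcodes)
print(shared)  # ['AAAC-1' 'GGTA-1']
```

Because the output order is deterministic, `rna_data[shared]` and `atac_data[shared]` are guaranteed to be row-aligned, which the concatenation below relies on.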
If I look at what sorts of cells are shared between the two assays, they appear to be evenly mixed among the major cell types in the RNA-seq representation (bottom), but according to the ATAC-seq representation (top), we will lose a major cell population in the joint representation. This cell type may not be captured well by scRNA-seq, or may have too few transcripts to pass QC.
atac_data.obs['shared_cell'] = atac_data.obs_names.isin(overlapping_barcodes).astype(str)
rna_data.obs['shared_cell'] = rna_data.obs_names.isin(overlapping_barcodes).astype(str)
sc.pl.umap(atac_data, color = 'shared_cell', palette = 'Set2', frameon = False) # 'shared_cell' is categorical, so use palette rather than color_map
sc.pl.umap(rna_data, color = 'shared_cell', palette = 'Set2', frameon = False)
... storing 'shared_cell' as categorical
... storing 'shared_cell' as categorical
shared_rna, shared_atac = rna_data[overlapping_barcodes].copy(), atac_data[overlapping_barcodes].copy() # subset both AnnData objects to the shared cells, in the same order
shared_rna.obsm['joint_features'] = np.hstack([shared_rna.obsm['umap_features'], shared_atac.obsm['umap_features']]) # create joint features by concatenating RNA and ATAC UMAP features
shared_atac.obsm['joint_features'] = shared_rna.obsm['joint_features'] # share joint features with scATAC-seq anndata
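The joint representation above is just a column-wise concatenation of the two per-assay embeddings: `np.hstack` on two matrices with the same number of rows stacks their columns side by side. A minimal sketch with placeholder dimensions (the feature widths here are arbitrary, not the actual model outputs):

```python
import numpy as np

# Stand-ins for the RNA and ATAC UMAP feature matrices: same cells, same row order
rna_features  = np.random.rand(1300, 24)   # hypothetical 24 RNA features per cell
atac_features = np.random.rand(1300, 18)   # hypothetical 18 ATAC features per cell

# Column-wise concatenation: each cell's joint feature vector is [rna | atac]
joint = np.hstack([rna_features, atac_features])
print(joint.shape)  # (1300, 42)
```

This only gives a meaningful joint embedding because both matrices were subset with the same sorted barcode array, so row i describes the same cell in both assays.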
Since I lost a significant portion of cells by joining the representations, this UMAP loses some clarity, but it still shows separated cell types. The RNA-seq topics still appear coherent with respect to the cell representations.
sc.pp.neighbors(shared_rna, use_rep = 'joint_features') # build the neighbor graph on the joint features
sc.tl.umap(shared_rna, min_dist = 0.3, negative_sample_rate = 5)
sc.pl.umap(shared_rna, color = rna_topic_names, frameon=False)
Continue with previous analysis notebook ...