Working with the Detectron Subpackage

This tutorial walks through setting up and running a Detectron experiment with the detectron subpackage, which assesses how robust a predictive model is to covariate shift in a dataset.

Step 1: Setting up the Datasets

First, configure the DatasetsManager to hold the four datasets the experiment needs: the training_dataset used to train the model, the validation_dataset used alongside it, an unseen reference_dataset drawn from the model's domain, and the test_dataset, which is the dataset we want to check for a possible shift.

from MED3pa.datasets import DatasetsManager

# Initialize the DatasetsManager
datasets = DatasetsManager()

# Load datasets for training, validation, reference, and testing
datasets.set_from_file(dataset_type="training", file='./path_to_train_dataset.csv', target_column_name='y_true')
datasets.set_from_file(dataset_type="validation", file='./path_to_validation_dataset.csv', target_column_name='y_true')
datasets.set_from_file(dataset_type="reference", file='./path_to_reference_dataset.csv', target_column_name='y_true')
datasets.set_from_file(dataset_type="testing", file='./path_to_test_dataset.csv', target_column_name='y_true')

# Initialize a second DatasetsManager for a second experiment (compared with the first in Step 6)
datasets2 = DatasetsManager()

# Load datasets for training, validation, reference, and testing
datasets2.set_from_file(dataset_type="training", file='./data/train_data.csv', target_column_name='Outcome')
datasets2.set_from_file(dataset_type="validation", file='./data/val_data.csv', target_column_name='Outcome')
datasets2.set_from_file(dataset_type="reference", file='./data/test_data.csv', target_column_name='Outcome')
datasets2.set_from_file(dataset_type="testing", file='./data/test_data_shifted_1.6.csv', target_column_name='Outcome')
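Before loading the files, it can help to confirm that each CSV has the target column and that the feature columns line up across datasets. The sketch below is not part of MED3pa: read_header and check_dataset_files are hypothetical helpers, and it assumes each file has a header row and that the four datasets are expected to share the same feature columns in the same order.

```python
import csv


def read_header(csv_path):
    """Return the first row of a CSV file as a list of column names."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle), [])


def check_dataset_files(paths, target_column_name):
    """Check CSV headers before handing the files to DatasetsManager.

    paths maps a dataset type (e.g. "training") to a file path.
    Returns a list of human-readable problems; an empty list means
    every file has the target column and the same feature columns.
    """
    problems = []
    expected_features = None
    for dataset_type, path in paths.items():
        header = read_header(path)
        if target_column_name not in header:
            problems.append(
                f"{dataset_type}: missing target column '{target_column_name}'"
            )
            continue
        features = [name for name in header if name != target_column_name]
        if expected_features is None:
            # The first valid file defines the expected feature layout.
            expected_features = features
        elif features != expected_features:
            problems.append(
                f"{dataset_type}: feature columns differ from the first dataset"
            )
    return problems
```

Calling check_dataset_files with the same paths and target column name passed to set_from_file returns an empty list when the layout is consistent.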

Step 2: Configuring the Model

Next, use the ModelFactory to load a pre-trained model and set it as the base model for the experiment. Alternatively, you can train your own model and use it instead.

from MED3pa.models import BaseModelManager, ModelFactory

# Initialize the model factory and load the pre-trained model
factory = ModelFactory()
model = factory.create_model_from_pickled("./path_to_model.pkl")

# Set the base model using BaseModelManager
base_model_manager = BaseModelManager()
base_model_manager.set_base_model(model=model)

Step 3: Running the Detectron Experiment

Run the Detectron experiment with the datasets and base model configured above, once for each DatasetsManager. Other parameters can also be specified.

from MED3pa.detectron import DetectronExperiment

# Execute the Detectron experiment
experiment_results = DetectronExperiment.run(datasets=datasets, base_model_manager=base_model_manager)
experiment_results2 = DetectronExperiment.run(datasets=datasets2, base_model_manager=base_model_manager)

Step 4: Analyzing the Results

Finally, evaluate the outcomes of the experiment using different strategies to determine the probability of a shift in dataset distributions:

# Analyze the results of both experiments using the selected test strategies
test_strategies = ["enhanced_disagreement_strategy", "mannwhitney_strategy"]
experiment_results.analyze_results(test_strategies)
experiment_results2.analyze_results(test_strategies)
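To give some intuition for what a Mann-Whitney test measures, here is a minimal, generic sketch of the U statistic for two samples. It is not MED3pa's implementation: which quantities the mannwhitney_strategy compares, and how it derives its p-value and z-score, are not described here, and the sample values below are made up purely for illustration.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the Mann-Whitney U statistic for sample_a against sample_b.

    U counts the pairs (a, b) in which a is larger than b; a tie
    contributes one half. A value near len(a) * len(b) means sample_a
    tends to be larger, a value near 0 means it tends to be smaller.
    """
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Made-up values standing in for two sets of per-run measurements.
baseline_sample = [7, 8, 6, 9, 7]
test_sample = [9, 10, 8, 11, 9]

u_test = mann_whitney_u(test_sample, baseline_sample)
u_baseline = mann_whitney_u(baseline_sample, test_sample)
```

The two U values always sum to the number of pairs, so a strongly lopsided split indicates that one sample is systematically larger than the other.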

Output

The analysis reports one entry per strategy. For example:

[
    {
        "shift_probability": 0.8111111111111111,
        "test_statistic": 8.466666666666667,
        "baseline_mean": 7.4,
        "baseline_std": 1.2631530214330944,
        "significance_description": {
            "no shift": 38.34567901234568,
            "small": 15.592592592592592,
            "moderate": 16.34567901234568,
            "large": 29.716049382716047
        },
        "Strategy": "EnhancedDisagreementStrategy"
    },
    {
        "p_value": 0.00016360887668277182,
        "u_statistic": 3545.0,
        "z-score": 0.4685784328619402,
        "shift significance": "Small",
        "Strategy": "MannWhitneyStrategy"
    }
]
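The fields above can be read programmatically once the output is available as JSON text. The sketch below assumes the output has been captured as a string (how analyze_results exposes it is not covered here) and uses only the field names shown above. Reading significance_description as percentages is an inference from the values, which sum to 100 in this output.

```python
import json

# The output shown above, captured as JSON text (values copied verbatim).
results_json = """
[
    {
        "shift_probability": 0.8111111111111111,
        "test_statistic": 8.466666666666667,
        "baseline_mean": 7.4,
        "baseline_std": 1.2631530214330944,
        "significance_description": {
            "no shift": 38.34567901234568,
            "small": 15.592592592592592,
            "moderate": 16.34567901234568,
            "large": 29.716049382716047
        },
        "Strategy": "EnhancedDisagreementStrategy"
    },
    {
        "p_value": 0.00016360887668277182,
        "u_statistic": 3545.0,
        "z-score": 0.4685784328619402,
        "shift significance": "Small",
        "Strategy": "MannWhitneyStrategy"
    }
]
"""

results = json.loads(results_json)

# Index the entries by strategy name for easier lookup.
by_strategy = {entry["Strategy"]: entry for entry in results}

enhanced = by_strategy["EnhancedDisagreementStrategy"]
levels = enhanced["significance_description"]

# Category with the largest share, and the total across categories.
largest_level = max(levels, key=levels.get)
levels_total = sum(levels.values())

mann_whitney = by_strategy["MannWhitneyStrategy"]
print(f"shift probability: {enhanced['shift_probability']:.2f}")
print(f"largest significance category: {largest_level}")
print(f"Mann-Whitney p-value: {mann_whitney['p_value']:.2e}")
```

Indexing by the Strategy field keeps the lookup independent of the order in which the strategies were requested.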

Step 5: Saving the Results

Save the experiment results with the save method, passing the path where they should be written.

experiment_results.save("./detectron_experiment_results")
experiment_results2.save("./detectron_experiment_results2")

Step 6: Comparing the Results

You can compare two Detectron experiments using the DetectronComparison class, as follows:

from MED3pa.detectron.comparaison import DetectronComparison

comparaison = DetectronComparison("./detectron_experiment_results", "./detectron_experiment_results2")
comparaison.compare_experiments()
comparaison.save("./detectron_experiment_comparaison")