This table reports, for each distribution shift scenario, the baseline and target values of accuracy, F1 score, AUC, and the composite score, along with the gap between them.

Shift Scenario | Accuracy (Baseline, Target, Gap) | F1 Score (Baseline, Target, Gap) | AUC (Baseline, Target, Gap) | Composite Score (Baseline, Target, Gap)

Metrics Summary

Metric | Avg. Baseline | Avg. Target | Avg. Gap | Max Gap | Resilience
Accuracy | - | - | - | - | -
F1 Score | - | - | - | - | -
AUC | - | - | - | - | -
Composite | - | - | - | - | -
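The summary columns can be derived from the per-scenario rows of the detailed table. The Python sketch below is a minimal illustration, not the page's own implementation. It assumes Gap = Baseline - Target (the sign convention is not stated here), takes Max Gap as the largest per-scenario gap, and leaves out Resilience because its definition is not given. ScenarioResult, summarize, and the sample scenario names and values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScenarioResult:
    """One row of the detailed table for a single metric (hypothetical structure)."""
    scenario: str
    baseline: float
    target: float

    @property
    def gap(self) -> float:
        # Assumption: gap = baseline - target, so a positive gap means
        # the metric dropped under the distribution shift.
        return self.baseline - self.target


def summarize(rows: list[ScenarioResult]) -> dict[str, float]:
    """Aggregate one metric's rows into the summary columns (Resilience omitted)."""
    if not rows:
        raise ValueError("no scenario rows to summarize")
    gaps = [r.gap for r in rows]
    return {
        "avg_baseline": mean(r.baseline for r in rows),
        "avg_target": mean(r.target for r in rows),
        "avg_gap": mean(gaps),
        "max_gap": max(gaps),
    }


if __name__ == "__main__":
    # Made-up accuracy values for two illustrative scenarios, not real results.
    accuracy_rows = [
        ScenarioResult("covariate_shift", baseline=0.90, target=0.80),
        ScenarioResult("label_shift", baseline=0.88, target=0.84),
    ]
    print(summarize(accuracy_rows))
```

The same function would be called once per metric row (Accuracy, F1 Score, AUC, Composite).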
📊 The Composite Score is a weighted average of the evaluation metrics (accuracy, F1 score, and AUC), giving a single overall measure of model performance.
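Because the weights behind the Composite Score are not published here, the following Python sketch uses hypothetical weights (WEIGHTS) and a hypothetical composite_score helper to show the weighted-average calculation. It divides by the total weight, so the weights need not sum to 1.

```python
def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of metric scores, normalized by the total weight."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score for metric(s): {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Only metrics that carry a weight contribute to the composite.
    return sum(weights[m] * scores[m] for m in weights) / total_weight


# Hypothetical weights: the page does not state the actual weighting.
WEIGHTS = {"accuracy": 0.4, "f1": 0.3, "auc": 0.3}

if __name__ == "__main__":
    print(composite_score({"accuracy": 0.9, "f1": 0.8, "auc": 0.95}, WEIGHTS))
```

With equal weights this reduces to the plain mean of the three metrics. Applying the function separately to the baseline and target scores would give the two composite values whose difference fills the Composite Score gap column.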
â„šī¸ Gap values are highlighted based on severity: green (minor), yellow (moderate), orange (significant), and red (severe).