# Uncertainty Analysis Report
Property | Value |
---|---|
Type | {{ model_type|default('Unknown') }} |
Features | {{ features|length|default(0) }} |
Primary Metric | {{ metric|default('Accuracy')|upper }} |
Sensitive Features | {{ sensitive_features|length|default(0) }} |
Alternative Models | {{ report_data.alternative_models|length|default(0) }} |
Uncertainty Score | {{ (uncertainty_score if uncertainty_score is not none else 0)|round(4) }} |
Coverage | {{ (coverage if coverage is not none else 0)|round(4) }} |
Mean Width | {{ (mean_width if mean_width is not none else 0)|round(4) }} |
Calibration Size | {{ cal_size }} |
Generation Time | {{ timestamp }} |

Setting | Value |
---|---|
Sensitive Features | {{ sensitive_features|join(', ') }} |
Metric | {{ metric|default('Accuracy') }} |
Report Type | Static (non-interactive) |
Model | Uncertainty Score | Coverage | Mean Width |{% if metrics %}{% for metric_name in metrics|sort %}{% if metric_name not in ['uncertainty_score', 'coverage', 'mean_width'] %} {{ metric_name|title }} |{% endif %}{% endfor %}{% endif %}
---|---|---|---|{% if metrics %}{% for metric_name in metrics|sort %}{% if metric_name not in ['uncertainty_score', 'coverage', 'mean_width'] %}---|{% endif %}{% endfor %}{% endif %}
{{ model_name }} | {{ "%.4f"|format(uncertainty_score if uncertainty_score is not none else 0) }} | {{ "%.4f"|format(coverage if coverage is not none else 0) }} | {{ "%.4f"|format(mean_width if mean_width is not none else 0) }} |{% if metrics %}{% for metric_name in metrics|sort %}{% if metric_name not in ['uncertainty_score', 'coverage', 'mean_width'] %} {{ "%.4f"|format(metrics[metric_name] if metrics[metric_name] is not none else 0) }} |{% endif %}{% endfor %}{% endif %}
{% for alt_model_name, alt_model_data in report_data.alternative_models.items() %}{# assumes alternative_models maps model name -> results #}
{{ alt_model_name }} | {{ "%.4f"|format(alt_model_data.uncertainty_score if alt_model_data.uncertainty_score is not none else 0) }} | {{ "%.4f"|format(alt_model_data.coverage if alt_model_data.coverage is not none else 0) }} | {{ "%.4f"|format(alt_model_data.mean_width if alt_model_data.mean_width is not none else 0) }} |{% if alt_model_data.metrics %}{% for metric_name in alt_model_data.metrics|sort %}{% if metric_name not in ['uncertainty_score', 'coverage', 'mean_width'] %} {{ "%.4f"|format(alt_model_data.metrics[metric_name] if alt_model_data.metrics[metric_name] is not none else 0) }} |{% endif %}{% endfor %}{% endif %}
{% endfor %}
Compares uncertainty metrics across different models or model configurations.
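The template leaves the metric definitions to the report generator; for interval-style uncertainty estimates, coverage is conventionally the fraction of true values falling inside their intervals and mean width is the average interval length. A minimal sketch, where `interval_metrics` and the toy data are illustrative assumptions rather than the generator's actual code:

```python
import numpy as np

def interval_metrics(y_true, lower, upper):
    """Coverage and mean width of prediction intervals [lower, upper]."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    coverage = ((y_true >= lower) & (y_true <= upper)).mean()
    mean_width = (upper - lower).mean()
    return float(coverage), float(mean_width)

# toy data: symmetric intervals around noisy predictions
rng = np.random.default_rng(0)
y = rng.normal(size=200)
pred = y + rng.normal(scale=0.3, size=200)
cov, width = interval_metrics(y, pred - 0.6, pred + 0.6)
print(f"coverage={cov:.4f}  mean_width={width:.4f}")
```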
{% if charts.coverage_vs_expected %}
Compares actual coverage with expected coverage at different alpha (confidence) levels. The closer the curve lies to the diagonal, the better the calibration.
{% endif %}
{% if charts.width_vs_coverage %}
Shows the relationship between interval width and coverage. Efficient uncertainty estimates achieve higher coverage with narrower intervals.
{% endif %}
{% if charts.performance_gap_by_alpha %}
Shows gaps between expected and actual coverage at different alpha levels. Values close to zero indicate well-calibrated uncertainty.
{% endif %}
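A minimal sketch of how these calibration curves can be derived, assuming split-conformal intervals built from absolute residuals on a held-out calibration set; `coverage_by_alpha` and the synthetic residuals are illustrative, not the report generator's actual implementation:

```python
import numpy as np

def coverage_by_alpha(cal_residuals, test_residuals, alphas):
    """Expected vs. actual coverage of split-conformal intervals.

    cal_residuals / test_residuals are |y - y_pred| on the calibration
    and test sets; intervals are y_pred +/- q_alpha.
    """
    n = len(cal_residuals)
    for alpha in alphas:
        # conformal quantile of the calibration residuals
        level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        q = np.quantile(cal_residuals, level)
        actual = float((test_residuals <= q).mean())
        yield alpha, 1 - alpha, actual, actual - (1 - alpha)

rng = np.random.default_rng(1)
cal = np.abs(rng.normal(size=500))
test = np.abs(rng.normal(size=500))
for alpha, expected, actual, gap in coverage_by_alpha(cal, test, [0.05, 0.1, 0.2]):
    print(f"alpha={alpha:.2f}  expected={expected:.2f}  actual={actual:.3f}  gap={gap:+.3f}")
```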
{% if charts.uncertainty_metrics %}
Shows key uncertainty metrics for the model, including uncertainty score, coverage, and mean width.
{% endif %}
Shows the most important features affecting model uncertainty. Features with higher importance have a greater impact on prediction intervals.
Shows feature reliability scores, indicating which features are most consistent in their impact on uncertainty quantification.
Shows the distribution of prediction interval widths across the dataset. Narrower intervals with proper coverage indicate more efficient uncertainty quantification.
Population Stability Index (PSI) scores measure the stability of feature distributions between calibration and test sets.
Feature | PSI Score |
---|---|
{% for feature, psi in psi_scores.items() %}{# psi_scores is an assumed context variable #}
{{ feature }} | {{ "%.4f"|format(psi) }} |
{% endfor %}
Feature | Importance |
---|---|
{% for feature, importance in feature_importance.items() %}{# feature_importance is an assumed context variable #}
{{ feature }} | {{ "%.4f"|format(importance) }} |
{% endfor %}
Shows the distribution of residuals (prediction errors) across different datasets, helping identify biases under stress conditions.
Shows which features are most correlated with model errors, helping identify potential areas for model improvement.
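A minimal sketch of one way to produce such a ranking, assuming Pearson correlation between each feature and the absolute prediction error; all names here are illustrative:

```python
import numpy as np

def error_correlations(X, y_true, y_pred, feature_names):
    """Pearson correlation of each feature with the absolute error."""
    err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return {name: float(np.corrcoef(X[:, j], err)[0, 1])
            for j, name in enumerate(feature_names)}

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 2))
noise = rng.normal(size=400) * np.exp(X[:, 1])   # x1 inflates the error
y_true = X[:, 0] + noise
print(error_correlations(X, y_true, X[:, 0], ["x0", "x1"]))
```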
Compares different distance metrics (PSI, WD1, KS, etc.) across alpha levels, showing how distribution shift is captured by different metrics.
Shows each feature's distribution shift as measured by the different metrics, highlighting which features are most affected by each type of shift.
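Two of the named distances are available directly in SciPy, and PSI can be computed as in the earlier sketch; the `shift_metrics` helper is an illustrative wrapper, not the report generator's API:

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

def shift_metrics(reference, shifted):
    """Two common distribution-shift distances for one feature."""
    return {
        "WD1": float(wasserstein_distance(reference, shifted)),  # earth mover's distance
        "KS": float(ks_2samp(reference, shifted).statistic),     # max CDF gap
    }

rng = np.random.default_rng(5)
ref = rng.normal(size=1000)
shifted = rng.normal(loc=0.3, scale=1.2, size=1000)
print({k: round(v, 4) for k, v in shift_metrics(ref, shifted).items()})
```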
Compares resilience performance across different models under increasing stress levels. Models with more gradual decline are more resilient.
Shows how the performance gap changes across different alpha levels for each model. Models with smaller gaps at higher alpha levels demonstrate better resilience.
Compares the overall resilience score for each model. Higher scores indicate better performance under distribution shifts.
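The template does not define "stress level". One common reading, assumed in the sketch below, is to evaluate the metric on the most-shifted fraction of the test set, so the curve shows how performance degrades as evaluation focuses on harder samples; `resilience_curve` and the shift scores are illustrative:

```python
import numpy as np

def resilience_curve(metric_fn, y_true, y_pred, shift_scores, stress_levels):
    """Metric restricted to the most-shifted fraction of samples.

    shift_scores ranks samples by distance from the calibration
    distribution (higher = more shifted); stress_levels are fractions.
    """
    order = np.argsort(shift_scores)[::-1]       # most-shifted samples first
    for level in stress_levels:
        k = max(1, int(level * len(y_true)))
        idx = order[:k]
        yield level, float(metric_fn(y_true[idx], y_pred[idx]))

rng = np.random.default_rng(6)
y = rng.normal(size=500)
y_pred = y + rng.normal(scale=0.2, size=500) * (1 + np.abs(y))  # noisier in the tails
mae = lambda a, b: float(np.mean(np.abs(a - b)))
for level, score in resilience_curve(mae, y, y_pred, np.abs(y), [0.1, 0.3, 1.0]):
    print(f"stress={level:.1f}  MAE={score:.3f}")
```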