This table summarizes the metrics used to evaluate different aspects of uncertainty quantification quality.

| Metric | Value | Acceptable Range | Status | Description |
|---|---|---|---|---|
| **Coverage Metrics** | | | | |
| Average Coverage | - | Depends on target | - | Average empirical coverage across all alpha levels |
| Average Coverage Gap | - | \|Gap\| < 0.05 | - | Average difference between expected and empirical coverage |
| Coverage Consistency | - | ≥ 0.8 | - | Consistency of coverage performance across alpha levels |
| **Calibration Metrics** | | | | |
| Expected Calibration Error | - | < 0.05 | - | Weighted average of calibration errors across all bins |
| Maximum Calibration Error | - | < 0.15 | - | Maximum calibration error observed in any bin |
| Brier Score | - | < 0.1 | - | Mean squared error between predicted probabilities and outcomes |
| **Sharpness Metrics** | | | | |
| Average Interval Width | - | Domain dependent | - | Average width of prediction intervals (lower is sharper) |
| Width Variation | - | < 0.5 | - | Coefficient of variation of interval widths |
| Normalized Sharpness | - | ≥ 0.7 | - | Sharpness score normalized against a baseline |
| **Composite Scores** | | | | |
| Uncertainty Score | - | ≥ 0.8 | - | Overall score for uncertainty quantification quality |
| Reliability-Sharpness Balance | - | ≥ 0.7 | - | Balance between reliable coverage and sharp intervals |
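
The coverage metrics above can be computed along the following lines. This is a minimal sketch using NumPy: the function names, the `intervals_by_alpha` structure, and the particular consistency definition are illustrative assumptions, not the exact implementation behind the table.

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Fraction of true values falling inside their prediction intervals."""
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

def coverage_metrics(y_true, intervals_by_alpha):
    """Average coverage, average |expected - empirical| gap, and a simple
    consistency score across alpha levels.

    intervals_by_alpha maps each alpha to a (lower, upper) pair of arrays.
    """
    coverages, gaps = [], []
    for alpha, (lower, upper) in intervals_by_alpha.items():
        cov = empirical_coverage(y_true, lower, upper)
        coverages.append(cov)
        gaps.append(abs((1.0 - alpha) - cov))
    gaps = np.array(gaps)
    return {
        "average_coverage": float(np.mean(coverages)),
        "average_coverage_gap": float(gaps.mean()),
        # One illustrative notion of consistency: 1 minus the spread of the gaps.
        "coverage_consistency": float(1.0 - gaps.std()),
    }
```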

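The calibration metrics can be sketched similarly for binary outcomes. The binning scheme and names here are assumptions for illustration; only the standard definitions of ECE, MCE, and the Brier score from the table are taken as given.

```python
import numpy as np

def calibration_metrics(y_true, y_prob, n_bins=10):
    """Expected/maximum calibration error and Brier score for binary outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, mce = 0.0, 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (y_prob >= lo) & (y_prob < hi)
        if hi == edges[-1]:
            in_bin |= (y_prob == hi)  # count probabilities of exactly 1.0 in the last bin
        if in_bin.any():
            weight = in_bin.mean()    # fraction of all samples falling in this bin
            gap = abs(y_true[in_bin].mean() - y_prob[in_bin].mean())
            ece += weight * gap       # weighted average of per-bin calibration errors
            mce = max(mce, gap)       # worst per-bin calibration error

    brier = float(np.mean((y_prob - y_true) ** 2))
    return {"ece": float(ece), "mce": float(mce), "brier_score": brier}
```
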
Understanding the Metrics

Good uncertainty quantification requires balancing multiple objectives:

- Coverage (reliability): prediction intervals should contain the true values at the rate implied by each alpha level.
- Calibration: predicted probabilities should match the frequencies actually observed.
- Sharpness: intervals should be as narrow as possible while still achieving the target coverage (see the sketch below).
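
For the sharpness side, a minimal sketch of average interval width and width variation might look like this. The normalization against a baseline width is a hypothetical choice, since the table does not specify how the baseline is defined.

```python
import numpy as np

def sharpness_metrics(lower, upper, baseline_width=None):
    """Average interval width and its coefficient of variation.

    baseline_width is a hypothetical reference (e.g. the width of a naive
    interval); if given, a normalized sharpness score in [0, 1] is added.
    """
    widths = np.asarray(upper) - np.asarray(lower)
    avg_width = float(widths.mean())
    width_cv = float(widths.std() / avg_width) if avg_width > 0 else 0.0
    out = {"average_interval_width": avg_width, "width_variation": width_cv}
    if baseline_width:
        # Narrower-than-baseline intervals score closer to 1.0 (illustrative normalization).
        out["normalized_sharpness"] = float(np.clip(1.0 - avg_width / baseline_width, 0.0, 1.0))
    return out
```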

The composite scores combine these aspects to provide an overall assessment of uncertainty quality.
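
As an illustration of how such a composite might be formed, the sketch below combines the three aspects with simple, equal-weight and harmonic-mean rules. The actual weighting behind the Uncertainty Score and the Reliability-Sharpness Balance is not specified in the table, so these formulas are purely hypothetical.

```python
def composite_scores(coverage_consistency, ece, normalized_sharpness):
    """Hypothetical composites; the real weighting behind the table may differ."""
    reliability = max(0.0, 1.0 - ece)  # higher means better-calibrated
    # Equal-weight average of the three aspects as an overall quality score.
    uncertainty_score = (coverage_consistency + reliability + normalized_sharpness) / 3.0
    # A harmonic mean penalizes being reliable but not sharp, and vice versa.
    balance = 2.0 * reliability * normalized_sharpness / (reliability + normalized_sharpness + 1e-12)
    return {
        "uncertainty_score": uncertainty_score,
        "reliability_sharpness_balance": balance,
    }
```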