The table below summarizes the metrics used to evaluate coverage, calibration, sharpness, and overall uncertainty quantification quality; a minimal computation sketch follows the table.
Metric | Value | Acceptable Range | Status | Description |
---|---|---|---|---|
Coverage Metrics | | | | |
Average Coverage | - | Depends on target | - | Average empirical coverage across all alpha levels |
Average Coverage Gap | - | \|Gap\| < 0.05 | - | Average difference between expected and empirical coverage |
Coverage Consistency | - | ≥ 0.8 | - | Consistency of coverage performance across alpha levels |
Calibration Metrics | | | | |
Expected Calibration Error | - | < 0.05 | - | Weighted average of calibration errors across all bins |
Maximum Calibration Error | - | < 0.15 | - | Maximum calibration error observed in any bin |
Brier Score | - | < 0.1 | - | Mean squared error between predicted probabilities and outcomes |
Sharpness Metrics | | | | |
Average Interval Width | - | Domain dependent | - | Average width of prediction intervals (lower is sharper) |
Width Variation | - | < 0.5 | - | Coefficient of variation of interval widths |
Normalized Sharpness | - | ≥ 0.7 | - | Sharpness score normalized against a baseline |
Composite Scores | | | | |
Uncertainty Score | - | ≥ 0.8 | - | Overall score for uncertainty quantification quality |
Reliability-Sharpness Balance | - | ≥ 0.7 | - | Balance between reliable coverage and sharp intervals |
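
For concreteness, the sketch below shows one plausible way to compute several of the individual metrics in the table (empirical coverage and coverage gap, ECE/MCE, Brier score, and interval width statistics). The function names, the 10-bin ECE, and the use of NumPy are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def coverage_metrics(y_true, lower, upper, alpha):
    """Empirical coverage and coverage gap for one set of (1 - alpha) intervals."""
    covered = (y_true >= lower) & (y_true <= upper)
    empirical = covered.mean()
    return empirical, empirical - (1.0 - alpha)  # gap = empirical - expected coverage

def calibration_errors(probs, outcomes, n_bins=10):
    """ECE and MCE: bin predicted probabilities and compare mean confidence
    to the observed outcome frequency in each bin (n_bins is an assumption)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, mce = 0.0, 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        if hi == 1.0:
            mask = (probs >= lo) & (probs <= hi)   # include 1.0 in the last bin
        else:
            mask = (probs >= lo) & (probs < hi)
        if mask.any():
            gap = abs(outcomes[mask].mean() - probs[mask].mean())
            ece += mask.mean() * gap               # weight each bin by its share of samples
            mce = max(mce, gap)                    # track the worst single bin
    return ece, mce

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and binary outcomes."""
    return float(np.mean((probs - outcomes) ** 2))

def sharpness_metrics(lower, upper):
    """Average interval width and coefficient of variation of the widths."""
    widths = upper - lower
    return float(widths.mean()), float(widths.std() / widths.mean())
```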
Good uncertainty quantification requires balancing multiple objectives: reliable coverage (empirical coverage close to the nominal level), good calibration (predicted probabilities that match observed frequencies), and sharpness (narrow prediction intervals).
The composite scores combine these aspects to provide an overall assessment of uncertainty quality.
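
As an illustration of how such a composite score could be formed, the sketch below combines a coverage-reliability term, a calibration term, and the normalized sharpness into a single weighted score. The weights, the 0.05 normalization bounds, and the linear penalty form are assumptions made for this example, not a fixed definition.

```python
def uncertainty_score(avg_coverage_gap, ece, normalized_sharpness,
                      w_reliability=0.4, w_calibration=0.3, w_sharpness=0.3):
    """Illustrative composite score in [0, 1]; weights and penalties are assumptions."""
    reliability = max(0.0, 1.0 - abs(avg_coverage_gap) / 0.05)  # 1 at zero gap, 0 at the 0.05 bound
    calibration = max(0.0, 1.0 - ece / 0.05)                    # 1 at zero ECE, 0 at the 0.05 bound
    return (w_reliability * reliability
            + w_calibration * calibration
            + w_sharpness * normalized_sharpness)
```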