API Documentation#
scores.continuous#
This module contains methods which may be used for continuous scoring.
- scores.continuous.mae(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the mean absolute error from forecast and observed data.
A detailed explanation is on [Wikipedia](https://en.wikipedia.org/wiki/Mean_absolute_error)
Dimensional reduction is not supported for pandas and the user should convert their data to xarray to formulate the call to the metric. At most one of reduce_dims and preserve_dims may be specified. Specifying both will result in an exception.
- Parameters:
fcst (Union[xr.Dataset, xr.DataArray, pd.DataFrame, pd.Series]) – Forecast or predicted variables in xarray or pandas.
obs (Union[xr.Dataset, xr.DataArray, pd.DataFrame, pd.Series]) – Observed variables in xarray or pandas.
reduce_dims (Union[str, Iterable[str]]) – Optionally specify which dimensions to reduce when calculating MAE. All other dimensions will be preserved.
preserve_dims (Union[str, Iterable[str]]) – Optionally specify which dimensions to preserve when calculating MAE. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the absolute error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights – Not yet implemented. Allow weighted averaging (e.g. by area, by latitude, by population, custom).
- Returns:
By default, an xarray DataArray containing a single floating point number representing the mean absolute error for the supplied data, with all dimensions reduced.
Alternatively, an xarray structure with dimensions preserved as appropriate, containing the score along the reduced dimensions.
- Return type:
Union[xr.Dataset, xr.DataArray, pd.DataFrame, pd.Series]
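As a sketch of the dimension-handling semantics described above (the arrays and the dimension names "lead_time" and "station" here are hypothetical, and plain NumPy axes stand in for named xarray dimensions):

```python
import numpy as np

# Hypothetical 3 x 4 grid of forecasts and observations
# (think of axis 0 as "lead_time" and axis 1 as "station").
fcst = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 3.0, 4.0, 5.0],
                 [0.0, 1.0, 2.0, 3.0]])
obs = np.array([[1.5, 2.0, 2.0, 4.5],
                [2.0, 2.5, 4.0, 6.0],
                [1.0, 1.0, 2.5, 3.0]])

# Default behaviour: all dimensions reduced to a single value.
mae_all = np.abs(fcst - obs).mean()

# Analogue of preserve_dims=["station"]: reduce only the lead_time axis.
mae_per_station = np.abs(fcst - obs).mean(axis=0)

# Analogue of preserve_dims="all": the pointwise absolute error,
# same shape as the forecast.
abs_err = np.abs(fcst - obs)
```

With xarray inputs, scores.continuous.mae(fcst, obs, preserve_dims=["station"]) would return the second quantity with its dimension labels intact.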
- scores.continuous.mse(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the mean squared error from forecast and observed data.
Dimensional reduction is not supported for pandas and the user should convert their data to xarray to formulate the call to the metric. At most one of reduce_dims and preserve_dims may be specified. Specifying both will result in an exception.
- Parameters:
fcst (Union[xr.Dataset, xr.DataArray, pd.DataFrame, pd.Series]) – Forecast or predicted variables in xarray or pandas.
obs (Union[xr.Dataset, xr.DataArray, pd.DataFrame, pd.Series]) – Observed variables in xarray or pandas.
reduce_dims (Union[str, Iterable[str]]) – Optionally specify which dimensions to reduce when calculating MSE. All other dimensions will be preserved.
preserve_dims (Union[str, Iterable[str]]) – Optionally specify which dimensions to preserve when calculating MSE. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, the errors will be the squared error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights – Not yet implemented. Allow weighted averaging (e.g. by area, by latitude, by population, custom)
- Returns:
By default, an xarray DataArray containing a single floating point number representing the mean squared error for the supplied data, with all dimensions reduced.
Alternatively, an xarray structure with dimensions preserved as appropriate, containing the score along the reduced dimensions.
- Return type:
Union[xr.Dataset, xr.DataArray, pd.Dataframe, pd.Series]
- scores.continuous.rmse(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None)#
Calculate the Root Mean Squared Error from xarray or pandas objects.
A detailed explanation is on [Wikipedia](https://en.wikipedia.org/wiki/Root-mean-square_deviation)
Dimensional reduction is not supported for pandas and the user should convert their data to xarray to formulate the call to the metric. At most one of reduce_dims and preserve_dims may be specified. Specifying both will result in an exception.
- Parameters:
fcst (Union[xr.Dataset, xr.DataArray, pd.DataFrame, pd.Series]) – Forecast or predicted variables in xarray or pandas.
obs (Union[xr.Dataset, xr.DataArray, pd.DataFrame, pd.Series]) – Observed variables in xarray or pandas.
reduce_dims (Union[str, Iterable[str]]) – Optionally specify which dimensions to reduce when calculating RMSE. All other dimensions will be preserved.
preserve_dims (Union[str, Iterable[str]]) – Optionally specify which dimensions to preserve when calculating RMSE. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, the errors will be the absolute error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
- Returns:
By default, an xarray DataArray containing a single floating point number representing the root mean squared error for the supplied data, with all dimensions reduced.
Alternatively, an xarray structure with dimensions preserved as appropriate, containing the score along the reduced dimensions.
- Return type:
Union[xr.Dataset, xr.DataArray, pd.DataFrame, pd.Series]
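When all dimensions are reduced, RMSE is simply the square root of MSE. A minimal NumPy sketch (the data is hypothetical; the scores calls would accept the same arrays wrapped as xarray objects):

```python
import numpy as np

fcst = np.array([3.0, 1.0, 5.0, 2.0])
obs = np.array([2.0, 2.0, 5.0, 4.0])

sq_err = (fcst - obs) ** 2   # pointwise squared errors: [1, 1, 0, 4]
mse = sq_err.mean()          # what scores.continuous.mse reduces to
rmse = np.sqrt(mse)          # what scores.continuous.rmse reduces to
```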
scores.probability#
- scores.probability.crps_cdf(fcst: DataArray, obs: DataArray, threshold_dim: str = 'threshold', threshold_weight: DataArray | None = None, additional_thresholds: Iterable[float] | None = None, propagate_nans: bool = True, fcst_fill_method: Literal['linear', 'step', 'forward', 'backward'] = 'linear', threshold_weight_fill_method: Literal['linear', 'step', 'forward', 'backward'] = 'forward', integration_method: Literal['exact', 'trapz'] = 'exact', reduce_dims: Iterable[str] | None = None, preserve_dims: Iterable[str] | None = None, weights=None, include_components=False)#
Calculates CRPS CDF probabilistic metric.
Calculates the continuous ranked probability score (CRPS), or the mean CRPS over specified dimensions, given forecasts in the form of predictive cumulative distribution functions (CDFs). Can also calculate threshold-weighted versions of the CRPS by supplying a threshold_weight.
Predictive CDFs here are described by an indexed set of values rather than by closed formulae. As a result, the resolution (i.e. the number of points at which the CDF is realised) has an impact on the calculation of areas under and over the curve when obtaining the CRPS result. The term ‘threshold’ is used to describe the dimension which is used as an index for predictive CDF values. Various techniques are used to interpolate CDF values between indexed thresholds.
- Given
a predictive CDF fcst indexed at thresholds by variable x,
an observation in CDF form obs_cdf (i.e., obs_cdf(x) = 0 if x < obs and 1 if x >= obs),
a threshold_weight array indexed by variable x,
- The threshold-weighted CRPS is given by:
CRPS = integral(threshold_weight(x) * (fcst(x) - obs_cdf(x))**2), over all thresholds x.
The usual CRPS is the threshold-weighted CRPS with threshold_weight(x) = 1 for all x. This can be decomposed into an over-forecast penalty
integral(threshold_weight(x) * (fcst(x) - obs_cdf(x))**2), over all thresholds x where x < obs
- and an under-forecast penalty
integral(threshold_weight(x) * (fcst(x) - obs_cdf(x))**2), over all thresholds x where x >= obs.
Note that the function crps_cdf is designed so that the obs argument contains actual observed values. crps_cdf will convert obs into CDF form in order to calculate the CRPS.
- To calculate CRPS, integration is applied over the set of thresholds x taken from
fcst[threshold_dim].values,
obs.values,
threshold_weight[threshold_dim].values (if applicable),
additional_thresholds (if applicable),
- with NaN values excluded. There are two methods of integration:
“exact” gives the exact integral under the assumption that fcst is continuous and piecewise linear between its specified values, and that threshold_weight (if supplied) is piecewise constant and right-continuous between its specified values.
“trapz” simply uses a trapezoidal rule using the specified values, and so is an approximation of the CRPS. To get an accurate approximation, the density of threshold values can be increased by supplying additional_thresholds.
Both methods of calculating CRPS may require adding additional values to the threshold_dim dimension in fcst and (if supplied) threshold_weight. fcst_fill_method and threshold_weight_fill_method specify how fcst and threshold_weight are to be filled at these additional points.
The function crps_cdf calculates the CRPS using forecast CDF values ‘as is’. No checks are performed to ensure that CDF values in fcst are nondecreasing along threshold_dim. Checks are conducted on fcst and threshold_weight (if applicable) to ensure that coordinates are increasing along threshold_dim.
- Parameters:
fcst – array of forecast CDFs, with the threshold dimension given by threshold_dim.
obs – array of observations, not in CDF form.
threshold_dim – name of the dimension in fcst that indexes the thresholds.
threshold_weight – weight to be applied along threshold_dim to calculate threshold-weighted CRPS. Must contain threshold_dim as a dimension, and may also include other dimensions from fcst. If threshold_weight=None, a weight of 1 is applied everywhere, which gives the usual CRPS.
additional_thresholds – additional threshold values to add to fcst and (if applicable) threshold_weight prior to calculating CRPS.
propagate_nans – If True, propagate NaN values along threshold_dim in fcst and threshold_weight prior to calculating CRPS. This will result in CRPS being NaN for these cases. If False, NaN values in fcst and threshold_weight will be replaced, wherever possible, with non-NaN values using the fill methods specified by fcst_fill_method and threshold_weight_fill_method.
fcst_fill_method –
how to fill values in fcst when NaNs have been introduced (by including additional thresholds) or are specified to be removed (by setting propagate_nans=False). Select one of:
“linear”: use linear interpolation, then replace any leading or trailing NaNs using linear extrapolation. Afterwards, all values are clipped to the closed interval [0, 1].
“step”: apply forward filling, then replace any leading NaNs with 0.
“forward”: first apply forward filling, then remove any leading NaNs by back filling.
“backward”: first apply back filling, then remove any trailing NaNs by forward filling.
In most cases, “linear” is likely the appropriate choice.
threshold_weight_fill_method – how to fill values in threshold_weight when NaNs have been introduced (by including additional thresholds) or are specified to be removed (by setting propagate_nans=False). Select one of “linear”, “step”, “forward” or “backward”. If the weight function is continuous, “linear” is probably the best choice. If it is an increasing step function, “forward” may be best.
integration_method (str) – one of “exact” or “trapz”.
preserve_dims (Iterable[str]) – dimensions to preserve in the output. All other dimensions are collapsed by taking the mean.
reduce_dims (Iterable[str]) – dimensions to reduce in the output by taking the mean. All other dimensions are preserved.
weights – Not yet implemented. Allow weighted averaging (e.g. by area, by latitude, by population, custom)
include_components (bool) – if True, include the under and over forecast components of the score in the returned dataset.
- Returns:
- The returned Dataset has the following variables:
“total”: the total CRPS.
“underforecast_penalty”: the under-forecast penalty contribution of the CRPS.
“overforecast_penalty”: the over-forecast penalty contribution of the CRPS.
- Return type:
xr.Dataset
- Raises:
ValueError – if threshold_dim is not a dimension of fcst.
ValueError – if threshold_dim is not a dimension of threshold_weight when threshold_weight is not None.
ValueError – if threshold_dim is a dimension of obs.
ValueError – if dimensions of obs are not also dimensions of fcst.
ValueError – if dimensions of threshold_weight are not also dimensions of fcst when threshold_weight is not None.
ValueError – if dims is not a subset of dimensions of fcst.
ValueError – if fcst_fill_method is not one of ‘linear’, ‘step’, ‘forward’ or ‘backward’.
ValueError – if threshold_weight_fill_method is not one of ‘linear’, ‘step’, ‘forward’ or ‘backward’.
ValueError – if fcst[threshold_dim] has fewer than 2 values.
ValueError – if coordinates in fcst[threshold_dim] are not increasing.
ValueError – if threshold_weight is not None and coordinates in threshold_weight[threshold_dim] are not increasing.
ValueError – if threshold_weight has negative values.
See also
scores.probability.crps_cdf_brier_decomposition
References
- Matheson, J. E., and R. L. Winkler, 1976: Scoring rules for continuous probability distributions.
Manage. Sci.,22, 1087–1095.
- Gneiting, T., & Ranjan, R. (2011). Comparing Density Forecasts Using Threshold- and
Quantile-Weighted Scoring Rules. Journal of Business & Economic Statistics, 29(3), 411–422. http://www.jstor.org/stable/23243806
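The integral definitions above can be made concrete with a small NumPy-only sketch of the “trapz” integration option. Everything here is hypothetical: a uniform predictive CDF on [0, 10], the observation converted to a step-function CDF exactly as described for obs_cdf, and a plain trapezoidal rule over the threshold grid (the real function handles named dimensions, fill methods and NaN propagation).

```python
import numpy as np

def crps_trapz(thresholds, fcst_cdf, obs, threshold_weight=None):
    """Approximate CRPS = integral(w(x) * (fcst(x) - obs_cdf(x))**2) dx
    with a trapezoidal rule (a sketch of integration_method="trapz")."""
    obs_cdf = (thresholds >= obs).astype(float)  # obs converted to CDF form
    w = np.ones_like(thresholds) if threshold_weight is None else threshold_weight
    integrand = w * (fcst_cdf - obs_cdf) ** 2
    # trapezoidal rule over the threshold grid
    return float(np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(thresholds)))

x = np.linspace(0.0, 10.0, 101)           # threshold_dim values
fcst_cdf = x / 10.0                       # uniform forecast CDF on [0, 10]
score = crps_trapz(x, fcst_cdf, obs=5.0)  # exact answer for this case is 5/6
```

Densifying the threshold grid (cf. additional_thresholds) shrinks the gap between this approximation and the exact integral.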
- scores.probability.adjust_fcst_for_crps(fcst: DataArray, threshold_dim: str, obs: DataArray, decreasing_tolerance: float = 0, additional_thresholds: Iterable[float] | None = None, fcst_fill_method: Literal['linear', 'step', 'forward', 'backward'] = 'linear', integration_method: Literal['exact', 'trapz'] = 'exact') DataArray #
This function takes an array fcst of forecast cumulative distribution functions (CDFs). If fcst is not decreasing outside of the specified tolerance, it is returned unchanged. Otherwise, the CDF envelope for fcst is computed, and the CDF from among
fcst,
the upper envelope, and
the lower envelope
that has the highest (i.e. worst) CRPS is returned. In the event of a tie, preference is given in the order fcst, then the upper envelope. See scores.probability.functions.cdf_envelope for details about the CDF envelope.
The use case for this is when, either due to rounding or poor forecast process, the forecast CDF fcst fails to be nondecreasing. Rather than simply replacing fcst with NaN, adjust_fcst_for_crps returns a CDF for which CRPS can be calculated, but possibly with a predictive performance cost as measured by CRPS.
Whether a CDF is decreasing outside specified tolerance is determined as follows. For each CDF in fcst, the sum of incremental decreases along the threshold dimension is calculated. For example, if the CDF values are
[0, 0.4, 0.3, 0.9, 0.88, 1]
then the sum of incremental decreases is -0.12. This CDF decreases outside specified tolerance if 0.12 > decreasing_tolerance.
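The tolerance check described above can be sketched in a few lines of NumPy (the tolerance value here is hypothetical):

```python
import numpy as np

cdf = np.array([0.0, 0.4, 0.3, 0.9, 0.88, 1.0])

steps = np.diff(cdf)                       # increments along threshold_dim
sum_of_decreases = steps[steps < 0].sum()  # approximately -0.12

# The CDF decreases outside tolerance if the total decrease
# exceeds decreasing_tolerance.
decreasing_tolerance = 0.1
outside_tolerance = -sum_of_decreases > decreasing_tolerance
```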
- The adjusted array of forecast CDFs is determined as follows:
- any NaN values in fcst are propagated along threshold_dim so that in each case the entire CDF is NaN;
- any CDFs in fcst that are decreasing within specified tolerance are unchanged;
- any CDFs in fcst that are decreasing outside specified tolerance are replaced with whichever of the upper or lower CDF envelope gives the highest CRPS, unless the original values give a higher CRPS, in which case the original values are kept.
See scores.probability.functions.cdf_envelope for a description of the ‘CDF envelope’. If propagating NaNs is not desired, the user may first fill NaNs in fcst using scores.probability.functions.fill_cdf. The CRPS for each forecast case is calculated using crps, with a weight of 1.
- Parameters:
fcst – DataArray of CDF values with threshold dimension threshold_dim.
threshold_dim – name of the threshold dimension in fcst.
obs – DataArray of observations.
decreasing_tolerance – nonnegative tolerance value.
additional_thresholds – optional additional thresholds passed on to crps when calculating CRPS.
fcst_fill_method – fcst fill method passed on to crps when calculating CRPS.
integration_method – integration method passed on to crps when calculating CRPS.
- Returns:
An xarray DataArray of possibly adjusted forecast CDFs, where adjustments are made to penalise CDFs that decrease outside tolerance.
- Raises:
ValueError – If threshold_dim is not a dimension of fcst.
ValueError – If decreasing_tolerance is negative.
- scores.probability.crps_cdf_brier_decomposition(fcst: DataArray, obs: DataArray, threshold_dim: str = 'threshold', additional_thresholds: Iterable[float] | None = None, fcst_fill_method: Literal['linear', 'step', 'forward', 'backward'] = 'linear', reduce_dims: Iterable[str] | None = None, preserve_dims: Iterable[str] | None = None) Dataset #
Given an array fcst of predictive CDF values indexed along threshold_dim, and an array obs of observations, calculates the mean Brier score for each index along threshold_dim. Since the mean CRPS is the integral of the mean Brier score over all thresholds, this gives a threshold decomposition of the mean CRPS.
If there are any NaNs along the threshold dimension of fcst, then NaNs are propagated along this dimension prior to calculating the decomposition. If propagating NaNs is not desired, the user may first fill NaNs in fcst using scores.probability.functions.fill_cdf.
- Parameters:
fcst (xr.DataArray) – DataArray of CDF values with threshold dimension threshold_dim.
obs (xr.DataArray) – DataArray of observations, not in CDF form.
threshold_dim (str) – name of the threshold dimension in fcst.
additional_thresholds (Optional[Iterable[float]]) – additional thresholds at which to calculate the mean Brier score.
fcst_fill_method (Literal["linear", "step", "forward", "backward"]) –
How to fill NaN values in fcst that arise from new user-supplied thresholds or thresholds derived from observations. Select one of:
“linear”: use linear interpolation, and if needed also extrapolate linearly. Clip to 0 and 1. Needs at least two non-NaN values for interpolation, so returns NaNs where this condition fails.
“step”: use forward filling then set remaining leading NaNs to 0. Produces a step function CDF (i.e. piecewise constant).
“forward”: use forward filling then fill any remaining leading NaNs with backward filling.
“backward”: use backward filling then fill any remaining trailing NaNs with forward filling.
preserve_dims – dimensions to preserve in the output. The dimension threshold_dim is always preserved, even if not specified here.
- Returns:
“total_penalty”: the mean Brier score for each threshold.
“underforecast_penalty”: the mean of the underforecast penalties for the Brier score. For a particular forecast case, this component equals 0 if the event didn’t occur and the Brier score if it did.
“overforecast_penalty”: the mean of the overforecast penalties for the Brier score. For a particular forecast case, this component equals 0 if the event did occur and the Brier score if it did not.
- Return type:
An xarray Dataset with data_vars
- Raises:
ValueError – if threshold_dim is not a dimension of fcst.
ValueError – if threshold_dim is a dimension of obs.
ValueError – if dimensions of obs are not also among the dimensions of fcst.
ValueError – if dimensions in dims are not among the dimensions of fcst.
ValueError – if fcst_fill_method is not one of ‘linear’, ‘step’, ‘forward’ or ‘backward’.
ValueError – if coordinates in fcst[threshold_dim] are not increasing.
scores.categorical#
- scores.categorical.firm(fcst: DataArray, obs: DataArray, risk_parameter: float, categorical_thresholds: Sequence[float], threshold_weights: Sequence[float | DataArray], discount_distance: float | None = 0, reduce_dims: Sequence[str] | None = None, preserve_dims: Sequence[str] | None = None, weights: DataArray | None = None) Dataset #
Calculates the FIxed Risk Multicategorical (FIRM) score including the underforecast and overforecast penalties.
categorical_thresholds and threshold_weights must be the same length.
- Parameters:
fcst – An array of real-valued forecasts that we want to treat categorically.
obs – An array of real-valued observations that we want to treat categorically.
risk_parameter – Risk parameter (alpha) for the FIRM score. The value must satisfy 0 < risk_parameter < 1.
categorical_thresholds – Category thresholds (thetas) to delineate the categories.
threshold_weights – Weights that specify the relative importance of forecasting on the correct side of each category threshold. Either a positive float can be supplied for each categorical threshold or an xr.DataArray (with no negative values) can be provided for each categorical threshold as long as its dims are a subset of obs dims. NaN values are allowed in the xr.DataArray. For each NaN value at a given coordinate, the FIRM score will be NaN at that coordinate, before dims are collapsed.
discount_distance – An optional discounting distance parameter which satisfies discount_distance >= 0 such that the cost of misses and false alarms are discounted whenever the observation is within distance discount_distance of the forecast category. A value of 0 will not apply any discounting.
reduce_dims – Optionally specify which dimensions to reduce when calculating the FIRM score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
preserve_dims – Optionally specify which dimensions to preserve when calculating FIRM. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the FIRM score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
weights – Optionally provide an array for weighted averaging (e.g. by area, by latitude, by population, custom)
- Returns:
firm_score: A score for a single category for each coord based on the FIRM framework.
overforecast_penalty: Penalty for False Alarms.
underforecast_penalty: Penalty for Misses.
- Return type:
An xarray Dataset with data vars
- Raises:
ValueError – if len(categorical_thresholds) < 1.
ValueError – if categorical_thresholds and threshold_weights lengths are not equal.
ValueError – if risk_parameter <= 0 or >= 1.
ValueError – if any values in threshold_weights are <= 0.
ValueError – if discount_distance is not None and < 0.
scores.utils.DimensionError – if threshold_weights is a list of xr.DataArrays and the dimensions of these xr.DataArrays are not a subset of the obs dims.
Note
Setting discount_distance to None or 0 will mean that no discounting is applied. This means that errors will be penalised strictly categorically.
Setting discount_distance to np.inf means that the cost of a miss is always proportional to the distance of the observation from the threshold, and similarly for false alarms.
References
Taggart, R., Loveday, N. and Griffiths, D., 2022. A scoring framework for tiered warnings and multicategorical forecasts based on fixed risk measures. Quarterly Journal of the Royal Meteorological Society, 148(744), pp.1389-1406.
scores.stats#
- scores.stats.tests.diebold_mariano(da_timeseries: DataArray, ts_dim: str, h_coord: str, method: Literal['HG', 'HLN'] = 'HG', confidence_level: float = 0.95, statistic_distribution: Literal['normal', 't'] = 'normal') Dataset #
Given an array of (multiple) timeseries, with each timeseries consisting of score differences for h-step ahead forecasts, calculates a modified Diebold-Mariano test statistic for each timeseries. Several other statistics are also returned such as the confidence that the population mean of score differences is greater than zero and confidence intervals for that mean.
Two methods for calculating the test statistic have been implemented: the “HG” method Hering and Genton (2011) and the “HLN” method of Harvey, Leybourne and Newbold (1997). The default “HG” method has an advantage of only generating positive estimates for the spectral density contribution to the test statistic. For further details see scores.stats.confidence_intervals.impl._dm_test_statistic.
Prior to any calculations, NaNs are removed from each timeseries. If there are NaNs in da_timeseries then a warning will occur. This is because NaNs may impact the autocovariance calculation.
To determine the value of h for each timeseries of score differences of h-step ahead forecasts, one may ask ‘How many observations of the phenomenon will be made between making the forecast and having the observation that will validate the forecast?’ For example, suppose that the phenomenon is afternoon precipitation accumulation in New Zealand (00z to 06z each day). Then a Day+1 forecast issued at 03z on Day+0 will be a 2-ahead forecast, since Day+0 and Day+1 accumulations will be observed before the forecast can be validated. On the other hand, a Day+1 forecast issued at 09z on Day+0 will be a 1-step ahead forecast. The value of h for each timeseries in the array needs to be specified in one of the sets of coordinates. See the example below.
Confidence intervals and “confidence_gt_0” statistics are calculated using the test statistic, which is assumed to have either the standard normal distribution or Student’s t distribution with n - 1 degrees of freedom (where n is the length of the timeseries). The distribution used is specified by statistic_distribution. See Harvey, Leybourne and Newbold (1997) for why the t distribution may be preferred, especially for shorter timeseries.
- Parameters:
da_timeseries – a 2 dimensional array containing the timeseries.
ts_dim – name of the dimension which identifies each timeseries in the array.
h_coord – name of the coordinate specifying, for each timeseries, that the timeseries is an h-step ahead forecast. h_coord coordinates must be indexed by the dimension ts_dim.
method – method for calculating the test statistic, one of “HG” or “HLN”.
confidence_level – the confidence level, between 0 and 1 exclusive, at which to calculate confidence intervals.
statistic_distribution – the distribution of the test-statistic under the null hypothesis of equipredictive skill. Used to calculate the “confidence_gt_0” statistic and confidence intervals. One of “normal” or “t” (for Student’s t distribution).
- Returns:
“mean”: the mean value for each timeseries, ignoring NaNs.
“dm_test_stat”: the modified Diebold-Mariano test statistic for each timeseries.
“timeseries_len”: the length of each timeseries, with NaNs removed.
“confidence_gt_0”: the confidence that the mean value of the population is greater than zero, based on the specified statistic_distribution. Precisely, it is the value of the cumulative distribution function evaluated at dm_test_stat.
“ci_upper”: the upper end point of a confidence interval about the mean at the specified confidence_level.
“ci_lower”: the lower end point of a confidence interval about the mean at the specified confidence_level.
- Return type:
Dataset, indexed by ts_dim, with six variables
- Raises:
ValueError – if method is not one of “HG” or “HLN”.
ValueError – if statistic_distribution is not one of “normal” or “t”.
ValueError – if 0 < confidence_level < 1 fails.
ValueError – if len(da_timeseries.dims) != 2.
ValueError – if ts_dim is not a dimension of da_timeseries.
ValueError – if h_coord is not a coordinate of da_timeseries.
ValueError – if ts_dim is not the only dimension of da_timeseries[h_coord].
ValueError – if h_coord values aren’t positive integers.
ValueError – if h_coord values aren’t less than the lengths of the timeseries after NaNs are removed.
RuntimeWarning – if there is a NaN in any timeseries of score differences.
References
Diebold and Mariano, ‘Comparing predictive accuracy’, Journal of Business and Economic Statistics 13 (1995), 253-265.
Hering and Genton, ‘Comparing spatial predictions’, Technometrics 53 no. 4 (2011), 414-425.
Harvey, Leybourne and Newbold, ‘Testing the equality of prediction mean squared errors’, International Journal of Forecasting 13 (1997), 281-291.
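As described above, “confidence_gt_0” is the null distribution’s cumulative distribution function evaluated at the test statistic. For statistic_distribution="normal" that last step can be sketched with the standard library alone (the test-statistic value below is hypothetical; computing the statistic itself requires the autocovariance machinery referenced earlier):

```python
import math

def confidence_gt_0(dm_test_stat: float) -> float:
    """Standard normal CDF evaluated at the Diebold-Mariano test statistic."""
    return 0.5 * (1.0 + math.erf(dm_test_stat / math.sqrt(2.0)))

conf = confidence_gt_0(1.96)  # roughly 0.975
```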
Example
This array gives three timeseries of score differences. Coordinates in the “lead_day” dimension uniquely identify each timeseries. Here ts_dim=”lead_day”. Coordinates in the “valid_date” dimension give the forecast validity timestamp of each item in the timeseries. The “h” coordinates specify that the timeseries are for 2, 3 and 4-step ahead forecasts respectively. Here h_coord=”h”.
>>> da_timeseries = xr.DataArray(
...     data=[[1, 2, 3.0, 4, np.nan], [2.0, 1, -3, -1, 0], [1.0, 1, 1, 1, 1]],
...     dims=["lead_day", "valid_date"],
...     coords={
...         "lead_day": [1, 2, 3],
...         "valid_date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
...         "h": ("lead_day", [2, 3, 4]),
...     },
... )
>>> diebold_mariano(da_timeseries, "lead_day", "h")