Interval scores

Choosing an interval score amounts to choosing which distributional feature(s) of the data, such as the mean or the variance, to detect changes in.
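To make this concrete, here is a small NumPy sketch of a series whose variance changes while its mean stays fixed. A score that only compares means sees no change, but a score that compares variances does. Both helper functions are hypothetical illustrations, not Skchange code.

```python
import numpy as np

# Hypothetical helper: absolute difference in segment means.
def mean_diff_score(left, right):
    return abs(left.mean() - right.mean())

# Hypothetical helper: absolute log-ratio of segment variances.
def log_var_ratio_score(left, right):
    return abs(np.log(left.var() / right.var()))

# Deterministic series: mean 0 throughout, variance 1 in the first half and 9 in the second.
left = np.array([-1.0, 1.0] * 50)
right = np.array([-3.0, 3.0] * 50)

print(mean_diff_score(left, right))      # 0.0: the mean did not change
print(log_var_ratio_score(left, right))  # log(9), about 2.197: the variance did
```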

Interval scores are not primarily meant to be used directly, but they are important building blocks, and understanding them is key to making full use of the library.

The most basic type of interval score in Skchange is the cost. A cost measures the loss, or error, of a model fitted to a data interval X[s:e].

[1]:
import numpy as np

from skchange.costs import GaussianCost

X = np.random.rand(100)

cost = GaussianCost()  # Cost for a Gaussian model with constant mean and variance.
cost.fit(X)  # Set up the cost for the given data.
cost.evaluate([0, 10])  # Evaluate the cost for the given interval, X[0:10].
[1]:
array([[-5.63978161]])
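For intuition about what such a cost computes, the sketch below evaluates a Gaussian cost by hand as twice the negative log-likelihood of an interval under its maximum-likelihood mean and variance. This general form is an assumption: the exact constants and normalization used by GaussianCost may differ, so the numbers need not match the output above. The helper gaussian_cost_sketch is hypothetical.

```python
import numpy as np

def gaussian_cost_sketch(x):
    """Twice the Gaussian negative log-likelihood at the MLE mean and variance.

    Hypothetical helper for illustration; not the skchange implementation.
    """
    n = x.size
    var = x.var()  # MLE variance (ddof=0), computed around the MLE mean
    # With the MLE plugged in, the squared-residual term sum((x - mean)^2) / var equals n.
    return n * np.log(2 * np.pi * var) + n

sample = np.random.default_rng(0).random(100)
print(gaussian_cost_sketch(sample[0:10]))  # cost of the interval sample[0:10]
```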

Another type of interval score is the change score. A change score measures the degree of change between two adjacent intervals, X[s:k] and X[k:e]. It can be a statistical test, a time series distance, or any other measure of difference.

[2]:
from skchange.change_scores import CUSUM

score = CUSUM()  # CUSUM score for a change in mean.
score.fit(X)  # Set up the score for the given data.
score.evaluate([0, 5, 10])  # Evaluate the change score between X[0:5] and X[5:10].
[2]:
array([[0.30029151]])
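As a rough illustration of what a mean-change score like CUSUM measures, here is the classical single-split CUSUM statistic in NumPy: the absolute difference between the means on each side of the split, weighted by sqrt(n1 * n2 / n). This textbook weighting is an assumption here; Skchange's CUSUM may scale differently, so the values need not match the output above. The helper cusum_sketch is hypothetical.

```python
import numpy as np

def cusum_sketch(x, start, split, end):
    """Classical CUSUM statistic for a change in mean at `split` within x[start:end].

    Hypothetical helper for illustration; not the skchange implementation.
    """
    left, right = x[start:split], x[split:end]
    n1, n2 = left.size, right.size
    weight = np.sqrt(n1 * n2 / (n1 + n2))  # accounts for the segment lengths
    return weight * abs(left.mean() - right.mean())

x = np.concatenate([np.zeros(5), np.ones(5)])
print(cusum_sketch(x, 0, 5, 10))  # sqrt(25 / 10) * 1, about 1.581
```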

Several interval scores can also be evaluated in a single call by passing one [start, split, end] row per interval:

[3]:
score.evaluate([[0, 5, 10], [10, 12, 30], [60, 69, 71]])
[3]:
array([[0.30029151],
       [0.42547556],
       [0.30441832]])

The computational bottleneck in change detection algorithms is evaluating an interval score over a large number of intervals and candidate splits. Skchange addresses this as follows:

  • In fit, relevant quantities are precomputed to speed up subsequent evaluations.

  • In evaluate, numba is used to efficiently evaluate many interval-split pairs in a single call.
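The sketch below illustrates this precompute-then-evaluate pattern with a hypothetical squared-error cost, using plain NumPy instead of numba. Here fit stores cumulative sums of x and x², after which the cost of any interval X[s:e] takes constant time, and evaluate handles a 2D array of intervals (one [start, end] row each) in a single vectorized call. The class only mimics the shape of a fit/evaluate interface; it is not Skchange's implementation.

```python
import numpy as np

class SquaredErrorCostSketch:
    """Hypothetical cost: sum of squared deviations from the interval mean."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        # Prefix sums with a leading zero, so sums_[e] - sums_[s] == x[s:e].sum().
        self.sums_ = np.concatenate([[0.0], np.cumsum(x)])
        self.sums_sq_ = np.concatenate([[0.0], np.cumsum(x**2)])
        return self

    def evaluate(self, cuts):
        cuts = np.atleast_2d(cuts)  # one [start, end] row per interval
        starts, ends = cuts[:, 0], cuts[:, 1]
        n = ends - starts
        s1 = self.sums_[ends] - self.sums_[starts]
        s2 = self.sums_sq_[ends] - self.sums_sq_[starts]
        # Sum of squared deviations, sum(x^2) - sum(x)^2 / n, in O(1) per interval.
        return (s2 - s1**2 / n).reshape(-1, 1)

x = np.array([1.0, 3.0, 2.0, 8.0, 10.0, 12.0])
sketch_cost = SquaredErrorCostSketch().fit(x)
print(sketch_cost.evaluate([[0, 3], [3, 6], [0, 6]]))  # column of costs: 2.0, 8.0, 106.0
```

The fit cost is a single pass over the data, and every later interval costs a handful of array lookups regardless of its length.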

Moreover, costs can always be used to construct a change score by the following formula:

score.evaluate([start, split, end]) = cost.evaluate([start, end]) - (cost.evaluate([start, split]) + cost.evaluate([split, end]))

You can read this formula as “score = cost of the interval without a change point minus cost of the interval with a single change point”.
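The conversion can be checked numerically. The sketch below builds a change score from a squared-error cost exactly as in the formula above, using plain NumPy and hypothetical helpers rather than Skchange classes. For this particular cost the result reduces to n1 * n2 / n * (mean_left - mean_right)², the square of the classical CUSUM statistic, which shows how a familiar mean-change score arises from a cost.

```python
import numpy as np

def squared_error_cost(x):
    """Hypothetical cost: sum of squared deviations from the segment mean."""
    return float(np.sum((x - x.mean()) ** 2))

def change_score_from_cost(cost, x, start, split, end):
    """Cost without a change point minus cost with a single change point at `split`."""
    no_change = cost(x[start:end])
    one_change = cost(x[start:split]) + cost(x[split:end])
    return no_change - one_change

x = np.array([1.0, 3.0, 2.0, 8.0, 10.0, 12.0])
sketch_score = change_score_from_cost(squared_error_cost, x, 0, 3, 6)

# Closed form for this cost: n1 * n2 / n * (difference in segment means)^2.
n1, n2 = 3, 3
closed_form = n1 * n2 / (n1 + n2) * (x[:3].mean() - x[3:].mean()) ** 2
print(sketch_score, closed_form)  # both 96.0
```

Because fitting separate means to each side can never increase the squared error, this particular score is always non-negative.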

This means that you can always pass a cost to a change detector, even to detectors that expect change scores, because the cost is converted to a change score internally.

At the same time, Skchange also supports change scores that cannot be reduced to costs, unlike, for example, the ruptures library. Several important scores cannot be reduced to costs, such as the Mann-Whitney U test, the Kolmogorov-Smirnov test, and scores for sparse change detection.
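To see why such scores do not fit the cost-difference pattern, consider the two-sample Kolmogorov-Smirnov statistic sketched below in NumPy. It compares the empirical distribution functions of X[s:k] and X[k:e] directly and takes the largest gap between them, so it depends jointly on both segments rather than on a sum of separately computed per-segment costs. The helper ks_change_score is hypothetical and is not Skchange's implementation of its Kolmogorov-Smirnov score.

```python
import numpy as np

def ks_change_score(x, start, split, end):
    """Two-sample Kolmogorov-Smirnov statistic between x[start:split] and x[split:end].

    Hypothetical helper for illustration; not the skchange implementation.
    """
    left = np.sort(x[start:split])
    right = np.sort(x[split:end])
    pooled = np.concatenate([left, right])
    # Empirical CDFs of each segment, evaluated at every pooled observation.
    cdf_left = np.searchsorted(left, pooled, side="right") / left.size
    cdf_right = np.searchsorted(right, pooled, side="right") / right.size
    # The largest ECDF gap is always attained at one of the observations.
    return float(np.max(np.abs(cdf_left - cdf_right)))

x = np.concatenate([np.zeros(5), np.ones(5)])
print(ks_change_score(x, 0, 5, 10))  # 1.0: the two segments do not overlap at all
```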