Getting started#

This section helps you get started with Skchange. It briefly covers the fundamental concepts of the library.

Installation#

pip install skchange

To make full use of the library, install the optional Numba dependency. It speeds up the algorithms in Skchange, often by a factor of 10 to 100.

pip install skchange[numba]
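To check whether the optional dependency is present in the current environment, a small standard-library sketch like the following can help. The helper name `has_numba` is made up for this illustration and is not part of Skchange.

```python
from importlib.util import find_spec


def has_numba() -> bool:
    """Return True if the numba package can be imported in this environment."""
    return find_spec("numba") is not None


print("Numba available:", has_numba())
```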

Change detection#

The task#

Change detection is the task of identifying abrupt changes in the distribution of a time series. The goal is to estimate the time points at which the distribution changes. These points are called change points (or change-points or changepoints).

Here is an example of two changes in the mean of a Gaussian time series with unit variance.

[Figure: a Gaussian time series with unit variance and two changes in the mean.]
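A series of this kind can be simulated with plain NumPy. The sketch below is only an illustration: the segment lengths, the means, and the seed are assumed values, not the ones behind the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: three segments of 100 points each.
segment_lengths = [100, 100, 100]
segment_means = [0.0, 5.0, 2.0]

# Draw unit-variance Gaussian noise around each segment's mean.
series = np.concatenate(
    [rng.normal(loc=m, scale=1.0, size=n) for m, n in zip(segment_means, segment_lengths)]
)

# The change points are the first indices of the second and third segments.
change_points = np.cumsum(segment_lengths)[:-1]
print(change_points)  # [100 200]
```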

Changes may occur in much more complex ways. For example, changes can affect:

  • Variance.

  • Shape of the distribution.

  • Auto-correlation.

  • Relationships between variables in multivariate time series.

  • An unknown, small portion of variables in a high-dimensional time series.

Skchange supports detecting changes in all of these scenarios, amongst others.
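As a concrete illustration of a change that leaves the mean untouched, the sketch below simulates a Gaussian series whose standard deviation jumps at one point. The change location, the two standard deviations, and the seed are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Zero mean throughout; the standard deviation jumps from 1 to 3 at index 150.
change_point = 150
series = np.concatenate(
    [
        rng.normal(0.0, 1.0, size=change_point),
        rng.normal(0.0, 3.0, size=150),
    ]
)

std_before = series[:change_point].std()
std_after = series[change_point:].std()
print(f"std before: {std_before:.2f}, std after: {std_after:.2f}")
```

A score that only compares means would see little difference between the two halves, which is why the choice of score matters.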

Composable change detectors#

Skchange follows a familiar scikit-learn-type API and is compatible with Sktime.

Here’s an example of a change detector:

[1]:
from skchange.change_detectors import MovingWindow
from skchange.change_scores import CUSUM

detector = MovingWindow(
    change_score=CUSUM(),
    penalty=10,
)
detector
[1]:
MovingWindow(change_score=CUSUM(), penalty=10)

Let us look at each part of the detector in more detail:

  1. change_score: Represents the choice of feature to detect changes in. CUSUM is a popular choice for detecting changes in the mean of a time series.

  2. penalty: Used to control the complexity of the change point model. The higher the penalty, the fewer change points will be detected.

  3. detector: The search algorithm for detecting change points. It governs which data intervals the change score is evaluated on and how the results are compiled into a final set of detected change points.
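To make the roles of the score and the penalty concrete, here is a minimal sketch of a textbook CUSUM statistic for a change in mean, compared against a penalty. The function `cusum_score` is written for this illustration only; Skchange's own CUSUM may scale the score differently, and the way its detectors apply the penalty is not shown here.

```python
import numpy as np


def cusum_score(x: np.ndarray, split: int) -> float:
    """Textbook CUSUM statistic for a mean change at `split` in a 1-D array."""
    n = len(x)
    left, right = x[:split], x[split:]
    # Scale the difference in means by the effective sample size.
    weight = np.sqrt(len(left) * len(right) / n)
    return float(weight * abs(left.mean() - right.mean()))


rng = np.random.default_rng(0)
no_change = rng.normal(0.0, 1.0, size=40)
with_change = np.concatenate([rng.normal(0.0, 1.0, 20), rng.normal(5.0, 1.0, 20)])

penalty = 10.0  # assumed value; here a change is declared only if score > penalty
for name, data in [("no change", no_change), ("with change", with_change)]:
    score = cusum_score(data, split=20)
    print(f"{name}: score={score:.2f}, detected={score > penalty}")
```

Raising the penalty makes it harder for a score to exceed it, which is the sense in which a higher penalty yields fewer detected change points.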

In Skchange, all detectors follow the same pattern. They are composed of some kind of score to be evaluated on data intervals, and a penalty. You can read more about the core components of Skchange in the Concepts section.

fit#

After initialising your detector of choice, you need to fit it to training data before you can use it to detect change points.

Here is some 3-dimensional Gaussian toy data consisting of four segments with different mean vectors.

[2]:
import numpy as np

from skchange.datasets import generate_changing_data

n = 300
cpts = [100, 140, 220]
means = [
    np.array([0.0, 0.0, 0.0]),
    np.array([8.0, 0.0, 0.0]),
    np.array([0.0, 0.0, 0.0]),
    np.array([2.0, 3.0, 5.0]),
]
x = generate_changing_data(n, changepoints=cpts, means=means, random_state=8)
x.columns = ["var0", "var1", "var2"]
x.index.name = "time"
x
[2]:
var0 var1 var2
time
0 0.091205 1.091283 -1.946970
1 -1.386350 -2.296492 2.409834
2 1.727836 2.204556 0.794828
3 0.976421 -1.183427 1.916364
4 -1.123327 -0.664035 -0.378359
... ... ... ...
295 0.325434 2.015049 4.939516
296 3.485036 3.118221 6.393023
297 2.517864 3.445919 3.264219
298 2.290727 2.758822 4.492490
299 1.230467 1.715009 4.918493

300 rows × 3 columns

Here is what the data looks like:

[3]:
import plotly.express as px

px.line(x)

As in scikit-learn, the role of fit is to estimate certain parameters of the detector before it can be used for detection tasks on test data. In Skchange, all currently supported detectors have empty fit methods, but this may change in the future.

[4]:
detector.fit(x)
[4]:
MovingWindow(change_score=CUSUM(), penalty=10)
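The fit-then-predict convention borrowed from scikit-learn can be sketched with a toy class. `ToyThresholdDetector` below is entirely hypothetical and is not a Skchange detector. It only shows the pattern: `fit` estimates parameters from training data, stores them with a trailing underscore, and returns `self`.

```python
import numpy as np


class ToyThresholdDetector:
    """Hypothetical detector, used only to illustrate the fit/predict pattern."""

    def __init__(self, n_std: float = 3.0):
        self.n_std = n_std

    def fit(self, x: np.ndarray) -> "ToyThresholdDetector":
        # Estimate parameters from training data and store them with a
        # trailing underscore, following the scikit-learn convention.
        self.mean_ = float(np.mean(x))
        self.std_ = float(np.std(x))
        return self  # returning self allows detector.fit(x).predict(x)

    def predict(self, x: np.ndarray) -> np.ndarray:
        # Flag positions that deviate strongly from the fitted mean.
        return np.flatnonzero(np.abs(x - self.mean_) > self.n_std * self.std_)


train = np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.0])
test = np.array([0.0, 0.1, 5.0, -0.2])
print(ToyThresholdDetector().fit(train).predict(test))  # [2]
```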

predict#

After fitting the detector, you can use it to detect change points. The predict method returns the integer locations of detected change points.

[5]:
detections = detector.predict(x)
detections
[5]:
ilocs
0 100
1 140
2 220

Note that change points indicate the start of a new segment.
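Because each change point marks the first index of a new segment, the detected locations translate directly into half-open index ranges. The helper `segments_from_change_points` below is a small illustration written for this guide, not a Skchange function.

```python
def segments_from_change_points(change_points, n):
    """Return half-open (start, end) index pairs for each segment."""
    boundaries = [0, *change_points, n]
    return list(zip(boundaries[:-1], boundaries[1:]))


print(segments_from_change_points([100, 140, 220], n=300))
# [(0, 100), (100, 140), (140, 220), (220, 300)]
```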

transform#

Alternatively, you can use the transform method to label the data according to the change point segmentation.

[6]:
labels = detector.transform(x)
labels
[6]:
labels
time
0 0
1 0
2 0
3 0
4 0
... ...
295 3
296 3
297 3
298 3
299 3

300 rows × 1 columns
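The relationship between change points and segment labels can be reproduced with NumPy alone. This is a sketch of the idea, assuming the change points 100, 140 and 220 from the example above. It is not how Skchange computes the labels internally.

```python
import numpy as np

n = 300
change_points = np.array([100, 140, 220])

# side="right" makes the change point itself belong to the new segment.
segment_labels = np.searchsorted(change_points, np.arange(n), side="right")

print(segment_labels[[99, 100, 139, 140, 219, 220, 299]])  # [0 1 1 2 2 3 3]
```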

[7]:
px.line(labels)

This is useful, for example, for grouping operations per segment:

[8]:
x["label"] = labels
x.groupby("label").agg(["mean", "std"])
[8]:
           var0                var1                var2
           mean       std      mean       std      mean       std
label
0     -0.145056  1.038400  0.078223  1.107580  0.016803  1.013129
1      8.085414  0.938503 -0.181219  1.152032  0.205081  0.881243
2      0.143322  1.136743  0.126735  0.975529  0.066954  1.085700
3      2.248388  0.919702  2.959066  1.029075  4.851858  1.018683

transform_scores#

Some detectors also support the transform_scores method, which returns the penalised change scores for each data point. This is the case for MovingWindow.

[9]:
detection_scores = detector.transform_scores(x)
detection_scores
[9]:
bandwidth 20
time
0 NaN
1 -6.943667
2 -7.688373
3 -8.703367
4 -6.636503
... ...
295 -8.910835
296 -9.046409
297 -8.271702
298 -8.353627
299 -7.787641

300 rows × 1 columns

[10]:
import plotly.express as px

px.line(detection_scores)
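To give a feel for what a penalised score per data point means, the sketch below computes a simplified moving-window score on a univariate toy series. The function `moving_window_scores`, its scaling, and its handling of the edges are assumptions made for this illustration. The values Skchange returns are computed differently in detail, for example near the start and end of the series.

```python
import numpy as np


def moving_window_scores(x: np.ndarray, bandwidth: int, penalty: float) -> np.ndarray:
    """Penalised CUSUM-type score at each index, comparing the `bandwidth`
    points before the index with the `bandwidth` points from the index on."""
    n = len(x)
    scores = np.full(n, np.nan)  # NaN where a full window does not fit
    weight = np.sqrt(bandwidth / 2)  # sqrt(b * b / (2 * b))
    for t in range(bandwidth, n - bandwidth + 1):
        left = x[t - bandwidth : t]
        right = x[t : t + bandwidth]
        scores[t] = weight * abs(left.mean() - right.mean()) - penalty
    return scores


rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(8.0, 1.0, 100)])

scores = moving_window_scores(signal, bandwidth=20, penalty=10.0)
# Positive penalised scores mark candidate change points.
print("argmax:", np.nanargmax(scores), "max score:", round(float(np.nanmax(scores)), 2))
```

In this sketch, a score above zero means the unpenalised score exceeded the penalty, and the peak sits close to the true change at index 100.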

Segment anomaly detection#

The task#

Composable segment anomaly detectors#

fit#

predict#

transform#

transform_scores#