sportsbet.evaluation.ClassifierBettor

class sportsbet.evaluation.ClassifierBettor(classifier)

Bettor based on a scikit-learn classifier.

Read more in the user guide.

Parameters:
classifier : classifier object

A scikit-learn classifier object implementing fit, score and predict_proba.

Examples

>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.preprocessing import OneHotEncoder
>>> from sklearn.impute import SimpleImputer
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.compose import make_column_transformer
>>> from sportsbet.evaluation import ClassifierBettor
>>> from sportsbet.datasets import FDSoccerDataLoader
>>> # Select only backtesting data for the Italian league and years 2020, 2021
>>> param_grid = {'league': ['Italy'], 'year': [2020, 2021]}
>>> dataloader = FDSoccerDataLoader(param_grid)
>>> # Select the odds of Pinnacle bookmaker
>>> X, Y, O = dataloader.extract_train_data(
...     odds_type='pinnacle',
...     drop_na_thres=1.0
... )
Football-Data.co.uk...
>>> # Create a pipeline to handle categorical features and missing values
>>> clf_pipeline = make_pipeline(
...     make_column_transformer(
...         (OneHotEncoder(handle_unknown='ignore'), ['league', 'home_team', 'away_team']),
...         remainder='passthrough'
...     ),
...     SimpleImputer(),
...     DecisionTreeClassifier(random_state=0)
... )
>>> # Backtest the bettor
>>> bettor = ClassifierBettor(clf_pipeline)
>>> bettor.backtest(X, Y, O)
ClassifierBettor(classifier=...
>>> # Display backtesting results
>>> bettor.backtest_results_
  Training Start ... Avg Bet Yield [%]  Std Bet Yield [%]
...
backtest(X, Y, O, tscv=None, init_cash=1000, refit=True)

Backtest the bettor.

Parameters:
X : DataFrame object

The input data. Each row of X represents information that is available before the start of a specific match. The rows should be sorted by an index named 'date'.

Y : DataFrame object

The multi-output targets. Each row of Y represents information that is available after the end of a specific match. The column names follow the convention for the output data Y of the method extract_train_data().

O : DataFrame object

The odds data. Each row of O represents information that is available after the end of a specific match. The column names follow the convention for the odds data O of the method extract_train_data().

tscv : TimeSeriesSplit object, default=None

Provides train/test indices to split time series data samples, observed at fixed time intervals, into train/test sets.

init_cash : int, default=1000

The initial cash to use for backtesting.

refit : bool, default=True

If True, refit the bettor using the whole input data and multi-output targets.

Returns:
self : Bettor object

The backtested bettor.
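
As an illustration of the tscv parameter, the sketch below builds a scikit-learn TimeSeriesSplit on a small dummy array and prints the resulting folds. The dummy data and n_splits=3 are assumptions chosen for brevity. The point is only that every training fold precedes its test fold, which is why the rows of X are expected to be sorted by date.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Dummy stand-in for eight date-sorted matches (assumption, not real data)
X_dummy = np.arange(8).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(X_dummy)]
for train, test in folds:
    # Training indices always come before the test indices
    print(f"train={train} test={test}")
```

An object built this way could then be passed to the method as bettor.backtest(X, Y, O, tscv=tscv).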

bet(X, O)

Predict the value bets for the provided input data and odds.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data.

O : {array-like, sparse matrix} of shape (n_samples, n_outputs)

The odds data.

Returns:
B : {array-like, sparse matrix} of shape (n_samples, n_outputs)

The value bets.
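
The page does not state how value bets are derived, so the following is only a sketch under a common convention: an outcome is a value bet when the predicted probability multiplied by the decimal odds exceeds 1, i.e. the expected return per unit stake is positive. The helper value_bets and the numbers are hypothetical and are not part of sportsbet.

```python
import numpy as np


def value_bets(proba, odds):
    """Flag outcomes with positive expected return (hypothetical helper)."""
    proba = np.asarray(proba, dtype=float)
    odds = np.asarray(odds, dtype=float)
    # Expected return per unit stake is proba * odds; above 1 means value
    return proba * odds > 1.0


# Two matches, three outcomes each (home win, draw, away win); made-up numbers
proba = np.array([[0.50, 0.30, 0.20], [0.25, 0.25, 0.50]])
odds = np.array([[2.20, 3.00, 4.50], [3.50, 3.40, 2.10]])

B = value_bets(proba, odds)
print(B)
```

Here only the home win of the first match and the away win of the second are flagged, since those are the only outcomes whose probability times odds is above 1.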

fit(X, Y)

Fit the bettor to the input data and multi-output targets.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data.

Y : {array-like, sparse matrix} of shape (n_samples, n_outputs)

The multi-output targets.

Returns:
self : Bettor object

The fitted bettor object.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)

Predict class labels for multi-output targets.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data.

Returns:
Y : {array-like, sparse matrix} of shape (n_samples, n_outputs)

The predicted class labels.

predict_proba(X)

Predict class probabilities for multi-output targets.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data.

Returns:
Y : {array-like, sparse matrix} of shape (n_samples, n_outputs)

The positive class probabilities.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.
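
To make the subset accuracy remark concrete, the sketch below compares it with a per-label accuracy on made-up multi-label arrays. It uses plain NumPy rather than the score method itself, and the arrays are illustrative assumptions.

```python
import numpy as np

# Made-up multi-label targets and predictions: four samples, three labels
Y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
Y_pred = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1], [0, 0, 0]])

matches = Y_true == Y_pred
# Subset accuracy: a sample counts only if every one of its labels is correct
subset_accuracy = matches.all(axis=1).mean()
# Per-label accuracy: each label is counted on its own
per_label_accuracy = matches.mean()

print(subset_accuracy, per_label_accuracy)
```

Two of the four samples are predicted exactly, so the subset accuracy is 0.5, although 10 of the 12 individual labels are correct.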

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
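
The <component>__<parameter> convention can be sketched with a plain scikit-learn pipeline similar to the one in the class example. The chosen values (max_depth=3, strategy='median') are arbitrary assumptions for illustration.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# make_pipeline names each step after its lowercased class name
pipeline = make_pipeline(SimpleImputer(), DecisionTreeClassifier(random_state=0))

# Update nested parameters through <component>__<parameter> keys
pipeline.set_params(
    decisiontreeclassifier__max_depth=3,
    simpleimputer__strategy="median",
)

# get_params(deep=True) exposes the same nested keys
params = pipeline.get_params(deep=True)
print(params["decisiontreeclassifier__max_depth"])
print(params["simpleimputer__strategy"])
```

When such a pipeline is wrapped in a ClassifierBettor, the same keys would presumably carry an extra classifier__ prefix, following the scikit-learn convention for nested estimators. This is an assumption based on the constructor parameter name and not something this page confirms.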
