Classifier bettor

This example illustrates how to use a classifier-based bettor and evaluate its performance on historical soccer data.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

import numpy as np
from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, TimeSeriesSplit

Extracting the training data

We extract the training data for the Spanish league. We also remove columns that contain missing values and select the market maximum odds.

dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, O_train = dataloader.extract_train_data(
    drop_na_thres=1.0, odds_type='market_maximum'
)

Out:

Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

The input data:

             home_team     away_team  league  division  year  home_team_soccer_power_index  away_team_soccer_power_index  home_team_probability_win  away_team_probability_win  probability_draw  home_team_projected_score  away_team_projected_score  match_quality
date
2016-08-19   La Coruna         Eibar   Spain         1  2017                         66.52                         62.29                     0.5003                     0.2260            0.2738                       1.47                       0.79      64.335545
2016-08-19      Malaga       Osasuna   Spain         1  2017                         72.57                         56.93                     0.5475                     0.1897            0.2628                       1.56                       0.70      63.805561
2016-08-20     Sevilla       Espanol   Spain         1  2017                         78.76                         68.75                     0.5952                     0.1760            0.2288                       1.89                       0.88      73.415362
2016-08-20     Granada    Villarreal   Spain         1  2017                         55.69                         76.79                     0.3194                     0.3917            0.2889                       1.07                       1.19      64.559709
2016-08-20   Barcelona         Betis   Spain         1  2017                         96.35                         69.95                     0.9591                     0.0071            0.0337                       3.40                       0.42      81.054510
...                ...           ...     ...       ...   ...                           ...                           ...                        ...                        ...               ...                        ...                        ...            ...
2022-01-23      Girona          Lugo   Spain         2  2022                         40.64                         30.24                     0.5490                     0.1963            0.2548                       1.76                       0.95      34.677020
2022-01-23      Huesca  Ponferradina   Spain         2  2022                         34.06                         29.71                     0.4505                     0.2406            0.3089                       1.30                       0.86      31.736635
2022-01-23     Granada       Osasuna   Spain         1  2022                         62.04                         67.01                     0.3926                     0.3145            0.2929                       1.27                       1.10      64.429297
2022-01-24     Almeria         Eibar   Spain         2  2022                         42.75                         35.27                     0.5156                     0.2166            0.2677                       1.64                       0.97      38.651436
2022-01-24  Sociedad B     Cartagena   Spain         2  2022                         26.28                         30.22                     0.4016                     0.3063            0.2921                       1.34                       1.14      28.112623

3792 rows × 13 columns



The multi-output targets:

away_win__full_time_goals draw__full_time_goals home_win__full_time_goals over_2.5__full_time_goals under_2.5__full_time_goals
0 False False True True False
1 False True False False True
2 False False True True False
3 False True False False True
4 False False True True False
... ... ... ... ... ...
3787 False True False False True
3788 True False False True False
3789 True False False False True
3790 True False False False True
3791 True False False True False

3792 rows × 5 columns



The odds:

market_maximum__away_win__odds market_maximum__draw__odds market_maximum__home_win__odds market_maximum__over_2.5__odds market_maximum__under_2.5__odds
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
... ... ... ... ... ...
3787 5.78 3.75 1.75 2.28 1.73
3788 3.85 3.24 2.31 2.48 1.60
3789 3.18 3.30 2.62 2.36 1.70
3790 4.00 3.45 2.05 2.26 1.74
3791 2.65 3.32 2.96 2.21 1.78

3792 rows × 5 columns
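
Decimal odds can be read as implied probabilities by taking their reciprocals. For a complete market such as home win, draw and away win, the reciprocals typically sum to slightly more than 1, and the excess is the bookmaker margin (overround). The following sketch is independent of sportsbet and applies this to row 3787 of the odds shown above; the variable names are illustrative.

```python
# Market maximum odds for row 3787 of the odds table.
odds = {"away_win": 5.78, "draw": 3.75, "home_win": 1.75}

# The implied probability of each outcome is the reciprocal of its decimal odds.
implied = {outcome: 1 / value for outcome, value in odds.items()}

# Overround: how far the implied probabilities exceed a total of 1.
overround = sum(implied.values()) - 1

for outcome, probability in implied.items():
    print(f"{outcome}: {probability:.4f}")
print(f"overround: {overround:.4f}")
```

The overround here is small, which is what one would expect when the best available price is taken for every outcome.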



To simplify the training process, we keep only the numerical features of the input data:

num_features = [
    col
    for col in X_train.columns
    if X_train[col].dtype in (np.dtype(int), np.dtype(float))
]
X_train = X_train[num_features]

Classifier bettor

We can use the ClassifierBettor class to create a classifier-based bettor. Here, a KNeighborsClassifier is selected as the underlying classifier.

Any bettor is a classifier, so we can fit it on the training data.

Out:

ClassifierBettor(classifier=KNeighborsClassifier())

We can predict the probability of the positive class for each target.

Out:

array([[0.2, 0.4, 0.4, 0.4, 0.6],
       [0. , 0.4, 0.6, 0.4, 0.6],
       [0. , 0.2, 0.8, 0.6, 0.4],
       ...,
       [0.4, 0.6, 0. , 0.2, 0.8],
       [0.2, 0. , 0.8, 0.4, 0.6],
       [0.8, 0.2, 0. , 0.6, 0.4]])
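
Every probability in this output is a multiple of 0.2. That is what one would expect from a k-nearest-neighbors classifier with five neighbors and uniform weights (the KNeighborsClassifier defaults), where the estimated probability is the fraction of the nearest neighbors that carry the positive label. The sketch below illustrates the idea in plain Python on made-up one-dimensional data; `knn_positive_proba` is a hypothetical helper, not part of sportsbet or scikit-learn.

```python
def knn_positive_proba(train_x, train_y, query, k=5):
    """Return the fraction of the k nearest training points labelled True."""
    # Rank training indices by their distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


# Made-up feature values and boolean targets, for illustration only.
train_x = [0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
train_y = [False, False, True, False, True, True, True]

proba = knn_positive_proba(train_x, train_y, query=0.45)
print(proba)
```

With five neighbors the only possible values are 0, 0.2, 0.4, 0.6, 0.8 and 1, so the estimates are coarse.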

We can also predict the class labels.

Out:

array([[False, False, False, False,  True],
       [False, False,  True, False,  True],
       [False, False,  True,  True, False],
       ...,
       [False,  True, False, False,  True],
       [False, False,  True, False,  True],
       [ True, False, False,  True, False]])
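
The printed labels are consistent with thresholding each positive-class probability at 0.5, independently for every target. The sketch below checks this on the first three rows of the two printed arrays. The 0.5 cut-off is an assumption inferred from the output shown in this example, not a documented guarantee of the library.

```python
# First three rows of the probability array printed earlier.
probabilities = [
    [0.2, 0.4, 0.4, 0.4, 0.6],
    [0.0, 0.4, 0.6, 0.4, 0.6],
    [0.0, 0.2, 0.8, 0.6, 0.4],
]
# First three rows of the predicted labels printed above.
printed_labels = [
    [False, False, False, False, True],
    [False, False, True, False, True],
    [False, False, True, True, False],
]

# Assumed rule: a target is predicted True when its probability exceeds 0.5.
labels = [[p > 0.5 for p in row] for row in probabilities]
print(labels == printed_labels)
```

Because each target is thresholded on its own, a row can end up with no predicted match result at all, as in the first row.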

Finally, we can evaluate its cross-validation accuracy.

Out:

0.17025316455696204
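
The score looks low partly because of how accuracy is defined for multi-output targets. Assuming the reported value is scikit-learn's standard accuracy, a prediction counts as correct only when all five labels of a match are right at once (subset accuracy). A minimal plain-Python sketch with made-up labels, using a hypothetical `subset_accuracy` helper:

```python
def subset_accuracy(y_true, y_pred):
    """Share of rows in which every label matches the true row."""
    exact_matches = sum(t == p for t, p in zip(y_true, y_pred))
    return exact_matches / len(y_true)


# Made-up targets in the order: away win, draw, home win, over 2.5, under 2.5.
y_true = [
    [False, False, True, True, False],
    [False, True, False, False, True],
    [True, False, False, False, True],
    [False, False, True, False, True],
]
y_pred = [
    [False, False, True, True, False],   # all five labels correct
    [False, False, True, False, True],   # match result wrong
    [True, False, False, True, False],   # over/under wrong
    [False, False, False, False, True],  # home win missed
]

score = subset_accuracy(y_true, y_pred)

# Per-label accuracy over the same data, for comparison.
n_labels = sum(len(row) for row in y_true)
per_label = sum(
    t == p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp)
) / n_labels
print(score, per_label)
```

In this toy data three quarters of the individual labels are right, yet only one row in four is an exact match.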

Backtesting the bettor

We can backtest the bettor using the historical data.

Out:

ClassifierBettor(classifier=KNeighborsClassifier())
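
In the backtesting results of this example, every training period starts on the same date and grows from one testing period to the next, which is the pattern of an expanding-window split. The sketch below reproduces that pattern in plain Python, following the index arithmetic of scikit-learn's TimeSeriesSplit with five splits. Whether the backtest uses exactly this scheme is an assumption, and `expanding_window_splits` is a hypothetical helper.

```python
def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train, test) index ranges with a growing training window."""
    test_size = n_samples // (n_splits + 1)
    for i in range(n_splits):
        # Each test block starts where the previous one ended.
        test_start = n_samples - (n_splits - i) * test_size
        train = range(0, test_start)
        test = range(test_start, test_start + test_size)
        yield train, test


# 3792 is the number of rows in the training data of this example.
splits = list(expanding_window_splits(n_samples=3792))
for train, test in splits:
    print(f"train rows 0..{train.stop - 1}, test rows {test.start}..{test.stop - 1}")
```

The model is always tested on matches that come after everything it was trained on, which avoids leaking future information into training.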

Various backtesting statistics are calculated for each testing period.

   Training Start  Training End  Training Period  Testing Start  Testing End  Testing Period  Start Value  End Value  Total Return [%]  Max Drawdown [%]  Max Drawdown Duration  Total Bets  Win Rate [%]  Best Bet [%]  Worst Bet [%]  Avg Winning Bet [%]  Avg Losing Bet [%]  Profit Factor  Sharpe Ratio  Avg Bet Yield [%]  Std Bet Yield [%]
0      2016-08-19    2018-03-03         561 days     2018-03-16   2019-02-02        324 days       1000.0    1000.00             0.000               NaN                    NaT         0.0           NaN           NaN            NaN                  NaN                 NaN            NaN           inf                NaN                NaN
1      2016-08-19    2019-02-02         897 days     2019-02-03   2019-11-03        274 days       1000.0    1002.97             0.297          3.075409       40 days 12:00:00       212.0     41.981132    386.666667    -166.666667           119.037712          -90.112466       1.015331      0.132261          -2.308854         121.176287
2      2016-08-19    2019-11-03        1171 days     2019-11-03   2020-09-30        333 days       1000.0    1068.79             6.879          1.714371      110 days 00:00:00       455.0     49.450549    464.000000    -171.428571           105.935316          -90.853492       1.165249      1.580977           6.459655         120.084163
3      2016-08-19    2020-09-30        1503 days     2020-10-01   2021-04-19        201 days       1000.0     919.51            -8.049         10.109908      198 days 12:00:00       593.0     43.001686    630.000000    -171.428571           112.054049          -92.941063       0.846405     -2.267925          -4.632977         122.453931
4      2016-08-19    2021-04-19        1704 days     2021-04-19   2022-01-24        281 days       1000.0     949.46            -5.054          5.634760      260 days 12:00:00       572.0     43.881119   2053.000000    -171.428571           121.249602          -95.709338       0.903978     -1.101773          -0.505328         152.064310
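
Two of these statistics are easy to check by hand. Total return is the relative change from the start value to the end value, which matches the results (for example, 1000.0 to 1068.79 gives 6.879%). Maximum drawdown is commonly defined as the largest peak-to-trough drop of the portfolio value. The sketch below uses these common definitions with a made-up value series; the library's exact formulas may differ in detail, and both helpers are hypothetical.

```python
def total_return_pct(start_value, end_value):
    """Relative change between start and end value, in percent."""
    return (end_value - start_value) / start_value * 100


def max_drawdown_pct(values):
    """Largest drop from a running peak to a later value, in percent."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak * 100)
    return worst


# Start and end values of testing period 2 in the results.
period_return = total_return_pct(1000.0, 1068.79)
print(round(period_return, 3))

# Made-up portfolio values, for illustration only.
values = [1000.0, 1020.0, 990.0, 1010.0, 969.0, 1050.0]
drawdown = max_drawdown_pct(values)
print(round(drawdown, 3))
```

In the made-up series the peak of 1020 is followed by a trough of 969, so the drawdown is 51 / 1020, or 5%.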


We can also plot the portfolio value for any testing period from the above backtesting results.

testing_period = 2
bettor.backtest_plot_value_(testing_period)


Estimating the value bets

We extract the fixtures data to estimate the value bets.

We can estimate the value bets by using the fitted classifier.

market_maximum__away_win__odds market_maximum__draw__odds market_maximum__home_win__odds market_maximum__over_2.5__odds market_maximum__under_2.5__odds
0 False True False False True
1 True False False True False
2 False False True False True
3 False True True False True
4 False False True False True
5 False True False False True
6 True False True False True
7 False False True False True
8 True False False False True
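
A value bet is conventionally an outcome for which the estimated probability multiplied by the offered decimal odds exceeds 1, so that the bet has a positive expected return under the model. Assuming the bettor applies this rule, the boolean table above marks the outcomes that satisfy it. A minimal sketch with made-up probabilities and odds, where `find_value_bets` is a hypothetical helper:

```python
def find_value_bets(probabilities, odds):
    """Flag outcomes whose expected return per unit stake is positive."""
    return [p * o > 1 for p, o in zip(probabilities, odds)]


# Hypothetical model probabilities for away win, draw and home win.
probabilities = [0.2, 0.4, 0.4]
# Hypothetical decimal odds offered for the same three outcomes.
odds = [5.5, 3.5, 1.75]

flags = find_value_bets(probabilities, odds)

# Expected profit per unit stake for each outcome under the model.
expected_returns = [p * o - 1 for p, o in zip(probabilities, odds)]
print(flags)
```

Each outcome is tested on its own, which is why a single fixture can have more than one flagged outcome.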


