Classifier bettor

This example shows how to use a classifier-based bettor and evaluate its performance on historical soccer data.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score, TimeSeriesSplit

Extracting the training data

We extract the training data for the Spanish league. We also remove the input columns that contain missing values (drop_na_thres=1.0) and select the market maximum odds (odds_type='market_maximum').

dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, O_train = dataloader.extract_train_data(
    drop_na_thres=1.0, odds_type='market_maximum'
)

Out:

Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
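The effect of drop_na_thres=1.0 on the input data can be pictured with a small sketch. It assumes, as stated above, that this value keeps only the input columns without any missing value. The helper drop_incomplete_columns and the column home_team_rating are hypothetical and do not reproduce the sportsbet implementation.

```python
import math


def drop_incomplete_columns(columns):
    """Keep only the columns that contain no missing values.

    Hypothetical helper: columns maps a column name to a list of values,
    where None or float('nan') marks a missing value.
    """

    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {
        name: values
        for name, values in columns.items()
        if not any(is_missing(value) for value in values)
    }


# Toy input data: the hypothetical 'home_team_rating' column has a missing value.
toy_columns = {
    'home_team': ['La Coruna', 'Malaga'],
    'home_team_rating': [1.5, float('nan')],
    'away_team': ['Eibar', 'Osasuna'],
}
print(list(drop_incomplete_columns(toy_columns)))
```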

The input data:

print(X_train)

Out:

             home_team    away_team league  ...  home_team_projected_score  away_team_projected_score  match_quality
date                                        ...
2016-08-19   La Coruna        Eibar  Spain  ...                       1.47                       0.79      64.335545
2016-08-19      Malaga      Osasuna  Spain  ...                       1.56                       0.70      63.805561
2016-08-20   Barcelona        Betis  Spain  ...                       3.40                       0.42      81.054510
2016-08-20     Granada   Villarreal  Spain  ...                       1.07                       1.19      64.559709
2016-08-20     Sevilla      Espanol  Spain  ...                       1.89                       0.88      73.415362
...                ...          ...    ...  ...                        ...                        ...            ...
2022-01-30   Cartagena  Fuenlabrada  Spain  ...                       1.41                       1.02      27.477392
2022-01-30  Las Palmas   Sociedad B  Spain  ...                       1.48                       0.90      29.405740
2022-01-30    Mirandes       Malaga  Spain  ...                       1.45                       1.04      26.945016
2022-01-30     Leganes     Alcorcon  Spain  ...                       1.60                       0.71      27.042830
2022-01-31       Ibiza     Zaragoza  Spain  ...                       1.25                       1.02      30.858225

[3803 rows x 13 columns]

The multi-output targets:

print(Y_train)

Out:

      away_win__full_time_goals  draw__full_time_goals  ...  over_2.5__full_time_goals  under_2.5__full_time_goals
0                         False                  False  ...                       True                       False
1                         False                   True  ...                      False                        True
2                         False                  False  ...                       True                       False
3                         False                   True  ...                      False                        True
4                         False                  False  ...                       True                       False
...                         ...                    ...  ...                        ...                         ...
3798                      False                  False  ...                       True                       False
3799                      False                   True  ...                      False                        True
3800                      False                  False  ...                       True                       False
3801                      False                  False  ...                      False                        True
3802                      False                   True  ...                       True                       False

[3803 rows x 5 columns]
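Each target column is a boolean outcome of the match, derived from the full-time goals, which is why the targets form a multi-output problem. A minimal sketch, assuming the usual definitions of the five markets; the function targets_from_goals is hypothetical, and the column elided in the printout above is assumed to be home_win__full_time_goals, matching the order of the odds columns.

```python
def targets_from_goals(home_goals, away_goals):
    """Return the five boolean targets of a match, keyed like the Y_train columns."""
    total_goals = home_goals + away_goals
    return {
        'away_win__full_time_goals': away_goals > home_goals,
        'draw__full_time_goals': home_goals == away_goals,
        'home_win__full_time_goals': home_goals > away_goals,  # assumed column name
        'over_2.5__full_time_goals': total_goals > 2.5,
        'under_2.5__full_time_goals': total_goals < 2.5,
    }


print(targets_from_goals(2, 1))
```

For any final score, exactly one of the away win, draw, and home win targets is True, and exactly one of the over/under 2.5 targets is True.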

The odds data:

print(O_train)

Out:

      market_maximum__away_win__odds  ...  market_maximum__under_2.5__odds
0                                NaN  ...                              NaN
1                                NaN  ...                              NaN
2                                NaN  ...                              NaN
3                                NaN  ...                              NaN
4                                NaN  ...                              NaN
...                              ...  ...                              ...
3798                            4.22  ...                             1.68
3799                            6.56  ...                             1.80
3800                            3.67  ...                             1.72
3801                            5.60  ...                             1.68
3802                            4.10  ...                             1.64

[3803 rows x 5 columns]
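The odds are decimal odds: a winning unit stake returns o units in total, so 1/o is the probability implied by the odds. A minimal sketch of that conversion, which propagates the missing (NaN) odds seen in the first rows above; implied_probability is a hypothetical helper.

```python
import math


def implied_probability(decimal_odds):
    """Return the probability implied by decimal odds, or NaN if the odds are missing."""
    if decimal_odds is None or math.isnan(decimal_odds):
        return float('nan')
    return 1.0 / decimal_odds


# Two away win odds from the last rows of O_train above, plus a missing value.
for odds in [4.22, 6.56, float('nan')]:
    print(odds, round(implied_probability(odds), 4))
```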

Classifier bettor

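We can use the ClassifierBettor class to create a classifier-based bettor. For convenience, a DummyClassifier is used as the underlying classifier.

bettor = ClassifierBettor(classifier=DummyClassifier())

Any bettor is a classifier, therefore we can fit it on the training data.

bettor.fit(X_train, Y_train)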

Out:

ClassifierBettor(classifier=DummyClassifier())
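With its default strategy, "prior", scikit-learn's DummyClassifier ignores the input features and only learns, for each output, the fraction of positive labels in the training targets. A minimal pure-Python sketch of that fitting step on toy targets; fit_priors is a hypothetical helper, not the scikit-learn code.

```python
def fit_priors(targets):
    """Return the fraction of True labels for each output column.

    targets is a list of rows, one row of booleans per match.
    """
    n_rows = len(targets)
    n_outputs = len(targets[0])
    return [sum(row[j] for row in targets) / n_rows for j in range(n_outputs)]


# Toy targets in the Y_train column order: away win, draw, home win, over 2.5, under 2.5.
toy_targets = [
    [False, False, True, True, False],
    [False, True, False, False, True],
    [True, False, False, False, True],
    [False, False, True, False, True],
]
print(fit_priors(toy_targets))  # [0.25, 0.25, 0.5, 0.25, 0.75]
```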

We can predict the probability of the positive class (True) for each of the five targets.

bettor.predict_proba(X_train)

Out:

array([[0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461],
       [0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461],
       [0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461],
       ...,
       [0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461],
       [0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461],
       [0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461]])
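DummyClassifier's default "prior" strategy ignores the input features, so every row of the output is identical. As a quick consistency check on the row printed above, the probabilities of mutually exclusive outcomes should add up to one. The sketch assumes the column order of the odds data (away win, draw, home win, over 2.5, under 2.5).

```python
import math

# First row of the predict_proba output above, in the assumed column order.
row = [0.26531686, 0.29555614, 0.439127, 0.42650539, 0.57349461]

match_result = row[0] + row[1] + row[2]  # away win + draw + home win
total_goals = row[3] + row[4]  # over 2.5 + under 2.5

print(math.isclose(match_result, 1.0, abs_tol=1e-6))  # True
print(math.isclose(total_goals, 1.0, abs_tol=1e-6))  # True
```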

We can also predict the class labels.

bettor.predict(X_train)

Out:

array([[False, False, False, False,  True],
       [False, False, False, False,  True],
       [False, False, False, False,  True],
       ...,
       [False, False, False, False,  True],
       [False, False, False, False,  True],
       [False, False, False, False,  True]])
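With the "prior" strategy, DummyClassifier predicts the most frequent class of each output. For boolean targets this means True only when the positive-class probability exceeds 0.5. The sketch reproduces the printed labels from the printed probabilities; tie handling is left out.

```python
# Positive-class probabilities from the predict_proba output above.
probabilities = [0.26531686, 0.29555614, 0.439127, 0.42650539, 0.57349461]

# For a binary output, the most frequent class is True exactly when its
# prior exceeds 0.5 (ties are ignored in this sketch).
labels = [p > 0.5 for p in probabilities]
print(labels)  # [False, False, False, False, True]
```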

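Finally, we can evaluate its cross-validation accuracy. TimeSeriesSplit preserves the temporal order, so each fold is tested on matches played after those used for training.

cross_val_score(bettor, X_train, Y_train, cv=TimeSeriesSplit(), scoring='accuracy').mean()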

Out:

0.0
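For multi-output targets, scikit-learn's accuracy is the subset accuracy: a match counts as correct only when all five labels match. Assuming exactly one of the away win, draw, and home win outcomes is True in every match, the constant prediction above, which sets all three to False, can never be fully correct, which explains the score of 0.0. A small sketch with toy targets; subset_accuracy is a hypothetical helper mirroring sklearn.metrics.accuracy_score on multilabel indicator data.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows whose predicted labels all match the true labels."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# The dummy bettor predicts this row for every match.
constant_prediction = [False, False, False, False, True]

# Toy true targets: exactly one of away win, draw, home win is True per match.
toy_true = [
    [False, False, True, True, False],
    [False, True, False, False, True],
    [True, False, False, False, True],
]
toy_pred = [constant_prediction] * len(toy_true)
print(subset_accuracy(toy_true, toy_pred))  # 0.0
```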

Backtesting the bettor

We can backtest the bettor using the historical data.

Out:

ClassifierBettor(classifier=DummyClassifier())
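The backtesting results show five testing periods whose training data all start on the first date and grow over time. This is consistent with the expanding-window scheme of scikit-learn's TimeSeriesSplit with its default n_splits=5, although whether sportsbet uses exactly this splitter by default is an assumption here. The sketch mimics TimeSeriesSplit's index arithmetic (no gap, no max_train_size); expanding_window_splits is a hypothetical helper.

```python
def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train, test) index ranges like scikit-learn's TimeSeriesSplit defaults.

    Each test fold has n_samples // (n_splits + 1) samples and the training
    set contains every sample before the test fold.
    """
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        yield range(0, test_start), range(test_start, test_start + test_size)


# 3803 matches, as in X_train.
for train, test in expanding_window_splits(3803):
    print(len(train), test.start, test.stop)
```

Splitting by number of matches rather than by date explains why the testing periods above cover different numbers of days.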

Various backtesting statistics are calculated for each of the five testing periods. In the table below, each column corresponds to one testing period: the training data always start on 2016-08-19 and end where the testing period begins. No bets were placed in testing period 0, so most of its statistics are undefined.

                                         0                  1                  2                  3                  4
Training Start                  2016-08-19         2016-08-19         2016-08-19         2016-08-19         2016-08-19
Training End                    2018-03-17         2019-02-03         2019-11-09         2020-10-03         2021-04-22
Training Period                   575 days           898 days          1177 days          1506 days          1707 days
Testing Start                   2018-03-17         2019-02-03         2019-11-09         2020-10-03         2021-04-22
Testing End                     2019-02-03         2019-11-09         2020-10-03         2021-04-22         2022-01-31
Testing Period                    324 days           280 days           330 days           202 days           285 days
Start Value                         1000.0             1000.0             1000.0             1000.0             1000.0
End Value                          1000.00            1070.17             951.10             990.72             916.30
Total Return [%]                     0.000              7.017             -4.890             -0.928             -8.370
Max Drawdown [%]                       NaN           1.414246           8.298392           6.738122          10.105401
Max Drawdown Duration                  NaT   14 days 00:00:00  328 days 12:00:00  137 days 12:00:00  274 days 12:00:00
Total Bets                             0.0              217.0              448.0              555.0              583.0
Win Rate [%]                           NaN          45.161290          42.187500          43.063063          39.965695
Best Bet [%]                           NaN         450.000000         778.000000        1286.666667        2053.000000
Worst Bet [%]                          NaN        -166.666667        -177.777778        -175.000000        -181.818182
Avg Winning Bet [%]                    NaN         145.481114         129.504184         131.176280         153.433361
Avg Losing Bet [%]                     NaN         -99.972969        -103.412343        -101.983616        -106.113691
Profit Factor                          NaN           1.347256           0.910817           0.983034           0.873435
Sharpe Ratio                           inf           2.136920          -0.735663          -0.196894          -1.475663
Avg Bet Yield [%]                      NaN          10.877262          -4.689021          -1.394068          -2.383909
Std Bet Yield [%]                      NaN         143.166404         143.149822         146.042467         176.240988
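Some of the backtesting statistics can be cross-checked with simple arithmetic. The sketch assumes that Total Return [%] is the relative change between the start and end values and that Win Rate [%] is the share of winning bets among all bets; both assumptions agree with the numbers shown, but they are not taken from the sportsbet documentation. The helpers total_return_pct and winning_bets are hypothetical.

```python
def total_return_pct(start_value, end_value):
    """Percentage change of the portfolio value over a testing period."""
    return 100.0 * (end_value - start_value) / start_value


def winning_bets(win_rate_pct, total_bets):
    """Number of winning bets implied by a win rate and a bet count."""
    return round(win_rate_pct / 100.0 * total_bets)


# Values copied from testing periods 1 to 4 of the backtesting results:
# (start value, end value, win rate [%], total bets).
periods = [
    (1000.0, 1070.17, 45.161290, 217),
    (1000.0, 951.10, 42.187500, 448),
    (1000.0, 990.72, 43.063063, 555),
    (1000.0, 916.30, 39.965695, 583),
]

for start, end, win_rate, total_bets in periods:
    print(round(total_return_pct(start, end), 3), winning_bets(win_rate, total_bets))
```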


We can also plot the evolution of the portfolio value for any testing period of the backtesting results above.

testing_period = 2
bettor.backtest_plot_value_(testing_period)
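The plotted portfolio value changes as bets are settled. A minimal sketch of how such a value path and its maximum drawdown can be computed, assuming a fixed stake per bet and the standard peak-to-trough definition of drawdown. The stakes, odds, and outcomes are made up, the helpers are hypothetical, and sportsbet's own staking and accounting rules are not reproduced here.

```python
def portfolio_values(initial_cash, stakes, odds, outcomes):
    """Return the portfolio value after each settled bet.

    A winning bet adds stake * (odds - 1); a losing bet subtracts the stake.
    """
    value = initial_cash
    values = [value]
    for stake, o, won in zip(stakes, odds, outcomes):
        value += stake * (o - 1.0) if won else -stake
        values.append(value)
    return values


def max_drawdown_pct(values):
    """Largest percentage drop from a running peak of the portfolio value."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)
        worst = max(worst, 100.0 * (peak - value) / peak)
    return worst


# Hypothetical sequence of three fixed-stake bets.
values = portfolio_values(1000.0, [10.0, 10.0, 10.0], [4.22, 1.68, 3.67], [False, False, True])
print(values)
print(max_drawdown_pct(values))
```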


Estimating the value bets

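We extract the fixtures data, i.e. the upcoming matches, to estimate the value bets.

We can estimate the value bets by using the fitted bettor. Each row below corresponds to a fixture and each column to a betting market, with True marking a value bet.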

   market_maximum__away_win__odds  market_maximum__draw__odds  market_maximum__home_win__odds  market_maximum__over_2.5__odds  market_maximum__under_2.5__odds
0                           False                       False                            True                            True                            False
1                            True                        True                           False                            True                            False
2                            True                        True                           False                           False                             True
3                           False                        True                            True                           False                             True
4                            True                        True                           False                           False                            False
5                           False                        True                            True                           False                             True
6                            True                        True                           False                           False                             True
7                           False                       False                            True                            True                            False
8                            True                       False                            True                            True                            False
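A common definition of a value bet is one whose estimated probability times the decimal odds exceeds one, i.e. a bet with positive expected profit. A minimal sketch, assuming the bettor applies this rule to the classifier's probabilities (this example does not confirm it); the fixture odds and the helper is_value_bet are hypothetical. Under this assumption, and because the dummy probabilities are identical for every fixture, the flags in the table above would depend only on each fixture's odds.

```python
def is_value_bet(probability, decimal_odds):
    """A bet has positive expected profit when probability * odds exceeds 1."""
    return probability * decimal_odds > 1.0


# Positive-class probabilities of the dummy bettor, from the predict_proba output.
probabilities = {
    'away_win': 0.26531686,
    'draw': 0.29555614,
    'home_win': 0.439127,
    'over_2.5': 0.42650539,
    'under_2.5': 0.57349461,
}

# Hypothetical odds for a single fixture.
fixture_odds = {'away_win': 4.5, 'draw': 3.2, 'home_win': 1.9, 'over_2.5': 2.5, 'under_2.5': 1.6}

for market, p in probabilities.items():
    print(market, is_value_bet(p, fixture_odds[market]))
```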


Total running time of the script: ( 0 minutes 44.955 seconds)
