Classifier bettor

This example illustrates how to use a classifier-based bettor and evaluate its performance on historical soccer data.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score, TimeSeriesSplit

Extracting the training data

We extract the training data for the Spanish league. We also remove columns that contain missing values and select the market maximum odds.

dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, O_train = dataloader.extract_train_data(
    drop_na_thres=1.0, odds_type='market_maximum'
)

Out:

Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

The input data:

print(X_train)

Out:

               home_team    away_team league  division  year  home_team_soccer_power_index  away_team_soccer_power_index  home_team_probability_win  away_team_probability_win  probability_draw  home_team_projected_score  away_team_projected_score  match_quality
date
2016-08-19     La Coruna        Eibar  Spain         1  2017                         66.52                         62.29                     0.5003                     0.2260            0.2738                       1.47                       0.79      64.335545
2016-08-19        Malaga      Osasuna  Spain         1  2017                         72.57                         56.93                     0.5475                     0.1897            0.2628                       1.56                       0.70      63.805561
2016-08-20     Barcelona        Betis  Spain         1  2017                         96.35                         69.95                     0.9591                     0.0071            0.0337                       3.40                       0.42      81.054510
2016-08-20       Granada   Villarreal  Spain         1  2017                         55.69                         76.79                     0.3194                     0.3917            0.2889                       1.07                       1.19      64.559709
2016-08-20       Sevilla      Espanol  Spain         1  2017                         78.76                         68.75                     0.5952                     0.1760            0.2288                       1.89                       0.88      73.415362
...                  ...          ...    ...       ...   ...                           ...                           ...                        ...                        ...               ...                        ...                        ...            ...
2022-01-30    Las Palmas   Sociedad B  Spain         2  2022                         34.45                         25.65                     0.4938                     0.2206            0.2856                       1.48                       0.90      29.405740
2022-01-30       Leganes     Alcorcon  Spain         2  2022                         34.16                         22.38                     0.5720                     0.1584            0.2696                       1.60                       0.71      27.042830
2022-01-30  Ponferradina     Tenerife  Spain         2  2022                         30.87                         36.36                     0.3169                     0.3649            0.3182                       1.03                       1.13      33.390843
2022-01-30     Cartagena  Fuenlabrada  Spain         2  2022                         30.99                         24.68                     0.4450                     0.2639            0.2912                       1.41                       1.02      27.477392
2022-01-31         Ibiza     Zaragoza  Spain         2  2022                         31.70                         30.06                     0.4019                     0.2920            0.3062                       1.25                       1.02      30.858225

[3803 rows x 13 columns]

The multi-output targets:

print(Y_train)

Out:

      away_win__full_time_goals  draw__full_time_goals  home_win__full_time_goals  over_2.5__full_time_goals  under_2.5__full_time_goals
0                         False                  False                       True                       True                       False
1                         False                   True                      False                      False                        True
2                         False                  False                       True                       True                       False
3                         False                   True                      False                      False                        True
4                         False                  False                       True                       True                       False
...                         ...                    ...                        ...                        ...                         ...
3798                      False                   True                      False                      False                        True
3799                      False                  False                       True                      False                        True
3800                       True                  False                      False                       True                       False
3801                      False                  False                       True                       True                       False
3802                      False                   True                      False                       True                       False

[3803 rows x 5 columns]

The odds data:

print(O_train)

Out:

      market_maximum__away_win__odds  market_maximum__draw__odds  market_maximum__home_win__odds  market_maximum__over_2.5__odds  market_maximum__under_2.5__odds
0                                NaN                         NaN                             NaN                             NaN                              NaN
1                                NaN                         NaN                             NaN                             NaN                              NaN
2                                NaN                         NaN                             NaN                             NaN                              NaN
3                                NaN                         NaN                             NaN                             NaN                              NaN
4                                NaN                         NaN                             NaN                             NaN                              NaN
...                              ...                         ...                             ...                             ...                              ...
3798                            6.56                        4.00                            1.62                            2.19                             1.80
3799                            5.60                        3.71                            1.78                            2.31                             1.68
3800                            3.23                        3.03                            2.63                            2.77                             1.50
3801                            4.22                        3.41                            2.00                            2.31                             1.68
3802                            4.10                        3.34                            2.19                            2.45                             1.64

[3803 rows x 5 columns]

Classifier bettor

We can use the ClassifierBettor class to create a classifier-based bettor. A DummyClassifier is selected for convenience.

Any bettor is a classifier; therefore, we can fit it on the training data.

Out:

ClassifierBettor(classifier=DummyClassifier())

We can predict probabilities for the positive class of each target.

Out:

array([[0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461],
       [0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461],
       [0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461],
       ...,
       [0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461],
       [0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461],
       [0.26531686, 0.29555614, 0.439127  , 0.42650539, 0.57349461]])
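Every row above is identical because a DummyClassifier, under its default strategy, ignores the input features and predicts the class frequencies observed during training. The following is a minimal sketch of that idea in plain Python. It uses a small hypothetical target table rather than the real Y_train, and the helper name is illustrative only.

```python
# Hypothetical targets: one row per match,
# columns = (away_win, draw, home_win).
Y = [
    (False, False, True),
    (False, True, False),
    (False, False, True),
    (True, False, False),
]


def positive_class_priors(rows):
    """Return the fraction of True values in each column."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    return [sum(row[j] for row in rows) / n_rows for j in range(n_cols)]


priors = positive_class_priors(Y)

# A feature-blind classifier predicts the same priors for every match.
predicted = [priors for _ in Y]
print(priors)
```

In the real output the first three probabilities sum to one, as do the last two, which is what class frequencies over mutually exclusive outcomes would give.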

We can also predict the class labels.

Out:

array([[False, False, False, False,  True],
       [False, False, False, False,  True],
       [False, False, False, False,  True],
       ...,
       [False, False, False, False,  True],
       [False, False, False, False,  True],
       [False, False, False, False,  True]])
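These labels are consistent with thresholding each positive-class probability at 0.5, although that rule is an assumption here rather than something the example states. A short check using the probabilities printed earlier:

```python
# Positive-class probabilities copied from the predicted output above.
probabilities = [0.26531686, 0.29555614, 0.439127, 0.42650539, 0.57349461]

# Assumed decision rule: predict True when the probability exceeds 0.5.
labels = [p > 0.5 for p in probabilities]
print(labels)
```

Only the last target (under 2.5 goals) has a probability above 0.5, so it is the only one predicted as True.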

Finally, we can evaluate its cross-validation accuracy.

Out:

0.0
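A score of 0.0 is plausible rather than a bug, assuming the metric is exact-match (subset) accuracy, which is how scikit-learn's accuracy treats multi-label targets: a row counts as correct only when every label matches. The constant prediction shown earlier marks no match result as True, while every real match has exactly one of away win, draw, or home win. A minimal sketch with hypothetical rows:

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows where all labels match exactly."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def label_accuracy(y_true, y_pred):
    """Fraction of individual labels that match."""
    pairs = [(a, b) for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical targets: (away_win, draw, home_win, over_2.5, under_2.5).
y_true = [
    (False, False, True, True, False),
    (False, True, False, False, True),
    (True, False, False, True, False),
]

# The constant prediction produced by the dummy bettor.
constant = (False, False, False, False, True)
y_pred = [constant] * len(y_true)

score = subset_accuracy(y_true, y_pred)
per_label = label_accuracy(y_true, y_pred)
print(score, per_label)
```

The per-label figure shows that many individual labels are still predicted correctly even though no full row is.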

Backtesting the bettor

We can backtest the bettor using the historical data.

Out:

ClassifierBettor(classifier=DummyClassifier())
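Backtesting on time-ordered data typically trains on everything before a cut-off and tests on the period that follows, moving the cut-off forward each round. The statistics in this section are consistent with such an expanding window, since the training start stays fixed while the training end advances. A minimal sketch over row indices follows. The function name and the fold sizing are illustrative assumptions, not the library's implementation.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train, test) index lists with a growing training set."""
    test_size = n_samples // (n_splits + 1)
    splits = []
    for k in range(n_splits):
        # Each test fold starts where the training data ends.
        test_start = n_samples - (n_splits - k) * test_size
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        splits.append((train, test))
    return splits


splits = expanding_window_splits(n_samples=12, n_splits=3)
for train, test in splits:
    print(f"train rows 0..{train[-1]}, test rows {test[0]}..{test[-1]}")
```

Because every test fold lies strictly after its training rows, the model is never evaluated on matches that precede its training data.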

Various backtesting statistics are calculated for each testing period.

                                       0                  1                  2                  3                  4
Training Start                2016-08-19         2016-08-19         2016-08-19         2016-08-19         2016-08-19
Training End                  2018-03-17         2019-02-03         2019-11-09         2020-10-03         2021-04-22
Training Period                 575 days           898 days          1177 days          1506 days          1707 days
Testing Start                 2018-03-17         2019-02-03         2019-11-09         2020-10-03         2021-04-22
Testing End                   2019-02-03         2019-11-09         2020-10-03         2021-04-22         2022-01-31
Testing Period                  324 days           280 days           330 days           202 days           285 days
Start Value                       1000.0             1000.0             1000.0             1000.0             1000.0
End Value                        1000.00            1070.17             951.10             990.72             916.30
Total Return [%]                   0.000              7.017             -4.890             -0.928             -8.370
Max Drawdown [%]                     NaN           1.414246           8.298392           6.738122          10.105401
Max Drawdown Duration                NaT   14 days 00:00:00  328 days 12:00:00  137 days 12:00:00  274 days 12:00:00
Total Bets                           0.0              217.0              448.0              555.0              583.0
Win Rate [%]                         NaN          45.161290          42.187500          43.063063          39.965695
Best Bet [%]                         NaN         450.000000         778.000000        1286.666667        2053.000000
Worst Bet [%]                        NaN        -166.666667        -177.777778        -175.000000        -181.818182
Avg Winning Bet [%]                  NaN         145.481114         129.504184         131.176280         153.433361
Avg Losing Bet [%]                   NaN         -99.972969        -103.412343        -101.983616        -106.113691
Profit Factor                        NaN           1.347256           0.910817           0.983034           0.873435
Sharpe Ratio                         inf           2.136920          -0.735663          -0.196894          -1.475663
Avg Bet Yield [%]                    NaN          10.877262          -4.689021          -1.394068          -2.383909
Std Bet Yield [%]                    NaN         143.166404         143.149822         146.042467         176.240988
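The Total Return [%] figures can be checked against the start and end values, assuming the statistic is the percentage change in portfolio value over the testing period. A minimal sketch using three of the testing periods reported above:

```python
def total_return_pct(start_value, end_value):
    """Percentage change from start_value to end_value."""
    return (end_value - start_value) / start_value * 100


# (start value, end value) copied from testing periods 1, 2 and 4.
periods = {1: (1000.0, 1070.17), 2: (1000.0, 951.10), 4: (1000.0, 916.30)}

returns = {
    period: round(total_return_pct(start, end), 3)
    for period, (start, end) in periods.items()
}
print(returns)
```

Under this assumption the computed values agree with the reported 7.017, -4.890 and -8.370.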


We can also plot the portfolio value for any testing period from the above backtesting results.

testing_period = 2
bettor.backtest_plot_value_(testing_period)


Estimating the value bets

We extract the fixtures data to estimate the value bets.

We can estimate the value bets by using the fitted classifier. In the output, True marks an outcome identified as a value bet.

   market_maximum__away_win__odds  market_maximum__draw__odds  market_maximum__home_win__odds  market_maximum__over_2.5__odds  market_maximum__under_2.5__odds
0                           False                       False                            True                            True                            False
1                            True                        True                           False                            True                            False
2                            True                        True                           False                           False                             True
3                           False                        True                            True                           False                             True
4                            True                        True                           False                           False                            False
5                           False                        True                            True                           False                             True
6                            True                        True                           False                           False                             True
7                           False                       False                            True                            True                            False
8                            True                       False                            True                            True                            False
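A value bet is commonly defined as a bet with positive expected return, which for decimal odds means the estimated probability times the odds exceeds 1. The exact rule the bettor applies is not shown in this example, so the sketch below is an assumption. It combines the probabilities predicted earlier with the odds from the last row of O_train, purely as an illustration.

```python
def value_bets(probabilities, odds):
    """Flag outcomes where probability * decimal odds exceeds 1."""
    # Missing odds (NaN) never compare greater than 1, so they yield False.
    return [p * o > 1 for p, o in zip(probabilities, odds)]


# Order: away win, draw, home win, over 2.5, under 2.5.
probabilities = [0.26531686, 0.29555614, 0.439127, 0.42650539, 0.57349461]
odds = [4.10, 3.34, 2.19, 2.45, 1.64]

flags = value_bets(probabilities, odds)
print(flags)
```

With these inputs only the away win and the over 2.5 outcome clear the threshold. With a feature-blind classifier the probabilities are the same for every match, so which bets are flagged depends entirely on the odds.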

