Classifier bettor

This example illustrates how to use a classifier-based bettor and evaluate its performance on historical soccer data.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score, TimeSeriesSplit

Extracting the training data

We extract the training data for Spain (the extracted data cover divisions 1 and 2). Setting drop_na_thres=1.0 removes the input columns that contain missing values, and odds_type='market_maximum' selects the market maximum odds.

dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, O_train = dataloader.extract_train_data(
    drop_na_thres=1.0, odds_type='market_maximum'
)
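The column filtering described above can be pictured with a small plain-Python sketch. It only illustrates the effect stated in the text (dropping every column that contains a missing value) and is not sportsbet's implementation; the helper name, the dict-of-lists table and the `some_sparse_feature` column are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hypothetical representation: column name -> list of values, None = missing.
Table = Dict[str, List[Optional[Any]]]


def drop_columns_with_missing(table: Table) -> Table:
    """Return a copy of `table` without any column that contains a missing value."""
    return {
        name: list(values)
        for name, values in table.items()
        if all(value is not None for value in values)
    }


table = {
    "home_team": ["Malaga", "Sevilla"],
    "home_team_projected_score": [1.56, 1.89],
    "some_sparse_feature": [0.3, None],  # hypothetical column with a gap
}
print(sorted(drop_columns_with_missing(table)))
```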

Out:

Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

The input data:

print(X_train)

Out:

           league  division  year    home_team   away_team  match_quality  home_team_soccer_power_index  away_team_soccer_power_index  home_team_probability_win  away_team_probability_win  probability_draw  home_team_projected_score  away_team_projected_score
date
2016-08-19  Spain         1  2017       Malaga     Osasuna      63.805561                         72.57                         56.93                     0.5475                     0.1897            0.2628                       1.56                       0.70
2016-08-19  Spain         1  2017    La Coruna       Eibar      64.335545                         66.52                         62.29                     0.5003                     0.2260            0.2738                       1.47                       0.79
2016-08-20  Spain         1  2017      Sevilla     Espanol      73.415362                         78.76                         68.75                     0.5952                     0.1760            0.2288                       1.89                       0.88
2016-08-20  Spain         1  2017      Granada  Villarreal      64.559709                         55.69                         76.79                     0.3194                     0.3917            0.2889                       1.07                       1.19
2016-08-20  Spain         1  2017    Barcelona       Betis      81.054510                         96.35                         69.95                     0.9591                     0.0071            0.0337                       3.40                       0.42
...           ...       ...   ...          ...         ...            ...                           ...                           ...                        ...                        ...               ...                        ...                        ...
2022-02-06  Spain         2  2022   Amorebieta      Oviedo      29.320147                         25.46                         34.56                     0.3230                     0.3671            0.3098                       1.08                       1.18
2022-02-06  Spain         2  2022  Fuenlabrada  Valladolid      31.126896                         23.66                         45.48                     0.1997                     0.5334            0.2669                       0.90                       1.64
2022-02-06  Spain         2  2022    Cartagena  Las Palmas      33.335691                         32.30                         34.44                     0.3855                     0.3313            0.2832                       1.38                       1.26
2022-02-06  Spain         2  2022     Tenerife     Leganes      35.222038                         36.71                         33.85                     0.4712                     0.2172            0.3115                       1.30                       0.78
2022-02-06  Spain         2  2022       Huesca    Mirandes      31.544311                         33.38                         29.90                     0.4501                     0.2616            0.2882                       1.43                       1.03

[2153 rows x 13 columns]

The multi-output targets:

print(Y_train)

Out:

      home_win__full_time_goals  draw__full_time_goals  away_win__full_time_goals  over_2.5__full_time_goals  under_2.5__full_time_goals
0                         False                   True                      False                      False                        True
1                          True                  False                      False                       True                       False
2                          True                  False                      False                       True                       False
3                         False                   True                      False                      False                        True
4                          True                  False                      False                       True                       False
...                         ...                    ...                        ...                        ...                         ...
2148                      False                   True                      False                      False                        True
2149                      False                   True                      False                      False                        True
2150                      False                  False                       True                      False                        True
2151                      False                   True                      False                      False                        True
2152                       True                  False                      False                       True                       False

[2153 rows x 5 columns]
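The column names indicate that each target is an event defined on full-time goals. Since exactly one of home win, draw and away win occurs, and a goal total can never equal 2.5, every row has exactly one True among the first three columns and exactly one True among the last two. A plain-Python sketch of that mapping follows; the helper name is hypothetical and sportsbet's own target construction is not shown in this example.

```python
from typing import List


def outcome_targets(home_goals: int, away_goals: int) -> List[bool]:
    """Derive the five boolean targets from a full-time score (hypothetical helper).

    Order: home_win, draw, away_win, over_2.5, under_2.5.
    """
    total_goals = home_goals + away_goals
    return [
        home_goals > away_goals,   # home_win
        home_goals == away_goals,  # draw
        home_goals < away_goals,   # away_win
        total_goals > 2.5,         # over_2.5
        total_goals < 2.5,         # under_2.5
    ]


print(outcome_targets(2, 1))  # [True, False, False, True, False]
print(outcome_targets(0, 0))  # [False, True, False, False, True]
```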

The odds data:

print(O_train)

Out:

      market_maximum__home_win__odds  market_maximum__draw__odds  market_maximum__away_win__odds  market_maximum__over_2.5__odds  market_maximum__under_2.5__odds
0                                NaN                         NaN                             NaN                             NaN                              NaN
1                                NaN                         NaN                             NaN                             NaN                              NaN
2                                NaN                         NaN                             NaN                             NaN                              NaN
3                                NaN                         NaN                             NaN                             NaN                              NaN
4                                NaN                         NaN                             NaN                             NaN                              NaN
...                              ...                         ...                             ...                             ...                              ...
2148                            3.05                        3.11                            2.81                            2.56                             1.62
2149                            3.65                        3.40                            2.20                            2.32                             1.70
2150                            2.37                        3.25                            3.39                            2.29                             1.72
2151                            2.22                        3.20                            3.93                            2.66                             1.53
2152                            2.21                        3.34                            4.06                            2.27                             1.70

[2153 rows x 5 columns]

Classifier bettor

We can use the ClassifierBettor class to create a classifier-based bettor. A DummyClassifier is selected for convenience.

bettor = ClassifierBettor(classifier=DummyClassifier())

Any bettor is a classifier, so we can fit it on the training data.

bettor.fit(X_train, Y_train)

Out:

ClassifierBettor(classifier=DummyClassifier())
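With its default `prior` strategy, scikit-learn's DummyClassifier ignores the input features and, for each output, only records how often each class occurs in the training targets. The sketch below reproduces that idea in plain Python for boolean multi-output targets; the function name is hypothetical and the rows are the first five rows of Y_train shown above. Fitted on all of Y_train, the same computation should give the constant probabilities printed in the next output.

```python
from typing import List, Sequence


def fit_positive_priors(targets: Sequence[Sequence[bool]]) -> List[float]:
    """Return, for each output column, the fraction of rows where it is True."""
    n_rows = len(targets)
    n_outputs = len(targets[0])
    return [sum(row[j] for row in targets) / n_rows for j in range(n_outputs)]


# First five rows of Y_train (home_win, draw, away_win, over_2.5, under_2.5).
Y_head = [
    [False, True, False, False, True],
    [True, False, False, True, False],
    [True, False, False, True, False],
    [False, True, False, False, True],
    [True, False, False, True, False],
]
print(fit_positive_priors(Y_head))  # [0.6, 0.4, 0.0, 0.6, 0.4]
```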

For each of the five outputs, we can predict the probability of the positive class.

bettor.predict_proba(X_train)

Out:

array([[0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948],
       [0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948],
       [0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948],
       ...,
       [0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948],
       [0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948],
       [0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948]])
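Every row is identical because the dummy classifier ignores the features. The values are also internally consistent: the three match-result probabilities and the two goal-line probabilities each sum to one, as this arithmetic on the printed row checks.

```python
import math

# One row of the predict_proba output above:
# home_win, draw, away_win, over_2.5, under_2.5
row = [0.44124477, 0.28286112, 0.2758941, 0.46679052, 0.53320948]

# Home win, draw and away win are mutually exclusive and exhaustive,
# so their probabilities should add up to one (up to rounding).
result_total = math.fsum(row[:3])
# Over 2.5 and under 2.5 goals are complementary as well.
goals_total = math.fsum(row[3:])

print(round(result_total, 6), round(goals_total, 6))  # 1.0 1.0
```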

We can also predict the class labels.

bettor.predict(X_train)

Out:

array([[False, False, False, False,  True],
       [False, False, False, False,  True],
       [False, False, False, False,  True],
       ...,
       [False, False, False, False,  True],
       [False, False, False, False,  True],
       [False, False, False, False,  True]])
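For a binary output, predicting the most frequent class amounts (ties aside) to predicting True whenever the positive-class probability exceeds 0.5. Applying that rule to the probability row printed above reproduces the printed labels. The helper name is hypothetical.

```python
from typing import List, Sequence


def labels_from_probabilities(probabilities: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Predict True for every output whose positive-class probability exceeds the threshold."""
    return [p > threshold for p in probabilities]


row = [0.44124477, 0.28286112, 0.2758941, 0.46679052, 0.53320948]
print(labels_from_probabilities(row))  # [False, False, False, False, True]
```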

Finally, we can evaluate its cross-validation accuracy.

cross_val_score(bettor, X_train, Y_train, cv=TimeSeriesSplit(), scoring='accuracy').mean()

Out:

0.0
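For multi-output targets, scikit-learn's accuracy is subset accuracy: a sample counts as correct only if every output matches. Assuming the bettor is scored this way, the 0.0 follows directly: the constant prediction [False, False, False, False, True] never predicts a match result, while every row of Y_train has exactly one True among home_win, draw and away_win. A plain-Python sketch of the metric on the first five rows of Y_train (the function name is hypothetical):

```python
from typing import Sequence


def subset_accuracy(y_true: Sequence[Sequence[bool]], y_pred: Sequence[Sequence[bool]]) -> float:
    """Fraction of rows whose predicted labels match the true labels on every output."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# First five rows of Y_train (home_win, draw, away_win, over_2.5, under_2.5).
Y_head = [
    [False, True, False, False, True],
    [True, False, False, True, False],
    [True, False, False, True, False],
    [False, True, False, False, True],
    [True, False, False, True, False],
]
# The dummy bettor predicts the same labels for every match.
constant_prediction = [[False, False, False, False, True]] * len(Y_head)
print(subset_accuracy(Y_head, constant_prediction))  # 0.0
```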

Backtesting the bettor

We can backtest the bettor using the historical data.

Out:

ClassifierBettor(classifier=DummyClassifier())
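The backtesting results suggest an expanding-window evaluation: every training period starts on 2016-08-19, and the training end moves forward from one testing period to the next. scikit-learn's TimeSeriesSplit, imported at the top of this example, produces splits of this kind. The sketch below reproduces its default index arithmetic in plain Python; the function name is hypothetical, and whether sportsbet's backtest splits the data exactly this way is an assumption that this example does not confirm.

```python
from typing import List, Tuple


def expanding_window_splits(n_samples: int, n_splits: int = 5) -> List[Tuple[range, range]]:
    """Time-ordered (train, test) index ranges with a growing training window.

    Mirrors the default behaviour of scikit-learn's TimeSeriesSplit:
    equal-sized test folds, each trained on all earlier samples.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("Too few samples for the requested number of splits.")
    first_test_start = n_samples - n_splits * test_size
    splits = []
    for test_start in range(first_test_start, n_samples, test_size):
        train = range(0, test_start)                       # all earlier samples
        test = range(test_start, test_start + test_size)   # the next block
        splits.append((train, test))
    return splits


# 2153 matches, as in X_train above.
for train, test in expanding_window_splits(2153):
    print(f"train [0, {train.stop})  test [{test.start}, {test.stop})")
```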

The backtest calculates the following statistics for each testing period.

                                        0                  1                  2                  3                  4
Training Start                 2016-08-19         2016-08-19         2016-08-19         2016-08-19         2016-08-19
Training End                   2017-12-19         2019-02-03         2020-01-19         2021-01-11         2021-12-05
Training Period                  487 days           898 days          1248 days          1606 days          1934 days
Testing Start                  2017-12-20         2019-02-03         2020-01-19         2021-01-12         2021-08-13
Testing End                    2019-02-03         2020-01-19         2021-01-11         2021-12-05         2022-02-07
Testing Period                   411 days           351 days           359 days           328 days           179 days
Start Value                        1000.0             1000.0             1000.0             1000.0             1000.0
End Value                         1000.00             968.26             993.75             944.32             962.78
Total Return [%]                    0.000             -3.174             -0.625             -5.568             -3.722
Max Drawdown [%]                      NaN           5.036836           4.090217           7.028000           5.333228
Max Drawdown Duration                 NaT  119 days 12:00:00  140 days 00:00:00  327 days 12:00:00  105 days 12:00:00
Total Bets                            0.0              234.0              451.0              487.0              336.0
Win Rate [%]                          NaN          36.752137          39.467849          36.139630          39.880952
Best Bet [%]                          NaN              660.0             1600.0             2053.0              650.0
Worst Bet [%]                         NaN        -160.000000        -166.666667        -175.000000        -166.666667
Avg Winning Bet [%]                   NaN         156.563455         164.045381         183.581047         140.180313
Avg Losing Bet [%]                    NaN        -106.514966        -109.295185        -110.686120        -104.735485
Profit Factor                         NaN           0.857694           0.985368           0.884871           0.888892
Sharpe Ratio                          inf          -0.982435          -0.092957          -1.096292          -1.334815
Avg Bet Yield [%]                     NaN          -9.372833          -0.928864          -4.339053          -6.749020
Std Bet Yield [%]                     NaN         154.222103         174.661350         189.971872         141.146992
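Some of these statistics can be cross-checked by hand. Total Return [%] is the relative change from Start Value to End Value, and Win Rate [%] times Total Bets yields a whole number of winning bets in every period with bets. The formulas below are the standard ones; that sportsbet computes them exactly this way is an inference from the table's numbers, not something stated in this example. The helper names are hypothetical.

```python
def total_return_pct(start_value: float, end_value: float) -> float:
    """Relative change of the portfolio value over a testing period, in percent."""
    return (end_value - start_value) / start_value * 100


def winning_bets(win_rate_pct: float, total_bets: float) -> int:
    """Number of winning bets implied by a win rate (in percent) and a bet count."""
    return round(win_rate_pct / 100 * total_bets)


# Values copied from testing periods 1-4 of the backtesting results:
# (Start Value, End Value, Win Rate [%], Total Bets)
rows = {
    1: (1000.0, 968.26, 36.752137, 234.0),
    2: (1000.0, 993.75, 39.467849, 451.0),
    3: (1000.0, 944.32, 36.139630, 487.0),
    4: (1000.0, 962.78, 39.880952, 336.0),
}
for period, (start, end, win_rate, bets) in rows.items():
    print(period, round(total_return_pct(start, end), 3), winning_bets(win_rate, bets))
```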


We can also plot the portfolio value for any testing period from the above backtesting results.

# Index of the testing period to plot, as numbered in the backtesting results
testing_period = 2
bettor.backtest_plot_value_(testing_period)
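The plotted portfolio value is also the basis of drawdown statistics such as Max Drawdown [%]. A common definition is the largest drop from a running peak, expressed as a percentage of that peak. The sketch below computes it for a hypothetical value series (not taken from the backtest); it assumes this definition, which sportsbet may implement differently.

```python
from typing import Sequence


def max_drawdown_pct(values: Sequence[float]) -> float:
    """Largest peak-to-trough drop of a value series, as a percentage of the peak."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)                           # running maximum so far
        worst = max(worst, (peak - value) / peak * 100)   # drop from that maximum
    return worst


# Hypothetical portfolio values after successive bets.
values = [1000.0, 1020.0, 990.0, 1005.0, 969.0, 980.0]
print(round(max_drawdown_pct(values), 2))  # 5.0
```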


Estimating the value bets

We extract the fixtures data to estimate the value bets.

We can estimate the value bets using the fitted bettor. Each entry of the resulting table indicates whether the corresponding market of a fixture is a value bet.

   market_maximum__home_win__odds  market_maximum__draw__odds  market_maximum__away_win__odds  market_maximum__over_2.5__odds  market_maximum__under_2.5__odds
0                           False                       False                            True                            True                            False
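A bet is usually called a value bet when the estimated probability times the decimal odds exceeds 1, that is, when its expected profit per unit stake is positive. Assuming ClassifierBettor applies this rule to its predicted probabilities (the example does not state it), the dummy bettor flags a market whenever the odds exceed the reciprocal of the printed probability. In the sketch below, the function name and the fixture odds are hypothetical.

```python
from typing import List, Sequence


def value_bets(probabilities: Sequence[float], odds: Sequence[float]) -> List[bool]:
    """Flag a bet when the expected payout per unit stake exceeds the stake.

    With decimal odds o and estimated probability p, the expected profit of
    a unit stake is p * o - 1, so the bet has value when p * o > 1.
    """
    return [p * o > 1 for p, o in zip(probabilities, odds)]


# Probabilities predicted by the dummy bettor (one row of predict_proba above).
probabilities = [0.44124477, 0.28286112, 0.2758941, 0.46679052, 0.53320948]
# Minimum decimal odds above which each market becomes a value bet.
print([round(1 / p, 2) for p in probabilities])
# Hypothetical odds for one fixture (not the fixture shown above).
print(value_bets(probabilities, [2.10, 3.30, 3.80, 2.20, 1.75]))  # [False, False, True, True, False]
```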


Total running time of the script: ( 0 minutes 37.960 seconds)
