Classifier bettor
This example illustrates how to use a classifier-based bettor and evaluate its performance on historical soccer data.
# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT
from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score, TimeSeriesSplit
Extracting the training data
We extract the training data for the Spanish league. We also remove columns that contain missing values and select the market maximum odds.
dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, O_train = dataloader.extract_train_data(
drop_na_thres=1.0, odds_type='market_maximum'
)
Out:
Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
The input data:
print(X_train)
Out:
league division year home_team away_team match_quality ... away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score
date ...
2016-08-19 Spain 1 2017 Malaga Osasuna 63.805561 ... 56.93 0.5475 0.1897 0.2628 1.56 0.70
2016-08-19 Spain 1 2017 La Coruna Eibar 64.335545 ... 62.29 0.5003 0.2260 0.2738 1.47 0.79
2016-08-20 Spain 1 2017 Granada Villarreal 64.559709 ... 76.79 0.3194 0.3917 0.2889 1.07 1.19
2016-08-20 Spain 1 2017 Sevilla Espanol 73.415362 ... 68.75 0.5952 0.1760 0.2288 1.89 0.88
2016-08-20 Spain 1 2017 Barcelona Betis 81.054510 ... 69.95 0.9591 0.0071 0.0337 3.40 0.42
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2022-04-17 Spain 1 2022 Sevilla Real Madrid 78.514148 ... 84.21 0.3087 0.4187 0.2727 1.21 1.46
2022-04-17 Spain 1 2022 Ath Madrid Espanol 70.290915 ... 62.90 0.6979 0.0987 0.2034 2.03 0.61
2022-04-17 Spain 1 2022 Granada Levante 59.139742 ... 62.59 0.3985 0.3456 0.2559 1.57 1.45
2022-04-17 Spain 1 2022 Ath Bilbao Celta 75.637057 ... 73.36 0.4816 0.2351 0.2833 1.44 0.92
2022-04-18 Spain 1 2022 Barcelona Cadiz 70.151267 ... 59.65 0.7060 0.1040 0.1899 2.21 0.71
[4026 rows x 13 columns]
The multi-output targets:
print(Y_train)
Out:
output__home_win__full_time_goals output__draw__full_time_goals output__away_win__full_time_goals output__over_2.5__full_time_goals output__under_2.5__full_time_goals
0 False True False False True
1 True False False True False
2 False True False False True
3 True False False True False
4 True False False True False
... ... ... ... ... ...
4021 False False True True False
4022 True False False True False
4023 False False True True False
4024 False False True False True
4025 False False True False True
[4026 rows x 5 columns]
The odds data:
print(O_train)
Out:
odds__market_maximum__home_win__full_time_goals odds__market_maximum__draw__full_time_goals odds__market_maximum__away_win__full_time_goals odds__market_maximum__over_2.5__full_time_goals odds__market_maximum__under_2.5__full_time_goals
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
... ... ... ... ... ...
4021 3.19 3.50 2.58 2.14 1.90
4022 1.46 4.75 9.30 1.96 1.98
4023 2.45 3.67 3.06 1.98 2.03
4024 1.90 3.69 5.00 2.27 1.75
4025 1.24 7.00 16.00 1.65 2.50
[4026 rows x 5 columns]
Classifier bettor
We can use the ClassifierBettor class to create a classifier-based bettor. A DummyClassifier is selected for convenience.
bettor = ClassifierBettor(DummyClassifier())
Any bettor is a classifier, so we can fit it on the training data.
bettor.fit(X_train, Y_train)
Out:
ClassifierBettor(classifier=DummyClassifier())
We can predict the probability of the positive class for each target.
bettor.predict_proba(X_train)
Out:
array([[0.44138102, 0.29334327, 0.26527571, 0.43070045, 0.56929955],
[0.44138102, 0.29334327, 0.26527571, 0.43070045, 0.56929955],
[0.44138102, 0.29334327, 0.26527571, 0.43070045, 0.56929955],
...,
[0.44138102, 0.29334327, 0.26527571, 0.43070045, 0.56929955],
[0.44138102, 0.29334327, 0.26527571, 0.43070045, 0.56929955],
[0.44138102, 0.29334327, 0.26527571, 0.43070045, 0.56929955]])
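Every row of the output above is identical because a DummyClassifier with its default "prior" strategy ignores the input features and returns the class frequencies observed during fitting. The following is a minimal sketch of that idea in plain Python, not the library's implementation; the toy targets `Y_toy` and the helper `positive_class_priors` are hypothetical names introduced only for illustration.

```python
# Toy multi-output targets (hypothetical data, not the Spanish league results).
# Columns: home_win, draw, away_win, over_2.5, under_2.5
Y_toy = [
    [True, False, False, True, False],
    [False, True, False, False, True],
    [True, False, False, False, True],
    [False, False, True, True, False],
]


def positive_class_priors(targets):
    """Return the fraction of True values in each target column."""
    n_rows = len(targets)
    return [sum(column) / n_rows for column in zip(*targets)]


priors = positive_class_priors(Y_toy)

# A prior-based classifier repeats the same probabilities for every match,
# regardless of the input features.
proba = [priors for _ in Y_toy]
print(priors)
```

Since exactly one of home win, draw and away win occurs in each match, the first three priors sum to one, as they do in the output above. The same holds for the over and under 2.5 goals columns.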
We can also predict the class labels.
bettor.predict(X_train)
Out:
array([[False, False, False, False, True],
[False, False, False, False, True],
[False, False, False, False, True],
...,
[False, False, False, False, True],
[False, False, False, False, True],
[False, False, False, False, True]])
Finally, we can evaluate its cross-validation accuracy.
cross_val_score(bettor, X_train, Y_train, cv=TimeSeriesSplit()).mean()
Out:
0.0
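A score of exactly 0.0 is expected here rather than a sign of a bug. Assuming the bettor uses scikit-learn's default classifier scoring, accuracy on multi-output boolean targets is subset accuracy: a row counts as correct only if all of its labels match. The constant prediction shown earlier sets home win, draw and away win to False at the same time, which never occurs in a real match. The sketch below reproduces this with hypothetical targets; `subset_accuracy` and `per_output_accuracy` are illustrative helpers, not library functions.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows whose predicted labels all match the true labels."""
    matches = [tuple(t) == tuple(p) for t, p in zip(y_true, y_pred)]
    return sum(matches) / len(matches)


def per_output_accuracy(y_true, y_pred):
    """Accuracy computed separately for each target column."""
    n_rows = len(y_true)
    return [
        sum(t == p for t, p in zip(true_col, pred_col)) / n_rows
        for true_col, pred_col in zip(zip(*y_true), zip(*y_pred))
    ]


# Hypothetical targets. Columns: home_win, draw, away_win, over_2.5, under_2.5
y_true = [
    [True, False, False, True, False],
    [False, True, False, False, True],
    [False, False, True, False, True],
]

# The constant row predicted by the dummy classifier in this example.
constant_row = [False, False, False, False, True]
y_pred = [constant_row for _ in y_true]

score = subset_accuracy(y_true, y_pred)
column_scores = per_output_accuracy(y_true, y_pred)
print(score, column_scores)
```

The per-output accuracies are well above zero even though the subset accuracy is zero, so the cross-validation score above says little about how well each individual market is predicted.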
Backtesting the bettor
We can backtest the bettor using the historical data.
Out:
ClassifierBettor(classifier=DummyClassifier())
Various backtesting statistics are calculated.
We can also plot the portfolio value for any testing period from the above backtesting results.
testing_period = 2
bettor.backtest_plot_value_(testing_period)
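To make the idea of a portfolio value curve concrete, the sketch below tracks the value of a portfolio after each settled bet. It assumes a fixed stake per bet and decimal odds; the staking rule, the function `portfolio_values` and the sample bets are all hypothetical and may differ from what the library computes internally.

```python
def portfolio_values(bets, initial_value=1.0, stake=0.1):
    """Return the portfolio value after each settled bet.

    bets: iterable of (decimal_odds, won) pairs, in chronological order.
    A winning bet returns stake * (odds - 1); a losing bet costs the stake.
    """
    values = [initial_value]
    for odds, won in bets:
        profit = stake * (odds - 1.0) if won else -stake
        values.append(values[-1] + profit)
    return values


# Hypothetical sequence of settled bets (not taken from the backtest above).
bets = [(2.0, True), (3.5, False), (1.5, True)]
values = portfolio_values(bets)
print(values)
```

Plotting `values` against the bet index gives a curve of the same kind as the one produced for a testing period.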
Estimating the value bets
We extract the fixtures data to estimate the value bets.
We can estimate the value bets by using the fitted classifier.
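A bet is commonly called a value bet when the predicted probability multiplied by the decimal odds exceeds one, meaning the expected return per unit stake is positive. The sketch below applies that rule to the rounded probabilities from the predict_proba output and the odds from the last row of the odds data shown earlier. Whether the library uses exactly this rule is an assumption, and `find_value_bets` is a hypothetical helper.

```python
def find_value_bets(probabilities, odds):
    """Flag outcomes with positive expected return: probability * odds > 1."""
    return [p * o > 1.0 for p, o in zip(probabilities, odds)]


# Columns: home win, draw, away win, over 2.5, under 2.5.
# Probabilities are the dummy classifier's outputs, rounded to four decimals.
proba = [0.4414, 0.2933, 0.2653, 0.4307, 0.5693]

# Market maximum odds from the last row of the odds data (index 4025).
odds = [1.24, 7.00, 16.00, 1.65, 2.50]

value_bets = find_value_bets(proba, odds)
print(value_bets)
```

Because the dummy classifier assigns the same probabilities to every match, it flags long-odds outcomes such as the draw and the away win here regardless of team strength, which is why it serves only as a baseline.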