Classifier bettor
This example illustrates how to use a classifier-based bettor and evaluate its performance on historical soccer data.
# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT
from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score, TimeSeriesSplit
Extracting the training data
We extract the training data for the Spanish league. We also remove columns that contain missing values and select the market maximum odds.
dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, O_train = dataloader.extract_train_data(
drop_na_thres=1.0, odds_type='market_maximum'
)
Out:
Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
The input data:
print(X_train)
Out:
league division year home_team away_team match_quality home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score
date
2016-08-19 Spain 1 2017 Malaga Osasuna 63.805561 72.57 56.93 0.5475 0.1897 0.2628 1.56 0.70
2016-08-19 Spain 1 2017 La Coruna Eibar 64.335545 66.52 62.29 0.5003 0.2260 0.2738 1.47 0.79
2016-08-20 Spain 1 2017 Sevilla Espanol 73.415362 78.76 68.75 0.5952 0.1760 0.2288 1.89 0.88
2016-08-20 Spain 1 2017 Granada Villarreal 64.559709 55.69 76.79 0.3194 0.3917 0.2889 1.07 1.19
2016-08-20 Spain 1 2017 Barcelona Betis 81.054510 96.35 69.95 0.9591 0.0071 0.0337 3.40 0.42
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2022-02-06 Spain 2 2022 Amorebieta Oviedo 29.320147 25.46 34.56 0.3230 0.3671 0.3098 1.08 1.18
2022-02-06 Spain 2 2022 Fuenlabrada Valladolid 31.126896 23.66 45.48 0.1997 0.5334 0.2669 0.90 1.64
2022-02-06 Spain 2 2022 Cartagena Las Palmas 33.335691 32.30 34.44 0.3855 0.3313 0.2832 1.38 1.26
2022-02-06 Spain 2 2022 Tenerife Leganes 35.222038 36.71 33.85 0.4712 0.2172 0.3115 1.30 0.78
2022-02-06 Spain 2 2022 Huesca Mirandes 31.544311 33.38 29.90 0.4501 0.2616 0.2882 1.43 1.03
[2153 rows x 13 columns]
The multi-output targets:
print(Y_train)
Out:
home_win__full_time_goals draw__full_time_goals away_win__full_time_goals over_2.5__full_time_goals under_2.5__full_time_goals
0 False True False False True
1 True False False True False
2 True False False True False
3 False True False False True
4 True False False True False
... ... ... ... ... ...
2148 False True False False True
2149 False True False False True
2150 False False True False True
2151 False True False False True
2152 True False False True False
[2153 rows x 5 columns]
The odds data:
print(O_train)
Out:
market_maximum__home_win__odds market_maximum__draw__odds market_maximum__away_win__odds market_maximum__over_2.5__odds market_maximum__under_2.5__odds
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
... ... ... ... ... ...
2148 3.05 3.11 2.81 2.56 1.62
2149 3.65 3.40 2.20 2.32 1.70
2150 2.37 3.25 3.39 2.29 1.72
2151 2.22 3.20 3.93 2.66 1.53
2152 2.21 3.34 4.06 2.27 1.70
[2153 rows x 5 columns]
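To read the odds columns, it helps to recall that the reciprocal of a decimal price is the probability the market implies for that outcome. The following is a minimal sketch using the match-outcome odds from the last printed row; the dictionary keys are illustrative names, not sportsbet identifiers.

```python
# Decimal odds copied from the last printed row (home win, draw, away win).
odds = {"home_win": 2.21, "draw": 3.34, "away_win": 4.06}

# The implied probability of an outcome is the reciprocal of its decimal odds.
implied = {outcome: 1.0 / price for outcome, price in odds.items()}

# The sum exceeds 1 when a bookmaker margin is built in. With the best
# available (maximum) odds across the market it can fall slightly below 1.
book_sum = sum(implied.values())

for outcome, probability in implied.items():
    print(f"{outcome}: {probability:.4f}")
print(f"book sum: {book_sum:.4f}")
```

For this row the implied probabilities sum to just under one, which is plausible for market maximum odds because the best price for each outcome may come from a different bookmaker.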
Classifier bettor
We can use the ClassifierBettor class to create a classifier-based bettor. A DummyClassifier is selected for convenience.
Any bettor is a classifier, so we can fit it on the training data.
Out:
ClassifierBettor(classifier=DummyClassifier())
We can predict probabilities for the positive class.
Out:
array([[0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948],
[0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948],
[0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948],
...,
[0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948],
[0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948],
[0.44124477, 0.28286112, 0.2758941 , 0.46679052, 0.53320948]])
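Every row of the output is identical because scikit-learn's DummyClassifier, with its default "prior" strategy, ignores the input features and returns the class frequencies observed during fitting. The sketch below reproduces that idea in plain Python on made-up targets; the helper name prior_probabilities is hypothetical and the code does not call sportsbet or scikit-learn.

```python
# Toy multi-output targets: each row is a match, each column a binary target
# (home win, draw, away win). The values are made up for illustration.
Y_toy = [
    [True, False, False],
    [False, True, False],
    [True, False, False],
    [False, False, True],
]


def prior_probabilities(targets):
    """Return the fraction of True values in each column."""
    n_rows = len(targets)
    n_cols = len(targets[0])
    return [sum(row[j] for row in targets) / n_rows for j in range(n_cols)]


priors = prior_probabilities(Y_toy)

# A feature-blind baseline returns the same probability row for every sample.
predictions = [priors for _ in Y_toy]
print(priors)
```

The same reasoning explains why the first three columns of the printed array sum to one: exactly one of home win, draw and away win occurs in every match.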
We can also predict the class label.
Out:
array([[False, False, False, False, True],
[False, False, False, False, True],
[False, False, False, False, True],
...,
[False, False, False, False, True],
[False, False, False, False, True],
[False, False, False, False, True]])
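The predicted labels are consistent with thresholding each column independently at 0.5. The sketch below assumes that rule (it is inferred from the printed output, not confirmed from the sportsbet source) and applies it to the probability row shown earlier.

```python
# Probability row copied from the predict_proba output
# (home win, draw, away win, over 2.5, under 2.5).
probabilities = [0.44124477, 0.28286112, 0.2758941, 0.46679052, 0.53320948]

# Assumption: each target is a separate binary problem, so a label is True
# only when its own positive-class probability exceeds 0.5.
labels = [p > 0.5 for p in probabilities]
print(labels)
```

Under this rule no match outcome is predicted, since none of the three outcome probabilities exceeds 0.5, while under 2.5 goals is predicted for every match.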
Finally, we can evaluate its cross-validation accuracy.
cross_val_score(bettor, X_train, Y_train, cv=TimeSeriesSplit()).mean()
Out:
0.0
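A score of exactly zero is less surprising than it looks. For multi-label targets, scikit-learn's accuracy is the subset accuracy: a row counts as correct only if every label matches. The sketch below assumes the bettor is scored this way and uses three rows copied from the printed targets together with the constant prediction shown earlier; subset_accuracy is a hypothetical helper written for this illustration.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows in which every label matches."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Columns: home win, draw, away win, over 2.5, under 2.5.
y_true = [
    [False, True, False, False, True],
    [True, False, False, True, False],
    [False, False, True, False, True],
]

# The dummy bettor predicts the same row for every match.
constant_row = [False, False, False, False, True]
y_pred = [constant_row for _ in y_true]

score = subset_accuracy(y_true, y_pred)

# Per-label accuracy, for comparison with the all-or-nothing score.
per_label = [
    sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / len(y_true)
    for j in range(len(constant_row))
]
print(score, per_label)
```

Because every real match has exactly one of home win, draw or away win set to True, a prediction with all three set to False can never match a full row, even though each individual label is often correct.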
Backtesting the bettor
We can backtest the bettor using the historical data.
Out:
ClassifierBettor(classifier=DummyClassifier())
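Backtesting on historical data requires that the model is always trained on matches that precede the ones it bets on. The sketch below mirrors the default split logic of scikit-learn's TimeSeriesSplit, which the cross-validation call above uses; whether the backtest relies on the same splitter by default is an assumption here, and expanding_window_splits is a hypothetical helper.

```python
def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train, test) index ranges in chronological order.

    Mirrors the default behaviour of scikit-learn's TimeSeriesSplit
    (no gap between train and test, no cap on the training size).
    """
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = range(0, test_start)
        test = range(test_start, test_start + test_size)
        yield train, test


# 2153 is the number of rows in the training data printed earlier.
splits = list(expanding_window_splits(2153))
for period, (train, test) in enumerate(splits):
    print(f"period {period}: train={len(train)} rows, test={len(test)} rows")
```

Each testing period starts where the previous one ended, and the training window grows to include everything before it.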
Various backtesting statistics are calculated.
We can also plot the portfolio value for any testing period from the above backtesting results.
testing_period = 2
bettor.backtest_plot_value_(testing_period)
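A portfolio value curve is a running total of the returns from settled bets. The sketch below shows the arithmetic on hypothetical bets with a flat stake; the odds, outcomes, initial cash and stake are all made up for illustration and are not taken from the backtesting results.

```python
from itertools import accumulate

# Hypothetical settled bets: (decimal odds, whether the bet won).
bets = [(2.10, True), (3.40, False), (1.70, True), (2.56, False), (3.05, True)]

initial_cash = 100.0  # assumed starting bankroll
stake = 1.0           # assumed flat stake per bet

# A winning bet returns stake * (odds - 1); a losing bet costs the stake.
returns = [stake * (odds - 1.0) if won else -stake for odds, won in bets]

# Portfolio value after each bet, starting from the initial cash.
portfolio_value = list(accumulate(returns, initial=initial_cash))

profit = portfolio_value[-1] - initial_cash
yield_per_bet = profit / (stake * len(bets))
print(portfolio_value)
print(f"profit: {profit:.2f}, yield per unit staked: {yield_per_bet:.3f}")
```

Summary statistics such as the number of bets, total profit and yield can all be derived from the same list of returns.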
Estimating the value bets
We extract the fixtures data to estimate the value bets.
We can estimate the value bets by using the fitted classifier.
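A common definition of a value bet is one whose estimated probability multiplied by the decimal odds exceeds one, so that the expected return per unit staked is positive. The sketch below assumes that criterion (the exact rule applied by the bettor is not shown here) and, purely for illustration, pairs the probability row printed earlier with the last row of the printed training odds rather than real fixtures. The function name value_bets and the dictionary keys are hypothetical.

```python
import math


def value_bets(probabilities, odds):
    """Flag outcomes whose expected return per unit stake is positive."""
    flags = {}
    for outcome, probability in probabilities.items():
        price = odds.get(outcome)
        # Missing odds (absent or NaN) can never produce a value bet.
        if price is None or math.isnan(price):
            flags[outcome] = False
        else:
            flags[outcome] = probability * price > 1.0
    return flags


# Probabilities from the predict_proba output shown earlier.
probabilities = {
    "home_win": 0.44124477,
    "draw": 0.28286112,
    "away_win": 0.2758941,
    "over_2.5": 0.46679052,
    "under_2.5": 0.53320948,
}

# Odds from the last row of the printed training odds.
odds = {
    "home_win": 2.21,
    "draw": 3.34,
    "away_win": 4.06,
    "over_2.5": 2.27,
    "under_2.5": 1.70,
}

flags = value_bets(probabilities, odds)
print(flags)
```

With these numbers the away win and over 2.5 goals are flagged, even though the predicted labels favour neither: a value bet depends on the price offered, not only on which outcome is most likely.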