Classifier bettor
This example illustrates how to use a classifier-based bettor and evaluate its performance on historical soccer data.
# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT
from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score, TimeSeriesSplit
Extracting the training data
We extract the training data for the Spanish league. Setting drop_na_thres=1.0 removes the input columns that contain missing values, and odds_type='market_maximum' selects the market maximum odds.
dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, O_train = dataloader.extract_train_data(
drop_na_thres=1.0, odds_type='market_maximum'
)
Out:
Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
The input data:
print(X_train)
Out:
home_team away_team league division year home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score match_quality
date
2016-08-19 La Coruna Eibar Spain 1 2017 66.52 62.29 0.5003 0.2260 0.2738 1.47 0.79 64.335545
2016-08-19 Malaga Osasuna Spain 1 2017 72.57 56.93 0.5475 0.1897 0.2628 1.56 0.70 63.805561
2016-08-20 Barcelona Betis Spain 1 2017 96.35 69.95 0.9591 0.0071 0.0337 3.40 0.42 81.054510
2016-08-20 Granada Villarreal Spain 1 2017 55.69 76.79 0.3194 0.3917 0.2889 1.07 1.19 64.559709
2016-08-20 Sevilla Espanol Spain 1 2017 78.76 68.75 0.5952 0.1760 0.2288 1.89 0.88 73.415362
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2022-01-30 Las Palmas Sociedad B Spain 2 2022 34.45 25.65 0.4938 0.2206 0.2856 1.48 0.90 29.405740
2022-01-30 Leganes Alcorcon Spain 2 2022 34.16 22.38 0.5720 0.1584 0.2696 1.60 0.71 27.042830
2022-01-30 Ponferradina Tenerife Spain 2 2022 30.87 36.36 0.3169 0.3649 0.3182 1.03 1.13 33.390843
2022-01-30 Cartagena Fuenlabrada Spain 2 2022 30.99 24.68 0.4450 0.2639 0.2912 1.41 1.02 27.477392
2022-01-31 Ibiza Zaragoza Spain 2 2022 31.70 30.06 0.4019 0.2920 0.3062 1.25 1.02 30.858225
[3803 rows x 13 columns]
The multi-output targets:
print(Y_train)
Out:
away_win__full_time_goals draw__full_time_goals home_win__full_time_goals over_2.5__full_time_goals under_2.5__full_time_goals
0 False False True True False
1 False True False False True
2 False False True True False
3 False True False False True
4 False False True True False
... ... ... ... ... ...
3798 False True False False True
3799 False False True False True
3800 True False False True False
3801 False False True True False
3802 False True False True False
[3803 rows x 5 columns]
The odds data:
print(O_train)
Out:
market_maximum__away_win__odds market_maximum__draw__odds market_maximum__home_win__odds market_maximum__over_2.5__odds market_maximum__under_2.5__odds
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
... ... ... ... ... ...
3798 6.56 4.00 1.62 2.19 1.80
3799 5.60 3.71 1.78 2.31 1.68
3800 3.23 3.03 2.63 2.77 1.50
3801 4.22 3.41 2.00 2.31 1.68
3802 4.10 3.34 2.19 2.45 1.64
[3803 rows x 5 columns]
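The printed column names suggest that each target column and each odds column refer to the same market, joined by a double underscore. The sketch below checks that correspondence using only the names shown above. The naming convention is inferred from these outputs, and the two helper functions are hypothetical, not part of the library.

```python
# Column names copied from the printed Y_train and O_train outputs.
target_cols = [
    "away_win__full_time_goals",
    "draw__full_time_goals",
    "home_win__full_time_goals",
    "over_2.5__full_time_goals",
    "under_2.5__full_time_goals",
]
odds_cols = [
    "market_maximum__away_win__odds",
    "market_maximum__draw__odds",
    "market_maximum__home_win__odds",
    "market_maximum__over_2.5__odds",
    "market_maximum__under_2.5__odds",
]


def target_market(col):
    """Market name of a target column: the part before the first '__' (hypothetical helper)."""
    return col.split("__")[0]


def odds_market(col):
    """Market name of an odds column: the part between the two '__' (hypothetical helper)."""
    return col.split("__")[1]


# Columns are aligned when the i-th target and the i-th odds column share a market.
aligned = all(
    target_market(t) == odds_market(o) for t, o in zip(target_cols, odds_cols)
)
print(aligned)
```

If the columns are aligned in this way, a row of predicted probabilities can be compared with a row of odds position by position.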
Classifier bettor
We can use the ClassifierBettor class to create a classifier-based bettor. A DummyClassifier is selected for convenience.
bettor = ClassifierBettor(DummyClassifier())
Any bettor is a classifier, so we can fit it on the training data.
bettor.fit(X_train, Y_train)
Out:
ClassifierBettor(classifier=DummyClassifier())
We can predict the probability of the positive class for each target.
bettor.predict_proba(X_train)
Out:
array([[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461],
[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461],
[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461],
...,
[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461],
[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461],
[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461]])
We can also predict the class labels.
bettor.predict(X_train)
Out:
array([[False, False, False, False, True],
[False, False, False, False, True],
[False, False, False, False, True],
...,
[False, False, False, False, True],
[False, False, False, False, True],
[False, False, False, False, True]])
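Every row of the two outputs above is identical because scikit-learn's DummyClassifier ignores the input features. With its default "prior" strategy it returns the class frequencies observed in the training targets as probabilities and the most frequent class as the label. The following plain-Python sketch mimics that behaviour on a small, made-up target matrix. It is an illustration of the idea, not the library's implementation.

```python
# Toy multi-output targets: rows are matches, columns are binary markets.
# Hypothetical data, not taken from the dataset above.
Y = [
    [False, False, True],
    [False, True, False],
    [False, False, True],
    [True, False, False],
    [False, False, True],
]

n_rows = len(Y)
n_cols = len(Y[0])

# Share of True values per column: the prior of the positive class.
priors = [sum(row[j] for row in Y) / n_rows for j in range(n_cols)]

# Every match receives the same probabilities, whatever its features are.
proba = [priors[:] for _ in range(n_rows)]

# The predicted label of each column is its most frequent class,
# so it is True only when the positive prior exceeds 0.5.
labels = [[p > 0.5 for p in row] for row in proba]

print(priors)
print(labels[0])
```

In the output above, only the under_2.5 column has a positive-class frequency above 0.5, which is why it is the only label predicted as True.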
Finally, we can evaluate its cross-validation accuracy.
cross_val_score(bettor, X_train, Y_train, cv=TimeSeriesSplit()).mean()
Out:
0.0
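A score of exactly 0.0 can look surprising. A likely explanation, assuming the bettor uses scikit-learn's default classifier score, is that accuracy on multi-label targets is subset accuracy: a row counts as correct only when all five labels match. The dummy bettor predicts False for all three match results, while every played match has exactly one result, so no row can match fully. The sketch below reproduces this with the first five rows of Y_train printed above.

```python
# First five rows of Y_train as printed above
# (away_win, draw, home_win, over_2.5, under_2.5).
y_true = [
    [False, False, True, True, False],
    [False, True, False, False, True],
    [False, False, True, True, False],
    [False, True, False, False, True],
    [False, False, True, True, False],
]
# Constant prediction returned by the dummy bettor for every match.
y_pred = [[False, False, False, False, True] for _ in y_true]


def subset_accuracy(y_true, y_pred):
    """Fraction of rows in which every label matches exactly."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Accuracy of each column taken separately."""
    n = len(y_true)
    return [
        sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / n
        for j in range(len(y_true[0]))
    ]


print(subset_accuracy(y_true, y_pred))
print(per_label_accuracy(y_true, y_pred))
```

The per-label accuracies are well above zero on these rows, so the 0.0 reflects the strictness of the metric rather than a bettor that is wrong on every individual market.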
Backtesting the bettor
We can backtest the bettor using the historical data.
bettor.backtest(X_train, Y_train, O_train)
Out:
ClassifierBettor(classifier=DummyClassifier())
Various backtesting statistics are calculated.
bettor.backtest_results_
We can also plot the portfolio value for any testing period from the above backtesting results.
testing_period = 2
bettor.backtest_plot_value_(testing_period)
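The internals of the backtest are not shown in this example. As a rough mental model, a backtest of this kind can be seen as a walk-forward procedure: the data are split into consecutive periods, the model is fitted on the past, bets are placed on the next period, and the portfolio value is tracked. The sketch below illustrates that idea under stated assumptions (expanding-window splits similar to TimeSeriesSplit, a fixed unit stake, decimal odds). The function names, the starting cash, and the toy data are all hypothetical and do not describe how the library computes its results.

```python
def expanding_splits(n_samples, n_splits):
    """Yield (train, test) index lists where each test fold follows its training data.

    Simplified stand-in for an expanding-window time series split.
    """
    test_size = n_samples // (n_splits + 1)
    first_test = n_samples - n_splits * test_size
    for start in range(first_test, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))


def portfolio_values(bets, won, odds, init_cash=100.0, stake=1.0):
    """Track cash after each match, staking `stake` on every selected bet."""
    cash, values = init_cash, []
    for bet, win, odd in zip(bets, won, odds):
        if bet:
            # A winning bet pays stake * (odds - 1); a losing bet costs the stake.
            cash += stake * (odd - 1.0) if win else -stake
        values.append(cash)
    return values


# Three testing periods over eight matches ordered by date.
splits = list(expanding_splits(8, 3))

# Hypothetical single-market data for one testing period.
bets = [True, True, False, True]
won = [True, False, True, False]
odds = [2.5, 1.8, 3.0, 2.0]
values = portfolio_values(bets, won, odds)

print(splits)
print(values)
```

Plotting `values` against the match dates of one testing period would give a curve analogous in spirit to the portfolio value plot above.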
Estimating the value bets
We extract the fixtures data to estimate the value bets.
X_fix, _, O_fix = dataloader.extract_fixtures_data()
We can estimate the value bets by using the fitted classifier.
bettor.bet(X_fix, O_fix)
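A common definition of a value bet is one whose predicted probability multiplied by the decimal odds exceeds 1, meaning the expected return per unit staked is positive under the model. The sketch below applies that rule to the probabilities predicted by the dummy bettor and to one row of the odds printed earlier, used here only as a stand-in for a fixture. Whether ClassifierBettor applies exactly this rule is an assumption.

```python
markets = ["away_win", "draw", "home_win", "over_2.5", "under_2.5"]

# Probabilities predicted by the dummy bettor (from the output above).
probs = [0.26531686, 0.29555614, 0.439127, 0.42650539, 0.57349461]

# Odds from row 3798 of O_train, standing in for a future fixture.
odds = [6.56, 4.00, 1.62, 2.19, 1.80]

# Expected return per unit staked, according to the model.
expected = [p * o for p, o in zip(probs, odds)]

# A bet has value when the expected return exceeds the stake.
value_bets = [m for m, e in zip(markets, expected) if e > 1.0]

print([round(e, 3) for e in expected])
print(value_bets)
```

Because the dummy bettor's probabilities are just training frequencies, the bets flagged here reflect odds on historically frequent outcomes rather than any real predictive edge.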
Total running time of the script: ( 0 minutes 45.093 seconds)