Classifier bettor
This example illustrates how to use a classifier-based bettor and evaluate its performance on historical soccer data.
# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT
from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score, TimeSeriesSplit
Extracting the training data
We extract the training data for the Spanish league. We also remove columns that contain missing values and select the market maximum odds.
dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, O_train = dataloader.extract_train_data(
drop_na_thres=1.0, odds_type='market_maximum'
)
Out:
Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
The input data:
print(X_train)
Out:
home_team away_team league ... home_team_projected_score away_team_projected_score match_quality
date ...
2016-08-19 La Coruna Eibar Spain ... 1.47 0.79 64.335545
2016-08-19 Malaga Osasuna Spain ... 1.56 0.70 63.805561
2016-08-20 Barcelona Betis Spain ... 3.40 0.42 81.054510
2016-08-20 Granada Villarreal Spain ... 1.07 1.19 64.559709
2016-08-20 Sevilla Espanol Spain ... 1.89 0.88 73.415362
... ... ... ... ... ... ... ...
2022-01-30 Cartagena Fuenlabrada Spain ... 1.41 1.02 27.477392
2022-01-30 Las Palmas Sociedad B Spain ... 1.48 0.90 29.405740
2022-01-30 Mirandes Malaga Spain ... 1.45 1.04 26.945016
2022-01-30 Leganes Alcorcon Spain ... 1.60 0.71 27.042830
2022-01-31 Ibiza Zaragoza Spain ... 1.25 1.02 30.858225
[3803 rows x 13 columns]
The multi-output targets:
print(Y_train)
Out:
away_win__full_time_goals draw__full_time_goals ... over_2.5__full_time_goals under_2.5__full_time_goals
0 False False ... True False
1 False True ... False True
2 False False ... True False
3 False True ... False True
4 False False ... True False
... ... ... ... ... ...
3798 False False ... True False
3799 False True ... False True
3800 False False ... True False
3801 False False ... False True
3802 False True ... True False
[3803 rows x 5 columns]
The odds data:
print(O_train)
Out:
market_maximum__away_win__odds ... market_maximum__under_2.5__odds
0 NaN ... NaN
1 NaN ... NaN
2 NaN ... NaN
3 NaN ... NaN
4 NaN ... NaN
... ... ... ...
3798 4.22 ... 1.68
3799 6.56 ... 1.80
3800 3.67 ... 1.72
3801 5.60 ... 1.68
3802 4.10 ... 1.64
[3803 rows x 5 columns]
Classifier bettor
We can use the ClassifierBettor class to create a classifier-based bettor. A DummyClassifier is selected for convenience.
bettor = ClassifierBettor(classifier=DummyClassifier())
Any bettor is a classifier, so we can fit it on the training data.
bettor.fit(X_train, Y_train)
Out:
ClassifierBettor(classifier=DummyClassifier())
We can predict the probability of the positive class for each target.
bettor.predict_proba(X_train)
Out:
array([[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461],
[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461],
[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461],
...,
[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461],
[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461],
[0.26531686, 0.29555614, 0.439127 , 0.42650539, 0.57349461]])
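Every row of this output is identical because a DummyClassifier with default settings ignores the input features. The values are consistent with the base rate of each target in the training data. The dependency-free sketch below illustrates that behaviour; the toy targets and the `base_rates` helper are hypothetical and are not part of sportsbet or scikit-learn.

```python
def base_rates(targets):
    """Return the fraction of True values in each target column."""
    n_rows = len(targets)
    n_cols = len(targets[0])
    return [sum(row[j] for row in targets) / n_rows for j in range(n_cols)]


# Hypothetical targets for four matches: (away_win, draw, home_win).
toy_targets = [
    (False, False, True),
    (True, False, False),
    (False, True, False),
    (False, False, True),
]

rates = base_rates(toy_targets)

# A prior-based dummy model returns the same probabilities for every match.
probabilities = [rates for _ in toy_targets]
print(rates)
```

A real classifier would replace the constant rows with probabilities that depend on the features of each match.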
We can also predict the class label.
bettor.predict(X_train)
Out:
array([[False, False, False, False, True],
[False, False, False, False, True],
[False, False, False, False, True],
...,
[False, False, False, False, True],
[False, False, False, False, True],
[False, False, False, False, True]])
Finally, we can evaluate its cross-validation accuracy.
cross_val_score(bettor, X_train, Y_train, cv=TimeSeriesSplit()).mean()
Out:
0.0
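A score of 0.0 is less alarming than it looks. Assuming the bettor inherits scikit-learn's default classifier scoring, accuracy on multi-output targets counts a row as correct only when every label matches. The dummy predictions above mark only the under 2.5 target as True, whereas a real match always has one True label among the match-result targets, so no row can match in full. A minimal sketch with hypothetical rows, assuming the elided third column is the home-win target:

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows whose labels all match the prediction."""
    matches = sum(tuple(t) == tuple(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Hypothetical rows: (away_win, draw, home_win, over_2.5, under_2.5).
y_true = [
    (False, False, True, True, False),
    (False, True, False, False, True),
    (True, False, False, False, True),
]

# The constant prediction shown above: only under_2.5 is True.
y_pred = [(False, False, False, False, True)] * len(y_true)

score = subset_accuracy(y_true, y_pred)

# Per-label accuracy is higher, since many individual labels do match.
n_labels = len(y_true) * len(y_true[0])
per_label = sum(
    t == p for row_t, row_p in zip(y_true, y_pred) for t, p in zip(row_t, row_p)
) / n_labels
print(score, per_label)
```

For this reason, exact-match accuracy is a harsh metric for a bettor; the backtesting statistics are usually more informative.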
Backtesting the bettor
We can backtest the bettor using the historical data.
Out:
ClassifierBettor(classifier=DummyClassifier())
Various backtesting statistics are calculated.
We can also plot the portfolio value for any testing period from the above backtesting results.
testing_period = 2
bettor.backtest_plot_value_(testing_period)
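The portfolio value can be thought of as the running bankroll after each settled bet. The sketch below is a simplified illustration and not the library's backtesting logic: it assumes a fixed stake per bet, decimal odds, and hypothetical data, and the `portfolio_values` helper is invented for this example.

```python
from itertools import accumulate


def portfolio_values(initial_cash, stake, bets):
    """Return the bankroll after each bet, given (decimal_odds, won) pairs."""
    # A winning bet pays stake * (odds - 1); a losing bet forfeits the stake.
    profits = [stake * (odds - 1) if won else -stake for odds, won in bets]
    # Drop the first element, which is the starting cash before any bet.
    return list(accumulate(profits, initial=initial_cash))[1:]


# Hypothetical sequence of settled bets: (decimal odds, bet won?).
bets = [(2.0, True), (3.5, False), (1.5, True)]
values = portfolio_values(initial_cash=100.0, stake=10.0, bets=bets)
print(values)
```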
Estimating the value bets
We extract the fixtures data to estimate the value bets.
We can estimate the value bets by using the fitted classifier.
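A common definition, assumed here rather than taken from the library's source, is that a bet has value when the predicted probability multiplied by the decimal odds exceeds 1, meaning the expected return per unit stake is positive. The function name and the numbers below are hypothetical:

```python
def value_bets(probabilities, odds):
    """Flag outcomes whose expected return p * odds - 1 is positive."""
    return [p * o > 1 for p, o in zip(probabilities, odds)]


# Hypothetical probabilities and decimal odds for one fixture:
# (away_win, draw, home_win).
probabilities = [0.25, 0.30, 0.45]
odds = [4.5, 3.2, 2.1]

flags = value_bets(probabilities, odds)
print(flags)
```

Under this rule, only outcomes the model rates as more likely than the odds imply are flagged.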
Total running time of the script: (0 minutes 44.955 seconds)