sportsbet.evaluation.ClassifierBettor
class sportsbet.evaluation.ClassifierBettor(classifier)
Bettor based on a Scikit-Learn classifier.
Read more in the user guide.
Parameters: - classifier : classifier object
A scikit-learn classifier object implementing fit, score and predict_proba.
Examples
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.preprocessing import OneHotEncoder
>>> from sklearn.impute import SimpleImputer
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.compose import make_column_transformer
>>> from sportsbet.evaluation import ClassifierBettor
>>> from sportsbet.datasets import FDSoccerDataLoader
>>> # Select only backtesting data for the Italian league and years 2020, 2021
>>> param_grid = {'league': ['Italy'], 'year': [2020, 2021]}
>>> dataloader = FDSoccerDataLoader(param_grid)
>>> # Select the odds of Pinnacle bookmaker
>>> X, Y, O = dataloader.extract_train_data(
...     odds_type='pinnacle',
...     drop_na_thres=1.0
... )
Football-Data.co.uk...
>>> # Create a pipeline to handle categorical features and missing values
>>> clf_pipeline = make_pipeline(
...     make_column_transformer(
...         (OneHotEncoder(handle_unknown='ignore'), ['league', 'home_team', 'away_team']),
...         remainder='passthrough'
...     ),
...     SimpleImputer(),
...     DecisionTreeClassifier(random_state=0)
... )
>>> # Backtest the bettor
>>> bettor = ClassifierBettor(clf_pipeline)
>>> bettor.backtest(X, Y, O)
ClassifierBettor(classifier=...
>>> # Display backtesting results
>>> bettor.backtest_results_
Training Start ... Avg Bet Yield [%] Std Bet Yield [%] ...
backtest(X, Y, O, tscv=None, init_cash=1000, refit=True)
Backtest the bettor.
Parameters: - X : DataFrame object
The input data. Each row of X represents information that is available before the start of a specific match. The rows should be sorted by an index named 'date'.
- Y : DataFrame object
The multi-output targets. Each row of Y represents information that is available after the end of a specific match. The column names follow the convention for the output data Y of the method extract_train_data().
- O : DataFrame object
The odds data. Each row of O represents information that is available after the end of a specific match. The column names follow the convention for the output data O of the method extract_train_data().
- tscv : TimeSeriesSplit object, default=None
Provides train/test indices to split time series data samples that are observed at fixed time intervals, in train/test sets. The default value of the parameter is None.
- init_cash : int, default=1000
The initial cash to use for backtesting.
- refit : bool, default=True
Refit the bettor using the whole input data and multi-output targets.
Returns: - self : bettor object.
The backtested bettor.
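To see what a tscv splitter produces, the sketch below runs scikit-learn's TimeSeriesSplit on a small date-ordered array. The eight placeholder rows and n_splits=3 are assumptions chosen for illustration; how backtest consumes the folds internally is not shown here.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Eight placeholder matches, assumed to be already sorted by date.
X_demo = np.arange(8).reshape(-1, 1)

# Each fold trains on all earlier rows and tests on the next block of rows.
tscv = TimeSeriesSplit(n_splits=3)
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(X_demo)]

for train_idx, test_idx in folds:
    print(f"train={train_idx} test={test_idx}")
```

A splitter configured this way could then be passed as bettor.backtest(X, Y, O, tscv=tscv), so that no fold is evaluated on matches that precede its training data.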
bet(X, O)
Predict the value bets for the provided input data and odds.
Parameters: - X : {array-like, sparse matrix} of shape (n_samples, n_features)
The input data.
- O : {array-like, sparse matrix} of shape (n_samples, n_outputs)
The odds data.
Returns: - B : {array-like, sparse matrix} of shape (n_samples, n_outputs)
The value bets.
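The selection rule behind bet is not spelled out on this page. A common definition, assumed in this sketch, flags a value bet when the predicted probability multiplied by the decimal odds exceeds 1, meaning the bet has a positive expected return. The helper value_bets and the sample numbers are hypothetical and are not part of the library.

```python
def value_bets(probabilities, odds):
    """Flag outcomes where probability * decimal odds > 1 (assumed rule)."""
    return [
        [p * o > 1 for p, o in zip(prob_row, odds_row)]
        for prob_row, odds_row in zip(probabilities, odds)
    ]

# Two matches, three outcomes each (home win, draw, away win); made-up numbers.
probabilities = [[0.50, 0.30, 0.20], [0.25, 0.25, 0.50]]
odds = [[2.20, 3.00, 4.50], [3.50, 3.60, 2.10]]

B = value_bets(probabilities, odds)
print(B)
```

The result has the same (n_samples, n_outputs) shape as the odds, which matches the shape documented for B.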
fit(X, Y)
Fit the bettor to the input data and multi-output targets.
Parameters: - X : {array-like, sparse matrix} of shape (n_samples, n_features)
The input data.
- Y : {array-like, sparse matrix} of shape (n_samples, n_outputs)
The multi-output targets.
Returns: - self : Bettor object
The fitted bettor object.
get_params(deep=True)
Get parameters for this estimator.
Parameters: - deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: - params : dict
Parameter names mapped to their values.
predict(X)
Predict class probabilities for multi-output targets.
Parameters: - X : {array-like, sparse matrix} of shape (n_samples, n_features)
The input data.
Returns: - Y : {array-like, sparse matrix} of shape (n_samples, n_outputs)
The positive class probabilities.
predict_proba(X)
Predict class probabilities for multi-output targets.
Parameters: - X : {array-like, sparse matrix} of shape (n_samples, n_features)
The input data.
Returns: - Y : {array-like, sparse matrix} of shape (n_samples, n_outputs)
The positive class probabilities.
score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters: - X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
Returns: - score : float
Mean accuracy of self.predict(X) with respect to y.
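Subset accuracy counts a sample as correct only when every one of its labels matches. The pure-Python sketch below contrasts it with per-label accuracy; the helpers subset_accuracy and label_accuracy and the label rows are hypothetical illustrations, not library functions.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label row is predicted exactly."""
    matches = [tuple(t) == tuple(p) for t, p in zip(y_true, y_pred)]
    return sum(matches) / len(matches)


def label_accuracy(y_true, y_pred):
    """Fraction of individual labels predicted correctly."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    return sum(t == p for t, p in pairs) / len(pairs)


# Four samples with three binary labels each (made-up values).
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [0, 0, 1], [0, 0, 0]]

print(subset_accuracy(y_true, y_pred))
print(label_accuracy(y_true, y_pred))
```

Two of the four rows contain a single wrong label, so they count as fully wrong under subset accuracy even though most individual labels are correct.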
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters: - **params : dict
Estimator parameters.
Returns: - self : estimator instance
Estimator instance.
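The <component>__<parameter> convention can be sketched with a plain scikit-learn pipeline. The example uses only scikit-learn's own set_params and get_params; the two-step pipeline is an assumption chosen to resemble the classifier from the class example.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# make_pipeline names each step after its lowercased class name.
pipeline = make_pipeline(SimpleImputer(), DecisionTreeClassifier(random_state=0))

# Update parameters of nested components through <component>__<parameter>.
pipeline.set_params(
    simpleimputer__strategy="median",
    decisiontreeclassifier__max_depth=3,
)

params = pipeline.get_params(deep=True)
print(params["simpleimputer__strategy"], params["decisiontreeclassifier__max_depth"])
```

Assuming the bettor exposes its classifier parameter in the same way, the nested names would gain a classifier__ prefix, for example classifier__decisiontreeclassifier__max_depth.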