sportsbet.datasets.DummySoccerDataLoader
class sportsbet.datasets.DummySoccerDataLoader(param_grid=None)
Dataloader for soccer dummy data.
The data are provided only for convenience, since they require no downloading, and to familiarize the user with the methods of the dataloader objects.
Read more in the user guide.
Parameters:
- param_grid : dict of str to sequence, or sequence of such parameter, default=None
  It selects the type of information that the data include. The keys of the dictionaries might be parameters like 'league' or 'division', while the values are sequences of allowed values. It works in a similar way to the param_grid parameter of the ParameterGrid class. The default value None corresponds to all parameters.
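As a rough illustration of how a grid of this shape expands into individual parameter combinations, the sketch below mimics the behaviour of scikit-learn's ParameterGrid using only the standard library. The helper expand_param_grid is hypothetical and not part of sportsbet. How the dataloader matches these combinations against the available data is not shown here.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Expand a dict (or a sequence of dicts) into a list of parameter combinations.

    Hypothetical helper for illustration only; it is not part of sportsbet.
    """
    # Accept a single dict as well as a sequence of dicts
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    combinations = []
    for grid in grids:
        # Sort the keys so that the output order is deterministic
        keys = sorted(grid)
        for values in product(*(grid[key] for key in keys)):
            combinations.append(dict(zip(keys, values)))
    return combinations


params = expand_param_grid({'league': ['Spain'], 'division': [1, 2]})
print(params)
# [{'division': 1, 'league': 'Spain'}, {'division': 2, 'league': 'Spain'}]
```

Each resulting dictionary has the same shape as the entries returned by get_all_params(), for example {'division': 1, 'league': 'France', 'year': 2000}.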
Examples
>>> from sportsbet.datasets import DummySoccerDataLoader
>>> import pandas as pd
>>> # Get all available parameters to select the training data
>>> DummySoccerDataLoader.get_all_params()
[{'division': 1, 'league': 'France', 'year': 2000}, ...
>>> # Select only the training data for the Spanish league
>>> dataloader = DummySoccerDataLoader(param_grid={'league': ['Spain']})
>>> # Get available odds types
>>> dataloader.get_odds_types()
['interwetten', 'williamhill']
>>> # Select the odds of Interwetten bookmaker
>>> X_train, Y_train, O_train = dataloader.extract_train_data(
...     odds_type='interwetten')
>>> # Training input data
>>> print(X_train) # doctest: +NORMALIZE_WHITESPACE
            division league  year ... odds__williamhill__draw__full_time_goals
date
1997-05-04         1  Spain  1997 ...                                      2.5
1999-03-04         2  Spain  1999 ...                                      NaN
>>> # Training output data
>>> print(Y_train)
   output__home_win__full_time_goals ... output__away_win__full_time_goals
0                               True ...                             False
1                              False ...                             False
>>> # Training odds data
>>> print(O_train)
   odds__interwetten__home_win__full_time_goals ...
0                                           1.5 ...
1                                           2.5 ...
>>> # Extract the corresponding fixtures data
>>> X_fix, Y_fix, O_fix = dataloader.extract_fixtures_data()
>>> # Training and fixtures input and odds data have the same column names
>>> pd.testing.assert_index_equal(X_train.columns, X_fix.columns)
>>> pd.testing.assert_index_equal(O_train.columns, O_fix.columns)
>>> # Fixtures data always have no output
>>> Y_fix is None
True
extract_fixtures_data()
Extract the fixtures data.
Read more in the user guide.
It returns fixtures data that can be used to make predictions for upcoming matches based on a betting strategy.
Before calling the extract_fixtures_data() method for the first time, the extract_train_data() method should be called, in order to match the columns of the input, output and odds data.
The data contain information about the matches known before the start of the match, i.e. the input data X and the odds data O. The multi-output targets Y are always equal to None and are only included for consistency with the method extract_train_data().
The param_grid parameter of the initialization method __init__() has no effect on the fixtures data.
Returns:
- (X, None, O) : tuple of DataFrame objects
  Each of the components represents the fixtures input data X, the multi-output targets Y equal to None and the corresponding odds O, respectively.
extract_train_data(drop_na_thres=0.0, odds_type=None)
Extract the training data.
Read more in the user guide.
It returns historical data that can be used to create a betting strategy based on heuristics or machine learning models.
The data contain information about the matches that belongs to two categories. The first category includes any information known before the start of the match, i.e. the training data X and the odds data O. The second category includes the outcomes of matches, i.e. the multi-output targets Y.
The method selects only the data allowed by the param_grid parameter of the initialization method __init__(). Additionally, columns with missing values are dropped through the drop_na_thres parameter, while the type of odds returned is defined by the odds_type parameter.
Parameters:
- drop_na_thres : float, default=0.0
  The threshold that specifies the input columns to drop. It is a float in the [0.0, 1.0] range. Higher values result in dropping more columns. The default value drop_na_thres=0.0 keeps all columns while the maximum value drop_na_thres=1.0 keeps only columns with no missing values.
- odds_type : str, default=None
  The selected odds type. It should be one of the available odds column prefixes returned by the method get_odds_types(). If odds_type=None then no odds are returned.
Returns:
- (X, Y, O) : tuple of DataFrame objects
  Each of the components represents the training input data X, the multi-output targets Y and the corresponding odds O, respectively.
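The effect of drop_na_thres can be pictured with a small standard-library sketch. It assumes that a column is kept when its share of non-missing values is at least the threshold. That rule agrees with the two documented endpoints (0.0 keeps every column, 1.0 keeps only complete columns), but the exact rule applied between them is an assumption, and keep_columns is a hypothetical helper rather than part of sportsbet.

```python
def keep_columns(columns, drop_na_thres=0.0):
    """Return the names of the columns whose share of non-missing values
    is at least drop_na_thres (assumed rule, not taken from sportsbet)."""
    if not 0.0 <= drop_na_thres <= 1.0:
        raise ValueError('drop_na_thres should be in the [0.0, 1.0] range.')
    kept = []
    for name, values in columns.items():
        # None marks a missing value in this toy representation
        non_missing = sum(value is not None for value in values)
        if non_missing / len(values) >= drop_na_thres:
            kept.append(name)
    return kept


# Toy input data: each key is a column name, each list holds the column values
columns = {
    'division': [1, 2, 1, 2],
    'odds__williamhill__draw__full_time_goals': [2.5, None, 3.0, 3.2],
    'odds__interwetten__draw__full_time_goals': [None, None, None, 3.1],
}
print(keep_columns(columns, drop_na_thres=0.0))  # all three columns
print(keep_columns(columns, drop_na_thres=0.5))  # drops the interwetten column
print(keep_columns(columns, drop_na_thres=1.0))  # only 'division'
```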
classmethod get_all_params()
Get the available parameters.
It can be used to get the allowed names and values for the param_grid parameter of the dataloader object.
Returns:
- param_grid : list
  A list of all allowed params and values.
get_odds_types()
Get the available odds types.
It can be used to get the allowed odds types of the dataloader's method extract_train_data().
Returns:
- odds_types : list of str
  A list of available odds types.
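The odds column names shown in the class example follow the pattern odds__<type>__<outcome>__<target>, so the odds types can be read as the second component of each name. The sketch below illustrates that naming convention on a plain list of column names. The function odds_types_from_columns is hypothetical and is not how sportsbet necessarily derives the list.

```python
def odds_types_from_columns(columns):
    """Collect the odds type part of column names of the form 'odds__<type>__...'.

    Hypothetical helper for illustration only; it is not part of sportsbet.
    """
    odds_types = set()
    for column in columns:
        parts = column.split('__')
        # Only columns that start with the 'odds' prefix carry an odds type
        if len(parts) > 1 and parts[0] == 'odds':
            odds_types.add(parts[1])
    return sorted(odds_types)


column_names = [
    'division',
    'league',
    'odds__interwetten__home_win__full_time_goals',
    'odds__williamhill__draw__full_time_goals',
    'output__home_win__full_time_goals',
]
odds_types = odds_types_from_columns(column_names)
print(odds_types)
# ['interwetten', 'williamhill']
```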
save(path)
Save the dataloader object.
Parameters:
- path : str
  The path to save the object.
Returns:
- self : object
  The dataloader object.