sportsbet.datasets.FDSoccerDataLoader
class sportsbet.datasets.FDSoccerDataLoader(param_grid=None)
Dataloader for Football-Data.co.uk soccer data.
It downloads historical and fixtures data from Football-Data.co.uk.
Read more in the user guide.
Parameters:
- param_grid : dict of str to sequence, or sequence of such parameters, default=None
It selects the type of information that the data include. The keys of the dictionaries are parameter names such as 'league' or 'division', while the values are sequences of allowed values. It works in a similar way to the param_grid parameter of scikit-learn's ParameterGrid class. The default value None corresponds to all parameters.
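To make the grid semantics concrete, the following pure-Python sketch expands a param_grid the way ParameterGrid does: each dictionary becomes the Cartesian product of its value sequences, and a sequence of dictionaries becomes the union of their expansions. The helper expand_param_grid is hypothetical and not part of sportsbet; it only illustrates the selection logic.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Expand a dict (or sequence of dicts) of value sequences into single combinations.

    Hypothetical helper mimicking ParameterGrid semantics; not part of sportsbet.
    """
    # A single dict is treated as a one-element sequence of grids.
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    combinations = []
    for grid in grids:
        # Sort keys so the output order is deterministic.
        keys = sorted(grid)
        for values in product(*(grid[key] for key in keys)):
            combinations.append(dict(zip(keys, values)))
    return combinations


print(expand_param_grid({'league': ['England'], 'year': [2020, 2021]}))
# [{'league': 'England', 'year': 2020}, {'league': 'England', 'year': 2021}]
```

In the real dataloader, param_grid=None selects all parameters; the sketch does not model that default.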
Examples
>>> from sportsbet.datasets import FDSoccerDataLoader
>>> import pandas as pd
>>> # Get all available parameters to select the training data
>>> FDSoccerDataLoader.get_all_params()
[{'division': [1], 'league': ['Argentina'], 'year': [2013]}, ...
>>> # Select only the training data for the English league and 2020, 2021 years
>>> dataloader = FDSoccerDataLoader(
...     param_grid={'league': ['England'], 'year': [2020, 2021]})
>>> # Get available odds types
>>> dataloader.get_odds_types()
[..., 'market_average', ...]
>>> # Select the market average odds and drop columns with missing values
>>> X_train, Y_train, O_train = dataloader.extract_train_data(
...     odds_type='market_average', drop_na_thres=1.0)
Football-Data.co.uk...
>>> # Odds data include the selected market average odds
>>> O_train.columns
Index(['market_average__away_win__odds', 'market_average__draw__odds', ...
>>> # Extract the corresponding fixtures data
>>> X_fix, Y_fix, O_fix = dataloader.extract_fixtures_data()
>>> # Training and fixtures input and odds data have the same column names
>>> pd.testing.assert_index_equal(X_train.columns, X_fix.columns)
>>> pd.testing.assert_index_equal(O_train.columns, O_fix.columns)
>>> # Fixtures data never include an output
>>> Y_fix is None
True
extract_fixtures_data()
Extract the fixtures data.
Read more in the user guide.
It returns fixtures data that can be used to make predictions for upcoming matches based on a betting strategy.
Before calling the extract_fixtures_data() method for the first time, the extract_train_data() method should be called, in order to match the columns of the input, output and odds data. The data contain information about the matches known before the start of the match, i.e. the input data X and the odds data O. The multi-output targets Y are always equal to None and are included only for consistency with the method extract_train_data(). The param_grid parameter of the initialization method __init__() has no effect on the fixtures data.
Returns:
- (X, None, O) : tuple of DataFrame objects
The components are the fixtures input data X, the multi-output targets Y, which are None, and the corresponding odds O, respectively.
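The requirement to call extract_train_data() first exists so that the fixtures columns can follow the training columns. A minimal pandas sketch of that idea, assuming alignment amounts to reindexing the fixtures frame on the training columns (the toy frames and column names are made up, and the library's internal mechanism is not described on this page):

```python
import pandas as pd

# Toy training input data: the column set fixed by the training extraction.
X_train = pd.DataFrame(
    {'home_team': ['A', 'B'], 'away_team': ['C', 'D'], 'home_elo': [1500, 1480]}
)

# Toy fixtures input data: an extra column, a missing one and a different order.
X_fix = pd.DataFrame({'away_team': ['E'], 'home_team': ['F'], 'referee': ['X']})

# Reindex on the training columns: extra columns are dropped, missing ones become NaN.
X_fix_aligned = X_fix.reindex(columns=X_train.columns)

# Same check as in the doctest above, now on the toy frames.
pd.testing.assert_index_equal(X_train.columns, X_fix_aligned.columns)
print(X_fix_aligned)
```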
extract_train_data(drop_na_thres=0.0, odds_type=None)
Extract the training data.
Read more in the user guide.
It returns historical data that can be used to create a betting strategy based on heuristics or machine learning models.
The data contain two categories of information about the matches. The first category includes any information known before the start of the match, i.e. the training data X and the odds data O. The second category includes the outcomes of the matches, i.e. the multi-output targets Y. The method selects only the data allowed by the param_grid parameter of the initialization method __init__(). Additionally, columns with missing values are dropped through the drop_na_thres parameter, while the type of odds returned is defined by the odds_type parameter.
Parameters:
- drop_na_thres : float, default=0.0
The threshold that specifies the input columns to drop. It is a float in the [0.0, 1.0] range. Higher values result in dropping more columns. The default value drop_na_thres=0.0 keeps all columns, while the maximum value drop_na_thres=1.0 keeps only columns without missing values.
- odds_type : str, default=None
The selected odds type. It should be one of the available odds column prefixes returned by the method get_odds_types(). If odds_type=None, no odds are returned.
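One reading of drop_na_thres that is consistent with the two endpoints above is: keep an input column when its fraction of non-missing values is at least drop_na_thres. The sketch below implements that reading with pandas; the exact rule used by the library is an assumption here, and drop_columns_by_na_threshold is a hypothetical name.

```python
import numpy as np
import pandas as pd


def drop_columns_by_na_threshold(X, drop_na_thres=0.0):
    """Keep columns whose share of non-missing values is >= drop_na_thres.

    Hypothetical helper; the rule is an assumed reading of drop_na_thres.
    """
    if not 0.0 <= drop_na_thres <= 1.0:
        raise ValueError('drop_na_thres should be in the [0.0, 1.0] range.')
    # Fraction of non-missing values per column (1.0 means no missing values).
    non_missing_ratio = X.notna().mean()
    return X.loc[:, non_missing_ratio >= drop_na_thres]


X = pd.DataFrame({
    'complete': [1.0, 2.0, 3.0, 4.0],
    'mostly_complete': [1.0, np.nan, 3.0, 4.0],
    'empty': [np.nan] * 4,
})
for thres in (0.0, 0.5, 1.0):
    print(thres, list(drop_columns_by_na_threshold(X, thres).columns))
# 0.0 ['complete', 'mostly_complete', 'empty']
# 0.5 ['complete', 'mostly_complete']
# 1.0 ['complete']
```

Under this reading, 0.0 keeps even an all-missing column and 1.0 keeps only fully populated ones, matching the documented endpoints.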
Returns:
- (X, Y, O) : tuple of DataFrame objects
The components are the training input data X, the multi-output targets Y and the corresponding odds O, respectively.
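The odds column names in the example above look like <odds_type>__<outcome>__odds (for instance market_average__away_win__odds). Assuming that pattern holds, selecting odds through odds_type amounts to a prefix match. The helper select_odds_columns and the toy column list (including the market_maximum odds type) are hypothetical.

```python
def select_odds_columns(columns, odds_type=None):
    """Return the odds columns that belong to odds_type (hypothetical helper).

    Assumes column names shaped like '<odds_type>__<outcome>__odds'.
    """
    if odds_type is None:
        # Mirrors the documented behaviour: no odds type, no odds columns.
        return []
    prefix = f'{odds_type}__'
    return [col for col in columns if col.startswith(prefix) and col.endswith('__odds')]


columns = [
    'market_average__home_win__odds',
    'market_average__draw__odds',
    'market_average__away_win__odds',
    'market_maximum__home_win__odds',  # hypothetical second odds type
]
print(select_odds_columns(columns, 'market_average'))
print(select_odds_columns(columns))
```

The double-underscore separator keeps the prefix match unambiguous even though odds types and outcomes themselves contain single underscores.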
classmethod get_all_params()
Get the available parameters.
It can be used to get the allowed names and values for the param_grid parameter of the dataloader object.
Returns:
- param_grid : list
A list of all allowed parameters and their values.
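Because get_all_params() returns the allowed combinations in the same dict-of-sequences form as param_grid (see the Argentina entry in the example above), a requested grid could be checked against it before building a dataloader. The sketch below does this in plain Python; is_param_grid_allowed, expand_param_grid and the toy all_params list are hypothetical, and treating omitted keys as wildcards is an assumption.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Expand a dict, or a sequence of dicts, of value sequences into single combinations."""
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    combinations = []
    for grid in grids:
        keys = sorted(grid)
        for values in product(*(grid[key] for key in keys)):
            combinations.append(dict(zip(keys, values)))
    return combinations


def is_param_grid_allowed(param_grid, all_params):
    """Return True if each requested combination matches at least one allowed combination.

    Keys omitted from param_grid are treated as wildcards (an assumption of this sketch).
    """
    allowed = expand_param_grid(all_params)
    for requested in expand_param_grid(param_grid):
        if not any(
            all(key in combo and combo[key] == value for key, value in requested.items())
            for combo in allowed
        ):
            return False
    return True


# Toy output in the shape of get_all_params(); the England entry is made up.
all_params = [
    {'division': [1], 'league': ['Argentina'], 'year': [2013]},
    {'division': [1, 2], 'league': ['England'], 'year': [2020, 2021]},
]
print(is_param_grid_allowed({'league': ['England'], 'year': [2020, 2021]}, all_params))  # True
print(is_param_grid_allowed({'league': ['England'], 'year': [2013]}, all_params))  # False
```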
classmethod get_odds_types()
Get the available odds types.
It can be used to get the allowed values of the odds_type parameter of the dataloader's method extract_train_data().
Returns:
- odds_types : list of str
A list of available odds types.
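If odds columns follow the <odds_type>__<outcome>__odds naming seen in the example (an assumption drawn from names such as market_average__draw__odds), the available odds types are the distinct prefixes before the first double underscore. The helper below is hypothetical and only illustrates that relationship between get_odds_types() and the odds column names.

```python
def odds_types_from_columns(columns):
    """Collect distinct odds-type prefixes from '<odds_type>__<outcome>__odds' names.

    Hypothetical helper; the naming pattern is an assumption.
    """
    odds_types = set()
    for col in columns:
        parts = col.split('__')
        # Only names with three parts ending in 'odds' are treated as odds columns.
        if len(parts) == 3 and parts[-1] == 'odds':
            odds_types.add(parts[0])
    return sorted(odds_types)


columns = [
    'market_average__home_win__odds',
    'market_average__away_win__odds',
    'market_maximum__draw__odds',  # hypothetical second odds type
    'home_team',  # input column, ignored
]
print(odds_types_from_columns(columns))  # ['market_average', 'market_maximum']
```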
save(path)
Save the dataloader object.
Parameters:
- path : str
The path to save the object.
Returns:
- self : object
The dataloader object.
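This page does not say how save() serializes the object. Assuming a pickle-based format, a save and reload round trip could look like the sketch below; ToyDataLoader and load_dataloader are hypothetical stand-ins, not the sportsbet classes, and the real file format may differ.

```python
import os
import pickle
import tempfile


class ToyDataLoader:
    """Hypothetical stand-in with a save() method shaped like the documented one."""

    def __init__(self, param_grid=None):
        self.param_grid = param_grid

    def save(self, path):
        # Serialize the whole object with pickle (assumed format), then return self.
        with open(path, 'wb') as file:
            pickle.dump(self, file)
        return self


def load_dataloader(path):
    """Hypothetical loader that reverses ToyDataLoader.save()."""
    with open(path, 'rb') as file:
        return pickle.load(file)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'dataloader.pkl')
    dataloader = ToyDataLoader(param_grid={'league': ['England'], 'year': [2020, 2021]})
    returned = dataloader.save(path)
    restored = load_dataloader(path)
    print(restored.param_grid)
```

Returning self, as the documented method does, allows chaining a save call onto construction.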