Introduction
The goal of the project is to provide simple tools to extract sports betting data and create backtesting strategies. It integrates with other well-known Python libraries like pandas, scikit-learn and vectorbt.
Data extraction
Data extraction is based on the dataloader objects. Various dataloaders
are currently available, and sports-betting aims to include more in the
future. The data are extracted in a format suitable for modelling. In
particular, they are always returned as a data tuple (X, Y, O).
Input data
The input data X are the first component of the data tuple. X is a
DataFrame that contains information known before the start of the
betting event, such as the date, the names of the opponents and indices
related to the strength of the opponents. It may also include odds data.
The index of X is a DatetimeIndex and the data are always sorted by date.
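The properties of X described above can be illustrated with a minimal sketch. The frame below is built by hand with pandas. Its column names and values are hypothetical and are not the output of any dataloader.

```python
import pandas as pd

# Hypothetical input data: column names and values are made up for illustration.
X = pd.DataFrame(
    {
        "home_team": ["Milan", "Roma", "Napoli"],
        "away_team": ["Inter", "Lazio", "Juventus"],
        "home_rating": [1.8, 1.5, 1.7],  # hypothetical strength index
    },
    index=pd.DatetimeIndex(["2021-03-07", "2021-01-10", "2021-02-14"], name="date"),
)

# Sort by date, so the rows follow the order in which the events took place.
X = X.sort_index()

# The two properties stated in the text.
is_datetime_index = isinstance(X.index, pd.DatetimeIndex)
is_sorted_by_date = X.index.is_monotonic_increasing
```

Keeping the rows in chronological order matters for backtesting, since a model should only be fitted on events that precede the ones it is evaluated on.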
Multi-output targets
The multi-output targets Y are the second component of the data tuple.
Y is a DataFrame that contains information known after the end of the
betting event, such as goals or points scored and fouls committed.
Column names follow a naming convention of the form
'betting_market__key'. Some examples are 'home_win__full_time_goals',
'over_2.5__full_time_goals' and 'draw__half_time_goals'. More generally,
the 'betting_market' prefix is any supported betting market, like home
win, over 2.5, draw and home points, while the 'key' postfix is the
outcome that was used to extract the targets, like 'full_time_goals',
'half_time_goals' and 'full_time_points'.
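As a small sketch of this naming convention, a target column name can be split on the double underscore into its betting market and key. The helper parse_target_column is hypothetical and is not part of the library.

```python
from typing import Tuple


def parse_target_column(column: str) -> Tuple[str, str]:
    """Split a 'betting_market__key' column name into (betting_market, key)."""
    parts = column.split("__")
    if len(parts) != 2:
        raise ValueError(f"Unexpected target column name: {column!r}")
    betting_market, key = parts
    return betting_market, key


# Column names taken from the examples in the text.
target_columns = [
    "home_win__full_time_goals",
    "over_2.5__full_time_goals",
    "draw__half_time_goals",
]
parsed_targets = [parse_target_column(column) for column in target_columns]
```

Single underscores stay inside each part, so 'home_win' and 'full_time_goals' are kept whole.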
Odds data
The odds data O are the third component of the data tuple. O is a
DataFrame that contains information related to the odds for various
betting markets. Column names follow a naming convention of the form
'bookmaker__betting_market__odds'. Some examples are
'pinnacle__home_win__odds', 'market_average__over_2.5_goals__odds' and
'bet365__over_2.5__odds'. More generally, the 'bookmaker' prefix is any
supported bookmaker or aggregation of bookmakers, like Pinnacle, Bet365
and market maximum, the 'betting_market' infix is similar to the one
appearing in the columns of Y, and the 'odds' postfix is always present
to denote an odds column.
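The odds naming convention can be sketched in the same way. The helper parse_odds_column below is hypothetical and not part of the library. It splits a column name into bookmaker and betting market and rejects names that do not end with the 'odds' postfix.

```python
from typing import Tuple


def parse_odds_column(column: str) -> Tuple[str, str]:
    """Split a 'bookmaker__betting_market__odds' name into (bookmaker, betting_market)."""
    parts = column.split("__")
    if len(parts) != 3 or parts[2] != "odds":
        raise ValueError(f"Unexpected odds column name: {column!r}")
    bookmaker, betting_market, _ = parts
    return bookmaker, betting_market


# Column names that follow the convention described in the text.
odds_columns = ["pinnacle__home_win__odds", "pinnacle__draw__odds"]
parsed_odds = [parse_odds_column(column) for column in odds_columns]
```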
Data matching
An effort is made to extract data suitable for modelling. Odds data are
not always available, but when they are extracted, the columns of Y and
O always match, i.e. Y and O have the same shape and the
'betting_market__key' column of Y is at the same position as the
'bookmaker__betting_market__odds' column of O. For example, if Y has the
columns ['home_win__full_time_goals', 'draw__full_time_goals'] then O
may have the columns ['pinnacle__home_win__odds', 'pinnacle__draw__odds'].
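The matching rule can be expressed as a short check. This is a sketch under the assumption that the betting market part of each name is identical in Y and O. The function columns_match and the data are hypothetical.

```python
import pandas as pd


def columns_match(Y: pd.DataFrame, O: pd.DataFrame) -> bool:
    """Check that Y and O have the same shape and the same betting market per position."""
    if Y.shape != O.shape:
        return False
    # 'betting_market__key' -> betting market is the first part.
    target_markets = [column.split("__")[0] for column in Y.columns]
    # 'bookmaker__betting_market__odds' -> betting market is the second part.
    odds_markets = [column.split("__")[1] for column in O.columns]
    return target_markets == odds_markets


# Hypothetical targets and odds for two events.
index = pd.DatetimeIndex(["2021-01-10", "2021-02-14"], name="date")
Y = pd.DataFrame(
    {
        "home_win__full_time_goals": [True, False],
        "draw__full_time_goals": [False, True],
    },
    index=index,
)
O = pd.DataFrame(
    {
        "pinnacle__home_win__odds": [1.9, 2.4],
        "pinnacle__draw__odds": [3.4, 3.1],
    },
    index=index,
)

matched = columns_match(Y, O)
# Reversing the column order of O breaks the positional correspondence.
mismatched = columns_match(Y, O[O.columns[::-1]])
```

Positional matching means that the value in row i and column j of O is the odds offered for the outcome recorded in row i and column j of Y.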
Evaluation
The evaluation of models is based on the bettor objects. All bettors are
classifiers, therefore they provide methods to fit the training data and
evaluate their performance on test data. Specifically, bettors implement
the fit() method that fits the model to the input data X and the
multi-output targets Y. The model can be a machine learning classifier,
but any other model is also supported. The bettors also provide the
predict() and predict_proba() methods that predict class labels and
positive class probabilities, respectively. Additionally, the bettors
provide the backtest() method that calculates various backtesting
statistics, as well as the bet() method that returns the value bets.
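The text does not state how value bets are identified. A common definition, assumed here, is that a bet has value when the predicted probability multiplied by the decimal odds exceeds one. The sketch below applies that rule with NumPy to hypothetical probabilities and odds. It does not call the library and may differ from what bet() actually does.

```python
import numpy as np

# Hypothetical positive class probabilities: rows are events, columns are betting markets.
proba = np.array(
    [
        [0.55, 0.25],
        [0.40, 0.30],
    ]
)

# Hypothetical decimal odds with the same shape and column order.
odds = np.array(
    [
        [2.0, 3.5],
        [2.2, 3.6],
    ]
)

# Expected profit per unit stake under the predicted probabilities.
expected_profit = proba * odds - 1

# Assumed rule: a bet has value when its expected profit is positive.
value_bets = expected_profit > 0
```

The rule relies on the positional correspondence between the probability columns and the odds columns, which is why matching columns in Y and O are needed.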