Soccer value bets

This example illustrates how to estimate value bets for upcoming soccer fixtures by training a multi-output machine learning classifier on historical match data.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

import numpy as np
import pandas as pd
from sportsbet.datasets import SoccerDataLoader
from sklearn.neighbors import KNeighborsClassifier

Extracting the training data

We extract the training data for the Spanish league (divisions 1 and 2). Setting drop_na_thres=1.0 keeps only the input columns without missing values, and odds_type='market_average' selects the market average odds.

dataloader = SoccerDataLoader(param_grid={'league': ['Spain']})
X_train, Y_train, _ = dataloader.extract_train_data(
    drop_na_thres=1.0, odds_type='market_average'
)

Out:

Football-Data.co.uk: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
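The minimal pandas sketch below makes the effect of drop_na_thres concrete. It assumes that the threshold is the minimum fraction of non-missing values a column needs in order to be kept, so that 1.0 keeps only complete columns and 0.0 keeps every column. drop_na_columns and hypothetical_stat are hypothetical names, the data is a toy example, and the library's internal implementation may differ.

```python
import numpy as np
import pandas as pd

# Toy input data (hypothetical values): one column contains a missing value.
X = pd.DataFrame(
    {
        "home_team": ["Malaga", "La Coruna", "Barcelona"],
        "home_team_soccer_power_index": [72.57, 66.52, 96.35],
        "hypothetical_stat": [1.0, np.nan, 3.0],
    }
)


def drop_na_columns(X, thres):
    """Hypothetical helper: keep columns whose non-missing fraction is >= thres."""
    # dropna(thresh=k) keeps a column only if it has at least k non-missing values.
    min_non_na = int(np.ceil(thres * len(X)))
    return X.dropna(axis=1, thresh=min_non_na)


print(drop_na_columns(X, 1.0).columns.tolist())
# ['home_team', 'home_team_soccer_power_index']
print(drop_na_columns(X, 0.0).columns.tolist())
# ['home_team', 'home_team_soccer_power_index', 'hypothetical_stat']
```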

The input data:

X_train
home_team away_team league division year home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score match_quality
date
2016-08-19 Malaga Osasuna Spain 1 2017 72.57 56.93 0.5475 0.1897 0.2628 1.56 0.70 63.805561
2016-08-19 La Coruna Eibar Spain 1 2017 66.52 62.29 0.5003 0.2260 0.2738 1.47 0.79 64.335545
2016-08-20 Barcelona Betis Spain 1 2017 96.35 69.95 0.9591 0.0071 0.0337 3.40 0.42 81.054510
2016-08-20 Sevilla Espanol Spain 1 2017 78.76 68.75 0.5952 0.1760 0.2288 1.89 0.88 73.415362
2016-08-20 Granada Villarreal Spain 1 2017 55.69 76.79 0.3194 0.3917 0.2889 1.07 1.19 64.559709
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-12-19 Fuenlabrada Oviedo Spain 2 2022 23.93 34.47 0.3181 0.3483 0.3336 0.97 1.03 28.248873
2021-12-19 Ponferradina Amorebieta Spain 2 2022 32.85 25.44 0.4926 0.2281 0.2793 1.53 0.95 28.674009
2021-12-31 Burgos Amorebieta Spain 2 2022 25.64 25.98 0.3896 0.3090 0.3014 1.26 1.09 25.808880
2021-12-31 Oviedo Ponferradina Spain 2 2022 34.04 32.07 0.4257 0.2608 0.3135 1.24 0.91 33.025648
2021-12-31 Eibar Sociedad B Spain 2 2022 35.73 25.54 0.5271 0.2011 0.2718 1.60 0.88 29.787635

2724 rows × 13 columns



The targets:

Y_train
away_win__full_time_goals draw__full_time_goals home_win__full_time_goals over_2.5__full_time_goals under_2.5__full_time_goals
0 False True False False True
1 False False True True False
2 False False True True False
3 False False True True False
4 False True False False True
... ... ... ... ... ...
2719 False True False False True
2720 False True False False True
2721 False True False True False
2722 False False True False True
2723 False False True True False

2724 rows × 5 columns
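Each target is a boolean indicator for one betting market. A match ends in exactly one of away win, draw or home win, and a whole number of goals is never exactly 2.5, so every row should contain exactly one True among the three result columns and exactly one True among the two goal columns. A quick sanity check one might run is sketched below on the first five rows of Y_train as displayed above; on the full data the same two lines would apply unchanged.

```python
import pandas as pd

# First five rows of Y_train, copied from the table above.
Y = pd.DataFrame(
    [
        [False, True, False, False, True],
        [False, False, True, True, False],
        [False, False, True, True, False],
        [False, False, True, True, False],
        [False, True, False, False, True],
    ],
    columns=[
        "away_win__full_time_goals",
        "draw__full_time_goals",
        "home_win__full_time_goals",
        "over_2.5__full_time_goals",
        "under_2.5__full_time_goals",
    ],
)

result_cols = ["away_win__full_time_goals", "draw__full_time_goals", "home_win__full_time_goals"]
goals_cols = ["over_2.5__full_time_goals", "under_2.5__full_time_goals"]

# Exactly one match result per row (summing booleans counts the True values).
one_result = Y[result_cols].sum(axis=1).eq(1).all()
# Over and under 2.5 goals are complementary, so exactly one is True per row.
one_goals = Y[goals_cols].sum(axis=1).eq(1).all()
print(one_result, one_goals)  # True True
```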



Training a multi-output classifier

We fit a KNeighborsClassifier on the numerical input features only, using the five extracted targets as its outputs.

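# Keep only the numerical input columns (integers and floats). select_dtypes
# avoids comparing against np.dtype(int), whose size can differ across platforms.
num_features = X_train.select_dtypes(include='number').columns.tolist()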
clf = KNeighborsClassifier()
clf.fit(X_train[num_features], Y_train)

Out:

KNeighborsClassifier()
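KNeighborsClassifier compares samples by Euclidean distance (its default metric is Minkowski with p=2), so features with large ranges, such as the soccer power indices, dominate features on a 0 to 1 scale, such as the win probabilities. One possible variation, not part of the example above, is to standardize the features first. The sketch below does this on hypothetical synthetic data and also shows how predict_proba behaves for a multi-output target: it returns a list with one (n_samples, n_classes) array per output, which is why the value-bet step takes column 1 of each array.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical synthetic data: one feature on a 20-100 scale (like a power index)
# and one on a 0-1 scale (like a win probability).
X = np.column_stack([rng.uniform(20, 100, 200), rng.uniform(0, 1, 200)])
# Two boolean outputs, mimicking the multi-output shape of Y_train.
Y = np.column_stack([X[:, 1] > 0.5, X[:, 0] > 60])

# Standardize each feature to zero mean and unit variance before computing distances.
clf = make_pipeline(StandardScaler(), KNeighborsClassifier())
clf.fit(X, Y)

# For a 2-D target, predict_proba returns a list with one
# (n_samples, n_classes) array per output.
proba = clf.predict_proba(X[:3])
print(len(proba), proba[0].shape)  # 2 (3, 2)

# Classes are sorted as [False, True], so column 1 holds P(True).
p_true = np.column_stack([p[:, 1] for p in proba])
print(p_true.shape)  # (3, 2)
```

Whether scaling improves the value-bet estimates on the real data is an empirical question; the sketch only shows how such a pipeline plugs into the same fit and predict_proba calls.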

Extracting the fixtures data

We extract the fixtures data. The columns by default match the columns of the training data.

X_fix, _, Odds_fix = dataloader.extract_fixtures_data()

The input data:

X_fix
home_team away_team league division year home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score match_quality
date
2022-01-19 Volos NFC Apollon Greece 1 2022 28.01 17.98 0.5408 0.2233 0.2359 1.86 1.11 21.901274
2022-01-19 Toulouse Nancy France 2 2022 49.42 21.70 0.7299 0.0985 0.1716 2.48 0.80 30.157874
2022-01-19 Amiens Ajaccio France 2 2022 35.07 33.24 0.2896 0.3429 0.3674 0.78 0.87 34.130487
2022-01-19 Montpellier Troyes France 1 2022 58.58 50.44 0.4956 0.2383 0.2661 1.53 0.97 54.206113
2022-01-19 Lille Lorient France 1 2022 71.29 49.23 0.6553 0.1131 0.2316 1.74 0.56 58.241067
2022-01-19 Clermont Strasbourg France 1 2022 50.37 68.50 0.2471 0.5034 0.2496 1.11 1.68 58.052410
2022-01-19 Celta Osasuna Spain 1 2022 72.96 67.38 0.4928 0.2258 0.2814 1.46 0.89 70.059068
2022-01-19 Genk Mechelen Belgium 1 2022 57.62 45.28 0.5788 0.2222 0.1990 2.33 1.40 50.710080
2022-01-19 Trabzonspor Giresunspor Turkey 1 2022 58.51 32.62 0.6752 0.0899 0.2349 1.68 0.44 41.887330
2022-01-19 Fenerbahce Altay Turkey 1 2022 50.82 25.50 0.6821 0.1183 0.1996 2.09 0.73 33.959906
2022-01-19 Rizespor Antalyaspor Turkey 1 2022 26.65 30.45 0.3710 0.3574 0.2716 1.33 1.30 28.423555
2022-01-19 Kayserispor Buyuksehyr Turkey 1 2022 36.49 48.19 0.2791 0.4473 0.2736 1.08 1.44 41.531722
2022-01-19 Lugo Almeria Spain 2 2022 29.53 43.44 0.2335 0.4879 0.2786 0.98 1.54 35.159194
2022-01-19 Goztep Sivasspor Turkey 1 2022 34.33 47.22 0.3210 0.4018 0.2773 1.17 1.35 39.756287
2022-01-19 Valencia Sevilla Spain 1 2022 68.99 76.95 0.3376 0.3768 0.2856 1.20 1.29 72.752919
2022-01-20 Alanyaspor Hatayspor Turkey 1 2022 40.25 41.04 0.4123 0.3404 0.2473 1.63 1.46 40.641161
2022-01-20 Getafe Granada Spain 1 2022 68.92 62.73 0.4738 0.2103 0.3159 1.23 0.71 65.679477
2022-01-20 Galatasaray Kasimpasa Turkey 1 2022 46.80 35.85 0.5029 0.2358 0.2613 1.58 1.00 40.599637


The market average odds:

Odds_fix
market_average__away_win__odds market_average__draw__odds market_average__home_win__odds market_average__over_2.5__odds market_average__under_2.5__odds
0 4.52 3.64 1.74 1.97 1.81
1 6.39 4.19 1.50 1.75 2.02
2 2.42 3.00 3.14 2.25 1.62
3 4.43 3.96 1.75 1.81 2.02
4 6.33 4.33 1.51 1.83 1.98
5 2.27 3.46 3.12 1.93 1.87
6 4.45 3.41 1.89 2.22 1.67
7 4.63 4.16 1.65 1.48 2.59
8 5.30 3.93 1.62 1.91 1.89
9 7.55 4.85 1.38 1.60 2.32
10 2.65 3.37 2.58 1.95 1.85
11 2.19 3.25 3.37 1.98 1.82
12 2.18 3.16 3.39 2.15 1.68
13 2.93 3.23 2.43 2.10 1.73
14 2.38 3.17 3.23 2.31 1.62
15 2.85 3.47 2.36 1.74 2.08
16 3.90 3.10 2.13 2.59 1.49
17 4.66 3.80 1.71 1.83 1.96
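The fixtures above include Greek, French, Belgian and Turkish matches, while the classifier was fitted on Spanish matches only. A minimal sketch of restricting both the fixtures and the odds to the training league before predicting is shown below. It assumes that rows of X_fix and Odds_fix correspond by position, which is the same alignment the value-bet code relies on when it concatenates them. The rows are copied from the two tables above, with only a few columns kept for brevity.

```python
import pandas as pd

# A few fixtures copied from the X_fix table above (other columns omitted).
X_fix = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-01-19"] * 4),
        "home_team": ["Volos NFC", "Toulouse", "Celta", "Valencia"],
        "away_team": ["Apollon", "Nancy", "Osasuna", "Sevilla"],
        "league": ["Greece", "France", "Spain", "Spain"],
    }
).set_index("date")
# Matching home-win odds from the Odds_fix table, in the same row order.
Odds_fix = pd.DataFrame({"market_average__home_win__odds": [1.74, 1.50, 1.89, 3.23]})

# Boolean mask of fixtures from the leagues the classifier was trained on.
mask = X_fix["league"].isin(["Spain"]).to_numpy()
X_fix_spain = X_fix.loc[mask]
# Reset the index so the odds line up with X_fix_spain.reset_index() later.
Odds_fix_spain = Odds_fix.loc[mask].reset_index(drop=True)

print(X_fix_spain["home_team"].tolist())  # ['Celta', 'Valencia']
print(Odds_fix_spain["market_average__home_win__odds"].tolist())  # [1.89, 3.23]
```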


Estimating the value bets

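We estimate the value bets with the fitted classifier. For each fixture and each of the five markets, we take the predicted probability of the outcome and flag a value bet when that probability multiplied by the market average odds exceeds 1, i.e. when the estimated expected profit of a unit stake is positive.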

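# Probability of the positive class (True) for each of the five outputs,
# stacked into an array of shape (n_fixtures, 5)
Y_pred_prob = np.concatenate(
    [prob[:, 1].reshape(-1, 1) for prob in clf.predict_proba(X_fix[num_features])],
    axis=1,
)
X_fix_info = X_fix[['home_team', 'away_team']].reset_index()
# A bet is flagged as a value bet when probability * odds > 1
value_bets = pd.concat([X_fix_info, Y_pred_prob * Odds_fix > 1], axis=1).set_index(
    'date'
)
# Shorten the odds column names, e.g. 'market_average__home_win__odds' -> 'home_win'
value_bets.rename(
    columns={
        col: col.split('__')[1] for col in value_bets.columns if col.endswith('odds')
    }
)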
            home_team    away_team    away_win  draw   home_win  over_2.5  under_2.5
date
2022-01-19  Volos NFC    Apollon      False     True   False     False     True
2022-01-19  Toulouse     Nancy        True      False  False     False     True
2022-01-19  Amiens       Ajaccio      False     True   True      False     True
2022-01-19  Montpellier  Troyes       False     False  True      False     True
2022-01-19  Lille        Lorient      False     False  True      True      False
2022-01-19  Clermont     Strasbourg   False     True   True      True      False
2022-01-19  Celta        Osasuna      True      False  False     True      False
2022-01-19  Genk         Mechelen     True      False  False     True      False
2022-01-19  Trabzonspor  Giresunspor  False     False  True      False     True
2022-01-19  Fenerbahce   Altay        False     True   False     False     False
2022-01-19  Rizespor     Antalyaspor  True      False  False     False     True
2022-01-19  Kayserispor  Buyuksehyr   False     False  True      False     True
2022-01-19  Lugo         Almeria      False     False  True      False     True
2022-01-19  Goztep       Sivasspor    False     True   False     True      False
2022-01-19  Valencia     Sevilla      False     True   True      False     True
2022-01-20  Alanyaspor   Hatayspor    False     True   False     False     True
2022-01-20  Getafe       Granada      False     True   False     True      False
2022-01-20  Galatasaray  Kasimpasa    False     True   True      False     True
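The criterion Y_pred_prob * Odds_fix > 1 follows from the expected profit of a unit stake. At decimal odds o with win probability p, the bet pays o - 1 with probability p and loses the stake with probability 1 - p, so EV = p(o - 1) - (1 - p) = p*o - 1, which is positive exactly when p*o > 1. The sketch below computes the expected value itself rather than a boolean, which also allows bets to be ranked. For illustration only, it uses the probabilities from the X_fix input columns for Volos NFC vs Apollon instead of the classifier's predictions, so it is not expected to reproduce the table above; expected_value is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def expected_value(prob, odds):
    """Hypothetical helper: expected profit per unit stake at decimal odds.

    p * (odds - 1) - (1 - p) simplifies to p * odds - 1.
    """
    return prob * odds - 1


# Volos NFC vs Apollon: probabilities from the X_fix input columns
# (away_team_probability_win, probability_draw, home_team_probability_win),
# NOT the classifier's predictions.
prob = np.array([[0.2233, 0.2359, 0.5408]])
odds = pd.DataFrame(
    [[4.52, 3.64, 1.74]],
    columns=[
        "market_average__away_win__odds",
        "market_average__draw__odds",
        "market_average__home_win__odds",
    ],
)

ev = expected_value(prob, odds)
print(ev.round(4))
# A positive expected value is the same condition as prob * odds > 1.
print((ev > 0).equals(prob * odds > 1))  # True
```

With these numbers only the away win has a positive expected value, about 0.0093 per unit staked, while the draw and the home win come out at about -0.1413 and -0.0590. Such estimates are only as reliable as the probabilities fed into them.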


Total running time of the script: ( 0 minutes 36.774 seconds)

Gallery generated by Sphinx-Gallery