FiveThirtyEight soccer data

This example illustrates how to use the FiveThirtyEight soccer dataloader.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

import pandas as pd
from sportsbet.datasets import FTESoccerDataLoader

Getting the available parameters

The get_all_params() class method returns the available parameters, from which we can select the training data to extract.

The available parameters can be presented as a DataFrame.

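# Retrieve the available parameters with the class method named above
params = FTESoccerDataLoader.get_all_params()
params_df = pd.DataFrame(params).sort_values(
    ['league', 'year', 'division'], ignore_index=True
)
params_df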
     division                league  year
0           1             Argentina  2018
1           1             Argentina  2019
2           1             Argentina  2020
3           1             Argentina  2022
4           1             Australia  2019
...       ...                   ...   ...
174         1                   USA  2022
175         1  United-Soccer-League  2019
176         1  United-Soccer-League  2020
177         1  United-Soccer-League  2021
178         1  United-Soccer-League  2022

179 rows × 3 columns



We choose to extract training data only for the year 2021, covering all the divisions of the English league.

param_grid = {'league': ['England'], 'year': [2021]}
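
As a rough illustration of how such a selection narrows the available parameters, the following standard-library sketch filters a list of parameter combinations with the same param_grid. The helper select_params and the hand-copied list available_params are hypothetical and are not part of sportsbet. The sketch also assumes that a key missing from the grid, here division, places no restriction on the selection.

```python
# Hand-copied subset of parameter combinations that appear in this example's outputs.
available_params = [
    {'division': 1, 'league': 'Argentina', 'year': 2019},
    {'division': 2, 'league': 'England', 'year': 2021},
    {'division': 3, 'league': 'England', 'year': 2021},
    {'division': 1, 'league': 'USA', 'year': 2022},
]

param_grid = {'league': ['England'], 'year': [2021]}


def select_params(available, grid):
    """Keep the combinations whose values are allowed by every key of the grid.

    Hypothetical helper: keys absent from the grid are treated as unrestricted.
    """
    return [
        params
        for params in available
        if all(params[key] in allowed for key, allowed in grid.items())
    ]


selected_params = select_params(available_params, param_grid)
for params in selected_params:
    print(params)
```

Because division is not a key of the grid, both English divisions in the sample are kept.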

Getting the available odds types

The get_odds_types() class method returns the available odds types, one of which can be selected so that the odds data match the output of the training data.

FTESoccerDataLoader.get_odds_types()

Out:

[]

The list is empty, so no odds data are available.

Extracting the training data

We extract the training data using the default values for the parameters odds_type and drop_na_thres.

The input data:

X_train
league home_team away_team home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score home_team_match_importance away_team_match_importance division year match_quality
date
2020-09-11 England Watford Middlesbrough 65.12 46.31 0.6387 0.1423 0.2190 2.06 0.85 53.0 16.5 2 2021 54.127384
2020-09-12 England AFC Bournemouth Blackburn 65.15 47.18 0.6309 0.1492 0.2199 2.06 0.89 55.9 20.2 2 2021 54.727624
2020-09-12 England Millwall Stoke City 48.58 53.50 0.3535 0.3645 0.2820 1.31 1.34 16.3 24.1 2 2021 50.921434
2020-09-12 England Derby County Reading 47.33 44.25 0.4367 0.2891 0.2742 1.52 1.19 16.6 22.0 2 2021 45.738207
2020-09-12 England Cardiff City Sheffield Wednesday 51.14 45.38 0.4510 0.2754 0.2736 1.54 1.15 18.6 37.2 2 2021 48.088131
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-05-18 England Oxford United Blackpool 34.85 32.66 0.3761 0.3517 0.2722 1.33 1.28 100.0 100.0 3 2021 33.719479
2021-05-19 England Lincoln City Sunderland 26.97 33.16 0.3314 0.3961 0.2725 1.22 1.36 100.0 100.0 3 2021 29.746389
2021-05-21 England Blackpool Oxford United 32.11 33.66 0.4339 0.2807 0.2854 1.33 1.01 100.0 100.0 3 2021 32.866736
2021-05-22 England Sunderland Lincoln City 31.40 28.48 0.4421 0.2867 0.2712 1.44 1.11 100.0 100.0 3 2021 29.868804
2021-05-30 England Blackpool Lincoln City 35.10 28.07 0.6188 0.3812 0.0000 1.35 0.94 100.0 100.0 3 2021 31.193826

2051 rows × 15 columns
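
The match_quality values shown above appear to be consistent with the harmonic mean of the two teams' soccer power index ratings, which is how FiveThirtyEight describes match quality. The sketch below checks this against the first row of the table. The function name match_quality is hypothetical, and the formula is an observation about the displayed numbers rather than a statement about how the dataloader computes the column.

```python
def match_quality(home_spi, away_spi):
    """Harmonic mean of the two soccer power index ratings (assumed definition)."""
    return 2 * home_spi * away_spi / (home_spi + away_spi)


# First row of X_train: Watford (65.12) against Middlesbrough (46.31).
quality = match_quality(65.12, 46.31)
print(round(quality, 4))
```

The harmonic mean is pulled towards the lower of the two ratings, so a match scores highly only when both teams are strong.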



The targets:

Y_train
away_win__full_time_goals draw__full_time_goals home_win__full_time_goals over_1.5__full_time_goals over_2.5__full_time_goals over_3.5__full_time_goals over_4.5__full_time_goals under_1.5__full_time_goals under_2.5__full_time_goals under_3.5__full_time_goals under_4.5__full_time_goals
0 False False True False False False False True True True True
1 False False True True True True True False False False False
2 False True False False False False False True True True True
3 True False False True False False False False True True True
4 True False False True False False False False True True True
... ... ... ... ... ... ... ... ... ... ... ...
2046 True False False True True False False False False True True
2047 False False True True False False False False True True True
2048 False True False True True True True False False False False
2049 False False True True True False False False False True True
2050 False False True True True False False False False True True

2051 rows × 11 columns
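
The target names suggest that each column is a boolean outcome derived from the full-time goals of a match. The following sketch, with the hypothetical helper full_time_targets, shows one way such outcomes could be derived from a final score. It is an assumption based on the column names and does not reproduce the dataloader's implementation.

```python
def full_time_targets(home_goals, away_goals, thresholds=(1.5, 2.5, 3.5, 4.5)):
    """Derive boolean outcomes from a full-time score (hypothetical helper)."""
    total_goals = home_goals + away_goals
    targets = {
        'away_win__full_time_goals': away_goals > home_goals,
        'draw__full_time_goals': home_goals == away_goals,
        'home_win__full_time_goals': home_goals > away_goals,
    }
    # Over and under outcomes compare the total number of goals to each threshold.
    for threshold in thresholds:
        targets[f'over_{threshold}__full_time_goals'] = total_goals > threshold
    for threshold in thresholds:
        targets[f'under_{threshold}__full_time_goals'] = total_goals < threshold
    return targets


# A 1-0 home win is the only score that matches the pattern of row 0 of Y_train.
row_0 = full_time_targets(1, 0)
for name, value in row_0.items():
    print(name, value)
```

Under this reading, the over and under columns for the same threshold are always complementary, because a total number of goals can never equal a half-integer threshold.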



Extracting the fixtures data

We extract the fixtures data with columns that match the columns of the training data. Unlike the training data, however, the fixtures data are not affected by the param_grid selection.

The input data:

X_fix
league home_team away_team home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score home_team_match_importance away_team_match_importance division year match_quality
date
2022-01-19 France Lille Lorient 71.29 49.23 0.6553 0.1131 0.2316 1.74 0.56 39.3 38.8 1 2022 58.241067
2022-01-19 England Leicester City Tottenham Hotspur 72.87 79.72 0.3177 0.4439 0.2384 1.44 1.73 20.2 70.4 1 2022 76.141246
2022-01-19 England Hull City Blackburn 34.62 49.99 0.2480 0.4697 0.2823 0.93 1.40 42.3 52.8 2 2022 40.908966
2022-01-19 Belgium Genk KV Mechelen 57.62 45.28 0.5788 0.2222 0.1990 2.33 1.40 21.9 14.8 1 2022 50.710080
2022-01-19 France Amiens AC Ajaccio 35.07 33.24 0.2896 0.3429 0.3674 0.78 0.87 10.3 72.1 2 2022 34.130487
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2022-05-29 Spain Leganes Almeria 37.00 43.44 0.3469 0.3361 0.3170 1.10 1.08 NaN NaN 2 2022 39.962208
2022-05-29 Spain Real Oviedo UD Ibiza 35.70 29.97 0.4813 0.2240 0.2947 1.41 0.87 NaN NaN 2 2022 32.585016
2022-05-29 Spain Tenerife FC Cartagena 33.87 30.22 0.4638 0.2406 0.2956 1.39 0.92 NaN NaN 2 2022 31.941064
2022-05-29 Spain Real Sociedad II Real Zaragoza 26.28 30.12 0.3558 0.3206 0.3237 1.08 1.01 NaN NaN 2 2022 28.069277
2022-05-29 Spain AD Alcorcon Eibar 22.04 35.27 0.2750 0.4532 0.2718 1.17 1.57 NaN NaN 2 2022 27.127929

3635 rows × 15 columns
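
Since the fixtures cover every league, they may need to be restricted manually to the leagues used for training. The sketch below does this with plain Python on a few rows hand-copied from the table above. The names fixtures and england_fixtures are hypothetical.

```python
# A few fixtures hand-copied from the X_fix output above.
fixtures = [
    {'league': 'France', 'home_team': 'Lille', 'away_team': 'Lorient'},
    {'league': 'England', 'home_team': 'Leicester City', 'away_team': 'Tottenham Hotspur'},
    {'league': 'England', 'home_team': 'Hull City', 'away_team': 'Blackburn'},
    {'league': 'Belgium', 'home_team': 'Genk', 'away_team': 'KV Mechelen'},
]

# Keep only the fixtures of the league selected for training.
england_fixtures = [row for row in fixtures if row['league'] == 'England']
for row in england_fixtures:
    print(row['home_team'], 'v', row['away_team'])
```

With the DataFrame itself, the equivalent selection would be the boolean mask X_fix[X_fix['league'] == 'England'].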



