FiveThirtyEight soccer data

This example illustrates the usage of the FiveThirtyEight soccer dataloader.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

import pandas as pd
from sportsbet.datasets import FTESoccerDataLoader

Getting the available parameters

The get_all_params() class method returns the available parameters, from which we can select the training data to be extracted.

params = FTESoccerDataLoader.get_all_params()

The available parameters can be presented as a DataFrame.

params_df = pd.DataFrame(params).sort_values(
    ['league', 'year', 'division'], ignore_index=True
)
params_df
division league year
0 1 Argentina 2018
1 1 Argentina 2019
2 1 Argentina 2020
3 1 Argentina 2022
4 1 Australia 2019
... ... ... ...
174 1 USA 2022
175 1 United-Soccer-League 2019
176 1 United-Soccer-League 2020
177 1 United-Soccer-League 2021
178 1 United-Soccer-League 2022

179 rows × 3 columns



We choose to extract training data only for the year 2021 and all divisions of the English league. Since param_grid does not specify a division, every available division is included.

param_grid = {'league': ['England'], 'year': [2021]}
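The selection can be pictured as filtering the available parameters row by row. The following is a minimal sketch, not the dataloader's implementation: select_params is a hypothetical helper, and it assumes that a key missing from param_grid (here division) places no constraint on that parameter. The input rows are parameter combinations that appear in the outputs of this example.

```python
def select_params(params, param_grid):
    """Hypothetical helper: keep the parameter dicts allowed by param_grid.

    Each key of param_grid maps to a list of accepted values; keys that are
    absent from param_grid (e.g. 'division') accept any value.
    """
    return [
        p for p in params
        if all(p[key] in values for key, values in param_grid.items())
    ]


# Parameter combinations that appear in the outputs of this example.
params = [
    {'division': 1, 'league': 'Argentina', 'year': 2018},
    {'division': 1, 'league': 'Argentina', 'year': 2019},
    {'division': 1, 'league': 'Argentina', 'year': 2020},
    {'division': 1, 'league': 'England', 'year': 2021},
    {'division': 2, 'league': 'England', 'year': 2021},
    {'division': 3, 'league': 'England', 'year': 2021},
    {'division': 4, 'league': 'England', 'year': 2021},
]
param_grid = {'league': ['England'], 'year': [2021]}

selected = select_params(params, param_grid)
print([p['division'] for p in selected])  # [1, 2, 3, 4]
```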

Getting the available odds types

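The get_odds_types() class method returns the available odds types, that is, the odds data that can be requested together with the training data.

FTESoccerDataLoader.get_odds_types()

Out:

[]

The returned list is empty, therefore no odds data are available for the FiveThirtyEight soccer data.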

Extracting the training data

We extract the training data using the default values for the parameters odds_type and drop_na_thres.
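The role of drop_na_thres can be sketched as a threshold on the share of non-missing values in each input column. This is an assumption about its semantics, written with plain pandas rather than the dataloader's own code, and drop_sparse_columns is a hypothetical name. Missing values do occur in this dataset, for example in the match importance columns of the fixtures shown further down.

```python
import pandas as pd


def drop_sparse_columns(X, drop_na_thres=0.0):
    """Hypothetical helper: drop columns with too many missing values.

    A column is kept when its share of non-missing values is at least
    drop_na_thres, so 0.0 keeps every column and 1.0 keeps only the
    columns without missing values.
    """
    if not 0.0 <= drop_na_thres <= 1.0:
        raise ValueError('drop_na_thres must be in the [0.0, 1.0] range')
    share_non_missing = X.notna().mean()
    return X.loc[:, share_non_missing >= drop_na_thres]


# Toy input: one complete column and one column with two missing values.
X = pd.DataFrame({
    'home_team_soccer_power_index': [27.67, 28.39, 33.67],
    'home_team_match_importance': [54.1, None, None],
})

print(list(drop_sparse_columns(X, 0.0).columns))  # both columns
print(list(drop_sparse_columns(X, 0.5).columns))  # ['home_team_soccer_power_index']
```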

The input data:

league home_team away_team home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score home_team_match_importance away_team_match_importance division year match_quality
date
2020-09-11 England Watford Middlesbrough 65.12 46.31 0.6387 0.1423 0.2190 2.06 0.85 53.0 16.5 2 2021 54.127384
2020-09-12 England Oldham Athletic Leyton Orient 18.46 12.80 0.5072 0.2612 0.2317 1.90 1.31 40.7 23.6 4 2021 15.117594
2020-09-12 England Salford City Exeter City 23.97 15.12 0.5480 0.2178 0.2342 1.86 1.09 61.4 39.4 4 2021 18.543177
2020-09-12 England Barrow Stevenage 11.43 7.54 0.4562 0.2750 0.2689 1.48 1.08 19.1 20.6 4 2021 9.086157
2020-09-12 England Port Vale Crawley Town 11.37 12.36 0.3885 0.3404 0.2711 1.36 1.26 24.0 26.0 4 2021 11.844349
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-05-23 England Wolverhampton Manchester United 69.50 85.69 0.2910 0.4464 0.2627 1.16 1.50 0.0 0.0 1 2021 76.750499
2021-05-23 England Morecambe Tranmere Rovers 18.07 10.76 0.4706 0.2586 0.2708 1.47 1.02 100.0 100.0 4 2021 13.488255
2021-05-29 England Brentford Swansea City 66.87 47.28 0.7221 0.2779 0.0000 1.58 0.79 100.0 100.0 2 2021 55.394018
2021-05-30 England Blackpool Lincoln City 35.10 28.07 0.6188 0.3812 0.0000 1.35 0.94 100.0 100.0 3 2021 31.193826
2021-05-31 England Morecambe Newport County 17.71 12.81 0.5309 0.4691 0.0000 1.23 1.13 100.0 100.0 4 2021 14.866651

2051 rows × 15 columns
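Two relationships can be read off the displayed rows: match_quality agrees with the harmonic mean of the two soccer power indexes, and the three outcome probabilities sum to 1 up to rounding. This is an observation from the values shown, not documented behaviour of the dataloader; the check below uses the first three rows of the training input, and harmonic_mean is a helper written for illustration.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two positive numbers."""
    return 2 * a * b / (a + b)


# First three rows of the training input above:
# (home SPI, away SPI, match_quality, P(home win), P(away win), P(draw)).
rows = [
    (65.12, 46.31, 54.127384, 0.6387, 0.1423, 0.2190),
    (18.46, 12.80, 15.117594, 0.5072, 0.2612, 0.2317),
    (23.97, 15.12, 18.543177, 0.5480, 0.2178, 0.2342),
]

for home_spi, away_spi, quality, p_home, p_away, p_draw in rows:
    # match_quality agrees with the harmonic mean within 1e-4.
    assert abs(harmonic_mean(home_spi, away_spi) - quality) < 1e-4
    # The outcome probabilities are rounded to 4 decimals, so allow 1e-3.
    assert abs(p_home + p_away + p_draw - 1.0) < 1e-3

print('checks passed')
```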



The targets:

away_win__full_time_goals draw__full_time_goals home_win__full_time_goals over_1.5__full_time_goals over_2.5__full_time_goals over_3.5__full_time_goals over_4.5__full_time_goals under_1.5__full_time_goals under_2.5__full_time_goals under_3.5__full_time_goals under_4.5__full_time_goals
0 False False True False False False False True True True True
1 True False False False False False False True True True True
2 False True False True True True False False False False True
3 False True False True False False False False True True True
4 False False True True False False False False True True True
... ... ... ... ... ... ... ... ... ... ... ...
2046 True False False True True False False False False True True
2047 False True False True False False False False True True True
2048 False False True True False False False False True True True
2049 False False True True True False False False False True True
2050 False False True False False False False True True True True

2051 rows × 11 columns
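The target names suggest that each target is a boolean derived from the full-time goals of a match. A minimal sketch under that assumption, with targets_from_goals as a hypothetical helper; the keys are produced in the column order shown above. For instance, row 2 (a draw that is over 3.5 but under 4.5 goals) is consistent with a 2-2 score, and row 0 (a home win under 1.5 goals) with a 1-0 score.

```python
def targets_from_goals(home_goals, away_goals):
    """Hypothetical helper: the 11 boolean targets for one match."""
    total = home_goals + away_goals
    targets = {
        'away_win__full_time_goals': away_goals > home_goals,
        'draw__full_time_goals': home_goals == away_goals,
        'home_win__full_time_goals': home_goals > away_goals,
    }
    lines = (1.5, 2.5, 3.5, 4.5)
    # Over/under targets compare the total number of goals with each line.
    for line in lines:
        targets[f'over_{line}__full_time_goals'] = total > line
    for line in lines:
        targets[f'under_{line}__full_time_goals'] = total < line
    return targets


print(list(targets_from_goals(2, 2).values()))
```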



Extracting the fixtures data

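We extract the fixtures data, whose columns match the columns of the training data. However, the fixtures data are not affected by the param_grid selection: they include upcoming matches from all available leagues, such as Belgium, Scotland and Spain, not only England.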
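If the fixtures should be restricted to the same leagues as the training data, they can be filtered after extraction. A minimal sketch with plain pandas, assuming X_train and X_fix are DataFrames shaped like the outputs in this example (the frames here are small toy stand-ins, and the England fixture is hypothetical); align_fixtures is a hypothetical helper. It filters on league only: the fixtures shown belong to year 2022, so also applying year 2021 from param_grid would remove every fixture.

```python
import pandas as pd


def align_fixtures(X_fix, X_train, leagues):
    """Hypothetical helper: keep fixtures of the given leagues and order
    their columns as in the training input."""
    mask = X_fix['league'].isin(leagues)
    # reindex reorders the columns and adds any missing one filled with NaN.
    return X_fix.loc[mask].reindex(columns=X_train.columns)


# Toy frames using a few of the columns shown in this example.
X_train = pd.DataFrame({
    'league': ['England'],
    'home_team': ['Watford'],
    'away_team': ['Middlesbrough'],
    'division': [2],
    'year': [2021],
})
X_fix = pd.DataFrame({
    'year': [2022, 2022, 2022],
    'division': [1, 1, 2],
    'league': ['Belgium', 'Scotland', 'England'],
    'home_team': ['Anderlecht', 'Hearts', 'Watford'],  # last row is hypothetical
    'away_team': ['Cercle Brugge', 'Celtic', 'Middlesbrough'],
})

X_fix_england = align_fixtures(X_fix, X_train, leagues=['England'])
print(X_fix_england)
```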

The input data:

league home_team away_team home_team_soccer_power_index away_team_soccer_power_index home_team_probability_win away_team_probability_win probability_draw home_team_projected_score away_team_projected_score home_team_match_importance away_team_match_importance division year match_quality
date
2022-01-26 Belgium Anderlecht Cercle Brugge 59.47 43.24 0.6332 0.1502 0.2166 1.94 0.82 57.4 2.6 1 2022 50.072686
2022-01-26 Belgium RFC Seraing KFCO Beerschot-Wilrijk 23.62 26.52 0.3999 0.3381 0.2620 1.42 1.29 54.1 42.0 1 2022 24.986135
2022-01-26 Scotland Hearts Celtic 42.55 65.23 0.1885 0.5805 0.2310 0.93 1.81 17.5 56.6 1 2022 51.503739
2022-01-26 Scotland Motherwell Hibernian 34.75 38.96 0.3566 0.3590 0.2844 1.17 1.17 1.5 1.2 1 2022 36.734771
2022-01-26 Scotland Rangers Livingston 66.73 32.39 0.7946 0.0481 0.1573 2.16 0.35 76.5 11.2 1 2022 43.611475
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2022-05-29 Spain Sporting Gijón Las Palmas 27.67 34.45 0.3403 0.3584 0.3013 1.17 1.21 NaN NaN 2 2022 30.690003
2022-05-29 Spain Mirandes Fuenlabrada 28.39 24.68 0.4453 0.2584 0.2963 1.37 0.97 NaN NaN 2 2022 26.405321
2022-05-29 Spain Real Oviedo UD Ibiza 33.67 31.70 0.4426 0.2637 0.2937 1.39 1.00 NaN NaN 2 2022 32.655316
2022-05-29 Spain Lugo Málaga 30.51 25.64 0.4657 0.2496 0.2847 1.47 1.00 NaN NaN 2 2022 27.863808
2022-05-29 Spain Real Valladolid SD Huesca 45.18 33.19 0.5291 0.2027 0.2682 1.63 0.91 NaN NaN 2 2022 38.267812

3413 rows × 15 columns



Total running time of the script: ( 0 minutes 12.774 seconds)
