sktime.utils.data_processing
-
sktime.utils.data_processing.
are_columns_nested
(X) Checks, for each DataFrame column, whether any of its cells has a nested structure.
- Parameters
X (pd.DataFrame) – DataFrame to check for nested data structures.
- Returns
any_nested – If True, at least one column is nested. If False, no nested columns.
- Return type
-
sktime.utils.data_processing.
convert_from_dictionary
(ts_dict) Simple conversion from a dictionary of series to the sktime pandas format, e.g. for univariate series
- x = {
"Series1": [1.0, 2.0, 3.0, 1.0, 2.0],
"Series2": [3.0, 2.0, 1.0, 3.0, 2.0],
}
TODO: Adapt for multivariate input.
-
sktime.utils.data_processing.
from_2d_array_to_nested
(X, index=None, columns=None, time_index=None, cells_as_numpy=False) Convert tabular pandas DataFrame with only primitives in cells into nested pandas DataFrame with a single column.
- Parameters
X (pd.DataFrame) –
cells_as_numpy (bool, default = False) – If True, nested cells contain NumPy arrays. If False, nested cells contain pandas Series.
index (array-like, shape=[n_samples], optional (default = None)) – Sample (row) index of transformed DataFrame
time_index (array-like, shape=[n_obs], optional (default = None)) – Time series index of transformed DataFrame
- Returns
Xt – Transformed DataFrame in nested format
- Return type
pd.DataFrame
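The tabular-to-nested layout change can be illustrated with plain pandas. This is a minimal sketch of the idea, not the library's implementation; the column name "var_0" is a placeholder chosen for the example.

```python
import numpy as np
import pandas as pd

# Tabular input: 3 instances (rows) x 4 time points (columns), primitives only.
X_tabular = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4))

# Wrap each row in its own pd.Series; each Series becomes one nested cell.
cells = [pd.Series(row) for row in X_tabular.to_numpy()]

# A single object-dtype column holds the nested cells.
# "var_0" is a placeholder name for this sketch.
X_nested = pd.Series(cells, index=X_tabular.index).to_frame(name="var_0")
```

The result has one row per instance and a single column, with a full time series stored in each cell.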
-
sktime.utils.data_processing.
from_3d_numpy_to_2d_array
(X) Converts 3d NumPy array (n_instances, n_columns, n_timepoints) to a 2d NumPy array with shape (n_instances, n_columns*n_timepoints).
- Parameters
X (np.ndarray) – The input 3d-NumPy array with shape (n_instances, n_columns, n_timepoints)
- Returns
array_2d – A 2d-NumPy array with shape (n_instances, n_columns*n_timepoints)
- Return type
np.ndarray
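The shape arithmetic can be reproduced with a plain NumPy reshape. The sketch below assumes row-major flattening, so that all time points of the first column come first within each row; the ordering used by the library is not stated above.

```python
import numpy as np

# 2 instances, 3 columns, 4 time points.
X_3d = np.arange(24).reshape(2, 3, 4)
n_instances, n_columns, n_timepoints = X_3d.shape

# Flatten the column and time axes into one axis per instance.
array_2d = X_3d.reshape(n_instances, n_columns * n_timepoints)
```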
-
sktime.utils.data_processing.
from_3d_numpy_to_multi_index
(X, instance_index=None, time_index=None, column_names=None) Convert 3-dimensional NumPy array (n_instances, n_columns, n_timepoints) to panel data stored as pandas multi-indexed DataFrame.
- Parameters
- Returns
X_mi – The multi-indexed pandas DataFrame
- Return type
pd.DataFrame
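A multi-indexed panel of this kind can be sketched with pandas alone. The level names "instances" and "timepoints" and the column names "var_0", "var_1", ... are assumptions made for this example, not defaults confirmed by the text above.

```python
import numpy as np
import pandas as pd

# 2 instances, 3 columns, 4 time points.
X_3d = np.arange(24, dtype=float).reshape(2, 3, 4)
n_instances, n_columns, n_timepoints = X_3d.shape

# One row per (instance, time point) pair; level names are placeholders.
index = pd.MultiIndex.from_product(
    [range(n_instances), range(n_timepoints)],
    names=["instances", "timepoints"],
)

# Move the time axis before the column axis, then stack instances vertically.
values = X_3d.transpose(0, 2, 1).reshape(n_instances * n_timepoints, n_columns)

X_mi = pd.DataFrame(
    values, index=index, columns=[f"var_{i}" for i in range(n_columns)]
)
```

Each value `X_3d[i, c, t]` ends up at row `(i, t)` and column `c` of `X_mi`.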
-
sktime.utils.data_processing.
from_3d_numpy_to_nested
(X, column_names=None, cells_as_numpy=False) Convert NumPy ndarray with shape (n_instances, n_columns, n_timepoints) into nested pandas DataFrame (with time series as pandas Series in cells).
- Parameters
X (np.ndarray) – 3-dimensional NumPy array to convert to nested pandas DataFrame format
column_names (list-like, default = None) – Optional list of names to use for naming nested DataFrame’s columns
cells_as_numpy (bool, default = False) – If True, nested cells contain NumPy arrays. If False, nested cells contain pandas Series.
- Returns
df
- Return type
pd.DataFrame
-
sktime.utils.data_processing.
from_long_to_nested
(X_long, instance_column_name='case_id', time_column_name='reading_id', dimension_column_name='dim_id', value_column_name='value', column_names=None) Convert long DataFrame to a nested DataFrame.
- Parameters
X_long (pd.DataFrame) – The long DataFrame
instance_column_name (str, default = 'case_id') – The name of the column corresponding to the DataFrame’s instances.
time_column_name (str, default = 'reading_id') – The name of the column corresponding to the DataFrame’s timepoints.
dimension_column_name (str, default = 'dim_id') – The name of the column corresponding to the DataFrame’s dimensions.
value_column_name (str, default = 'value') – The name of the column corresponding to the DataFrame’s values.
column_names (list, optional) – Optional list of column names to use for nested DataFrame columns.
- Returns
X_nested – Nested pandas DataFrame
- Return type
pd.DataFrame
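The long and nested layouts can be compared with a small pandas example. It reuses the default column names listed above; the output column names "var_0" and "var_1" are placeholders, and the grouping logic is a sketch of the idea rather than the library's code.

```python
import pandas as pd

# Long format: one row per (case, dimension, reading) observation.
X_long = pd.DataFrame(
    {
        "case_id": [0, 0, 0, 0, 1, 1, 1, 1],
        "dim_id": [0, 0, 1, 1, 0, 0, 1, 1],
        "reading_id": [0, 1, 0, 1, 0, 1, 0, 1],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    }
)

columns = {}
for dim, dim_group in X_long.groupby("dim_id"):
    case_ids, cells = [], []
    for case, case_group in dim_group.groupby("case_id"):
        case_ids.append(case)
        # One time series per (case, dimension), indexed by reading.
        cells.append(case_group.set_index("reading_id")["value"].sort_index())
    # "var_<dim>" is a placeholder naming scheme for this sketch.
    columns[f"var_{dim}"] = pd.Series(cells, index=case_ids)

X_nested = pd.DataFrame(columns)
```

Here `X_nested` has one row per case and one column per dimension, with each cell holding the readings for that pair.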
-
sktime.utils.data_processing.
from_multi_index_to_3d_numpy
(X, instance_index=None, time_index=None) Convert panel data stored as pandas multi-index DataFrame to a 3-dimensional NumPy array (n_instances, n_columns, n_timepoints).
- Parameters
- Returns
X_3d – 3-dimensional NumPy array (n_instances, n_columns, n_timepoints)
- Return type
np.ndarray
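The reverse direction amounts to a reshape followed by an axis swap. The sketch below assumes the rows are sorted by instance and then by time point and that every instance has the same number of time points; the level and column names are placeholders.

```python
import numpy as np
import pandas as pd

# Panel with 2 instances x 4 time points and 2 columns.
index = pd.MultiIndex.from_product(
    [range(2), range(4)], names=["instances", "timepoints"]
)
X_mi = pd.DataFrame(
    {"var_0": np.arange(8.0), "var_1": np.arange(8.0) + 100.0}, index=index
)

n_instances = X_mi.index.get_level_values(0).nunique()
n_timepoints = X_mi.index.get_level_values(1).nunique()
n_columns = X_mi.shape[1]

# Rows are (instance, time) pairs: reshape to (instance, time, column),
# then swap the last two axes to get (instance, column, time).
X_3d = (
    X_mi.to_numpy()
    .reshape(n_instances, n_timepoints, n_columns)
    .transpose(0, 2, 1)
)
```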
-
sktime.utils.data_processing.
from_multi_index_to_nested
(multi_ind_dataframe, instance_index=None, cells_as_numpy=False) Converts a pandas DataFrame with a multi-index to a nested DataFrame.
- Parameters
multi_ind_dataframe (pd.DataFrame) – Input multi-indexed pandas DataFrame
instance_index (str) – The name of the multi-index level corresponding to the DataFrame’s instances
cells_as_numpy (bool, default = False) – If True, nested cells contain NumPy arrays. If False, nested cells contain pandas Series.
- Returns
x_nested – The nested version of the DataFrame
- Return type
pd.DataFrame
-
sktime.utils.data_processing.
from_nested_to_2d_array
(X, return_numpy=False) Convert nested pandas DataFrame or Series with NumPy arrays or pandas Series in cells into tabular pandas DataFrame with primitives in cells. The result has the same number of rows as the input data and as many columns as there are observations in the nested series. Requires all series to have the same index.
- Parameters
X (nested pd.DataFrame or nested pd.Series) –
return_numpy (bool, default = False) –
If True, returns a NumPy array of the tabular data.
If False, returns a pandas DataFrame with row and column names.
- Returns
Xt – Transformed DataFrame in tabular format
- Return type
pd.DataFrame, or np.ndarray if return_numpy is True
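For a single nested column, the tabular form can be sketched by stacking the cell contents row by row. This illustrates the layout only and assumes all series have the same length; "var_0" is a placeholder column name.

```python
import numpy as np
import pandas as pd

# Nested input: one column, each cell holds a series of 3 observations.
X_nested = pd.DataFrame(
    {"var_0": pd.Series([pd.Series([1.0, 2.0, 3.0]), pd.Series([4.0, 5.0, 6.0])])}
)

# One row per instance, one column per observation.
X_tabular = pd.DataFrame(
    np.vstack([cell.to_numpy() for cell in X_nested["var_0"]]),
    index=X_nested.index,
)
```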
-
sktime.utils.data_processing.
from_nested_to_3d_numpy
(X) Convert nested pandas DataFrame (with time series as pandas Series in cells) into NumPy ndarray with shape (n_instances, n_columns, n_timepoints).
- Parameters
X (pd.DataFrame) – Nested pandas DataFrame
- Returns
X_3d – 3-dimensional NumPy array
- Return type
np.ndarray
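With several nested columns, the 3-dimensional layout can be built by stacking per-column 2d arrays along a new middle axis. This is a sketch under the assumption that all series have equal length; the column names are placeholders.

```python
import numpy as np
import pandas as pd

# Nested input: 2 instances, 2 columns, 3 time points per cell.
X_nested = pd.DataFrame(
    {
        "var_0": pd.Series([pd.Series([1.0, 2.0, 3.0]), pd.Series([4.0, 5.0, 6.0])]),
        "var_1": pd.Series([pd.Series([7.0, 8.0, 9.0]), pd.Series([10.0, 11.0, 12.0])]),
    }
)

# Each column becomes an (n_instances, n_timepoints) array.
per_column = [
    np.stack([cell.to_numpy() for cell in X_nested[col]]) for col in X_nested.columns
]

# Stack along axis 1 to obtain (n_instances, n_columns, n_timepoints).
X_3d = np.stack(per_column, axis=1)
```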
-
sktime.utils.data_processing.
from_nested_to_long
(X, instance_column_name=None, time_column_name=None, dimension_column_name=None) Convert nested DataFrame to long DataFrame.
- Parameters
X (pd.DataFrame) – The nested DataFrame
instance_column_name (str) – The name of the column corresponding to the DataFrame’s instances.
time_column_name (str) – The name of the column corresponding to the DataFrame’s timepoints.
dimension_column_name (str) – The name of the column corresponding to the DataFrame’s dimensions.
- Returns
long_df – Long pandas DataFrame
- Return type
pd.DataFrame
-
sktime.utils.data_processing.
from_nested_to_multi_index
(X, instance_index=None, time_index=None) Converts nested pandas DataFrame (with time series as pandas Series or NumPy array in cells) into multi-indexed pandas DataFrame.
Can convert mixed nested and primitive DataFrame to multi-index DataFrame.
- Parameters
- Returns
X_mi – The multi-indexed pandas DataFrame
- Return type
pd.DataFrame
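One way to picture this conversion is to expand each instance into its own small DataFrame and concatenate the pieces. The sketch below covers only fully nested input with pandas Series in cells; the level names "instances" and "timepoints" are assumptions for the example.

```python
import pandas as pd

# Nested input: 2 instances, 2 columns, 3 time points per cell.
X_nested = pd.DataFrame(
    {
        "var_0": pd.Series([pd.Series([1.0, 2.0, 3.0]), pd.Series([4.0, 5.0, 6.0])]),
        "var_1": pd.Series([pd.Series([7.0, 8.0, 9.0]), pd.Series([10.0, 11.0, 12.0])]),
    }
)

frames = []
for instance in X_nested.index:
    # Unpack the nested cells of one instance into ordinary columns.
    frame = pd.DataFrame(
        {col: X_nested.loc[instance, col] for col in X_nested.columns}
    )
    # Prefix the time index with the instance label; level names are placeholders.
    frame.index = pd.MultiIndex.from_product(
        [[instance], frame.index], names=["instances", "timepoints"]
    )
    frames.append(frame)

X_mi = pd.concat(frames)
```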
-
sktime.utils.data_processing.
is_nested_dataframe
(X) Checks whether the input is a nested DataFrame.
To allow for a mixture of nested and primitive column types, the function considers whether any column contains nested np.ndarray or pd.Series cells.
A column is considered nested if any of its cells has a nested structure.
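The rule described above can be sketched as a per-column check. `has_nested_cells` is a hypothetical helper written for this example and is not part of the module; it only mirrors the stated definition of a nested column.

```python
import numpy as np
import pandas as pd


def has_nested_cells(column):
    """Hypothetical helper: True if any cell holds a pd.Series or np.ndarray."""
    return any(isinstance(cell, (pd.Series, np.ndarray)) for cell in column)


# Mixed input: one nested column and one primitive column.
X_mixed = pd.DataFrame(
    {
        "var_0": pd.Series([pd.Series([1.0, 2.0]), pd.Series([3.0, 4.0])]),
        "label": [0, 1],
    }
)

# Per-column result, then the overall verdict for the DataFrame.
nested_by_column = {name: has_nested_cells(X_mixed[name]) for name in X_mixed.columns}
is_nested = isinstance(X_mixed, pd.DataFrame) and any(nested_by_column.values())
```

Under this reading, a DataFrame with at least one nested column counts as nested even if its other columns hold primitives.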
- Parameters
X – Input that is checked to determine if it is a nested DataFrame.
- Returns
Whether the input is a nested DataFrame
- Return type