sktime.utils.data_processing

sktime.utils.data_processing.are_columns_nested(X)[source]

Checks whether any cells have nested structure in each DataFrame column.

Parameters

X (pd.DataFrame) – DataFrame to check for nested data structures.

Returns

any_nested – If True, at least one column is nested. If False, no nested columns.

Return type

bool
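
The per-column check described above can be sketched with plain pandas. This is an illustration, not sktime's implementation: the helper name `sketch_are_columns_nested` is hypothetical, and it assumes a cell counts as nested when it holds a pd.Series or np.ndarray.

```python
import numpy as np
import pandas as pd


def sketch_are_columns_nested(X):
    """Hypothetical helper: flag each column that has at least one nested cell."""

    def is_nested_cell(cell):
        # Assumption: "nested" means the cell holds a Series or a NumPy array.
        return isinstance(cell, (pd.Series, np.ndarray))

    return [bool(X[col].map(is_nested_cell).any()) for col in X.columns]


X = pd.DataFrame(
    {
        "nested": [pd.Series([1.0, 2.0]), pd.Series([3.0, 4.0])],
        "primitive": [0.5, 1.5],
    }
)
flags = sketch_are_columns_nested(X)  # one flag per column
```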

sktime.utils.data_processing.convert_from_dictionary(ts_dict)[source]

Simple conversion from a dictionary of series to the sktime pandas format. The dictionary maps each series name to its values, e.g. for univariate data:

x = {
    "Series1": [1.0, 2.0, 3.0, 1.0, 2.0],
    "Series2": [3.0, 2.0, 1.0, 3.0, 2.0],
}

TODO: Adapt for multivariate.

sktime.utils.data_processing.from_2d_array_to_nested(X, index=None, columns=None, time_index=None, cells_as_numpy=False)[source]

Convert tabular pandas DataFrame with only primitives in cells into nested pandas DataFrame with a single column.

Parameters
  • X (pd.DataFrame) – Tabular DataFrame with only primitives in cells

  • cells_as_numpy (bool, default = False) – If True, nested cells contain NumPy arrays. If False, nested cells contain pandas Series.

  • index (array-like, shape=[n_samples], optional (default = None)) – Sample (row) index of transformed DataFrame

  • time_index (array-like, shape=[n_obs], optional (default = None)) – Time series index of transformed DataFrame

Returns

Xt – Transformed DataFrame in nested format

Return type

pd.DataFrame
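
A minimal sketch of the tabular-to-nested idea, using only pandas. It is not sktime's code; it assumes each row of the tabular input is one time series and that cells should hold pandas Series (the `cells_as_numpy=False` case).

```python
import pandas as pd

# Tabular input: 2 instances (rows), 3 observations each (columns).
X_tab = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Wrap every row in its own pd.Series and collect them in one column.
cells = [pd.Series(row) for row in X_tab.to_numpy()]
X_nested = pd.DataFrame(pd.Series(cells, index=X_tab.index))
```

The result has one row per instance and a single column whose cells each hold a whole series.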

sktime.utils.data_processing.from_3d_numpy_to_2d_array(X)[source]

Convert a 3d NumPy array with shape (n_instances, n_columns, n_timepoints) to a 2d NumPy array with shape (n_instances, n_columns*n_timepoints).

Parameters

X (np.ndarray) – The input 3d-NumPy array with shape (n_instances, n_columns, n_timepoints)

Returns

array_2d – A 2d-NumPy array with shape (n_instances, n_columns*n_timepoints)

Return type

np.ndarray
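
The shape change described above matches a plain NumPy reshape that keeps the instance axis and flattens the rest. The snippet below is a sketch of that operation, not a copy of the library code.

```python
import numpy as np

# 2 instances, 3 columns, 4 timepoints.
X = np.arange(24).reshape(2, 3, 4)

# Keep axis 0 and flatten (n_columns, n_timepoints) column by column.
array_2d = X.reshape(X.shape[0], -1)
```

With this layout, the timepoints of the first column come first in each row, followed by those of the second column, and so on.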

sktime.utils.data_processing.from_3d_numpy_to_multi_index(X, instance_index=None, time_index=None, column_names=None)[source]

Convert 3-dimensional NumPy array (n_instances, n_columns, n_timepoints) to panel data stored as pandas multi-indexed DataFrame.

Parameters
  • X (np.ndarray) – 3-dimensional NumPy array (n_instances, n_columns, n_timepoints)

  • instance_index (str) – Name of the multi-index level corresponding to the DataFrame’s instances

  • time_index (str) – Name of multi-index level corresponding to DataFrame’s timepoints

Returns

X_mi – The multi-indexed pandas DataFrame

Return type

pd.DataFrame
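
One way to picture this conversion is to build a (instance, timepoint) MultiIndex and move the column axis last. The sketch below uses pandas and NumPy only; the level names "instances" and "timepoints" and the column names "var_0", "var_1" are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# (n_instances, n_columns, n_timepoints) = (2, 2, 3)
X = np.arange(12).reshape(2, 2, 3)
n_instances, n_columns, n_timepoints = X.shape

# One index entry per (instance, timepoint) pair.
index = pd.MultiIndex.from_product(
    [range(n_instances), range(n_timepoints)],
    names=["instances", "timepoints"],  # assumed level names
)

# Reorder to (instance, timepoint, column), then stack the first two axes.
values = X.transpose(0, 2, 1).reshape(n_instances * n_timepoints, n_columns)
X_mi = pd.DataFrame(values, index=index, columns=["var_0", "var_1"])
```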

sktime.utils.data_processing.from_3d_numpy_to_nested(X, column_names=None, cells_as_numpy=False)[source]

Convert NumPy ndarray with shape (n_instances, n_columns, n_timepoints) into nested pandas DataFrame (with time series as pandas Series in cells)

Parameters
  • X (np.ndarray) – 3-dimensional NumPy array to convert to nested pandas DataFrame format

  • column_names (list-like, default = None) – Optional list of names to use for naming nested DataFrame’s columns

  • cells_as_numpy (bool, default = False) – If True, nested cells contain NumPy arrays. If False, nested cells contain pandas Series.

Returns

df – Nested pandas DataFrame

Return type

pd.DataFrame
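
As a rough sketch of the nested layout, each (instance, column) slice of the 3d array becomes one cell holding a pandas Series. The code below is illustrative and uses assumed column names; it is not taken from sktime.

```python
import numpy as np
import pandas as pd

# (n_instances, n_columns, n_timepoints) = (2, 2, 3)
X = np.arange(12, dtype=float).reshape(2, 2, 3)
column_names = ["var_0", "var_1"]  # assumed names for illustration

# For each column j, build one Series per instance from X[i, j, :].
df = pd.DataFrame(
    {
        name: [pd.Series(X[i, j, :]) for i in range(X.shape[0])]
        for j, name in enumerate(column_names)
    }
)
```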

sktime.utils.data_processing.from_long_to_nested(X_long, instance_column_name='case_id', time_column_name='reading_id', dimension_column_name='dim_id', value_column_name='value', column_names=None)[source]

Convert long DataFrame to a nested DataFrame.

Parameters
  • X_long (pd.DataFrame) – The long DataFrame

  • instance_column_name (str, default = 'case_id') – The name of column corresponding to the DataFrame’s instances.

  • time_column_name (str, default = 'reading_id') – The name of the column corresponding to the DataFrame’s timepoints.

  • dimension_column_name (str, default = 'dim_id') – The name of the column corresponding to the DataFrame’s dimensions.

  • value_column_name (str, default = 'value') – The name of the column corresponding to the DataFrame’s values.

  • column_names (list, optional) – Optional list of column names to use for nested DataFrame columns.

Returns

X_nested – Nested pandas DataFrame

Return type

pd.DataFrame
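
The long format stores one observation per row, identified by instance, dimension and timepoint. The following sketch shows how such rows could be regrouped into nested cells with pandas; it uses the default column names from the signature above, while the output column names ("var_0", "var_1") are assumptions.

```python
import pandas as pd

# 2 instances x 2 dimensions x 2 readings, one value per row.
X_long = pd.DataFrame(
    {
        "case_id": [0, 0, 0, 0, 1, 1, 1, 1],
        "dim_id": [0, 0, 1, 1, 0, 0, 1, 1],
        "reading_id": [0, 1, 0, 1, 0, 1, 0, 1],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    }
)

columns = {}
for dim, dim_grp in X_long.groupby("dim_id"):
    cells, cases = [], []
    for case, grp in dim_grp.groupby("case_id"):
        grp = grp.sort_values("reading_id")
        # One Series per (instance, dimension), indexed by timepoint.
        cells.append(
            pd.Series(grp["value"].to_numpy(), index=grp["reading_id"].to_numpy())
        )
        cases.append(case)
    columns[f"var_{dim}"] = pd.Series(cells, index=cases)

X_nested = pd.DataFrame(columns)
```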

sktime.utils.data_processing.from_multi_index_to_3d_numpy(X, instance_index=None, time_index=None)[source]

Convert panel data stored as a pandas multi-index DataFrame to a 3-dimensional NumPy array (n_instances, n_columns, n_timepoints).

Parameters
  • X (pd.DataFrame) – The multi-index pandas DataFrame

  • instance_index (str) – Name of the multi-index level corresponding to the DataFrame’s instances

  • time_index (str) – Name of multi-index level corresponding to DataFrame’s timepoints

Returns

X_3d – 3-dimensional NumPy array (n_instances, n_columns, n_timepoints)

Return type

np.ndarray
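
The reverse direction can be sketched as a reshape followed by an axis swap. This example assumes that all instances have the same number of timepoints and that the rows are sorted by (instance, timepoint); the level and column names are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [[0, 1], [0, 1, 2]], names=["instances", "timepoints"]  # assumed names
)
X_mi = pd.DataFrame(
    {"var_0": [0, 1, 2, 6, 7, 8], "var_1": [3, 4, 5, 9, 10, 11]}, index=index
)

n_instances = X_mi.index.get_level_values(0).nunique()
n_timepoints = X_mi.index.get_level_values(1).nunique()
n_columns = X_mi.shape[1]

# Rows are (instance, timepoint); split them apart, then put columns second.
X_3d = (
    X_mi.to_numpy()
    .reshape(n_instances, n_timepoints, n_columns)
    .swapaxes(1, 2)
)
```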

sktime.utils.data_processing.from_multi_index_to_nested(multi_ind_dataframe, instance_index=None, cells_as_numpy=False)[source]

Converts a pandas DataFrame with a multi-index to a nested DataFrame.

Parameters
  • multi_ind_dataframe (pd.DataFrame) – Input multi-indexed pandas DataFrame

  • instance_index (str) – The name of the multi-index level corresponding to the DataFrame’s instances

  • cells_as_numpy (bool, default = False) – If True, nested cells contain NumPy arrays. If False, nested cells contain pandas Series.

Returns

x_nested – The nested version of the DataFrame

Return type

pd.DataFrame

sktime.utils.data_processing.from_nested_to_2d_array(X, return_numpy=False)[source]

Convert a nested pandas DataFrame or Series with NumPy arrays or pandas Series in cells into a tabular pandas DataFrame with primitives in cells. The result has the same number of rows as the input data and as many columns as there are observations in the nested series. Requires all series to have the same index.

Parameters
  • X (nested pd.DataFrame or nested pd.Series) –

  • return_numpy (bool, default = False) –

    • If True, returns a NumPy array of the tabular data.

    • If False, returns a pandas DataFrame with row and column names.

Returns

Xt – Transformed DataFrame in tabular format

Return type

pd.DataFrame, or np.ndarray if return_numpy is True
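
For a single nested column, the unpacking step can be pictured as laying each cell out as one row. The sketch below assumes equal-length series and leaves the output columns with default integer labels; it is not the library's implementation.

```python
import pandas as pd

X_nested = pd.DataFrame(
    {"var_0": [pd.Series([1.0, 2.0, 3.0]), pd.Series([4.0, 5.0, 6.0])]}
)

# One row per instance, one column per observation in the nested series.
Xt = pd.DataFrame(
    [cell.to_numpy() for cell in X_nested["var_0"]],
    index=X_nested.index,
)
```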

sktime.utils.data_processing.from_nested_to_3d_numpy(X)[source]

Convert nested pandas DataFrame (with time series as pandas Series in cells) into NumPy ndarray with shape (n_instances, n_columns, n_timepoints).

Parameters

X (pd.DataFrame) – Nested pandas DataFrame

Returns

X_3d – 3-dimensional NumPy array

Return type

np.ndarray
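
A sketch of how nested cells can be stacked into the (n_instances, n_columns, n_timepoints) layout. It assumes every cell holds a pandas Series of the same length and uses illustrative column names.

```python
import numpy as np
import pandas as pd

X_nested = pd.DataFrame(
    {
        "var_0": [pd.Series([1.0, 2.0]), pd.Series([3.0, 4.0])],
        "var_1": [pd.Series([5.0, 6.0]), pd.Series([7.0, 8.0])],
    }
)

# Inner stack: (n_instances, n_timepoints) for one column.
# Outer stack on axis 1: (n_instances, n_columns, n_timepoints).
X_3d = np.stack(
    [
        np.stack([cell.to_numpy() for cell in X_nested[col]])
        for col in X_nested.columns
    ],
    axis=1,
)
```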

sktime.utils.data_processing.from_nested_to_long(X, instance_column_name=None, time_column_name=None, dimension_column_name=None)[source]

Convert nested DataFrame to long DataFrame.

Parameters
  • X (pd.DataFrame) – The nested DataFrame

  • instance_column_name (str) – The name of column corresponding to the DataFrame’s instances

  • time_column_name (str) – The name of the column corresponding to the DataFrame’s timepoints.

  • dimension_column_name (str) – The name of the column corresponding to the DataFrame’s dimensions.

Returns

long_df – Long pandas DataFrame

Return type

pd.DataFrame

sktime.utils.data_processing.from_nested_to_multi_index(X, instance_index=None, time_index=None)[source]

Converts nested pandas DataFrame (with time series as pandas Series or NumPy array in cells) into multi-indexed pandas DataFrame.

Can convert mixed nested and primitive DataFrame to multi-index DataFrame.

Parameters
  • X (pd.DataFrame) – The nested DataFrame to convert to a multi-indexed pandas DataFrame

  • instance_index (str) – Name of the multi-index level corresponding to the DataFrame’s instances

  • time_index (str) – Name of multi-index level corresponding to DataFrame’s timepoints

Returns

X_mi – The multi-indexed pandas DataFrame

Return type

pd.DataFrame

sktime.utils.data_processing.is_nested_dataframe(X)[source]

Checks whether the input is a nested DataFrame.

To allow for a mixture of nested and primitive column types, the check considers whether any column contains nested np.ndarray or pd.Series objects.

A column is considered nested if any of its cells has a nested structure.

Parameters

X – Input that is checked to determine if it is a nested DataFrame.

Returns

Whether the input is a nested DataFrame

Return type

bool