sktime.utils.data_io

exception sktime.utils.data_io.LongFormatDataParseException[source]

Bases: Exception

Should be raised when parsing a .csv file with long-formatted data and the format is incorrect.

exception sktime.utils.data_io.TsFileParseException[source]

Bases: Exception

Should be raised when parsing a .ts file and the format is incorrect.
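As a rough illustration of how such a parse exception might be used, the sketch below raises one when a .ts file has no data section. The class here is a local stand-in so the snippet runs without sktime installed, and `check_has_data_tag` is a hypothetical helper; the checks performed by the real parser may differ.

```python
class TsFileParseException(Exception):
    """Local stand-in mirroring the documented exception (assumption for this sketch)."""


def check_has_data_tag(lines):
    """Hypothetical helper: return the index of the '@data' line or raise."""
    for i, line in enumerate(lines):
        if line.strip().lower() == "@data":
            return i
    raise TsFileParseException("no @data tag found")


good = ["@problemName demo", "@data", "1,2,3:a"]
bad = ["@problemName demo", "1,2,3:a"]

data_index = check_has_data_tag(good)

try:
    check_has_data_tag(bad)
    failed = False
    message = ""
except TsFileParseException as err:
    failed = True
    message = str(err)
```

Callers can catch the exception to distinguish a malformed file from other I/O errors.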

sktime.utils.data_io.generate_example_long_table(num_cases=50, series_len=20, num_dims=2)[source]

Generates an example table in long format.

Parameters
  • num_cases (int) – Number of cases.

  • series_len (int) – Length of the series.

  • num_dims (int) – Number of dimensions.

Returns

Return type

DataFrame
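To make the long table layout concrete, here is a minimal standard-library sketch that builds one row per reading. The column names (`case_id`, `dim_id`, `reading_id`, `value`), the random values, and the helper `example_long_rows` are assumptions for illustration; the actual function returns a DataFrame whose exact columns should be checked against the source.

```python
import itertools
import random


def example_long_rows(num_cases=50, series_len=20, num_dims=2, seed=0):
    """Hypothetical stand-in: build long-format rows as plain dicts."""
    rng = random.Random(seed)
    return [
        {"case_id": c, "dim_id": d, "reading_id": r, "value": rng.random()}
        for c, d, r in itertools.product(
            range(num_cases), range(num_dims), range(series_len)
        )
    ]


# 3 cases x 2 dimensions x 4 readings gives 24 rows.
rows = example_long_rows(num_cases=3, series_len=4, num_dims=2)
```

Under these assumptions the table has num_cases * num_dims * series_len rows, one per individual reading.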

sktime.utils.data_io.load_from_arff_to_dataframe(full_file_path_and_name, has_class_labels=True, return_separate_X_and_y=True, replace_missing_vals_with='NaN')[source]

Loads data from a .arff file into a Pandas DataFrame.

Parameters
  • full_file_path_and_name (str) – The full pathname of the .arff file to read.

  • has_class_labels (bool) – True if each line holds the separated series values followed by a class value, in which case the class values are collected separately (see return_separate_X_and_y); False otherwise.

  • return_separate_X_and_y (bool) – True if the X and y values should be returned separately, as a DataFrame (X) and a numpy array (y); False otherwise. This is only relevant for data that has class labels.

  • replace_missing_vals_with (str) – The value that missing values in the text file should be replaced with prior to parsing.

Returns

  • DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.

  • DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” containing the associated class values.

sktime.utils.data_io.load_from_long_to_dataframe(full_file_path_and_name, separator=',')[source]

Loads data from a long format file into a Pandas DataFrame.

Parameters
  • full_file_path_and_name (str) – The full pathname of the .csv file to read.

  • separator (str) – The character that the .csv file uses as a delimiter.

Returns

A dataframe with sktime-formatted data

Return type

DataFrame
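The conversion from long format to one series per case and dimension can be sketched with the standard library alone. The header row, the column names, and the helper `nest_long_rows` are assumptions for illustration; the real loader returns a DataFrame and may expect a different column layout.

```python
import csv
import io
from collections import defaultdict

# Assumed long-format layout: one reading per row.
long_csv = """case_id,dim_id,reading_id,value
0,0,0,1.5
0,0,1,2.5
0,1,0,9.0
0,1,1,8.0
1,0,0,3.0
1,0,1,4.0
1,1,0,7.0
1,1,1,6.0
"""


def nest_long_rows(text, separator=","):
    """Hypothetical helper: group long-format rows into series[case][dim] lists."""
    readings = defaultdict(lambda: defaultdict(dict))
    for row in csv.DictReader(io.StringIO(text), delimiter=separator):
        case, dim = int(row["case_id"]), int(row["dim_id"])
        readings[case][dim][int(row["reading_id"])] = float(row["value"])
    # Order each series by reading_id so row order in the file does not matter.
    return {
        case: {dim: [vals[k] for k in sorted(vals)] for dim, vals in dims.items()}
        for case, dims in readings.items()
    }


nested = nest_long_rows(long_csv)
```

Here `nested[0][1]` holds the second dimension of the first case as an ordered list of values.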

sktime.utils.data_io.load_from_tsfile_to_dataframe(full_file_path_and_name, return_separate_X_and_y=True, replace_missing_vals_with='NaN')[source]

Loads data from a .ts file into a Pandas DataFrame.

Parameters
  • full_file_path_and_name (str) – The full pathname of the .ts file to read.

  • return_separate_X_and_y (bool) – True if the X and y values should be returned separately, as a DataFrame (X) and a numpy array (y); False otherwise. This is only relevant for data that has class labels.

  • replace_missing_vals_with (str) – The value that missing values in the text file should be replaced with prior to parsing.

Returns

  • DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.

  • DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” containing the associated class values.
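For orientation, a data line in a .ts file separates values with commas and dimensions with colons, with the class value (if any) as the last colon-separated field. The sketch below parses a single such line. The helper `parse_ts_case` is hypothetical, and treating `?` as the missing-value marker is an assumption; the real loader also handles header tags, timestamps, and unequal-length series.

```python
import math


def parse_ts_case(line, has_class_label=True, replace_missing_vals_with="NaN"):
    """Hypothetical helper: split one .ts data line into dimensions and a label."""
    # Assumption: '?' marks a missing value and is substituted before parsing.
    line = line.strip().replace("?", replace_missing_vals_with)
    parts = line.split(":")
    label = parts.pop() if has_class_label else None
    dims = [[float(v) for v in part.split(",")] for part in parts]
    return dims, label


dims, label = parse_ts_case("1.0,2.0,?:4.0,5.0,6.0:walking")
has_gap = math.isnan(dims[0][2])  # the '?' became NaN
```

Replacing missing values before parsing lets every field go through the same `float` conversion.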

sktime.utils.data_io.load_from_ucr_tsv_to_dataframe(full_file_path_and_name, return_separate_X_and_y=True)[source]

Loads data from a .tsv file into a Pandas DataFrame.

Parameters
  • full_file_path_and_name (str) – The full pathname of the .tsv file to read.

  • return_separate_X_and_y (bool) – True if the X and y values should be returned separately, as a DataFrame (X) and a numpy array (y); False otherwise. This is only relevant for data that has class labels.

Returns

  • DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.

  • DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” containing the associated class values.
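Files in the UCR .tsv layout are usually tab-separated with the class label in the first column and the series values in the remaining columns. Assuming that layout, the split into X and y can be sketched as follows; `parse_ucr_tsv` is a hypothetical helper returning plain lists rather than the DataFrame and numpy array described above.

```python
def parse_ucr_tsv(text):
    """Hypothetical helper: return (series, labels) from UCR-style .tsv text."""
    series, labels = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumption: first field is the class label, the rest are readings.
        label, *values = line.split("\t")
        labels.append(label)
        series.append([float(v) for v in values])
    return series, labels


tsv_text = "1\t0.1\t0.2\t0.3\n2\t0.4\t0.5\t0.6\n"
X, y = parse_ucr_tsv(tsv_text)
```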

sktime.utils.data_io.make_multi_index_dataframe(n_instances=50, n_columns=3, n_timepoints=20)[source]

Generates example multi-index DataFrame.

Parameters
  • n_instances (int) – Number of instances.

  • n_columns (int) – Number of columns (series) in multi-indexed DataFrame.

  • n_timepoints (int) – Number of timepoints per instance-column pair.

Returns

mi_df – The multi-indexed DataFrame with shape (n_instances*n_timepoints, n_columns).

Return type

pd.DataFrame
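The stated shape follows from having one row per (instance, timepoint) pair and one column per series. A minimal standard-library sketch of that layout is shown below; the helper `multi_index_rows`, the ordering of the index levels, and the placeholder values are assumptions for illustration, not the library's implementation.

```python
import itertools


def multi_index_rows(n_instances=50, n_columns=3, n_timepoints=20):
    """Hypothetical stand-in: (index, rows) laid out like the multi-indexed frame."""
    # One index entry per (instance, timepoint) pair, instance varying slowest.
    index = list(itertools.product(range(n_instances), range(n_timepoints)))
    rows = [[0.0] * n_columns for _ in index]  # placeholder values
    return index, rows


index, rows = multi_index_rows(n_instances=2, n_columns=3, n_timepoints=4)
shape = (len(rows), len(rows[0]))  # (n_instances * n_timepoints, n_columns)
```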

sktime.utils.data_io.write_dataframe_to_tsfile(data, path, problem_name='sample_data', timestamp=False, univariate=True, class_label=None, class_value_list=None, equal_length=False, series_length=- 1, missing_values='NaN', comment=None)[source]

Outputs a dataset in DataFrame format to a .ts file.

Parameters
  • data (DataFrame) – The dataset to be written as a .ts file. It must have the structure specified in the documentation (https://github.com/whackteachers/sktime/blob/master/examples/loading_data.ipynb):

        index | dim_0     | dim_1     | …         | dim_c-1
        0     | pd.Series | pd.Series | pd.Series | pd.Series
        1     | pd.Series | pd.Series | pd.Series | pd.Series
        …     | …         | …         | …         | …
        n     | pd.Series | pd.Series | pd.Series | pd.Series

  • path (str) – The full path to output the .ts file.

  • problem_name (str) – The problemName to print in the header of the ts file and also the name of the file.

  • timestamp ({False, bool}, optional) – Indicate whether the data contains timestamps in the header.

  • univariate ({True, bool}, optional) – Indicate whether the data is univariate or multivariate in the header. If univariate, only the first dimension will be written to file.

  • class_label ({list, None}, optional) – Provide class label to show the possible class values for classification problems in the header.

  • class_value_list ({list/ndarray, None}, optional) – List or ndarray containing the class values for each case in classification problems.

  • equal_length ({False, bool}, optional) – Indicate whether all series have equal length. Only written to the file if true.

  • series_length ({-1, int}, optional) – Indicate the length of each series if they are of equal length. Only written to the file if equal_length is true.

  • missing_values ({NaN, str}, optional) – Representation for missing value, default is NaN.

  • comment ({None, str}, optional) – Comment text to be inserted before the header in a block.

Returns

Return type

None

Notes

This version currently does not support writing timestamp data.

References

The code for writing series data into file is adapted from https://stackoverflow.com/questions/37877708/how-to-turn-a-pandas-dataframe-row-into-a-comma-separated-string
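To show how the header-related parameters could map onto a .ts file, here is a minimal sketch that builds header lines and one data line as strings. The helpers `ts_header` and `ts_case` are hypothetical, and the exact tags and ordering emitted by the real writer are assumptions that should be checked against the source.

```python
def ts_header(problem_name="sample_data", timestamp=False, univariate=True,
              class_label=None, equal_length=False, series_length=-1):
    """Hypothetical helper: build .ts header lines from the documented options."""
    lines = [
        f"@problemName {problem_name}",
        f"@timeStamps {str(timestamp).lower()}",
        f"@univariate {str(univariate).lower()}",
    ]
    if equal_length:
        # Length information is only emitted for equal-length data.
        lines.append("@equalLength true")
        lines.append(f"@seriesLength {series_length}")
    if class_label:
        lines.append("@classLabel true " + " ".join(str(c) for c in class_label))
    else:
        lines.append("@classLabel false")
    lines.append("@data")
    return lines


def ts_case(dimensions, class_value=None):
    """Hypothetical helper: join dimensions with ':' and values with ','."""
    body = ":".join(",".join(str(v) for v in dim) for dim in dimensions)
    return body if class_value is None else f"{body}:{class_value}"


header = ts_header(class_label=["a", "b"], equal_length=True, series_length=3)
case_line = ts_case([[1, 2, 3], [4, 5, 6]], class_value="a")
```

Building the text in memory first keeps the formatting logic separate from file handling.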

sktime.utils.data_io.write_results_to_uea_format(path, strategy_name, dataset_name, y_true, y_pred, split='TEST', resample_seed=0, y_proba=None, second_line='N/A')[source]