sktime.utils.data_io¶
-
exception
sktime.utils.data_io.
LongFormatDataParseException
[source]¶ Bases:
Exception
Should be raised when parsing a .csv file with long-formatted data and the format is incorrect.
-
exception
sktime.utils.data_io.
TsFileParseException
[source]¶ Bases:
Exception
Should be raised when parsing a .ts file and the format is incorrect.
-
sktime.utils.data_io.
generate_example_long_table
(num_cases=50, series_len=20, num_dims=2)[source]¶ Generates an example table in long format.
-
sktime.utils.data_io.
load_from_arff_to_dataframe
(full_file_path_and_name, has_class_labels=True, return_separate_X_and_y=True, replace_missing_vals_with='NaN')[source]¶ Loads data from a .arff file into a Pandas DataFrame.
- Parameters
full_file_path_and_name (str) – The full pathname of the .arff file to read.
has_class_labels (bool) – True if each line of data ends with a class value, False otherwise. How the class values are returned depends on return_separate_X_and_y.
return_separate_X_and_y (bool) – True if X and y should be returned separately, as a DataFrame (X) and a numpy array (y); False otherwise. This is only relevant for data that has class labels.
replace_missing_vals_with (str) – The value that missing values in the text file should be replaced with prior to parsing.
- Returns
DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.
DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” containing the associated class values.
-
sktime.utils.data_io.
load_from_long_to_dataframe
(full_file_path_and_name, separator=',')[source]¶ Loads data from a long format file into a Pandas DataFrame.
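As a rough illustration of what "long format" means here, the sketch below regroups long-format rows into one series per case and dimension, using only the standard library. The column names case_id, dim_id, reading_id and value, the sample data and the helper long_to_nested are assumptions made for this example; they are not taken from this page, and the real function returns a Pandas DataFrame rather than nested dictionaries.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample; the column layout is an assumption for illustration.
LONG_CSV = """\
case_id,dim_id,reading_id,value
0,0,0,1.5
0,0,1,2.5
0,1,0,9.0
0,1,1,8.0
1,0,0,3.5
1,0,1,4.5
1,1,0,7.0
1,1,1,6.0
"""


def long_to_nested(text, separator=","):
    """Group long-format rows into {case_id: {dim_id: [values...]}}."""
    readings = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(text), delimiter=separator):
        case, dim = int(row["case_id"]), int(row["dim_id"])
        readings[case][dim].append((int(row["reading_id"]), float(row["value"])))
    # Sort by reading_id so each series is in reading order, then drop the ids.
    return {
        case: {dim: [v for _, v in sorted(pairs)] for dim, pairs in dims.items()}
        for case, dims in readings.items()
    }


nested = long_to_nested(LONG_CSV)
```

Each row of the long table carries a single reading, so one case with two dimensions of length two occupies four rows.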
-
sktime.utils.data_io.
load_from_tsfile_to_dataframe
(full_file_path_and_name, return_separate_X_and_y=True, replace_missing_vals_with='NaN')[source]¶ Loads data from a .ts file into a Pandas DataFrame.
- Parameters
full_file_path_and_name (str) – The full pathname of the .ts file to read.
return_separate_X_and_y (bool) – True if X and y should be returned separately, as a DataFrame (X) and a numpy array (y); False otherwise. This is only relevant for data that has class labels.
replace_missing_vals_with (str) – The value that missing values in the text file should be replaced with prior to parsing.
- Returns
DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.
DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” containing the associated class values.
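The effect of return_separate_X_and_y and replace_missing_vals_with can be sketched without sktime. The example below parses a small, univariate, labelled .ts-style string with the standard library. The sample text, the use of "?" as the missing-value marker and the helper parse_simple_ts are assumptions for illustration; the real loader handles more header tags and returns Pandas and numpy objects.

```python
import io

# Hypothetical sample; an assumed, simplified .ts layout.
SAMPLE_TS = """\
@problemName demo
@timeStamps false
@univariate true
@classLabel true a b
@data
1.0,2.0,?:a
4.0,5.0,6.0:b
"""


def parse_simple_ts(text, return_separate_X_and_y=True,
                    replace_missing_vals_with="NaN"):
    """Parse a univariate, labelled .ts-style string (illustrative only)."""
    in_data = False
    series, labels = [], []
    for raw in io.StringIO(text):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not in_data:
            in_data = line.lower() == "@data"
            continue  # header tags are ignored in this sketch
        # Substitute the missing-value marker before converting to floats.
        line = line.replace("?", replace_missing_vals_with)
        values, label = line.rsplit(":", 1)
        series.append([float(v) for v in values.split(",")])
        labels.append(label)
    if return_separate_X_and_y:
        return series, labels
    # Single-table form: each row carries its label under "class_vals".
    return [{"dim_0": s, "class_vals": c} for s, c in zip(series, labels)]


X, y = parse_simple_ts(SAMPLE_TS)
table = parse_simple_ts(SAMPLE_TS, return_separate_X_and_y=False)
```

With the separate form, X holds only the series and y only the class values; with the single-table form, the class value travels with each row.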
-
sktime.utils.data_io.
load_from_ucr_tsv_to_dataframe
(full_file_path_and_name, return_separate_X_and_y=True)[source]¶ Loads data from a .tsv file into a Pandas DataFrame.
- Parameters
- Returns
DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.
DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” containing the associated class values.
-
sktime.utils.data_io.
make_multi_index_dataframe
(n_instances=50, n_columns=3, n_timepoints=20)[source]¶ Generates example multi-index DataFrame.
- Parameters
- Returns
mi_df – The multi-indexed DataFrame with shape (n_instances*n_timepoints, n_columns).
- Return type
pd.DataFrame
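The stated shape follows from having one row per (instance, timepoint) pair. The sketch below reproduces that count with the standard library only; the placeholder values and variable names are assumptions for illustration, and in pandas a comparable index could be built with pd.MultiIndex.from_product.

```python
from itertools import product

n_instances, n_columns, n_timepoints = 3, 2, 4

# One index entry per (instance, timepoint) pair.
index = list(product(range(n_instances), range(n_timepoints)))

# Placeholder values: each row holds one value per column.
rows = [[0.0] * n_columns for _ in index]

shape = (len(rows), len(rows[0]))
```

Here shape equals (n_instances * n_timepoints, n_columns), matching the documented return shape.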
-
sktime.utils.data_io.
write_dataframe_to_tsfile
(data, path, problem_name='sample_data', timestamp=False, univariate=True, class_label=None, class_value_list=None, equal_length=False, series_length=-1, missing_values='NaN', comment=None)[source]¶ Output a dataset in dataframe format to a .ts file.
data – The dataset in a dataframe to be written as a ts file, which must be of the structure specified in the documentation https://github.com/whackteachers/sktime/blob/master/examples/loading_data.ipynb
index | dim_0 | dim_1 | … | dim_c-1
0 | pd.Series | pd.Series | pd.Series | pd.Series
1 | pd.Series | pd.Series | pd.Series | pd.Series
… | … | … | … | …
n | pd.Series | pd.Series | pd.Series | pd.Series
- Parameters
path (str) – The full path to output the ts file
problem_name (str) – The problemName to print in the header of the ts file and also the name of the file.
timestamp ({False, bool}, optional) – Indicate whether the data contains timestamps in the header.
univariate ({True, bool}, optional) – Indicate whether the data is univariate or multivariate in the header. If univariate, only the first dimension will be written to file.
class_label ({list, None}, optional) – Provide class label to show the possible class values for classification problems in the header.
class_value_list ({list/ndarray, []}, optional) – ndarray containing the class values for each case in classification problems
equal_length ({False, bool}, optional) – Indicate whether each series has equal length. It is only written to the file if true.
series_length ({-1, int}, optional) – Indicate each series length if they are of equal length. It is only written to the file if equal_length is true.
missing_values ({NaN, str}, optional) – Representation for missing value, default is NaN.
comment ({None, str}, optional) – Comment text to be inserted before the header in a block.
- Returns
- Return type
Notes
This version currently does not support writing timestamp data.
References
The code for writing series data into file is adapted from https://stackoverflow.com/questions/37877708/how-to-turn-a-pandas-dataframe-row-into-a-comma-separated-string
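To show how the header-related parameters above might map onto file content, here is a minimal sketch that renders univariate series as .ts-style text using plain Python lists. The helper format_ts, the exact header tags and the "#" comment prefix are assumptions for illustration and are not confirmed by this page; the real function takes a DataFrame and writes to disk.

```python
def format_ts(series, class_values, problem_name="sample_data",
              class_label=None, missing_values="NaN", comment=None):
    """Render univariate series as .ts-style text (illustrative only)."""
    lines = []
    if comment:
        # Comment block placed before the header.
        lines += ["# " + c for c in comment.splitlines()]
    lines.append("@problemName " + problem_name)
    lines.append("@timeStamps false")  # timestamps are not handled here
    lines.append("@univariate true")
    if class_label:
        lines.append("@classLabel true " + " ".join(class_label))
    else:
        lines.append("@classLabel false")
    lines.append("@data")
    for values, label in zip(series, class_values):
        # None stands in for a missing reading in this sketch.
        cells = [missing_values if v is None else str(v) for v in values]
        row = ",".join(cells)
        if class_label:
            row += ":" + str(label)
        lines.append(row)
    return "\n".join(lines) + "\n"


text = format_ts([[1.0, None], [3.0, 4.0]], ["a", "b"],
                 class_label=["a", "b"], comment="demo")
```

In this sketch the possible class values appear once in the header, while the per-case class value is appended to each data line.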