sktime.datasets.base.load_japanese_vowels
sktime.datasets.base.load_japanese_vowels(split=None, return_X_y=False)
Loads the JapaneseVowels time series classification problem and returns X and y.
- Parameters
split (None or str{"train", "test"}, optional (default=None)) – Whether to load the train or test partition of the problem. By default it loads both.
return_X_y (bool, optional (default=False)) – If True, returns (features, target) separately instead of a single dataframe with columns for features and the target.
- Returns
X (pandas DataFrame with m rows and c columns) – The time series data for the problem with m cases and c dimensions
y (numpy array) – The class labels for each case in X
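A minimal usage sketch, assuming sktime is installed and that the loader accepts the `split` and `return_X_y` arguments exactly as documented above. The names `load_splits` and `cases_per_class` are hypothetical helpers introduced only for illustration; they are not part of sktime.

```python
from collections import Counter


def load_splits():
    """Load the train and test partitions as (X, y) pairs.

    sktime is imported inside the function so that the helper below can be
    used without it; the calls mirror the signature documented on this page.
    """
    from sktime.datasets import load_japanese_vowels

    X_train, y_train = load_japanese_vowels(split="train", return_X_y=True)
    X_test, y_test = load_japanese_vowels(split="test", return_X_y=True)
    return X_train, y_train, X_test, y_test


def cases_per_class(y):
    """Count how many cases carry each class label (hypothetical helper)."""
    # Labels are converted to str so the keys look the same whether y holds
    # plain Python strings or NumPy string scalars.
    return dict(Counter(str(label) for label in y))
```

With sktime available, `cases_per_class(load_splits()[1])` would be expected to report the same count for every speaker in the training partition, in line with the dataset description on this page.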
Details
-------
Dimensionality: multivariate, 12
Series length: 29
Train cases: 270
Test cases: 370
Number of classes: 9
A UCI Archive dataset. Nine Japanese male speakers were recorded saying the vowels ‘a’ and ‘e’. A ‘12-degree linear prediction analysis’ is applied to the raw recordings to obtain time series with 12 dimensions, originally of length between 7 and 29. In this dataset, instances have been padded to the longest length, 29. The classification task is to predict the speaker. Each instance is therefore a transformed utterance: 12*29 values with a single class label attached, [1…9]. The training set comprises 30 utterances for each speaker; the test set has a varied distribution, between 24 and 88 instances per speaker, owing to external factors of timing and experimental availability.

Reference: M. Kudo, J. Toyama and M. Shimbo (1999). “Multidimensional Curve Classification Using Passing-Through Regions”. Pattern Recognition Letters, Vol. 20, No. 11–13, pages 1103–1111.
Dataset details: http://timeseriesclassification.com/description.php?Dataset=JapaneseVowels
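The padding step mentioned in the description can be illustrated with a small pure-Python sketch. It is not the procedure used to build the archive: the pad value (`0.0`), the choice to pad on the right, and the helper name `pad_utterance` are all assumptions made for illustration, since this page states only that instances were padded to length 29.

```python
N_DIMS = 12      # dimensions per utterance, from the details on this page
TARGET_LEN = 29  # longest series length, from the details on this page
PAD_VALUE = 0.0  # assumption: the page does not say which value is used


def pad_utterance(series, target_len=TARGET_LEN, pad_value=PAD_VALUE):
    """Right-pad every dimension of one utterance to target_len.

    `series` is a list of per-dimension lists, all of the same length.
    """
    padded = []
    for dim in series:
        if len(dim) > target_len:
            raise ValueError("series is longer than the target length")
        padded.append(list(dim) + [pad_value] * (target_len - len(dim)))
    return padded


# A made-up utterance with 12 dimensions and 7 time points (the shortest
# original length mentioned in the description).
utterance = [[float(t) for t in range(7)] for _ in range(N_DIMS)]
padded = pad_utterance(utterance)

# After padding, every instance holds 12 * 29 values.
n_values = sum(len(dim) for dim in padded)
```

Padding to a common length lets every case be stored in the same fixed layout, at the cost of adding values that carry no signal.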