ColumnTransformer

class sktime.transformations.panel.compose.ColumnTransformer(transformers, remainder='drop', sparse_threshold=0.3, n_jobs=1, transformer_weights=None, preserve_dataframe=True)

Applies transformers to columns of an array or pandas DataFrame. This class takes the ColumnTransformer from sklearn and adds the capability to handle pandas DataFrames.

This estimator allows different columns or column subsets of the input to be transformed separately. The features generated by each transformer are concatenated to form a single feature space. This is useful for heterogeneous or columnar data, to combine several feature extraction mechanisms or transformations into a single transformer.
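The behaviour described above can be sketched with a small example. Because this class is described as building on the sklearn column transformer, the sketch uses sklearn.compose.ColumnTransformer as a stand-in and assumes the sktime class accepts the same (name, transformer, columns) tuples. The column names and data are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column.
X = pd.DataFrame({
    "height": [1.0, 2.0, 3.0],
    "weight": [10.0, 20.0, 30.0],
    "color": ["red", "blue", "red"],
})

ct = ColumnTransformer(
    transformers=[
        # Scale the numeric columns together.
        ("scale", StandardScaler(), ["height", "weight"]),
        # One-hot encode the categorical column separately.
        ("onehot", OneHotEncoder(), ["color"]),
    ],
    sparse_threshold=0,  # always return a dense result
)

# Outputs of both transformers are concatenated column-wise:
# 2 scaled columns + 2 one-hot columns.
Xt = ct.fit_transform(X)
print(Xt.shape)
```

Each transformer only sees its own column subset, and the results are stacked side by side in the order the transformers are listed.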

Parameters
  • transformers (list of tuples) –

    List of (name, transformer, column(s)) tuples specifying the transformer objects to be applied to subsets of the data.

    name : string

    Like in Pipeline and FeatureUnion, this allows the transformer and its parameters to be set using set_params and searched in grid search.

    transformer : estimator or {"passthrough", "drop"}

    Estimator must support fit and transform. The special-cased strings "drop" and "passthrough" are accepted as well, to indicate to drop the columns or to pass them through untransformed, respectively.

    column(s) : str or int, array-like of string or int, slice, boolean mask array or callable

    Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where transformer expects X to be a 1d array-like (vector), otherwise a 2d array will be passed to the transformer. A callable is passed the input data X and can return any of the above.

  • remainder ({"drop", "passthrough"} or estimator, default "drop") – By default ("drop"), only the columns specified in transformers are transformed and combined in the output, and the non-specified columns are dropped. By specifying remainder="passthrough", all remaining columns that were not specified in transformers will be automatically passed through. This subset of columns is concatenated with the output of the transformers. By setting remainder to be an estimator, the remaining non-specified columns will use the remainder estimator. The estimator must support fit and transform.

  • sparse_threshold (float, default = 0.3) – If the output of the different transformations contains sparse matrices, these will be stacked as a sparse matrix if the overall density is lower than this value. Use sparse_threshold=0 to always return dense. When the transformed output consists of all dense data, the stacked result will be dense, and this keyword will be ignored.

  • n_jobs (int or None, optional (default=1)) – Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • transformer_weights (dict, optional) – Multiplicative weights for features per transformer. The output of the transformer is multiplied by these weights. Keys are transformer names, values the weights.

  • preserve_dataframe (boolean, default = True) – If True, a pandas DataFrame is returned. If False, a numpy array is returned.
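A minimal sketch of how remainder and transformer_weights interact, again using sklearn.compose.ColumnTransformer as a stand-in and assuming the sktime class treats these two parameters the same way. The frame, the transformer name keep_a, and the weight value are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

X = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})

# FunctionTransformer() with no function is an identity transform,
# which keeps the effect of each parameter easy to see.
dropped = ColumnTransformer(
    [("keep_a", FunctionTransformer(), ["a"])],
)  # remainder="drop" is the default: columns b and c are discarded

passed = ColumnTransformer(
    [("keep_a", FunctionTransformer(), ["a"])],
    remainder="passthrough",  # b and c are appended untransformed
)

weighted = ColumnTransformer(
    [("keep_a", FunctionTransformer(), ["a"])],
    remainder="passthrough",
    transformer_weights={"keep_a": 10.0},  # scales only keep_a's output
)

out_dropped = np.asarray(dropped.fit_transform(X))
out_passed = np.asarray(passed.fit_transform(X))
out_weighted = np.asarray(weighted.fit_transform(X))

print(out_dropped.shape, out_passed.shape)
print(out_weighted)
```

The weight is keyed by transformer name, so the passed-through columns are left unscaled here.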

transformers_

The collection of fitted transformers as tuples of (name, fitted_transformer, column). fitted_transformer can be an estimator, "drop", or "passthrough". If no columns were selected for a transformer, this will be the unfitted transformer. If there are remaining columns, the final element is a tuple of the form ("remainder", transformer, remaining_columns), corresponding to the remainder parameter. If there are remaining columns, then len(transformers_)==len(transformers)+1, otherwise len(transformers_)==len(transformers).

Type

list
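The length rule for transformers_ can be checked with a short sketch. It uses sklearn.compose.ColumnTransformer as a stand-in, on the assumption that the sktime class inherits this attribute unchanged. The data and transformer names are illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})

# Column "c" is not assigned to any transformer, so a
# ("remainder", ...) tuple is appended after fitting.
partial = ColumnTransformer([("scale_ab", StandardScaler(), ["a", "b"])])
partial.fit(X)

# Every column is assigned, so no remainder tuple is added.
full = ColumnTransformer([("scale_all", StandardScaler(), ["a", "b", "c"])])
full.fit(X)

print(len(partial.transformers), len(partial.transformers_))
print(len(full.transformers), len(full.transformers_))
print(partial.transformers_[-1][0])
```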

named_transformers_

Read-only attribute to access any transformer by given name. Keys are transformer names and values are the fitted transformer objects.

Type

Bunch object, a dictionary with attribute access
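As a brief illustration of the dictionary-with-attribute-access behaviour, the sketch below looks up a fitted transformer both ways. It uses sklearn.compose.ColumnTransformer as a stand-in and assumes the sktime class exposes the attribute identically. The name scale is an arbitrary choice.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = pd.DataFrame({"height": [1.0, 2.0, 3.0]})

ct = ColumnTransformer([("scale", StandardScaler(), ["height"])])
ct.fit(X)

# Key lookup and attribute lookup return the same fitted object.
by_key = ct.named_transformers_["scale"]
by_attr = ct.named_transformers_.scale

# The fitted scaler carries its learned statistics.
print(by_key.mean_)
```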

sparse_output_

Boolean flag indicating whether the output of transform is a sparse matrix or a dense numpy array, which depends on the output of the individual transformers and the sparse_threshold keyword.

Type

bool
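The following sketch shows how the flag responds to sparse_threshold. It uses sklearn.compose.ColumnTransformer as a stand-in and assumes the sktime class sets the flag the same way. The data is constructed so that the one-hot output is a 10 x 10 matrix with 10 non-zero entries, a density of 0.1.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Ten rows with ten distinct labels.
X = pd.DataFrame({
    "label": [f"c{i}" for i in range(10)],
    "value": [float(i) for i in range(10)],
})

# Density 0.1 is below the default threshold of 0.3: output stays sparse.
sparse_ct = ColumnTransformer([("onehot", OneHotEncoder(), ["label"])])
sparse_ct.fit(X)

# A threshold of 0 forces a dense result for the same transformer.
dense_forced = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["label"])], sparse_threshold=0
)
dense_forced.fit(X)

# All-dense transformer output: the threshold is ignored.
all_dense = ColumnTransformer([("scale", StandardScaler(), ["value"])])
all_dense.fit(X)

print(sparse_ct.sparse_output_, dense_forced.sparse_output_, all_dense.sparse_output_)
```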

__init__(transformers, remainder='drop', sparse_threshold=0.3, n_jobs=1, transformer_weights=None, preserve_dataframe=True)

Initialize self. See help(type(self)) for accurate signature.