ColumnTransformer
class sktime.transformations.panel.compose.ColumnTransformer(transformers, remainder='drop', sparse_threshold=0.3, n_jobs=1, transformer_weights=None, preserve_dataframe=True)
Applies transformations to columns of an array or pandas DataFrame. This class takes the ColumnTransformer from scikit-learn and adds the ability to handle pandas DataFrames.
This estimator allows different columns or column subsets of the input to be transformed separately and the features generated by each transformer will be concatenated to form a single feature space. This is useful for heterogeneous or columnar data, to combine several feature extraction mechanisms or transformations into a single transformer.
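As a rough illustration of the paragraph above, the sketch below applies two different scalers to two columns and concatenates the results. It uses the scikit-learn ColumnTransformer that this class builds on, with a plain NumPy array as input. The transformer names ("minmax", "standard") and the toy data are made up for the example, and it is an assumption that the sktime class accepts the same transformers argument.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Assumption: scikit-learn's ColumnTransformer stands in for the sktime class.
# Toy data: three rows, two columns on very different scales.
X = np.array([[0.0, 10.0],
              [1.0, 20.0],
              [2.0, 30.0]])

# Each tuple is (name, transformer, column(s)); the names are arbitrary labels.
ct = ColumnTransformer(
    transformers=[
        ("minmax", MinMaxScaler(), [0]),      # rescale column 0 to [0, 1]
        ("standard", StandardScaler(), [1]),  # zero mean, unit variance for column 1
    ]
)

# The per-transformer outputs are concatenated column-wise, in list order.
Xt = ct.fit_transform(X)
print(Xt.shape)
```

Passing the column as a list ([0]) rather than a scalar (0) hands each scaler a 2d array, which is what these scalers expect.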
- Parameters
transformers (list of tuples) – List of (name, transformer, column(s)) tuples specifying the transformer objects to be applied to subsets of the data.
name : string
Like in Pipeline and FeatureUnion, this allows the transformer and its parameters to be set using set_params and searched in grid search.
transformer : estimator or {"passthrough", "drop"}
Estimator must support fit and transform. The special-cased strings "drop" and "passthrough" are also accepted, to drop the columns or to pass them through untransformed, respectively.
column(s) : str or int, array-like of string or int, slice, boolean mask array or callable
Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where transformer expects X to be a 1d array-like (vector); otherwise a 2d array will be passed to the transformer. A callable is passed the input data X and can return any of the above.
remainder ({"drop", "passthrough"} or estimator, default "drop") – By default, only the columns specified in transformers are transformed and combined in the output, and the non-specified columns are dropped (the default of "drop"). By specifying remainder="passthrough", all remaining columns that were not specified in transformers are automatically passed through. This subset of columns is concatenated with the output of the transformers. By setting remainder to be an estimator, the remaining non-specified columns are transformed by the remainder estimator. The estimator must support fit and transform.
sparse_threshold (float, default = 0.3) – If the output of the different transformers contains sparse matrices, these will be stacked as a sparse matrix if the overall density is lower than this value. Use sparse_threshold=0 to always return dense. When the transformed output consists of all dense data, the stacked result will be dense, and this keyword will be ignored.
n_jobs (int or None, optional (default=1)) – Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
transformer_weights (dict, optional) – Multiplicative weights for features per transformer. The output of the transformer is multiplied by these weights. Keys are transformer names, values the weights.
preserve_dataframe (boolean, default=True) – If True, a pandas DataFrame is returned. If False, a NumPy array is returned.
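The effect of remainder, transformer_weights and sparse_threshold can be sketched as follows. This again uses the scikit-learn ColumnTransformer as a stand-in, on the assumption that the sktime class passes these parameters through unchanged. The names "minmax" and "onehot" and the data are illustrative only.

```python
import numpy as np
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Assumption: scikit-learn's ColumnTransformer stands in for the sktime class.
X = np.array([[0.0, 7.0],
              [1.0, 8.0],
              [2.0, 9.0]])

# remainder="drop" (the default): only the transformed column is kept.
dropped = ColumnTransformer([("minmax", MinMaxScaler(), [0])]).fit_transform(X)

# remainder="passthrough": the unspecified column is appended unchanged.
passed = ColumnTransformer(
    [("minmax", MinMaxScaler(), [0])], remainder="passthrough"
).fit_transform(X)

# transformer_weights: the output of "minmax" is multiplied by 10.
weighted = ColumnTransformer(
    [("minmax", MinMaxScaler(), [0])], transformer_weights={"minmax": 10.0}
).fit_transform(X)

# sparse_threshold: five distinct categories one-hot encode to a 5x5 matrix
# with density 0.2, which is below the default threshold of 0.3.
C = np.arange(5).reshape(-1, 1)
sparse_out = ColumnTransformer([("onehot", OneHotEncoder(), [0])]).fit_transform(C)
dense_out = ColumnTransformer(
    [("onehot", OneHotEncoder(), [0])], sparse_threshold=0
).fit_transform(C)

print(dropped.shape, passed.shape)
print(weighted.ravel())
print(sparse.issparse(sparse_out), sparse.issparse(dense_out))
```

Passed-through columns are placed after the transformer outputs, so here the original second column ends up as the last column of passed.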
transformers_
The collection of fitted transformers as tuples of (name, fitted_transformer, column). fitted_transformer can be an estimator, "drop", or "passthrough". If no columns were selected, this will be the unfitted transformer. If there are remaining columns, the final element is a tuple of the form ("remainder", transformer, remaining_columns) corresponding to the remainder parameter. If there are remaining columns, then len(transformers_) == len(transformers) + 1, otherwise len(transformers_) == len(transformers).
- Type
list
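The length rule for transformers_ can be checked with a short sketch. It uses the scikit-learn ColumnTransformer as a stand-in for this class (an assumption), and the transformer names "a", "b" and "all" are invented for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

# Assumption: scikit-learn's ColumnTransformer stands in for the sktime class.
X = np.array([[0.0, 7.0, 1.0],
              [1.0, 8.0, 2.0],
              [2.0, 9.0, 3.0]])

# Column 2 is not assigned to any transformer, so a "remainder" entry is appended.
with_rest = ColumnTransformer(
    [("a", MinMaxScaler(), [0]), ("b", MinMaxScaler(), [1])],
    remainder="passthrough",
).fit(X)

# Every column is assigned, so there is no remainder entry.
no_rest = ColumnTransformer(
    [("all", MinMaxScaler(), [0, 1, 2])], remainder="passthrough"
).fit(X)

names = [name for name, _, _ in with_rest.transformers_]
print(names)
print(len(no_rest.transformers_))
```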
named_transformers_
Read-only attribute to access any transformer by its given name. Keys are transformer names and values are the fitted transformer objects.
- Type
Bunch object, a dictionary with attribute access
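A minimal sketch of looking up fitted transformers by name, both dictionary-style and attribute-style. As above, it uses the scikit-learn ColumnTransformer on the assumption that the sktime class exposes the same attribute, and the names "minmax" and "standard" are hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Assumption: scikit-learn's ColumnTransformer stands in for the sktime class.
X = np.array([[0.0, 10.0],
              [1.0, 20.0],
              [2.0, 30.0]])

ct = ColumnTransformer(
    [("minmax", MinMaxScaler(), [0]), ("standard", StandardScaler(), [1])]
).fit(X)

# Dictionary-style and attribute-style access return the same fitted object.
by_key = ct.named_transformers_["minmax"]
by_attr = ct.named_transformers_.minmax

# Fitted state is available on the retrieved transformers.
print(by_key is by_attr)
print(ct.named_transformers_["standard"].mean_)
```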