nextstep.model subpackage
base_model
-
class nextstep.model.base_model.base_model
Bases: object
Base model class.
-
evaluation(y_pre, y_true)
Model evaluation method. Metrics include MAE, MSE, and RMSE.
- Parameters
y_pre (array-like, such as a Python list) – predicted values
y_true (array-like, such as a Python list) – true values
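As a point of reference, the three metrics can be computed as in the sketch below. This is not the library's implementation: the function name `evaluate` and the returned dictionary are assumptions made for illustration, and only the metric definitions themselves are standard.

```python
import math


def evaluate(y_pre, y_true):
    """Return MAE, MSE and RMSE for two equal-length sequences.

    Hypothetical helper; the return format is an assumption.
    """
    if len(y_pre) != len(y_true):
        raise ValueError("y_pre and y_true must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    # Per-sample prediction errors.
    errors = [p - t for p, t in zip(y_pre, y_true)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    rmse = math.sqrt(mse)                             # root mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": rmse}


metrics = evaluate([2.0, 4.0, 7.0], [1.0, 4.0, 5.0])
print(metrics)
```

RMSE is on the same scale as the label, which usually makes it easier to interpret than MSE.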
-
split(data, label_column, train_size, seed)
Perform a train-test split.
- Parameters
data (pandas dataframe) – dataset
label_column (string) – label column name
train_size – training set size as a fraction of the full dataset
seed (int) – seed for the pseudorandom number generator; reusing the same seed reproduces the same split
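A seeded, shuffled split along these lines might look like the following sketch. The function name `split_shuffled` and the return order (features first, then labels) are assumptions and not confirmed by this page; the toy dataframe is made up for the example.

```python
import pandas as pd


def split_shuffled(data, label_column, train_size, seed):
    """Shuffle rows with a fixed seed, then cut at the train_size fraction.

    Hypothetical helper; the return order is an assumption.
    """
    # frac=1.0 returns every row in a random order determined by the seed.
    shuffled = data.sample(frac=1.0, random_state=seed)
    n_train = int(len(shuffled) * train_size)
    train, test = shuffled.iloc[:n_train], shuffled.iloc[n_train:]
    X_train = train.drop(columns=[label_column])
    X_test = test.drop(columns=[label_column])
    return X_train, X_test, train[label_column], test[label_column]


# Toy data, invented for illustration only.
df = pd.DataFrame({"load": range(10), "USEP": [v * 2.0 for v in range(10)]})
X_train, X_test, y_train, y_test = split_shuffled(df, "USEP", 0.8, seed=33)
```

Calling the function again with the same seed returns the same rows in the same order.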
-
split_noshuffle(data, label_column, train_size, seed)
Perform a train-test split without shuffling.
- Parameters
data (pandas dataframe) – dataset
label_column (string) – label column name
train_size – training set size as a fraction of the full dataset
seed (int) – seed for the pseudorandom number generator; reusing the same seed reproduces the same result
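A non-shuffled split keeps the rows in their original order, so the test set is simply the tail of the dataset. The sketch below illustrates that idea; the name `split_in_order` and the return order are assumptions, and the `seed` argument is omitted because nothing is randomized in this version.

```python
import pandas as pd


def split_in_order(data, label_column, train_size):
    """Cut the dataframe at the train_size fraction, preserving row order.

    Hypothetical helper; the return order is an assumption.
    """
    n_train = int(len(data) * train_size)
    train, test = data.iloc[:n_train], data.iloc[n_train:]
    X_train = train.drop(columns=[label_column])
    X_test = test.drop(columns=[label_column])
    return X_train, X_test, train[label_column], test[label_column]


# Toy data, invented for illustration only.
df = pd.DataFrame({"load": range(10), "USEP": [v * 2.0 for v in range(10)]})
X_train, X_test, y_train, y_test = split_in_order(df, "USEP", 0.8)
```

For time-ordered data this means every training row precedes every test row.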
-
XGboost
adaboost
-
class nextstep.model.adaboost.adaboost(config)
Bases: nextstep.model.base_model.base_model
AdaBoost class.
-
build_model(data)
Build the AdaBoost model, including the train-test split and model evaluation.
- Parameters
data (pandas dataframe) – dataset
- Returns
fitted adaboost model
-
predict(X_new)
Use the fitted model for prediction.
- Parameters
X_new (array-like) – data of shape (n_samples, n_features)
-
example config
user_config = {
'label_column' : 'USEP', # label column name
'train_size' : 0.9, # train-test split
'seed' : 33,
'base_estimator': random_forest_model, # a fitted model
'n_estimators' : 10, # number of estimators
'learning_rate' : 1, # learning rate
'loss' : 'square' # loss function
}
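The model-related keys in the config above resemble the parameters of scikit-learn's `AdaBoostRegressor`. Whether nextstep wraps that class is an assumption; the sketch below only shows how such a config could be mapped onto it. The random-forest base estimator and the synthetic data are made up for the example.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor

user_config = {
    'label_column': 'USEP',
    'train_size': 0.9,
    'seed': 33,
    # Stand-in for random_forest_model; created here only for illustration.
    'base_estimator': RandomForestRegressor(n_estimators=5, random_state=33),
    'n_estimators': 10,
    'learning_rate': 1,
    'loss': 'square',
}

# Synthetic regression data, invented for this sketch.
rng = np.random.default_rng(user_config['seed'])
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

model = AdaBoostRegressor(
    # Passed positionally because the keyword name differs across
    # scikit-learn versions (base_estimator vs. estimator).
    user_config['base_estimator'],
    n_estimators=user_config['n_estimators'],
    learning_rate=user_config['learning_rate'],
    loss=user_config['loss'],
    random_state=user_config['seed'],
)
model.fit(X, y)
predictions = model.predict(X[:5])  # input shape (n_samples, n_features)
```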
arima
-
class nextstep.model.arima.arima(config)
Bases: nextstep.model.base_model.base_model
ARIMA class.
-
autocorrelation(data, number_of_time_step=20)
Plot the autocorrelation.
- Parameters
data (pandas dataframe) – dataset
number_of_time_step (int, default 20) – number of time steps to consider for the autocorrelation
Note
The data length must be larger than the specified number_of_time_step.
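The values behind an autocorrelation plot can be computed directly, which also shows why the series must be longer than the number of time steps: each lag k compares the series with a copy of itself shifted by k positions. The sketch below uses the standard sample autocorrelation estimator; the function name `sample_autocorrelation` is hypothetical and this is not necessarily how the library computes its plot.

```python
import numpy as np


def sample_autocorrelation(values, lags=20):
    """Sample autocorrelation for lags 0..lags (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    if len(x) <= lags:
        raise ValueError("data length must be larger than the number of lags")
    centered = x - x.mean()
    # Lag-0 sum of squares; zero for a constant series, which this sketch
    # does not handle.
    denom = np.dot(centered, centered)
    n = len(x)
    return np.array([
        # Overlap of the series with itself shifted by k steps.
        np.dot(centered[: n - k], centered[k:]) / denom
        for k in range(lags + 1)
    ])


# Alternating toy series, invented for illustration only.
series = [1.0, -1.0] * 5
acf = sample_autocorrelation(series, lags=3)
```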
-
build_model(data)
Build the ARIMA model, including the train-test split and model evaluation.
- Parameters
data (pandas dataframe) – dataset
- Returns
fitted ARIMA model
-
partial_autocorrelation(data, lags=20)
Plot the partial autocorrelation.
- Parameters
data (pandas dataframe) – dataset
lags (int, default 20) – number of lags to consider for the partial autocorrelation
Note
The data length must be larger than the specified lags.
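The partial autocorrelation at lag k is the correlation that remains after removing the effect of lags 1 to k-1. One common way to estimate it is the Durbin-Levinson recursion applied to the sample autocorrelation, sketched below. The function name `sample_pacf` is hypothetical, and plotting libraries may use a different estimator, so the values need not match the library's plot exactly.

```python
import numpy as np


def sample_pacf(values, lags=20):
    """Partial autocorrelation for lags 0..lags via Durbin-Levinson.

    Hypothetical helper, not the library's implementation.
    """
    x = np.asarray(values, dtype=float)
    n = len(x)
    if n <= lags:
        raise ValueError("data length must be larger than the number of lags")
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    # Sample autocorrelation rho[0..lags].
    rho = np.array([np.dot(centered[: n - k], centered[k:]) / denom
                    for k in range(lags + 1)])
    pacf = np.zeros(lags + 1)
    pacf[0] = 1.0
    phi_prev = np.array([])  # AR coefficients of the order k-1 fit
    for k in range(1, lags + 1):
        # phi_kk = (rho_k - sum_j phi_{k-1,j} rho_{k-j})
        #          / (1 - sum_j phi_{k-1,j} rho_j)
        num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, rho[1:k])
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf


# Alternating toy series, invented for illustration only.
series = [1.0, -1.0] * 5
pacf = sample_pacf(series, lags=3)
```

At lag 1 the partial autocorrelation equals the ordinary autocorrelation, since there are no intermediate lags to remove.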
-
predict_next_n(step)
Use the fitted model to predict the next values.
- Parameters
step – the number of values to predict
-
residual_density_plot()
Plot the density of the residuals.
-
residual_plot()
Plot the residuals.
-
lstm
-
class nextstep.model.lstm.lstm_univariate(config)
Bases: nextstep.model.base_model.base_model
Long short-term memory (LSTM) class.
-
build_model(data)
Build the LSTM model, including the train-test split and model evaluation.
- Parameters
data (pandas dataframe) – dataset
- Returns
fitted LSTM model
-
predict(X_new)
Use the fitted model for prediction.
- Parameters
X_new (array-like) – data of shape (n_samples, n_features)
-
random_forest
-
class nextstep.model.random_forest.random_forest(config)
Bases: nextstep.model.base_model.base_model
Random forest class.
-
build_model(data)
Build the random forest model, including the train-test split and model evaluation.
- Parameters
data (pandas dataframe) – dataset
- Returns
fitted random forest model
-
predict(X_new)
Use the fitted model for prediction.
- Parameters
X_new (array-like) – data of shape (n_samples, n_features)
-
sarima
-
class nextstep.model.sarima.sarima(config)
Bases: nextstep.model.base_model.base_model
SARIMA class.
-
autocorrelation(data, lags=20)
Plot the autocorrelation.
- Parameters
data (pandas dataframe) – dataset
lags (int, default 20) – number of lags to consider for the autocorrelation
Note
The data length must be larger than the specified lags.
-
build_model(data)
Build the SARIMA model, including the train-test split and model evaluation.
- Parameters
data (pandas dataframe) – dataset
- Returns
fitted SARIMA model
-
partial_autocorrelation(data, lags=20)
Plot the partial autocorrelation.
- Parameters
data (pandas dataframe) – dataset
lags (int, default 20) – number of lags to consider for the partial autocorrelation
Note
The data length must be larger than the specified lags.
-
predict(X_new)
-
predict_next_n(steps)
-
residual_density_plot()
Plot the density of the residuals.
-
residual_plot()
Plot the residuals.
-