nextstep.model subpackage

base_model

class nextstep.model.base_model.base_model

Bases: object

base model class

evaluation(y_pre, y_true)

model evaluation method. Metrics include MAE, MSE and RMSE

Parameters
  • y_pre (array-like, such as python list) – predicted values

  • y_true (array-like, such as python list) – true values
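The three metrics can be sketched with NumPy as follows. This is a minimal illustration, not the package's own code: the function name `evaluate` and the dictionary return value are assumptions made for the example.

```python
import numpy as np


def evaluate(y_pre, y_true):
    """Return MAE, MSE and RMSE for two equal-length array-likes (hypothetical helper)."""
    y_pre = np.asarray(y_pre, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    if y_pre.shape != y_true.shape:
        raise ValueError("y_pre and y_true must have the same shape")
    errors = y_pre - y_true
    mae = float(np.mean(np.abs(errors)))   # mean absolute error
    mse = float(np.mean(errors ** 2))      # mean squared error
    rmse = float(np.sqrt(mse))             # root mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": rmse}


# toy values, chosen only for illustration
metrics = evaluate([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
```

RMSE is the square root of MSE, so it is expressed in the same unit as the label.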

split(data, label_column, train_size, seed)

perform train test split.

Parameters
  • data (pandas dataframe) – dataset

  • label_column (string) – label column name

  • train_size – fraction of the entire dataset used for training (for example, 0.9)

  • seed (int) – seed for the pseudorandom number generator; the same seed reproduces the same split
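A shuffled, seeded split of this kind can be sketched with pandas alone. The helper name `split_shuffled` and the four-part return value are assumptions for illustration; the documentation above does not state what `split` returns.

```python
import pandas as pd


def split_shuffled(data, label_column, train_size, seed):
    """Hypothetical sketch of a seeded, shuffled train-test split."""
    # draw the training rows at random; the seed makes the draw repeatable
    train = data.sample(frac=train_size, random_state=seed)
    # every remaining row goes to the test set (assumes a unique index)
    test = data.drop(train.index)
    X_train = train.drop(columns=[label_column])
    X_test = test.drop(columns=[label_column])
    return X_train, X_test, train[label_column], test[label_column]


# toy frame, values invented for the example
df = pd.DataFrame({"feature": range(10), "USEP": range(100, 110)})
X_train, X_test, y_train, y_test = split_shuffled(df, "USEP", 0.8, seed=33)
```

Calling the helper again with the same seed selects the same training rows.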

split_noshuffle(data, label_column, train_size, seed)

perform train test split without shuffling, so the original row order is preserved.

Parameters
  • data (pandas dataframe) – dataset

  • label_column (string) – label column name

  • train_size – fraction of the entire dataset used for training (for example, 0.9)

  • seed (int) – seed for the pseudorandom number generator; the same seed reproduces the same result
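For time series, an order-preserving split usually means the first rows form the training set and the last rows form the test set. The sketch below shows that idea under stated assumptions: the helper name `split_ordered` and its return value are hypothetical, and the `seed` argument is left out because nothing is randomized here.

```python
import pandas as pd


def split_ordered(data, label_column, train_size):
    """Hypothetical sketch of a train-test split that keeps row order."""
    cut = int(len(data) * train_size)          # number of training rows
    train, test = data.iloc[:cut], data.iloc[cut:]
    X_train = train.drop(columns=[label_column])
    X_test = test.drop(columns=[label_column])
    return X_train, X_test, train[label_column], test[label_column]


# toy frame, values invented for the example
df = pd.DataFrame({"feature": range(10), "USEP": range(100, 110)})
X_train, X_test, y_train, y_test = split_ordered(df, "USEP", 0.8)
```

Because the test rows always come after the training rows, the model is never evaluated on observations older than its training data.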

XGboost

adaboost

class nextstep.model.adaboost.adaboost(config)

Bases: nextstep.model.base_model.base_model

adaboost class

build_model(data)

building the adaboost model, including train-test split and model evaluation.

Parameters

data (pandas dataframe) – dataset

Returns

fitted adaboost model

predict(X_new)

use the fitted model for prediction.

Parameters

X_new (array-like) – data of shape (n_samples, n_features)

example config

user_config = {
   'label_column' : 'USEP', # label column name
   'train_size' : 0.9, # train-test split
   'seed' : 33,
   'base_estimator': random_forest_model, # a fitted model
   'n_estimators' : 10, # number of estimators
   'learning_rate' : 1, # learning rate
   'loss' : 'square' # loss function
   }

arima

class nextstep.model.arima.arima(config)

Bases: nextstep.model.base_model.base_model

arima class.

autocorrelation(data, number_of_time_step=20)

plot autocorrelation.

Parameters
  • data (pandas dataframe) – dataset

  • number_of_time_step (int, default 20) – number of time steps (lags) to consider for autocorrelation

Note

data length must be larger than specified number_of_time_step.
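The length requirement follows from how autocorrelation is computed: the value at lag k compares the series with a copy of itself shifted by k steps, so at least k + 1 observations are needed. A minimal NumPy sketch of the sample autocorrelation is given below; the function name `sample_autocorrelation` is an assumption, and the plotting done by the method above is not reproduced.

```python
import numpy as np


def sample_autocorrelation(series, lags=20):
    """Hypothetical helper: sample autocorrelation for lags 0..lags."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if n <= lags:
        raise ValueError("data length must be larger than the number of lags")
    centered = x - x.mean()
    denom = np.sum(centered ** 2)  # lag-0 sum of squares, used for normalization
    return np.array(
        [np.sum(centered[k:] * centered[: n - k]) / denom for k in range(lags + 1)]
    )


# toy periodic series, used only for illustration
acf = sample_autocorrelation(np.sin(np.arange(50)), lags=20)
```

The first entry is always 1, since a series is perfectly correlated with itself at lag 0.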

build_model(data)

building the arima model, including train-test split and model evaluation.

Parameters

data (pandas dataframe) – dataset

Returns

fitted arima model

partial_autocorrelation(data, lags=20)

plot partial autocorrelation.

Parameters
  • data (pandas dataframe) – dataset

  • lags (int, default 20) – number of lags to consider for partial autocorrelation

Note

data length must be larger than specified lags.

predict_next_n(step)

use the fitted model for prediction.

Parameters

step – the number of values to be predicted

residual_density_plot()

plot residual density plot.

residual_plot()

plot residual.

lstm

class nextstep.model.lstm.lstm_univariate(config)

Bases: nextstep.model.base_model.base_model

long short-term memory class.

build_model(data)

building the lstm model, including train-test split and model evaluation.

Parameters

data (pandas dataframe) – dataset

Returns

fitted lstm model

predict(X_new)

use the fitted model for prediction.

Parameters

X_new (array-like) – data of shape (n_samples, n_features)

random_forest

class nextstep.model.random_forest.random_forest(config)

Bases: nextstep.model.base_model.base_model

random forest class.

build_model(data)

building the random forest model, including train-test split and model evaluation.

Parameters

data (pandas dataframe) – dataset

Returns

fitted random forest model

predict(X_new)

use the fitted model for prediction.

Parameters

X_new (array-like) – data of shape (n_samples, n_features)

sarima

class nextstep.model.sarima.sarima(config)

Bases: nextstep.model.base_model.base_model

sarima class.

autocorrelation(data, lags=20)

plot autocorrelation.

Parameters
  • data (pandas dataframe) – dataset

  • lags (int, default 20) – number of lags to consider for autocorrelation

Note

data length must be larger than specified lags.

build_model(data)

building the sarima model, including train-test split and model evaluation.

Parameters

data (pandas dataframe) – dataset

Returns

fitted sarima model

partial_autocorrelation(data, lags=20)

plot partial autocorrelation.

Parameters
  • data (pandas dataframe) – dataset

  • lags (int, default 20) – number of lags to consider for partial autocorrelation

Note

data length must be larger than specified lags.

predict(X_new)

predict_next_n(steps)

residual_density_plot()

plot residual density plot.

residual_plot()

plot residual.