autots.tools package

Submodules

autots.tools.holiday module

Manage holiday features.

autots.tools.holiday.holiday_flag(DTindex, country: str = 'US')

Create a 0/1 flag for given datetime index.

Parameters
  • DTindex (pandas.DatetimeIndex) – DatetimeIndex of dates for which to create flags

  • country (str) – to pass through to python package Holidays

Returns

pandas.Series() with DatetimeIndex and column ‘HolidayFlag’
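As a rough illustration of the 0/1 flag described above, the following sketch builds the flag from a hand-written list of holiday dates. The function name holiday_flag_sketch and the explicit holiday_dates argument are assumptions made for this example; the real function looks dates up through the Holidays package by country instead.

```python
import pandas as pd


def holiday_flag_sketch(dt_index, holiday_dates):
    """Return a 0/1 Series named 'HolidayFlag', one value per timestamp.

    Hypothetical helper: holiday_dates stands in for a Holidays lookup.
    """
    holidays_idx = pd.DatetimeIndex(holiday_dates)
    # normalize() drops the time of day so intraday stamps still match a date
    is_holiday = dt_index.normalize().isin(holidays_idx)
    return pd.Series(is_holiday.astype(int), index=dt_index, name="HolidayFlag")


idx = pd.date_range("2020-12-24", periods=3, freq="D")
flags = holiday_flag_sketch(idx, ["2020-12-25"])
print(flags)
```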

autots.tools.impute module

Fill NA

autots.tools.impute.FillNA(df, method: str = 'ffill', window: int = 10)

Fill NA values using different methods.

Parameters
  • method (str) –

    • ‘ffill’ - fill most recent non-na value forward until another non-na value is reached

    • ‘zero’ - fill with zero. Useful for sales and other data where NA usually means $0.

    • ‘mean’ - fill all missing values with the series’ overall average value

    • ‘median’ - fill all missing values with the series’ overall median value

    • ‘rolling mean’ - fill with the mean of the last n (window) values

    • ‘ffill mean biased’ - simple avg of ffill and mean

    • ‘fake date’ - shifts forward data over nan, thus values will have incorrect timestamps

  • window (int) – length of rolling windows for filling na, for rolling methods
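The simpler fill methods listed above can be approximated with plain pandas calls. This is a minimal sketch on a toy frame, not the library's own code; the column name and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sales": [1.0, np.nan, 3.0, np.nan, 8.0]})

filled_ffill = df.ffill()                # 'ffill': carry the last seen value forward
filled_zero = df.fillna(0)               # 'zero'
filled_mean = df.fillna(df.mean())       # 'mean': per-column overall average
filled_median = df.fillna(df.median())   # 'median': per-column overall median
# 'ffill mean biased', read here as the simple average of the ffill and mean fills
filled_biased = (filled_ffill + filled_mean) / 2

print(filled_biased)
```

Observed (non-NaN) values are unchanged by the averaging step, since both fills agree on them.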

autots.tools.impute.biased_ffill(df, mean_weight: float = 1)

Fill NaN with average of last value and mean.

autots.tools.impute.fake_date_fill(df, back_method: str = 'slice')

Return a dataframe where na values are removed and values shifted forward.

Warning

Values will likely have incorrect timestamps!

Parameters

back_method (str) – how to deal with tails left by shifting NaN

  • ‘bfill’ - back fill the last value

  • ‘slice’ - drop any rows with any na

  • ‘keepNA’ - keep the lagging na
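A minimal sketch of the idea, assuming that "shifted forward" means each column's observed values are packed against the most recent timestamps, leaving any NaN at the start. The helper name fake_date_fill_sketch is hypothetical, and only the ‘slice’ option is shown.

```python
import numpy as np
import pandas as pd


def fake_date_fill_sketch(df):
    """Drop NaN per column and right-align what is left (hypothetical helper)."""
    out = {}
    for col in df.columns:
        vals = df[col].dropna().to_numpy()
        padded = np.full(len(df), np.nan)
        if len(vals):
            # observed values slide toward the end of the index
            padded[len(df) - len(vals):] = vals
        out[col] = padded
    return pd.DataFrame(out, index=df.index)


idx = pd.date_range("2021-01-01", periods=4, freq="D")
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": [1.0, 2.0, 3.0, 4.0]}, index=idx)

filled = fake_date_fill_sketch(df)
sliced = filled.dropna(how="any")  # 'slice': drop rows that still contain NaN
print(sliced)
```

Note that the value 1.0 in column a now sits on the second date, which is exactly the timestamp distortion the warning refers to.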

autots.tools.impute.fill_forward(df)

Fill NaN with previous values.

autots.tools.impute.fill_mean(df)

Fill NaN with mean.

autots.tools.impute.fill_median(df)

Fill NaN with median.

autots.tools.impute.fill_zero(df)

Fill NaN with zero.

autots.tools.impute.rolling_mean(df, window: int = 10)

Fill NaN with mean of last window values.
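One way to sketch this with pandas, assuming the window mean is computed over whatever non-NaN values fall inside the trailing window:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0]})
window = 3

# min_periods=1 lets the mean use the non-NaN values available in the window
window_mean = df.rolling(window=window, min_periods=1).mean()
filled = df.fillna(window_mean)

print(filled)
```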

autots.tools.probabilistic module

Point to Probabilistic

autots.tools.probabilistic.Point_to_Probability(train, forecast, prediction_interval=0.9, method: str = 'historic_quantile')

Data driven placeholder for model error estimation.

Catlin Point to Probability method (‘a mixture of dark magic and gum disease’)

Parameters
  • train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex

  • forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.

  • prediction_interval (float) – confidence or perhaps credible interval

  • method (str) – spell to cast to create dark magic. ‘historic_quantile’, ‘inferred_normal’, ‘variable_pct_change’ gum disease available separately upon request.

Returns

upper_error, lower_error (two pandas.DataFrames for upper and lower bound respectively)

autots.tools.probabilistic.Variable_Point_to_Probability(train, forecast, alpha=0.3, beta=1)

Data driven placeholder for model error estimation.

ErrorRange = beta * (En + alpha * En-1 [cum sum of En])

where:

  • En = abs(0.5 - QTP) * D

  • D = abs(Xn - ((Avg % Change of Train * Xn-1) + Xn-1))

  • Xn = Forecast Value

  • QTP = Percentile of Score in All Percent Changes of Train

  • Score = Percent Change (from Xn-1 to Xn)

Parameters
  • train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex

  • forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.

  • alpha (float) – parameter which affects the broadening of the error range over time. Usually 0 < alpha < 1 (although it can be larger than 1)

  • beta (float) – parameter which affects the general width of the error bar. Usually 0 < beta < 1 (although it can be larger than 1)

Returns

error width for each value of forecast.

Return type

ErrorRange (pandas.DataFrame)
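The ErrorRange formula above can be read literally for a single series as in the sketch below. Several points are assumptions made for illustration: the first forecast step uses the last training value as Xn-1, QTP is taken as the fraction of historic percent changes less than or equal to the score, and "En-1 [cum sum of En]" is read as the running sum of all earlier En. The function name error_range_sketch is hypothetical and the library's actual handling may differ.

```python
import numpy as np


def error_range_sketch(train, forecast, alpha=0.3, beta=1.0):
    """Literal single-series reading of the ErrorRange formula (assumptions above)."""
    train = np.asarray(train, dtype=float)
    forecast = np.asarray(forecast, dtype=float)

    train_pct = np.diff(train) / train[:-1]          # all percent changes of train
    avg_pct = train_pct.mean()                       # Avg % Change of Train

    prev = np.concatenate(([train[-1]], forecast[:-1]))  # Xn-1 for each Xn
    score = (forecast - prev) / prev                      # Score
    qtp = np.array([(train_pct <= s).mean() for s in score])  # QTP in [0, 1]

    d = np.abs(forecast - (avg_pct * prev + prev))   # D
    e = np.abs(0.5 - qtp) * d                        # En
    cum_prev = np.concatenate(([0.0], np.cumsum(e)[:-1]))  # sum of earlier En
    return beta * (e + alpha * cum_prev)


print(error_range_sketch([8, 16, 16, 8, 8], [8, 16]))
```

With alpha > 0 the running sum makes the range widen at later forecast steps, which matches the description of alpha as controlling broadening over time.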

autots.tools.probabilistic.historic_quantile(df_train, prediction_interval: float = 0.9)

Computes the difference between the median and the prediction interval range in historic data.

Parameters
  • df_train (pd.DataFrame) – a dataframe of training data

  • prediction_interval (float) – the desired forecast interval range

Returns

two 1D arrays

Return type

lower, upper (np.array)
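A sketch of the idea, assuming the interval is centered so that equal probability mass is cut from each tail; the function name historic_quantile_sketch is hypothetical. The resulting widths are then added to and subtracted from a point forecast.

```python
import numpy as np
import pandas as pd


def historic_quantile_sketch(df_train, prediction_interval=0.9):
    """Distance from the median to each interval edge, per series."""
    tail = (1 - prediction_interval) / 2
    median = df_train.quantile(0.5)
    lower = median - df_train.quantile(tail)
    upper = df_train.quantile(1 - tail) - median
    return lower.to_numpy(), upper.to_numpy()


train = pd.DataFrame({"a": np.arange(101, dtype=float)})
lower, upper = historic_quantile_sketch(train, prediction_interval=0.9)

forecast = pd.DataFrame({"a": [200.0, 210.0]})
upper_bound = forecast + upper   # one width per column, broadcast over rows
lower_bound = forecast - lower
print(lower_bound, upper_bound)
```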

autots.tools.probabilistic.inferred_normal(train, forecast, n: int = 5, prediction_interval: float = 0.9)

A corruption of Bayes theorem. It will be sensitive to the transformations of the data.

autots.tools.probabilistic.percentileofscore_appliable(x, a, kind='rank')

autots.tools.profile module

Profiling

autots.tools.profile.data_profile(df)

Input: a pd DataFrame of columns which are time series, and a datetime index

Output: a pd DataFrame of column per time series, with rows which are statistics

autots.tools.shaping module

Reshape data.

class autots.tools.shaping.NumericTransformer(na_strings: list = ['', ' ', 'NULL', 'NA', 'NaN', 'na', 'nan'], categorical_impute_strategy: str = 'constant', verbose: int = 0)

Bases: object

Test numeric conversion.

fit(df)

Fit categorical to numeric.

inverse_transform(df)

Convert numeric back to categorical.

transform(df)

Convert categorical dataset to numeric.

autots.tools.shaping.long_to_wide(df, date_col: str = 'datetime', value_col: str = 'value', id_col: str = 'series_id', frequency: str = 'infer', na_tolerance: float = 0.999, drop_data_older_than_periods: int = 100000, drop_most_recent: int = 0, aggfunc: str = 'first', verbose: int = 1)

Take long data and convert into wide, cleaner data.

Parameters
  • df (pd.DataFrame) –

  • date_col (str) –

  • value_col (str) –

    • the name of the column with the values of the time series (ie sales $)

  • id_col (str) –

    • name of the id column, unique for each time series

  • frequency (str) –

    • frequency as a string alias for a pandas DateOffset object, normally “1D” (daily), “MS” (month start), etc.

    aliases are currently listed in the pandas time series user guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

  • na_tolerance (float) –

    • allow up to this percent of values to be NaN, else drop the entire series

    the default of 0.999 means a series can be 99.9% NaN values and still be included.

  • drop_data_older_than_periods (int) –

    • cut off older data because eventually you just get too much

    the default of 100,000 is meant to be rather high; normally for daily data I’d use only the last couple of years, say 1500 samples

  • drop_most_recent (int) –

    • number of most recent data points to drop (0 drops none)

    useful if you pull monthly data before month end, and you don’t want an incomplete month appearing complete

  • aggfunc (str) –

    • passed to pd.pivot_table, determines how to aggregate duplicates for series_id and datetime

    other options include “mean” and other numpy functions. Beware: data must already be input as a numeric type for these to work. If categorical data is provided, aggfunc=‘first’ is recommended.
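The core reshaping step can be sketched with pd.pivot_table, which the aggfunc parameter above is passed to. The rest of this example (asfreq for the frequency, a NaN-share filter for na_tolerance) is an assumption about how the cleanup could look, not the function's actual code.

```python
import pandas as pd

long_df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2021-01-01", "2021-01-01", "2021-01-03", "2021-01-03"]
    ),
    "series_id": ["a", "b", "a", "b"],
    "value": [1.0, 10.0, 3.0, 30.0],
})

# one column per series_id, one row per timestamp
wide_df = long_df.pivot_table(
    values="value", index="datetime", columns="series_id", aggfunc="first"
)
# regular daily index; the missing 2021-01-02 becomes a NaN row
wide_df = wide_df.asfreq("D")

# keep only series whose share of NaN is within the tolerance
na_tolerance = 0.999
na_share = wide_df.isna().mean()
wide_df = wide_df.loc[:, na_share <= na_tolerance]
print(wide_df)
```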

autots.tools.shaping.simple_train_test_split(df, forecast_length: int = 10, min_allowed_train_percent: float = 0.3, verbose: int = 1)

Uses the last forecast_length periods as the test set and the rest as the training set.

Parameters
  • forecast_length (int) – number of future periods to predict

  • min_allowed_train_percent (float) –

    • forecast length cannot be greater than 1 - this, as a fraction of the data

    constrains the forecast length from being much larger than the training data. Note that this includes NaNs in the current configuration.

Returns

train, test (both pd DataFrames)
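A minimal sketch of the split. Raising a ValueError when the constraint is violated is an assumption for illustration; the library may respond differently (for example, with a message controlled by verbose). The name train_test_split_sketch is hypothetical.

```python
import pandas as pd


def train_test_split_sketch(df, forecast_length=10, min_allowed_train_percent=0.3):
    """Last forecast_length rows become test, everything before is train."""
    # assumption: enforce forecast_length <= (1 - min_allowed_train_percent) * rows
    if forecast_length > len(df) * (1 - min_allowed_train_percent):
        raise ValueError("forecast_length leaves too little training data")
    train = df.iloc[:-forecast_length]
    test = df.iloc[-forecast_length:]
    return train, test


df = pd.DataFrame(
    {"a": range(20)}, index=pd.date_range("2021-01-01", periods=20, freq="D")
)
train, test = train_test_split_sketch(df, forecast_length=5)
print(len(train), len(test))
```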

autots.tools.shaping.subset_series(df, weights, n: int = 1000, random_state: int = 2020)

Return a sample of time series.

Parameters
  • df (pd.DataFrame) – wide df with series as columns and DT index

  • n (int) – number of unique time series to keep, or None

  • random_state (int) – random seed

autots.tools.transform module

Preprocessing data methods.

class autots.tools.transform.CumSumTransformer

Bases: object

Cumulative Sum of Data.

Warning

Inverse transformed values may not exactly equal the originals due to floating point imprecision. inverse_transform can only be applied to the original series, or an immediately following forecast.

fit(df)

Fits.

Parameters

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fits and Returns Magical DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df, trans_method: str = 'forecast')

Returns data to original or forecast form

Parameters
  • df (pandas.DataFrame) – input dataframe

  • trans_method (str) – whether to inverse on original data, or on a following sequence - ‘original’ return original data to original numbers - ‘forecast’ inverse the transform on a dataset immediately following the original

transform(df)

Returns changed data.

Parameters

df (pandas.DataFrame) – input dataframe
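The two inverse modes of a cumulative sum can be sketched in plain pandas. This illustrates why the inverse only works on the original series or an immediately following forecast: both need the running total as a starting point. Variable names and the integer index are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
transformed = df.cumsum()  # running totals: 1, 3, 6

# 'original': differences recover every row but the first, which is filled back in
original = transformed.diff().fillna(transformed)

# 'forecast': a forecast made in cumulative space, directly after the training data
forecast_cs = pd.DataFrame({"a": [10.0, 15.0]}, index=[3, 4])
# prepend the last training total so the first forecast step can be differenced
forecast = pd.concat([transformed.tail(1), forecast_cs]).diff().iloc[1:]
print(forecast)
```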

class autots.tools.transform.DatepartRegression(regression_model: dict = {'model': 'DecisionTree', 'model_params': {'max_depth': 5, 'min_samples_split': 2}}, datepart_method: str = 'expanded')

Bases: object

Remove a regression on datepart from the data.

fit(df)

Fits trend for later detrending.

Parameters

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fit and Return Detrended DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df)

Return data to original form.

Parameters

df (pandas.DataFrame) – input dataframe

transform(df)

Return detrended data.

Parameters

df (pandas.DataFrame) – input dataframe

class autots.tools.transform.Detrend

Bases: object

Remove a linear trend from the data.

fit(df)

Fits trend for later detrending.

Parameters

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fit and Return Detrended DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df)

Return data to original form.

Parameters

df (pandas.DataFrame) – input dataframe

transform(df)

Return detrended data.

Parameters

df (pandas.DataFrame) – input dataframe
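A sketch of linear detrending with the same fit / transform / inverse_transform shape, using numpy.polyfit rather than whatever regression the class uses internally. The class name LinearDetrendSketch and the choice of "days since the first training timestamp" as the regressor are assumptions for this example.

```python
import numpy as np
import pandas as pd


class LinearDetrendSketch:
    """Fit one straight line per column and subtract it (illustrative only)."""

    def _days(self, index):
        # time in days since the first training timestamp
        return np.asarray((index - self.origin_).total_seconds()) / 86400.0

    def fit(self, df):
        self.origin_ = df.index[0]
        # polyfit accepts a 2D y and fits each column independently
        self.coef_ = np.polyfit(self._days(df.index), df.to_numpy(), 1)
        return self

    def _trend(self, index):
        slope, intercept = self.coef_
        return np.outer(self._days(index), slope) + intercept

    def transform(self, df):
        return df - self._trend(df.index)

    def inverse_transform(self, df):
        return df + self._trend(df.index)


idx = pd.date_range("2021-01-01", periods=5, freq="D")
train = pd.DataFrame({"a": [1.0, 3.0, 5.0, 7.0, 9.0]}, index=idx)
detrender = LinearDetrendSketch().fit(train)
detrended = detrender.transform(train)

# a flat forecast in detrended space picks the trend back up on inversion
future = pd.DataFrame(
    {"a": [0.0, 0.0]}, index=pd.date_range("2021-01-06", periods=2, freq="D")
)
restored = detrender.inverse_transform(future)
print(restored)
```

Because the trend is a function of the timestamp, the inverse works on any later index, not just the training rows.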

class autots.tools.transform.DifferencedTransformer

Bases: object

Difference from lag n value. inverse_transform can only be applied to the original series, or an immediately following forecast

Parameters

lag (int) – number of periods to shift (not implemented, default = 1)

fit(df)

Fit.

Parameters

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fits and Returns Magical DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df, trans_method: str = 'forecast')

Returns data to original or forecast form

Parameters
  • df (pandas.DataFrame) – input dataframe

  • trans_method (str) – whether to inverse on original data, or on a following sequence - ‘original’ return original data to original numbers - ‘forecast’ inverse the transform on a dataset immediately following the original

transform(df)

Return differenced data.

Parameters

df (pandas.DataFrame) – input dataframe
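First differencing and its two inverse modes can be sketched as follows. Dropping the first (undefined) row after differencing is an assumption made to keep the example simple; names and the integer index are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 3.0, 6.0, 10.0]})
differenced = df.diff().iloc[1:]  # 2, 3, 4

# 'original': put the first value back, then accumulate the differences
original = pd.concat([df.head(1), differenced]).cumsum()

# 'forecast': accumulate forecast differences on top of the last training value
forecast_diff = pd.DataFrame({"a": [5.0, 6.0]}, index=[4, 5])
forecast = forecast_diff.cumsum() + df.iloc[-1]
print(forecast)
```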

class autots.tools.transform.EmptyTransformer

Bases: object

fit(df)
fit_transform(df)
inverse_transform(df)
transform(df)
class autots.tools.transform.GeneralTransformer(outlier_method: str = None, outlier_threshold: float = 3, outlier_position: str = 'first', fillna: str = 'ffill', transformation: str = None, second_transformation: str = None, transformation_param: str = None, detrend: str = None, third_transformation: str = None, transformation_param2: str = None, fourth_transformation: str = None, discretization: str = 'center', n_bins: int = None, random_seed: int = 2020)

Bases: object

Remove outliers, fillNA, then mathematical transformations.

Expects a chronologically sorted pandas.DataFrame with a DatetimeIndex, only numeric data, and a ‘wide’ (one column per series) shape.

Warning

  • inverse_transform will not fully return the original data under some conditions
    • outliers removed or clipped will be returned in the clipped or filled na form

    • NAs filled will be returned with the filled value

    • Discretization cannot be inversed

    • RollingMean, PctChange, CumSum, and DifferencedTransformer will only return original or an immediately following forecast
      • by default ‘forecast’ is expected, ‘original’ can be set in trans_method

Parameters
  • outlier_method (str) –

    • level of outlier removal, if any, per series

      • ‘None’

      • ‘clip’ - replace outliers with the highest value allowed by threshold

      • ‘remove’ - remove outliers and replace with np.nan

  • outlier_threshold (float) – number of std deviations from mean to consider an outlier. Default 3.

  • outlier_position (str) – when to remove outliers

    • ‘first’ - remove outliers before other transformations

    • ‘middle’ - remove outliers after first_transformation

    • ‘last’ - remove outliers after fourth_transformation

  • fillna (str) –

    • method to fill NA, passed through to FillNA()

      • ‘ffill’ - fill most recent non-na value forward until another non-na value is reached

      • ‘zero’ - fill with zero. Useful for sales and other data where NA usually means $0.

      • ‘mean’ - fill all missing values with the series’ overall average value

      • ‘median’ - fill all missing values with the series’ overall median value

      • ‘rolling mean’ - fill with the mean of the last n (window = 10) values

      • ‘ffill mean biased’ - simple avg of ffill and mean

      • ‘fake date’ - shifts forward data over nan, thus values will have incorrect timestamps

  • transformation (str) –

    • transformation to apply

      • ‘None’

      • ‘MinMaxScaler’ - Sklearn MinMaxScaler

      • ‘PowerTransformer’ - Sklearn PowerTransformer

      • ‘QuantileTransformer’ - Sklearn

      • ‘MaxAbsScaler’ - Sklearn

      • ‘StandardScaler’ - Sklearn

      • ‘RobustScaler’ - Sklearn

      • ‘PCA’, ‘FastICA’ - performs sklearn decomposition and returns n-cols worth of n_components

      • ‘Detrend’ - fit then remove a linear regression from the data

      • ‘RollingMean’ - 10 period rolling average, can receive a custom window by transformation_param if used as second_transformation

      • ‘FixedRollingMean’ - same as RollingMean, but with inverse_transform disabled, so smoothed forecasts are maintained.

      • ‘RollingMean10’ - 10 period rolling average (smoothing)

      • ‘RollingMean100thN’ - Rolling mean of periods of len(train)/100 (minimum 2)

      • ‘DifferencedTransformer’ - makes each value the difference of that value and the previous value

      • ‘PctChangeTransformer’ - converts to pct_change, not recommended if lots of zeroes in data

      • ‘SinTrend’ - removes a sin trend (fitted to each column) from the data

      • ‘CumSumTransformer’ - makes value sum of all previous

      • ‘PositiveShift’ - makes all values >= 1

      • ‘Log’ - log transform (uses PositiveShift first as necessary)

      • ‘IntermittentOccurrence’ - -1, 1 for non median values

      • ‘SeasonalDifference’ - remove the last lag values from all values

      • ‘SeasonalDifferenceMean’ - remove the average lag values from all

      • ‘SeasonalDifference7’ also ‘12’ - non-parameterized version of Seasonal

  • second_transformation (str) – second transformation to apply. Same options as transformation, but with transformation_param passed in if used

  • detrend (str) – Model and remove a linear component from the data. None, ‘Linear’, ‘Poisson’, ‘Tweedie’, ‘Gamma’, ‘RANSAC’, ‘ARD’

  • transformation_param (str) – passed to second_transformation, not used by most transformers.

  • fourth_transformation (str) – fourth transformation to apply. Same options as transformation.

  • discretization (str) – method of binning to apply

    • None - no discretization

    • ‘center’ - values are rounded to center value of each bin

    • ‘lower’ - values are rounded to lower range of closest bin

    • ‘upper’ - values are rounded up to upper edge of closest bin

    • ‘sklearn-quantile’, ‘sklearn-uniform’, ‘sklearn-kmeans’ - sklearn kbins discretizer

  • n_bins (int) – number of quantile bins to split data into

  • random_seed (int) – random state passed through where applicable

fill_na(df, window: int = 10)
Parameters
  • df (pandas.DataFrame) – Datetime Indexed

  • window (int) – passed through to rolling mean fill technique

Returns

pandas.DataFrame

fit(df)

Apply transformations and return transformer object.

Parameters

df (pandas.DataFrame) – Datetime Indexed

fit_transform(df)
inverse_transform(df, trans_method: str = 'forecast')

Undo the madness

Parameters
  • df (pandas.DataFrame) – Datetime Indexed

  • trans_method (str) – ‘forecast’ or ‘original’ passed through to RollingMeanTransformer, DifferencedTransformer, if used

outlier_treatment(df)
Parameters

df (pandas.DataFrame) – Datetime Indexed

Returns

pandas.DataFrame

transform(df)

Apply transformations to convert df.

class autots.tools.transform.IntermittentOccurrence

Bases: object

Intermittent-inspired binning; predicts the probability of a value not being the median.

fit(df)

Fits shift interval.

Parameters

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fit and Return Transformed DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df)

Return data to original form.

Parameters

df (pandas.DataFrame) – input dataframe

transform(df)

Return transformed data.

Parameters

df (pandas.DataFrame) – input dataframe

class autots.tools.transform.PctChangeTransformer

Bases: object

% Change of Data.

Warning

Because % change doesn’t play well with zeroes, zeroes are replaced by the positive of the lowest non-zero value. Inverse transformed values may not exactly equal the originals due to floating point imprecision. inverse_transform can only be applied to the original series, or an immediately following forecast.

fit(df)

Fits.

Parameters

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fit and Return Magical DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df, trans_method: str = 'forecast')

Returns data to original or forecast form

Parameters
  • df (pandas.DataFrame) – input dataframe

  • trans_method (str) – whether to inverse on original data, or on a following sequence - ‘original’ return original data to original numbers - ‘forecast’ inverse the transform on a dataset immediately following the original

transform(df)

Returns changed data.

Parameters

df (pandas.DataFrame) – input dataframe
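A sketch of the percent-change transform and its ‘forecast’ inverse, computed by hand rather than through the class. The zero-replacement step described in the warning is not shown, and the data are chosen to contain no zeroes.

```python
import pandas as pd

df = pd.DataFrame({"a": [2.0, 4.0, 8.0, 4.0]})

# percent change from the previous row; the first row has no predecessor
pct = (df / df.shift(1) - 1).iloc[1:]  # 1.0, 1.0, -0.5

# 'forecast': compound forecast percent changes from the last training value
forecast_pct = pd.DataFrame({"a": [0.5, 1.0]}, index=[4, 5])
forecast = (1 + forecast_pct).cumprod() * df.iloc[-1]
print(forecast)
```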

class autots.tools.transform.PositiveShift(log: bool = False, center_one: bool = True, squared=False)

Bases: object

Shift each series if necessary to ensure all values >= 1.

Parameters
  • log (bool) – whether to include a log transform.

  • center_one (bool) – whether to shift to 1 instead of 0.

fit(df)

Fits shift interval.

Parameters

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fit and Return Shifted DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df)

Return data to original form.

Parameters

df (pandas.DataFrame) – input dataframe

transform(df)

Return shifted data.

Parameters

df (pandas.DataFrame) – input dataframe
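The shift itself can be sketched in a few lines, assuming the center_one=True behavior (minimum moved up to 1) and leaving out the optional log and squared steps:

```python
import pandas as pd

df = pd.DataFrame({"a": [-2.0, 0.0, 3.0], "b": [5.0, 6.0, 7.0]})

# per-column amount needed to lift the minimum to 1; never shift downward
shift_amount = (1 - df.min()).clip(lower=0)

shifted = df + shift_amount        # column a moves up by 3, column b is untouched
restored = shifted - shift_amount  # inverse: subtract the stored amount
print(shifted)
```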

autots.tools.transform.RandomTransform()

Return a dict of randomly chosen transformation selections.

class autots.tools.transform.RollingMeanTransformer(window: int = 10, fixed: bool = False)

Bases: object

Attempt at Rolling Mean with built-in inverse_transform for time series.

inverse_transform can only be applied to the original series, or an immediately following forecast. Does not play well with data with NaNs. Inverse transformed values may not exactly equal the originals due to floating point imprecision.

Parameters

window (int) – number of periods to take mean over

fit(df)

Fits.

Parameters

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fits and Returns Magical DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df, trans_method: str = 'forecast')

Returns data to original or forecast form

Parameters
  • df (pandas.DataFrame) – input dataframe

  • trans_method (str) – whether to inverse on original data, or on a following sequence - ‘original’ return original data to original numbers - ‘forecast’ inverse the transform on a dataset immediately following the original

transform(df)

Returns rolling data.

Parameters

df (pandas.DataFrame) – input dataframe

class autots.tools.transform.SeasonalDifference(lag_1: int = 7, method: str = 'LastValue')

Bases: object

Remove seasonal component.

Parameters
  • lag_1 (int) – length of seasonal period to remove.

  • method (str) – ‘LastValue’, ‘Mean’, ‘Median’ to construct seasonality

fit(df)

Fits.

Parameters

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fits and Returns Magical DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df, trans_method: str = 'forecast')

Returns data to original or forecast form

Parameters
  • df (pandas.DataFrame) – input dataframe

  • trans_method (str) – whether to inverse on original data, or on a following sequence - ‘original’ return original data to original numbers - ‘forecast’ inverse the transform on a dataset immediately following the original

transform(df)

Returns seasonally differenced data.

Parameters

df (pandas.DataFrame) – input dataframe
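A sketch of the ‘Mean’ style of seasonal removal, assuming the seasonal component is the average value at each position within a cycle of length lag_1, and that a forecast continues the cycle where the training data stopped. This is one plausible reading, not the class's actual code.

```python
import numpy as np
import pandas as pd

lag = 3
train = pd.DataFrame({"a": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]})

# position of each row within the seasonal cycle: 0, 1, 2, 0, 1, 2
position = np.arange(len(train)) % lag
seasonal_mean = train.groupby(position).mean()  # one row per cycle position

transformed = train - seasonal_mean.iloc[position].to_numpy()

# 'forecast' inverse: the cycle continues from where training ended
forecast_t = pd.DataFrame({"a": [0.0, 0.0, 0.0]}, index=[6, 7, 8])
future_position = (np.arange(len(forecast_t)) + len(train)) % lag
forecast = forecast_t + seasonal_mean.iloc[future_position].to_numpy()
print(forecast)
```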

class autots.tools.transform.SinTrend

Bases: object

Model and remove a sine trend.

fit(df)

Fits trend for later detrending.

Parameters

df (pandas.DataFrame) – input dataframe

fit_sin(tt, yy)

Fit sin to the input time sequence, and return fitting parameters “amp”, “omega”, “phase”, “offset”, “freq”, “period” and “fitfunc”

from user unsym @ https://stackoverflow.com/questions/16716302/how-do-i-fit-a-sine-curve-to-my-data-with-pylab-and-numpy

fit_transform(df)

Fits and Returns Detrended DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df)

Returns data to original form.

Parameters

df (pandas.DataFrame) – input dataframe

transform(df)

Returns detrended data.

Parameters

df (pandas.DataFrame) – input dataframe

class autots.tools.transform.StatsmodelsFilter(method: str = 'bkfilter')

Bases: object

Irreversible filters.

fit(df)

Fits filter.

Parameters

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fit and Return Filtered DataFrame.

Parameters

df (pandas.DataFrame) – input dataframe

inverse_transform(df)

Return data to original form.

Parameters

df (pandas.DataFrame) – input dataframe

transform(df)

Return filtered data.

Parameters

df (pandas.DataFrame) – input dataframe

autots.tools.transform.clip_outliers(df, std_threshold: float = 3)

Replace outliers above threshold with that threshold. Axis = 0.

Parameters
  • df (pandas.DataFrame) – DataFrame containing numeric data

  • std_threshold (float) – The number of standard deviations away from mean to count as outlier.

autots.tools.transform.remove_outliers(df, std_threshold: float = 3)

Replace outliers with np.nan. https://stackoverflow.com/questions/23199796/detect-and-exclude-outliers-in-pandas-data-frame

Parameters
  • df (pandas.DataFrame) – DataFrame containing numeric data, DatetimeIndex

  • std_threshold (float) – The number of standard deviations away from mean to count as outlier.
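Both outlier treatments above can be sketched per column with pandas. The function names are hypothetical, and using the sample standard deviation (pandas' default) is an assumption.

```python
import numpy as np
import pandas as pd


def clip_outliers_sketch(df, std_threshold=3):
    """Pull values beyond mean +/- threshold * std back to that limit."""
    mean, std = df.mean(), df.std()
    return df.clip(
        lower=mean - std_threshold * std,
        upper=mean + std_threshold * std,
        axis=1,
    )


def remove_outliers_sketch(df, std_threshold=3):
    """Replace values beyond mean +/- threshold * std with NaN."""
    mean, std = df.mean(), df.std()
    return df.where((df - mean).abs() <= std_threshold * std)


df = pd.DataFrame({"a": [0.0] * 10 + [100.0]})
clipped = clip_outliers_sketch(df, std_threshold=2)
removed = remove_outliers_sketch(df, std_threshold=2)
print(clipped.tail(1), removed.tail(1))
```

The outlier itself inflates the mean and standard deviation, so a single extreme point in a short series may sit inside a 3-sigma band; the example uses a threshold of 2 for that reason.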

autots.tools.transform.simple_context_slicer(df, method: str = 'None', forecast_length: int = 30)

Condensed version of context_slicer with more limited options.

Parameters
  • df (pandas.DataFrame) – training data frame to slice

  • method (str) –

    Option to slice dataframe

    • ‘None’ - return unaltered dataframe

    • ‘HalfMax’ - return half of dataframe

    • ‘ForecastLength’ - return dataframe equal to length of forecast

    • ‘2ForecastLength’ - return dataframe equal to twice length of forecast (also takes 4, 6, 8, 10 in addition to 2)
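The slicing options can be sketched as below, assuming that every option keeps the most recent rows (the description does not say which half ‘HalfMax’ keeps). The function name context_slicer_sketch is hypothetical.

```python
import pandas as pd


def context_slicer_sketch(df, method="None", forecast_length=30):
    """Keep only the most recent rows, as selected by method."""
    suffix = "ForecastLength"
    if method == "None":
        return df
    if method == "HalfMax":
        return df.tail(len(df) // 2)
    if method.endswith(suffix):
        # a leading integer such as '2' or '10' multiplies the forecast length
        multiplier = int(method[: -len(suffix)] or 1)
        return df.tail(multiplier * forecast_length)
    raise ValueError(f"unknown method: {method}")


df = pd.DataFrame(
    {"a": range(100)}, index=pd.date_range("2021-01-01", periods=100, freq="D")
)
print(len(context_slicer_sketch(df, "2ForecastLength", forecast_length=10)))
```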

Module contents

basic utilities