autots.tools package
Submodules
autots.tools.holiday module
Manage holiday features.
-
autots.tools.holiday.holiday_flag(DTindex, country: str = 'US')
Create a 0/1 holiday flag for the given datetime index.
- Parameters
DTindex (pandas.DatetimeIndex) – DatetimeIndex of dates for which to create flags
country (str) – country passed through to the Python package holidays
- Returns
pandas.Series() with DatetimeIndex and column ‘HolidayFlag’
autots.tools.impute module
Fill NA
-
autots.tools.impute.FillNA(df, method: str = 'ffill', window: int = 10)
Fill NA values using different methods.
- Parameters
df (pandas.DataFrame) – data in which to fill NA values
method (str) – fill method, one of:
‘ffill’ - fill the most recent non-NA value forward until another non-NA value is reached
‘zero’ - fill with zero. Useful for sales and other data where NA usually means $0.
‘mean’ - fill all missing values with the series’ overall average value
‘median’ - fill all missing values with the series’ overall median value
‘rolling mean’ - fill with the mean of the last n (window) values
‘ffill mean biased’ - simple average of ffill and mean
‘fake date’ - shift data forward over NaN, so values will have incorrect timestamps
window (int) – length of the rolling window, used only by the rolling fill methods
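The fill strategies above can be sketched on a single series held in a plain Python list, with None standing in for NaN. This is a dependency-free illustration under stated assumptions, not the autots implementation: the name fill_na is hypothetical, ‘rolling mean’ here averages only the observed values among the previous window positions, and a leading gap under ‘ffill mean biased’ falls back to the mean alone.

```python
import statistics
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing value (stands in for NaN)
METHODS = {"ffill", "zero", "mean", "median", "rolling mean", "ffill mean biased"}


def fill_na(values: Series, method: str = "ffill", window: int = 10) -> Series:
    """Hypothetical single-series sketch of the FillNA strategies."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    observed = [v for v in values if v is not None]
    out: Series = []
    last: Optional[float] = None  # most recent observed value
    for i, v in enumerate(values):
        if v is not None:
            out.append(v)
            last = v
        elif method == "ffill":
            out.append(last)  # stays None until the first observed value
        elif method == "zero":
            out.append(0.0)
        elif method == "mean":
            out.append(statistics.mean(observed))
        elif method == "median":
            out.append(statistics.median(observed))
        elif method == "rolling mean":
            # average the observed values among the previous `window` positions
            recent = [x for x in values[max(0, i - window):i] if x is not None]
            out.append(statistics.mean(recent) if recent else None)
        else:  # "ffill mean biased": simple average of the forward fill and the mean
            mean = statistics.mean(observed)
            out.append(mean if last is None else (last + mean) / 2)
    return out


print(fill_na([1.0, None, 3.0, None], "ffill mean biased"))  # [1.0, 1.5, 3.0, 2.5]
```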
-
autots.tools.impute.biased_ffill(df, mean_weight: float = 1)
Fill NaN with the average of the last value and the mean.
-
autots.tools.impute.fake_date_fill(df, back_method: str = 'slice')
Return a dataframe where NA values are removed and the remaining values are shifted forward.
Warning
As a result, values will likely have incorrect timestamps!
- Parameters
back_method (str) – how to deal with the tails left by shifting NaN:
‘bfill’ - back fill the last value
‘slice’ - drop any rows with any NA
‘keepNA’ - keep the lagging NA
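A single-series sketch of the shift-forward idea, using a plain list with None for NaN. It assumes the observed values are right-aligned so the newest value keeps the newest timestamp, leaving the gap at the oldest end; fake_date_fill_sketch is a hypothetical name, and ‘slice’ is simplified to dropping that leading gap.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing value


def fake_date_fill_sketch(values: Series, back_method: str = "slice") -> Series:
    """Drop missing values and push the remaining ones toward the most recent end."""
    observed = [v for v in values if v is not None]
    gap = len(values) - len(observed)  # empty slots left at the oldest end
    if back_method == "keepNA":
        return [None] * gap + observed
    if back_method == "bfill":
        # back fill the gap with the first value that follows it
        fill = observed[0] if observed else None
        return [fill] * gap + observed
    if back_method == "slice":
        return observed  # drop the rows left empty by the shift
    raise ValueError(f"unknown back_method: {back_method}")


print(fake_date_fill_sketch([1.0, 2.0, None, 4.0], "keepNA"))  # [None, 1.0, 2.0, 4.0]
```

Here 4.0 keeps its timestamp while 1.0 and 2.0 each move one step later, which is why the warning about incorrect timestamps applies.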
-
autots.tools.impute.fill_forward(df)
Fill NaN with previous values.
-
autots.tools.impute.fill_mean(df)
Fill NaN with the mean.
-
autots.tools.impute.fill_median(df)
Fill NaN with the median.
-
autots.tools.impute.fill_zero(df)
Fill NaN with zero.
-
autots.tools.impute.rolling_mean(df, window: int = 10)
Fill NaN with the mean of the last window values.
autots.tools.probabilistic module
Point to Probabilistic
-
autots.tools.probabilistic.Point_to_Probability(train, forecast, prediction_interval=0.9, method: str = 'historic_quantile')
Data-driven placeholder for model error estimation.
Catlin Point to Probability method (‘a mixture of dark magic and gum disease’)
- Parameters
train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex
forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.
prediction_interval (float) – confidence (or perhaps credible) interval
method (str) – spell to cast to create dark magic: ‘historic_quantile’, ‘inferred_normal’, or ‘variable_pct_change’. Gum disease available separately upon request.
- Returns
upper_error, lower_error (two pandas.DataFrames for upper and lower bound respectively)
-
autots.tools.probabilistic.Variable_Point_to_Probability(train, forecast, alpha=0.3, beta=1)
Data-driven placeholder for model error estimation.
$$\mathrm{ErrorRange}_n = \beta \left( E_n + \alpha \sum_{k=1}^{n-1} E_k \right)$$
$$E_n = \left| 0.5 - \mathrm{QTP}_n \right| D_n$$
$$D_n = \left| X_n - \left( \bar{c}\, X_{n-1} + X_{n-1} \right) \right|$$
where $X_n$ is the forecast value, $\bar{c}$ is the average % change of train, $\mathrm{Score}_n$ is the percent change from $X_{n-1}$ to $X_n$, and $\mathrm{QTP}_n$ is the percentile of $\mathrm{Score}_n$ among all percent changes of train.
- Parameters
train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex
forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.
alpha (float) – parameter that affects the broadening of the error range over time. Usually 0 < alpha < 1 (although it can be larger than 1)
beta (float) – parameter that affects the general width of the error bar. Usually 0 < beta < 1 (although it can be larger than 1)
- Returns
error width for each value of forecast.
- Return type
ErrorRange (pandas.DataFrame)
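The ErrorRange recurrence can be traced step by step for one series. This sketch rests on assumptions not stated in the docstring: the first forecast step uses the last training value as $X_{n-1}$, percent changes are fractional ((x_t - x_{t-1}) / x_{t-1}), QTP is the share of training changes less than or equal to the step's Score (so it lies in [0, 1]), and the alpha term is the running sum of E over earlier steps. The name variable_error_range is hypothetical.

```python
from typing import List


def pct_changes(xs: List[float]) -> List[float]:
    """Fractional period-over-period changes (x_t - x_{t-1}) / x_{t-1}."""
    return [(b - a) / a for a, b in zip(xs, xs[1:])]


def variable_error_range(train: List[float], forecast: List[float],
                         alpha: float = 0.3, beta: float = 1.0) -> List[float]:
    """Hypothetical sketch of ErrorRange_n = beta * (E_n + alpha * sum(E_1..E_{n-1}))."""
    train_changes = pct_changes(train)
    avg_change = sum(train_changes) / len(train_changes)
    prev = train[-1]  # X_{n-1} for the first forecast step (assumption)
    cum_err = 0.0     # running sum of E over earlier forecast steps
    ranges = []
    for x in forecast:
        score = (x - prev) / prev
        # QTP: share of historical changes at or below this step's change
        qtp = sum(c <= score for c in train_changes) / len(train_changes)
        d = abs(x - (avg_change * prev + prev))
        e = abs(0.5 - qtp) * d
        ranges.append(beta * (e + alpha * cum_err))
        cum_err += e
        prev = x
    return ranges


print(variable_error_range([100.0, 200.0, 400.0], [600.0, 1200.0], alpha=0.5))
# [100.0, 50.0]
```

The second step has zero error of its own (its change matches history exactly) but still inherits alpha times the earlier error, which is how alpha broadens the range over time.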
-
autots.tools.probabilistic.historic_quantile(df_train, prediction_interval: float = 0.9)
Compute the difference between the median and the prediction interval range in historic data.
- Parameters
df_train (pd.DataFrame) – a dataframe of training data
prediction_interval (float) – the desired forecast interval range
- Returns
two 1D arrays
- Return type
lower, upper (np.array)
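A single-series sketch of the historic quantile idea, assuming linear-interpolation quantiles (the same convention as NumPy's default) and a symmetric interval whose tails each hold (1 - prediction_interval) / 2 of the data. The function names are hypothetical; the documented function works on a whole DataFrame and returns one value per series.

```python
from typing import List, Tuple


def quantile(sorted_xs: List[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted list."""
    pos = (len(sorted_xs) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def historic_quantile_sketch(history: List[float],
                             prediction_interval: float = 0.9) -> Tuple[float, float]:
    """Return (lower, upper) distances from the median to the interval bounds."""
    xs = sorted(history)
    tail = (1 - prediction_interval) / 2
    median = quantile(xs, 0.5)
    lower = median - quantile(xs, tail)
    upper = quantile(xs, 1 - tail) - median
    return lower, upper


print(historic_quantile_sketch([float(v) for v in range(1, 12)], 0.5))  # (2.5, 2.5)
```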
-
autots.tools.probabilistic.inferred_normal(train, forecast, n: int = 5, prediction_interval: float = 0.9)
A corruption of Bayes’ theorem. It will be sensitive to the transformations of the data.
-
autots.tools.probabilistic.percentileofscore_appliable(x, a, kind='rank')
autots.tools.profile module
Profiling
-
autots.tools.profile.data_profile(df)
Input: a pd DataFrame of columns which are time series, and a datetime index
Output: a pd DataFrame with one column per time series and one row per statistic
autots.tools.shaping module
Reshape data.
-
class autots.tools.shaping.NumericTransformer(na_strings: list = ['', ' ', 'NULL', 'NA', 'NaN', 'na', 'nan'], categorical_impute_strategy: str = 'constant', verbose: int = 0)
Bases: object
Convert categorical data to numeric and back.
-
fit(df)
Fit categorical to numeric.
-
inverse_transform(df)
Convert numeric back to categorical.
-
transform(df)
Convert a categorical dataset to numeric.
-
-
autots.tools.shaping.long_to_wide(df, date_col: str = 'datetime', value_col: str = 'value', id_col: str = 'series_id', frequency: str = 'infer', na_tolerance: float = 0.999, drop_data_older_than_periods: int = 100000, drop_most_recent: int = 0, aggfunc: str = 'first', verbose: int = 1)
Take long data and convert it into wide, cleaner data.
- Parameters
df (pd.DataFrame) –
long-format input data
date_col (str) –
name of the datetime column
value_col (str) –
the name of the column with the values of the time series (e.g. sales $)
id_col (str) –
name of the id column, unique for each time series
frequency (str) –
frequency as a pandas DateOffset alias string, normally “1D” (daily), “MS” (month start), etc.
the aliases are listed in the pandas time series user guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
na_tolerance (float) –
allow up to this fraction of values to be NaN, else drop the entire series
the default of 0.999 means a series can be 99.9% NaN values and still be included.
drop_data_older_than_periods (int) –
cut off older data, because eventually you just get too much
the default of 100,000 is meant to be rather high; for daily data I’d normally use only the last couple of years, say 1500 samples
drop_most_recent (int) –
number of most recent data points to drop
useful if you pull monthly data before month end, and you don’t want an incomplete month appearing complete
aggfunc (str) –
passed to pd.pivot_table; determines how to aggregate duplicates for series_id and datetime
other options include “mean” and other numpy functions, but the data must already be numeric for these to work. If categorical data is provided, aggfunc=’first’ is recommended.
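The core pivot can be sketched with plain dictionaries: collect every date, keep the first value seen for each (date, series_id) pair (as with aggfunc='first'), then drop series whose share of missing dates exceeds na_tolerance. This hypothetical long_to_wide_sketch covers only those two steps and assumes ISO date strings, which sort chronologically; frequency inference, drop_data_older_than_periods and drop_most_recent are left out.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, float]  # (date, series_id, value)
Wide = Dict[str, Dict[str, Optional[float]]]  # {series_id: {date: value or None}}


def long_to_wide_sketch(rows: List[Row], na_tolerance: float = 0.999) -> Wide:
    dates = sorted({date for date, _, _ in rows})
    wide: Wide = {}
    for date, series_id, value in rows:
        if series_id not in wide:
            wide[series_id] = {d: None for d in dates}
        column = wide[series_id]
        if column[date] is None:  # aggfunc='first': later duplicates are ignored
            column[date] = value
    # keep only series whose fraction of missing dates is within na_tolerance
    return {
        sid: col for sid, col in wide.items()
        if sum(v is None for v in col.values()) / len(dates) <= na_tolerance
    }


rows = [
    ("2020-01-01", "a", 1.0),
    ("2020-01-02", "a", 2.0),
    ("2020-01-01", "a", 9.0),  # duplicate date for series "a"
    ("2020-01-02", "b", 5.0),
]
print(long_to_wide_sketch(rows, na_tolerance=0.4))
# {'a': {'2020-01-01': 1.0, '2020-01-02': 2.0}}
```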
-
autots.tools.shaping.simple_train_test_split(df, forecast_length: int = 10, min_allowed_train_percent: float = 0.3, verbose: int = 1)
Use the last forecast_length periods as the test set and the rest as the training set.
- Parameters
forecast_length (int) – number of future periods to predict
min_allowed_train_percent (float) –
forecast_length cannot be greater than (1 - min_allowed_train_percent) of the data length
this keeps the forecast length from being much larger than the training data. Note that NaNs are currently included in the length.
- Returns
train, test (both pd DataFrames)
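A sketch of the split on a plain list. The guard reflects one reading of min_allowed_train_percent (the test set may use at most 1 - min_allowed_train_percent of the rows); the ValueError and the name split_sketch are assumptions, not necessarily how autots reports the problem.

```python
from typing import List, Tuple


def split_sketch(values: List[float], forecast_length: int = 10,
                 min_allowed_train_percent: float = 0.3) -> Tuple[List[float], List[float]]:
    """Hold out the last forecast_length periods as the test set."""
    if forecast_length < 1:
        raise ValueError("forecast_length must be at least 1")
    max_test = int(len(values) * (1 - min_allowed_train_percent))
    if forecast_length > max_test:
        raise ValueError(f"forecast_length {forecast_length} exceeds the allowed {max_test} periods")
    return values[:-forecast_length], values[-forecast_length:]


train, test = split_sketch([float(v) for v in range(10)], forecast_length=3)
print(train, test)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0] [7.0, 8.0, 9.0]
```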
-
autots.tools.shaping.subset_series(df, weights, n: int = 1000, random_state: int = 2020)
Return a sample of time series.
- Parameters
df (pd.DataFrame) – wide df with series as columns and DT index
n (int) – number of unique time series to keep, or None
random_state (int) – random seed
autots.tools.transform module
Preprocessing data methods.
-
class autots.tools.transform.CumSumTransformer
Bases: object
Cumulative sum of data.
Warning
Inverse-transformed values may not exactly equal the original values due to floating-point imprecision. inverse_transform can only be applied to the original series or to an immediately following forecast.
-
fit(df)
Fits.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_transform(df)
Fits and returns Magical DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df, trans_method: str = 'forecast')
Returns data to original or forecast form.
- Parameters
df (pandas.DataFrame) – input dataframe
trans_method (str) – whether to inverse on original data or on a following sequence:
‘original’ - return original data to original numbers
‘forecast’ - inverse the transform on a dataset immediately following the original
-
transform(df)
Returns changed data.
- Parameters
df (pandas.DataFrame) – input dataframe
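A single-series sketch of the transform and of inverting an immediately following forecast: the forecast lives in cumulative-sum space, so differencing it, starting from the last training total, recovers per-period values. CumSumSketch is a hypothetical class, not the autots implementation.

```python
from itertools import accumulate
from typing import List


class CumSumSketch:
    """Cumulative sum with a 'forecast'-style inverse for one series."""

    def fit_transform(self, values: List[float]) -> List[float]:
        summed = list(accumulate(values))
        self.last_total = summed[-1]  # anchor for inverting a following forecast
        return summed

    def inverse_transform(self, summed_forecast: List[float]) -> List[float]:
        # difference the forecast, treating the last training total as the previous value
        previous = [self.last_total] + summed_forecast[:-1]
        return [cur - prev for cur, prev in zip(summed_forecast, previous)]


t = CumSumSketch()
print(t.fit_transform([1.0, 2.0, 3.0]))  # [1.0, 3.0, 6.0]
print(t.inverse_transform([10.0, 15.0]))  # [4.0, 5.0]
```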
-
-
class autots.tools.transform.DatepartRegression(regression_model: dict = {'model': 'DecisionTree', 'model_params': {'max_depth': 5, 'min_samples_split': 2}}, datepart_method: str = 'expanded')
Bases: object
Remove a regression on datepart from the data.
-
fit(df)
Fits trend for later detrending.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_transform(df)
Fit and return detrended DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df)
Return data to original form.
- Parameters
df (pandas.DataFrame) – input dataframe
-
transform(df)
Return detrended data.
- Parameters
df (pandas.DataFrame) – input dataframe
-
-
class autots.tools.transform.Detrend
Bases: object
Remove a linear trend from the data.
-
fit(df)
Fits trend for later detrending.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_transform(df)
Fit and return detrended DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df)
Return data to original form.
- Parameters
df (pandas.DataFrame) – input dataframe
-
transform(df)
Return detrended data.
- Parameters
df (pandas.DataFrame) – input dataframe
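A sketch of linear detrending for one series: fit an ordinary least-squares line against the integer time index, subtract it, and add it back on inverse. The start argument, which lets a following forecast reuse the fitted line at later time steps, and the class name are assumptions for illustration.

```python
from typing import List


class LinearDetrendSketch:
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""

    def fit(self, values: List[float]) -> "LinearDetrendSketch":
        n = len(values)
        t_mean = (n - 1) / 2
        y_mean = sum(values) / n
        cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
        var = sum((t - t_mean) ** 2 for t in range(n))
        self.slope = cov / var
        self.intercept = y_mean - self.slope * t_mean
        return self

    def _trend(self, start: int, length: int) -> List[float]:
        return [self.intercept + self.slope * t for t in range(start, start + length)]

    def transform(self, values: List[float], start: int = 0) -> List[float]:
        return [y - f for y, f in zip(values, self._trend(start, len(values)))]

    def inverse_transform(self, values: List[float], start: int = 0) -> List[float]:
        return [y + f for y, f in zip(values, self._trend(start, len(values)))]


d = LinearDetrendSketch().fit([1.0, 3.0, 5.0, 7.0])
print(d.transform([1.0, 3.0, 5.0, 7.0]))          # [0.0, 0.0, 0.0, 0.0]
print(d.inverse_transform([0.0, 0.5], start=4))  # [9.0, 11.5]
```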
-
-
class autots.tools.transform.DifferencedTransformer
Bases: object
Difference from the lag n value. inverse_transform can only be applied to the original series or to an immediately following forecast.
- Parameters
lag (int) – number of periods to shift (not implemented, default = 1)
-
fit(df)
Fit.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_transform(df)
Fits and returns Magical DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df, trans_method: str = 'forecast')
Returns data to original or forecast form.
- Parameters
df (pandas.DataFrame) – input dataframe
trans_method (str) – whether to inverse on original data or on a following sequence:
‘original’ - return original data to original numbers
‘forecast’ - inverse the transform on a dataset immediately following the original
-
transform(df)
Return differenced data.
- Parameters
df (pandas.DataFrame) – input dataframe
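A sketch of lag-1 differencing for one series and of inverting an immediately following forecast by adding the forecast differences, one by one, to the last training value. In this hypothetical class the first period, which has no predecessor, is simply dropped; that is a simplification, not a statement about how autots handles it.

```python
from typing import List


class DifferenceSketch:
    def fit_transform(self, values: List[float]) -> List[float]:
        self.last_value = values[-1]  # needed to rebuild a following forecast
        # lag-1 differences; the first period has no predecessor and is dropped
        return [cur - prev for prev, cur in zip(values, values[1:])]

    def inverse_transform(self, diffs: List[float]) -> List[float]:
        restored, level = [], self.last_value
        for step in diffs:
            level += step  # add each forecast difference to the running level
            restored.append(level)
        return restored


ds = DifferenceSketch()
print(ds.fit_transform([1.0, 4.0, 6.0]))  # [3.0, 2.0]
print(ds.inverse_transform([1.0, 1.0]))   # [7.0, 8.0]
```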
-
class autots.tools.transform.EmptyTransformer
Bases: object
-
fit(df)
-
fit_transform(df)
-
inverse_transform(df)
-
transform(df)
-
-
class autots.tools.transform.GeneralTransformer(outlier_method: str = None, outlier_threshold: float = 3, outlier_position: str = 'first', fillna: str = 'ffill', transformation: str = None, second_transformation: str = None, transformation_param: str = None, detrend: str = None, third_transformation: str = None, transformation_param2: str = None, fourth_transformation: str = None, discretization: str = 'center', n_bins: int = None, random_seed: int = 2020)
Bases: object
Remove outliers, fill NA, then apply mathematical transformations.
Expects a chronologically sorted pandas.DataFrame with a DatetimeIndex, only numeric data, and a ‘wide’ (one column per series) shape.
Warning
- inverse_transform will not fully return the original data under some conditions
outliers removed or clipped will be returned in the clipped or filled na form
NAs filled will be returned with the filled value
Discretization cannot be inverted
- RollingMean, PctChange, CumSum, and DifferencedTransformer will only return the original series or an immediately following forecast
by default ‘forecast’ is expected; ‘original’ can be set in trans_method
- Parameters
outlier_method (str) –
level of outlier removal, if any, per series
‘None’ - no outlier removal
‘clip’ - replace outliers with the highest value allowed by threshold
‘remove’ - remove outliers and replace them with np.nan
outlier_threshold (float) – number of std deviations from mean to consider an outlier. Default 3.
outlier_position (str) – when to remove outliers:
‘first’ - remove outliers before other transformations
‘middle’ - remove outliers after the first transformation
‘last’ - remove outliers after fourth_transformation
fillna (str) –
method to fill NA, passed through to FillNA()
‘ffill’ - fill the most recent non-NA value forward until another non-NA value is reached
‘zero’ - fill with zero. Useful for sales and other data where NA usually means $0.
‘mean’ - fill all missing values with the series’ overall average value
‘median’ - fill all missing values with the series’ overall median value
‘rolling mean’ - fill with the mean of the last n (window = 10) values
‘ffill mean biased’ - simple average of ffill and mean
‘fake date’ - shift data forward over NaN, so values will have incorrect timestamps
transformation (str) –
transformation to apply
‘None’
‘MinMaxScaler’ - Sklearn MinMaxScaler
‘PowerTransformer’ - Sklearn PowerTransformer
‘QuantileTransformer’ - Sklearn
‘MaxAbsScaler’ - Sklearn
‘StandardScaler’ - Sklearn
‘RobustScaler’ - Sklearn
‘PCA’, ‘FastICA’ - performs sklearn decomposition and returns n-cols worth of n_components
‘Detrend’ - fit then remove a linear regression from the data
‘RollingMean’ - 10 period rolling average; can receive a custom window by transformation_param if used as second_transformation
‘FixedRollingMean’ - same as RollingMean, but with inverse_transform disabled, so smoothed forecasts are maintained
‘RollingMean10’ - 10 period rolling average (smoothing)
‘RollingMean100thN’ - rolling mean of periods of len(train)/100 (minimum 2)
‘DifferencedTransformer’ - makes each value the difference of that value and the previous value
‘PctChangeTransformer’ - converts to pct_change; not recommended if there are lots of zeroes in the data
‘SinTrend’ - removes a sin trend (fitted to each column) from the data
‘CumSumTransformer’ - makes each value the sum of itself and all previous values
‘PositiveShift’ - makes all values >= 1
‘Log’ - log transform (uses PositiveShift first as necessary)
‘IntermittentOccurrence’ - -1, 1 for non median values
‘SeasonalDifference’ - remove the last lag values from all values
‘SeasonalDifferenceMean’ - remove the average lag values from all values
‘SeasonalDifference7’, ‘SeasonalDifference12’ - non-parameterized versions of SeasonalDifference
second_transformation (str) – second transformation to apply. Same options as transformation, but with transformation_param passed in if used
detrend (str) – Model and remove a linear component from the data. None, ‘Linear’, ‘Poisson’, ‘Tweedie’, ‘Gamma’, ‘RANSAC’, ‘ARD’
transformation_param (str) – passed to second_transformation, not used by most transformers.
fourth_transformation (str) – fourth transformation to apply. Same options as transformation.
discretization (str) – method of binning to apply:
None - no discretization
‘center’ - values are rounded to the center value of each bin
‘lower’ - values are rounded to the lower range of the closest bin
‘upper’ - values are rounded up to the upper edge of the closest bin
‘sklearn-quantile’, ‘sklearn-uniform’, ‘sklearn-kmeans’ - sklearn KBinsDiscretizer
n_bins (int) – number of quantile bins to split data into
random_seed (int) – random state passed through where applicable
-
fill_na(df, window: int = 10)
- Parameters
df (pandas.DataFrame) – Datetime Indexed
window (int) – passed through to rolling mean fill technique
- Returns
pandas.DataFrame
-
fit(df)
Apply transformations and return the transformer object.
- Parameters
df (pandas.DataFrame) – Datetime Indexed
-
fit_transform(df)
-
inverse_transform(df, trans_method: str = 'forecast')
Undo the madness.
- Parameters
df (pandas.DataFrame) – Datetime Indexed
trans_method (str) – ‘forecast’ or ‘original’, passed through to RollingMeanTransformer, DifferencedTransformer, etc., if used
-
outlier_treatment(df)
- Parameters
df (pandas.DataFrame) – Datetime Indexed
- Returns
pandas.DataFrame
-
transform(df)
Apply transformations to convert df.
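The ordering GeneralTransformer relies on (apply steps in sequence, undo them in reverse) can be sketched with two toy steps. Shift, Scale and Pipeline are hypothetical stand-ins, not autots classes; the point is only that inverse_transform must walk the steps backwards.

```python
from typing import List


class Shift:
    """Toy step: add a constant."""

    def __init__(self, amount: float):
        self.amount = amount

    def fit_transform(self, values: List[float]) -> List[float]:
        return [v + self.amount for v in values]

    def inverse_transform(self, values: List[float]) -> List[float]:
        return [v - self.amount for v in values]


class Scale:
    """Toy step: divide by the training maximum."""

    def fit_transform(self, values: List[float]) -> List[float]:
        self.peak = max(values)
        return [v / self.peak for v in values]

    def inverse_transform(self, values: List[float]) -> List[float]:
        return [v * self.peak for v in values]


class Pipeline:
    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, values: List[float]) -> List[float]:
        for step in self.steps:  # apply in the listed order
            values = step.fit_transform(values)
        return values

    def inverse_transform(self, values: List[float]) -> List[float]:
        for step in reversed(self.steps):  # undo in reverse order
            values = step.inverse_transform(values)
        return values


pipe = Pipeline([Shift(1.0), Scale()])
print(pipe.fit_transform([1.0, 3.0]))  # [0.5, 1.0]
print(pipe.inverse_transform([0.75]))  # [2.0]
```

Undoing the steps in forward order instead would give -1.0 rather than 2.0 for the same input.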
-
class autots.tools.transform.IntermittentOccurrence
Bases: object
Binning inspired by intermittent series; predicts the probability of a value not being the median.
-
fit(df)
Fits shift interval.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_transform(df)
Fit and return transformed DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df)
Return data to original form.
- Parameters
df (pandas.DataFrame) – input dataframe
-
transform(df)
Return transformed data.
- Parameters
df (pandas.DataFrame) – input dataframe
-
-
class autots.tools.transform.PctChangeTransformer
Bases: object
% change of data.
Warning
Because % change doesn’t play well with zeroes, zeroes are replaced by the absolute value of the lowest non-zero value. Inverse-transformed values may not exactly equal the original values due to floating-point imprecision. inverse_transform can only be applied to the original series or to an immediately following forecast.
-
fit(df)
Fits.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_transform(df)
Fit and return Magical DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df, trans_method: str = 'forecast')
Returns data to original or forecast form.
- Parameters
df (pandas.DataFrame) – input dataframe
trans_method (str) – whether to inverse on original data or on a following sequence:
‘original’ - return original data to original numbers
‘forecast’ - inverse the transform on a dataset immediately following the original
-
transform(df)
Returns changed data.
- Parameters
df (pandas.DataFrame) – input dataframe
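A single-series sketch of the percent-change transform, including the zero replacement described in the warning, and of rebuilding an immediately following forecast by compounding changes from the last training value. Reading "the lowest non-zero value" as abs(min(non-zero values)) is an assumption, as is the class name PctChangeSketch.

```python
from typing import List


class PctChangeSketch:
    def fit_transform(self, values: List[float]) -> List[float]:
        # replace zeroes so no division by zero occurs (assumed replacement rule)
        floor = abs(min(v for v in values if v != 0))
        safe = [v if v != 0 else floor for v in values]
        self.last_value = safe[-1]
        # fractional change from each value to the next; the first period is dropped
        return [(cur - prev) / prev for prev, cur in zip(safe, safe[1:])]

    def inverse_transform(self, changes: List[float]) -> List[float]:
        restored, level = [], self.last_value
        for change in changes:
            level *= 1 + change  # compound each forecast change onto the running level
            restored.append(level)
        return restored


p = PctChangeSketch()
print(p.fit_transform([2.0, 0.0, 4.0]))  # [0.0, 1.0]
print(p.inverse_transform([0.5, 0.5]))   # [6.0, 9.0]
```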
-
-
class autots.tools.transform.PositiveShift(log: bool = False, center_one: bool = True, squared=False)
Bases: object
Shift each series if necessary to ensure all values >= 1.
- Parameters
log (bool) – whether to include a log transform.
center_one (bool) – whether to shift to 1 instead of 0.
-
fit(df)
Fits shift interval.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_transform(df)
Fit and return transformed DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df)
Return data to original form.
- Parameters
df (pandas.DataFrame) – input dataframe
-
transform(df)
Return transformed data.
- Parameters
df (pandas.DataFrame) – input dataframe
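A sketch of the shift for one series, assuming the shift is the smallest constant that lifts the minimum to 1 (or to 0 when center_one is False) and that the optional log is applied after shifting. PositiveShiftSketch is hypothetical, and the squared option is not modelled.

```python
import math
from typing import List


class PositiveShiftSketch:
    def __init__(self, log: bool = False, center_one: bool = True):
        self.log = log
        self.center_one = center_one

    def fit(self, values: List[float]) -> "PositiveShiftSketch":
        floor = 1.0 if self.center_one else 0.0
        self.shift = max(0.0, floor - min(values))  # no shift if already at or above floor
        return self

    def transform(self, values: List[float]) -> List[float]:
        shifted = [v + self.shift for v in values]
        return [math.log(v) for v in shifted] if self.log else shifted

    def inverse_transform(self, values: List[float]) -> List[float]:
        restored = [math.exp(v) for v in values] if self.log else values
        return [v - self.shift for v in restored]


ps = PositiveShiftSketch().fit([-2.0, 3.0])
print(ps.transform([-2.0, 3.0]))  # [1.0, 6.0]
```

Keeping center_one=True is what makes the log option safe here: after the shift every value is at least 1, so its log is at least 0, whereas shifting only to 0 would hit log(0).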
-
autots.tools.transform.RandomTransform()
Return a dict of randomly chosen transformation selections.
-
class autots.tools.transform.RollingMeanTransformer(window: int = 10, fixed: bool = False)
Bases: object
Attempt at a rolling mean with built-in inverse_transform for time series. inverse_transform can only be applied to the original series or to an immediately following forecast. Does not play well with data containing NaNs. Inverse-transformed values may not exactly equal the original values due to floating-point imprecision.
- Parameters
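window (int) – number of periods to take the mean over
fixed (bool) – if True, inverse_transform is disabled so smoothed forecasts are kept (as in the ‘FixedRollingMean’ option of GeneralTransformer)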
-
fit(df)
Fits.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_transform(df)
Fits and returns Magical DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df, trans_method: str = 'forecast')
Returns data to original or forecast form.
- Parameters
df (pandas.DataFrame) – input dataframe
trans_method (str) – whether to inverse on original data or on a following sequence:
‘original’ - return original data to original numbers
‘forecast’ - inverse the transform on a dataset immediately following the original
-
transform(df)
Returns rolling data.
- Parameters
df (pandas.DataFrame) – input dataframe
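A single-series sketch of a trailing rolling mean and of inverting an immediately following smoothed forecast. Because each smoothed value r_t is the average of x_t and the previous window - 1 values, x_t = window * r_t minus the sum of those previous values, which only works if the earlier values (kept here from the end of the training data) are exact. The sketch assumes partial windows at the start are averaged over whatever values are available; the class name is hypothetical.

```python
from typing import List


class RollingMeanSketch:
    def __init__(self, window: int = 10):
        self.window = window

    def fit_transform(self, values: List[float]) -> List[float]:
        # remember the last window - 1 originals; inverting a following forecast needs them
        self.tail = values[-(self.window - 1):] if self.window > 1 else []
        smoothed = []
        for i in range(len(values)):
            chunk = values[max(0, i - self.window + 1): i + 1]  # partial window at the start
            smoothed.append(sum(chunk) / len(chunk))
        return smoothed

    def inverse_transform(self, smoothed_forecast: List[float]) -> List[float]:
        history = list(self.tail)
        restored = []
        for r in smoothed_forecast:
            previous = history[-(self.window - 1):] if self.window > 1 else []
            x = self.window * r - sum(previous)  # solve the window average for x_t
            restored.append(x)
            history.append(x)
        return restored


rm = RollingMeanSketch(window=2)
print(rm.fit_transform([1.0, 2.0, 3.0, 4.0, 5.0]))  # [1.0, 1.5, 2.5, 3.5, 4.5]
print(rm.inverse_transform([5.5, 6.5]))             # [6.0, 7.0]
```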
-
class autots.tools.transform.SeasonalDifference(lag_1: int = 7, method: str = 'LastValue')
Bases: object
Remove seasonal component.
- Parameters
lag_1 (int) – length of seasonal period to remove.
method (str) – how to construct the seasonality: ‘LastValue’, ‘Mean’, or ‘Median’
-
fit(df)
Fits.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_transform(df)
Fits and returns Magical DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df, trans_method: str = 'forecast')
Returns data to original or forecast form.
- Parameters
df (pandas.DataFrame) – input dataframe
trans_method (str) – whether to inverse on original data or on a following sequence:
‘original’ - return original data to original numbers
‘forecast’ - inverse the transform on a dataset immediately following the original
-
transform(df)
Returns seasonally differenced data.
- Parameters
df (pandas.DataFrame) – input dataframe
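A single-series sketch of the ‘LastValue’, ‘Mean’ and ‘Median’ options. It assumes the seasonal template has one entry per position in the cycle, aligned so the template ends at the last training value, and is tiled forward for a following forecast (passed with start equal to the training length). The class name and the start argument are hypothetical, and the series must be at least lag_1 long.

```python
import statistics
from typing import List


class SeasonalDifferenceSketch:
    def __init__(self, lag_1: int = 7, method: str = "LastValue"):
        self.lag = lag_1
        self.method = method

    def fit(self, values: List[float]) -> "SeasonalDifferenceSketch":
        self.n = len(values)
        if self.method == "LastValue":
            self.template = values[-self.lag:]  # the final full season
        else:
            agg = statistics.mean if self.method == "Mean" else statistics.median
            # bucket observations by their position in the cycle, counted from the end
            buckets: List[List[float]] = [[] for _ in range(self.lag)]
            for i, v in enumerate(values):
                buckets[(i - self.n) % self.lag].append(v)
            self.template = [agg(b) for b in buckets]
        return self

    def _season(self, start: int, length: int) -> List[float]:
        return [self.template[(start + k - self.n) % self.lag] for k in range(length)]

    def transform(self, values: List[float], start: int = 0) -> List[float]:
        return [v - s for v, s in zip(values, self._season(start, len(values)))]

    def inverse_transform(self, values: List[float], start: int = 0) -> List[float]:
        return [v + s for v, s in zip(values, self._season(start, len(values)))]


history = [1.0, 2.0, 10.0, 20.0]
sd = SeasonalDifferenceSketch(lag_1=2).fit(history)
print(sd.transform(history))                       # [-9.0, -18.0, 0.0, 0.0]
print(sd.inverse_transform([1.0, 1.0], start=4))  # [11.0, 21.0]
```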
-
class autots.tools.transform.SinTrend
Bases: object
Model a sine trend (fitted to each column) and remove it from the data.
-
fit(df)
Fits trend for later detrending.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_sin(tt, yy)
Fit sin to the input time sequence, and return fitting parameters “amp”, “omega”, “phase”, “offset”, “freq”, “period” and “fitfunc”.
Adapted from Stack Overflow user unsym: https://stackoverflow.com/questions/16716302/how-do-i-fit-a-sine-curve-to-my-data-with-pylab-and-numpy
-
fit_transform(df)
Fits and returns detrended DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df)
Returns data to original form.
- Parameters
df (pandas.DataFrame) – input dataframe
-
transform(df)
Returns detrended data.
- Parameters
df (pandas.DataFrame) – input dataframe
-
-
class autots.tools.transform.StatsmodelsFilter(method: str = 'bkfilter')
Bases: object
Irreversible filters.
-
fit(df)
Fits filter.
- Parameters
df (pandas.DataFrame) – input dataframe
-
fit_transform(df)
Fit and return filtered DataFrame.
- Parameters
df (pandas.DataFrame) – input dataframe
-
inverse_transform(df)
Return data to original form.
- Parameters
df (pandas.DataFrame) – input dataframe
-
transform(df)
Return filtered data.
- Parameters
df (pandas.DataFrame) – input dataframe
-
-
autots.tools.transform.clip_outliers(df, std_threshold: float = 3)
Replace outliers above the threshold with that threshold. Axis = 0.
- Parameters
df (pandas.DataFrame) – DataFrame containing numeric data
std_threshold (float) – The number of standard deviations away from mean to count as outlier.
-
autots.tools.transform.remove_outliers(df, std_threshold: float = 3)
Replace outliers with np.nan. Based on: https://stackoverflow.com/questions/23199796/detect-and-exclude-outliers-in-pandas-data-frame
- Parameters
df (pandas.DataFrame) – DataFrame containing numeric data, DatetimeIndex
std_threshold (float) – The number of standard deviations away from mean to count as outlier.
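Both outlier helpers can be sketched for one series with the standard library. The sketch assumes the band is mean ± std_threshold × sample standard deviation (statistics.stdev, which matches the pandas default ddof=1) and that clipping applies at both ends of the band; the _sketch suffix marks the function names as hypothetical, and None stands in for np.nan.

```python
import statistics
from typing import List, Optional, Tuple


def _band(values: List[float], std_threshold: float) -> Tuple[float, float]:
    mean = statistics.mean(values)
    spread = std_threshold * statistics.stdev(values)  # sample std (ddof=1)
    return mean - spread, mean + spread


def clip_outliers_sketch(values: List[float], std_threshold: float = 3.0) -> List[float]:
    lo, hi = _band(values, std_threshold)
    return [min(max(v, lo), hi) for v in values]  # pull outliers back to the band edge


def remove_outliers_sketch(values: List[float], std_threshold: float = 3.0) -> List[Optional[float]]:
    lo, hi = _band(values, std_threshold)
    return [v if lo <= v <= hi else None for v in values]  # None stands in for NaN


data = [0.0, 0.0, 0.0, 0.0, 10.0]
print(remove_outliers_sketch(data, std_threshold=1.0))  # [0.0, 0.0, 0.0, 0.0, None]
```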
-
autots.tools.transform.simple_context_slicer(df, method: str = 'None', forecast_length: int = 30)
Condensed version of context_slicer with more limited options.
- Parameters
df (pandas.DataFrame) – training data frame to slice
method (str) –
option to slice the dataframe:
‘None’ - return the unaltered dataframe
‘HalfMax’ - return half of the dataframe
‘ForecastLength’ - return a dataframe equal to the length of the forecast
‘2ForecastLength’ - return a dataframe equal to twice the length of the forecast
(4, 6, 8 and 10 are also accepted in place of 2)
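A sketch of the slicing options on a plain list. It assumes every option keeps the most recent rows (including ‘HalfMax’, taken here as the latest half) and parses the multiplier from the text before ‘ForecastLength’; simple_context_slicer_sketch is a hypothetical name.

```python
from typing import List, Optional


def simple_context_slicer_sketch(values: List[float], method: Optional[str] = "None",
                                 forecast_length: int = 30) -> List[float]:
    if method in (None, "None"):
        return values
    if method == "HalfMax":
        return values[len(values) // 2:]  # assumed: keep the most recent half
    if method.endswith("ForecastLength"):
        prefix = method[: -len("ForecastLength")]  # "" or a multiplier such as "2"
        n = forecast_length * (int(prefix) if prefix else 1)
        return values[-n:]  # keeps everything if n exceeds the length
    raise ValueError(f"unknown method: {method}")


series = [float(v) for v in range(10)]
print(simple_context_slicer_sketch(series, "2ForecastLength", forecast_length=3))
# [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```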
Module contents
basic utilities