Classes of User-Specified Functions for operating on CEROs
The set of functions that could be applied to a CERO, and to the data series within it, is unbounded, so ConCERO cannot provide them all. Instead, the user provides functions as they are needed, by writing the appropriate Python 3 code and adding the function to libfuncs.py. To minimise the difficulty and complexity of doing so, ConCERO includes 3 classes of wrapper functions that make it significantly easier for the user to extend the power of FromCERO.
A wrapper function is a function that encapsulates another function, and therefore has access to both the inputs and the outputs of the encapsulated function. Because the wrapper has access to the inputs, it can pre-process them into a specific form; because it has access to the output, it can post-process (mutate) the result into a desirable form.
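To make the pre-processing and post-processing idea concrete, here is a minimal, self-contained sketch in plain Python. It does not use any ConCERO code: `clip_negative` and `differences` are hypothetical names chosen for illustration. The wrapper converts its inputs to floats before calling the encapsulated function, then clips negative results to zero afterwards.

```python
import functools


def clip_negative(func):
    """Hypothetical wrapper (not part of ConCERO) that pre- and post-processes."""
    @functools.wraps(func)  # keep the encapsulated function's name and docstring
    def wrapper(values):
        # Pre-processing: reshape the input into the form func expects (floats).
        cleaned = [float(v) for v in values]
        result = func(cleaned)
        # Post-processing: mutate the output into a desirable form (no negatives).
        return [max(r, 0.0) for r in result]
    return wrapper


def differences(values):
    """Hypothetical encapsulated function: consecutive differences."""
    return [b - a for a, b in zip(values, values[1:])]


wrapped = clip_negative(differences)
print(wrapped(["1", "3", "2", 5]))  # [2.0, 0.0, 3.0]
```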
A wrapper function is directed to encapsulate a function by preceding that function with a decorator. A decorator is a one-line statement consisting of the ‘@’ symbol followed by the name of the wrapper function. For example, to encapsulate func with the dataframe_op wrapper, the code is:
@dataframe_op
def func(*args, **kwargs):
    ...
    return cero
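The decorator line is Python syntax, not ConCERO-specific: writing `@wrapper` above a `def` is equivalent to defining the function and then rebinding its name to `wrapper(func)`. The sketch below shows this with a hypothetical `tag` wrapper, which is illustrative only and unrelated to the libfuncs_wrappers module.

```python
import functools


def tag(func):
    """Hypothetical wrapper that pairs each result with the producing function's name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return (func.__name__, func(*args, **kwargs))
    return wrapper


@tag  # decorator syntax
def double(x):
    return 2 * x


def triple(x):
    return 3 * x


triple = tag(triple)  # exactly what an "@tag" line above "def triple" would do

print(double(4))  # ('double', 8)
print(triple(4))  # ('triple', 12)
```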
The wrapper functions are defined in the libfuncs_wrappers module, and should never be altered by the user.
The 3 classes of wrappers, and how to apply them, are explained below, along with the case where no wrapper/decorator is provided.
Class 1 Functions - DataFrame Operations
Class 1 functions are the most general type of wrapped function, and can be considered a superset of the other two classes. Class 1 functions operate on a pandas.DataFrame object, and therefore can operate on an entire CERO if need be. A class 1 function must have the following signature:
@dataframe_op
def func_name(df, *args, **kwargs):
    ...
    return cero
Note the following key features:
- The function is preceded by the dataframe_op decorator (imported from libfuncs_wrappers).
- The first argument provided to func_name, that is df, will be a CERO (an instance of pandas.DataFrame), reduced by the arrays/inputs options.
- The returned object (cero) must be a valid CERO. A valid CERO is a pandas.DataFrame object with a DatetimeIndex for its columns and tuple- or string-type values for its index.
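The validity rules above can be expressed as a short check. The sketch below is a hypothetical helper (`is_valid_cero` is not a ConCERO function) that tests only the three stated conditions. ConCERO may apply further checks of its own. The example data are made up.

```python
import pandas as pd


def is_valid_cero(obj):
    """Hypothetical check of the stated CERO rules (not ConCERO's own validator)."""
    if not isinstance(obj, pd.DataFrame):
        return False
    # Columns must be a DatetimeIndex (one column per date/year).
    if not isinstance(obj.columns, pd.DatetimeIndex):
        return False
    # Index values must be strings or tuples (a MultiIndex iterates as tuples).
    return all(isinstance(v, (str, tuple)) for v in obj.index)


cero = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=["A", "B"],
    columns=pd.to_datetime(["2017", "2018"], format="%Y"),
)
invalid = cero.copy()
invalid.columns = [2017, 2018]  # plain integers, not a DatetimeIndex

print(is_valid_cero(cero), is_valid_cero(invalid))  # True False
```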
The libfuncs function merge provides a simple example of how to apply this wrapper:
@dataframe_op
def merge(df):
    df.iloc[0, :] = df.sum(axis=0)  # Replaces the first series with the sum of all the series
    return df
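To see what the body of merge does, the sketch below applies the same two statements, without the decorator, to a small hand-made CERO. `merge_body` is a hypothetical name. When called through dataframe_op, df would first be restricted to the series selected by the arrays/inputs options, so "the first series" is the first selected one.

```python
import pandas as pd


def merge_body(df):
    # Same body as merge above, minus the @dataframe_op decorator.
    df.iloc[0, :] = df.sum(axis=0)  # column-wise totals overwrite the first row
    return df


cero = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    index=["A", "B", "C"],
    columns=pd.to_datetime(["2017", "2018"], format="%Y"),
)
result = merge_body(cero.copy())  # copy so the original CERO is left unchanged
print(result.loc["A"].tolist())  # [9.0, 12.0]
```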
Class 2 Functions - Series Operations
Class 2 functions operate on a single pandas.Series object. Note that a single row of a pandas.DataFrame is an instance of pandas.Series. The series operations class can be considered a subset of DataFrame operations, and a superset of recursive operations (discussed below).
Similar to class 1 functions, class 2 functions must fit the form:
@series_op
def func(series, *args, **kwargs):
    ...
    return pandas_series
With similar features:
- The function is preceded by the @series_op decorator (imported from libfuncs_wrappers).
- The first argument (series) must be of pandas.Series type.
- The function must return an object of pandas.Series type (pandas_series), and pandas_series must have the same shape as series.
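As an illustration of a body that satisfies these rules, the hypothetical function below rescales a series relative to its first year. It is shown undecorated so it runs without ConCERO. In libfuncs.py it would carry the @series_op decorator. The key property is that the output keeps the shape (and index) of the input.

```python
import pandas as pd


def index_to_first(series):
    """Hypothetical class 2 body: express each value relative to the first year."""
    return series / series.iloc[0]


row = pd.Series(
    [2.0, 3.0, 5.0],
    index=pd.to_datetime(["2017", "2018", "2019"], format="%Y"),
    name="A",
)
out = index_to_first(row)
print(out.shape == row.shape, out.tolist())  # True [1.0, 1.5, 2.5]
```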
Class 3 Functions - Recursive Operations
Recursive operations must fit the form:
@recursive_op
def func(*args, **kwargs):
    ...
    return calc
Note that:
- Positional arguments are provided in the same order as their sequence in the data series.
- The return value calc must be a single floating-point value.
Options can be provided to an operation object to alter the behaviour of the recursive operation. Those options are:
- init: list(float) - values that precede the data series and serve as initialisation values.
- post: list(float) - values that follow the data series, for non-causal recursive functions.
- auto_init: int - automatically prepend the first value in the array to the series auto_init times (and therefore use it as the initial conditions).
- auto_post: int - automatically append the last value in the array to the series auto_post times (and therefore use it as the post conditions).
- init_cols: list(int) - specifies the year(s) to use as initialisation values.
- post_cols: list(int) - specifies the year(s) to use as post-pended values.
- init_icols: list(int) - specifies the index (zero-indexed) to use as initialisation values.
- post_icols: list(int) - specifies the index (zero-indexed) to use as post-pended values.
- inplace: bool - If True, the recursive operation is applied to the array in place, such that the result from a previous iteration is used in subsequent iterations. If False, the operation proceeds ignorant of the results of previous iterations. True by default.
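The padding options can be pictured as building an extended array around the data series. The sketch below is a hypothetical helper (`pad_array` is not a ConCERO function) covering only init, post, auto_init and auto_post. It assumes each explicit/automatic pair is used as alternatives, which the option descriptions do not state, and it omits the *_cols and *_icols options.

```python
def pad_array(values, init=None, post=None, auto_init=0, auto_post=0):
    """Hypothetical sketch of how init/post/auto_init/auto_post extend a series."""
    values = list(values)
    # Assumption: explicit values and automatic repetition are alternatives.
    if init and auto_init:
        raise ValueError("give either init or auto_init, not both")
    if post and auto_post:
        raise ValueError("give either post or auto_post, not both")
    head = list(init) if init else [values[0]] * auto_init   # repeat first value
    tail = list(post) if post else [values[-1]] * auto_post  # repeat last value
    return head + values + tail


print(pad_array([2, 1, 3], init=[1], post=[2]))        # [1, 2, 1, 3, 2]
print(pad_array([2, 1, 3], auto_init=2, auto_post=1))  # [2, 2, 2, 1, 3, 3]
```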
How these options are applied is best explained with an example. Consider a recursive operation that is a 3-sample moving-average filter. This can be implemented by including mv_avg_3() (below) in libfuncs.py:
@recursive_op
def mv_avg_3(a, b, c):
    return (a + b + c)/3
It is also necessary to provide the arguments init and post in the configuration file, so the operation object looks something like:
func: mv_avg_3
init:
- 1
post:
- 2
This operation would transform the data series [2, 1, 3] to the values [1.3333, 1.7778, 2.2593], i.e. [(1+2+1)/3, (1.3333+1+3)/3, (1.7778+3+2)/3]. If, instead, the configuration file looks like:
func: mv_avg_3
init:
- 1
post:
- 2
inplace: False
Then the output for the same series would be [1.3333, 2, 2], that is, [(1+2+1)/3, (2+1+3)/3, (1+3+2)/3].
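The worked numbers above can be reproduced with a short sketch of the sliding-window loop. `apply_recursive` is a hypothetical stand-in, not the recursive_op source. It assumes the window length equals the number of parameters of the encapsulated function, and that each result replaces the sample immediately after the init values in its window, which matches the example.

```python
import inspect


def apply_recursive(func, values, init=(), post=(), inplace=True):
    """Hypothetical sketch of the loop behind a recursive operation."""
    window = len(inspect.signature(func).parameters)  # one argument per sample
    if len(init) + len(post) < window - 1:
        raise ValueError("init and post must supply at least window - 1 values")
    padded = list(init) + list(values) + list(post)
    original = list(padded)  # read-only copy, used when inplace is False
    offset = len(init)
    for i in range(len(values)):
        # inplace=True reads results written by earlier iterations.
        source = padded if inplace else original
        padded[i + offset] = func(*source[i:i + window])
    return padded[offset:offset + len(values)]


def mv_avg_3(a, b, c):
    return (a + b + c) / 3


print([round(v, 4) for v in apply_recursive(mv_avg_3, [2, 1, 3], init=[1], post=[2])])
# [1.3333, 1.7778, 2.2593]
print([round(v, 4) for v in apply_recursive(mv_avg_3, [2, 1, 3], init=[1], post=[2], inplace=False)])
# [1.3333, 2.0, 2.0]
```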
Wrapper-less Functions
It is strongly recommended that users encapsulate functions with the defined wrappers. This section should only be used as guidance for understanding how the wrappers operate with the FromCERO module, and for writing additional wrappers (which is a non-trivial exercise).
A function that is not decorated with a pre-defined wrapper (as discussed previously) must have the following signature to be compatible with the FromCERO module:
def func_name(df, *args, locs=None, **kwargs):
    ...
    return cero
Where:
- df receives the entire CERO (as handled by the calling class),
- locs receives a list of all identifiers specifying which series of the CERO have been specified, and
- cero is the returned dataframe, which must be of CERO type. The FromCERO module will overwrite any values of its own CERO with those provided by cero, based on an index match (after renaming takes place).
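A minimal wrapper-less function might look like the sketch below. `double_selected` is hypothetical, and `DataFrame.update` is used only to illustrate an overwrite "based on an index match". It is an assumption, not a claim about how FromCERO implements the overwrite internally.

```python
import pandas as pd


def double_selected(df, *args, locs=None, **kwargs):
    """Hypothetical wrapper-less function: doubles only the series named in locs."""
    if locs is None:
        locs = list(df.index)
    return df.loc[locs] * 2  # still a valid CERO: same columns, subset of rows


cero = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=["A", "B"],
    columns=pd.to_datetime(["2017", "2018"], format="%Y"),
)
result = double_selected(cero, locs=["B"])
# Illustration only: overwrite rows of the calling CERO whose index matches.
cero.update(result)
print(cero.loc["A"].tolist(), cero.loc["B"].tolist())  # [1.0, 2.0] [6.0, 8.0]
```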
Other Notes
- Avoid trying to create a renaming function; use the cero.rename_index_values() method instead, which has been designed to work around a bug in Pandas (Issue #19497).
- The system module libfuncs serves as a source of examples of how to use the function wrappers.
Technical Specifications of Decorators
- libfuncs_wrappers.dataframe_op(func)
  This decorator provides func (the encapsulated function) with a restricted form of df (a CERO). A restricted df is the original df limited to a subset of rows and/or columns. Note that a restriction on df.columns will be compact (in the mathematical sense), but this is not necessarily the case for a restriction on df.index.
- libfuncs_wrappers.series_op(func)
  This decorator provides func (the encapsulated function) with the first pandas.Series in a pandas.DataFrame (i.e. the first row in df). Note that this wrapper is encapsulated within the dataframe_op wrapper.
- libfuncs_wrappers.recursive_op(func)
  Applies the encapsulated function (func) iteratively to the elements of array from left to right, with init prepended to array and post appended.
- libfuncs_wrappers.log_func(func)
  Logging decorator, for debugging purposes. To apply to function func:
  @log_func
  def func(*args, **kwargs):
      ...
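The layering described above (series_op inside dataframe_op, both ultimately presenting the wrapper-less `df, *args, locs=None, **kwargs` signature) can be sketched as follows. `dataframe_op_sketch` and `series_op_sketch` are hypothetical simplifications, not the libfuncs_wrappers source. For brevity, the dataframe sketch restricts rows only and leaves columns untouched.

```python
import functools

import pandas as pd


def dataframe_op_sketch(func):
    """Hypothetical simplification of dataframe_op (not the real implementation)."""
    @functools.wraps(func)
    def wrapper(df, *args, locs=None, **kwargs):
        # Restrict df to the selected series (rows); columns are not restricted here.
        restricted = df if locs is None else df.loc[locs]
        result = func(restricted.copy(), *args, **kwargs)
        if not isinstance(result, pd.DataFrame):
            raise TypeError("a dataframe operation must return a pandas.DataFrame")
        return result
    return wrapper


def series_op_sketch(func):
    """Hypothetical simplification of series_op, layered inside dataframe_op_sketch."""
    @functools.wraps(func)
    def on_frame(df, *args, **kwargs):
        series = df.iloc[0]  # first row of the restricted CERO
        result = func(series.copy(), *args, **kwargs)
        if result.shape != series.shape:
            raise ValueError("a series operation must preserve the shape of its input")
        out = df.iloc[[0]].copy()
        out.iloc[0] = result.to_numpy()
        return out
    return dataframe_op_sketch(on_frame)


@series_op_sketch
def cumulative(series):
    return series.cumsum()


cero = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=["A", "B"],
    columns=pd.to_datetime(["2017", "2018", "2019"], format="%Y"),
)
print(cumulative(cero, locs=["B"]).loc["B"].tolist())  # [4.0, 9.0, 15.0]
```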
Created on Thu Dec 21 16:36:02 2017
@author: Lyle Collins
@email: Lyle.Collins@csiro.au