Converting FROM the Collins Economics Result Object (CERO) format
Output-Independent Instructions
Setting up a FromCERO configuration file
Like all other configuration files for this program, the configuration file must be in YAML format. The highest hierarchical level (i.e. the level with the least or no indentation) is referred to as a FromCERO object. For the configuration to output anything meaningful, the following option must be defined:
procedures: (list[dict|str])
- where procedures is a list of one or more procedure objects. Procedure objects are explained below.
Procedures define the mutation(s) of the data and, if desired, the export of the mutated data to a file. If a procedure does not specify an export file, then ConCERO will, by default, export the procedure's output to output.csv in the current working directory. This default output file can be overridden by specifying the file option of the FromCERO object.
It is recommended that the following option be specified:
file: (str)
- names the file into which all procedure objects are exported. Procedure objects will export into this file unless a procedure-object-specific file has been defined. The extension of file determines the exported file type. Supported file types are:
- Numpy arrays - npy
- GAMS Data eXchange format - gdx (temporarily unsupported)
- HAR files - har
- Shock files - shk
- Portable Network Graphics format - png
- Portable Document Format - pdf
- PostScript - ps
- Encapsulated PostScript - eps
- Scalable Vector Graphics - svg
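Because the extension of file selects the export type, the choice can be pictured as a lookup on the file suffix. The sketch below is hypothetical: SUPPORTED_EXTENSIONS and export_type are illustrative names, not part of ConCERO. It covers the extensions listed above plus csv, which is implied by the default output.csv, and it assumes extensions are matched case-insensitively.

```python
import os

# Hypothetical lookup table mirroring the list above. "csv" is included
# because output.csv is the documented default export file.
SUPPORTED_EXTENSIONS = {
    "csv": "Comma-separated values",
    "npy": "Numpy arrays",
    "gdx": "GAMS Data eXchange format",
    "har": "HAR files",
    "shk": "Shock files",
    "png": "Portable Network Graphics format",
    "pdf": "Portable Document Format",
    "ps": "PostScript",
    "eps": "Encapsulated PostScript",
    "svg": "Scalable Vector Graphics",
}

# gdx is listed above as temporarily unsupported.
TEMPORARILY_UNSUPPORTED = {"gdx"}


def export_type(path):
    """Return the export type implied by the extension of `path`."""
    # Assumption: matching is case-insensitive ("OUT.NPY" -> "npy").
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported export file type: {path!r}")
    if ext in TEMPORARILY_UNSUPPORTED:
        raise NotImplementedError(f"Export to .{ext} is temporarily unsupported")
    return SUPPORTED_EXTENSIONS[ext]


print(export_type("results/plot.svg"))  # Scalable Vector Graphics
```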
Other options include:
sets: (dict: str -> List[str])
- sets is a dictionary mapping str to a list of str. sets provides an easy and convenient way to select groups of CERO identifiers (see CERO Identifiers), as opposed to simply listing all the identifiers that are of interest for output. More detail about sets is provided below in the section Sets.
map: (dict: str -> str)
- key-value pairs that map an "old" identifier to a "new" identifier.
ref_dir: (str)
- where ref_dir is a file path relative to the current working directory. By default, all file names are interpreted as being relative to the configuration file. Providing this option overrides the default.
lstrip: (str)
- if provided, this string is stripped from the left (start) of all identifiers that make up the input. If the string does not match the start of the identifier (if the identifier is a str), or the first field of the identifier (if the identifier is a tuple), then a ValueError is raised. This option is designed to correspond to CEROs generated using ToCERO with the auto_prepend option provided.
libfuncs: (str|list[str])
- paths, relative to ref_dir, of python files containing functions to use as operation functions. Note that the .py filename extension must be included. The structure of a libfuncs file is discussed below in Libfuncs Files.
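The lstrip rule can be sketched as a small helper. This is a hypothetical illustration (lstrip_identifier is not a ConCERO function), and for tuple identifiers it assumes the string is stripped from the start of the first field; ConCERO's exact treatment of tuples may differ.

```python
def lstrip_identifier(identifier, prefix):
    """Strip `prefix` from the start of a CERO identifier (hypothetical helper).

    Assumption: for tuple identifiers, the prefix is stripped from the start
    of the first field and the remaining fields are left unchanged.
    """
    if isinstance(identifier, str):
        if not identifier.startswith(prefix):
            raise ValueError(f"{identifier!r} does not start with {prefix!r}")
        return identifier[len(prefix):]
    if isinstance(identifier, tuple):
        first, *rest = identifier
        if not first.startswith(prefix):
            raise ValueError(
                f"First field of {identifier!r} does not start with {prefix!r}")
        return (first[len(prefix):], *rest)
    raise TypeError(
        f"Identifiers must be str or tuple, not {type(identifier).__name__}")


print(lstrip_identifier("gdp_AUS", "gdp_"))            # AUS
print(lstrip_identifier(("gdp_AUS", "real"), "gdp_"))  # ('AUS', 'real')
```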
Note that, in general, properties at a lower level (i.e. more indentation) ‘inherit’ from a higher level.
So, an example configuration, in YAML format, is:
file: a_csv_file.csv
procedures:
- <Procedure Object A>
- <Procedure Object B>
- <Procedure Object C>
- <etc.>
Examples of complete configuration files can be found in the tests/data
subdirectory of the ConCERO install path.
Procedure Objects
Conceptually, procedure objects provide the instructions to select data from a CERO, mutate that data (if necessary), and then either (a) output this data into a file, or (b) return outputs for later export into a global file (specified by the file option of the FromCERO object). Any mutations that are applied to a procedure object's inputs are isolated from every other procedure object and from the CERO itself, i.e. each procedure can be considered a 'silo' separate from the others.
A procedure object can be either a str or a dict. The dict form is the more general form: if a procedure object is provided as a str, ser_obj, it is immediately converted to the equivalent form {'name': ser_obj}. The complete list of options is:
name: (str)
- the name given to the procedure. By default, the procedure is given the name Unnamed_proc.
file: (str)
- if provided, the output from this procedure object, and only this procedure object, will be exported to file.
inputs: list(str|list(str))
- a list of identifiers corresponding to identifiers in the CERO. If an item of the list is a string with one or more commas, or is itself a list, then the item will be interpreted as a tuple-form identifier. See CERO Identifiers.
outputs: list(str|list(str))
- a list of identifiers that are to be exported to the file. If outputs is not specified, then ConCERO will export all updated inputs after all operations are performed. Read Description of the output process-flow to understand what is meant by updated inputs. If none of the data series should be exported to a file in the conventional manner (which is the case if, for example, plotting output), then specify outputs but leave the corresponding value blank to indicate a value of None. If an item of the list is a string with one or more commas, or is itself a list, then the item will be interpreted as a tuple-form identifier. See CERO Identifiers.
operations: list[operations objects]
- to transform the inputs into a form suitable for export, operations are applied to mutate the data. operations is a list of operations objects, which modify the data in a sequential manner. See Operations Objects for more information.
libfuncs: (str|list[str])
- identical in meaning to the equivalent FromCERO object option. It is inherited from the FromCERO object if not given.
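The rule for tuple-form identifiers in inputs and outputs can be sketched as follows. parse_identifier is a hypothetical helper, not ConCERO's parser; it splits on commas exactly and does not strip surrounding whitespace, which is an assumption.

```python
def parse_identifier(item):
    """Interpret one item of an inputs/outputs list as a CERO identifier.

    Hypothetical helper: a list, or a string containing one or more commas,
    becomes a tuple-form identifier; any other string is returned unchanged.
    Whitespace around commas is NOT stripped here (an assumption).
    """
    if isinstance(item, list):
        return tuple(item)
    if isinstance(item, str) and "," in item:
        return tuple(item.split(","))
    return item


inputs = ["gdp", "gdp,AUS", ["gdp", "NZL"]]
print([parse_identifier(i) for i in inputs])
# ['gdp', ('gdp', 'AUS'), ('gdp', 'NZL')]
```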
Below is a shell showing the two different procedure object types:
procedures:
  - name: (str)
    inputs: (list[str])
    operations: (list[operation])
    file: (str)
  - (str)
The 1st procedure object is in dictionary form, and the 2nd is in string form.
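The conversion of the string form into the dictionary form, together with the default name and the inheritance of libfuncs described above, can be sketched as below. normalise_procedure and the file name my_funcs.py are hypothetical; this is not ConCERO's own code.

```python
def normalise_procedure(proc, fromcero_conf):
    """Convert a procedure object to its dict form and fill in defaults.

    Hypothetical helper: a str becomes {'name': str}; a missing name becomes
    'Unnamed_proc'; libfuncs is inherited from the FromCERO object if absent.
    """
    if isinstance(proc, str):
        proc = {"name": proc}
    elif not isinstance(proc, dict):
        raise TypeError("A procedure object must be a str or a dict")
    proc = dict(proc)  # copy so the original configuration is not modified
    proc.setdefault("name", "Unnamed_proc")
    if "libfuncs" not in proc and "libfuncs" in fromcero_conf:
        proc["libfuncs"] = fromcero_conf["libfuncs"]
    return proc


conf = {
    "file": "a_csv_file.csv",
    "libfuncs": "my_funcs.py",  # hypothetical libfuncs file
    "procedures": ["simple_proc", {"inputs": ["a"]}],
}
print([normalise_procedure(p, conf) for p in conf["procedures"]])
# [{'name': 'simple_proc', 'libfuncs': 'my_funcs.py'},
#  {'inputs': ['a'], 'name': 'Unnamed_proc', 'libfuncs': 'my_funcs.py'}]
```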
Inheritance paths
Below is an outline of how options are inherited:
inputs
- If inputs is undefined, then inputs is the entire CERO (whatever that may be at runtime).
outputs
- If outputs is undefined or True, then all inputs are outputs. If outputs is False or None, then there are no outputs. A list or str can be provided to select specific outputs.
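These defaults can be sketched as two small functions. The names resolve_inputs and resolve_outputs are hypothetical. The sketch relies on the fact that a YAML key written with a blank value (e.g. "outputs:") loads as None, so an absent key and an explicit None behave differently, as described above. Set references are not expanded here.

```python
def resolve_inputs(proc, cero_identifiers):
    """inputs: undefined -> the entire CERO; a str or list selects identifiers."""
    if "inputs" not in proc:
        return list(cero_identifiers)
    inputs = proc["inputs"]
    return [inputs] if isinstance(inputs, str) else list(inputs)


def resolve_outputs(proc, updated_inputs):
    """outputs: undefined or True -> all updated inputs; False or None -> none;
    a str or list selects specific outputs."""
    outputs = proc.get("outputs", True)  # an absent key behaves like True
    if outputs is True:
        return list(updated_inputs)
    if outputs is None or outputs is False:
        return []
    return [outputs] if isinstance(outputs, str) else list(outputs)


cero_ids = ["a", "b", "c"]
print(resolve_inputs({}, cero_ids))                   # ['a', 'b', 'c']
print(resolve_outputs({"outputs": None}, cero_ids))   # []
print(resolve_outputs({"outputs": "b"}, cero_ids))    # ['b']
```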
Operations Objects
An operation refers to the process of applying a function to some inputs to return one or more outputs. Unlike separate procedures, operations within the same procedure object cannot be considered to operate in a 'silo-ed' manner, and therefore the order of operations is significant. Each item of the list operations must be an operation object, that is, a dict, which may contain the options:
func: (str)
- func is the name of a function present in a libfuncs library that is applied to arrays (see below). The functions available can be easily expanded by:
  - Correctly identifying the class of the new function (see Classes of User-Specified Functions for operating on CEROs).
  - Adding the function to a python source code file, with the associated function decorator (as explained in Classes of User-Specified Functions for operating on CEROs), and providing that file to ConCERO with the libfuncs procedure option. The system libfuncs.py will be searched after any referenced files.
arrays: list(str|list(str))
- arrays defines which of the inputs func will manipulate. If arrays is not provided, arrays defaults to all procedure object inputs. Note that any manipulation applied to arrays will be in effect for all subsequent operations.
rename: (list|dict)
- providing this option renames arrays after the application of func (if provided). If rename is provided as a list, then the list is parsed as identifiers (see CERO Identifiers) and must be the same length as arrays. If provided as a dict, only those arrays matching keys in the dict are renamed to the corresponding value. Regardless of the form of rename (i.e. list or dict), references to sets can be made. In the specific case that there is one and only one array, rename can also be provided as a str. If rename is provided and the new identifiers are not already in arrays, then rename expands arrays to include the new identifiers (and the data series corresponding to the original identifiers are left untouched). Using this behaviour, rename can be used to apply func to specific arrays without altering the original arrays.
start_year: (int)
- constrains the dataset to years after and including start_year. This option may be useful to avoid attempting to apply func to missing data.
end_year: (int)
- constrains the dataset to years before and including end_year. This option may be useful to avoid attempting to apply func to missing data.
Any additional options are passed to func as keyword arguments.
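The interplay of arrays, rename, start_year, end_year and extra keyword arguments can be sketched on a simplified data model. Everything here is an assumption for illustration: the CERO is modelled as a dict mapping identifier to {year: value} rather than a pandas DataFrame, apply_operation and scale are hypothetical names, and the result replaces the stored series outright, so years outside the window are dropped (ConCERO's actual handling may differ).

```python
def _identity(series, **kwargs):
    """Stand-in used when an operation object has no func."""
    return series


def apply_operation(inputs, op, funcs):
    """Apply one operation object to `inputs` in place and return it.

    `inputs` maps identifier -> {year: value}; `funcs` maps function names
    to callables. Both representations are assumptions for this sketch.
    """
    op = dict(op)  # work on a copy so the configuration is not modified
    func = funcs[op.pop("func")] if "func" in op else _identity
    arrays = op.pop("arrays", list(inputs))  # default: all inputs
    if isinstance(arrays, str):
        arrays = [arrays]
    rename = op.pop("rename", None)
    start_year = op.pop("start_year", None)
    end_year = op.pop("end_year", None)
    kwargs = op  # any remaining options are passed to func as keyword arguments

    # Normalise rename into a dict {old identifier: new identifier}.
    if isinstance(rename, str):
        if len(arrays) != 1:
            raise ValueError("rename may only be a str when there is exactly one array")
        rename = {arrays[0]: rename}
    elif isinstance(rename, list):
        if len(rename) != len(arrays):
            raise ValueError("rename must be the same length as arrays")
        rename = dict(zip(arrays, rename))
    elif rename is None:
        rename = {}

    for ident in arrays:
        # Constrain the series to [start_year, end_year] before applying func.
        series = {
            year: value
            for year, value in inputs[ident].items()
            if (start_year is None or year >= start_year)
            and (end_year is None or year <= end_year)
        }
        # Overwrite the matching input, or add a new one if renamed.
        inputs[rename.get(ident, ident)] = func(series, **kwargs)
    return inputs


def scale(series, factor=2):
    """Hypothetical operation function: multiply every value by `factor`."""
    return {year: factor * value for year, value in series.items()}


inputs = {"a": {2017: 1.0, 2018: 2.0}, "b": {2017: 3.0, 2018: 4.0}}
apply_operation(
    inputs,
    {"func": "scale", "arrays": ["a"], "rename": "a_scaled",
     "start_year": 2018, "factor": 3},
    {"scale": scale},
)
print(inputs["a_scaled"])  # {2018: 6.0}
print(inputs["a"])         # {2017: 1.0, 2018: 2.0}  (original left untouched)
```

Because the renamed result is stored under a new identifier, the original series "a" survives for subsequent operations, which mirrors the behaviour described for rename.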
Sets
The sets option must have the following form:
sets: dict[str -> list(str)]
The sets option provides a powerful way to list many identifiers with only a few references. An example configuration of sets is:
sets:
  ASET:
    - a
    - b
    - c
A user can then specify all the elements of the set (for inputs, arrays and outputs) by referencing the set. For example:
sets:
  ASET:
    - a
    - b
    - c
    - d
    - e
procedures:
  - name: a_procedure
    inputs: ASET
    operations:
      - func: a_func
  - name: b_procedure
    inputs: ASET
    operations:
      - func: b_func
This is equivalent to the more verbose:
procedures:
  - name: a_procedure
    inputs:
      - a
      - b
      - c
      - d
      - e
    operations:
      - func: a_func
  - name: b_procedure
    inputs:
      - a
      - b
      - c
      - d
      - e
    operations:
      - func: b_func
Specifying sets
is even more powerful when using them in the context of tuple-identifiers. For example, consider that these (100*100 = 10,000) identifiers were in the CERO (in python list form):
[('1', '1'), ('1', '2'), ('1', '3'), ..., ('1', '100'), ('2', '1'), ('2', '2'), ..., ('2', '100'),
('3', '1'), ..., ('3', '100'), ..., ('100', '100')]
Rather than listing all 10,000 identifiers, a user can create a set:
sets = {'century': ['1', '2', '3', ..., '100']}
and select all 10,000 identifiers by referencing the set twice with a comma in between, e.g. in YAML:
inputs:
  - century,century
Note that the selection takes place by using the cartesian product operation, and it is necessary that the cartesian product be convex.
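The expansion of set references, including the cartesian product used for tuple identifiers, can be sketched with itertools.product. expand_item is a hypothetical helper and not ConCERO's implementation; fields that do not name a set are kept literally, which is an assumption.

```python
import itertools


def expand_item(item, sets):
    """Expand one inputs/arrays/outputs item using `sets` (hypothetical helper).

    Each comma-separated field that names a set is replaced by that set's
    members; the cartesian product of the fields gives the selected
    identifiers. Fields that are not set names are kept as literals.
    """
    fields = item.split(",")
    choices = [sets.get(field, [field]) for field in fields]
    combos = itertools.product(*choices)
    if len(fields) == 1:
        return [combo[0] for combo in combos]  # plain str identifiers
    return list(combos)  # tuple-form identifiers


sets = {
    "ASET": ["a", "b", "c", "d", "e"],
    "century": [str(i) for i in range(1, 101)],
}
print(expand_item("ASET", sets))  # ['a', 'b', 'c', 'd', 'e']
selected = expand_item("century,century", sets)
print(len(selected), selected[:2])  # 10000 [('1', '1'), ('1', '2')]
```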
Libfuncs Files
A libfuncs file is a standard python source file. However, to use its definitions as operations in ConCERO, the functions must be wrapped with specialised wrappers. For example, a python source code file that provides a ConCERO-compatible operation is:
from concero.libfuncs_wrappers import recursive_op

@recursive_op
def double_values(x):
    return 2*x
Here, the double_values function simply doubles the value of all input series. Note that series_op and dataframe_op are also wrappers that encapsulate functions to ensure they are ConCERO-compatible. For more information on how to use the wrappers, please consult Classes of User-Specified Functions for operating on CEROs.
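The lookup of func across libfuncs files (user-provided files first, then the system libfuncs.py) can be sketched with the standard importlib machinery. load_module and find_function are hypothetical helpers, not ConCERO's loader, and the example file my_funcs.py deliberately omits the ConCERO wrapper so the sketch runs without ConCERO installed.

```python
import importlib.util
import os
import tempfile


def load_module(path):
    """Load a python source file (given with its .py extension) as a module."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def find_function(func_name, libfuncs_paths, system_paths=()):
    """Return the first function named `func_name`, searching the user's
    libfuncs files first and any system library files afterwards."""
    if isinstance(libfuncs_paths, str):
        libfuncs_paths = [libfuncs_paths]
    for path in list(libfuncs_paths) + list(system_paths):
        module = load_module(path)
        if hasattr(module, func_name):
            return getattr(module, func_name)
    raise AttributeError(f"No function {func_name!r} found in {libfuncs_paths!r}")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_funcs.py")  # hypothetical libfuncs file
    with open(path, "w") as f:
        f.write("def double_values(x):\n    return 2*x\n")
    double_values = find_function("double_values", path)
    print(double_values(21))  # 42
```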
Description of the output process-flow
Each procedure object corresponds to the output of an object into a file. Every procedure takes inputs (from a CERO), mutates these inputs in some way (or not), and then outputs some, if not all, of the mutated inputs into a file. More specifically, in converting a CERO to an output file, the general process flow is:
- From the given CERO, use inputs to identify the relevant data series by their identifiers.
- Copy those inputs to avoid disturbing/mutating the original CERO.
- From the copied inputs, perform a sequence of operations where, for each operation:
  - All of the inputs, or a subset of inputs, is selected (that is, the arrays).
  - A function mutates the arrays.
  - If given, the arrays are renamed according to rename.
  - The copied inputs are updated with the mutated arrays. Inputs whose identifiers match arrays are overwritten. Otherwise (in the event that arrays have been renamed) the arrays are added to inputs.
- Export outputs to either:
  - file, if file is specified in the procedure object, or
  - file as defined in the FromCERO object, if specified, or
  - output.csv, if file is unspecified in both the procedure and FromCERO objects.
FromCERO Technical Specification
- class from_cero.FromCERO(conf: dict, *args, parent=None, **kwargs)
Any additional arguments and keyword arguments are passed to the superclass (i.e. the dict class) at initialisation.
Parameters:
- conf (Union[str, dict]) – A dictionary containing the configuration. If a str is provided, it is interpreted as a file (in YAML format) containing a configuration dictionary (relative to the current working directory).
- parent (dict) – If provided, the created object will inherit from parent (a dict).
- exec_procedures(cero)
Execute all the procedures of the FromCERO object.
Parameters: cero (pandas.DataFrame) – A CERO to serve as input for the procedures. The argument is not mutated/modified.
- static is_valid(conf: dict, raise_exception=True)
Performs static checks on conf to verify whether conf can be converted to a FromCERO object. Checks include:
- Valid type.
- Valid procedures.
- If file is given, that the user has write permissions in that directory.
Returns: bool – True if conf passes all static checks.
- static load_config(conf, parent=None)
Loads the configuration of a FromCERO object. If conf is a str, it is interpreted as a file (in YAML format) containing a configuration dictionary (relative to the current working directory). Otherwise conf must be a dictionary.
Parameters: conf (Union[str, dict]) – the configuration dictionary, or the path of a YAML file containing it.
Returns: dict
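The static checks listed for is_valid can be approximated as below. is_valid_sketch is a hypothetical stand-in, not the implementation of FromCERO.is_valid. It assumes that raise_exception=True raises ValueError on failure, that "valid procedures" means a non-empty list of str or dict items, and that file is resolved relative to the current working directory (the configuration-file-relative default is not modelled).

```python
import os


def is_valid_sketch(conf, raise_exception=True):
    """Hypothetical static checks mirroring those listed for is_valid."""
    problems = []
    if not isinstance(conf, dict):
        problems.append("configuration must be a dict")
    else:
        procedures = conf.get("procedures")
        if not isinstance(procedures, list) or not procedures:
            problems.append("procedures must be a non-empty list")
        elif not all(isinstance(p, (str, dict)) for p in procedures):
            problems.append("each procedure must be a str or a dict")
        if "file" in conf:
            # Assumption: file is resolved relative to the working directory.
            directory = os.path.dirname(os.path.abspath(conf["file"]))
            if not os.access(directory, os.W_OK):
                problems.append(f"no write permission in {directory!r}")
    if problems and raise_exception:
        raise ValueError("; ".join(problems))
    return not problems


print(is_valid_sketch({"procedures": ["simple_proc"]}))               # True
print(is_valid_sketch({"procedures": []}, raise_exception=False))     # False
```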
Created on Jan 22 08:44:08 2018
Section author: Lyle Collins <Lyle.Collins@csiro.au>
Code author: Lyle Collins <Lyle.Collins@csiro.au>