- Purpose:
Groups / collates sets of input files, each into a separate summary.
This variant of @collate allows additional inputs or dependencies to be added dynamically to the task.
Output file names are determined from tasks_or_file_names, i.e. from the output of upstream tasks, or from a list of file names.
This variant of @collate allows input file names to be derived in the same way.
add_inputs nests the original input parameters in a list before adding additional dependencies.
inputs replaces the original input parameters wholesale.
Only out-of-date tasks (judged by comparing input and output files) will be run.
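The out-of-date rule can be pictured with a small timestamp comparison. This is a simplified sketch and not Ruffus's actual implementation: the helper name needs_update is hypothetical, and it assumes that "out of date" means an output is missing or older than the newest input.

```python
import os


def needs_update(input_files, output_files):
    """Hypothetical helper: True if the job should be re-run.

    Assumption: a job is out of date when any output file is missing,
    or when the oldest output is older than the newest input.
    """
    if not all(os.path.exists(name) for name in output_files):
        return True
    newest_input = max(
        (os.path.getmtime(name) for name in input_files),
        default=float("-inf"),
    )
    oldest_output = min(
        (os.path.getmtime(name) for name in output_files),
        default=float("inf"),
    )
    return oldest_output < newest_input
```

Under this reading, touching any one input file of a collate job is enough to make its summary file stale.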
Example of add_inputs
regex(r".+\.(.+)$"), r'\1.summary' creates a separate summary file for each suffix. Here we also add date of birth data for each species group:
    from ruffus import *

    animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"

    # summarise by file suffix:
    @collate(animal_files, regex(r".+\.(.+)$"), add_inputs(r"\1.date_of_birth"), r'\1.summary')
    def summarize(infiles, summary_file):
        pass

This results in the following equivalent function calls:

    summarize([ ["shark.fish", "fish.date_of_birth" ],
                ["tuna.fish",  "fish.date_of_birth" ] ], "fish.summary")
    summarize([ ["cat.mammals", "mammals.date_of_birth"],
                ["dog.mammals", "mammals.date_of_birth"] ], "mammals.summary")

Example of inputs
Using inputs(...) instead will summarise only the dates of birth for each species group:
    from ruffus import *

    animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"

    # summarise by file suffix:
    @collate(animal_files, regex(r".+\.(.+)$"), inputs(r"\1.date_of_birth"), r'\1.summary')
    def summarize(infiles, summary_file):
        pass

This results in the following equivalent function calls:

    summarize(["fish.date_of_birth"   ], "fish.summary")
    summarize(["mammals.date_of_birth"], "mammals.summary")

Parameters:
- tasks_or_file_names
can be a:
- Task / list of tasks (as in the example above).
File names are taken from the output of the specified task(s)
- (Nested) list of file name strings.
- File names containing *[]? will be expanded as a glob.
E.g.: "a.*" => "a.1", "a.2"
- matching_regex
is a python regular expression string, which must be wrapped in a regex indicator object. See the python regular expression (re) documentation for details of regular expression syntax.
- matching_formatter
is a formatter indicator object, optionally containing a python regular expression (re).
- input_pattern
Specifies the resulting input(s) to each job. Must be wrapped in an inputs or an add_inputs indicator object.
Can be a:
- Task / list of tasks (as in the example above).
File names are taken from the output of the specified task(s)
- (Nested) list of file name strings.
Strings will be subject to substitution. File names containing *[]? will be expanded as a glob. E.g.: "a.*" => "a.1", "a.2"
- output_pattern
Specifies the resulting output file name(s).
- extra_parameters
Any extra parameters are passed verbatim to the task function.
- Outputs and optional extra parameters are passed to the functions after string substitution in any strings. Non-string values are passed through unchanged.
- Each collate job consists of input files which are aggregated by string substitution to a single set of output / extra parameter matches.
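The aggregation described above can be emulated in plain Python to see how jobs are formed. This is only an illustrative sketch using the standard re module, not Ruffus code: the function collate_jobs and its parameters are hypothetical names, and the sorting and de-duplication of inputs are assumptions made so that the result mirrors the function calls listed in the examples.

```python
import re
from collections import defaultdict


def collate_jobs(file_names, pattern, output_template,
                 extra_input_template=None, replace_inputs=False):
    """Hypothetical emulation: group inputs by their substituted output name.

    extra_input_template alone mimics add_inputs (the original input is
    nested in a list alongside the extra dependency); together with
    replace_inputs=True it mimics inputs (the original input is replaced).
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in sorted(file_names):
        match = regex.search(name)
        if match is None:
            continue  # non-matching file names take no part in any job
        output = match.expand(output_template)
        if extra_input_template is None:
            job_input = name
        else:
            extra = match.expand(extra_input_template)
            job_input = extra if replace_inputs else [name, extra]
        # Assumption: identical inputs within one job are listed once.
        if job_input not in groups[output]:
            groups[output].append(job_input)
    return [(inputs, output) for output, inputs in sorted(groups.items())]


animal_files = "tuna.fish", "shark.fish", "dog.mammals", "cat.mammals"

# One (inputs, output) pair per job, i.e. per distinct output file name.
for inputs, output in collate_jobs(animal_files, r".+\.(.+)$",
                                   r"\1.summary", r"\1.date_of_birth"):
    print(inputs, "->", output)
```

Calling the same function with replace_inputs=True leaves a single date_of_birth file per job, which corresponds to the inputs(...) example.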
See @collate for more straightforward ways to use collate.