Now that we have many smaller lists of numbers in separate files, we can calculate the sum and the sum of squares of each list in parallel.
All we need is a function which takes a *.chunk file, reads the numbers, calculates the answers and writes them back out to a corresponding *.sums file.
Ruffus magically takes care of applying this task function to all the different data files in parallel.
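A minimal sketch of such a task function is shown below. It assumes each *.chunk file holds one number per line and that the *.sums file should hold the sum, the sum of squares and the count on separate lines; both file layouts are assumptions made for illustration. The @transform decorator that would register the function with Ruffus appears only as a comment, so that the sketch runs without Ruffus installed.

```python
# In the pipeline, this function would be decorated along these lines:
#
#   from ruffus import *
#   @transform(step_5_split_numbers_into_chunks, suffix(".chunk"), ".sums")
#
# The decorator is left as a comment so the sketch runs on its own.

def step_6_calculate_sum_of_squares(input_file_name, output_file_name):
    """Read numbers from a chunk file; write their sum, sum of squares and count."""
    total = 0.0
    total_squared = 0.0
    count = 0

    # Assumed input layout: one number per line; blank lines are skipped.
    with open(input_file_name) as chunk_file:
        for line in chunk_file:
            line = line.strip()
            if not line:
                continue
            value = float(line)
            total += value
            total_squared += value * value
            count += 1

    # Assumed output layout: sum, sum of squares and count, one per line.
    with open(output_file_name, "w") as sums_file:
        sums_file.write(f"{total}\n{total_squared}\n{count}\n")
```

Because the function only touches the two file names it is given, separate calls do not share state, which is what allows them to run side by side.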
The first thing to note about this example is that the input files are not specified as a glob (e.g. *.chunk) but as the preceding task. Ruffus will take all the files produced by step_5_split_numbers_into_chunks() and feed them as the input into step 6. This handy shortcut also means that Ruffus knows that step_6_calculate_sum_of_squares depends on step_5_split_numbers_into_chunks, so an additional @follows directive is unnecessary.
The use of suffix within the decorator tells Ruffus to take all input files with the .chunk suffix and substitute a .sums suffix to generate the corresponding output file name.
Thus, if step_5_split_numbers_into_chunks created
    "1.chunk"
    "2.chunk"
    "3.chunk"
this would result in the following function calls:
    step_6_calculate_sum_of_squares("1.chunk", "1.sums")
    step_6_calculate_sum_of_squares("2.chunk", "2.sums")
    step_6_calculate_sum_of_squares("3.chunk", "3.sums")
    # etc...
Note
It is possible to generate output filenames using more powerful regular expressions as well. See the @transform syntax documentation for more details.
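The suffix substitution described earlier can be illustrated in plain Python. The helper substitute_suffix below is hypothetical and is not part of Ruffus; it only mimics how each input name would map to an output name, and raising an error for a non-matching name is a simplification chosen for this sketch.

```python
def substitute_suffix(input_file_name, old_suffix, new_suffix):
    """Return input_file_name with old_suffix replaced by new_suffix.

    Hypothetical helper for illustration only; not Ruffus's implementation.
    """
    if not input_file_name.endswith(old_suffix):
        raise ValueError(f"{input_file_name!r} does not end with {old_suffix!r}")
    stem = input_file_name[: len(input_file_name) - len(old_suffix)]
    return stem + new_suffix


# Files assumed to have been produced by the preceding split step.
chunk_files = ["1.chunk", "2.chunk", "3.chunk"]

# One (input, output) pair per job, matching the function calls listed above.
job_parameters = [
    (name, substitute_suffix(name, ".chunk", ".sums")) for name in chunk_files
]

for input_file_name, output_file_name in job_parameters:
    print(input_file_name, "->", output_file_name)
```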