Step 7: Merging results back together

Note

Remember to look at the example code:

Now that we have all the partial solutions in *.sums, we can merge them together to generate the final answer: the variance of all 100,000 random numbers.

Calculating variances from the sums and sum of squares of all chunks

If we add up all the sums and sums of squares calculated previously, we can obtain the variance as follows:

variance = (sum_squared - sum * sum / N)/N

where sum is the total of all the values, sum_squared is the total of their squares, and N is the number of values.

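See the Wikipedia entry for a discussion of why this is a very naive approach! In short, the formula subtracts two large, nearly equal numbers, so it can lose most of its precision in floating point.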
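To make the warning concrete, here is a minimal sketch comparing the naive formula with a two-pass calculation on values that have a large mean and a small spread. The function names and the sample values are made up for illustration and are not part of the tutorial code.

```python
def naive_variance(values):
    """Population variance via the sum and sum-of-squares formula above."""
    n = len(values)
    total = sum(values)
    total_squared = sum(v * v for v in values)
    return (total_squared - total * total / n) / n


def two_pass_variance(values):
    """Population variance computed from deviations around the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


# Illustrative data: the offsets 4, 7, 13, 16 have a population variance of 22.5.
values = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
naive = naive_variance(values)
two_pass = two_pass_variance(values)
print("naive:", naive, "two-pass:", two_pass)
```

The two-pass version recovers 22.5, while the naive version is far off, because the squares of numbers near 1e9 cannot be stored exactly in double precision. For the random numbers used in this tutorial the naive formula is adequate.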

To compute the variance, all we have to do is merge the values in *.sums, i.e. add up the sum and sum_squared from each chunk, and then apply the above (naive) formula.
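The arithmetic of the merge can be sketched in plain Python, independently of Ruffus. This sketch assumes each chunk contributes a (sum, sum_squared, count) triple; the helper names `summarise` and `merge_chunk_sums` are hypothetical.

```python
import statistics


def summarise(values):
    """Return the (sum, sum_squared, count) triple for one chunk."""
    return (sum(values), sum(v * v for v in values), len(values))


def merge_chunk_sums(chunk_stats):
    """Add up the per-chunk triples and apply the naive variance formula."""
    total_sum = 0.0
    total_sum_squared = 0.0
    total_count = 0
    for chunk_sum, chunk_sum_squared, chunk_count in chunk_stats:
        total_sum += chunk_sum
        total_sum_squared += chunk_sum_squared
        total_count += chunk_count
    return (total_sum_squared - total_sum * total_sum / total_count) / total_count


# Three chunks of unequal size stand in for the chunked random numbers.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
variance = merge_chunk_sums(summarise(chunk) for chunk in chunks)

# Cross-check against the variance of the unchunked data.
all_values = [v for chunk in chunks for v in chunk]
expected = statistics.pvariance(all_values)
print(variance, expected)
```

Because sums are additive, the chunks can be of any size and can be merged in any order.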

Merging files is straightforward in Ruffus:

../../_images/simple_tutorial_merge1.png

The @merge decorator tells Ruffus to take all the files from the step 6 task (i.e. *.sums) and produce a single merged file named "variance.result".

Thus, if step_6_calculate_sum_of_squares created 1.sums, 2.sums, etc., this would result in the following function call:

../../_images/simple_tutorial_merge2.png

The final result is, of course, in "variance.result".
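As a rough sketch of what the merging function might look like, the code below defines a function that takes a list of input file names and a single output file name, then calls it directly in the way described above. The function name `step_7_calculate_variance` and the layout of each *.sums file (sum_squared, sum and count, separated by whitespace) are assumptions, not taken from the example code. In the real pipeline the function would carry the decorator `@merge(step_6_calculate_sum_of_squares, "variance.result")` and Ruffus would make the call itself.

```python
import os
import tempfile


def step_7_calculate_variance(input_file_names, output_file_name):
    """Merge per-chunk sums into a single (naive) variance."""
    total_sum_squared = 0.0
    total_sum = 0.0
    total_count = 0.0
    for input_file_name in input_file_names:
        # Assumed layout: sum_squared, sum, count separated by whitespace.
        with open(input_file_name) as sums_file:
            sum_squared, chunk_sum, count = map(float, sums_file.read().split())
        total_sum_squared += sum_squared
        total_sum += chunk_sum
        total_count += count
    variance = (total_sum_squared - total_sum * total_sum / total_count) / total_count
    with open(output_file_name, "w") as result_file:
        result_file.write("%s\n" % variance)


# Stand-in for the pipeline: create two small chunk files and merge them.
with tempfile.TemporaryDirectory() as work_dir:
    chunk_files = [os.path.join(work_dir, "1.sums"),
                   os.path.join(work_dir, "2.sums")]
    # Chunk 1 holds the values 1, 2, 3; chunk 2 holds 4, 5.
    with open(chunk_files[0], "w") as f:
        f.write("14\n6\n3\n")
    with open(chunk_files[1], "w") as f:
        f.write("41\n9\n2\n")
    result_path = os.path.join(work_dir, "variance.result")
    step_7_calculate_variance(chunk_files, result_path)
    with open(result_path) as f:
        result_text = f.read()

print(result_text.strip())
```

The key point is the shape of the call: many input files arrive as one list, and there is exactly one output file.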