The trickiest part of developing pipelines is understanding how your data flows through the pipeline.
Parameters and files are passed from one task to another down the chain of pipelined functions.
Whether you are learning how to use ruffus, trying out a new feature, or debugging a horrendously complicated pipeline (we have colleagues with more than 100 criss-crossing pipelined stages), your best friend is pipeline_printout(...).
pipeline_printout(...) takes the same parameters as pipeline_run(...) but, instead of running anything, just prints which tasks are and are not up-to-date.
The verbose parameter controls how much detail is displayed.
Let us take the two-step pipeline code we wrote previously, but call pipeline_printout(...) instead of pipeline_run(...). This lists the tasks which will be run in the pipeline:
[Figure: pipeline_printout(...) output listing the tasks which will be run]
To see the input and output parameters of each job in the pipeline, we can increase the verbosity from the default (1) to 3:
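[Figure: pipeline_printout(...) output at verbosity 3, showing the input and output parameters of each job]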
This is very useful for checking that the input and output parameters have been specified correctly.
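As a rough illustration of what a per-job listing contains, the sketch below models each job as an input/output pair in plain Python rather than calling ruffus. The helper names, the file job2.stage1, the suffix-swapping rule, and the text layout are all assumptions made for this sketch; they are not the ruffus API or its actual output format.

```python
# Toy model (not the ruffus API): each job is an (input, output) pair.
# job2.stage1 is a hypothetical second file, added so there is more than one job.
FIRST_TASK_OUTPUTS = ["job1.stage1", "job2.stage1"]


def second_task_jobs(inputs, old_suffix=".stage1", new_suffix=".stage2"):
    """Pair every first-stage output with the second-stage file derived from it."""
    jobs = []
    for name in inputs:
        if not name.endswith(old_suffix):
            raise ValueError(f"unexpected input file name: {name}")
        jobs.append((name, name[: -len(old_suffix)] + new_suffix))
    return jobs


def format_jobs(task_name, jobs):
    """Render one task and its jobs as text lines, one job per line."""
    lines = [f"Task = {task_name}"]
    lines += [f"    Job = [{src} -> {dst}]" for src, dst in jobs]
    return lines


JOBS = second_task_jobs(FIRST_TASK_OUTPUTS)
print("\n".join(format_jobs("second_task", JOBS)))
```

Reading a listing like this makes a mis-specified parameter easy to spot: a job whose output does not carry the expected name, or an input that no earlier task produces, stands out immediately.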
It is often useful to see which tasks are or are not up-to-date. For example, if we were to run the pipeline in full, and then modify one of the intermediate files, the pipeline would be partially out of date.
Let us start by running the pipeline in full, and then modify job1.stage1 so that the second task is no longer up-to-date:

    pipeline_run([second_task])

    # modify job1.stage1
    open("job1.stage1", "w").close()

At a verbosity of 5, even jobs which are up-to-date will be displayed. We can now see that there is only one job in second_task(...) which needs to be re-run, because job1.stage1 has been modified after job1.stage2 (highlighted in blue):
[Figure: pipeline_printout(...) output at verbosity 5, with the out-of-date job highlighted in blue]
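The rule at work here, that a job needs re-running when its input was modified after its output, can be sketched with file modification times alone. This is a simplified model using only the standard library, not how ruffus itself is implemented; the function names, the temporary directory, the job2 files, and the fixed timestamps are assumptions chosen for illustration.

```python
import os
import tempfile

BASE_TIME = 1_000_000_000  # arbitrary fixed epoch seconds, for reproducible timestamps


def needs_rerun(input_path, output_path):
    """Return True if the output is missing or older than its input."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(input_path) > os.path.getmtime(output_path)


def demo():
    """Build two up-to-date jobs, touch one input, and report which jobs are stale."""
    with tempfile.TemporaryDirectory() as workdir:
        jobs = []
        for stem in ("job1", "job2"):  # job2 is a hypothetical second job
            src = os.path.join(workdir, stem + ".stage1")
            dst = os.path.join(workdir, stem + ".stage2")
            open(src, "w").close()
            open(dst, "w").close()
            # Set explicit timestamps so the result does not depend on
            # filesystem timestamp resolution: each output is newer than its input.
            os.utime(src, (BASE_TIME, BASE_TIME))
            os.utime(dst, (BASE_TIME + 100, BASE_TIME + 100))
            jobs.append((src, dst))

        # Simulate modifying job1.stage1 after job1.stage2 was produced.
        os.utime(jobs[0][0], (BASE_TIME + 200, BASE_TIME + 200))

        return {os.path.basename(src): needs_rerun(src, dst) for src, dst in jobs}


STATUS = demo()
print(STATUS)
```

With these timestamps only the job1 entry is reported as needing a re-run, which mirrors the single out-of-date job in second_task(...) described above.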