If you are looking for a quick introduction to Ruffus, you may want to read the Simple Tutorial first; this manual shares some of its content and elaborates on it.
The Ruffus module is a lightweight way to run computational pipelines.
Computational pipelines often become quite simple if we break down the process into simple stages.
Note
Ruffus refers to each stage of your pipeline as a task.
Let us start with the usual “Hello World”. We have the following two Python functions which we would like to turn into an automatic pipeline:
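The original listing is not reproduced in this copy. A minimal sketch of what two such functions might look like is given here; the names first_task and second_task come from later in this manual, while the function bodies are an assumption based on the "Hello" and "World" output mentioned further on.

```python
# Two ordinary Python functions, not yet part of any pipeline.
# The bodies are assumed for illustration; only the names appear in the manual.

def first_task():
    print("Hello ")


def second_task():
    print("world")


# Called by hand, they simply run in the order we write them.
first_task()
second_task()
```

Nothing here depends on Ruffus yet: the functions are plain Python and can be called directly.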
The simplest Ruffus pipeline would look like this:
The functions which do the actual work of each stage of the pipeline remain unchanged. The role of Ruffus is to make sure these functions are called in the right order, with the right parameters, running in parallel using multiprocessing if desired.
There are three simple parts to building a Ruffus pipeline:
- Importing ruffus
- “Decorating” the functions which are part of the pipeline
- Running the pipeline
The most convenient way to use ruffus is to import the various names directly:
from ruffus import *
This will allow ruffus terms to be used directly in your code. This is also the style we have adopted for this manual.
| Category | Terms |
|---|---|
| Pipeline functions | pipeline_printout, pipeline_printout_graph, pipeline_run, register_cleanup |
| Decorators | @follows, @files, @split, @transform, @merge, @collate, @posttask, @jobs_limit, @parallel, @check_if_uptodate, @files_re |
| Loggers | stderr_logger, black_hole_logger |
| Parameter disambiguating indicators | suffix, regex, inputs, touch_file, combine, mkdir, output_from |
If any of these names clash with names in your own code, you can use qualified names instead:
import ruffus
ruffus.pipeline_printout("...")
You need to tag or decorate existing code to tell Ruffus that it is part of the pipeline.
Note
Decorators are ways to tag or mark out functions.
They start with an @ prefix and take a number of parameters in parentheses.
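To make the idea of "tagging" concrete, here is a small sketch of a decorator that takes a parameter, written in plain Python. The names tag, stage_name and say_hello are hypothetical and are not part of Ruffus; the sketch only shows the general mechanism that decorators rely on.

```python
# A generic parameterised decorator in plain Python.
# `tag`, `stage_name` and `say_hello` are hypothetical names, not Ruffus API.

def tag(stage_name):
    """Return a decorator that marks a function with a stage name."""
    def decorator(func):
        # Attach metadata to the function object; its behaviour is unchanged.
        func.stage_name = stage_name
        return func
    return decorator


@tag("greeting")
def say_hello():
    return "Hello"


print(say_hello.stage_name)
print(say_hello())
```

The decorated function still does exactly what it did before; the decorator has only recorded extra information about it, which a pipeline tool can later inspect.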
The ruffus decorator @follows makes sure that second_task follows first_task.
We run the pipeline by specifying the last stage (task function) of your pipeline. Ruffus will know what other functions this depends on, following the appropriate chain of dependencies automatically, making sure that the entire pipeline is up-to-date.
In our example above, because second_task depends on first_task, both functions are executed in order.
>>> pipeline_run([second_task], verbose = 1)
By default, Ruffus prints out verbose progress through your pipeline, interleaved with our Hello and World.
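The idea of naming only the last stage and letting its dependencies be followed automatically can be illustrated with a toy sketch in plain Python. This is not how Ruffus is implemented, and toy_follows and toy_run are hypothetical stand-ins for illustration only; the sketch assumes that each task records the tasks it follows and that each task runs at most once.

```python
# Toy illustration of dependency-ordered execution.
# `toy_follows` and `toy_run` are hypothetical helpers, NOT the Ruffus API.

def toy_follows(*upstream):
    """Record which tasks must run before the decorated one."""
    def decorator(func):
        func.upstream = list(upstream)
        return func
    return decorator


def toy_run(target, done=None):
    """Run everything `target` depends on, then `target` itself.

    Returns the task names in the order they were executed.
    """
    if done is None:
        done = []
    # Run dependencies first, depth first, skipping any that already ran.
    for dep in getattr(target, "upstream", []):
        if dep.__name__ not in done:
            toy_run(dep, done)
    if target.__name__ not in done:
        target()
        done.append(target.__name__)
    return done


def first_task():
    print("Hello ")


@toy_follows(first_task)
def second_task():
    print("world")


# Only the last stage is named; first_task is found via the recorded dependency.
order = toy_run(second_task)
print(order)
```

Only second_task is passed to the runner, yet first_task is executed beforehand because the dependency was recorded when the function was decorated.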