@product

@product ( tasks_or_file_names, formatter(matching_formatter), [tasks_or_file_names, formatter(matching_formatter), ... ], output_pattern, [extra_parameters,...] )

Purpose:

Generates the Cartesian product, i.e. all vs all comparisons, between sets of input files.

The effect is analogous to the Python itertools function of the same name, i.e. a nested for loop.

>>> from itertools import product
>>> # product('ABC', 'XYZ') --> AX AY AZ BX BY BZ CX CY CZ
>>> [ "".join(a) for a in product('ABC', 'XYZ')]
['AX', 'AY', 'AZ', 'BX', 'BY', 'BZ', 'CX', 'CY', 'CZ']
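
The "nested for loop" reading can be checked directly. The following is a minimal sketch in plain Python, independent of Ruffus; the variable names first, second, nested and via_product are chosen here for illustration only.

```python
from itertools import product

first = "ABC"
second = "XYZ"

# Explicit nested loops: the outer loop varies slowest.
nested = []
for a in first:
    for x in second:
        nested.append(a + x)

# itertools.product yields the same pairs in the same order.
via_product = ["".join(pair) for pair in product(first, second)]

print(nested == via_product)   # True
```

With three or more sets of inputs, the same idea applies with one extra loop per set.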

Only out-of-date tasks (determined by comparing input and output files) will be run.
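
As a rough illustration of what comparing input and output files can mean, here is a timestamp-only sketch. It is an assumption made for illustration, not Ruffus's own implementation, which may take more than modification times into account; is_out_of_date is a hypothetical helper name.

```python
import os

def is_out_of_date(input_files, output_file):
    """Hypothetical helper: True if output_file is missing or older than any input."""
    if not os.path.exists(output_file):
        return True
    output_time = os.path.getmtime(output_file)
    # Any input modified after the output means the output needs regenerating.
    return any(os.path.getmtime(name) > output_time for name in input_files)
```

Under this simplified rule, a job whose output is newer than all of its inputs would be skipped.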

Output file names and strings in the extra parameters are determined from tasks_or_file_names, i.e. from the output of upstream tasks, or a list of file names, after string replacement via formatter.

The replacement strings require an extra level of indirection to refer to parsed components:

  1. The first level refers to which set of inputs (e.g. A,B or P,Q or X,Y in the following example).
  2. The second level refers to which input file in any particular set of inputs.

For example, '{basename[2][0]}' is the basename for
  • the third set of inputs (X,Y) and
  • the first file name string in each input of that set ("x.1_start" and "y.1_start").
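
The two levels of indexing can be mimicked with nested lists and Python's own str.format, which accepts chained indices. This is only a sketch: it assumes that "basename" means the file name without directory or extension, and basename_table is a hypothetical helper, not part of Ruffus.

```python
import os

# Three sets of inputs, as in the example that follows. Each input may hold
# several file names, so every input is written as a list.
input_sets = [
    [["a.start"], ["b.start"]],
    [["p.start"], ["q.start"]],
    [["x.1_start", "x.2_start"], ["y.1_start", "y.2_start"]],
]

def basename_table(combination):
    """Hypothetical helper: builds basename[set_index][file_index]."""
    return [
        [os.path.splitext(os.path.basename(name))[0] for name in one_input]
        for one_input in combination
    ]

# One combination takes one input from each set: here a, p and x.
combination = [input_sets[0][0], input_sets[1][0], input_sets[2][0]]
basename = basename_table(combination)

# First index: which set of inputs. Second index: which file in that input.
print("{basename[2][0]}".format(basename=basename))   # x
print("{basename[0][0]}_vs_{basename[1][0]}_vs_{basename[2][0]}".format(basename=basename))
```

The second line prints a_vs_p_vs_x, matching the naming scheme used for output files in the example.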

Example:

Calculates the @product of the A,B and P,Q and X,Y files:

from ruffus import *
from ruffus.combinatorics import *

#   Three sets of initial files
@originate([ 'a.start', 'b.start'])
def create_initial_files_ab(output_file):
    with open(output_file, "w") as oo: pass

@originate([ 'p.start', 'q.start'])
def create_initial_files_pq(output_file):
    with open(output_file, "w") as oo: pass

@originate([ ['x.1_start', 'x.2_start'],
             ['y.1_start', 'y.2_start'] ])
def create_initial_files_xy(output_files):
    # each job receives a list of two file names
    for output_file in output_files:
        with open(output_file, "w") as oo: pass

#   @product
@product(   create_initial_files_ab,        # Input
            formatter("(.start)$"),         # match input file set # 1

            create_initial_files_pq,        # Input
            formatter("(.start)$"),         # match input file set # 2

            create_initial_files_xy,        # Input
            formatter("(.start)$"),         # match input file set # 3

            "{path[0][0]}/"                 # Output Replacement string
            "{basename[0][0]}_vs_"          #
            "{basename[1][0]}_vs_"          #
            "{basename[2][0]}.product",     #

            "{path[0][0]}",                 # Extra parameter: path for 1st set of files, 1st file name

            ["{basename[0][0]}",            # Extra parameter: basename for 1st set of files, 1st file name
             "{basename[1][0]}",            #                               2nd
             "{basename[2][0]}",            #                               3rd
             ])
def product_task(input_file, output_parameter, shared_path, basenames):
    print("# basenames      = ", " ".join(basenames))
    print("input_parameter  = ", input_file)
    print("output_parameter = ", output_parameter, "\n")


#
#       Run
#
pipeline_run(verbose=0)

This results in:

>>> pipeline_run(verbose=0)

# basenames      =  a p x
input_parameter  =  ('a.start', 'p.start', 'x.start')
output_parameter =  /home/lg/temp/a_vs_p_vs_x.product

# basenames      =  a p y
input_parameter  =  ('a.start', 'p.start', 'y.start')
output_parameter =  /home/lg/temp/a_vs_p_vs_y.product

# basenames      =  a q x
input_parameter  =  ('a.start', 'q.start', 'x.start')
output_parameter =  /home/lg/temp/a_vs_q_vs_x.product

# basenames      =  a q y
input_parameter  =  ('a.start', 'q.start', 'y.start')
output_parameter =  /home/lg/temp/a_vs_q_vs_y.product

# basenames      =  b p x
input_parameter  =  ('b.start', 'p.start', 'x.start')
output_parameter =  /home/lg/temp/b_vs_p_vs_x.product

# basenames      =  b p y
input_parameter  =  ('b.start', 'p.start', 'y.start')
output_parameter =  /home/lg/temp/b_vs_p_vs_y.product

# basenames      =  b q x
input_parameter  =  ('b.start', 'q.start', 'x.start')
output_parameter =  /home/lg/temp/b_vs_q_vs_x.product

# basenames      =  b q y
input_parameter  =  ('b.start', 'q.start', 'y.start')
output_parameter =  /home/lg/temp/b_vs_q_vs_y.product
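
The eight jobs above are the 2 x 2 x 2 combinations of the three input sets. The output file names can be reproduced without Ruffus as a sanity check. This sketch leaves out the directory prefix, and the names set_ab, set_pq, set_xy and output_pattern are illustrative only.

```python
from itertools import product

# Basenames of the first file in each input, grouped by input set.
set_ab = ["a", "b"]
set_pq = ["p", "q"]
set_xy = ["x", "y"]

# Positional stand-in for the "{basename[i][0]}" replacement strings.
output_pattern = "{0}_vs_{1}_vs_{2}.product"

outputs = [output_pattern.format(*combo)
           for combo in product(set_ab, set_pq, set_xy)]

for name in outputs:
    print(name)
```

The names come out in the same order as the jobs listed above, from a_vs_p_vs_x.product to b_vs_q_vs_y.product.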

Parameters:

  • tasks_or_file_names

    Can be a:

    1. Task / list of tasks (as in the example above).

      File names are taken from the output of the specified task(s).

    2. (Nested) list of file name strings.

      File names containing *[]? will be expanded as a glob.

      E.g. "a.*" => "a.1", "a.2"

  • output_pattern

    Specifies the resulting output file name(s) after string substitution.

  • extra_parameters

    Optional extra parameters are passed to the decorated function after string substitution.
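
The glob expansion described for tasks_or_file_names can be sketched with the standard fnmatch module. This example matches against an in-memory list of names rather than the file system, purely so that it is reproducible; expand and GLOB_CHARS are hypothetical names, not Ruffus functions.

```python
import fnmatch

# Characters that mark a file name as a glob pattern.
GLOB_CHARS = set("*[]?")

def expand(pattern, available):
    """Hypothetical helper: expand a pattern against a list of known names."""
    if GLOB_CHARS.intersection(pattern):
        return sorted(fnmatch.filter(available, pattern))
    # Plain file names are passed through unchanged.
    return [pattern]

available = ["a.1", "a.2", "b.1"]
print(expand("a.*", available))   # ['a.1', 'a.2']
print(expand("b.1", available))   # ['b.1']
```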