cpg_flow.targets.target

This module defines the Target class, which represents a target that a stage can act upon. The Target class includes methods to retrieve sequencing groups and their IDs, compute a hash for alignment inputs, and provide job attributes and prefixes. It also includes a property to retrieve a unique target ID and a method to map internal IDs to participant or external IDs.

Classes:
    Target: Defines a target that a stage can act upon.

Methods:
    get_sequencing_groups(only_active: bool = True) -> list["SequencingGroup"]:
        Get a flat list of all sequencing groups corresponding to this target.

    get_sequencing_group_ids(only_active: bool = True) -> list[str]:
        Get a flat list of all sequencing group IDs corresponding to this target.

    get_alignment_inputs_hash() -> str:
        Compute a hash for the alignment inputs of the sequencing groups.

    target_id() -> str:
        Property to retrieve a unique target ID.

    get_job_attrs() -> dict:
        Retrieve attributes for Hail Batch job.

    get_job_prefix() -> str:
        Retrieve prefix for job names.

    rich_id_map() -> dict[str, str]:
        Map internal IDs to participant or external IDs, if the latter is provided.

Targets for workflow stages: SequencingGroup, Dataset, Cohort.
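
As a rough illustration of how a concrete target might fill in the hooks listed above, here is a minimal sketch. The `Target` class below is a reduced stand-in written for this example, not the real base class, and `DemoDataset`, its `name` attribute, and the returned values are all hypothetical.

```python
class Target:
    """Reduced stand-in for the Target base class (hooks only)."""

    def __init__(self) -> None:
        self.forced = False  # whether to process even if outputs exist
        self.active = True   # if not set, exclude from the workflow

    @property
    def target_id(self) -> str:
        raise NotImplementedError

    def get_job_attrs(self) -> dict:
        raise NotImplementedError

    def get_job_prefix(self) -> str:
        raise NotImplementedError


class DemoDataset(Target):
    """Hypothetical concrete target; not a cpg_flow class."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name

    @property
    def target_id(self) -> str:
        # Assumption: the name is unique enough to serve as the ID here.
        return self.name

    def get_job_attrs(self) -> dict:
        return {'dataset': self.name}

    def get_job_prefix(self) -> str:
        return f'{self.name}: '


demo = DemoDataset('demo')
print(demo.target_id, demo.get_job_attrs(), repr(demo.get_job_prefix()))
```

The real SequencingGroup, Dataset, and Cohort classes may implement these hooks differently; the sketch only shows the shape of the contract.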

import hashlib
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from cpg_flow.targets import SequencingGroup


class Target:
    """
    Defines a target that a stage can act upon.
    """

    def __init__(self) -> None:
        # Whether to process even if outputs exist:
        self.forced: bool = False

        # If not set, exclude from the workflow:
        self.active: bool = True

        # create a self.alignment_inputs_hash variable to store the hash of the alignment inputs
        # this begins as None, and is set upon first calling
        self.alignment_inputs_hash: str | None = None

    def get_sequencing_groups(
        self,
        only_active: bool = True,
    ) -> list['SequencingGroup']:
        """
        Get flat list of all sequencing groups corresponding to this target.
        """
        raise NotImplementedError

    def get_sequencing_group_ids(self, only_active: bool = True) -> list[str]:
        """
        Get flat list of all sequencing group IDs corresponding to this target.
        """
        return [s.id for s in self.get_sequencing_groups(only_active=only_active)]

    def get_alignment_inputs_hash(self) -> str:
        """
        If this hash has been set, return it, otherwise set it, then return it
        This should be safe as it matches the current usage:
        - we set up the Targets in this workflow (populating SGs, Datasets, Cohorts)
            - at this point the targets are malleable (e.g. addition of an additional Cohort may add SGs to Datasets)
        - we then set up the Stages, where alignment input hashes are generated
            - at this point, the alignment inputs are fixed
            - all calls to get_alignment_inputs_hash() need to return the same value
        """
        if self.alignment_inputs_hash is None:
            self.set_alignment_inputs_hash()
        if self.alignment_inputs_hash is None:
            raise TypeError('Alignment_inputs_hash was not populated by the setter method')
        return self.alignment_inputs_hash

    def set_alignment_inputs_hash(self):
        """
        Unique hash string of sample alignment inputs. Useful to decide
        whether the analysis on the target needs to be rerun.
        """
        s = ' '.join(
            sorted(' '.join(str(s.alignment_input)) for s in self.get_sequencing_groups() if s.alignment_input),
        )
        h = hashlib.sha256(s.encode()).hexdigest()[:38]
        self.alignment_inputs_hash = f'{h}_{len(self.get_sequencing_group_ids())}'

    @property
    def target_id(self) -> str:
        """
        ID should be unique across targets of all levels.

        We are raising NotImplementedError instead of making it an abstract class,
        because mypy is not happy about binding TypeVar to abstract classes, see:
        https://stackoverflow.com/questions/48349054/how-do-you-annotate-the-type-of
        -an-abstract-class-with-mypy

        Specifically,
        ```
        TypeVar('TargetT', bound=Target)
        ```
        Will raise:
        ```
        Only concrete class can be given where "Type[Target]" is expected
        ```
        """
        raise NotImplementedError

    def get_job_attrs(self) -> dict:
        """
        Attributes for Hail Batch job.
        """
        raise NotImplementedError

    def get_job_prefix(self) -> str:
        """
        Prefix job names.
        """
        raise NotImplementedError

    def rich_id_map(self) -> dict[str, str]:
        """
        Map internal IDs to participant or external IDs, if the latter is provided.
        """
        return {s.id: s.rich_id for s in self.get_sequencing_groups() if s.participant_id != s.id}

class Target:

Defines a target that a stage can act upon.

forced: bool
active: bool
alignment_inputs_hash: str | None
def get_sequencing_groups(self, only_active: bool = True) -> list[cpg_flow.targets.sequencing_group.SequencingGroup]:

Get a flat list of all sequencing groups corresponding to this target. The base class raises NotImplementedError; subclasses are expected to override this method.

def get_sequencing_group_ids(self, only_active: bool = True) -> list[str]:

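Get a flat list of all sequencing group IDs corresponding to this target. The IDs are taken from the result of get_sequencing_groups(), called with the same only_active value.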
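
The relationship between the two accessors can be sketched as follows. `DemoSG` and `DemoTarget` are hypothetical stand-ins, and the filtering rule inside `get_sequencing_groups` is an assumption about how a subclass might honour `only_active`; only the ID one-liner mirrors the source.

```python
from dataclasses import dataclass


@dataclass
class DemoSG:
    """Hypothetical sequencing group with only the fields used here."""

    id: str
    active: bool = True


class DemoTarget:
    """Hypothetical target holding a flat list of sequencing groups."""

    def __init__(self, sgs: list[DemoSG]) -> None:
        self._sgs = sgs

    def get_sequencing_groups(self, only_active: bool = True) -> list[DemoSG]:
        # Assumed semantics: drop inactive groups unless only_active is False.
        return [s for s in self._sgs if s.active or not only_active]

    def get_sequencing_group_ids(self, only_active: bool = True) -> list[str]:
        # Same one-liner as in Target: IDs are derived from the groups.
        return [s.id for s in self.get_sequencing_groups(only_active=only_active)]


target = DemoTarget([DemoSG('SG1'), DemoSG('SG2', active=False), DemoSG('SG3')])
active_ids = target.get_sequencing_group_ids()
all_ids = target.get_sequencing_group_ids(only_active=False)
print(active_ids)  # ['SG1', 'SG3']
print(all_ids)     # ['SG1', 'SG2', 'SG3']
```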

def get_alignment_inputs_hash(self) -> str:

Return the cached hash if it has already been set; otherwise compute it, store it, and return it. A TypeError is raised if the setter leaves the hash unset. This should be safe as it matches the current usage:

  • We set up the Targets in this workflow (populating SGs, Datasets, Cohorts).
    • At this point the targets are malleable (e.g. adding another Cohort may add SGs to Datasets).
  • We then set up the Stages, where alignment input hashes are generated.
    • At this point, the alignment inputs are fixed.
    • All calls to get_alignment_inputs_hash() need to return the same value.
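
The compute-once behaviour described above can be sketched with a small stand-in. `CachedHashTarget`, its `inputs` list, the `computations` counter, and the placeholder digest are hypothetical and exist only to make the caching visible; the two `if` checks follow the source.

```python
class CachedHashTarget:
    """Hypothetical stand-in showing compute-once, reuse-afterwards caching."""

    def __init__(self, inputs: list[str]) -> None:
        self.inputs = inputs
        self.alignment_inputs_hash = None
        self.computations = 0  # counts how often the setter runs

    def set_alignment_inputs_hash(self) -> None:
        # Placeholder value; the real setter hashes the alignment inputs.
        self.computations += 1
        self.alignment_inputs_hash = f'{len(self.inputs)}-inputs'

    def get_alignment_inputs_hash(self) -> str:
        if self.alignment_inputs_hash is None:
            self.set_alignment_inputs_hash()
        if self.alignment_inputs_hash is None:
            raise TypeError('Alignment_inputs_hash was not populated by the setter method')
        return self.alignment_inputs_hash


t = CachedHashTarget(['a.cram'])
first = t.get_alignment_inputs_hash()
t.inputs.append('b.cram')  # a late change is not picked up by the cache
second = t.get_alignment_inputs_hash()
print(first, second, t.computations)  # 1-inputs 1-inputs 1
```

This is why the targets need to be fully populated before any stage asks for the hash: later changes to the inputs would not be reflected in the stored value.
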
def set_alignment_inputs_hash(self):

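Compute and store a hash string of the alignment inputs of this target's sequencing groups. Useful to decide whether the analysis on the target needs to be rerun. The stored value is the first 38 hexadecimal characters of a SHA-256 digest, followed by an underscore and the number of sequencing groups.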
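
The hashing recipe can be tried in isolation with the sketch below. The free function `alignment_inputs_hash`, its parameters, and the example `gs://` paths are hypothetical; the body follows the steps in the source, including the character-wise `' '.join(str(...))` of each input.

```python
import hashlib


def alignment_inputs_hash(alignment_inputs: list, n_sequencing_groups: int) -> str:
    """Standalone sketch of the recipe used by set_alignment_inputs_hash."""
    # One string per non-empty input (characters space-joined, as in the
    # source), sorted so the result does not depend on input order.
    parts = sorted(' '.join(str(a)) for a in alignment_inputs if a)
    digest = hashlib.sha256(' '.join(parts).encode()).hexdigest()[:38]
    # The group count is appended, so adding a group without inputs
    # still changes the final value.
    return f'{digest}_{n_sequencing_groups}'


h1 = alignment_inputs_hash(['gs://bucket/a.cram', 'gs://bucket/b.cram'], 2)
h2 = alignment_inputs_hash(['gs://bucket/b.cram', 'gs://bucket/a.cram'], 2)
h3 = alignment_inputs_hash(['gs://bucket/a.cram', None], 2)
print(h1 == h2)  # True: order of the inputs does not matter
print(h1 == h3)  # False: a missing input changes the digest
```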

target_id: str

ID should be unique across targets of all levels.

The base class raises NotImplementedError instead of being declared abstract, because mypy is not happy about binding TypeVar to abstract classes, see: https://stackoverflow.com/questions/48349054/how-do-you-annotate-the-type-of-an-abstract-class-with-mypy

Specifically,

TypeVar('TargetT', bound=Target)

Will raise:

Only concrete class can be given where "Type[Target]" is expected
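
The trade-off can be illustrated with a short sketch. `Target` here is a reduced stand-in, and `DemoCohort` and `describe` are hypothetical names; the point is only that the base class stays concrete for the type checker while an unimplemented property still fails at runtime.

```python
from typing import TypeVar


class Target:
    """Reduced stand-in: a concrete base class with an unimplemented property."""

    @property
    def target_id(self) -> str:
        raise NotImplementedError


class DemoCohort(Target):
    """Hypothetical subclass that supplies the ID."""

    @property
    def target_id(self) -> str:
        return 'demo-cohort'


# Binding a TypeVar to the concrete base class, as in the docstring above.
TargetT = TypeVar('TargetT', bound=Target)


def describe(target: TargetT) -> TargetT:
    """Generic helper that returns the same target type it was given."""
    print(f'target: {target.target_id}')
    return target


described = describe(DemoCohort())

try:
    Target().target_id
except NotImplementedError:
    print('base Target does not define target_id')
```

The cost of this design is that a missing override is only detected when the property is first accessed, not when the subclass is instantiated.
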
def get_job_attrs(self) -> dict:

Attributes for the Hail Batch job. The base class raises NotImplementedError; subclasses are expected to override this method.

def get_job_prefix(self) -> str:

Prefix for job names. The base class raises NotImplementedError; subclasses are expected to override this method.

def rich_id_map(self) -> dict[str, str]:

Map internal IDs to participant or external IDs, if the latter are provided. Sequencing groups whose participant ID equals their internal ID are left out of the map.
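
A small sketch of the filtering follows. `DemoSG` is a hypothetical stand-in, and the `'id|participant_id'` format of its `rich_id` is an assumption made for illustration; only the dictionary comprehension mirrors the source.

```python
from dataclasses import dataclass


@dataclass
class DemoSG:
    """Hypothetical sequencing group with only the fields used here."""

    id: str
    participant_id: str

    @property
    def rich_id(self) -> str:
        # Assumed format: internal ID plus participant ID when they differ.
        if self.participant_id != self.id:
            return f'{self.id}|{self.participant_id}'
        return self.id


def rich_id_map(sgs: list[DemoSG]) -> dict[str, str]:
    # Same comprehension as Target.rich_id_map, applied to a plain list.
    return {s.id: s.rich_id for s in sgs if s.participant_id != s.id}


mapping = rich_id_map([DemoSG('SG1', 'P001'), DemoSG('SG2', 'SG2')])
print(mapping)  # {'SG1': 'SG1|P001'}
```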