cpg_flow.targets.target
This module defines the `Target` class, which represents a target that a stage can act upon. The `Target` class includes methods to retrieve sequencing groups and their IDs, compute a hash for alignment inputs, and provide job attributes and prefixes. It also includes a property to retrieve a unique target ID and a method to map internal IDs to participant or external IDs.
Classes:
    Target: Defines a target that a stage can act upon.

Methods:
    get_sequencing_groups(only_active: bool = True) -> list["SequencingGroup"]:
        Get a flat list of all sequencing groups corresponding to this target.

    get_sequencing_group_ids(only_active: bool = True) -> list[str]:
        Get a flat list of all sequencing group IDs corresponding to this target.

    get_alignment_inputs_hash() -> str:
        Compute a hash for the alignment inputs of the sequencing groups.

    target_id() -> str:
        Property to retrieve a unique target ID.

    get_job_attrs() -> dict:
        Retrieve attributes for Hail Batch job.

    get_job_prefix() -> str:
        Retrieve prefix for job names.

    rich_id_map() -> dict[str, str]:
        Map internal IDs to participant or external IDs, if the latter is provided.
Targets for workflow stages: SequencingGroup, Dataset, Cohort.
    """

    This module defines the `Target` class, which represents a target that a stage can act upon.
    The `Target` class includes methods to retrieve sequencing groups and their IDs, compute a hash
    for alignment inputs, and provide job attributes and prefixes. It also includes a property to
    retrieve a unique target ID and a method to map internal IDs to participant or external IDs.

    Classes:
        Target: Defines a target that a stage can act upon.

    Methods:
        get_sequencing_groups(only_active: bool = True) -> list["SequencingGroup"]:
            Get a flat list of all sequencing groups corresponding to this target.

        get_sequencing_group_ids(only_active: bool = True) -> list[str]:
            Get a flat list of all sequencing group IDs corresponding to this target.

        get_alignment_inputs_hash() -> str:
            Compute a hash for the alignment inputs of the sequencing groups.

        target_id() -> str:
            Property to retrieve a unique target ID.

        get_job_attrs() -> dict:
            Retrieve attributes for Hail Batch job.

        get_job_prefix() -> str:
            Retrieve prefix for job names.

        rich_id_map() -> dict[str, str]:
            Map internal IDs to participant or external IDs, if the latter is provided.

    Targets for workflow stages: SequencingGroup, Dataset, Cohort.

    """

    import hashlib
    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        from cpg_flow.targets import SequencingGroup


    class Target:
        """
        Defines a target that a stage can act upon.
        """

        def __init__(self) -> None:
            # Whether to process even if outputs exist:
            self.forced: bool = False

            # If not set, exclude from the workflow:
            self.active: bool = True

            # create a self.alignment_inputs_hash variable to store the hash of the alignment inputs
            # this begins as None, and is set upon first calling
            self.alignment_inputs_hash: str | None = None

        def get_sequencing_groups(
            self,
            only_active: bool = True,
        ) -> list['SequencingGroup']:
            """
            Get flat list of all sequencing groups corresponding to this target.
            """
            raise NotImplementedError

        def get_sequencing_group_ids(self, only_active: bool = True) -> list[str]:
            """
            Get flat list of all sequencing group IDs corresponding to this target.
            """
            return [s.id for s in self.get_sequencing_groups(only_active=only_active)]

        def get_alignment_inputs_hash(self) -> str:
            """
            If this hash has been set, return it, otherwise set it, then return it
            This should be safe as it matches the current usage:
            - we set up the Targets in this workflow (populating SGs, Datasets, Cohorts)
                - at this point the targets are malleable (e.g. addition of an additional Cohort may add SGs to Datasets)
            - we then set up the Stages, where alignment input hashes are generated
                - at this point, the alignment inputs are fixed
                - all calls to get_alignment_inputs_hash() need to return the same value
            """
            if self.alignment_inputs_hash is None:
                self.set_alignment_inputs_hash()
            if self.alignment_inputs_hash is None:
                raise TypeError('Alignment_inputs_hash was not populated by the setter method')
            return self.alignment_inputs_hash

        def set_alignment_inputs_hash(self):
            """
            Unique hash string of sample alignment inputs. Useful to decide
            whether the analysis on the target needs to be rerun.
            """
            s = ' '.join(
                sorted(' '.join(str(s.alignment_input)) for s in self.get_sequencing_groups() if s.alignment_input),
            )
            h = hashlib.sha256(s.encode()).hexdigest()[:38]
            self.alignment_inputs_hash = f'{h}_{len(self.get_sequencing_group_ids())}'

        @property
        def target_id(self) -> str:
            """
            ID should be unique across targets of all levels.

            We are raising NotImplementedError instead of making it an abstract class,
            because mypy is not happy about binding TypeVar to abstract classes, see:
            https://stackoverflow.com/questions/48349054/how-do-you-annotate-the-type-of
            -an-abstract-class-with-mypy

            Specifically,
            ```
            TypeVar('TargetT', bound=Target)
            ```
            Will raise:
            ```
            Only concrete class can be given where "Type[Target]" is expected
            ```
            """
            raise NotImplementedError

        def get_job_attrs(self) -> dict:
            """
            Attributes for Hail Batch job.
            """
            raise NotImplementedError

        def get_job_prefix(self) -> str:
            """
            Prefix job names.
            """
            raise NotImplementedError

        def rich_id_map(self) -> dict[str, str]:
            """
            Map internal IDs to participant or external IDs, if the latter is provided.
            """
            return {s.id: s.rich_id for s in self.get_sequencing_groups() if s.participant_id != s.id}
Defines a target that a stage can act upon.
    def get_sequencing_groups(
        self,
        only_active: bool = True,
    ) -> list['SequencingGroup']:
        """
        Get flat list of all sequencing groups corresponding to this target.
        """
        raise NotImplementedError
Get a flat list of all sequencing groups corresponding to this target.
    def get_sequencing_group_ids(self, only_active: bool = True) -> list[str]:
        """
        Get flat list of all sequencing group IDs corresponding to this target.
        """
        return [s.id for s in self.get_sequencing_groups(only_active=only_active)]
Get a flat list of all sequencing group IDs corresponding to this target.
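As a minimal sketch of how the two methods fit together, a subclass only needs to supply `get_sequencing_groups`; `get_sequencing_group_ids` then works unchanged. The `Target` below is a trimmed stand-in for the class documented here so that the snippet runs without `cpg_flow` installed. `DemoGroup` and `DemoDataset` are hypothetical names, and the way `only_active` filters on an `active` flag is an assumption, not something this page specifies.

```python
from dataclasses import dataclass


@dataclass
class DemoGroup:
    """Hypothetical stand-in for SequencingGroup: only `id` and `active` are modelled."""

    id: str
    active: bool = True


class Target:
    """Trimmed stand-in for the documented Target, reduced to the two methods above."""

    def get_sequencing_groups(self, only_active: bool = True) -> list[DemoGroup]:
        raise NotImplementedError

    def get_sequencing_group_ids(self, only_active: bool = True) -> list[str]:
        return [s.id for s in self.get_sequencing_groups(only_active=only_active)]


class DemoDataset(Target):
    """Hypothetical subclass that keeps its groups in a plain list."""

    def __init__(self, groups: list[DemoGroup]) -> None:
        self._groups = groups

    def get_sequencing_groups(self, only_active: bool = True) -> list[DemoGroup]:
        # Assumption: `only_active` filters on each group's `active` flag.
        return [g for g in self._groups if g.active or not only_active]


dataset = DemoDataset([DemoGroup('SG1'), DemoGroup('SG2', active=False)])
print(dataset.get_sequencing_group_ids())                   # active groups only
print(dataset.get_sequencing_group_ids(only_active=False))  # every group
```

Calling either method on the base class itself raises `NotImplementedError`, since the ID list is derived from `get_sequencing_groups`.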
    def get_alignment_inputs_hash(self) -> str:
        """
        If this hash has been set, return it, otherwise set it, then return it
        This should be safe as it matches the current usage:
        - we set up the Targets in this workflow (populating SGs, Datasets, Cohorts)
            - at this point the targets are malleable (e.g. addition of an additional Cohort may add SGs to Datasets)
        - we then set up the Stages, where alignment input hashes are generated
            - at this point, the alignment inputs are fixed
            - all calls to get_alignment_inputs_hash() need to return the same value
        """
        if self.alignment_inputs_hash is None:
            self.set_alignment_inputs_hash()
        if self.alignment_inputs_hash is None:
            raise TypeError('Alignment_inputs_hash was not populated by the setter method')
        return self.alignment_inputs_hash
If this hash has been set, return it; otherwise set it, then return it. This should be safe, as it matches the current usage:
- we set up the Targets in this workflow (populating SGs, Datasets, Cohorts)
- at this point the targets are malleable (e.g. addition of an additional Cohort may add SGs to Datasets)
- we then set up the Stages, where alignment input hashes are generated
- at this point, the alignment inputs are fixed
- all calls to get_alignment_inputs_hash() need to return the same value
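The compute-once behaviour described above can be sketched as follows. `DemoTarget` is a hypothetical stand-in whose digest is simplified to a hash over a list of strings; only the caching logic mirrors the documented method. The point it illustrates is that a change made after the first call is not reflected in the returned value.

```python
import hashlib
from typing import Optional


class DemoTarget:
    """Hypothetical stand-in mirroring the compute-once pattern."""

    def __init__(self, inputs: list[str]) -> None:
        self.inputs = inputs
        # Starts as None and is populated on the first getter call.
        self.alignment_inputs_hash: Optional[str] = None

    def set_alignment_inputs_hash(self) -> None:
        # Simplified digest; the documented method hashes sequencing-group inputs.
        joined = ' '.join(sorted(self.inputs))
        self.alignment_inputs_hash = hashlib.sha256(joined.encode()).hexdigest()[:38]

    def get_alignment_inputs_hash(self) -> str:
        if self.alignment_inputs_hash is None:
            self.set_alignment_inputs_hash()
        if self.alignment_inputs_hash is None:
            raise TypeError('alignment_inputs_hash was not populated by the setter method')
        return self.alignment_inputs_hash


target = DemoTarget(['a.fastq.gz', 'b.fastq.gz'])
first = target.get_alignment_inputs_hash()

# Changing the inputs after the first call does not change the cached value.
target.inputs.append('c.fastq.gz')
second = target.get_alignment_inputs_hash()
print(first == second)
```

This is why the ordering in the list above matters: the hash is only meaningful if targets are fully populated before any stage asks for it.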
    def set_alignment_inputs_hash(self):
        """
        Unique hash string of sample alignment inputs. Useful to decide
        whether the analysis on the target needs to be rerun.
        """
        s = ' '.join(
            sorted(' '.join(str(s.alignment_input)) for s in self.get_sequencing_groups() if s.alignment_input),
        )
        h = hashlib.sha256(s.encode()).hexdigest()[:38]
        self.alignment_inputs_hash = f'{h}_{len(self.get_sequencing_group_ids())}'
Unique hash string of sample alignment inputs. Useful to decide whether the analysis on the target needs to be rerun.
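To make the layout of the stored value concrete, here is a small sketch that reproduces the same expression on plain strings. The helper name `alignment_inputs_hash` and its `n_groups` parameter are hypothetical; in the documented method the count comes from `len(self.get_sequencing_group_ids())` and the inputs come from each sequencing group's `alignment_input`.

```python
import hashlib


def alignment_inputs_hash(alignment_inputs: list[str], n_groups: int) -> str:
    """Hypothetical helper reproducing the digest-plus-count layout."""
    # Sorting makes the result independent of input order.
    # As in the source, ' '.join(str(a)) places a space between every character of str(a).
    s = ' '.join(sorted(' '.join(str(a)) for a in alignment_inputs if a))
    # First 38 hex characters of the SHA-256 digest, then '_' and the group count.
    h = hashlib.sha256(s.encode()).hexdigest()[:38]
    return f'{h}_{n_groups}'


value = alignment_inputs_hash(['b.cram', 'a.cram'], n_groups=2)
same = alignment_inputs_hash(['a.cram', 'b.cram'], n_groups=2)
digest, count = value.split('_')
print(len(digest), count, value == same)
```

Because the count is appended after the digest, adding a sequencing group changes the value even if that group has no alignment input.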
    @property
    def target_id(self) -> str:
        """
        ID should be unique across targets of all levels.

        We are raising NotImplementedError instead of making it an abstract class,
        because mypy is not happy about binding TypeVar to abstract classes, see:
        https://stackoverflow.com/questions/48349054/how-do-you-annotate-the-type-of
        -an-abstract-class-with-mypy

        Specifically,
        ```
        TypeVar('TargetT', bound=Target)
        ```
        Will raise:
        ```
        Only concrete class can be given where "Type[Target]" is expected
        ```
        """
        raise NotImplementedError
ID should be unique across targets of all levels.
We raise NotImplementedError instead of making this an abstract class, because mypy is not happy about binding a TypeVar to abstract classes; see: https://stackoverflow.com/questions/48349054/how-do-you-annotate-the-type-of-an-abstract-class-with-mypy
Specifically,
    TypeVar('TargetT', bound=Target)
causes mypy to report:
    Only concrete class can be given where "Type[Target]" is expected
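A minimal sketch of the pattern this note describes: the base class stays concrete and its hooks raise at call time, so the class itself can still be passed wherever a `type[TargetT]` is expected. `DemoCohort` and the `build` factory are hypothetical names used only for illustration, and `Target` is a trimmed stand-in for the documented class.

```python
from typing import TypeVar


class Target:
    """Trimmed stand-in: a concrete base class whose hooks raise when called."""

    @property
    def target_id(self) -> str:
        raise NotImplementedError


class DemoCohort(Target):
    """Hypothetical subclass that supplies the ID."""

    @property
    def target_id(self) -> str:
        return 'COH1'


# The TypeVar is bound to a concrete (non-ABC) base class.
TargetT = TypeVar('TargetT', bound=Target)


def build(target_cls: type[TargetT]) -> TargetT:
    """Hypothetical factory: instantiate whichever Target subclass is passed in."""
    return target_cls()


print(build(DemoCohort).target_id)

try:
    build(Target).target_id
except NotImplementedError:
    print('base class has no target_id')
```

The trade-off is that a missing override is only detected when the property is accessed, not when the subclass is instantiated.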
    def get_job_attrs(self) -> dict:
        """
        Attributes for Hail Batch job.
        """
        raise NotImplementedError
Attributes for Hail Batch job.
    def get_job_prefix(self) -> str:
        """
        Prefix job names.
        """
        raise NotImplementedError
Prefix for job names.
    def rich_id_map(self) -> dict[str, str]:
        """
        Map internal IDs to participant or external IDs, if the latter is provided.
        """
        return {s.id: s.rich_id for s in self.get_sequencing_groups() if s.participant_id != s.id}
Map internal IDs to participant or external IDs, if the latter is provided.
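A short sketch of the filtering this method performs, using a hypothetical `DemoGroup` in place of `SequencingGroup`. The format of `rich_id` (internal ID and participant ID joined by `|`) is an assumption made for this example; this page does not define it. Only the comprehension is taken from the source above.

```python
from dataclasses import dataclass


@dataclass
class DemoGroup:
    """Hypothetical stand-in for SequencingGroup."""

    id: str
    participant_id: str

    @property
    def rich_id(self) -> str:
        # Assumption: combine both IDs when they differ, else return the internal ID.
        if self.participant_id != self.id:
            return f'{self.id}|{self.participant_id}'
        return self.id


def rich_id_map(groups: list[DemoGroup]) -> dict[str, str]:
    # Same comprehension as the documented method, applied to a plain list.
    return {s.id: s.rich_id for s in groups if s.participant_id != s.id}


groups = [DemoGroup('SG1', 'P001'), DemoGroup('SG2', 'SG2')]
print(rich_id_map(groups))
```

Groups whose participant ID equals their internal ID are left out of the map, so callers should treat a missing key as "no richer ID available".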