
Specimens

Generate snail specimens.

SpecimenParams

Bases: BaseModel

Parameters for specimen generation.

  • length: Length of specimen genomes (must be positive)
  • max_mass: Maximum mass for specimens (must be positive)
  • min_mass: Minimum mass for specimens (must be positive and less than max_mass)
  • mut_scale: Scale factor for mutation effect
  • mutations: Number of mutations in specimens (must be between 0 and length)
  • number: Number of specimens to generate (must be positive)
  • seed: Random seed for reproducibility
Source code in src/snailz/specimens.py
class SpecimenParams(BaseModel):
    """Parameters for specimen generation.

    - length: Length of specimen genomes (must be positive)
    - max_mass: Maximum mass for specimens (must be positive)
    - min_mass: Minimum mass for specimens (must be positive and less than max_mass)
    - mut_scale: Scale factor for mutation effect
    - mutations: Number of mutations in specimens (must be between 0 and length)
    - number: Number of specimens to generate (must be positive)
    - seed: Random seed for reproducibility
    """

    length: int = Field(gt=0)
    max_mass: float = Field(gt=0)
    min_mass: float = Field(gt=0)
    mut_scale: float = Field()
    mutations: int = Field(ge=0)
    number: int = Field(gt=0)
    seed: int = Field()

    @model_validator(mode="after")
    def validate_mass_range(self):
        """Validate that max_mass is greater than min_mass."""
        if self.min_mass >= self.max_mass:
            raise ValueError("max_mass must be greater than min_mass")
        return self

    @model_validator(mode="after")
    def validate_mutations_range(self):
        """Validate that mutations is not greater than genome length."""
        if self.mutations > self.length:
            raise ValueError("mutations must be between 0 and length")
        return self

    model_config = {"extra": "forbid"}
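The field bounds and the two mode="after" validators together reject inconsistent parameter sets. The sketch below mirrors those checks with a standard-library dataclass so that it runs without pydantic. SpecimenParamsSketch is a hypothetical name, not part of snailz. The real model reports failures as a pydantic ValidationError rather than a bare ValueError. Like extra="forbid", a dataclass also rejects unexpected keyword arguments, in its case with a TypeError.

```python
from dataclasses import asdict, dataclass


@dataclass
class SpecimenParamsSketch:
    """Hypothetical stand-in for SpecimenParams; not the real pydantic model."""

    length: int
    max_mass: float
    min_mass: float
    mut_scale: float
    mutations: int
    number: int
    seed: int

    def __post_init__(self) -> None:
        # Per-field bounds, mirroring Field(gt=0) and Field(ge=0).
        for name in ("length", "max_mass", "min_mass", "number"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        if self.mutations < 0:
            raise ValueError("mutations must not be negative")
        # Cross-field checks, mirroring the two model validators.
        if self.min_mass >= self.max_mass:
            raise ValueError("max_mass must be greater than min_mass")
        if self.mutations > self.length:
            raise ValueError("mutations must be between 0 and length")


good = SpecimenParamsSketch(
    length=20, max_mass=33.0, min_mass=6.0,
    mut_scale=0.5, mutations=5, number=10, seed=42,
)

for change in ({"min_mass": 40.0}, {"mutations": 25}):
    try:
        SpecimenParamsSketch(**{**asdict(good), **change})
    except ValueError as exc:
        print(exc)
```

Each modified copy fails one cross-field check: the first because min_mass is not below max_mass, and the second because it asks for more mutations than the genome has positions.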

validate_mass_range()

Validate that max_mass is greater than min_mass.

validate_mutations_range()

Validate that mutations is not greater than genome length.

Point

Bases: BaseModel

A 2D point with x and y coordinates.

  • x: X coordinate in grid
  • y: Y coordinate in grid
Source code in src/snailz/specimens.py
class Point(BaseModel):
    """A 2D point with x and y coordinates.

    - x: X coordinate in grid
    - y: Y coordinate in grid
    """

    x: int | None = None
    y: int | None = None

Individual

Bases: BaseModel

A single specimen with unique identifier, genome, mass, and site location.

  • genome: bases in genome
  • ident: unique identifier
  • mass: snail mass in grams
  • site: grid location where specimen was collected
Source code in src/snailz/specimens.py
class Individual(BaseModel):
    """A single specimen with unique identifier, genome, mass, and site location.

    - genome: bases in genome
    - ident: unique identifier
    - mass: snail mass in grams
    - site: grid location where specimen was collected
    """

    genome: str
    ident: str
    mass: float
    site: Point

Specimens

Bases: BaseModel

A set of generated specimens.

  • individuals: list of individual specimens
  • loci: locations where mutations can occur
  • params: parameters used to generate this data
  • reference: unmutated genome
  • susceptible_base: mutant base that induces mass changes
  • susceptible_locus: location of mass change mutation
Source code in src/snailz/specimens.py
class Specimens(BaseModel):
    """A set of generated specimens.

    - individuals: list of individual specimens
    - loci: locations where mutations can occur
    - params: parameters used to generate this data
    - reference: unmutated genome
    - susceptible_base: mutant base that induces mass changes
    - susceptible_locus: location of mass change mutation
    """

    individuals: list[Individual]
    loci: list[int]
    params: SpecimenParams
    reference: str
    susceptible_base: str
    susceptible_locus: int

    def to_csv(self) -> str:
        """Return a CSV string representation of the specimens data.

        Returns:
            A CSV-formatted string containing specimen data with fields:
            - ident: specimen identifier
            - x: X coordinate in grid
            - y: Y coordinate in grid
            - genome: bases in genome
            - mass: snail mass in grams
        """
        output = io.StringIO()
        writer = csv.writer(output, **utils.CSV_SETTINGS)
        writer.writerow(["ident", "x", "y", "genome", "mass"])
        for indiv in self.individuals:
            writer.writerow(
                [indiv.ident, indiv.site.x, indiv.site.y, indiv.genome, indiv.mass]
            )
        return output.getvalue()

to_csv()

Return a CSV string representation of the specimens data.

Returns:

str: A CSV-formatted string containing specimen data with fields:

  • ident: specimen identifier
  • x: X coordinate in grid
  • y: Y coordinate in grid
  • genome: bases in genome
  • mass: snail mass in grams
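The sketch below illustrates the layout that to_csv produces: a header row, then one row per specimen. It applies the same csv.writer pattern to plain rows. It assumes that utils.CSV_SETTINGS amounts to a "\n" line terminator, since that value is not shown on this page, and Row is a hypothetical class. Note that csv.writer writes None as an empty field, so a specimen whose site has not yet been set by mutate_masses gets blank x and y columns.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical flattened Individual: identifier, site, genome, mass."""

    ident: str
    x: int | None
    y: int | None
    genome: str
    mass: float


def rows_to_csv(rows: list[Row]) -> str:
    output = io.StringIO()
    # Assumption: stands in for csv.writer(output, **utils.CSV_SETTINGS).
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(["ident", "x", "y", "genome", "mass"])
    for r in rows:
        writer.writerow([r.ident, r.x, r.y, r.genome, r.mass])
    return output.getvalue()


text = rows_to_csv([
    Row("KQ3X7B", 3, 7, "ACGTTA", 12.5),
    Row("KQ09ZM", None, None, "AGGTTA", 9.75),  # site not yet assigned
])
print(text)
```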

specimens_generate(params, grid=None)

Generate specimens with random genomes and masses.

Each genome is a string of bases of the same length. One locus is randomly chosen as "significant", and a specific mutation there predisposes the snail to mass changes. Other mutations are added randomly at other loci. Specimen masses are only mutated if a grid is provided.

Parameters:

  • params (SpecimenParams, required): SpecimenParams object
  • grid (Grid, default None): Grid object to place specimens on for mass mutation

Returns:

  • Specimens: Specimens object containing the generated specimens and parameters

Source code in src/snailz/specimens.py
def specimens_generate(params: SpecimenParams, grid: Grid | None = None) -> Specimens:
    """Generate specimens with random genomes and masses.

    Each genome is a string of bases of the same length. One locus is
    randomly chosen as "significant", and a specific mutation there
    predisposes the snail to mass changes. Other mutations are added
    randomly at other loci.  Specimen masses are only mutated if a
    grid is provided.

    Parameters:
        params: SpecimenParams object
        grid: Grid object to place specimens on for mass mutation

    Returns:
        Specimens object containing the generated specimens and parameters

    """
    loci = _make_loci(params)
    reference = _make_reference_genome(params)
    susc_loc = _choose_one(loci)
    # Per the docstrings, the susceptible base is a mutant (non-reference) base.
    susc_base = _choose_other(BASES, reference[susc_loc])
    genomes = [_make_genome(reference, loci) for _ in range(params.number)]
    masses = _make_masses(params, genomes, susc_loc, susc_base)
    identifiers = _make_idents(params.number)

    individuals = [
        Individual(genome=g, mass=m, site=Point(), ident=i)
        for g, m, i in zip(genomes, masses, identifiers)
    ]

    result = Specimens(
        individuals=individuals,
        loci=loci,
        params=params,
        reference=reference,
        susceptible_base=susc_base,
        susceptible_locus=susc_loc,
    )

    if grid is not None:
        mutate_masses(grid, result, params.mut_scale)

    return result
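The generation steps can be traced end to end with a simplified, self-contained sketch. It is not the snailz implementation. It assumes that BASES is "ACGT" and PRECISION is 2 (neither value is shown here). It also uses a private random.Random(seed) rather than whatever seeding snailz performs, omits identifiers and the grid step, and draws the susceptible base from the non-reference bases, following the docstrings' description of it as a mutant base. generate_sketch is a hypothetical name.

```python
import random

BASES = "ACGT"  # assumption: stand-in for the module's BASES constant
PRECISION = 2   # assumption: stand-in for utils.PRECISION


def generate_sketch(length: int, mutations: int, number: int,
                    min_mass: float, max_mass: float, seed: int) -> dict:
    """Hypothetical, simplified mirror of specimens_generate without a grid."""
    rng = random.Random(seed)  # private generator: same seed, same output
    loci = rng.sample(range(length), mutations)          # mutable positions
    reference = "".join(rng.choices(BASES, k=length))    # unmutated genome
    susc_loc = rng.choice(loci)
    # Susceptible base: any base other than the reference base at that locus.
    susc_base = rng.choice(sorted(set(BASES) - {reference[susc_loc]}))
    genomes = []
    for _ in range(number):
        genome = list(reference)
        for loc in rng.sample(loci, rng.randint(1, len(loci))):
            genome[loc] = rng.choice(sorted(set(BASES) - {reference[loc]}))
        genomes.append("".join(genome))
    masses = [round(rng.uniform(min_mass, max_mass), PRECISION) for _ in genomes]
    return {"loci": loci, "reference": reference, "susc_loc": susc_loc,
            "susc_base": susc_base, "genomes": genomes, "masses": masses}


result = generate_sketch(length=20, mutations=5, number=8,
                         min_mass=6.0, max_mass=33.0, seed=7)
susceptible = [g[result["susc_loc"]] == result["susc_base"]
               for g in result["genomes"]]
print(sum(susceptible), "of", len(susceptible), "specimens are susceptible")
```

Only the specimens counted as susceptible here could have their masses changed by the later grid step.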

mutate_masses(grid, specimens, mut_scale, specific_index=None)

Mutate mass based on grid values and genetic susceptibility.

For each specimen, choose a random cell from the grid and record its coordinates as the specimen's site. If the cell's value is positive and the genome carries the susceptible base at the susceptible locus, scale the mass in place. Site coordinates are updated for every processed specimen, whether or not its mass changes. If specific_index is given, only that specimen is processed.

Parameters:

  • grid (Grid, required): A Grid object containing pollution values
  • specimens (Specimens, required): A Specimens object with individuals to potentially mutate
  • mut_scale (float, required): Scaling factor for mutation effect
  • specific_index (int | None, default None): Optional index to mutate only a specific specimen
Source code in src/snailz/specimens.py
def mutate_masses(
    grid: Grid,
    specimens: Specimens,
    mut_scale: float,
    specific_index: int | None = None,
) -> None:
    """Mutate mass based on grid values and genetic susceptibility.

    For each specimen, choose a random cell from the grid and record
    its coordinates as the specimen's site. If the cell's value is
    positive and the genome carries the susceptible base at the
    susceptible locus, scale the mass in place. Site coordinates are
    updated for every processed specimen, whether or not its mass
    changes. If specific_index is given, only that specimen is processed.

    Parameters:
        grid: A Grid object containing pollution values
        specimens: A Specimens object with individuals to potentially mutate
        mut_scale: Scaling factor for mutation effect
        specific_index: Optional index to mutate only a specific specimen
    """
    grid_size = len(grid.grid)
    susc_locus = specimens.susceptible_locus
    susc_base = specimens.susceptible_base

    if specific_index is None:
        individuals = specimens.individuals
    else:
        individuals = [specimens.individuals[specific_index]]

    for i in individuals:
        x = random.randrange(grid_size)
        y = random.randrange(grid_size)
        i.site.x = x
        i.site.y = y
        if grid.grid[x][y] > 0 and i.genome[susc_locus] == susc_base:
            i.mass = mutate_mass(i.mass, mut_scale, grid.grid[x][y])
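The per-specimen logic can be exercised on a tiny grid. In this sketch the grid is a plain list of lists standing in for grid.grid, Spec is a hypothetical flattened Individual, and PRECISION is assumed to be 2. Like the original, the sketch assumes a square grid, because both coordinates are drawn from range(len(cells)).

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Spec:
    """Hypothetical stand-in for Individual with its site flattened."""

    genome: str
    mass: float
    x: int | None = None
    y: int | None = None


def mutate_masses_sketch(cells: list[list[int]], specimens: list[Spec],
                         susc_locus: int, susc_base: str,
                         mut_scale: float, rng: random.Random) -> None:
    size = len(cells)  # square grid assumed, as in the original
    for s in specimens:
        # Every specimen gets a site, whether or not its mass changes.
        s.x, s.y = rng.randrange(size), rng.randrange(size)
        value = cells[s.x][s.y]
        if value > 0 and s.genome[susc_locus] == susc_base:
            s.mass = round(s.mass * (1 + mut_scale * value), 2)


cells = [[0, 0], [0, 4]]  # only cell (1, 1) is polluted
specimens = [Spec("AC", 10.0), Spec("AG", 10.0)]
mutate_masses_sketch(cells, specimens, susc_locus=1, susc_base="C",
                     mut_scale=0.5, rng=random.Random(0))
for s in specimens:
    print(s)
```

The "AG" specimen never changes because it lacks the susceptible base. The "AC" specimen is scaled by 1 + 0.5 * 4 = 3 only if it lands on cell (1, 1).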

mutate_mass(original, mut_scale, cell_value)

Mutate a single specimen's mass.

Parameters:

  • original (float, required): The original mass value
  • mut_scale (float, required): Scaling factor for mutation effect
  • cell_value (int, required): The grid cell value affecting the mutation

Returns:

  • float: The mutated mass value, rounded to PRECISION decimal places

Source code in src/snailz/specimens.py
def mutate_mass(original: float, mut_scale: float, cell_value: int) -> float:
    """Mutate a single specimen's mass.

    Parameters:
        original: The original mass value
        mut_scale: Scaling factor for mutation effect
        cell_value: The grid cell value affecting the mutation

    Returns:
        The mutated mass value, rounded to PRECISION decimal places
    """
    return round(original * (1 + (mut_scale * cell_value)), utils.PRECISION)
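As a worked example, the formula is new mass = round(original * (1 + mut_scale * cell_value), PRECISION). This sketch assumes that PRECISION is 2, since its actual value is not shown here, and mutate_mass_sketch is a hypothetical name.

```python
def mutate_mass_sketch(original: float, mut_scale: float, cell_value: int,
                       precision: int = 2) -> float:
    """Same formula as mutate_mass, with PRECISION passed in (assumed 2)."""
    return round(original * (1 + (mut_scale * cell_value)), precision)


print(mutate_mass_sketch(12.5, 0.5, 3))  # 12.5 * (1 + 1.5) = 31.25
print(mutate_mass_sketch(12.5, 0.5, 0))  # factor is 1, mass unchanged: 12.5
```

SpecimenParams places no bounds on mut_scale, so a negative value shrinks masses. Any product mut_scale * cell_value of -1 or less would give a zero or negative mass.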

_choose_one(values)

Choose a single random item from a collection.

Parameters:

  • values (list[int], required): A sequence to choose from

Returns:

  • int: A randomly selected item from the values sequence

Source code in src/snailz/specimens.py
def _choose_one(values: list[int]) -> int:
    """Choose a single random item from a collection.

    Parameters:
        values: A sequence to choose from

    Returns:
        A randomly selected item from the values sequence
    """
    return random.choices(values, k=1)[0]

_choose_other(values, exclude)

Choose a value at random except for the excluded values.

Parameters:

  • values (str, required): A collection to choose from
  • exclude (str, required): Value or collection of values to exclude from the choice

Returns:

  • str: A randomly selected item from values that isn't in exclude

Source code in src/snailz/specimens.py
def _choose_other(values: str, exclude: str) -> str:
    """Choose a value at random except for the excluded values.

    Parameters:
        values: A collection to choose from
        exclude: Value or collection of values to exclude from the choice

    Returns:
        A randomly selected item from values that isn't in exclude
    """
    candidates = list(sorted(set(values) - set(exclude)))
    return candidates[random.randrange(len(candidates))]
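Sorting the candidates matters for reproducibility. The iteration order of a set of strings depends on string hashing, which Python randomizes per process by default, so without sorted() the same random seed could pick different bases in different runs. Here is a minimal sketch that takes an explicit random.Random (choose_other_sketch is a hypothetical name):

```python
import random


def choose_other_sketch(values: str, exclude: str, rng: random.Random) -> str:
    # sorted() fixes the candidate order, so a seeded rng gives repeatable picks.
    candidates = sorted(set(values) - set(exclude))
    # randrange(0) raises ValueError if exclude covers every value.
    return candidates[rng.randrange(len(candidates))]


rng = random.Random(1)
picks = [choose_other_sketch("ACGT", "A", rng) for _ in range(5)]
print(picks)
```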

_make_genome(reference, loci)

Make an individual genome by mutating the reference genome.

Parameters:

  • reference (str, required): Reference genome string to base the new genome on
  • loci (list[int], required): List of positions that can be mutated

Returns:

  • str: A new genome string with random mutations at some loci

Source code in src/snailz/specimens.py
def _make_genome(reference: str, loci: list[int]) -> str:
    """Make an individual genome by mutating the reference genome.

    Parameters:
        reference: Reference genome string to base the new genome on
        loci: List of positions that can be mutated

    Returns:
        A new genome string with random mutations at some loci
    """
    result = list(reference)
    num_mutations = random.randint(1, len(loci))
    # Sample positions from loci itself (not indices into the loci list).
    for loc in random.sample(loci, num_mutations):
        result[loc] = _choose_other(BASES, reference[loc])
    return "".join(result)
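The sketch below isolates the mutation step with an explicit random.Random. make_genome_sketch is a hypothetical name, and BASES is assumed to be "ACGT". Following the docstring, the sketch samples positions from loci itself, so only those positions can change, and at least one of them always does. Because num_mutations is drawn with randint(1, len(loci)), an empty loci list (which SpecimenParams permits via mutations=0) makes randint raise ValueError.

```python
import random

BASES = "ACGT"  # assumption: stand-in for the module's BASES constant


def make_genome_sketch(reference: str, loci: list[int],
                       rng: random.Random) -> str:
    result = list(reference)
    # Between 1 and len(loci) mutations; randint raises ValueError if loci is empty.
    num_mutations = rng.randint(1, len(loci))
    # Sample positions from loci itself, then swap in a different base.
    for loc in rng.sample(loci, num_mutations):
        result[loc] = rng.choice(sorted(set(BASES) - {reference[loc]}))
    return "".join(result)


reference = "AAAAAAAAAA"
loci = [2, 5, 9]
genome = make_genome_sketch(reference, loci, random.Random(3))
changed = [i for i, (a, b) in enumerate(zip(reference, genome)) if a != b]
print(genome, changed)
```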

_make_idents(count)

Create unique specimen identifiers.

Each identifier is a 6-character string:

  • the first two characters are uppercase letters shared by all specimens generated in one call
  • the remaining four characters are random uppercase letters and digits

Parameters:

  • count (int, required): Number of identifiers to generate

Returns:

  • list[str]: List of unique specimen identifiers

Source code in src/snailz/specimens.py
def _make_idents(count: int) -> list[str]:
    """Create unique specimen identifiers.

    Each identifier is a 6-character string:
    - First two characters are the same uppercase letters for all specimens
    - Remaining four characters are random uppercase letters and digits

    Parameters:
        count: Number of identifiers to generate

    Returns:
        List of unique specimen identifiers
    """
    prefix = "".join(random.choices(string.ascii_uppercase, k=2))
    chars = string.ascii_uppercase + string.digits
    gen = utils.UniqueIdGenerator(
        "specimens", lambda: f"{prefix}{''.join(random.choices(chars, k=4))}"
    )
    return [gen.next() for _ in range(count)]
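utils.UniqueIdGenerator is not documented on this page. The sketch assumes that it simply redraws until it produces an identifier it has not returned before, and implements that behavior with a set. make_idents_sketch is hypothetical. With a fixed two-character prefix and four characters drawn from 36 symbols, one call can yield at most 36**4 = 1,679,616 distinct identifiers, so the sketch rejects larger counts instead of looping forever.

```python
import random
import string


def make_idents_sketch(count: int, rng: random.Random) -> list[str]:
    chars = string.ascii_uppercase + string.digits  # 36 symbols
    if count > len(chars) ** 4:
        raise ValueError("not enough distinct identifiers for this count")
    # Two uppercase letters shared by every identifier from this call.
    prefix = "".join(rng.choices(string.ascii_uppercase, k=2))
    seen: set[str] = set()
    idents: list[str] = []
    while len(idents) < count:
        candidate = prefix + "".join(rng.choices(chars, k=4))
        if candidate not in seen:  # redraw on collision
            seen.add(candidate)
            idents.append(candidate)
    return idents


print(make_idents_sketch(5, random.Random(0)))
```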

_make_loci(params)

Make a list of mutable loci positions.

Parameters:

  • params (SpecimenParams, required): SpecimenParams with length and mutations attributes

Returns:

  • list[int]: A list of unique randomly selected positions that can be mutated

Source code in src/snailz/specimens.py
def _make_loci(params: SpecimenParams) -> list[int]:
    """Make a list of mutable loci positions.

    Parameters:
        params: SpecimenParams with length and mutations attributes

    Returns:
        A list of unique randomly selected positions that can be mutated
    """
    return random.sample(list(range(params.length)), params.mutations)
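Because random.sample draws without replacement, the returned loci are guaranteed to be distinct. It also accepts a range directly, so the list() conversion is optional. Here is a short check with arbitrary illustrative values:

```python
import random

rng = random.Random(42)
length, mutations = 20, 5
loci = rng.sample(range(length), mutations)
print(sorted(loci))

# Edge cases allowed by SpecimenParams: no mutable loci, or every position.
print(rng.sample(range(length), 0))  # []
print(sorted(rng.sample(range(length), length)) == list(range(length)))  # True
```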

_make_masses(params, genomes, susceptible_locus, susceptible_base)

Generate random masses for specimens.

Parameters:

  • params (SpecimenParams, required): SpecimenParams with min_mass and max_mass attributes
  • genomes (list[str], required): List of genome strings
  • susceptible_locus (int, required): Position that determines susceptibility
  • susceptible_base (str, required): Base that makes a specimen susceptible

Returns:

  • list[float]: List of randomly generated mass values between min_mass and max_mass, rounded to PRECISION decimal places

Source code in src/snailz/specimens.py
def _make_masses(
    params: SpecimenParams,
    genomes: list[str],
    susceptible_locus: int,
    susceptible_base: str,
) -> list[float]:
    """Generate random masses for specimens.

    Parameters:
        params: SpecimenParams with min_mass and max_mass attributes
        genomes: List of genome strings
        susceptible_locus: Position that determines susceptibility
        susceptible_base: Base that makes a specimen susceptible

    Returns:
        List of randomly generated mass values between min_mass and max_mass,
        rounded to PRECISION decimal places
    """
    return [
        round(random.uniform(params.min_mass, params.max_mass), utils.PRECISION)
        for _ in genomes
    ]
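As written, the body uses genomes only for its length, and it accepts susceptible_locus and susceptible_base without reading them. Initial masses are therefore drawn uniformly and independently of genotype, and any genotype effect comes later from mutate_masses. The following is a minimal equivalent that assumes PRECISION is 2; make_masses_sketch is hypothetical.

```python
import random

PRECISION = 2  # assumption: stand-in for utils.PRECISION


def make_masses_sketch(min_mass: float, max_mass: float, count: int,
                       rng: random.Random) -> list[float]:
    # One uniform draw per specimen, rounded to PRECISION decimal places.
    return [round(rng.uniform(min_mass, max_mass), PRECISION)
            for _ in range(count)]


masses = make_masses_sketch(6.0, 33.0, 5, random.Random(0))
print(masses)
```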

_make_reference_genome(params)

Make a random reference genome.

Parameters:

  • params (SpecimenParams, required): SpecimenParams with length attribute

Returns:

  • str: A randomly generated genome string of the specified length

Source code in src/snailz/specimens.py
def _make_reference_genome(params: SpecimenParams) -> str:
    """Make a random reference genome.

    Parameters:
        params: SpecimenParams with length attribute

    Returns:
        A randomly generated genome string of the specified length
    """
    return "".join(random.choices(BASES, k=params.length))
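random.choices samples with replacement, so each position is drawn independently. With no weights given, every base is equally likely. The following check assumes BASES is "ACGT", since its actual value is not shown on this page; make_reference_sketch is a hypothetical name.

```python
import random
from collections import Counter

BASES = "ACGT"  # assumption: stand-in for the module's BASES constant


def make_reference_sketch(length: int, rng: random.Random) -> str:
    # Independent, equally weighted draw for each position.
    return "".join(rng.choices(BASES, k=length))


reference = make_reference_sketch(1000, random.Random(0))
print(Counter(reference))
```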