Specimens

Generate snail specimens.

This module handles the generation of snail specimens with the following process:

1. Generate genomes with random mutations
2. Assign initial masses based on whether they have the significant mutation
3. Place specimens randomly on the grid (no two snails in the same cell)
4. Adjust the masses of susceptible specimens whose location is polluted

SpecimenParams

Bases: BaseModel

Parameters for specimen generation.

Parameters:

- end_date (date, required): End date for specimen collection
- length (int, required): Length of specimen genomes (must be positive)
- max_mass (float, required): Maximum mass for specimens (must be positive)
- min_mass (float, required): Minimum mass for specimens (must be positive and less than max_mass)
- mut_scale (float, required): Scale factor for mutation effect
- mutations (int, required): Number of mutations in specimens (must be between 0 and length)
- number (int, required): Number of specimens to generate (must be positive)
- seed (int, required): Random seed for reproducibility
- start_date (date, required): Start date for specimen collection
Source code in src/snailz/specimens.py
class SpecimenParams(BaseModel):
    """Parameters for specimen generation."""

    end_date: date = Field(description="End date for specimen collection")
    length: int = Field(
        gt=0, description="Length of specimen genomes (must be positive)"
    )
    max_mass: float = Field(
        gt=0, description="Maximum mass for specimens (must be positive)"
    )
    min_mass: float = Field(
        gt=0,
        description="Minimum mass for specimens (must be positive and less than max_mass)",
    )
    mut_scale: float = Field(ge=0, description="Scale factor for mutation effect")
    mutations: int = Field(
        ge=0,
        description="Number of mutations in specimens (must be between 0 and length)",
    )
    number: int = Field(
        gt=0, description="Number of specimens to generate (must be positive)"
    )
    seed: int = Field(ge=0, description="Random seed for reproducibility")
    start_date: date = Field(description="Start date for specimen collection")

    model_config = {"extra": "forbid"}

    @model_validator(mode="after")
    def validate_fields(self):
        """Validate requirements on fields."""
        if self.min_mass >= self.max_mass:
            raise ValueError("max_mass must be greater than min_mass")
        if self.mutations > self.length:
            raise ValueError("mutations must be between 0 and length")
        if self.end_date < self.start_date:
            raise ValueError("end_date must be greater than or equal to start_date")
        return self
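The cross-field rules in validate_fields() can be checked in isolation. The sketch below is a stdlib-only stand-in, not the snailz class: SpecimenParamsSketch and is_valid are hypothetical names, a dataclass replaces pydantic so it runs without third-party packages, and only the fields involved in the checks are included.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SpecimenParamsSketch:
    """Hypothetical stdlib stand-in for SpecimenParams (subset of fields)."""

    start_date: date
    end_date: date
    length: int
    mutations: int
    min_mass: float
    max_mass: float

    def __post_init__(self) -> None:
        # Per-field bounds (pydantic expresses these with gt=0 / ge=0)
        if self.length <= 0 or self.min_mass <= 0 or self.max_mass <= 0:
            raise ValueError("length, min_mass and max_mass must be positive")
        if self.mutations < 0:
            raise ValueError("mutations must be non-negative")
        # Cross-field rules, mirroring validate_fields()
        if self.min_mass >= self.max_mass:
            raise ValueError("max_mass must be greater than min_mass")
        if self.mutations > self.length:
            raise ValueError("mutations must be between 0 and length")
        if self.end_date < self.start_date:
            raise ValueError("end_date must be greater than or equal to start_date")


def is_valid(**kwargs) -> bool:
    """Return True if the keyword arguments pass all checks."""
    try:
        SpecimenParamsSketch(**kwargs)
    except ValueError:
        return False
    return True
```

Equal start and end dates are accepted, so all specimens can be collected on a single day.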

validate_fields()

Validate requirements on fields.

Specimen

Bases: BaseModel

A single specimen.

Parameters:

- ident (str, required): unique identifier
- collected_on (date, required): date when specimen was collected
- genome (str, required): bases in genome
- mass (float, required): snail mass in grams
- site (Point, required): grid location where specimen was collected
Source code in src/snailz/specimens.py
class Specimen(BaseModel):
    """A single specimen."""

    ident: str = Field(description="unique identifier")
    collected_on: date = Field(description="date when specimen was collected")
    genome: str = Field(description="bases in genome")
    mass: float = Field(gt=0, description="snail mass in grams")
    site: Point = Field(description="grid location where specimen was collected")

AllSpecimens

Bases: BaseModel

A set of generated specimens.

Parameters:

- individuals (list[Specimen], required): list of individual specimens
- loci (list[int], required): locations where mutations can occur
- params (SpecimenParams, required): parameters used to generate this data
- reference (str, required): unmutated genome
- susceptible_base (str, required): mutant base that induces mass changes
- susceptible_locus (int, required): location of mass change mutation
Source code in src/snailz/specimens.py
class AllSpecimens(BaseModel):
    """A set of generated specimens."""

    individuals: list[Specimen] = Field(description="list of individual specimens")
    loci: list[int] = Field(description="locations where mutations can occur")
    params: SpecimenParams = Field(description="parameters used to generate this data")
    reference: str = Field(description="unmutated genome")
    susceptible_base: str = Field(description="mutant base that induces mass changes")
    susceptible_locus: int = Field(ge=0, description="location of mass change mutation")

    def to_csv(self) -> str:
        """Return a CSV string representation of the specimens data.

        Returns:
            A CSV-formatted string with specimen data (without parameters)
        """

        output = io.StringIO()
        writer = utils.csv_writer(output)
        writer.writerow(
            ["ident", "x", "y", "genome", "mass", "collected_on"]
        )
        for indiv in self.individuals:
            writer.writerow(
                [
                    indiv.ident,
                    indiv.site.x,
                    indiv.site.y,
                    indiv.genome,
                    indiv.mass,
                    indiv.collected_on,
                ]
            )
        return output.getvalue()
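to_csv() writes a header and then one row per specimen, splitting the site into x and y columns. The sketch below reproduces that layout with the standard csv module. Row and rows_to_csv are hypothetical names, and it assumes utils.csv_writer behaves like csv.writer with "\n" line endings, which this page does not show.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class Row:
    """Hypothetical flattened specimen record used only in this sketch."""

    ident: str
    x: int
    y: int
    genome: str
    mass: float
    collected_on: date


def rows_to_csv(rows: list[Row]) -> str:
    """Serialize rows using the same column order as to_csv()."""
    output = io.StringIO()
    # Assumption: utils.csv_writer is roughly csv.writer with "\n" line endings
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(["ident", "x", "y", "genome", "mass", "collected_on"])
    for r in rows:
        writer.writerow([r.ident, r.x, r.y, r.genome, r.mass, r.collected_on])
    return output.getvalue()
```

csv.writer converts a date with str(), so collection dates appear in ISO format (for example 2024-02-01).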

to_csv()

Return a CSV string representation of the specimens data.

Returns:

- str: A CSV-formatted string with specimen data (without parameters)

specimens_generate(params, grid=None)

Generate specimens with random genomes and masses.

Each genome is a string of bases of the same length. One locus is randomly chosen as "significant", and a specific mutation there predisposes the snail to mass changes.

The process follows these steps:

1. Generate genomes with random mutations
2. Assign initial masses based on whether they have the significant mutation
3. Place specimens randomly on the grid (no two snails in the same cell)
4. Adjust masses based on location if a grid is provided

Parameters:

- params (SpecimenParams, required): Parameters controlling generation
- grid (Grid | None, default None): Optional grid of pollution values; if given, specimens are placed on it and the masses of susceptible specimens are adjusted

Returns:

- AllSpecimens: AllSpecimens object containing the generated specimens and parameters

Source code in src/snailz/specimens.py
def specimens_generate(
    params: SpecimenParams, grid: Grid | None = None
) -> AllSpecimens:
    """Generate specimens with random genomes and masses.

    Each genome is a string of bases of the same length. One locus is
    randomly chosen as "significant", and a specific mutation there
    predisposes the snail to mass changes.

    The process follows these steps:
    1. Generate genomes with random mutations
    2. Assign initial masses based on whether they have the significant mutation
    3. Place specimens randomly on the grid (no two snails in the same cell)
    4. Adjust masses based on location if a grid is provided

    Parameters:
        params: SpecimenParams object
        grid: Optional grid of pollution values; if given, specimens are placed on it and their masses adjusted

    Returns:
        AllSpecimens object containing the generated specimens and parameters

    """
    loci = _make_loci(params)
    reference = _make_reference_genome(params)
    susc_loc = _choose_one(loci)
    # The susceptible base is a mutant (non-reference) base at the chosen locus
    susc_base = _choose_other(BASES, reference[susc_loc])
    genomes = [_make_genome(reference, loci) for _ in range(params.number)]
    identifiers = _make_idents(params.number)
    collection_dates = _make_collection_dates(params)
    masses = _make_initial_masses(params, genomes, susc_loc, susc_base)

    individuals = [
        Specimen(genome=g, mass=m, site=Point(), ident=i, collected_on=d)
        for g, m, i, d in zip(genomes, masses, identifiers, collection_dates)
    ]

    result = AllSpecimens(
        individuals=individuals,
        loci=loci,
        params=params,
        reference=reference,
        susceptible_base=susc_base,
        susceptible_locus=susc_loc,
    )

    if grid is not None:
        _place_specimens_on_grid(grid, result)
        _adjust_masses_by_location(grid, result, params.mut_scale)

    return result
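The genome and susceptibility steps of specimens_generate() can be sketched without a grid. This is a simplified, hypothetical reimplementation, not the snailz code: it assumes BASES is "ACGT", draws from a local random.Random(seed) (how the real module applies params.seed is not shown on this page), and treats the susceptible base as a non-reference base, as the susceptible_base field description suggests.

```python
import random

BASES = "ACGT"  # assumption: stands in for the module's BASES constant


def other_base(base: str, rng: random.Random) -> str:
    """Return a random base different from base."""
    return rng.choice([b for b in BASES if b != base])


def generate_genomes(number: int, length: int, mutations: int, seed: int):
    """Hypothetical, grid-free sketch of the genome steps in specimens_generate()."""
    rng = random.Random(seed)  # local RNG: results depend only on seed
    loci = rng.sample(range(length), mutations)
    reference = "".join(rng.choices(BASES, k=length))
    susc_loc = rng.choice(loci)
    susc_base = other_base(reference[susc_loc], rng)  # a mutant base

    genomes = []
    for _ in range(number):
        genome = list(reference)
        # Mutate a random, non-empty subset of the mutable loci
        for loc in rng.sample(loci, rng.randint(1, len(loci))):
            genome[loc] = other_base(reference[loc], rng)
        genomes.append("".join(genome))

    susceptible = [g[susc_loc] == susc_base for g in genomes]
    return reference, loci, susc_loc, susc_base, genomes, susceptible
```

Seeding a local generator keeps results reproducible without touching global random state; the module shown above calls the global random functions instead.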

_adjust_masses_by_location(grid, specimens, mut_scale, specific_index=None)

Adjust mass based on grid values and genetic susceptibility.

For each specimen, if the cell value is positive and the genome is susceptible, modify the mass. Specimens must already have site coordinates assigned by _place_specimens_on_grid().

Parameters:

- grid (Grid, required): A Grid object containing pollution values
- specimens (AllSpecimens, required): An AllSpecimens object with individuals to potentially adjust
- mut_scale (float, required): Scaling factor for mutation effect
- specific_index (int | None, default None): Optional index to adjust only a specific specimen
Source code in src/snailz/specimens.py
def _adjust_masses_by_location(
    grid: Grid,
    specimens: AllSpecimens,
    mut_scale: float,
    specific_index: int | None = None,
) -> None:
    """Adjust mass based on grid values and genetic susceptibility.

    For each specimen, if the cell value is positive and the genome is
    susceptible, modify the mass. Specimens must already have site
    coordinates assigned by _place_specimens_on_grid().

    Parameters:
        grid: A Grid object containing pollution values
        specimens: An AllSpecimens object with individuals to potentially adjust
        mut_scale: Scaling factor for mutation effect
        specific_index: Optional index to adjust only a specific specimen
    """
    susc_locus = specimens.susceptible_locus
    susc_base = specimens.susceptible_base

    if specific_index is None:
        individuals = specimens.individuals
    else:
        individuals = [specimens.individuals[specific_index]]

    for indiv in individuals:
        assert indiv.site.x is not None and indiv.site.y is not None, "Specimens must be placed on grid first"
        x, y = indiv.site.x, indiv.site.y
        if grid.grid[x][y] > 0 and indiv.genome[susc_locus] == susc_base:
            indiv.mass = _mutate_mass(indiv.mass, mut_scale, grid.grid[x][y])
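The rule applied by _adjust_masses_by_location() can be shown on a plain nested list. In this hypothetical sketch, adjust_masses takes parallel lists instead of an AllSpecimens object, sites are (x, y) tuples indexing grid[x][y] as the original does, and precision=2 stands in for utils.PRECISION, whose value is not shown on this page.

```python
def adjust_masses(grid, sites, genomes, masses, susc_locus, susc_base,
                  mut_scale, precision=2):
    """Return new masses: susceptible snails on polluted cells get heavier."""
    adjusted = []
    for (x, y), genome, mass in zip(sites, genomes, masses):
        cell = grid[x][y]
        # Only positive (polluted) cells affect susceptible genomes
        if cell > 0 and genome[susc_locus] == susc_base:
            mass = round(mass * (1 + mut_scale * cell), precision)
        adjusted.append(mass)
    return adjusted
```

For example, with grid = [[0, 2], [1, 0]], a susceptible snail at (0, 1) with mass 1.0 and mut_scale = 0.5 becomes 1.0 * (1 + 0.5 * 2) = 2.0, while a non-susceptible snail on the same cell is unchanged.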

_choose_one(values)

Choose a single random item from a collection.

Parameters:

- values (list[int], required): A sequence to choose from

Returns:

- int: A randomly selected item from the values sequence

Source code in src/snailz/specimens.py
def _choose_one(values: list[int]) -> int:
    """Choose a single random item from a collection.

    Parameters:
        values: A sequence to choose from

    Returns:
        A randomly selected item from the values sequence
    """
    return random.choices(values, k=1)[0]

_choose_other(values, exclude)

Choose a value at random except for the excluded values.

Parameters:

- values (str, required): A collection to choose from
- exclude (str, required): Value or collection of values to exclude from the choice

Returns:

- str: A randomly selected item from values that isn't in exclude

Source code in src/snailz/specimens.py
def _choose_other(values: str, exclude: str) -> str:
    """Choose a value at random except for the excluded values.

    Parameters:
        values: A collection to choose from
        exclude: Value or collection of values to exclude from the choice

    Returns:
        A randomly selected item from values that isn't in exclude
    """
    candidates = list(sorted(set(values) - set(exclude)))
    return candidates[random.randrange(len(candidates))]
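The sorted() call in _choose_other() does real work: string hashing is randomized per interpreter process, so the iteration order of a set of characters can change between runs, and sorting restores a fixed order so a seeded generator makes the same choice every time. The hypothetical choose_other() below keeps that step, takes an explicit random.Random, and adds an empty-candidate check that the original lacks (there, random.randrange(0) would raise a less descriptive ValueError).

```python
import random


def choose_other(values: str, exclude: str, rng: random.Random) -> str:
    """Pick a random element of values that is not in exclude."""
    # Sorting fixes the candidate order, since set order for str can vary per run
    candidates = sorted(set(values) - set(exclude))
    if not candidates:
        raise ValueError("every value is excluded")
    return rng.choice(candidates)
```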

_make_collection_dates(params)

Generate random collection dates for specimens.

Parameters:

- params (SpecimenParams, required): SpecimenParams with start_date, end_date, and number attributes

Returns:

- list[date]: List of randomly generated collection dates between start_date and end_date

Source code in src/snailz/specimens.py
def _make_collection_dates(params: SpecimenParams) -> list[date]:
    """Generate random collection dates for specimens.

    Parameters:
        params: SpecimenParams with start_date, end_date, and number attributes

    Returns:
        List of randomly generated collection dates between start_date and end_date
    """
    start_ordinal = params.start_date.toordinal()
    end_ordinal = params.end_date.toordinal()
    return [
        date.fromordinal(random.randint(start_ordinal, end_ordinal))
        for _ in range(params.number)
    ]
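Mapping dates to ordinals turns date sampling into integer sampling. The sketch below (random_dates is a hypothetical name) shows that, because random.randint() includes both endpoints, start_date and end_date can each be drawn, and a range spanning a leap day needs no special handling.

```python
import random
from datetime import date


def random_dates(start: date, end: date, count: int, rng: random.Random) -> list[date]:
    """Draw count dates uniformly from the inclusive range [start, end]."""
    lo, hi = start.toordinal(), end.toordinal()
    # randint() includes both endpoints, so start and end can both be drawn
    return [date.fromordinal(rng.randint(lo, hi)) for _ in range(count)]
```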

_make_genome(reference, loci)

Make an individual genome by mutating the reference genome.

Parameters:

- reference (str, required): Reference genome string to base the new genome on
- loci (list[int], required): List of positions that can be mutated

Returns:

- str: A new genome string with random mutations at some loci

Source code in src/snailz/specimens.py
def _make_genome(reference: str, loci: list[int]) -> str:
    """Make an individual genome by mutating the reference genome.

    Parameters:
        reference: Reference genome string to base the new genome on
        loci: List of positions that can be mutated

    Returns:
        A new genome string with random mutations at some loci
    """
    result = list(reference)
    num_mutations = random.randint(1, len(loci))
    # Sample from loci itself (genome positions), not from indices into loci
    for loc in random.sample(loci, num_mutations):
        result[loc] = _choose_other(BASES, reference[loc])
    return "".join(result)
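The key property of _make_genome() is that only positions listed in loci may differ from the reference. The sketch below checks that property; mutate_genome is a hypothetical name and BASES = "ACGT" is an assumption about the module constant. It samples from loci itself, since sampling from range(len(loci)) would mutate positions 0 to len(loci) - 1 instead of the chosen loci.

```python
import random

BASES = "ACGT"  # assumption: mirrors the module's BASES constant


def mutate_genome(reference: str, loci: list[int], rng: random.Random) -> str:
    """Mutate a random, non-empty subset of loci (genome positions)."""
    result = list(reference)
    count = rng.randint(1, len(loci))
    for loc in rng.sample(loci, count):
        # Replace with any base other than the reference base at this position
        result[loc] = rng.choice([b for b in BASES if b != reference[loc]])
    return "".join(result)
```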

_make_idents(count)

Create unique specimen identifiers.

Each identifier is a 6-character string:

- The first two characters are an uppercase-letter prefix shared by all specimens
- The remaining four characters are random uppercase letters and digits

Parameters:

- count (int, required): Number of identifiers to generate

Returns:

- list[str]: List of unique specimen identifiers

Source code in src/snailz/specimens.py
def _make_idents(count: int) -> list[str]:
    """Create unique specimen identifiers.

    Each identifier is a 6-character string:
    - First two characters are the same uppercase letters for all specimens
    - Remaining four characters are random uppercase letters and digits

    Parameters:
        count: Number of identifiers to generate

    Returns:
        List of unique specimen identifiers
    """
    prefix = "".join(random.choices(string.ascii_uppercase, k=2))
    chars = string.ascii_uppercase + string.digits
    gen = utils.UniqueIdGenerator(
        "specimens", lambda: f"{prefix}{''.join(random.choices(chars, k=4))}"
    )
    return [gen.next() for _ in range(count)]
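utils.UniqueIdGenerator is not documented on this page, so the sketch below assumes it simply retries until it produces an unused identifier, and implements that with a set. make_idents is a hypothetical name. With 36 possible characters in each of the four suffix positions there are 36**4 = 1,679,616 identifiers per prefix, so collisions are rare for typical specimen counts.

```python
import random
import string


def make_idents(count: int, rng: random.Random) -> list[str]:
    """Create count unique 6-character IDs sharing a 2-letter prefix."""
    prefix = "".join(rng.choices(string.ascii_uppercase, k=2))
    chars = string.ascii_uppercase + string.digits
    if count > len(chars) ** 4:
        raise ValueError("not enough distinct suffixes")
    seen: set[str] = set()
    result: list[str] = []
    while len(result) < count:
        candidate = prefix + "".join(rng.choices(chars, k=4))
        # Assumed behavior of a unique-ID generator: retry on collision
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result
```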

_make_initial_masses(params, genomes, susceptible_locus, susceptible_base)

Generate initial masses for specimens based on significant mutation.

Specimens with the susceptible base at the susceptible locus are given a higher initial mass range compared to non-susceptible specimens.

Parameters:

- params (SpecimenParams, required): SpecimenParams with min_mass and max_mass attributes
- genomes (list[str], required): List of genome strings
- susceptible_locus (int, required): Position that determines susceptibility
- susceptible_base (str, required): Base that makes a specimen susceptible

Returns:

- list[float]: List of generated mass values between min_mass and max_mass, rounded to PRECISION decimal places

Source code in src/snailz/specimens.py
def _make_initial_masses(
    params: SpecimenParams,
    genomes: list[str],
    susceptible_locus: int,
    susceptible_base: str,
) -> list[float]:
    """Generate initial masses for specimens based on significant mutation.

    Specimens with the susceptible base at the susceptible locus are given
    a higher initial mass range compared to non-susceptible specimens.

    Parameters:
        params: SpecimenParams with min_mass and max_mass attributes
        genomes: List of genome strings
        susceptible_locus: Position that determines susceptibility
        susceptible_base: Base that makes a specimen susceptible

    Returns:
        List of generated mass values between min_mass and max_mass,
        rounded to PRECISION decimal places
    """
    # Calculate mass range midpoint
    midpoint = (params.max_mass + params.min_mass) / 2

    # Create masses based on susceptibility
    masses = []
    for genome in genomes:
        if genome[susceptible_locus] == susceptible_base:
            # Susceptible specimens get higher mass range
            mass = round(
                random.uniform(midpoint, params.max_mass),
                utils.PRECISION
            )
        else:
            # Non-susceptible specimens get lower mass range
            mass = round(
                random.uniform(params.min_mass, midpoint),
                utils.PRECISION
            )
        masses.append(mass)

    return masses
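The midpoint split means the two groups only meet at the midpoint: with min_mass = 2.0 and max_mass = 4.0 the midpoint is 3.0, so susceptible specimens draw from [3.0, 4.0] and the rest from [2.0, 3.0]. The sketch below checks that; initial_masses is a hypothetical name and precision=2 stands in for utils.PRECISION.

```python
import random


def initial_masses(genomes, susc_locus, susc_base, min_mass, max_mass,
                   rng, precision=2):
    """Susceptible genomes get the upper half of the mass range."""
    midpoint = (min_mass + max_mass) / 2
    masses = []
    for g in genomes:
        if g[susc_locus] == susc_base:
            lo, hi = midpoint, max_mass  # susceptible: upper half
        else:
            lo, hi = min_mass, midpoint  # non-susceptible: lower half
        masses.append(round(rng.uniform(lo, hi), precision))
    return masses
```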

_make_locations(size, num)

Generate non-overlapping locations for specimens.

Selects random cells from the grid without replacement, so no two specimens are placed in the same cell.

Parameters:

- size (int, required): Size of the grid (assuming a square grid)
- num (int, required): Number of locations to generate

Returns:

- list[tuple[int, int]]: List of (x, y) coordinate tuples

Raises:

- ValueError: If there are not enough cells to place all specimens

Source code in src/snailz/specimens.py
def _make_locations(size: int, num: int) -> list[tuple[int, int]]:
    """Generate non-overlapping locations for specimens.

    Selects random cells from the grid without replacement, so no two
    specimens are placed in the same cell.

    Parameters:
        size: Size of the grid (assuming square grid)
        num: Number of locations to generate

    Returns:
        List of (x, y) coordinate tuples

    Raises:
        ValueError: If there are not enough cells to place all specimens
    """
    if num > size * size:
        utils.fail(f"Cannot place {num} specimens on a {size}x{size} grid")

    # Create all possible grid locations
    all_locations = [(x, y) for x in range(size) for y in range(size)]

    # Select locations randomly without replacement
    chosen_locations = random.sample(all_locations, num)

    return chosen_locations
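_make_locations() builds the full list of size * size coordinate pairs before sampling. An equivalent approach, shown here as an alternative sketch rather than the snailz code, samples flat cell indices from a range and converts each index to (x, y) with divmod(), avoiding the explicit list of coordinate tuples. make_locations is a hypothetical name, and it raises ValueError directly where the original calls utils.fail().

```python
import random


def make_locations(size: int, num: int, rng: random.Random) -> list[tuple[int, int]]:
    """Pick num distinct cells of a size x size grid."""
    if num > size * size:
        raise ValueError(f"Cannot place {num} specimens on a {size}x{size} grid")
    # Flat index i maps to (i // size, i % size), matching x-major ordering
    return [divmod(i, size) for i in rng.sample(range(size * size), num)]
```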

_make_loci(params)

Make a list of mutable loci positions.

Parameters:

- params (SpecimenParams, required): SpecimenParams with length and mutations attributes

Returns:

- list[int]: A list of unique randomly selected positions that can be mutated

Source code in src/snailz/specimens.py
def _make_loci(params: SpecimenParams) -> list[int]:
    """Make a list of mutable loci positions.

    Parameters:
        params: SpecimenParams with length and mutations attributes

    Returns:
        A list of unique randomly selected positions that can be mutated
    """
    return random.sample(list(range(params.length)), params.mutations)
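Because random.sample() draws without replacement, the loci are distinct, and it raises ValueError when asked for more items than the population holds; the mutations <= length check in SpecimenParams.validate_fields() rejects that case earlier with a clearer message. make_loci below is a hypothetical stand-in that takes an explicit generator.

```python
import random


def make_loci(length: int, mutations: int, rng: random.Random) -> list[int]:
    """Choose mutations distinct positions in a genome of the given length."""
    # sample() accepts a range directly; no need to build a list first
    return rng.sample(range(length), mutations)
```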

_make_reference_genome(params)

Make a random reference genome.

Parameters:

- params (SpecimenParams, required): SpecimenParams with length attribute

Returns:

- str: A randomly generated genome string of the specified length

Source code in src/snailz/specimens.py
def _make_reference_genome(params: SpecimenParams) -> str:
    """Make a random reference genome.

    Parameters:
        params: SpecimenParams with length attribute

    Returns:
        A randomly generated genome string of the specified length
    """
    return "".join(random.choices(BASES, k=params.length))
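random.choices() samples with replacement, so any base can appear any number of times in the reference. The sketch below assumes BASES is "ACGT" (the constant's value is not shown on this page); make_reference is a hypothetical name, and a seeded generator reproduces the same genome.

```python
import random

BASES = "ACGT"  # assumption: the module's BASES constant is not shown here


def make_reference(length: int, rng: random.Random) -> str:
    """Build a random genome of the given length."""
    # choices() samples with replacement, so bases may repeat freely
    return "".join(rng.choices(BASES, k=length))
```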

_mutate_mass(original, mut_scale, cell_value)

Mutate a single specimen's mass.

Parameters:

- original (float, required): The original mass value
- mut_scale (float, required): Scaling factor for mutation effect
- cell_value (int, required): The grid cell value affecting the mutation

Returns:

- float: The mutated mass value, rounded to PRECISION decimal places

Source code in src/snailz/specimens.py
def _mutate_mass(original: float, mut_scale: float, cell_value: int) -> float:
    """Mutate a single specimen's mass.

    Parameters:
        original: The original mass value
        mut_scale: Scaling factor for mutation effect
        cell_value: The grid cell value affecting the mutation

    Returns:
        The mutated mass value, rounded to PRECISION decimal places
    """
    return round(original * (1 + (mut_scale * cell_value)), utils.PRECISION)
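The adjustment is multiplicative and linear in the cell value: new mass = original * (1 + mut_scale * cell_value). For example, original = 4.0, mut_scale = 0.25 and cell_value = 2 give 4.0 * (1 + 0.5) = 6.0. The sketch mirrors _mutate_mass() with precision=2 standing in for utils.PRECISION, whose value is not shown on this page.

```python
def mutate_mass(original: float, mut_scale: float, cell_value: int,
                precision: int = 2) -> float:
    """Scale a mass by the pollution level of its cell."""
    # A cell value of 0 (or mut_scale of 0) leaves the mass unchanged
    return round(original * (1 + mut_scale * cell_value), precision)
```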

_place_specimens_on_grid(grid, specimens)

Place specimens randomly on the grid, ensuring no two share the same cell.

Updates the site coordinates for each specimen.

Parameters:

- grid (Grid, required): A Grid object containing pollution values
- specimens (AllSpecimens, required): An AllSpecimens object with individuals to place on the grid
Source code in src/snailz/specimens.py
def _place_specimens_on_grid(
    grid: Grid,
    specimens: AllSpecimens,
) -> None:
    """Place specimens randomly on the grid, ensuring no two share the same cell.

    Updates the site coordinates for each specimen.

    Parameters:
        grid: A Grid object containing pollution values
        specimens: An AllSpecimens object with individuals to place on the grid
    """
    grid_size = len(grid.grid)
    locations = _make_locations(grid_size, len(specimens.individuals))

    for indiv, (x, y) in zip(specimens.individuals, locations):
        indiv.site.x = x
        indiv.site.y = y