Specimens

Generate snail specimens.

SpecimenParams

Bases: BaseModel

Parameters for specimen generation.

Parameters:

Name Type Description Default
end_date date

End date for specimen collection

required
length int

Length of specimen genomes (must be positive)

required
max_mass float

Maximum mass for specimens (must be positive)

required
min_mass float

Minimum mass for specimens (must be positive and less than max_mass)

required
mut_scale float

Scale factor for mutation effect

required
mutations int

Number of mutations in specimens (must be between 0 and length)

required
number int

Number of specimens to generate (must be positive)

required
seed int

Random seed for reproducibility

required
start_date date

Start date for specimen collection

required
Source code in src/snailz/specimens.py
class SpecimenParams(BaseModel):
    """Parameters for specimen generation."""

    end_date: date = Field(description="End date for specimen collection")
    length: int = Field(
        gt=0, description="Length of specimen genomes (must be positive)"
    )
    max_mass: float = Field(
        gt=0, description="Maximum mass for specimens (must be positive)"
    )
    min_mass: float = Field(
        gt=0,
        description="Minimum mass for specimens (must be positive and less than max_mass)",
    )
    mut_scale: float = Field(ge=0, description="Scale factor for mutation effect")
    mutations: int = Field(
        ge=0,
        description="Number of mutations in specimens (must be between 0 and length)",
    )
    number: int = Field(
        gt=0, description="Number of specimens to generate (must be positive)"
    )
    seed: int = Field(ge=0, description="Random seed for reproducibility")
    start_date: date = Field(description="Start date for specimen collection")

    model_config = {"extra": "forbid"}

    @model_validator(mode="after")
    def validate_fields(self):
        """Validate requirements on fields."""
        if self.min_mass >= self.max_mass:
            raise ValueError("max_mass must be greater than min_mass")
        if self.mutations > self.length:
            raise ValueError("mutations must be between 0 and length")
        if self.end_date < self.start_date:
            raise ValueError("end_date must be greater than or equal to start_date")
        return self

validate_fields()

Validate requirements on fields.

Source code in src/snailz/specimens.py
@model_validator(mode="after")
def validate_fields(self):
    """Validate requirements on fields."""
    if self.min_mass >= self.max_mass:
        raise ValueError("max_mass must be greater than min_mass")
    if self.mutations > self.length:
        raise ValueError("mutations must be between 0 and length")
    if self.end_date < self.start_date:
        raise ValueError("end_date must be greater than or equal to start_date")
    return self
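
The three cross-field rules can be tried out without Pydantic. The following is a minimal plain-Python sketch of the same checks; the function name `check_params` is hypothetical and not part of snailz, and unlike the validator above (which raises `ValueError` on the first failing rule) it collects every violated rule so they can be inspected together.

```python
from datetime import date


def check_params(
    *,
    min_mass: float,
    max_mass: float,
    mutations: int,
    length: int,
    start_date: date,
    end_date: date,
) -> list[str]:
    """Return the cross-field rules that are violated (hypothetical helper)."""
    problems = []
    if min_mass >= max_mass:
        problems.append("max_mass must be greater than min_mass")
    if mutations > length:
        problems.append("mutations must be between 0 and length")
    if end_date < start_date:
        problems.append("end_date must be greater than or equal to start_date")
    return problems


# Equal start and end dates are allowed; equal masses are not.
ok = check_params(
    min_mass=1.0, max_mass=5.0, mutations=3, length=20,
    start_date=date(2024, 1, 1), end_date=date(2024, 1, 1),
)
bad = check_params(
    min_mass=5.0, max_mass=5.0, mutations=30, length=20,
    start_date=date(2024, 3, 1), end_date=date(2024, 1, 1),
)
```

Here `ok` is empty, while `bad` lists all three messages.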

Specimen

Bases: BaseModel

A single specimen.

Parameters:

Name Type Description Default
ident str

unique identifier

required
collected_on date

date when specimen was collected

required
genome str

bases in genome

required
mass float

snail mass in grams

required
site Point

grid location where specimen was collected

required
territory float

share of the grid that belongs to this specimen

0.0
Source code in src/snailz/specimens.py
class Specimen(BaseModel):
    """A single specimen."""

    ident: str = Field(description="unique identifier")
    collected_on: date = Field(description="date when specimen was collected")
    genome: str = Field(description="bases in genome")
    mass: float = Field(gt=0, description="snail mass in grams")
    site: Point = Field(description="grid location where specimen was collected")
    territory: float = Field(
        default=0.0, description="share of the grid that belongs to this specimen"
    )

AllSpecimens

Bases: BaseModel

A set of generated specimens.

Parameters:

Name Type Description Default
individuals list[Specimen]

list of individual specimens

required
loci list[int]

locations where mutations can occur

required
params SpecimenParams

parameters used to generate this data

required
reference str

unmutated genome

required
susceptible_base str

mutant base that induces mass changes

required
susceptible_locus int

location of mass change mutation

required
Source code in src/snailz/specimens.py
class AllSpecimens(BaseModel):
    """A set of generated specimens."""

    individuals: list[Specimen] = Field(description="list of individual specimens")
    loci: list[int] = Field(description="locations where mutations can occur")
    params: SpecimenParams = Field(description="parameters used to generate this data")
    reference: str = Field(description="unmutated genome")
    susceptible_base: str = Field(description="mutant base that induces mass changes")
    susceptible_locus: int = Field(ge=0, description="location of mass change mutation")

    def to_csv(self) -> str:
        """Return a CSV string representation of the specimens data.

        Returns:
            A CSV-formatted string with specimen data (without parameters)
        """

        output = io.StringIO()
        writer = utils.csv_writer(output)
        writer.writerow(
            ["ident", "x", "y", "genome", "mass", "collected_on", "territory"]
        )
        for indiv in self.individuals:
            writer.writerow(
                [
                    indiv.ident,
                    indiv.site.x,
                    indiv.site.y,
                    indiv.genome,
                    indiv.mass,
                    indiv.collected_on,
                    indiv.territory,
                ]
            )
        return output.getvalue()

to_csv()

Return a CSV string representation of the specimens data.

Returns:

Type Description
str

A CSV-formatted string with specimen data (without parameters)

Source code in src/snailz/specimens.py
def to_csv(self) -> str:
    """Return a CSV string representation of the specimens data.

    Returns:
        A CSV-formatted string with specimen data (without parameters)
    """

    output = io.StringIO()
    writer = utils.csv_writer(output)
    writer.writerow(
        ["ident", "x", "y", "genome", "mass", "collected_on", "territory"]
    )
    for indiv in self.individuals:
        writer.writerow(
            [
                indiv.ident,
                indiv.site.x,
                indiv.site.y,
                indiv.genome,
                indiv.mass,
                indiv.collected_on,
                indiv.territory,
            ]
        )
    return output.getvalue()
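
To make the column layout concrete, here is a small sketch that writes the same header and two made-up rows with the standard `csv` module. It assumes `utils.csv_writer` behaves like `csv.writer` with `"\n"` line endings, which is not confirmed by the listing above; the identifiers and values are illustrative only.

```python
import csv
import io
from datetime import date

# Illustrative rows: ident, x, y, genome, mass, collected_on, territory.
rows = [
    ("AB12CD", 3, 7, "ACGTAC", 1.25, date(2024, 5, 1), 4.5),
    ("AB9XQ2", 0, 2, "ACGAAC", 2.0, date(2024, 5, 3), 0.0),
]

output = io.StringIO()
# Assumption: utils.csv_writer is a thin wrapper around csv.writer.
writer = csv.writer(output, lineterminator="\n")
writer.writerow(["ident", "x", "y", "genome", "mass", "collected_on", "territory"])
writer.writerows(rows)

text = output.getvalue()
lines = text.splitlines()
```

Note that the site is flattened into separate `x` and `y` columns and that dates are written in ISO format, because `csv.writer` converts non-string values with `str()`.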

specimens_generate(params, grid=None)

Generate specimens with random genomes and masses.

Each genome is a string of bases of the same length. One locus is randomly chosen as "significant", and a specific mutation there predisposes the snail to mass changes. Other mutations are added randomly at other loci. Specimen masses are only mutated if a grid is provided.

Parameters:

Name Type Description Default
params SpecimenParams

SpecimenParams object

required
grid Grid | None

Grid object to place specimens on for mass mutation

None

Returns:

Type Description
AllSpecimens

AllSpecimens object containing the generated specimens and parameters

Source code in src/snailz/specimens.py
def specimens_generate(
    params: SpecimenParams, grid: Grid | None = None
) -> AllSpecimens:
    """Generate specimens with random genomes and masses.

    Each genome is a string of bases of the same length. One locus is
    randomly chosen as "significant", and a specific mutation there
    predisposes the snail to mass changes. Other mutations are added
    randomly at other loci.  Specimen masses are only mutated if a
    grid is provided.

    Parameters:
        params: SpecimenParams object
        grid: Grid object to place specimens on for mass mutation

    Returns:
        AllSpecimens object containing the generated specimens and parameters

    """
    loci = _make_loci(params)
    reference = _make_reference_genome(params)
    susc_loc = _choose_one(loci)
    susc_base = reference[susc_loc]
    genomes = [_make_genome(reference, loci) for i in range(params.number)]
    masses = _make_masses(params, genomes, susc_loc, susc_base)
    identifiers = _make_idents(params.number)
    collection_dates = _make_collection_dates(params)

    individuals = [
        Specimen(genome=g, mass=m, site=Point(), ident=i, collected_on=d)
        for g, m, i, d in zip(genomes, masses, identifiers, collection_dates)
    ]

    result = AllSpecimens(
        individuals=individuals,
        loci=loci,
        params=params,
        reference=reference,
        susceptible_base=susc_base,
        susceptible_locus=susc_loc,
    )

    if grid is not None:
        mutate_masses(grid, result, params.mut_scale)
        calculate_ranges(grid.params.size, result)

    return result
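
The genome part of this pipeline can be sketched in a few self-contained lines. This is an illustration under stated assumptions rather than the library's code: `BASES` is assumed to be `"ACGT"` (the constant is not shown on this page), `make_genome` is a hypothetical stand-in for the private helper, and mutations are applied only at the chosen loci, as the docstrings describe.

```python
import random

BASES = "ACGT"  # assumption: the module-level BASES constant is not shown here

random.seed(1234)
length, num_loci, number = 20, 5, 4

reference = "".join(random.choices(BASES, k=length))
loci = random.sample(range(length), num_loci)  # positions allowed to mutate
susc_locus = random.choice(loci)               # the "significant" locus
susc_base = reference[susc_locus]              # taken from the reference, as in the listing


def make_genome(reference: str, loci: list[int]) -> str:
    """Copy the reference and change the base at a random subset of loci."""
    genome = list(reference)
    for loc in random.sample(loci, random.randint(1, len(loci))):
        others = [b for b in BASES if b != reference[loc]]
        genome[loc] = random.choice(others)
    return "".join(genome)


genomes = [make_genome(reference, loci) for _ in range(number)]
```

Every generated genome has the same length as the reference and differs from it in at least one position, always at one of the selected loci.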

calculate_ranges(size, specimens)

Calculate the territory of each specimen.

Source code in src/snailz/specimens.py
def calculate_ranges(size: int, specimens: AllSpecimens) -> None:
    """Calculate the territory of each specimen."""
    # Allocate points to specimens.
    belong = {}
    for x in range(size):
        for y in range(size):
            for indiv in specimens.individuals:
                assert indiv.site.x is not None
                assert indiv.site.y is not None
                dist = (x - indiv.site.x) ** 2 + (y - indiv.site.y) ** 2
                if ((x, y) not in belong) or (dist < belong[(x, y)]["dist"]):
                    belong[(x, y)] = {"dist": dist, "indiv": {indiv.ident}}
                elif dist == belong[(x, y)]["dist"]:
                    belong[(x, y)]["indiv"].add(indiv.ident)

    # Add up area per individual
    for indiv in specimens.individuals:
        indiv.territory = 0.0
        for b in belong.values():
            if indiv.ident in b["indiv"]:
                indiv.territory += 1 / len(b["indiv"])
        indiv.territory = round(indiv.territory, utils.PRECISION)
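
A small worked example may help show how cells are shared. The sketch below reimplements the same nearest-site rule on plain dictionaries; the function name `territories` and the two sites are made up for illustration, and rounding to `utils.PRECISION` is left out.

```python
def territories(size: int, sites: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Share the size x size grid cells among sites by squared distance."""
    result = {ident: 0.0 for ident in sites}
    for x in range(size):
        for y in range(size):
            dists = {
                ident: (x - sx) ** 2 + (y - sy) ** 2
                for ident, (sx, sy) in sites.items()
            }
            nearest = min(dists.values())
            owners = [ident for ident, d in dists.items() if d == nearest]
            for ident in owners:
                result[ident] += 1 / len(owners)  # ties split the cell evenly
    return result


# Two specimens in opposite corners of a 3 x 3 grid.
shares = territories(3, {"A": (0, 0), "B": (2, 2)})
```

On this grid each specimen owns three cells outright and shares the three cells on the anti-diagonal, so both territories come to 4.5 and together they cover all nine cells.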

mutate_masses(grid, specimens, mut_scale, specific_index=None)

Mutate mass based on grid values and genetic susceptibility.

For each specimen, choose a random grid cell and record its coordinates as the specimen's site. If the cell's value is greater than zero and the specimen's genome has the susceptible base at the susceptible locus, the specimen's mass is modified in place. Site coordinates are updated for every selected individual, whether or not its mass changes.

Parameters:

Name Type Description Default
grid Grid

A Grid object containing pollution values

required
specimens AllSpecimens

An AllSpecimens object with individuals to potentially mutate

required
mut_scale float

Scaling factor for mutation effect

required
specific_index int | None

Optional index to mutate only a specific specimen

None
Source code in src/snailz/specimens.py
def mutate_masses(
    grid: Grid,
    specimens: AllSpecimens,
    mut_scale: float,
    specific_index: int | None = None,
) -> None:
    """Mutate mass based on grid values and genetic susceptibility.

    For each specimen, choose a random cell from the grid and modify
    the mass if the cell's value is non-zero and the genome is
    susceptible. Records the chosen site coordinates for each specimen
    regardless of whether mutation occurs.  Modifies specimen masses
    in-place for susceptible individuals; updates site coordinates for
    all individuals.

    Parameters:
        grid: A Grid object containing pollution values
        specimens: An AllSpecimens object with individuals to potentially mutate
        mut_scale: Scaling factor for mutation effect
        specific_index: Optional index to mutate only a specific specimen
    """
    grid_size = len(grid.grid)
    susc_locus = specimens.susceptible_locus
    susc_base = specimens.susceptible_base

    if specific_index is None:
        individuals = specimens.individuals
    else:
        individuals = [specimens.individuals[specific_index]]

    locations = _make_locations(grid_size, len(individuals))
    for indiv, (x, y) in zip(individuals, locations):
        indiv.site.x = x
        indiv.site.y = y
        if grid.grid[x][y] > 0 and indiv.genome[susc_locus] == susc_base:
            indiv.mass = mutate_mass(indiv.mass, mut_scale, grid.grid[x][y])
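
The condition that decides whether a mass changes can be isolated as a small predicate. The helper name `is_affected` is hypothetical; it only restates the test in the listing above.

```python
def is_affected(cell_value: int, genome: str, susc_locus: int, susc_base: str) -> bool:
    """True when both the cell is polluted and the genome is susceptible."""
    return cell_value > 0 and genome[susc_locus] == susc_base


cases = {
    "polluted cell, susceptible genome": is_affected(3, "ACGT", 2, "G"),
    "clean cell, susceptible genome": is_affected(0, "ACGT", 2, "G"),
    "polluted cell, other base at locus": is_affected(3, "ACTT", 2, "G"),
}
```

Only the first case is `True`; in the other two the specimen still receives a site but keeps its original mass.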

mutate_mass(original, mut_scale, cell_value)

Mutate a single specimen's mass.

Parameters:

Name Type Description Default
original float

The original mass value

required
mut_scale float

Scaling factor for mutation effect

required
cell_value int

The grid cell value affecting the mutation

required

Returns:

Type Description
float

The mutated mass value, rounded to PRECISION decimal places

Source code in src/snailz/specimens.py
def mutate_mass(original: float, mut_scale: float, cell_value: int) -> float:
    """Mutate a single specimen's mass.

    Parameters:
        original: The original mass value
        mut_scale: Scaling factor for mutation effect
        cell_value: The grid cell value affecting the mutation

    Returns:
        The mutated mass value, rounded to PRECISION decimal places
    """
    return round(original * (1 + (mut_scale * cell_value)), utils.PRECISION)
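
As a quick check of the arithmetic, the formula is `original * (1 + mut_scale * cell_value)`. The sketch below repeats it with `PRECISION` assumed to be 2; the real value lives in `snailz.utils` and is not shown on this page.

```python
PRECISION = 2  # assumption: stand-in for utils.PRECISION


def mutate_mass(original: float, mut_scale: float, cell_value: int) -> float:
    """Scale the mass by 1 + mut_scale * cell_value, then round."""
    return round(original * (1 + (mut_scale * cell_value)), PRECISION)


unchanged = mutate_mass(10.0, 0.5, 0)  # factor is 1 + 0.5 * 0 = 1.0
scaled = mutate_mass(10.0, 0.5, 3)     # factor is 1 + 0.5 * 3 = 2.5
```

A cell value of zero leaves the mass at 10.0, while a cell value of 3 with `mut_scale=0.5` gives 25.0. Because `mut_scale` must be non-negative, the mass can only grow.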

_choose_one(values)

Choose a single random item from a collection.

Parameters:

Name Type Description Default
values list[int]

A sequence to choose from

required

Returns:

Type Description
int

A randomly selected item from the values sequence

Source code in src/snailz/specimens.py
def _choose_one(values: list[int]) -> int:
    """Choose a single random item from a collection.

    Parameters:
        values: A sequence to choose from

    Returns:
        A randomly selected item from the values sequence
    """
    return random.choices(values, k=1)[0]

_choose_other(values, exclude)

Choose a value at random except for the excluded values.

Parameters:

Name Type Description Default
values str

A collection to choose from

required
exclude str

Value or collection of values to exclude from the choice

required

Returns:

Type Description
str

A randomly selected item from values that isn't in exclude

Source code in src/snailz/specimens.py
def _choose_other(values: str, exclude: str) -> str:
    """Choose a value at random except for the excluded values.

    Parameters:
        values: A collection to choose from
        exclude: Value or collection of values to exclude from the choice

    Returns:
        A randomly selected item from values that isn't in exclude
    """
    candidates = list(sorted(set(values) - set(exclude)))
    return candidates[random.randrange(len(candidates))]
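
A short usage sketch, assuming `BASES` is `"ACGT"` (the constant is not shown on this page) and using a public stand-in name `choose_other` for the private helper:

```python
import random

BASES = "ACGT"  # assumption: stand-in for the module-level constant


def choose_other(values: str, exclude: str) -> str:
    """Pick a random character of values that does not appear in exclude."""
    # Sorting gives a stable candidate order, since set order is arbitrary.
    candidates = sorted(set(values) - set(exclude))
    return random.choice(candidates)


random.seed(0)
picks = {choose_other(BASES, "A") for _ in range(200)}
```

Sorting the candidates before choosing matters for reproducibility: iteration order over a set of strings can vary between interpreter runs, so without the sort the same seed could yield different bases.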

_make_genome(reference, loci)

Make an individual genome by mutating the reference genome.

Parameters:

Name Type Description Default
reference str

Reference genome string to base the new genome on

required
loci list[int]

List of positions that can be mutated

required

Returns:

Type Description
str

A new genome string with random mutations at some loci

Source code in src/snailz/specimens.py
def _make_genome(reference: str, loci: list[int]) -> str:
    """Make an individual genome by mutating the reference genome.

    Parameters:
        reference: Reference genome string to base the new genome on
        loci: List of positions that can be mutated

    Returns:
        A new genome string with random mutations at some loci
    """
    result = list(reference)
    num_mutations = random.randint(1, len(loci))
    # Sample from the loci themselves so that only mutable positions change.
    for loc in random.sample(loci, num_mutations):
        result[loc] = _choose_other(BASES, reference[loc])
    return "".join(result)

_make_idents(count)

Create unique specimen identifiers.

Each identifier is a 6-character string:

- The first two characters are the same uppercase letters for all specimens.
- The remaining four characters are random uppercase letters and digits.

Parameters:

Name Type Description Default
count int

Number of identifiers to generate

required

Returns:

Type Description
list[str]

List of unique specimen identifiers

Source code in src/snailz/specimens.py
def _make_idents(count: int) -> list[str]:
    """Create unique specimen identifiers.

    Each identifier is a 6-character string:
    - First two characters are the same uppercase letters for all specimens
    - Remaining four characters are random uppercase letters and digits

    Parameters:
        count: Number of identifiers to generate

    Returns:
        List of unique specimen identifiers
    """
    prefix = "".join(random.choices(string.ascii_uppercase, k=2))
    chars = string.ascii_uppercase + string.digits
    gen = utils.UniqueIdGenerator(
        "specimens", lambda: f"{prefix}{''.join(random.choices(chars, k=4))}"
    )
    return [gen.next() for _ in range(count)]
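
The identifier scheme can be sketched without `utils.UniqueIdGenerator`, whose implementation is not shown on this page. The version below assumes that generator simply retries until it produces a value it has not returned before, and uses a set for that purpose; `make_idents` is a hypothetical stand-in name.

```python
import random
import string


def make_idents(count: int) -> list[str]:
    """Generate count unique 6-character identifiers with a shared prefix."""
    prefix = "".join(random.choices(string.ascii_uppercase, k=2))
    chars = string.ascii_uppercase + string.digits
    seen: set[str] = set()
    result: list[str] = []
    while len(result) < count:
        candidate = prefix + "".join(random.choices(chars, k=4))
        if candidate not in seen:  # assumed behavior: retry on collision
            seen.add(candidate)
            result.append(candidate)
    return result


random.seed(42)
idents = make_idents(50)
```

With 36 possible characters in each of four positions there are 36**4 = 1,679,616 suffixes, so collisions are rare for realistic specimen counts.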

_make_locations(size, num)

Generate non-adjacent locations for specimens or fail.

Source code in src/snailz/specimens.py
def _make_locations(size: int, num: int) -> list[tuple[int, int]]:
    """Generate non-adjacent locations for specimens or fail."""
    available = {(x, y) for x in range(size) for y in range(size)}
    chosen = set()
    for i in range(num):
        if not available:
            utils.fail(f"failed to select {num} points on iteration {i}")
        point = random.choice(list(available))
        chosen.add(point)
        for x in range(point[0] - 1, point[0] + 2):
            if (x < 0) or (x >= size):
                continue
            for y in range(point[1] - 1, point[1] + 2):
                if (y < 0) or (y >= size):
                    continue
                available.discard((x, y))
    return list(chosen)
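
The selection strategy is: pick a free cell, then remove that cell and its eight neighbours from the pool, so later picks cannot touch it. The sketch below follows the same idea with two labeled differences: it raises `RuntimeError` where the listing calls `utils.fail` (whose behavior is not shown here), and it returns points in the order they were chosen.

```python
import random


def make_locations(size: int, num: int) -> list[tuple[int, int]]:
    """Choose num grid cells so that no two are adjacent, even diagonally."""
    available = {(x, y) for x in range(size) for y in range(size)}
    chosen: list[tuple[int, int]] = []
    for i in range(num):
        if not available:
            # Stand-in for utils.fail in the listing above.
            raise RuntimeError(f"failed to select {num} points on iteration {i}")
        point = random.choice(sorted(available))
        chosen.append(point)
        # Remove the point and its neighbours; discard ignores off-grid cells.
        for x in range(point[0] - 1, point[0] + 2):
            for y in range(point[1] - 1, point[1] + 2):
                available.discard((x, y))
    return chosen


random.seed(7)
points = make_locations(10, 5)
```

Each pick removes at most nine cells, so five points always fit on a 10 x 10 grid. On a 2 x 2 grid the first pick removes every cell, so asking for two points fails.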

_make_loci(params)

Make a list of mutable loci positions.

Parameters:

Name Type Description Default
params SpecimenParams

SpecimenParams with length and mutations attributes

required

Returns:

Type Description
list[int]

A list of unique randomly selected positions that can be mutated

Source code in src/snailz/specimens.py
def _make_loci(params: SpecimenParams) -> list[int]:
    """Make a list of mutable loci positions.

    Parameters:
        params: SpecimenParams with length and mutations attributes

    Returns:
        A list of unique randomly selected positions that can be mutated
    """
    return random.sample(list(range(params.length)), params.mutations)

_make_masses(params, genomes, susceptible_locus, susceptible_base)

Generate random masses for specimens.

Parameters:

Name Type Description Default
params SpecimenParams

SpecimenParams with min_mass and max_mass attributes

required
genomes list[str]

List of genome strings

required
susceptible_locus int

Position that determines susceptibility

required
susceptible_base str

Base that makes a specimen susceptible

required

Returns:

Type Description
list[float]

List of randomly generated mass values between min_mass and max_mass, rounded to PRECISION decimal places

Source code in src/snailz/specimens.py
def _make_masses(
    params: SpecimenParams,
    genomes: list[str],
    susceptible_locus: int,
    susceptible_base: str,
) -> list[float]:
    """Generate random masses for specimens.

    Parameters:
        params: SpecimenParams with min_mass and max_mass attributes
        genomes: List of genome strings
        susceptible_locus: Position that determines susceptibility
        susceptible_base: Base that makes a specimen susceptible

    Returns:
        List of randomly generated mass values between min_mass and max_mass,
        rounded to PRECISION decimal places
    """
    return [
        round(random.uniform(params.min_mass, params.max_mass), utils.PRECISION)
        for _ in genomes
    ]

_make_collection_dates(params)

Generate random collection dates for specimens.

Parameters:

Name Type Description Default
params SpecimenParams

SpecimenParams with start_date, end_date, and number attributes

required

Returns:

Type Description
list[date]

List of randomly generated collection dates between start_date and end_date

Source code in src/snailz/specimens.py
def _make_collection_dates(params: SpecimenParams) -> list[date]:
    """Generate random collection dates for specimens.

    Parameters:
        params: SpecimenParams with start_date, end_date, and number attributes

    Returns:
        List of randomly generated collection dates between start_date and end_date
    """
    start_ordinal = params.start_date.toordinal()
    end_ordinal = params.end_date.toordinal()
    return [
        date.fromordinal(random.randint(start_ordinal, end_ordinal))
        for _ in range(params.number)
    ]
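
The ordinal trick works because `date.toordinal()` maps each day to a consecutive integer, so a uniform random integer between the two ordinals is a uniform random day. A minimal standalone sketch with made-up dates:

```python
import random
from datetime import date

start, end = date(2024, 1, 1), date(2024, 1, 10)

random.seed(3)
dates = [
    # randint includes both endpoints, so start and end can both be drawn.
    date.fromordinal(random.randint(start.toordinal(), end.toordinal()))
    for _ in range(100)
]
```

Because both endpoints are inclusive, setting `start_date` equal to `end_date` is valid and yields the same date for every specimen.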

_make_reference_genome(params)

Make a random reference genome.

Parameters:

Name Type Description Default
params SpecimenParams

SpecimenParams with length attribute

required

Returns:

Type Description
str

A randomly generated genome string of the specified length

Source code in src/snailz/specimens.py
def _make_reference_genome(params: SpecimenParams) -> str:
    """Make a random reference genome.

    Parameters:
        params: SpecimenParams with length attribute

    Returns:
        A randomly generated genome string of the specified length
    """
    return "".join(random.choices(BASES, k=params.length))