Stochastic Universal Sampling
- class StochasticUniversalSampling(num_batches=None, batch_size=1, duplicate_molecules=True, duplicate_batches=True, key_maker=Inchi(), fitness_modifier=None, random_seed=None)
Bases: Selector
Yields batches of molecules through stochastic universal sampling.
Stochastic universal sampling lays out batches along a line, with each batch taking up length proportional to its fitness. It then creates a set of evenly spaced pointers to different points on the line, each of which is occupied by a batch. Batches which are pointed to are yielded.
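The pointer layout described above can be sketched in plain Python. This is a minimal illustration of the general technique, not the library's implementation: the function name `stochastic_universal_sampling` and its arguments are hypothetical, it works on a bare list of fitness values rather than on batches of molecule records, and it ignores the duplicate-handling options.

```python
import random
from itertools import accumulate


def stochastic_universal_sampling(fitness_values, num_pointers, seed=None):
    """Return the indices picked by evenly spaced pointers (illustrative only)."""
    total = sum(fitness_values)
    if total <= 0 or num_pointers < 1:
        raise ValueError("need positive total fitness and at least one pointer")

    # Distance between neighbouring pointers on the line.
    spacing = total / num_pointers
    # A single random offset places the first pointer; the rest follow from it.
    start = random.Random(seed).uniform(0, spacing)
    # Right-hand edge of each item's segment on the line.
    edges = list(accumulate(fitness_values))

    selected = []
    index = 0
    for k in range(num_pointers):
        pointer = start + k * spacing
        # Move forward to the segment that contains this pointer.
        while index < len(edges) - 1 and pointer >= edges[index]:
            index += 1
        selected.append(index)
    return selected


picks = stochastic_universal_sampling([1, 3], num_pointers=4, seed=0)
print(picks)
```

With fitness values `[1, 3]` and four pointers, the spacing is 1, so exactly one pointer falls in the first segment and three in the second, whatever the random offset. Only the position of the first pointer is random; the even spacing is what keeps the number of picks per item close to its share of the total fitness.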
This approach means weaker members of the population are given a greater chance to be chosen than in Roulette selection [1].
References
Examples
Yielding Single Molecule Batches
Molecules can be yielded one at a time, for example when they need to be selected for mutation or for the next generation.
    import stk

    # Make the selector.
    stochastic_sampling = stk.StochasticUniversalSampling(5)

    population = tuple(
        stk.MoleculeRecord(
            topology_graph=stk.polymer.Linear(
                building_blocks=(
                    stk.BuildingBlock(
                        smiles='BrCCBr',
                        functional_groups=[stk.BromoFactory()],
                    ),
                ),
                repeating_unit='A',
                num_repeating_units=2,
            ),
        ).with_fitness_value(i)
        for i in range(100)
    )

    # Select the molecules.
    for selected, in stochastic_sampling.select(population):
        # Do stuff with each selected molecule.
        pass
Methods
select(population[, included_batches, ...]) – Yield batches of molecule records from population.
- __init__(num_batches=None, batch_size=1, duplicate_molecules=True, duplicate_batches=True, key_maker=Inchi(), fitness_modifier=None, random_seed=None)
Initialize a StochasticUniversalSampling instance.
- Parameters:
  - num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
  - batch_size (int, optional) – The number of molecules yielded at once.
  - duplicate_molecules (bool, optional) – If True the same molecule can be yielded in more than one batch.
  - duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.
  - key_maker (MoleculeKeyMaker, optional) – Used to get the keys of molecules. If two molecules have the same key, they are considered duplicates.
  - fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict, which maps records in the population to the fitness values the Selector should use. If None, the regular fitness values of the records are used.
  - random_seed (int, optional) – The random seed to use.
- select(population, included_batches=None, excluded_batches=None)
Yield batches of molecule records from population.
- Parameters:
  - population (tuple of MoleculeRecord) – A collection of molecules from which batches are selected.
  - included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
  - excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields:
  - Batch of MoleculeRecord – A batch of selected molecule records.
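The combined effect of included_batches and excluded_batches can be sketched as a simple filter. This is a sketch under stated assumptions, not the library's code: `filter_batches` and `key_of` are hypothetical names, batches are modelled as plain tuples of strings, and a batch's identity key is assumed to be the frozenset of its members.

```python
def filter_batches(batches, key_of, included_batches=None, excluded_batches=None):
    """Yield only the batches permitted by the include and exclude sets."""
    for batch in batches:
        key = key_of(batch)
        # When an include set is given, anything outside it is skipped.
        if included_batches is not None and key not in included_batches:
            continue
        # When an exclude set is given, anything inside it is skipped.
        if excluded_batches is not None and key in excluded_batches:
            continue
        yield batch


batches = [('a',), ('b',), ('c',)]
allowed = list(
    filter_batches(
        batches,
        key_of=frozenset,
        included_batches={frozenset({'a'}), frozenset({'b'})},
        excluded_batches={frozenset({'b'})},
    )
)
print(allowed)
```

In this sketch a batch must pass both checks, so a key that appears in both sets is not yielded.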