Building Block

class BuildingBlock(smiles, functional_groups=(), placer_ids=None, position_matrix=None)

Bases: stk.molecular.molecules.molecule.molecule.Molecule

Represents a building block of a ConstructedMolecule.

A BuildingBlock can represent either an entire molecule or a molecular fragment used to construct a ConstructedMolecule. The building block uses FunctionalGroup instances to identify which atoms are modified during construction.

Methods

clone()

Return a clone.

get_atomic_positions([atom_ids])

Yield the positions of atoms.

get_atoms([atom_ids])

Yield the atoms in the molecule, ordered by id.

get_bonds()

Yield the bonds in the molecule.

get_canonical_atom_ids()

Map the id of each atom to its id under canonical ordering.

get_centroid([atom_ids])

Return the centroid.

get_core_atom_ids()

Yield ids of atoms which form the core of the building block.

get_direction([atom_ids])

Return a vector of best fit through the atoms.

get_functional_groups([fg_ids])

Yield the functional groups, ordered by id.

get_maximum_diameter([atom_ids])

Return the maximum diameter.

get_num_atoms()

Return the number of atoms in the molecule.

get_num_bonds()

Return the number of bonds in the molecule.

get_num_functional_groups()

Return the number of functional groups.

get_num_placers()

Return the number of placer atoms in the building block.

get_placer_ids()

Yield the ids of placer atoms.

get_plane_normal([atom_ids])

Return the normal to the plane of best fit.

get_position_matrix()

Return a matrix holding the atomic positions.

init(atoms, bonds, position_matrix[, …])

Initialize a BuildingBlock from its components.

init_from_file(path[, functional_groups, …])

Initialize from a file.

init_from_molecule(molecule[, …])

Initialize from a Molecule.

init_from_rdkit_mol(molecule[, …])

Initialize from an rdkit molecule.

init_from_vabene_molecule(molecule[, …])

Initialize from a vabene.Molecule.

to_rdkit_mol()

Return an rdkit representation.

with_canonical_atom_ordering()

Return a clone, with canonically ordered atoms.

with_centroid(position[, atom_ids])

Return a clone with its centroid at position.

with_displacement(displacement)

Return a displaced clone.

with_functional_groups(functional_groups)

Return a clone with specific functional groups.

with_position_matrix(position_matrix)

Return a clone with atomic positions set by position_matrix.

with_rotation_about_axis(angle, axis, origin)

Return a rotated clone.

with_rotation_between_vectors(start, target, …)

Return a rotated clone.

with_rotation_to_minimize_angle(start, …)

Return a rotated clone.

with_structure_from_file(path[, extension])

Return a clone, with its structure taken from a file.

write(path[, atom_ids])

Write the structure to a file.

__init__(smiles, functional_groups=(), placer_ids=None, position_matrix=None)

Initialize a BuildingBlock.

Notes

The molecule is given 3D coordinates with rdkit.ETKDGv2().

Parameters
  • smiles (str) – A SMILES string of the molecule.

  • functional_groups (iterable, optional) – An iterable of FunctionalGroup or FunctionalGroupFactory or both. FunctionalGroup instances are added to the building block and FunctionalGroupFactory instances are used to create FunctionalGroup instances the building block should hold. FunctionalGroup instances are used to identify which atoms are modified during ConstructedMolecule construction.

  • placer_ids (tuple of int, optional) –

    The ids of placer atoms. These are the atoms which should be used for calculating the position of the building block. Depending on the values passed to placer_ids, and the functional groups in the building block, different placer ids will be used by the building block.

    1. placer_ids is passed to the initializer: the passed placer ids will be used by the building block.

    2. placer_ids is None and the building block has functional groups: The placer ids of the functional groups will be used as the placer ids of the building block.

    3. placer_ids is None and functional_groups is empty: all atoms of the molecule will be used as placer atoms.

  • position_matrix (numpy.ndarray, optional) – The position matrix the building block should use. If None, rdkit.ETKDGv2() will be used to calculate it.

Raises

RuntimeError – If embedding the molecule fails.
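The three placer_ids cases listed for the initializer can be sketched in plain Python. This is only an illustration of the documented rules, not the library's implementation: resolve_placer_ids and its arguments are hypothetical names, each functional group is represented simply by a tuple of its placer ids, and the removal of duplicate ids is an assumption.

```python
def resolve_placer_ids(placer_ids, functional_group_placer_ids, num_atoms):
    """Hypothetical helper mirroring the three documented cases."""
    if placer_ids is not None:
        # Case 1: explicitly passed placer ids are used as given.
        return tuple(placer_ids)
    if functional_group_placer_ids:
        # Case 2: use the placer ids of the functional groups.
        # Dropping duplicates here is an assumption of this sketch.
        resolved = []
        for group_placers in functional_group_placer_ids:
            for atom_id in group_placers:
                if atom_id not in resolved:
                    resolved.append(atom_id)
        return tuple(resolved)
    # Case 3: no placer ids and no functional groups, so use every atom.
    return tuple(range(num_atoms))


explicit = resolve_placer_ids((4, 5), [(1,), (2,)], num_atoms=8)
from_groups = resolve_placer_ids(None, [(1,), (2,)], num_atoms=8)
all_atoms = resolve_placer_ids(None, [], num_atoms=3)
```

The same resolution order applies to every alternative initializer that accepts placer_ids.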

clone()

Return a clone.

Returns

The clone. Has the same type as the original molecule.

Return type

Molecule

get_atomic_positions(atom_ids=None)

Yield the positions of atoms.

Parameters

atom_ids (iterable of int, optional) – The ids of the atoms whose positions are desired. If None, then the positions of all atoms will be yielded. Can be a single int, if the position of a single atom is desired.

Yields

numpy.ndarray – The x, y and z coordinates of an atom.
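Many methods in this class accept atom_ids as an iterable of int, a single int, or None. The following sketch shows one way such an argument could be normalized before positions are yielded. It is an assumption-laden illustration: normalize_atom_ids is a hypothetical helper, and a list of tuples stands in for the numpy position matrix.

```python
def normalize_atom_ids(atom_ids, num_atoms):
    """Hypothetical helper: turn None, an int or an iterable into a tuple."""
    if atom_ids is None:
        # None means "all atoms".
        return tuple(range(num_atoms))
    if isinstance(atom_ids, int):
        # A single int selects a single atom.
        return (atom_ids,)
    return tuple(atom_ids)


def get_atomic_positions(position_matrix, atom_ids=None):
    """Yield one (x, y, z) row per requested atom id."""
    for atom_id in normalize_atom_ids(atom_ids, len(position_matrix)):
        yield position_matrix[atom_id]


# Made-up coordinates for three atoms.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
every_position = list(get_atomic_positions(positions))
one_position = list(get_atomic_positions(positions, 1))
two_positions = list(get_atomic_positions(positions, (2, 0)))
```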

get_atoms(atom_ids=None)

Yield the atoms in the molecule, ordered by id.

Parameters

atom_ids (iterable of int, optional) – The ids of atoms to yield. Can be a single int if a single atom is wanted, or None if all atoms are wanted.

Yields

Atom – An atom in the molecule.

get_bonds()

Yield the bonds in the molecule.

Yields

Bond – A bond in the molecule.

get_canonical_atom_ids()

Map the id of each atom to its id under canonical ordering.

Returns

Maps the id of each atom in the molecule to the id it would have under canonical ordering.

Return type

dict
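To illustrate how the returned mapping can be read, the sketch below applies a made-up mapping to a list of element symbols so that each item ends up at its canonical index. The helper name and the example mapping are hypothetical; a real mapping would come from get_canonical_atom_ids().

```python
def reorder_by_canonical_ids(items, canonical_ids):
    """Place items[atom_id] at index canonical_ids[atom_id]."""
    reordered = [None] * len(items)
    for atom_id, canonical_id in canonical_ids.items():
        reordered[canonical_id] = items[atom_id]
    return reordered


# Made-up data: atom 0 moves to position 2, atom 1 to 0, atom 2 to 1.
elements = ["Br", "C", "N"]
canonical_ids = {0: 2, 1: 0, 2: 1}
canonical_elements = reorder_by_canonical_ids(elements, canonical_ids)
```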

get_centroid(atom_ids=None)

Return the centroid.

Parameters

atom_ids (iterable of int, optional) – The ids of atoms which are used to calculate the centroid. Can be a single int, if a single atom is to be used, or None if all atoms are to be used.

Returns

The centroid of atoms specified by atom_ids.

Return type

numpy.ndarray

Raises

ValueError – If atom_ids has a length of 0.
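A minimal sketch of the centroid calculation, assuming the centroid is the unweighted mean of the selected atomic positions. The function and its error message are illustrative, and tuples stand in for numpy arrays.

```python
def centroid(position_matrix, atom_ids=None):
    """Return the mean position of the selected atoms as an (x, y, z) tuple."""
    if atom_ids is None:
        atom_ids = tuple(range(len(position_matrix)))
    elif isinstance(atom_ids, int):
        atom_ids = (atom_ids,)
    else:
        atom_ids = tuple(atom_ids)

    if len(atom_ids) == 0:
        # Mirrors the documented ValueError for empty atom_ids.
        raise ValueError("atom_ids has a length of 0.")

    count = len(atom_ids)
    return tuple(
        sum(position_matrix[atom_id][axis] for atom_id in atom_ids) / count
        for axis in range(3)
    )


# Made-up coordinates for three atoms.
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
pair_centroid = centroid(positions, (0, 1))
```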

get_core_atom_ids()

Yield ids of atoms which form the core of the building block.

This includes all atoms in the building block that are not part of a functional group, as well as any atoms in a functional group that are specifically labelled as core atoms.

Yields

int – The id of a core atom.
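The definition of core atoms can be sketched with plain sets. In this illustration each functional group is a dictionary with hypothetical "atom_ids" and "core_atom_ids" keys; the real FunctionalGroup classes expose this information differently, and the ascending yield order is an assumption.

```python
def get_core_atom_ids(num_atoms, functional_groups):
    """Yield ids of atoms outside every functional group, plus labelled core atoms."""
    functional_group_atoms = set()
    labelled_core_atoms = set()
    for functional_group in functional_groups:
        functional_group_atoms.update(functional_group["atom_ids"])
        labelled_core_atoms.update(functional_group["core_atom_ids"])

    for atom_id in range(num_atoms):
        if atom_id not in functional_group_atoms or atom_id in labelled_core_atoms:
            yield atom_id


# Made-up building block with six atoms and two functional groups.
example_groups = [
    {"atom_ids": (0, 1), "core_atom_ids": (1,)},
    {"atom_ids": (4, 5), "core_atom_ids": ()},
]
core_ids = list(get_core_atom_ids(6, example_groups))
```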

get_direction(atom_ids=None)

Return a vector of best fit through the atoms.

Parameters

atom_ids (iterable of int, optional) – The ids of atoms which should be used to calculate the vector. Can be a single int, if a single atom is to be used, or None, if all atoms are to be used.

Returns

The vector of best fit.

Return type

numpy.ndarray

Raises

ValueError – If atom_ids has a length of 0.

get_functional_groups(fg_ids=None)

Yield the functional groups, ordered by id.

Parameters

fg_ids (Union[int, Iterable[int], None]) – The ids of functional groups yielded. If None, then all functional groups are yielded. Can be a single int, if a single functional group is desired.

Yields

A functional group of the building block.

Return type

Iterable[FunctionalGroup]

get_maximum_diameter(atom_ids=None)

Return the maximum diameter.

This method does not account for the van der Waals radius of atoms.

Parameters

atom_ids (iterable of int, optional) – The ids of atoms which are considered when looking for the maximum diameter. Can be a single int, if a single atom is to be used, or None, if all atoms are to be used.

Returns

The maximum diameter in the molecule.

Return type

float

Raises

ValueError – If atom_ids has a length of 0.
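As an illustration, the maximum diameter can be thought of as the largest distance between any two selected atom centres, with no van der Waals radii added. The sketch below makes that assumption explicit; returning 0.0 for a single atom is also an assumption, and tuples stand in for numpy arrays.

```python
import itertools
import math


def maximum_diameter(position_matrix, atom_ids=None):
    """Return the largest centre-to-centre distance between selected atoms."""
    if atom_ids is None:
        atom_ids = tuple(range(len(position_matrix)))
    elif isinstance(atom_ids, int):
        atom_ids = (atom_ids,)
    else:
        atom_ids = tuple(atom_ids)

    if len(atom_ids) == 0:
        raise ValueError("atom_ids has a length of 0.")

    # Compare every pair once; a single atom has no pairs, so default to 0.0.
    return max(
        (
            math.dist(position_matrix[first], position_matrix[second])
            for first, second in itertools.combinations(atom_ids, 2)
        ),
        default=0.0,
    )


# Made-up coordinates for three atoms.
positions = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]
diameter = maximum_diameter(positions)
```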

get_num_atoms()

Return the number of atoms in the molecule.

Returns

The number of atoms in the molecule.

Return type

int

get_num_bonds()

Return the number of bonds in the molecule.

Returns

The number of bonds in the molecule.

Return type

int

get_num_functional_groups()

Return the number of functional groups.

Returns

The number of functional groups in the building block.

Return type

int

get_num_placers()

Return the number of placer atoms in the building block.

Returns

The number of placer atoms in the building block.

Return type

int

get_placer_ids()

Yield the ids of placer atoms.

Placer atoms are the atoms used to calculate the position of the building block.

Yields

int – The id of a placer atom.

get_plane_normal(atom_ids=None)

Return the normal to the plane of best fit.

Parameters

atom_ids (iterable of int, optional) – The ids of atoms which should be used to calculate the plane. Can be a single int, if a single atom is to be used, or None, if all atoms are to be used.

Returns

The vector normal to the plane of best fit through the specified atoms.

Return type

numpy.ndarray

Raises

ValueError – If atom_ids has a length of 0.

get_position_matrix()

Return a matrix holding the atomic positions.

Returns

The array has the shape (n, 3). Each row holds the x, y and z coordinates of an atom.

Return type

numpy.ndarray

classmethod init(atoms, bonds, position_matrix, functional_groups=(), placer_ids=None)

Initialize a BuildingBlock from its components.

Parameters
  • atoms (tuple of Atom) – The atoms of the building block.

  • bonds (tuple of Bond) – The bonds of the building block.

  • position_matrix (numpy.ndarray) – An (n, 3) position matrix of the building block.

  • functional_groups (iterable, optional) – An iterable holding the FunctionalGroup instances the building block should have, and / or FunctionalGroupFactory instances used for creating them.

  • placer_ids (tuple of int, optional) –

    The ids of placer atoms. These are the atoms which should be used for calculating the position of the building block. Depending on the values passed to placer_ids, and the functional groups in the building block, different placer ids will be used by the building block.

    1. placer_ids is passed to the initializer: the passed placer ids will be used by the building block.

    2. placer_ids is None and the building block has functional groups: The placer ids of the functional groups will be used as the placer ids of the building block.

    3. placer_ids is None and functional_groups is empty: all atoms of the molecule will be used as placer atoms.

Returns

The building block.

Return type

BuildingBlock

classmethod init_from_file(path, functional_groups=(), placer_ids=None)

Initialize from a file.

Parameters
  • path (str) –

    The path to a molecular structure file. Supported file types are:

    1. .mol, .sdf - MDL V3000 MOL file

    2. .pdb - PDB file

  • functional_groups (iterable, optional) – An iterable of FunctionalGroup or FunctionalGroupFactory or both. FunctionalGroup instances are added to the building block and FunctionalGroupFactory instances are used to create FunctionalGroup instances the building block should hold. FunctionalGroup instances are used to identify which atoms are modified during ConstructedMolecule construction.

  • placer_ids (tuple of int, optional) –

    The ids of placer atoms. These are the atoms which should be used for calculating the position of the building block. Depending on the values passed to placer_ids, and the functional groups in the building block, different placer ids will be used by the building block.

    1. placer_ids is passed to the initializer: the passed placer ids will be used by the building block.

    2. placer_ids is None and the building block has functional groups: The placer ids of the functional groups will be used as the placer ids of the building block.

    3. placer_ids is None and functional_groups is empty: all atoms of the molecule will be used as placer atoms.

Returns

The building block.

Return type

BuildingBlock

Raises

ValueError – If the file type cannot be used for initialization.
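The extension-based dispatch described for init_from_file can be sketched as a lookup keyed on the file extension. This is an illustration only: SUPPORTED_INIT_EXTENSIONS, describe_init_format and the error message are hypothetical, and the sketch just reports the format rather than reading the file.

```python
import os

# Extensions listed as supported for initialization in the text above.
SUPPORTED_INIT_EXTENSIONS = {
    ".mol": "MDL V3000 MOL file",
    ".sdf": "MDL V3000 MOL file",
    ".pdb": "PDB file",
}


def describe_init_format(path):
    """Return the documented format for path, or raise ValueError."""
    _, extension = os.path.splitext(path)
    if extension not in SUPPORTED_INIT_EXTENSIONS:
        # Mirrors the documented ValueError for unusable file types.
        raise ValueError(f'Unable to initialize from "{extension}" files.')
    return SUPPORTED_INIT_EXTENSIONS[extension]


mol_format = describe_init_format("building_block.mol")
pdb_format = describe_init_format("building_block.pdb")
```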

classmethod init_from_molecule(molecule, functional_groups=(), placer_ids=None)

Initialize from a Molecule.

Parameters
  • molecule (Molecule) – The molecule to initialize from.

  • functional_groups (iterable, optional) – An iterable of FunctionalGroup or FunctionalGroupFactory or both. FunctionalGroup instances are added to the building block and FunctionalGroupFactory instances are used to create FunctionalGroup instances the building block should hold. FunctionalGroup instances are used to identify which atoms are modified during ConstructedMolecule construction.

  • placer_ids (tuple of int, optional) –

    The ids of placer atoms. These are the atoms which should be used for calculating the position of the building block. Depending on the values passed to placer_ids, and the functional groups in the building block, different placer ids will be used by the building block.

    1. placer_ids is passed to the initializer: the passed placer ids will be used by the building block.

    2. placer_ids is None and the building block has functional groups: The placer ids of the functional groups will be used as the placer ids of the building block.

    3. placer_ids is None and functional_groups is empty: all atoms of the molecule will be used as placer atoms.

Returns

The building block. It will have the same atoms, bonds and atomic positions as molecule.

Return type

BuildingBlock

classmethod init_from_rdkit_mol(molecule, functional_groups=(), placer_ids=None)

Initialize from an rdkit molecule.

Parameters
  • molecule (rdkit.Mol) – The molecule.

  • functional_groups (iterable, optional) – An iterable of FunctionalGroup or FunctionalGroupFactory or both. FunctionalGroup instances are added to the building block and FunctionalGroupFactory instances are used to create FunctionalGroup instances the building block should hold. FunctionalGroup instances are used to identify which atoms are modified during ConstructedMolecule construction.

  • placer_ids (tuple of int, optional) –

    The ids of placer atoms. These are the atoms which should be used for calculating the position of the building block. Depending on the values passed to placer_ids, and the functional groups in the building block, different placer ids will be used by the building block.

    1. placer_ids is passed to the initializer: the passed placer ids will be used by the building block.

    2. placer_ids is None and the building block has functional groups: The placer ids of the functional groups will be used as the placer ids of the building block.

    3. placer_ids is None and functional_groups is empty: all atoms of the molecule will be used as placer atoms.

Returns

The building block.

Return type

BuildingBlock

classmethod init_from_vabene_molecule(molecule, functional_groups=(), placer_ids=None, position_matrix=None)

Initialize from a vabene.Molecule.

Notes

The molecule is given 3D coordinates with rdkit.ETKDGv2().

Parameters
  • molecule (vabene.Molecule) – The molecule from which to initialize.

  • functional_groups (iterable, optional) – An iterable of FunctionalGroup or FunctionalGroupFactory or both. FunctionalGroup instances are added to the building block and FunctionalGroupFactory instances are used to create FunctionalGroup instances the building block should hold. FunctionalGroup instances are used to identify which atoms are modified during ConstructedMolecule construction.

  • placer_ids (tuple of int, optional) –

    The ids of placer atoms. These are the atoms which should be used for calculating the position of the building block. Depending on the values passed to placer_ids, and the functional groups in the building block, different placer ids will be used by the building block.

    1. placer_ids is passed to the initializer: the passed placer ids will be used by the building block.

    2. placer_ids is None and the building block has functional groups: The placer ids of the functional groups will be used as the placer ids of the building block.

    3. placer_ids is None and functional_groups is empty: all atoms of the molecule will be used as placer atoms.

  • position_matrix (numpy.ndarray, optional) – The position matrix the building block should use. If None, rdkit.ETKDGv2() will be used to calculate it.

Returns

The building block.

Return type

BuildingBlock

Raises

RuntimeError – If embedding the molecule fails.

to_rdkit_mol()

Return an rdkit representation.

Returns

The molecule in rdkit format.

Return type

rdkit.Mol

with_canonical_atom_ordering()

Return a clone, with canonically ordered atoms.

Returns

The clone. Has the same type as the original molecule.

Return type

Molecule

with_centroid(position, atom_ids=None)

Return a clone with its centroid at position.

Parameters
  • position (numpy.ndarray) – This array holds the position on which the centroid of the clone is going to be placed.

  • atom_ids (iterable of int, optional) – The ids of atoms which should have their centroid set to position. Can be a single int, if a single atom is to be used, or None, if all atoms are to be used.

Returns

A clone with its centroid at position. Has the same type as the original molecule.

Return type

Molecule
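One way to picture with_centroid is as a rigid translation: compute the centroid of the selected atoms, then shift every atom by the difference between the requested position and that centroid. The sketch below assumes this behaviour and uses tuples in place of numpy arrays; the function name simply echoes the method for readability.

```python
def with_centroid(position_matrix, position, atom_ids=None):
    """Return new positions, translated so the selected atoms' centroid is at position."""
    if atom_ids is None:
        atom_ids = tuple(range(len(position_matrix)))
    elif isinstance(atom_ids, int):
        atom_ids = (atom_ids,)
    else:
        atom_ids = tuple(atom_ids)

    count = len(atom_ids)
    current = [
        sum(position_matrix[atom_id][axis] for atom_id in atom_ids) / count
        for axis in range(3)
    ]
    shift = [position[axis] - current[axis] for axis in range(3)]
    # The whole structure moves rigidly; the input is left untouched.
    return [
        tuple(row[axis] + shift[axis] for axis in range(3))
        for row in position_matrix
    ]


# Made-up coordinates for three atoms.
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
moved = with_centroid(positions, (0.0, 0.0, 0.0), atom_ids=(0, 1))
```

Returning a new list rather than editing the input matches the clone-returning style of the with_ methods.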

with_displacement(displacement)

Return a displaced clone.

Parameters

displacement (numpy.ndarray) – The displacement vector to be applied.

Returns

A displaced clone. Has the same type as the original molecule.

Return type

Molecule

with_functional_groups(functional_groups)

Return a clone with specific functional groups.

Parameters

functional_groups (iterable) – FunctionalGroup instances which the clone should have.

Returns

The clone. Has the same type as the original molecule.

Return type

BuildingBlock

with_position_matrix(position_matrix)

Return a clone with atomic positions set by position_matrix.

Parameters

position_matrix (numpy.ndarray) – The position matrix of the clone. The shape of the matrix is (n, 3).

Returns

The clone. Has the same type as the original molecule.

Return type

Molecule

with_rotation_about_axis(angle, axis, origin)

Return a rotated clone.

The clone is rotated by angle about axis, with the rotation centred on origin.

Parameters
  • angle (float) – The size of the rotation in radians.

  • axis (numpy.ndarray) – The axis about which the rotation happens. Must have unit magnitude.

  • origin (numpy.ndarray) – The origin about which the rotation happens.

Returns

A rotated clone. Has the same type as the original molecule.

Return type

Molecule
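A rotation by an angle in radians about a unit axis through an origin can be written with Rodrigues' rotation formula. The sketch below is a plain-Python illustration of that formula, not the library's code; rotate_about_axis is a hypothetical name and tuples stand in for numpy arrays.

```python
import math


def rotate_about_axis(points, angle, axis, origin):
    """Rotate points by angle (radians) about a unit axis passing through origin."""
    cos_angle, sin_angle = math.cos(angle), math.sin(angle)
    ux, uy, uz = axis
    rotated = []
    for point in points:
        # Work relative to the origin of rotation.
        vx, vy, vz = (point[k] - origin[k] for k in range(3))
        dot = ux * vx + uy * vy + uz * vz
        cross = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
        relative = (vx, vy, vz)
        # Rodrigues: v cos(a) + (u x v) sin(a) + u (u . v)(1 - cos(a))
        turned = [
            relative[k] * cos_angle
            + cross[k] * sin_angle
            + axis[k] * dot * (1.0 - cos_angle)
            for k in range(3)
        ]
        rotated.append(tuple(turned[k] + origin[k] for k in range(3)))
    return rotated


# Quarter turn about the z axis, centred on (1, 0, 0).
quarter_turn = rotate_about_axis(
    [(2.0, 0.0, 0.0)], math.pi / 2, (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
)
```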

with_rotation_between_vectors(start, target, origin)

Return a rotated clone.

The rotation is equal to a rotation from start to target.

Given two direction vectors, start and target, this method applies to the clone the rotation required to transform start into target. The rotation occurs about the origin.

For example, if the start and target vectors are 45 degrees apart, a 45 degree rotation will be applied to the clone. The rotation will be along the appropriate direction.

The advantage of this method is that, as long as you can associate a geometric feature of the molecule with a vector, the clone can be rotated so that this vector is aligned with target. The vector can be virtually anything, so any geometric feature of the molecule can easily be aligned with an arbitrary direction.

Parameters
  • start (numpy.ndarray) – A vector which is to be rotated so that it transforms into the target vector.

  • target (numpy.ndarray) – The vector onto which start is rotated.

  • origin (numpy.ndarray) – The point about which the rotation occurs.

Returns

A rotated clone. Has the same type as the original molecule.

Return type

Molecule
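The rotation that carries start onto target can be described by an angle and an axis: the angle comes from the dot product of the normalized vectors and the axis from their cross product. The sketch below computes both for the 45 degree example mentioned above. It is an illustration under stated assumptions: rotation_between_vectors is a hypothetical helper, and returning None as the axis for parallel or antiparallel vectors is a simplification of this sketch.

```python
import math


def rotation_between_vectors(start, target):
    """Return (angle in radians, unit axis) of the rotation taking start to target."""

    def unit(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return tuple(x / norm for x in vector)

    s, t = unit(start), unit(target)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, sum(a * b for a, b in zip(s, t))))
    angle = math.acos(cos_angle)

    cross = (
        s[1] * t[2] - s[2] * t[1],
        s[2] * t[0] - s[0] * t[2],
        s[0] * t[1] - s[1] * t[0],
    )
    norm = math.sqrt(sum(x * x for x in cross))
    if norm < 1e-12:
        # Parallel or antiparallel vectors: the axis is not unique.
        return angle, None
    return angle, tuple(x / norm for x in cross)


# start and target are 45 degrees apart in the xy plane.
angle, axis = rotation_between_vectors((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
```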

with_rotation_to_minimize_angle(start, target, axis, origin)

Return a rotated clone.

The clone is rotated by the rotation required to minimize the angle between start and target.

Note that this function will not necessarily overlay the start and target vectors. This is because the possible rotation is restricted to the axis.

Parameters
  • start (numpy.ndarray) – The vector which is rotated.

  • target (numpy.ndarray) – The vector which is stationary.

  • axis (numpy.ndarray) – The vector about which the rotation happens. Must have unit magnitude.

  • origin (numpy.ndarray) – The origin about which the rotation happens.

Returns

A rotated clone. Has the same type as the original molecule.

Return type

Molecule

Raises

ValueError – If target has a magnitude of 0. In this case it is not possible to calculate an angle between start and target.
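Because the rotation is restricted to a fixed axis, the best that can be done is to align the components of start and target that are perpendicular to that axis. The sketch below computes the signed angle that achieves this, assuming a unit axis. It illustrates the geometry only; angle_to_minimize is a hypothetical helper, and tuples stand in for numpy arrays.

```python
import math


def angle_to_minimize(start, target, axis):
    """Return the signed angle (radians) about a unit axis that best aligns start with target."""
    if all(component == 0 for component in target):
        # Mirrors the documented ValueError for a zero-magnitude target.
        raise ValueError("target has a magnitude of 0.")

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def perpendicular_part(vector):
        # Remove the component of vector that lies along the axis.
        along = dot(vector, axis)
        return tuple(vector[k] - along * axis[k] for k in range(3))

    s = perpendicular_part(start)
    t = perpendicular_part(target)
    cross = (
        s[1] * t[2] - s[2] * t[1],
        s[2] * t[0] - s[0] * t[2],
        s[0] * t[1] - s[1] * t[0],
    )
    # Signed angle from s to t, measured about the axis.
    return math.atan2(dot(cross, axis), dot(s, t))


# start has a z component that a rotation about z can never remove.
best_angle = angle_to_minimize((1.0, 0.0, 1.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
```

In this example a quarter turn about z takes start to (0, 1, 1), which still makes a 45 degree angle with target, showing why the two vectors are not necessarily overlaid.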

with_structure_from_file(path, extension=None)

Return a clone, with its structure taken from a file.

Multiple file types are supported, namely:

  1. .mol, .sdf - MDL V2000 and V3000 files

  2. .xyz - XYZ files

  3. .mae - Schrodinger Maestro files

  4. .coord - Turbomole files

  5. .pdb - PDB files

Parameters
  • path (str) – The path to a molecular structure file holding updated coordinates for the Molecule.

  • extension (str, optional) – If you want to treat the file as though it has a particular extension, put it here. Include the dot.

Returns

A clone with atomic positions found in path. Has the same type as the original molecule.

Return type

Molecule

write(path, atom_ids=None)

Write the structure to a file.

This function will write the format based on the extension of path.

  1. .mol, .sdf - MDL V3000 MOL file

  2. .xyz - XYZ file

  3. .pdb - PDB file

Parameters
  • path (str) – The path to which the molecule should be written.

  • atom_ids (iterable of int, optional) – The atom ids of atoms to write. Can be a single int, if a single atom is to be used, or None, if all atoms are to be used. If you use this parameter, the atom ids in the file may not correspond to the atom ids in the molecule.

Returns

The molecule.

Return type

Molecule