Basic Examples
Note
The molecules displayed are interactive renderings.
Creating Building Blocks
There are a number of ways to create BuildingBlock
molecules,
but the two most common ways are direct creation through SMILES
strings and loading them from molecule structure files.
import stk
bb1 = stk.BuildingBlock('NCCCN')
bb2 = stk.BuildingBlock.init_from_file('path/to/file.mol')
Look at the documentation of BuildingBlock
to see other
available initialization methods. They are easy to find, as they all
begin with init
.
Specifying Multiple Functional Groups
When you create a BuildingBlock
, you also need to specify
which atoms are modified during construction of a
ConstructedMolecule
. This is achieved by providing the
BuildingBlock
with
functional_group
instances. To save you
the pain of creating function groups one by one, you can use a
functional_group_factory
. If you have a building block
with bromo groups, and you want the bromo groups to be modified
during construction, you would use a BromoFactory
import stk
bb = stk.BuildingBlock('BrCCCBr', [stk.BromoFactory()])
The bb
, in the example above, would have two Bromo
functional groups. Similarly, if you have a building block with
aldehyde groups
bb2 = stk.BuildingBlock('O=CCCC=O', [stk.AldehydeFactory()])
In this example, bb2
will have two Aldehyde
functional
groups. Finally, if you have both aldehyde and bromo groups on a
molecule, and you want both to be modified during construction,
you would use both of the factories
bb3 = stk.BuildingBlock(
smiles='O=CCCBr',
functional_groups=[stk.AldehydeFactory(), stk.BromoFactory()],
)
In the example above, bb3
has one Bromo
and one
Aldehyde
functional group.
Constructing Molecules
To construct molecules, you need to create a new
ConstructedMolecule
. The required input consists of
a TopologyGraph
, which, in turn, requires
BuildingBlock
instances.
import stk
# React the amine functional groups during construction.
bb1 = stk.BuildingBlock('NCCN', [stk.PrimaryAminoFactory()])
# React the aldehyde functional groups during construction.
bb2 = stk.BuildingBlock('O=CCCC=O', [stk.AldehydeFactory()])
# Build a polymer.
polymer = stk.ConstructedMolecule(
topology_graph=stk.polymer.Linear(
building_blocks=(bb1, bb2),
repeating_unit='AB',
num_repeating_units=4,
optimizer=stk.Collapser(scale_steps=False),
),
)
# Build a longer polymer.
longer = stk.ConstructedMolecule(
topology_graph=stk.polymer.Linear(
building_blocks=(bb1, bb2),
repeating_unit='AB',
num_repeating_units=8,
optimizer=stk.Collapser(scale_steps=False),
),
)
Each topology graph requires different input parameters.
For example, organic cage topology graphs only require the
BuildingBlock
instances.
# The cage requires a building block with 3 functional groups.
cage_bb2 = stk.BuildingBlock(
smiles='O=CC(C=O)CC=O',
functional_groups=[stk.AldehydeFactory()],
)
cage = stk.ConstructedMolecule(
topology_graph=stk.cage.FourPlusSix((bb1, cage_bb2)),
)
Note that this structure is far from ideal, the next example shows how to make one with shorter bond lengths!
Read the documentation for each kind of TopologyGraph
, for
more examples on how to initialize it, and to see what optional
parameters you have available.
Using Built-in Optimizers During Construction
All TopologyGraph
instances take an optimizer argument,
which provides efficient optimization of stk
structures from
their expanded form. No optimization will be performed with the
NullOptimizer
.
Collapser
performs rigid translations of the building blocks
toward the centroid of the ConstructedMolecule
until steric
clashes occur.
import stk
bb1 = stk.BuildingBlock('NCCN', [stk.PrimaryAminoFactory()])
bb2 = stk.BuildingBlock('O=CCCC=O', [stk.AldehydeFactory()])
polymer = stk.ConstructedMolecule(
topology_graph=stk.polymer.Linear(
building_blocks=(bb1, bb2),
repeating_unit='AB',
num_repeating_units=3,
optimizer=stk.Collapser(scale_steps=False),
),
)
Similarly, MCHammer
performs rigid translations of the
building blocks either toward the centroid of the
ConstructedMolecule
or along the bonds formed during
construction following a Metropolis Monte Carlo algorithm with
simplified potential energy terms for the long bonds and nonbonded
interactions.
bb1 = stk.BuildingBlock('NCCN', [stk.PrimaryAminoFactory()])
bb2 = stk.BuildingBlock(
smiles='O=CC(C=O)CC=O',
functional_groups=[stk.AldehydeFactory()],
)
cage = stk.ConstructedMolecule(
topology_graph=stk.cage.FourPlusSix(
building_blocks=(bb1, bb2),
optimizer=stk.MCHammer(),
),
)
See also
The Collapser
and MCHammer
optimizers use the
algorithms from https://github.com/andrewtarzia/MCHammer.
stk
returns the final molecule only but further
visualisation of the full trajectory and properties can be
performed using the MCHammer
code explicitly. This is useful
for determining optimal optimization parameters, for which safe
options are provided by default in stk
.
Using RDKit to Optimize Molecular Structures
Molecules used by stk
can be structure optimized both before and
after construction. One easy way to do is, is with the
rdkit
library. You can optimize any stk
Molecule
, such as a BuildingBlock
import stk
import rdkit.Chem.AllChem as rdkit
bb = stk.BuildingBlock('BrCCBr', [stk.BromoFactory()])
# Optimize with the MMFF force field.
rdkit_bb = bb.to_rdkit_mol()
rdkit.SanitizeMol(rdkit_bb)
rdkit.MMFFOptimizeMolecule(rdkit_bb)
# stk molecules are immutable. with_position_matrix returns a
# a clone, holding the new position matrix.
bb = bb.with_position_matrix(
position_matrix=rdkit_bb.GetConformer().GetPositions(),
)
or a ConstructedMolecule
polymer = stk.ConstructedMolecule(
topology_graph=stk.polymer.Linear((bb, ), 'A', 8),
)
# Optimize with the MMFF force field.
rdkit_polymer = polymer.to_rdkit_mol()
rdkit.SanitizeMol(rdkit_polymer)
rdkit.MMFFOptimizeMolecule(rdkit_polymer)
# stk molecules are immutable. with_position_matrix returns a
# a clone, holding the new position matrix.
polymer = polymer.with_position_matrix(
position_matrix=rdkit_polymer.GetConformer().GetPositions(),
)
Writing Molecular Files
The simplest way to save molecules is to write them to a file.
This works with any Molecule
, including both the
BuildingBlock
import stk
bb = stk.BuildingBlock(
smiles='ICCBr',
functional_groups=[stk.BromoFactory(), stk.IodoFactory()],
)
stk.MolWriter().write(bb, 'bb.mol')
and the ConstructedMolecule
polymer = stk.ConstructedMolecule(
topology_graph=stk.polymer.Linear((bb, ), 'A', 10),
)
stk.MolWriter().write(polymer, 'polymer.mol')
You can see what file formats are supported by reading the
documentation for write()
.
Placing and Retrieving Molecules From a Database
Requirements
stk
allows you to place molecules into a
MoleculeDatabase
. Out-of-the-box, stk
comes
with support for a MoleculeMongoDb
. In order to use it
locally, you have to install MongoDB on your computer.
Documentation for installing, and making sure your local MongoDB is working properly, can be found here. Trust me, this is easy to do and worth it.
You can also use a remote MongoDB, in which case you do not have to
install it locally, but you will still need to install
pymongo
.
Molecules and Building Blocks
To place molecules into the database, first create the database
import stk
import pymongo
# Connect to a MongoDB. This example connects to a local
# MongoDB, but you can connect to a remote DB too with
# MongoClient() - read the documentation for pymongo to see how
# to do that.
client = pymongo.MongoClient()
db = stk.MoleculeMongoDb(client)
You then create and place a molecule into the database,
for example, a BuildingBlock
bb = stk.BuildingBlock('BrCCBr', [stk.BromoFactory()])
# Note that as soon as put() is called, the molecule is placed
# into permanent storage.
db.put(bb)
Note that stk
databases do not have a staging area. The
moment you call put()
, the molecule is
committed to the database.
To retrieve a molecule from the database, by default, you would provide the InChIKey. To first thing you might want to do is write a function which turns the SMILES of a molecule into the InChIKey
import rdkit.Chem.AllChem as rdkit
def get_inchi_key(smiles):
return rdkit.MolToInchiKey(rdkit.MolFromSmiles(smiles))
Now we can load the molecule from the database, by providing the SMILES of the molecule
loaded = db.get({
'InChIKey': get_inchi_key('BrCCBr'),
})
However, this step can be customized. For example, the documentation of
MoleculeMongoDb
, shows how you can use SMILES to retrieve
your molecules, without needing to write a function like
get_inchi_key()
.
The loaded
molecule is only a Molecule
instance,
and not a BuildingBlock
instance, which means that it lacks
functional groups. You can restore your functional groups however
loaded_bb = stk.BuildingBlock.init_from_molecule(
molecule=loaded,
functional_groups=[stk.BromoFactory()],
)
Constructed Molecules
You can use the same database for placing
ConstructedMolecule
instances
polymer = stk.ConstructedMolecule(
topology_graph=stk.polymer.Linear((bb, ), 'A', 2),
)
db.put(polymer)
and restore them in the same way
loaded = db.get({
'InChIKey': get_inchi_key('BrCCCCBr'),
})
However, once again, loaded
will only be a Molecule
instance, and not a ConstructedMolecule
instance.
If you want to store and retrieve ConstructedMolecule
instances, you have to create a ConstructedMoleculeMongoDb
constructed_db = stk.ConstructedMoleculeMongoDb(client)
constructed_db.put(polymer)
loaded_polymer = constructed_db.get({
'InChIKey': get_inchi_key('BrCCCCBr'),
})
Unlike loaded
, loaded_polymer
is a
ConstructedMolecule
instance.
Placing and Retrieving Molecular Property Values From a Database
Requirements
Using a ValueMongoDb
has the same requirements as the
previous example.
Storing Values
Unlike the previous example, you can deposit values for both
a BuildingBlock
and a ConstructedMolecule
in the
same database. First, lets create one
import stk
import pymongo
# Connect to a MongoDB. This example connects to a local
# MongoDB, but you can connect to a remote DB too with
# MongoClient() - read the documentation for pymongo to see how
# to do that.
client = pymongo.MongoClient()
# You have to choose name for your collection.
energy_db = stk.ValueMongoDb(client, 'energy')
Here, energy_db
will store energy values. Lets create a function
to calculate the energy of a molecule.
import rdkit.Chem.AllChem as rdkit
def get_energy(molecule):
rdkit_molecule = molecule.to_rdkit_mol()
rdkit.SanitizeMol(rdkit_molecule)
ff = rdkit.UFFGetMoleculeForceField(rdkit_molecule)
return ff.CalcEnergy()
Now we can deposit the energy value into the database
bb = stk.BuildingBlock('BrCCCCBr')
# Note that as soon as put() is called, the value is placed into
# permanent storage.
energy_db.put(bb, get_energy(bb))
Note that stk
databases do not have a staging area. The
moment you call put()
, the value is
committed to the database.
To retrieve a value from the database, you provide the molecule, whose value you are interested in
energy = energy_db.get(bb)
If we make the same molecule in some other way, for example we
can make BrCCCCBr
as a constructed molecule
polymer = stk.ConstructedMolecule(
topology_graph=stk.polymer.Linear(
building_blocks=(
stk.BuildingBlock('BrCCBr', [stk.BromoFactory()]),
),
repeating_unit='A',
num_repeating_units=2,
),
)
we can still retrieve the value
# You get the correct energy out, because polymer and bb are
# actually the same molecule.
bb_energy = energy_db.get(polymer)
You can also use a ConstructedMolecule
to deposit values
into the database, for example
atom_count_db = stk.ValueMongoDb(client, 'atom_counts')
atom_count_db.put(polymer, polymer.get_num_atoms())
These values will also be accessible in a later session
# Assume this a new Python session.
import stk
import pymongo
client = pymongo.MongoClient()
energy_db = stk.ValueMongoDb(client, 'energy')
atom_count_db = stk.ValueMongoDb(client, 'atom_counts')
bb = stk.BuildingBlock('BrCCCCBr')
bb_energy = energy_db.get(bb)
bb_atom_count = atom_count_db.get(bb)
Finally, you can also store, and retrieve, a tuple
of values
from the database. For example,
centroid_db = stk.ValueMongoDb(client, 'centroids')
# Centroid is a position, and therefore a tuple of 3 floats.
centroid_db.put(bb, tuple(bb.get_centroid()))
# Retrieve the centroid.
centroid = centroid_db.get(bb)
Visualising Molecular Datasets using chemiscope
Requirements
chemiscope
makes it easy for you to write a .json
or .json.gz
containing stk
molecules and their properties; see an example.
To get chemiscope
, you can install it with pip:
$ pip install chemiscope
Writing to file
chemiscope
can be used through the web-interface or can be embedded
into read-the-docs pages through a sphinx package, like for toy models.
Either way, you need to write a .json
or .json.gz
file
import stk
import chemiscope
structures = [
stk.BuildingBlock(smiles="NCCN"),
stk.BuildingBlock(smiles="NCCCN"),
# A mostly optimised cage molecule.
stk.ConstructedMolecule(
topology_graph=stk.cage.FourPlusSix(
building_blocks=(
stk.BuildingBlock(
smiles="NCCN",
functional_groups=[stk.PrimaryAminoFactory()],
),
stk.BuildingBlock(
smiles="O=CC(C=O)C=O",
functional_groups=[stk.AldehydeFactory()],
),
),
optimizer=stk.MCHammer(),
),
)
]
# Write their properties to a dictionary.
properties = {
"num_atoms": [molecule.get_num_atoms() for molecule in structures],
"num_bonds": [
len(list(molecule.get_bonds())) for molecule in structures
],
}
# Define stk bonding.
shape_dict = chemiscope.convert_stk_bonds_as_shapes(
frames=structures,
bond_color="#fc5500",
bond_radius=0.12,
)
# Write the shape string for settings to turn them on automatically.
shape_string = ",".join(shape_dict.keys())
# Write to file.
chemiscope.write_input(
path="stk_example.json.gz",
frames=structures,
properties=properties,
meta=dict(name="A name."),
settings=chemiscope.quick_settings(
x="num_atoms",
y="num_bonds",
color="",
structure_settings={
"shape": shape_string,
"atoms": True,
"bonds": False,
"spaceFilling": False,
},
),
shapes=shape_dict,
)
See also
chemiscope: https://chemiscope.org/
Specifying Functional Groups Individually
If you want to be more precise about which functional groups get
created, you can provide them directly to the BuildingBlock
.
For example, if you have multiple bromo groups on a molecule, but
you only want to use one during construction
import stk
bb = stk.BuildingBlock(
smiles='BrCCCBr',
functional_groups=[
stk.Bromo(
# The number is the atom's id.
bromine=stk.Br(0),
atom=stk.C(1),
# bonders are atoms which have bonds added during
# construction.
bonders=(stk.C(1), ),
# deleters are atoms which are deleted during
# construction.
deleters=(stk.Br(0), ),
),
],
)
When creating a Bromo
functional group, you have to
specify things like which atoms have bonds added during construction,
and which ones are removed during construction. These are specified by
the bonders and deleters parameters, respectively. You can add
as many functional groups to BuildingBlock
as you like
in this way, and you can mix different types of
functional_group
. You can even mix
a functional_group
instances with
functional_group_factory
instances.
Changing Bonder, Placer and Deleter Atoms in Functional Group Factories
In the previous example, you saw that during creation of a
Bromo
instance, you can specify which atoms have bonds
added during construction, and which atoms are deleted during
construction. You might like to customize this in the functional groups
created by a functional_group_factory
.
Take, for example, a CarboxylicAcid
functional group. There
are two likely ways you would like to modify
this group, C(=O)O
, during construction. In the first way, you want
to add a bond to the carbon atom, and delete the OH
group, which is
treated as a leaving group. This is what
CarboxylicAcidFactory
will do by default
import stk
bb = stk.BuildingBlock(
smiles='O=C(O)CCC(=O)O',
functional_groups=[stk.CarboxylicAcidFactory()],
)
Here, bb
will have two CarboxylicAcid
functional groups.
In each, the deleter atoms will be the oxygen and hydrogen atom of
the OH
group, and the bonder atom will be the carbon atom.
Now, the second way you might want to modify a carobxylic acid group,
is to only delete the hydrogen atom of the OH
group during
construction, and add a bond to the oxygen atom of the
OH
group. This means the hydrogen atom is the deleter atom and
the oxygen atom is the bonder atom. You can tell the
CarboxylicAcidFactory
to create CarboxylicAcid
instances of this kind
bb2 = stk.BuildingBlock(
smiles='O=C(O)CCC(=O)O',
functional_groups=[
stk.CarboxylicAcidFactory(
# Atom number 3 corresponds to the OH oxygen atom in a
# carboxylic acid group. THIS IS NOT THE ATOM'S ID IN
# THE MOLECULE.
bonders=(3, ),
# Atom number 4 corresponds to the hydrogen atom in a
# carboxylic acid group. THIS IS NOT THE ATOM'S ID IN
# THE MOLECULE.
deleters=(4, ),
),
],
)
Here, bb2
will also have two CarboxylicAcid
functional
groups. In each, the deleter atom will be the hydrogen of the
OH
group and the bonder atom will be the oxygen atom of the
OH
group.
You might be wondering: “How do I know which number to use for
which atom in the functional group, so that I can specify the correct
atoms to be the bonders or deleters?” The docstring of
CarboxylicAcidFactory
will tell you which number corresponds
to which atom in the functional group. The same is true for any
other functional_group_factory
. Note that the number you
provide to the factory, is not the id of the atom found in the
molecule!!
Finally, each functional_group_factory
allows for the user to
provide the placer ids, which tell stk
which atoms in a functional
group should be used for aligning and placing the molecule during
construction. In most cases, this is not an important parameter and the
default behaviour (where placers and bonders are equivalent) is
appropropriate. However, in cases where your building block is small,
you may get NAN or division by zero errors due to the vectors in a
building block overlapping. If this occurs, we recommend using bonder
and deleter atoms as placers like so:
bb2 = stk.BuildingBlock(
smiles='C(Br)(Br)Br',
functional_groups=stk.BromoFactory(
# Defaults.
bonders=(0,),
deleters=(1,),
# Include both atoms.
placers=(0, 1)
),
)
Handling Molecules with Metal Atoms and Dative Bonds
All stk
Molecule
instances (such as
BuildingBlock
and ConstructedMolecule
) can contain
metal atoms and handle various coordination reactions.
In order to represent dative bonds in these systems, a bond order of
9 is used.
Furthermore, when working with metal-containing systems, any
BuildingBlock
initialization functions that require ETKDG
may fail, because the ETKDG algorithm is liable to fail in these cases.
In cases like this, you probably want to set the position matrix
explicitly, which will mean that ETKDG will not be used.
import stk
bb = stk.BuildingBlock('[Fe+2]', position_matrix=[[0., 0., 0.]])
If you want to get a more complex position matrix, defining a function may be a good idea
import rdkit.Chem.AllChem as rdkit
def get_position_matrix(smiles):
molecule = rdkit.AddHs(rdkit.MolFromSmiles(smiles))
rdkit.EmbedMolecule(molecule, randomSeed=12)
rdkit.UFFOptimizeMolecule(molecule)
return molecule.GetConformer().GetPositions()
smiles = 'CCCO->[Fe+2]'
bb2 = stk.BuildingBlock(
smiles=smiles,
position_matrix=get_position_matrix(smiles),
)
Finally, stk
will also read bonds from .mol
files,
which have a bond order of 9, as dative.
Making Keys for Molecules with Dative Bonds
Dative bonds are not defined in an InChI or InChiIKey.
Therefore, when storing metal-containing molecules in a
database, a different key is required. Because dative bonds are
implemented in SMILES, the SMILES string makes a
useful key for metal-containing molecules. You can use the
Smiles
key maker for this purpose
import stk
import pymongo
db = stk.MoleculeMongoDb(
mongo_client=pymongo.MongoClient(),
jsonizer=stk.MoleculeJsonizer(
key_makers=(stk.Smiles(), ),
),
)
bb = stk.BuildingBlock('BrO->[Fe+2]')
db.put(bb)
# Use the Smiles() key maker to get the retrieval SMILES,
# to make sure it has canonical atom ordering.
canonical_smiles = stk.Smiles().get_key(bb)
retrieved_bb = db.get({'SMILES': canonical_smiles})
Creating New Topology Graphs with Existing Vertices
The vertex classes that make up topology graphs in stk
can be
accessed to speed up the implemention of new and arbitrary topology
graphs (as shown below). The exact details of how vertices can be used
to implement new topology graphs depends on the topology graph, so read
that documentation for further examples. For metal complexes, you would
read the documentation of MetalComplex
.
import stk
import numpy as np
class NewMetalComplexTopology(stk.metal_complex.MetalComplex):
_metal_vertex_prototypes = (
stk.metal_complex.MetalVertex(0, (1., 0., 0.)),
)
_ligand_vertex_prototypes = (
stk.metal_complex.MonoDentateLigandVertex(1, (2., 0., 0.)),
stk.metal_complex.MonoDentateLigandVertex(2, (0., 0., 0.)),
)
# Define Edges below.
_edge_prototypes = (
stk.Edge(
id=0,
vertex1=_metal_vertex_prototypes[0],
vertex2=_ligand_vertex_prototypes[0],
),
stk.Edge(
id=1,
vertex1=_metal_vertex_prototypes[0],
vertex2=_ligand_vertex_prototypes[1],
),
)
# Build new metal complex.
ligand = stk.BuildingBlock('NCC', [stk.PrimaryAminoFactory()])
metal = stk.BuildingBlock(
smiles='[Fe+2]',
functional_groups=(
stk.SingleAtom(stk.Fe(0, charge=2))
for i in range(2)
),
position_matrix=np.array([[0., 0., 0.]]),
)
complex = stk.ConstructedMolecule(
topology_graph=NewMetalComplexTopology(
metals=metal,
ligands=ligand,
)
)
Extending stk
There are a lot of ways to extend stk
, for example by adding
new functional groups, topology graphs, mutation operations and so on.
However, because every part of stk
is built around an abstract
base class, all you need to do is find the appropriate abstract base
class, and create a new subclass for it. The abstract base class
will provide documentation and examples on how to create a subclass.
You can easily find the abstract base classes by looking at the
sidebar.