Basic Examples

Note

The molecules displayed are interactive renderings.

Creating Building Blocks

There are a number of ways to create BuildingBlock molecules, but the two most common ways are direct creation through SMILES strings and loading them from molecule structure files.

import stk

bb1 = stk.BuildingBlock('NCCCN')
bb2 = stk.BuildingBlock.init_from_file('path/to/file.mol')

Look at the documentation of BuildingBlock to see other available initialization methods. They are easy to find, as they all begin with init.

Specifying Multiple Functional Groups

When you create a BuildingBlock, you also need to specify which atoms are modified during construction of a ConstructedMolecule. This is achieved by providing the BuildingBlock with functional_group instances. To save you the pain of creating function groups one by one, you can use a functional_group_factory. If you have a building block with bromo groups, and you want the bromo groups to be modified during construction, you would use a BromoFactory

import stk

bb = stk.BuildingBlock('BrCCCBr', [stk.BromoFactory()])

The bb, in the example above, would have two Bromo functional groups. Similarly, if you have a building block with aldehyde groups

bb2 = stk.BuildingBlock('O=CCCC=O', [stk.AldehydeFactory()])

In this example, bb2 will have two Aldehyde functional groups. Finally, if you have both aldehyde and bromo groups on a molecule, and you want both to be modified during construction, you would use both of the factories

bb3 = stk.BuildingBlock(
    smiles='O=CCCBr',
    functional_groups=[stk.AldehydeFactory(), stk.BromoFactory()],
)

In the example above, bb3 has one Bromo and one Aldehyde functional group.

Constructing Molecules

To construct molecules, you need to create a new ConstructedMolecule. The required input consists of a TopologyGraph, which, in turn, requires BuildingBlock instances.

import stk

# React the amine functional groups during construction.
bb1 = stk.BuildingBlock('NCCN', [stk.PrimaryAminoFactory()])
# React the aldehyde functional groups during construction.
bb2 = stk.BuildingBlock('O=CCCC=O', [stk.AldehydeFactory()])
# Build a polymer.
polymer = stk.ConstructedMolecule(
    topology_graph=stk.polymer.Linear(
        building_blocks=(bb1, bb2),
        repeating_unit='AB',
        num_repeating_units=4,
        optimizer=stk.Collapser(scale_steps=False),
    ),
)
# Build a longer polymer.
longer = stk.ConstructedMolecule(
    topology_graph=stk.polymer.Linear(
        building_blocks=(bb1, bb2),
        repeating_unit='AB',
        num_repeating_units=8,
        optimizer=stk.Collapser(scale_steps=False),
    ),
)

Each topology graph requires different input parameters. For example, organic cage topology graphs only require the BuildingBlock instances.

# The cage requires a building block with 3 functional groups.
cage_bb2 = stk.BuildingBlock(
    smiles='O=CC(C=O)CC=O',
    functional_groups=[stk.AldehydeFactory()],
)
cage = stk.ConstructedMolecule(
    topology_graph=stk.cage.FourPlusSix((bb1, cage_bb2)),
)

Note that this structure is far from ideal, the next example shows how to make one with shorter bond lengths!

Read the documentation for each kind of TopologyGraph, for more examples on how to initialize it, and to see what optional parameters you have available.

Using Built-in Optimizers During Construction

All TopologyGraph instances take an optimizer argument, which provides efficient optimization of stk structures from their expanded form. No optimization will be performed with the NullOptimizer.

Collapser performs rigid translations of the building blocks toward the centroid of the ConstructedMolecule until steric clashes occur.

import stk

bb1 = stk.BuildingBlock('NCCN', [stk.PrimaryAminoFactory()])
bb2 = stk.BuildingBlock('O=CCCC=O', [stk.AldehydeFactory()])
polymer = stk.ConstructedMolecule(
    topology_graph=stk.polymer.Linear(
        building_blocks=(bb1, bb2),
        repeating_unit='AB',
        num_repeating_units=3,
        optimizer=stk.Collapser(scale_steps=False),
    ),
)

Similarly, MCHammer performs rigid translations of the building blocks either toward the centroid of the ConstructedMolecule or along the bonds formed during construction following a Metropolis Monte Carlo algorithm with simplified potential energy terms for the long bonds and nonbonded interactions.

bb1 = stk.BuildingBlock('NCCN', [stk.PrimaryAminoFactory()])
bb2 = stk.BuildingBlock(
    smiles='O=CC(C=O)CC=O',
    functional_groups=[stk.AldehydeFactory()],
)
cage = stk.ConstructedMolecule(
    topology_graph=stk.cage.FourPlusSix(
        building_blocks=(bb1, bb2),
        optimizer=stk.MCHammer(),
    ),
)

See also

The Collapser and MCHammer optimizers use the algorithms from https://github.com/andrewtarzia/MCHammer. stk returns the final molecule only but further visualisation of the full trajectory and properties can be performed using the MCHammer code explicitly. This is useful for determining optimal optimization parameters, for which safe options are provided by default in stk.

Using RDKit to Optimize Molecular Structures

Molecules used by stk can be structure optimized both before and after construction. One easy way to do is, is with the rdkit library. You can optimize any stk Molecule, such as a BuildingBlock

import stk
import rdkit.Chem.AllChem as rdkit

bb = stk.BuildingBlock('BrCCBr', [stk.BromoFactory()])

# Optimize with the MMFF force field.

rdkit_bb = bb.to_rdkit_mol()
rdkit.SanitizeMol(rdkit_bb)
rdkit.MMFFOptimizeMolecule(rdkit_bb)

# stk molecules are immutable. with_position_matrix returns a
# a clone, holding the new position matrix.
bb = bb.with_position_matrix(
    position_matrix=rdkit_bb.GetConformer().GetPositions(),
)

or a ConstructedMolecule

polymer = stk.ConstructedMolecule(
    topology_graph=stk.polymer.Linear((bb, ), 'A', 8),
)

# Optimize with the MMFF force field.

rdkit_polymer = polymer.to_rdkit_mol()
rdkit.SanitizeMol(rdkit_polymer)
rdkit.MMFFOptimizeMolecule(rdkit_polymer)

# stk molecules are immutable. with_position_matrix returns a
# a clone, holding the new position matrix.
polymer = polymer.with_position_matrix(
    position_matrix=rdkit_polymer.GetConformer().GetPositions(),
)

Writing Molecular Files

The simplest way to save molecules is to write them to a file. This works with any Molecule, including both the BuildingBlock

import stk

bb = stk.BuildingBlock(
    smiles='ICCBr',
    functional_groups=[stk.BromoFactory(), stk.IodoFactory()],
)
stk.MolWriter().write(bb, 'bb.mol')

and the ConstructedMolecule

polymer = stk.ConstructedMolecule(
    topology_graph=stk.polymer.Linear((bb, ), 'A', 10),
)
stk.MolWriter().write(polymer, 'polymer.mol')

You can see what file formats are supported by reading the documentation for write().

Placing and Retrieving Molecules From a Database

Requirements

stk allows you to place molecules into a MoleculeDatabase. Out-of-the-box, stk comes with support for a MoleculeMongoDb. In order to use it locally, you have to install MongoDB on your computer.

Documentation for installing, and making sure your local MongoDB is working properly, can be found here. Trust me, this is easy to do and worth it.

You can also use a remote MongoDB, in which case you do not have to install it locally, but you will still need to install pymongo.

Molecules and Building Blocks

To place molecules into the database, first create the database

import stk
import pymongo

# Connect to a MongoDB. This example connects to a local
# MongoDB, but you can connect to a remote DB too with
# MongoClient() - read the documentation for pymongo to see how
# to do that.
client = pymongo.MongoClient()
db = stk.MoleculeMongoDb(client)

You then create and place a molecule into the database, for example, a BuildingBlock

bb = stk.BuildingBlock('BrCCBr', [stk.BromoFactory()])
# Note that as soon as put() is called, the molecule is placed
# into permanent storage.
db.put(bb)

Note that stk databases do not have a staging area. The moment you call put(), the molecule is committed to the database.

To retrieve a molecule from the database, by default, you would provide the InChIKey. To first thing you might want to do is write a function which turns the SMILES of a molecule into the InChIKey

import rdkit.Chem.AllChem as rdkit

def get_inchi_key(smiles):
    return rdkit.MolToInchiKey(rdkit.MolFromSmiles(smiles))

Now we can load the molecule from the database, by providing the SMILES of the molecule

loaded = db.get({
    'InChIKey': get_inchi_key('BrCCBr'),
})

However, this step can be customized. For example, the documentation of MoleculeMongoDb, shows how you can use SMILES to retrieve your molecules, without needing to write a function like get_inchi_key().

The loaded molecule is only a Molecule instance, and not a BuildingBlock instance, which means that it lacks functional groups. You can restore your functional groups however

loaded_bb = stk.BuildingBlock.init_from_molecule(
    molecule=loaded,
    functional_groups=[stk.BromoFactory()],
)

Constructed Molecules

You can use the same database for placing ConstructedMolecule instances

polymer = stk.ConstructedMolecule(
    topology_graph=stk.polymer.Linear((bb, ), 'A', 2),
)
db.put(polymer)

and restore them in the same way

loaded = db.get({
    'InChIKey': get_inchi_key('BrCCCCBr'),
})

However, once again, loaded will only be a Molecule instance, and not a ConstructedMolecule instance.

If you want to store and retrieve ConstructedMolecule instances, you have to create a ConstructedMoleculeMongoDb

constructed_db = stk.ConstructedMoleculeMongoDb(client)
constructed_db.put(polymer)
loaded_polymer = constructed_db.get({
    'InChIKey': get_inchi_key('BrCCCCBr'),
})

Unlike loaded, loaded_polymer is a ConstructedMolecule instance.

Placing and Retrieving Molecular Property Values From a Database

Requirements

Using a ValueMongoDb has the same requirements as the previous example.

Storing Values

Unlike the previous example, you can deposit values for both a BuildingBlock and a ConstructedMolecule in the same database. First, lets create one

import stk
import pymongo

# Connect to a MongoDB. This example connects to a local
# MongoDB, but you can connect to a remote DB too with
# MongoClient() - read the documentation for pymongo to see how
# to do that.
client = pymongo.MongoClient()

# You have to choose name for your collection.
energy_db = stk.ValueMongoDb(client, 'energy')

Here, energy_db will store energy values. Lets create a function to calculate the energy of a molecule.

import rdkit.Chem.AllChem as rdkit

def get_energy(molecule):
    rdkit_molecule = molecule.to_rdkit_mol()
    rdkit.SanitizeMol(rdkit_molecule)
    ff = rdkit.UFFGetMoleculeForceField(rdkit_molecule)
    return ff.CalcEnergy()

Now we can deposit the energy value into the database

bb = stk.BuildingBlock('BrCCCCBr')
# Note that as soon as put() is called, the value is placed into
# permanent storage.
energy_db.put(bb, get_energy(bb))

Note that stk databases do not have a staging area. The moment you call put(), the value is committed to the database.

To retrieve a value from the database, you provide the molecule, whose value you are interested in

energy = energy_db.get(bb)

If we make the same molecule in some other way, for example we can make BrCCCCBr as a constructed molecule

polymer = stk.ConstructedMolecule(
    topology_graph=stk.polymer.Linear(
        building_blocks=(
            stk.BuildingBlock('BrCCBr', [stk.BromoFactory()]),
        ),
        repeating_unit='A',
        num_repeating_units=2,
    ),
)

we can still retrieve the value

# You get the correct energy out, because polymer and bb are
# actually the same molecule.
bb_energy = energy_db.get(polymer)

You can also use a ConstructedMolecule to deposit values into the database, for example

atom_count_db = stk.ValueMongoDb(client, 'atom_counts')
atom_count_db.put(polymer, polymer.get_num_atoms())

These values will also be accessible in a later session

# Assume this a new Python session.
import stk
import pymongo


client = pymongo.MongoClient()
energy_db = stk.ValueMongoDb(client, 'energy')
atom_count_db = stk.ValueMongoDb(client, 'atom_counts')

bb = stk.BuildingBlock('BrCCCCBr')
bb_energy = energy_db.get(bb)
bb_atom_count = atom_count_db.get(bb)

Finally, you can also store, and retrieve, a tuple of values from the database. For example,

centroid_db = stk.ValueMongoDb(client, 'centroids')
# Centroid is a position, and therefore a tuple of 3 floats.
centroid_db.put(bb, tuple(bb.get_centroid()))

# Retrieve the centroid.
centroid = centroid_db.get(bb)

Visualising Molecular Datasets using chemiscope

Requirements

chemiscope makes it easy for you to write a .json or .json.gz containing stk molecules and their properties; see an example.

To get chemiscope, you can install it with pip:

$ pip install chemiscope

Writing to file

chemiscope can be used through the web-interface or can be embedded into read-the-docs pages through a sphinx package, like for toy models.

Either way, you need to write a .json or .json.gz file

import stk
import chemiscope

structures = [
    stk.BuildingBlock(smiles="NCCN"),
    stk.BuildingBlock(smiles="NCCCN"),
    # A mostly optimised cage molecule.
    stk.ConstructedMolecule(
        topology_graph=stk.cage.FourPlusSix(
            building_blocks=(
                stk.BuildingBlock(
                    smiles="NCCN",
                    functional_groups=[stk.PrimaryAminoFactory()],
                ),
                stk.BuildingBlock(
                    smiles="O=CC(C=O)C=O",
                    functional_groups=[stk.AldehydeFactory()],
                ),
            ),
            optimizer=stk.MCHammer(),
        ),
    )
]

# Write their properties to a dictionary.
properties = {
    "num_atoms": [molecule.get_num_atoms() for molecule in structures],
    "num_bonds": [
        len(list(molecule.get_bonds())) for molecule in structures
    ],
}

# Define stk bonding.
shape_dict = chemiscope.convert_stk_bonds_as_shapes(
    frames=structures,
    bond_color="#fc5500",
    bond_radius=0.12,
)

# Write the shape string for settings to turn them on automatically.
shape_string = ",".join(shape_dict.keys())

# Write to file.
chemiscope.write_input(
    path="stk_example.json.gz",
    frames=structures,
    properties=properties,
    meta=dict(name="A name."),
    settings=chemiscope.quick_settings(
        x="num_atoms",
        y="num_bonds",
        color="",
        structure_settings={
            "shape": shape_string,
            "atoms": True,
            "bonds": False,
            "spaceFilling": False,
        },
    ),
    shapes=shape_dict,
)

See also

chemiscope: https://chemiscope.org/

Specifying Functional Groups Individually

If you want to be more precise about which functional groups get created, you can provide them directly to the BuildingBlock. For example, if you have multiple bromo groups on a molecule, but you only want to use one during construction

import stk

bb = stk.BuildingBlock(
    smiles='BrCCCBr',
    functional_groups=[
        stk.Bromo(
            # The number is the atom's id.
            bromine=stk.Br(0),
            atom=stk.C(1),
            # bonders are atoms which have bonds added during
            # construction.
            bonders=(stk.C(1), ),
            # deleters are atoms which are deleted during
            # construction.
            deleters=(stk.Br(0), ),
        ),
    ],
)

When creating a Bromo functional group, you have to specify things like which atoms have bonds added during construction, and which ones are removed during construction. These are specified by the bonders and deleters parameters, respectively. You can add as many functional groups to BuildingBlock as you like in this way, and you can mix different types of functional_group. You can even mix a functional_group instances with functional_group_factory instances.

Changing Bonder, Placer and Deleter Atoms in Functional Group Factories

In the previous example, you saw that during creation of a Bromo instance, you can specify which atoms have bonds added during construction, and which atoms are deleted during construction. You might like to customize this in the functional groups created by a functional_group_factory.

Take, for example, a CarboxylicAcid functional group. There are two likely ways you would like to modify this group, C(=O)O, during construction. In the first way, you want to add a bond to the carbon atom, and delete the OH group, which is treated as a leaving group. This is what CarboxylicAcidFactory will do by default

import stk

bb = stk.BuildingBlock(
    smiles='O=C(O)CCC(=O)O',
    functional_groups=[stk.CarboxylicAcidFactory()],
)

Here, bb will have two CarboxylicAcid functional groups. In each, the deleter atoms will be the oxygen and hydrogen atom of the OH group, and the bonder atom will be the carbon atom.

Now, the second way you might want to modify a carobxylic acid group, is to only delete the hydrogen atom of the OH group during construction, and add a bond to the oxygen atom of the OH group. This means the hydrogen atom is the deleter atom and the oxygen atom is the bonder atom. You can tell the CarboxylicAcidFactory to create CarboxylicAcid instances of this kind

bb2 = stk.BuildingBlock(
    smiles='O=C(O)CCC(=O)O',
    functional_groups=[
        stk.CarboxylicAcidFactory(
            # Atom number 3 corresponds to the OH oxygen atom in a
            # carboxylic acid group. THIS IS NOT THE ATOM'S ID IN
            # THE MOLECULE.
            bonders=(3, ),
            # Atom number 4 corresponds to the hydrogen atom in a
            # carboxylic acid group. THIS IS NOT THE ATOM'S ID IN
            # THE MOLECULE.
            deleters=(4, ),
        ),
    ],
)

Here, bb2 will also have two CarboxylicAcid functional groups. In each, the deleter atom will be the hydrogen of the OH group and the bonder atom will be the oxygen atom of the OH group.

You might be wondering: “How do I know which number to use for which atom in the functional group, so that I can specify the correct atoms to be the bonders or deleters?” The docstring of CarboxylicAcidFactory will tell you which number corresponds to which atom in the functional group. The same is true for any other functional_group_factory. Note that the number you provide to the factory, is not the id of the atom found in the molecule!!

Finally, each functional_group_factory allows for the user to provide the placer ids, which tell stk which atoms in a functional group should be used for aligning and placing the molecule during construction. In most cases, this is not an important parameter and the default behaviour (where placers and bonders are equivalent) is appropropriate. However, in cases where your building block is small, you may get NAN or division by zero errors due to the vectors in a building block overlapping. If this occurs, we recommend using bonder and deleter atoms as placers like so:

bb2 = stk.BuildingBlock(
    smiles='C(Br)(Br)Br',
    functional_groups=stk.BromoFactory(
        # Defaults.
        bonders=(0,),
        deleters=(1,),
        # Include both atoms.
        placers=(0, 1)
    ),
)

Handling Molecules with Metal Atoms and Dative Bonds

All stk Molecule instances (such as BuildingBlock and ConstructedMolecule) can contain metal atoms and handle various coordination reactions. In order to represent dative bonds in these systems, a bond order of 9 is used.

Furthermore, when working with metal-containing systems, any BuildingBlock initialization functions that require ETKDG may fail, because the ETKDG algorithm is liable to fail in these cases. In cases like this, you probably want to set the position matrix explicitly, which will mean that ETKDG will not be used.

import stk

bb = stk.BuildingBlock('[Fe+2]', position_matrix=[[0., 0., 0.]])

If you want to get a more complex position matrix, defining a function may be a good idea

import rdkit.Chem.AllChem as rdkit


def get_position_matrix(smiles):
    molecule = rdkit.AddHs(rdkit.MolFromSmiles(smiles))
    rdkit.EmbedMolecule(molecule, randomSeed=12)
    rdkit.UFFOptimizeMolecule(molecule)
    return molecule.GetConformer().GetPositions()


smiles = 'CCCO->[Fe+2]'
bb2 = stk.BuildingBlock(
    smiles=smiles,
    position_matrix=get_position_matrix(smiles),
)

Finally, stk will also read bonds from .mol files, which have a bond order of 9, as dative.

Making Keys for Molecules with Dative Bonds

Dative bonds are not defined in an InChI or InChiIKey. Therefore, when storing metal-containing molecules in a database, a different key is required. Because dative bonds are implemented in SMILES, the SMILES string makes a useful key for metal-containing molecules. You can use the Smiles key maker for this purpose

import stk
import pymongo

db = stk.MoleculeMongoDb(
    mongo_client=pymongo.MongoClient(),
    jsonizer=stk.MoleculeJsonizer(
        key_makers=(stk.Smiles(), ),
    ),
)
bb = stk.BuildingBlock('BrO->[Fe+2]')
db.put(bb)
# Use the Smiles() key maker to get the retrieval SMILES,
# to make sure it has canonical atom ordering.
canonical_smiles = stk.Smiles().get_key(bb)
retrieved_bb = db.get({'SMILES': canonical_smiles})

Creating New Topology Graphs with Existing Vertices

The vertex classes that make up topology graphs in stk can be accessed to speed up the implemention of new and arbitrary topology graphs (as shown below). The exact details of how vertices can be used to implement new topology graphs depends on the topology graph, so read that documentation for further examples. For metal complexes, you would read the documentation of MetalComplex.

import stk
import numpy as np

class NewMetalComplexTopology(stk.metal_complex.MetalComplex):

    _metal_vertex_prototypes = (
        stk.metal_complex.MetalVertex(0, (1., 0., 0.)),
    )
    _ligand_vertex_prototypes = (
        stk.metal_complex.MonoDentateLigandVertex(1, (2., 0., 0.)),
        stk.metal_complex.MonoDentateLigandVertex(2, (0., 0., 0.)),
    )

    # Define Edges below.
    _edge_prototypes = (
        stk.Edge(
            id=0,
            vertex1=_metal_vertex_prototypes[0],
            vertex2=_ligand_vertex_prototypes[0],
        ),
        stk.Edge(
            id=1,
            vertex1=_metal_vertex_prototypes[0],
            vertex2=_ligand_vertex_prototypes[1],
        ),
    )

# Build new metal complex.
ligand = stk.BuildingBlock('NCC', [stk.PrimaryAminoFactory()])
metal = stk.BuildingBlock(
    smiles='[Fe+2]',
    functional_groups=(
        stk.SingleAtom(stk.Fe(0, charge=2))
        for i in range(2)
    ),
    position_matrix=np.array([[0., 0., 0.]]),
)
complex = stk.ConstructedMolecule(
    topology_graph=NewMetalComplexTopology(
        metals=metal,
        ligands=ligand,
    )
)

Extending stk

There are a lot of ways to extend stk, for example by adding new functional groups, topology graphs, mutation operations and so on. However, because every part of stk is built around an abstract base class, all you need to do is find the appropriate abstract base class, and create a new subclass for it. The abstract base class will provide documentation and examples on how to create a subclass. You can easily find the abstract base classes by looking at the sidebar.