stk.ConstructedMoleculeMongoDb
- class stk.ConstructedMoleculeMongoDb(mongo_client, database='stk', molecule_collection='molecules', constructed_molecule_collection='constructed_molecules', position_matrix_collection='position_matrices', building_block_position_matrix_collection='building_block_position_matrices', jsonizer=ConstructedMoleculeJsonizer((InchiKey(),)), dejsonizer=ConstructedMoleculeDejsonizer(), put_lru_cache_size=128, get_lru_cache_size=128, indices=('InChIKey',))
Bases: ConstructedMoleculeDatabase
Uses MongoDB to store and retrieve constructed molecules.
See also
MoleculeMongoDb
If you need to store and retrieve molecules which are not ConstructedMolecule instances, use a MoleculeMongoDb.
Examples
Storing and Retrieving Constructed Molecules
You want to store and retrieve a ConstructedMolecule from the database:

```python
import stk
import pymongo

# Connect to a MongoDB. This example connects to a local
# MongoDB, but you can connect to a remote DB too with
# MongoClient() - read the documentation for pymongo to see how
# to do that.
client = pymongo.MongoClient()
db = stk.ConstructedMoleculeMongoDb(client)

# Create a molecule.
polymer = stk.ConstructedMolecule(
    topology_graph=stk.polymer.Linear(
        building_blocks=(
            stk.BuildingBlock('BrCCBr', [stk.BromoFactory()]),
        ),
        repeating_unit='A',
        num_repeating_units=2,
    ),
)

# Place it into the database.
db.put(polymer)

# Retrieve it from the database.
key_maker = stk.InchiKey()
retrieved = db.get({
    key_maker.get_key_name(): key_maker.get_key(polymer),
})
```
Note that the molecule retrieved from the database can have a different atom ordering than the one put into it: the structure is the same, but the atom ids may be assigned in a different order. This is because the database gives molecules a canonical atom ordering, which allows position matrices to be used across different atom id orderings.
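A toy sketch in plain Python of why a canonical ordering lets one position matrix serve molecules built with different atom id orderings. It is not stk's implementation: `reorder_positions` is a hypothetical helper, molecules are reduced to lists of coordinates, and the canonical ids are assumed to be the output of some canonicalization routine rather than computed here.

```python
Position = tuple[float, float, float]


def reorder_positions(
    positions: list[Position],
    canonical_ids: list[int],
) -> list[Position]:
    """Return the rows of a position matrix in canonical atom order.

    Hypothetical helper: canonical_ids[i] is the canonical id assigned
    to the atom whose original id is i, so row i of ``positions`` is
    moved to row canonical_ids[i] of the result.
    """
    if sorted(canonical_ids) != list(range(len(positions))):
        raise ValueError('canonical_ids must be a permutation of the atom ids.')
    reordered: list[Position] = [(0.0, 0.0, 0.0)] * len(positions)
    for original_id, canonical_id in enumerate(canonical_ids):
        reordered[canonical_id] = positions[original_id]
    return reordered


# The same Br-C-C fragment built twice with different atom id orderings.
# Copy A lists the atoms as (Br, C, C); copy B lists them as (C, C, Br).
positions_a = [(0.0, 0.0, 0.0), (1.9, 0.0, 0.0), (3.4, 0.0, 0.0)]
positions_b = [(1.9, 0.0, 0.0), (3.4, 0.0, 0.0), (0.0, 0.0, 0.0)]

# Assumed output of a canonicalization routine: Br -> 0, the C bonded
# to Br -> 1, the terminal C -> 2, whatever the original ordering.
canonical_a = [0, 1, 2]
canonical_b = [1, 2, 0]

# In canonical order both copies share one position matrix, so a single
# stored matrix serves either original ordering.
print(
    reorder_positions(positions_a, canonical_a)
    == reorder_positions(positions_b, canonical_b)
)  # prints True
```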
Iterating over All Entries in the Database
All entries in a database can be iterated over very simply:

```python
for entry in db.get_all():
    # Do something to the entry.
    print(stk.Smiles().get_key(entry))
```
Using Alternative Keys for Retrieving Molecules
By default, the only molecular key the database stores is the InChIKey. However, additional keys can be added to the JSON stored in the database by using a different ConstructedMoleculeJsonizer:

```python
import stk
import pymongo

db = stk.ConstructedMoleculeMongoDb(
    mongo_client=pymongo.MongoClient(),
    # Store the InChI and the InChIKey of molecules in
    # the JSON representation.
    jsonizer=stk.ConstructedMoleculeJsonizer(
        key_makers=(stk.Inchi(), stk.InchiKey()),
    ),
)

# Create a molecule.
polymer = stk.ConstructedMolecule(
    topology_graph=stk.polymer.Linear(
        building_blocks=(
            stk.BuildingBlock('BrCCBr', [stk.BromoFactory()]),
        ),
        repeating_unit='A',
        num_repeating_units=2,
    ),
)

# Place the JSON of the molecule into the database. In this
# case, the JSON includes both the InChI and the InChIKey.
db.put(polymer)

# You can now use the InChI or the InChIKey to retrieve the
# molecule from the database.
key_maker = stk.Inchi()
retrieved = db.get({
    key_maker.get_key_name(): key_maker.get_key(polymer),
})
```
Most of the time, you will not already have the molecule you are trying to retrieve from the database; perhaps you only have its SMILES. You can still retrieve it:

```python
import rdkit.Chem.AllChem as rdkit

# 'BrCCCCBr' is the SMILES of the polymer created above.
retrieved2 = db.get({
    'InChI': rdkit.MolToInchi(rdkit.MolFromSmiles('BrCCCCBr')),
})
```
As long as you have the name of the key, and the expected value of the key, you can retrieve your molecule from the database.
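The lookup rule in the paragraph above can be sketched with a toy, in-memory store. Everything here is hypothetical: molecules are plain strings, `ToyMoleculeStore` is not part of stk, and the two key functions are crude stand-ins for real key makers such as `stk.Inchi()` or `stk.InchiKey()`. What it shows is that `put` records one field per key maker, and `get` needs only a key name and the expected value, never the molecule itself.

```python
from typing import Callable

# Hypothetical stand-in for a key maker: a key name plus a function that
# computes the key's value from a molecule. Molecules are plain strings here.
KeyMaker = tuple[str, Callable[[str], str]]


class ToyMoleculeStore:
    """Hypothetical in-memory store, not the stk implementation."""

    def __init__(self, key_makers: list[KeyMaker]) -> None:
        self._key_makers = key_makers
        self._entries: list[dict[str, str]] = []

    def put(self, molecule: str) -> None:
        # Like a jsonizer with several key makers, store one field per key.
        entry = {name: get_key(molecule) for name, get_key in self._key_makers}
        entry['molecule'] = molecule
        self._entries.append(entry)

    def get(self, key: dict[str, str]) -> str:
        # The query must be a single {key name: expected value} pair;
        # unpacking raises ValueError if it has more or fewer items.
        [(name, value)] = key.items()
        for entry in self._entries:
            if entry.get(name) == value:
                return entry['molecule']
        raise KeyError(f'No molecule with {name} == {value!r}.')


store = ToyMoleculeStore(
    key_makers=[
        ('Name', lambda molecule: molecule),
        ('Uppercase', lambda molecule: molecule.upper()),
    ],
)
store.put('BrCCCCBr')
store.put('BrCCBr')

# Either stored key retrieves a molecule.
print(store.get({'Name': 'BrCCBr'}))  # prints BrCCBr
print(store.get({'Uppercase': 'BRCCCCBR'}))  # prints BrCCCCBr
```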
Note that you can create your own keys and add them to the database:

```python
# Create your own key. This one is called "SMILES" and the
# value is the SMILES of the molecule.
class Smiles(stk.MoleculeKeyMaker):
    def __init__(self):
        return

    def get_key_name(self):
        return 'SMILES'

    def get_key(self, molecule):
        return rdkit.MolToSmiles(molecule.to_rdkit_mol())


db = stk.ConstructedMoleculeMongoDb(
    mongo_client=pymongo.MongoClient(),
    jsonizer=stk.ConstructedMoleculeJsonizer(
        # Include your own custom key maker in the JSON
        # representation.
        key_makers=(stk.Inchi(), stk.InchiKey(), Smiles()),
    ),
)

# Place the JSON of your molecule into the database. In this
# case the JSON will include a key called "SMILES" and
# the value will be the SMILES of the molecule.
db.put(polymer)


# You can now find your molecule by using SMILES as the key.
def normalize_smiles(smiles):
    return rdkit.MolToSmiles(
        mol=rdkit.AddHs(rdkit.MolFromSmiles(smiles)),
    )


retrieved3 = db.get({'SMILES': normalize_smiles('BrCCCCBr')})
```
Often, it is unnecessary to create a whole subclass for your custom key:

```python
smiles = stk.MoleculeKeyMaker(
    key_name='SMILES',
    get_key=lambda molecule: rdkit.MolToSmiles(molecule.to_rdkit_mol()),
)

db = stk.ConstructedMoleculeMongoDb(
    mongo_client=pymongo.MongoClient(),
    jsonizer=stk.ConstructedMoleculeJsonizer(
        key_makers=(stk.InchiKey(), smiles),
    ),
)
```
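The two styles above, a subclass versus a key maker built from a name and a function, produce objects exposing the same two methods. A hypothetical plain-Python sketch of that shared interface follows; `KeyMakerLike`, `UppercaseKey`, `FunctionKeyMaker` and `to_json` are illustrative names, not stk classes, molecules are plain strings, and it is an assumption that a jsonizer needs nothing beyond `get_key_name()` and `get_key()` from each key maker.

```python
from typing import Callable, Protocol


class KeyMakerLike(Protocol):
    """The two methods the examples above call on a key maker."""

    def get_key_name(self) -> str: ...

    def get_key(self, molecule: str) -> str: ...


class UppercaseKey:
    """Subclass style: the key name and the key logic are written as methods."""

    def get_key_name(self) -> str:
        return 'UPPERCASE'

    def get_key(self, molecule: str) -> str:
        return molecule.upper()


class FunctionKeyMaker:
    """Constructor style: the key name and a key function are passed in."""

    def __init__(self, key_name: str, get_key: Callable[[str], str]) -> None:
        self._key_name = key_name
        self._get_key = get_key

    def get_key_name(self) -> str:
        return self._key_name

    def get_key(self, molecule: str) -> str:
        return self._get_key(molecule)


def to_json(molecule: str, key_makers: list[KeyMakerLike]) -> dict[str, str]:
    # One JSON field per key maker, named by get_key_name().
    return {
        key_maker.get_key_name(): key_maker.get_key(molecule)
        for key_maker in key_makers
    }


key_makers: list[KeyMakerLike] = [
    UppercaseKey(),
    FunctionKeyMaker('LENGTH', lambda molecule: str(len(molecule))),
]
print(to_json('BrCCCCBr', key_makers))
# prints {'UPPERCASE': 'BRCCCCBR', 'LENGTH': '8'}
```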
Note that the key you use to get a molecule back from the database should be unique: there should only ever be one molecule in the database which has that key. Using a key that is matched by multiple molecules will likely jumble your data. For example, you might get back the atoms of one of the molecules matched by the key combined with the position matrix of another molecule matched by the same key.
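The failure mode described above can be reproduced with a toy store that, like the database, keeps molecular data and position matrices in separate collections and looks each one up independently by key. This is a hypothetical sketch, not stk's storage code: the collections are Python lists, `find_one` simply returns the first match, and the two collections happen to hold their documents in different orders. With a non-unique key, which matching document a real query returns should not be relied upon either.

```python
def find_one(collection: list[dict], key_name: str, value: str) -> dict:
    """Return the first document whose key_name field equals value."""
    for document in collection:
        if document[key_name] == value:
            return document
    raise KeyError(value)


# Hypothetical collections keyed by molecular formula. The key is NOT
# unique: two isomers share the formula C2H4Br2.
molecule_collection = [
    {'Formula': 'C2H4Br2', 'molecule': '1,2-dibromoethane'},
    {'Formula': 'C2H4Br2', 'molecule': '1,1-dibromoethane'},
]
position_matrix_collection = [
    {'Formula': 'C2H4Br2', 'position_matrix': 'coordinates of 1,1-dibromoethane'},
    {'Formula': 'C2H4Br2', 'position_matrix': 'coordinates of 1,2-dibromoethane'},
]

# Each collection is queried separately, so the two halves of the
# "retrieved" molecule come from different isomers.
atoms = find_one(molecule_collection, 'Formula', 'C2H4Br2')
positions = find_one(position_matrix_collection, 'Formula', 'C2H4Br2')
print(atoms['molecule'])             # prints 1,2-dibromoethane
print(positions['position_matrix'])  # prints coordinates of 1,1-dibromoethane
```

A key that distinguishes the isomers, such as the InChIKey, would make both lookups return documents describing the same molecule.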
Initialize a ConstructedMoleculeMongoDb.
- Parameters:
mongo_client (pymongo.MongoClient) – The database client.
database (str) – The name of the database to use.
molecule_collection (str) – The name of the collection which stores molecular information.
constructed_molecule_collection (str) – The name of the collection which stores constructed molecule information that does not belong in the molecule_collection.
position_matrix_collection (str) – The name of the collection which stores the position matrices of the molecules put into and retrieved from the database.
building_block_position_matrix_collection (str) – The name of the collection which stores the position matrices of the building blocks of the constructed molecules put into and retrieved from the database.
jsonizer (ConstructedMoleculeJsonizer) – Used to create the JSON representations of molecules stored in the database.
dejsonizer (ConstructedMoleculeDejsonizer) – Used to create ConstructedMolecule instances from their JSON representations.
put_lru_cache_size (int, optional) – A RAM-based least recently used (LRU) cache is used to avoid writing to the database repeatedly. This sets the number of values which fit into the LRU cache. If None, the cache size will be unlimited.
get_lru_cache_size (int, optional) – A RAM-based least recently used (LRU) cache is used to avoid reading from the database repeatedly. This sets the number of values which fit into the LRU cache. If None, the cache size will be unlimited.
indices (tuple of str, optional) – The names of molecule keys on which an index should be created, in order to minimize lookup time.
Methods
- get(key) – Get the molecule with key from the database.
- get_all() – Get all entries in the database.
- put(molecule) – Put molecule into the database.
- get(key)
Get the molecule with key from the database.
- get_all()
Get all entries in the database.
- Yields:
ConstructedMolecule – A molecule in the database.
- put(molecule)
Put molecule into the database.
- Parameters:
molecule (ConstructedMolecule) – The molecule to place into the database.
- Returns:
None
- Return type:
NoneType