FindMCS is a useful function for me, because sometimes I want to extract the common substructure from a set of compounds.
But with a large set of compounds it gives boring results such as an ethyl group. That is no wonder, since very little is shared by every molecule.
The FindMCS function of RDKit has a nice solution for that. With the “threshold” option I can define the minimum fraction of molecules that must contain the MCS, so the substructure no longer has to be present in all of them.
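To make the threshold concrete, here is a minimal sketch of the arithmetic. It assumes the fraction is converted to a molecule count by rounding up; `required_matches` is a hypothetical helper written for illustration, not an RDKit function.

```python
import math


def required_matches(n_mols, threshold):
    """Hypothetical helper (not part of RDKit).

    Returns how many of n_mols molecules must contain the MCS,
    assuming the threshold fraction is rounded up to a whole count.
    """
    return math.ceil(n_mols * threshold)


# Four molecules with threshold=0.5: the MCS only needs to be in two of them.
print(required_matches(4, 0.5))
# threshold=1.0 means the MCS must be in every molecule.
print(required_matches(4, 1.0))
```

So with the four molecules used in this post and threshold=0.5, a substructure shared by any two of them is already a valid answer.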
I found a tip: the result of FindMCS with this option depends on the order of the molecules.
See the following code.
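from rdkit import Chem
# rdkit.Chem.MCS is the older pure-Python implementation
# (later deprecated in favor of rdkit.Chem.rdFMCS)
from rdkit.Chem import MCS
# Draw was missing from the imports but is needed for MolsToGridImage
from rdkit.Chem import Draw
# Enables inline rendering of molecules in a Jupyter notebook
from rdkit.Chem.Draw import IPythonConsole

# Two alkylbenzenes and two oxygen-substituted benzenes
mol1 = Chem.MolFromSmiles("Cc1ccccc1")
mol2 = Chem.MolFromSmiles("CCc1ccccc1")
mol3 = Chem.MolFromSmiles("Oc1ccccc1")
mol4 = Chem.MolFromSmiles("COc1ccccc1")
Draw.MolsToGridImage([mol1, mol2, mol3, mol4])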
OK, now get the MCS.
# Same molecules and same threshold, only the input order differs
res = MCS.FindMCS([mol1, mol2, mol3, mol4], threshold=0.5)
res2 = MCS.FindMCS([mol4, mol3, mol2, mol1], threshold=0.5)
Chem.MolFromSmarts(res.smarts)
Chem.MolFromSmarts(res2.smarts)
A different order of molecules gave a different result. I will keep that in mind!
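A possible reason for the order dependence (an assumption, not confirmed from the FindMCS source) is that with threshold=0.5 there is a tie: a methyl-benzene core and a hydroxy-benzene core are the same size, and each is contained in exactly two of the four molecules. The sketch below only checks that tie with plain substructure matching. The two candidate cores are picked by hand for illustration and are not the output of FindMCS.

```python
from rdkit import Chem

smiles = ["Cc1ccccc1", "CCc1ccccc1", "Oc1ccccc1", "COc1ccccc1"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

# Hand-picked candidate cores (illustrative, not FindMCS results)
candidates = {
    "C-benzene": "Cc1ccccc1",
    "O-benzene": "Oc1ccccc1",
}

counts = {}
sizes = {}
for name, core_smiles in candidates.items():
    core = Chem.MolFromSmiles(core_smiles)
    # Number of input molecules that contain this core as a substructure
    counts[name] = sum(m.HasSubstructMatch(core) for m in mols)
    # Size of the core as (atoms, bonds)
    sizes[name] = (core.GetNumAtoms(), core.GetNumBonds())

for name in candidates:
    print(name, "matches", counts[name], "of", len(mols), "size", sizes[name])
```

Both cores have 7 atoms and 7 bonds and both match 2 of the 4 molecules, so neither is a better answer than the other. If the search settles on whichever equally good candidate it reaches first, the input order would decide the result.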