I like Molecular Matched Pair Analysis because it is easy to understand and intuitive.
Recently, the P (pair) has been extended to S (series): matched molecular series (MMPS). A developer of Open Babel reported MMPS in an ACS journal. http://pubs.acs.org/doi/abs/10.1021/jm500022q
They also developed an application named Matsy that implements MMPS.
I saw Matsy at JCUP and it impressed me a lot! So I want to build MMPS into my project, but building the system myself is difficult.
Fortunately, a useful presentation from the RDKit UGM 2014 is available. ;-)
OBoyle_MatchedSeries.pdf
O'Boyle provided a sample script for generating MMPS with RDKit!
Cool!
I reproduced the MMPS generator by following the PDF.
First, before using the script below, I need to generate fragmented SMILES with rdkit contrib/mmpa/rfrag.py.
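To make the input format concrete, here is a minimal sketch of how the fragment file can be read. It assumes each line has four comma-separated fields (SMILES, ID, core, chains) with an empty core for single cuts, which is inferred from how mmps.py splits each line. The sample lines and the function names `parse_rfrag_line` and `single_cuts` are hypothetical and written by hand for illustration; they are not real rfrag.py output.

```python
# Hand-written, hypothetical lines in the assumed format: SMILES,ID,core,chains
# The third field (core) is empty when only one bond was cut.
SAMPLE_LINES = [
    "c1ccccc1CNc1ncnc2sccc21,2139597,,[*:1]c1ccccc1.[*:1]CNc1ncnc2sccc21",
    "CCOc1ccccc1CC,demo-2,[*:1]c1ccccc1[*:2],[*:1]OCC.[*:2]CC",
]


def parse_rfrag_line(line):
    """Split one line into (mol_id, core, list_of_chain_smiles)."""
    fields = line.rstrip().split(",")
    mol_id, core = fields[1], fields[2]
    chains = fields[-1].split(".")
    return mol_id, core, chains


def single_cuts(lines):
    """Yield (mol_id, chains) for single-cut lines only."""
    for line in lines:
        mol_id, core, chains = parse_rfrag_line(line)
        if core:  # a non-empty core means more than one bond was cut
            continue
        yield mol_id, chains


result = list(single_cuts(SAMPLE_LINES))
for mol_id, chains in result:
    print(mol_id, chains)
```

Only the first sample line survives the filter, because its core field is empty. Each surviving line gives two fragments, and either one can play the role of scaffold or R group.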
Then I ran the script and got the following output.
iwatobipen$ python mmps.py inputfile > output.txt
iwatobipen$ cat output.txt
# [*:1]C1COc2ccccc2O1
[*:1]C(=O)Nc1ccc(C(=O)O)cc1 2881039
[*:1]C(=O)Nc1ccc(C(N)=O)cc1 2787356
# [*:1]CNc1ncnc2sccc21
[*:1]c1ccccc1 2139597
[*:1]c1cccnc1 2531831
# [*:1]Cn1nc(C)cc1C
[*:1]c1ccc(C(=O)O)cc1 615212
[*:1]c1ccc(C(=O)O)o1 658387
# [*:1]NC(=O)C1COc2ccccc2O1
[*:1]c1ccc(C(=O)O)cc1 2881039
[*:1]c1ccc(C(N)=O)cc1 2787356
# [*:1]Nc1ncnc2sccc21
[*:1]Cc1ccccc1 2139597
[*:1]Cc1cccnc1 2531831
# [*:1]S(=O)(=O)N1CCCCC1
[*:1]c1ccc(-c2cn3cccc(C)c3n2)cc1 1156028
[*:1]c1cccc(-c2cn3cccc(C)c3n2)c1 2963575
# [*:1]c1ccc(C(=O)O)cc1
[*:1]Cn1nc(C)cc1C 615212
[*:1]NC(=O)C1COc2ccccc2O1 2881039
# [*:1]c1ncnc2sccc21
[*:1]NCc1ccccc1 2139597
[*:1]NCc1cccnc1 2531831
# [*:1]n1nc(C)cc1C
[*:1]Cc1ccc(C(=O)O)cc1 615212
[*:1]Cc1ccc(C(=O)O)o1 658387
I want to make an MMPS database builder. I think such a database would be useful because MMPS can capture SAR trends.
I uploaded the code to my repo.
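As a rough idea of what such a database could look like, here is a minimal sketch that loads the output format shown above into SQLite. The one-table schema and the names `parse_series` and `build_db` are my assumptions for illustration only; this is not the code in the repository.

```python
import sqlite3

# A few lines copied from the output above.
OUTPUT_TEXT = """\
# [*:1]CNc1ncnc2sccc21
[*:1]c1ccccc1 2139597
[*:1]c1cccnc1 2531831
# [*:1]Cn1nc(C)cc1C
[*:1]c1ccc(C(=O)O)cc1 615212
[*:1]c1ccc(C(=O)O)o1 658387
"""


def parse_series(text):
    """Yield (scaffold, rgroup, mol_id) rows from the mmps.py output format."""
    scaffold = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            scaffold = line[2:]  # a '# ' line starts a new series
        elif scaffold is not None:
            rgroup, mol_id = line.split()
            yield scaffold, rgroup, mol_id


def build_db(text, path=":memory:"):
    """Create a one-table SQLite database (hypothetical schema) and fill it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS series "
        "(scaffold TEXT, rgroup TEXT, mol_id TEXT)"
    )
    conn.executemany("INSERT INTO series VALUES (?, ?, ?)", parse_series(text))
    conn.commit()
    return conn


conn = build_db(OUTPUT_TEXT)
rows = conn.execute(
    "SELECT rgroup, mol_id FROM series WHERE scaffold = ? ORDER BY rgroup",
    ("[*:1]CNc1ncnc2sccc21",),
).fetchall()
for rgroup, mol_id in rows:
    print(rgroup, mol_id)
```

With a table like this, all R groups that share one scaffold can be pulled out with a single query, and activity data could later be joined on the molecule ID.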
https://github.com/iwatobipen/mmps/tree/master/mmps
# mmps.py
from rdkit import Chem
import sys
from collections import namedtuple

Frag = namedtuple( 'Frag', [ 'id', 'scaffold', 'rgroup' ] )

class Series():
    def __init__( self ):
        self.rgroups = []
        self.scaffold = ""

def getFrags( filename ):
    frags = []
    for line in open( filename ):
        broken = line.rstrip().split( "," )
        if broken[ 2 ]: # single cut
            continue
        smiles = broken[ -1 ].split( "." )
        mols = [ Chem.MolFromSmiles( smi ) for smi in smiles ]
        numAtoms = [ mol.GetNumAtoms() for mol in mols ]
        if numAtoms[ 0 ] > 5 and numAtoms[ 1 ] < 12:
            frags.append( Frag( broken[1], smiles[0], smiles[1] ) )
        if numAtoms[ 1 ] > 5 and numAtoms[ 0 ] < 12:
            frags.append( Frag( broken[1], smiles[1], smiles[0] ) )
    frags.sort( key=lambda x:( x.scaffold, x.rgroup ) )
    return frags

def getSeries( frags ):
    oldfrag = Frag( None, None, None )
    series = Series()
    for frag in frags:
        if frag.scaffold != oldfrag.scaffold:
            if len( series.rgroups ) >= 2:
                series.scaffold = oldfrag.scaffold
                yield series
            series = Series()
        series.rgroups.append( ( frag.rgroup, frag.id ) )
        oldfrag = frag
    if len( series.rgroups ) >= 2:
        series.scaffold = oldfrag.scaffold
        yield series

if __name__ == "__main__":
    filename = sys.argv[1]
    frags = getFrags( filename )
    it = getSeries( frags )
    for series in it:
        print( "# %s" % series.scaffold )
        for rgroup in sorted( series.rgroups ):
            print( "%s %s" % ( rgroup[0], rgroup[1] ) )
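The grouping step in getSeries can be tried without RDKit. The sketch below is not the script above: it swaps the manual loop for itertools.groupby, which should give the same grouping on sorted input. The fragments are taken from the sample output earlier in this post, except for one hypothetical entry ("demo-1") added to show that a scaffold with a single R group is dropped. The name `group_series` is my own.

```python
from collections import namedtuple
from itertools import groupby

Frag = namedtuple("Frag", ["id", "scaffold", "rgroup"])

# Fragments from the sample output, in shuffled order,
# plus one hypothetical fragment whose scaffold has no partner.
FRAGS = [
    Frag("2531831", "[*:1]CNc1ncnc2sccc21", "[*:1]c1cccnc1"),
    Frag("615212", "[*:1]Cn1nc(C)cc1C", "[*:1]c1ccc(C(=O)O)cc1"),
    Frag("2139597", "[*:1]CNc1ncnc2sccc21", "[*:1]c1ccccc1"),
    Frag("658387", "[*:1]Cn1nc(C)cc1C", "[*:1]c1ccc(C(=O)O)o1"),
    Frag("demo-1", "[*:1]CCCCCC", "[*:1]O"),  # hypothetical singleton
]


def group_series(frags, min_size=2):
    """Yield (scaffold, [(rgroup, id), ...]) for scaffolds with >= min_size R groups."""
    # groupby only merges adjacent items, so sort by scaffold first
    ordered = sorted(frags, key=lambda f: (f.scaffold, f.rgroup))
    for scaffold, group in groupby(ordered, key=lambda f: f.scaffold):
        rgroups = [(f.rgroup, f.id) for f in group]
        if len(rgroups) >= min_size:
            yield scaffold, rgroups


series = list(group_series(FRAGS))
for scaffold, rgroups in series:
    print("# %s" % scaffold)
    for rgroup, mol_id in rgroups:
        print("%s %s" % (rgroup, mol_id))
```

The sort before grouping matters here for the same reason getFrags sorts its list: both approaches only compare neighbouring fragments, so equal scaffolds have to sit next to each other.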