MMPS in rdkit

I like Matched Molecular Pair Analysis because it is easy to understand and intuitive.
Recently, the pair (P) concept has been extended to series (S): matched molecular series (MMPS). An Open Babel developer reported MMPS in an ACS journal. http://pubs.acs.org/doi/abs/10.1021/jm500022q
An application implementing MMPS, named Matsy, was also developed.
I saw Matsy at JCUP and was quite impressed! I wanted to use MMPS in my own project, but building such a system myself is difficult.
Fortunately, a useful presentation was given at the RDKit UGM 2014. 😉
https://github.com/rdkit/UGM_2014/blob/master/Presentations/OBoyle_MatchedSeries.pdf
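To make the pair-to-series extension concrete: a matched series keeps every R-group attached to the same scaffold together, while a matched pair relates only two of them. The sketch below (the function name series_to_pairs and the toy R-groups and IDs are hypothetical, for illustration only) expands one series into all the matched pairs it contains; a series of n members gives n(n-1)/2 pairs.

```python
from itertools import combinations


def series_to_pairs(scaffold, rgroups):
    """Expand one matched series into its matched molecular pairs.

    rgroups is a list of (rgroup_smiles, compound_id) tuples that all
    share `scaffold`. A series of n members yields n * (n - 1) / 2 pairs.
    """
    return [(scaffold, a, b) for a, b in combinations(sorted(rgroups), 2)]


# Toy series with made-up IDs, for illustration only.
toy = [("[*:1]F", "c1"), ("[*:1]Cl", "c2"), ("[*:1]Br", "c3")]
pairs = series_to_pairs("[*:1]c1ccccc1", toy)
for scaffold, a, b in pairs:
    print(scaffold, a, b)
```

Going the other way (pairs back to series) loses nothing, but the series view makes it easy to compare the ordering of several R-groups on one scaffold at once.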

O'Boyle provided a sample script for generating MMPS with RDKit!
Cool!

I reproduced the MMPS generator based on the PDF.
To use the script below, you first need to generate fragmented SMILES with RDKit's contrib/mmpa/rfrag.py.
Then run the script. I got the following output:

iwatobipen$ python mmps.py inputfile > output.txt
iwatobipen$ cat output.txt
# [*:1]C1COc2ccccc2O1
[*:1]C(=O)Nc1ccc(C(=O)O)cc1 2881039
[*:1]C(=O)Nc1ccc(C(N)=O)cc1 2787356
# [*:1]CNc1ncnc2sccc21
[*:1]c1ccccc1 2139597
[*:1]c1cccnc1 2531831
# [*:1]Cn1nc(C)cc1C
[*:1]c1ccc(C(=O)O)cc1 615212
[*:1]c1ccc(C(=O)O)o1 658387
# [*:1]NC(=O)C1COc2ccccc2O1
[*:1]c1ccc(C(=O)O)cc1 2881039
[*:1]c1ccc(C(N)=O)cc1 2787356
# [*:1]Nc1ncnc2sccc21
[*:1]Cc1ccccc1 2139597
[*:1]Cc1cccnc1 2531831
# [*:1]S(=O)(=O)N1CCCCC1
[*:1]c1ccc(-c2cn3cccc(C)c3n2)cc1 1156028
[*:1]c1cccc(-c2cn3cccc(C)c3n2)c1 2963575
# [*:1]c1ccc(C(=O)O)cc1
[*:1]Cn1nc(C)cc1C 615212
[*:1]NC(=O)C1COc2ccccc2O1 2881039
# [*:1]c1ncnc2sccc21
[*:1]NCc1ccccc1 2139597
[*:1]NCc1cccnc1 2531831
# [*:1]n1nc(C)cc1C
[*:1]Cc1ccc(C(=O)O)cc1 615212
[*:1]Cc1ccc(C(=O)O)o1 658387
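If contrib/mmpa/rfrag.py is not at hand, single-cut input for mmps.py can be approximated directly with RDKit. This is a minimal sketch, not rfrag.py itself: it assumes the comma-separated layout that getFrags() parses (ID in the second field, an empty third field for single cuts, the two dot-separated pieces in the last field), and, as a simplification, it cuts every acyclic single bond, whereas rfrag.py applies its own, more selective rules. The function name single_cut_lines is hypothetical.

```python
from rdkit import Chem


def single_cut_lines(smiles, mol_id):
    """Yield 'SMILES,ID,,pieceA.pieceB' lines, one per cut bond.

    Simplifying assumption: every acyclic single bond is cut. rfrag.py
    uses more selective bond rules, so its output will differ.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.SINGLE or bond.IsInRing():
            continue
        # Break this one bond and cap both ends with a dummy atom.
        cut = Chem.FragmentOnBonds(mol, [bond.GetIdx()], addDummies=True)
        pieces = []
        for frag in Chem.GetMolFrags(cut, asMols=True):
            for atom in frag.GetAtoms():
                if atom.GetAtomicNum() == 0:
                    atom.SetIsotope(0)     # drop the isotope label FragmentOnBonds adds
                    atom.SetAtomMapNum(1)  # write the attachment point as [*:1]
            pieces.append(Chem.MolToSmiles(frag))
        # Empty third field = no core, i.e. a single cut.
        yield "%s,%s,,%s" % (smiles, mol_id, ".".join(pieces))


for line in single_cut_lines("NCc1ccccc1", "example"):
    print(line)
```

For benzylamine this yields two lines, one for the N-C bond and one for the C-phenyl bond; writing such lines for a whole compound set to a file gives something getFrags() can read.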

Next, I want to build an MMPS database maker. I think such a database would be useful because MMPS can capture SAR trends.
I uploaded the code to my repo:
https://github.com/iwatobipen/mmps/tree/master/mmps
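As a first step toward such a database, the text written by mmps.py can be loaded into SQLite. This is a minimal sketch, assuming the output format shown above ("# scaffold" header lines followed by "rgroup id" lines); the function names parse_series and build_db, and the table name series with its columns, are hypothetical.

```python
import sqlite3


def parse_series(text):
    """Turn mmps.py output into (scaffold, rgroup, compound_id) rows.

    A line starting with '# ' opens a new series; the following
    'rgroup id' lines belong to that scaffold.
    """
    rows = []
    scaffold = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            scaffold = line[2:]
        elif scaffold is not None:
            rgroup, compound_id = line.split()
            rows.append((scaffold, rgroup, compound_id))
    return rows


def build_db(rows, path=":memory:"):
    """Store the rows in a (hypothetical) SQLite table named 'series'."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS series "
        "(scaffold TEXT, rgroup TEXT, compound_id TEXT)"
    )
    conn.executemany("INSERT INTO series VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


# Two series taken from the output above; compound 2139597 is in both.
sample = """# [*:1]CNc1ncnc2sccc21
[*:1]c1ccccc1 2139597
[*:1]c1cccnc1 2531831
# [*:1]c1ncnc2sccc21
[*:1]NCc1ccccc1 2139597
[*:1]NCc1cccnc1 2531831
"""
conn = build_db(parse_series(sample))
# Which scaffolds (series) does one compound take part in?
scaffolds = [r[0] for r in conn.execute(
    "SELECT scaffold FROM series WHERE compound_id = ? ORDER BY scaffold",
    ("2139597",),
)]
print(scaffolds)
```

Storing one row per (scaffold, R-group, compound) keeps the schema flat; a series is then just all rows sharing a scaffold.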

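# mmps.py
# Build matched molecular series (MMPS) from the fragment file written by
# rdkit contrib/mmpa/rfrag.py.
# Usage: python mmps.py fragmented_file > output.txt
import sys
from collections import namedtuple

from rdkit import Chem

# One single-cut fragmentation of one compound.
Frag = namedtuple( 'Frag', [ 'id', 'scaffold', 'rgroup' ] )

class Series( object ):
    """A scaffold and the (rgroup, id) tuples of all compounds that share it."""
    def __init__( self ):
        self.rgroups = []
        self.scaffold = ""

def getFrags( filename ):
    """Read the rfrag.py output and return single-cut Frags sorted by scaffold."""
    frags = []
    with open( filename ) as fh:
        for line in fh:
            line = line.rstrip()
            if not line:
                continue
            # Fields: SMILES,ID,core,context. The core is empty for a single cut,
            # so lines from double or triple cuts (non-empty core) are skipped.
            broken = line.split( "," )
            if broken[ 2 ]:
                continue
            # The context holds the two pieces of the cut, separated by ".".
            smiles = broken[ -1 ].split( "." )

            mols = [ Chem.MolFromSmiles( smi ) for smi in smiles ]
            if any( mol is None for mol in mols ):  # skip unparsable pieces
                continue
            # Atom counts include the [*:1] attachment point.
            numAtoms = [ mol.GetNumAtoms() for mol in mols ]

            # Either piece can be the scaffold: it needs more than 5 atoms,
            # and the other piece (the R-group) fewer than 12.
            if numAtoms[ 0 ] > 5 and numAtoms[ 1 ] < 12:
                frags.append( Frag( broken[1], smiles[0], smiles[1] ) )
            if numAtoms[ 1 ] > 5 and numAtoms[ 0 ] < 12:
                frags.append( Frag( broken[1], smiles[1], smiles[0] ) )
    # Sorting brings all fragments with the same scaffold together.
    frags.sort( key=lambda x: ( x.scaffold, x.rgroup ) )
    return frags

def getSeries( frags ):
    """Yield a Series for every scaffold shared by at least two entries."""
    oldfrag = Frag( None, None, None )
    series = Series()
    for frag in frags:
        if frag.scaffold != oldfrag.scaffold:
            # Scaffold changed: emit the finished series if it is big enough.
            if len( series.rgroups ) >= 2:
                series.scaffold = oldfrag.scaffold
                yield series
            series = Series()
        series.rgroups.append( ( frag.rgroup, frag.id ) )
        oldfrag = frag
    # Emit the last series.
    if len( series.rgroups ) >= 2:
        series.scaffold = oldfrag.scaffold
        yield series

if __name__ == "__main__":
    filename = sys.argv[1]

    frags = getFrags( filename )
    for series in getSeries( frags ):
        print( "# %s" % series.scaffold )
        for rgroup, molid in sorted( series.rgroups ):
            print( "%s %s" % ( rgroup, molid ) )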
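getSeries() walks the sorted fragment list and starts a new Series whenever the scaffold changes. The same grouping can be written with itertools.groupby; the sketch below (the name get_series and its min_size parameter are hypothetical) yields plain (scaffold, rgroups) tuples instead of Series objects.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

Frag = namedtuple("Frag", ["id", "scaffold", "rgroup"])


def get_series(frags, min_size=2):
    """Group fragments by scaffold and keep groups with >= min_size members.

    Yields (scaffold, [(rgroup, id), ...]) tuples.
    """
    # groupby only merges adjacent items, so sort by scaffold first.
    frags = sorted(frags, key=attrgetter("scaffold", "rgroup"))
    for scaffold, group in groupby(frags, key=attrgetter("scaffold")):
        rgroups = [(f.rgroup, f.id) for f in group]
        if len(rgroups) >= min_size:
            yield scaffold, rgroups


frags = [
    Frag("2139597", "[*:1]c1ncnc2sccc21", "[*:1]NCc1ccccc1"),
    Frag("2531831", "[*:1]c1ncnc2sccc21", "[*:1]NCc1cccnc1"),
    Frag("615212", "[*:1]n1nc(C)cc1C", "[*:1]Cc1ccc(C(=O)O)cc1"),
]
for scaffold, rgroups in get_series(frags):
    print("# %s" % scaffold)
    for rgroup, molid in rgroups:
        print("%s %s" % (rgroup, molid))
```

Here the third fragment has a scaffold of its own, so it forms a series of one and is dropped, just as getSeries() drops it.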


