Find MCS in R

Finding the maximum common substructure (MCS) is useful for identifying a core scaffold.
I think MCS searches are commonly done with commercial tools (Pipeline Pilot?).
I often use RDKit. ;-)
Today I found an R library for MCS search, named fmcsR.
That sounds nice, because if fmcsR works well, I can integrate it into Spotfire using TERR.
So, let's try it.
Installation is very easy.
Type the following commands.

source("http://bioconductor.org/biocLite.R")
biocLite("fmcsR")

Tip: fmcsR depends on ChemmineR.
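
A note for readers on a recent R setup: the biocLite.R script has been retired on current Bioconductor releases in favour of the BiocManager package. A minimal sketch of the equivalent install, assuming a working internet connection and a Bioconductor version that still ships fmcsR:

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install fmcsR from Bioconductor (ChemmineR is pulled in as a dependency)
BiocManager::install("fmcsR")
```
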
Then write some test code.

library(fmcsR)
data("fmcstest")
test <- fmcs(fmcstest[1], fmcstest[2], au=2,bu=1)
plotMCS(test, regenerateCoords=TRUE)

au is the upper bound for the number of atom mismatches.
bu is the upper bound for the number of bond mismatches.

Then I got the following image.
[Image: Rplot, the MCS plot drawn by plotMCS]
It works fine.
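
To get a feel for what au and bu do, it may help to compare a strict search with a relaxed one on the same pair. The sketch below assumes that fmcs() accepts fast=TRUE and then returns only a named numeric vector of sizes and similarity scores instead of a full MCS object; please check ?fmcs if your version behaves differently.

```r
library(fmcsR)
data("fmcstest")

# Strict search: no atom or bond mismatches allowed
strict <- fmcs(fmcstest[1], fmcstest[2], au = 0, bu = 0, fast = TRUE)

# Relaxed search: up to 2 atom mismatches and 1 bond mismatch
relaxed <- fmcs(fmcstest[1], fmcstest[2], au = 2, bu = 1, fast = TRUE)

# Side-by-side view; MCS_Size should not shrink when mismatches are allowed
print(rbind(strict = strict, relaxed = relaxed))
```
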

This library can also run batch searches.
An example follows (sdf is an SDFset of compounds loaded beforehand).

> fmcsBatch(sdf[1], sdf[1:30], au=0, bu=0)
starting worker pid=2002 on localhost:11906 at 22:33:40.230
      Query_Size Target_Size MCS_Size Tanimoto_Coefficient Overlap_Coefficient
CMP1          33          33       33            1.0000000           1.0000000
CMP2          33          26       11            0.2291667           0.4230769
CMP3          33          26       10            0.2040816           0.3846154
CMP4          33          32        9            0.1607143           0.2812500
CMP5          33          23       14            0.3333333           0.6086957
CMP6          33          19       13            0.3333333           0.6842105
CMP7          33          21        9            0.2000000           0.4285714
CMP8          33          31        8            0.1428571           0.2580645
CMP9          33          21        9            0.2000000           0.4285714
CMP10         33          21        8            0.1739130           0.3809524
CMP11         33          36       15            0.2777778           0.4545455
CMP12         33          26       12            0.2553191           0.4615385
CMP13         33          26       11            0.2291667           0.4230769
CMP14         33          16       12            0.3243243           0.7500000
CMP15         33          34       15            0.2884615           0.4545455
CMP16         33          25        8            0.1600000           0.3200000
CMP17         33          19        8            0.1818182           0.4210526
CMP18         33          24       10            0.2127660           0.4166667
CMP19         33          25       14            0.3181818           0.5600000
CMP20         33          26       10            0.2040816           0.3846154
CMP21         33          25       15            0.3488372           0.6000000
CMP22         33          21       11            0.2558140           0.5238095
CMP23         33          26       11            0.2291667           0.4230769
CMP24         33          17        6            0.1363636           0.3529412
CMP25         33          27        9            0.1764706           0.3333333
CMP26         33          24       13            0.2954545           0.5416667
CMP27         33          26       11            0.2291667           0.4230769
CMP28         33          20       10            0.2325581           0.5000000
CMP29         33          20        8            0.1777778           0.4000000
CMP30         33          18        7            0.1590909           0.3888889
>

This method is useful for batch searches.
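
For reference, the two similarity columns in the batch output can be reproduced from the three size columns. The values in the table are consistent with Tanimoto = MCS / (Query + Target - MCS) and Overlap = MCS / min(Query, Target). Here is a small base-R sketch using a few rows copied from the output; the helper names tanimoto and overlap are mine, not part of fmcsR.

```r
# Sizes copied from four rows of the fmcsBatch output
sizes <- data.frame(
  Query_Size  = c(33, 33, 33, 33),
  Target_Size = c(33, 26, 36, 16),
  MCS_Size    = c(33, 11, 15, 12),
  row.names   = c("CMP1", "CMP2", "CMP11", "CMP14")
)

# Hypothetical helpers (not fmcsR functions): q = query size,
# t = target size, m = MCS size, all counted in atoms
tanimoto <- function(q, t, m) m / (q + t - m)
overlap  <- function(q, t, m) m / pmin(q, t)

sizes$Tanimoto <- with(sizes, tanimoto(Query_Size, Target_Size, MCS_Size))
sizes$Overlap  <- with(sizes, overlap(Query_Size, Target_Size, MCS_Size))

print(round(sizes, 7))
```

The overlap coefficient is the more forgiving of the two when the target is much smaller than the query (see CMP14), so it is handy for spotting small scaffolds contained in a larger molecule.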

Unfortunately, batch fmcs search is very slow in a 32-bit Windows 7 environment. ;-(

Published by iwatobipen

I'm a medicinal chemist at a mid-sized pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
