Finding the maximum common substructure (MCS) is useful for identifying a core scaffold.
I think it is common to find the MCS with commercially available tools (Pipeline Pilot?).
I often use RDKit. ;-)
Today I found an R library for MCS search, named fmcsR.
That sounds nice, because if fmcsR works well, I can integrate it into Spotfire using TERR.
So, let’s try it.
Installation is very easy.
Type the following commands.
source("http://bioconductor.org/biocLite.R")
biocLite("fmcsR")
TIP: fmcsR depends on ChemmineR.
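A note for readers on a recent R: Bioconductor has since retired the biocLite script in favor of the BiocManager package. Here is a minimal sketch of the equivalent install, assuming R 3.5 or later and network access. The requireNamespace guard is my own addition, not something fmcsR requires.

```r
# Install BiocManager from CRAN if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install fmcsR from Bioconductor; ChemmineR comes along as a dependency.
BiocManager::install("fmcsR")
```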
Then write some test code.
library(fmcsR)
data("fmcstest")
test <- fmcs(fmcstest[1], fmcstest[2], au=2, bu=1)
plotMCS(test, regenerateCoords=TRUE)
au is the upper bound for the number of atom mismatches.
bu is the upper bound for the number of bond mismatches.
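To see what these bounds do, here is a small sketch that runs the same pair twice: once with no mismatches allowed and once with the relaxed settings used above. It assumes fmcsR is installed. It also uses the fast=TRUE option, which, as far as I can tell from the package documentation, skips the structure output and returns only the sizes and similarity coefficients.

```r
library(fmcsR)
data("fmcstest")

# Strict search: no atom or bond mismatches allowed.
strict <- fmcs(fmcstest[1], fmcstest[2], au = 0, bu = 0, fast = TRUE)

# Relaxed search: up to 2 atom mismatches and 1 bond mismatch.
relaxed <- fmcs(fmcstest[1], fmcstest[2], au = 2, bu = 1, fast = TRUE)

# Each result is a named numeric vector; print them side by side.
print(rbind(strict, relaxed))
```

Comparing the MCS_Size column of the two rows shows how much the mismatch tolerance changes the result for this pair.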
Then I got the following image.
Works fine.
This library can also run batch searches.
An example follows.
> fmcsBatch(sdf[1], sdf[1:30], au=0, bu=0)
starting worker pid=2002 on localhost:11906 at 22:33:40.230
      Query_Size Target_Size MCS_Size Tanimoto_Coefficient Overlap_Coefficient
CMP1          33          33       33            1.0000000           1.0000000
CMP2          33          26       11            0.2291667           0.4230769
CMP3          33          26       10            0.2040816           0.3846154
CMP4          33          32        9            0.1607143           0.2812500
CMP5          33          23       14            0.3333333           0.6086957
CMP6          33          19       13            0.3333333           0.6842105
CMP7          33          21        9            0.2000000           0.4285714
CMP8          33          31        8            0.1428571           0.2580645
CMP9          33          21        9            0.2000000           0.4285714
CMP10         33          21        8            0.1739130           0.3809524
CMP11         33          36       15            0.2777778           0.4545455
CMP12         33          26       12            0.2553191           0.4615385
CMP13         33          26       11            0.2291667           0.4230769
CMP14         33          16       12            0.3243243           0.7500000
CMP15         33          34       15            0.2884615           0.4545455
CMP16         33          25        8            0.1600000           0.3200000
CMP17         33          19        8            0.1818182           0.4210526
CMP18         33          24       10            0.2127660           0.4166667
CMP19         33          25       14            0.3181818           0.5600000
CMP20         33          26       10            0.2040816           0.3846154
CMP21         33          25       15            0.3488372           0.6000000
CMP22         33          21       11            0.2558140           0.5238095
CMP23         33          26       11            0.2291667           0.4230769
CMP24         33          17        6            0.1363636           0.3529412
CMP25         33          27        9            0.1764706           0.3333333
CMP26         33          24       13            0.2954545           0.5416667
CMP27         33          26       11            0.2291667           0.4230769
CMP28         33          20       10            0.2325581           0.5000000
CMP29         33          20        8            0.1777778           0.4000000
CMP30         33          18        7            0.1590909           0.3888889
This method is useful for batch search.
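The two similarity columns can be reproduced from the three size columns. Checking the rows of the output, the Tanimoto coefficient matches MCS / (query + target - MCS) and the overlap coefficient matches MCS / min(query, target). Here is a small base R sketch of that arithmetic, applied to a few rows copied from the output. The helper names tanimoto and overlap are mine and are not part of fmcsR.

```r
# Tanimoto coefficient: shared size divided by the combined size of both molecules.
tanimoto <- function(query_size, target_size, mcs_size) {
  mcs_size / (query_size + target_size - mcs_size)
}

# Overlap coefficient: shared size divided by the size of the smaller molecule.
overlap <- function(query_size, target_size, mcs_size) {
  mcs_size / pmin(query_size, target_size)
}

# Sizes copied from rows CMP2, CMP11 and CMP14 of the batch output.
sizes <- data.frame(
  row.names   = c("CMP2", "CMP11", "CMP14"),
  Query_Size  = c(33, 33, 33),
  Target_Size = c(26, 36, 16),
  MCS_Size    = c(11, 15, 12)
)

sizes$Tanimoto_Coefficient <- with(sizes, tanimoto(Query_Size, Target_Size, MCS_Size))
sizes$Overlap_Coefficient  <- with(sizes, overlap(Query_Size, Target_Size, MCS_Size))

# Rank the targets by Tanimoto coefficient, most similar first.
ranked <- sizes[order(sizes$Tanimoto_Coefficient, decreasing = TRUE), ]
print(ranked)
```

Because the overlap coefficient divides by the smaller molecule, it stays high when a small target is mostly contained in the query. CMP14 is a good example: its overlap coefficient is 0.75 even though its Tanimoto coefficient is only about 0.32.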
Unfortunately, batch fmcs search is very slow in a Win7 32-bit environment. ;-(