Implementation of machine learning in Spotfire.

Today I wrote data functions that predict a molecular property using e1071.
The following code is almost pure R, but it gets its data from Spotfire.
So users don't need to think about R coding. They can build a model and predict data using only Spotfire.

First, I got sample data from the Bursi Mutagenicity Dataset (link).
I converted the SDF file to SMILES using RDKit, because I calculate the fingerprints from SMILES with rcdk, called from Spotfire.
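To make the fingerprint step more concrete, here is a minimal base R sketch of how a list of "on" bit positions can become the 0/1 table that a model is trained on. The bit positions, the toy length of 8, and the helper name bits_to_matrix are all made up for illustration; in the real data function rcdk produces the fingerprints.

```r
# Hypothetical on-bit positions for three molecules (made-up values, not real fingerprints)
on_bits <- list(c(1, 5, 8), c(2, 5), integer(0))
n_bits <- 8  # toy length; the data function below works with 1024 fingerprint columns

# Build a molecules x bits matrix with 1 wherever a bit is set
bits_to_matrix <- function(bit_list, size) {
  m <- matrix(0L, nrow = length(bit_list), ncol = size)
  for (i in seq_along(bit_list)) {
    m[i, bit_list[[i]]] <- 1L
  }
  m
}

fp_matrix <- bits_to_matrix(on_bits, n_bits)
fp_frame <- as.data.frame(fp_matrix)  # columns V1..V8, ready to cbind with a response column
```

Each row is one molecule and each column is one bit, so a response column can be attached with cbind in the same way the data function does.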

Let's build a model and predict data.
First, upload the data that has the SMILES and AMES Categorisation columns to the library.

Second, register the data function.
The function builds a model from the SMILES and Categorisation data, saves the model in a temp folder, and returns the test result.
“inTable” is the uploaded data mentioned above.
Of course, to do that, the user needs R with rcdk and e1071 installed.
To use the following data function, set input => inTable (table) and output => outTable (table).

The code is as follows.

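library( RinR )
# REvaluate runs the block in open-source R, where rcdk and e1071 must be installed
outTable <- REvaluate({
                      library( rcdk );
                      library( e1071 );
                      inTable$CATEGORIATION <- as.factor(inTable$CATEGORIATION);
                      inTable$SMILES <- as.character(inTable$SMILES);
                      mols <- lapply(inTable$SMILES, parse.smiles);
                      cmp.fp <- vector("list", nrow(inTable));
                      for (i in 1: nrow(inTable)){
                                            cmp.fp[i] <- lapply(mols[[i]][1], get.fingerprint, type="circular")
                      }
                      # The right-hand sides of the next two lines are missing in the original post.
                      # Assumed reconstruction: fp.to.matrix (fingerprint package) and as.data.frame.
                      fp.matrix <- fingerprint::fp.to.matrix(cmp.fp);
                      cmp.fingerprint <- as.data.frame(fp.matrix);
                      dataset <- cbind(cmp.fingerprint, inTable$CATEGORIATION);
                      colnames(dataset)[1025] <- "RESPONSE";
                      # 3000 random rows for training, the rest for testing
                      train <- sample(dim(dataset)[1],3000);
                      test<-c( 1:nrow(dataset) )[-train];
                      # train_data and test_data are not defined in the original; assumed to be these subsets
                      train_data <- dataset[train,];
                      test_data <- dataset[test,];
                      model <- svm(RESPONSE ~., data=train_data);
                      #write.svm(model, svm.file = "c:/temp/svmdata.svm", scale.file = "c:/temp/svmdata.scale");
                      # Assumption: keep the model for the prediction function (file name is hypothetical)
                      saveRDS(model, "c:/temp/svmmodel.rds");
                      res <- predict(model, test_data);
                      # cross-tabulate predicted vs. actual category
                      res_mat <- data.frame(res, test_data$RESPONSE);
                      outTable <- data.frame(table(res_mat));
                     }, data="inTable")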
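The split-and-tabulate part of the function can be tried without Spotfire, rcdk, or e1071. The sketch below uses a made-up ten-row table and hypothetical predictions in place of predict(model, test_data), so only the sampling pattern and the shape of the returned table mirror the data function. The accuracy line is an extra that the original function does not compute.

```r
set.seed(1)  # fixed seed so the split is reproducible

# Toy stand-in for the fingerprint table: 10 rows, one made-up feature and a response
dataset <- data.frame(V1 = 1:10,
                      RESPONSE = factor(rep(c("mutagen", "nonmutagen"), 5)))

# Same split pattern as the data function, with 7 training rows instead of 3000
train <- sample(nrow(dataset), 7)
test <- seq_len(nrow(dataset))[-train]
train_data <- dataset[train, ]
test_data <- dataset[test, ]

# Hypothetical predictions standing in for predict(model, test_data)
res <- factor(rep("mutagen", nrow(test_data)), levels = levels(dataset$RESPONSE))

# Cross-tabulate predicted vs. actual, then flatten it the way outTable is built
confusion <- table(res = res, actual = test_data$RESPONSE)
out_table <- data.frame(confusion)

# Fraction of test rows on the diagonal (not part of the original function)
accuracy <- sum(diag(confusion)) / sum(confusion)
```

The flattened table has one row per predicted/actual pair with a Freq column, which is the shape Spotfire receives as outTable.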

Third, register another data function to predict the category from SMILES.
This data function takes the SMILES column that you want to predict and returns a column of predicted categories.
So set input => inCol (column) and output => outCol (column).
The function reads the model stored in the temp folder and predicts the data.

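library( RinR )
outCol <- REvaluate({
                      library( rcdk );
                      library( e1071 );
                      inCol <- as.character(inCol);
                      mols <- lapply( as.list(inCol), parse.smiles );
                      cmp.fp <- vector("list", length(inCol));
                      for (i in 1:length(inCol) ){
                                            cmp.fp[i] <- lapply(mols[[i]][1], get.fingerprint, type="circular")
                      }
                      # The right-hand sides of the next two lines are missing in the original post.
                      # Assumed reconstruction: fp.to.matrix (fingerprint package) and as.data.frame.
                      fp.matrix <- fingerprint::fp.to.matrix(cmp.fp);
                      cmp.fingerprint <- as.data.frame(fp.matrix);
                      # The original does not show how the model is loaded. Assumption: the
                      # model-building function saved it with saveRDS (file name is hypothetical).
                      model <- readRDS("c:/temp/svmmodel.rds");
                      outCol <- predict( model , cmp.fingerprint );
                      }, data="inCol")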
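The prediction function depends on a model that was fitted in an earlier, separate call. One common way to carry a fitted object between R calls is saveRDS and readRDS. This is a minimal sketch of that round trip, not necessarily what my data functions do: the model is a small logistic regression on made-up numbers rather than the SVM, and the file name is hypothetical.

```r
# Fit a small stand-in model (logistic regression on made-up data, not the SVM)
toy <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                  y = c(0, 0, 1, 0, 1, 0, 1, 1))
model <- glm(y ~ x, data = toy, family = binomial)

# "Build" side: serialize the fitted model to a file
model_path <- file.path(tempdir(), "toy_model.rds")  # hypothetical location
saveRDS(model, model_path)

# "Predict" side: restore the model and score new rows
restored <- readRDS(model_path)
new_data <- data.frame(x = c(2.5, 6.5))
pred_original <- predict(model, new_data, type = "response")
pred_restored <- predict(restored, new_data, type = "response")
```

The restored object gives the same predictions as the one in memory, so the model-building function and the prediction function only need to agree on the file path.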

I think TERR and RinR are useful not only for computational chemistry but also for medicinal chemistry, because users don't need to write any code to use a data function.

I uploaded the sample code to my GitHub. ;-)


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
