Implementation of machine learning in Spotfire.

Today I wrote functions that predict molecular properties using e1071.
The following code is almost pure R, but it gets its data from Spotfire.
So users don't need to think about R coding: they can build a model and predict data using only Spotfire.

First, I got sample data from the Bursi Mutagenicity Dataset (link).
I converted it from SDF to SMILES format using RDKit, because I calculate the fingerprints from SMILES with rcdk, called from Spotfire.
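
For readers new to fingerprints, here is a minimal base R sketch of what the model input looks like: each molecule becomes one row of 0/1 bits. The bit positions and the 8-bit length are made up for illustration (rcdk produces much longer fingerprints), and this does not call rcdk at all.

```r
# Hypothetical "on" bit positions for three molecules (illustration only)
on_bits <- list(c(1, 4, 7), c(2, 4), integer(0))
n_bits <- 8  # toy length; real fingerprints are far longer

# Turn each set of positions into one 0/1 row, then stack the rows
fp_matrix <- t(vapply(on_bits, function(idx) {
  row <- integer(n_bits)
  row[idx] <- 1L
  row
}, integer(n_bits)))

print(fp_matrix)
```

Each row can then be paired with its category label, which is the shape of table the SVM is trained on.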

Let’s build a model and predict data.
First, upload the data that has the SMILES and AMES categorisation columns to the library.

Second, register the data function.
The function builds a model from the SMILES and categorisation data, saves the model in a temp folder, and returns the test result.
“inTable” is the uploaded data mentioned above.
Of course, to do that, the user needs R with rcdk and e1071 installed.
To use the following data function, set input => inTable (table) and output => outTable (table).

The code is as follows.

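library(RinR)
outTable <- REvaluate({
  library(rcdk)
  library(fingerprint)  # provides fp.to.matrix()
  library(e1071)
  # Column names must match the uploaded table
  inTable$CATEGORIATION <- as.factor(inTable$CATEGORIATION)
  inTable$SMILES <- as.character(inTable$SMILES)
  # Parse each SMILES string and compute a circular fingerprint
  mols <- lapply(inTable$SMILES, parse.smiles)
  cmp.fp <- vector("list", nrow(inTable))
  for (i in 1:nrow(inTable)) {
    cmp.fp[i] <- lapply(mols[[i]][1], get.fingerprint, type = "circular")
  }
  # The right-hand sides of the next two assignments are missing in the source;
  # fp.to.matrix() and as.data.frame() are assumed here
  fp.matrix <- fp.to.matrix(cmp.fp)
  cmp.fingerprint <- as.data.frame(fp.matrix)
  dataset <- cbind(cmp.fingerprint, inTable$CATEGORIATION)
  colnames(dataset)[1025] <- "RESPONSE"
  # Train on 3000 randomly sampled rows and test on the rest
  train <- sample(dim(dataset)[1], 3000)
  test <- c(1:nrow(dataset))[-train]
  train_data <- dataset[train, ]
  test_data <- dataset[test, ]
  model <- svm(RESPONSE ~ ., data = train_data)
  #write.svm(model, svm.file = "c:/temp/svmdata.svm", scale.file = "c:/temp/svmdata.scale")
  # Assumed save step (the file name is hypothetical) so the model can be reloaded later
  saveRDS(model, "c:/temp/svm_model.rds")
  res <- predict(model, test_data)
  res_mat <- data.frame(res, test_data$RESPONSE)
  # Confusion table of predicted vs. actual categories
  outTable <- data.frame(table(res_mat))
}, data = "inTable")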
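
The returned outTable is a confusion table in long format, with one row per (predicted, actual) pair and a Freq count. As a rough sketch of how to read it, the snippet below computes accuracy from such a table in base R. The counts and the class labels "mutagen" / "nonmutagen" are made-up placeholders, not results from the Bursi data.

```r
# Toy stand-in for outTable; counts and labels are hypothetical
outTable <- data.frame(
  res = c("mutagen", "nonmutagen", "mutagen", "nonmutagen"),
  test_data.RESPONSE = c("mutagen", "mutagen", "nonmutagen", "nonmutagen"),
  Freq = c(40, 10, 5, 45)
)

# Rows where predicted and actual labels agree are the correct predictions
correct <- as.character(outTable$res) == as.character(outTable$test_data.RESPONSE)
accuracy <- sum(outTable$Freq[correct]) / sum(outTable$Freq)

# Rebuild the familiar 2 x 2 layout (rows: predicted, columns: actual)
confusion <- xtabs(Freq ~ res + test_data.RESPONSE, data = outTable)
print(confusion)
print(accuracy)
```

In Spotfire the same numbers can of course be read straight from the output table; this is only meant to show what the three columns mean.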

Third, register another data function to predict the category from SMILES.
The data function takes the SMILES column that you want to predict and returns a column of predicted categories.
So, set input => inCol (column) and output => outCol (column).
This function reads the model stored in the temp folder and predicts the data.

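library(RinR)
outCol <- REvaluate({
  library(rcdk)
  library(fingerprint)  # provides fp.to.matrix()
  library(e1071)
  inCol <- as.character(inCol)
  # Parse each SMILES string and compute a circular fingerprint
  mols <- lapply(as.list(inCol), parse.smiles)
  cmp.fp <- vector("list", length(inCol))
  for (i in 1:length(inCol)) {
    cmp.fp[i] <- lapply(mols[[i]][1], get.fingerprint, type = "circular")
  }
  # The right-hand sides of the next two assignments are missing in the source;
  # fp.to.matrix() and as.data.frame() are assumed here
  fp.matrix <- fp.to.matrix(cmp.fp)
  cmp.fingerprint <- as.data.frame(fp.matrix)
  # Assumed load step (the file name is hypothetical): read the model that the
  # model-building function stored in the temp folder
  model <- readRDS("c:/temp/svm_model.rds")
  outCol <- predict(model, cmp.fingerprint)
}, data = "inCol")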
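
The two data functions only share the model through a file, so it is worth checking that a saved model predicts the same as the original. Below is a small sketch of that round trip using saveRDS() / readRDS() and e1071 on random toy bits. The data, the response rule, and the file name are all assumptions for illustration, and it runs in plain R without Spotfire or rcdk.

```r
library(e1071)

set.seed(1)
# Toy stand-in for a fingerprint table: 60 rows x 16 random bits (V1..V16)
bits <- as.data.frame(matrix(rbinom(60 * 16, 1, 0.5), nrow = 60))
# Hypothetical response rule, only so that there is something to learn
bits$RESPONSE <- factor(ifelse(bits$V1 + bits$V2 >= 1, "mutagen", "nonmutagen"))

model <- svm(RESPONSE ~ ., data = bits)

# Save to a temp location and read it back, as a second function would
model_path <- file.path(tempdir(), "svm_model.rds")
saveRDS(model, model_path)
restored <- readRDS(model_path)

# Predict on rows without the response column, using both copies
new_rows <- bits[1:5, setdiff(names(bits), "RESPONSE")]
pred_a <- predict(model, new_rows)
pred_b <- predict(restored, new_rows)
print(identical(pred_a, pred_b))
```

Note that the prediction table must use the same column names as the training table, which is why both are built with as.data.frame() on a plain matrix here.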

I think TERR and RinR are useful not only for computational chemists but also for medicinal chemists, because users don’t need any coding to use a data function.

I uploaded the sample code to my GitHub. 😉


