Make original sklearn classifier-2 #sklearn #chemoinfo

After I posted ‘Make original sklearn classifier’, I got comments from my followers @yamasaKit_-san and @kzfm-san. (Thanks!) So I checked the diversity of the models with principal component analysis (PCA). The example is almost the same as yesterday's, but the last part is a little different. The last part of my code is below. Extract feature importances from the L1-layer classifiers and mono-random…
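A minimal sketch of the idea, assuming tree-based "layer 1" models that expose `feature_importances_`. The synthetic data from `make_classification` and the particular model list are hypothetical stand-ins, not the post's actual setup. Each model's importance vector becomes one row, and PCA projects those rows to 2D so that models relying on similar features land close together.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)

# Synthetic stand-in for a fingerprint data set (assumption: 64 features).
X, y = make_classification(n_samples=300, n_features=64, n_informative=10,
                           random_state=0)

# Hypothetical "layer 1" models; any estimator with feature_importances_ works.
models = {
    "rf_seed0": RandomForestClassifier(n_estimators=100, random_state=0),
    "rf_seed1": RandomForestClassifier(n_estimators=100, random_state=1),
    "et": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "gbc": GradientBoostingClassifier(random_state=0),
}

# One row per model, one column per feature.
importances = np.vstack([m.fit(X, y).feature_importances_ for m in models.values()])

# Project each model's importance profile onto two principal components.
coords = PCA(n_components=2).fit_transform(importances)
for name, (pc1, pc2) in zip(models, coords):
    print(f"{name}: PC1={pc1:.3f} PC2={pc2:.3f}")
```

With only a handful of models, the PCA plot is mainly a qualitative check: two random forests with different seeds should sit near each other, while a boosting model may drift away.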

Make original sklearn classifier #sklearn #chemoinfo

I posted about ‘blending’, which is one of the strategies for ensemble learning, and wrote code for it. But the code had many hard-coded parts, so it was difficult to use in my job. In this post, I tried to make a new sklearn classification class for ensemble learning and tested the code. At first, most…
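One way such a class can look is sketched below. The class name `BlendingClassifier`, its parameters (`base_estimators`, `meta_estimator`, `holdout_size`) and the model choices are all hypothetical, not the author's code. It only relies on the standard sklearn estimator conventions: parameters stored unchanged in `__init__`, fitted attributes ending in `_`, and `ClassifierMixin` supplying `score`.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


class BlendingClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical blending classifier.

    Base models are fit on one part of the training data; the meta model is
    fit on their predicted probabilities for the held-out part.
    """

    def __init__(self, base_estimators, meta_estimator, holdout_size=0.3, random_state=None):
        self.base_estimators = base_estimators
        self.meta_estimator = meta_estimator
        self.holdout_size = holdout_size
        self.random_state = random_state

    def _meta_features(self, X):
        # Side-by-side class probabilities from every fitted base model.
        return np.hstack([est.predict_proba(X) for est in self.base_estimators_])

    def fit(self, X, y):
        X_base, X_hold, y_base, y_hold = train_test_split(
            X, y, test_size=self.holdout_size, stratify=y, random_state=self.random_state
        )
        # clone() so the estimators passed to __init__ are left untouched.
        self.base_estimators_ = [clone(est).fit(X_base, y_base) for est in self.base_estimators]
        self.meta_estimator_ = clone(self.meta_estimator).fit(self._meta_features(X_hold), y_hold)
        self.classes_ = self.meta_estimator_.classes_
        return self

    def predict_proba(self, X):
        return self.meta_estimator_.predict_proba(self._meta_features(X))

    def predict(self, X):
        return self.meta_estimator_.predict(self._meta_features(X))


# Usage on synthetic data (a stand-in for fingerprints and activity labels).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = BlendingClassifier(
    base_estimators=[RandomForestClassifier(random_state=0), SVC(probability=True, random_state=0)],
    meta_estimator=LogisticRegression(),
    random_state=0,
).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

In this sketch the base models stay fitted on the non-holdout split only; refitting them on all training data after the meta model is trained is another reasonable variant.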

Analysis and visualization toolkit for FMO #FMO

My daughter and I got the flu last week, and now we are staying at home… Meanwhile I read some articles and found an interesting work on FMO. The URL is below. FMO means ‘Fragment Molecular Orbital’, a powerful method for calculating protein-ligand interaction energies. Evotec, which is a drug discovery alliance and development partnership company, published many…

MedChem x AI: Is the future bright for medicinal chemists? #souyakuAC2018

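This is the day-20 entry of the "Drug Discovery Advent Calendar 2018" (創薬アドベントカレンダー2018). I'm iwatobipen, who also wrote a rambling piece with zero science in it last year. I signed up again this year but had no topic, so after wandering around for a while I decided to introduce a paper while focusing on my own work area. (Not science again!) This time the topic is medchem, specifically synthetic medchem. First, from Drug Discovery Today: Medicinal chemistry in drug discovery in big pharma: past, present and future. As the title says, the article, written by authors with many years of medchem experience at GSK, describes where large (overseas) pharmaceutical companies have been and where they are heading. Below I pick some items from Table 1 of the paper, "summary of the topics covered from the past to the present and then the future." In each list, the items are in the order past => present => future. Note that these are examples from big pharma, so many of them may not apply to mid-sized or smaller domestic pharmaceutical companies.
# Synthesis
- Performed manually by highly skilled staff
- 50% by temporary staff or outsourced
- More than 90% by temporary staff or outsourced
# Synthetic reactions
- A basic reaction set
- The same reaction set + Pd-catalyzed reactions
- The current reaction set + C-H activation and bioconversion
# Technology
- Stirring on a hot plate, evaporation
- + microwave and parallel reaction kits
- ?
# Leads
- From various sources
- From drug-like structures (designed with oral dosing in mind) based on the Rule of 5
- Chemical space beyond the Rule of 5, various routes of administration
# Compounds design and SAR
- Literature-based, tacit knowledge, trial and error based on experience
- Efficient progress using in silico tools
- Computers rather than humans will do more of the work, and the methods will become sophisticated and data-driven.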

Make interactive MMP network with Knime #Knime #chemoinformatics

I posted an example showing how to make an interactive scatter plot with Knime, and now I would like to try an MMP network with Knime. I often make network views with Python packages such as igraph, networkx and py2cytoscape. Today, though, I want to learn how to do it in Knime. The recent version of Knime is provided…

Make interactive plot with Knime #RDKit #Chemoinformatics #Knime

Daria Goldmann gave a very cool presentation about Knime at the RDKit UGM 2018 (slides: Goldmann_KNIMEandRDKit.pdf). She demonstrated interactive analysis with the RDKit Knime nodes and JavaScript nodes. I was really interested, but at the time it was difficult for me to build the workflow myself. Anyway, I need to learn Knime for data preparation in this…

Make interactive chemical space plot in jupyter notebook #cheminformatics #Altair

I often use seaborn for data visualization; with it, users can make beautiful plots. Today, though, I tried another library that can make interactive plots in Jupyter notebooks. The library is called ‘altair’. It can be installed with pip or conda, and it is based on Vega and Vega-Lite. Vega…

Build stacking Classification QSAR model with mlxtend #chemoinformatics #mlxtend #RDKit

I posted about the ML method named ‘blending’ some days ago, and a reader suggested that I try “mlxtend”. I had come across it when I was learning about ensemble learning packages in Python, but had never used it. So I tried using the library to build a model. Mlxtend is easy to install, and good documentation is provided…

Changes in properties of approved oral drugs

When I studied drug discovery a long time ago, I read the article about the Rule of Five, which is a rule of thumb for evaluating druglikeness. You can read a nice review of druglikeness scores at the following URL (written in Japanese ;-) ). By the way, recently there are many articles…
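As a concrete reminder of what the Rule of Five checks, here is a minimal RDKit sketch that counts violations of the usual thresholds (MW > 500, cLogP > 5, H-bond donors > 5, H-bond acceptors > 10). The helper name `rule_of_five_violations` is hypothetical, and it uses RDKit's pattern-based `NumHDonors`/`NumHAcceptors`; `Lipinski.NHOHCount` and `Lipinski.NOCount` are closer to Lipinski's original N/O counting.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski


def rule_of_five_violations(smiles):
    """Return (properties, number of Rule-of-5 violations) for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    props = {
        "MW": Descriptors.MolWt(mol),
        "cLogP": Crippen.MolLogP(mol),
        "HBD": Lipinski.NumHDonors(mol),
        "HBA": Lipinski.NumHAcceptors(mol),
    }
    # Each comparison is a bool; summing them counts the violated rules.
    violations = sum([
        props["MW"] > 500,
        props["cLogP"] > 5,
        props["HBD"] > 5,
        props["HBA"] > 10,
    ])
    return props, violations


props, n_viol = rule_of_five_violations("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print(props, n_viol)
```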

Vote Vote Vote #chemoinformatics

A few days ago, I posted about an ensemble classification method named ‘blending’. The method is not implemented in scikit-learn, so I am implementing it now. By the way, scikit-learn does have an ensemble classification method named ‘VotingClassifier’. The following explanation is from the sklearn documentation: The idea behind the VotingClassifier is to combine conceptually different machine learning classifiers and…
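A minimal sketch of `VotingClassifier` in both of its modes, on synthetic data (the three estimators are an arbitrary, hypothetical choice). With `voting="hard"` the ensemble takes a majority vote over predicted labels; with `voting="soft"` it averages the predicted class probabilities, so every estimator must implement `predict_proba`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Three conceptually different classifiers (hypothetical choice).
estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("nb", GaussianNB()),
]

results = {}
for voting in ("hard", "soft"):
    # "hard": majority vote on labels; "soft": argmax of averaged probabilities.
    clf = VotingClassifier(estimators=estimators, voting=voting)
    results[voting] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{voting} voting: mean CV accuracy = {results[voting]:.3f}")
```

Unlike blending, voting has no trained meta model, so it needs no holdout split; the trade-off is that it cannot learn to trust one base model more than another (apart from fixed `weights`).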

Visualize pharmacophore in RDKit #RDKit

RDKit has a pharmacophore feature assignment function. It can retrieve molecular features based on predefined pharmacophore (ph4core) definitions. And RDKit's IPythonConsole can draw molecules in an IPython notebook. Today I tried to visualize the pharmacophore features in a notebook. The code is very simple. First, load the feature definitions. Then calculate the pharmacophore features and compute 2D coordinates. Next, I defined a drawing function. To highlight…
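The first steps described above can be sketched as follows, assuming RDKit's bundled `BaseFeatures.fdef` as the feature definition file and paracetamol as an arbitrary example molecule. The atom indices are grouped per feature family (Donor, Acceptor, Aromatic, …), which is the input a highlighting drawing function would need.

```python
import os

from rdkit import Chem, RDConfig
from rdkit.Chem import AllChem, ChemicalFeatures

# Feature definitions shipped with RDKit.
fdef_path = os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef")
factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)

mol = Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1")  # paracetamol, an arbitrary example
AllChem.Compute2DCoords(mol)  # 2D coordinates for drawing

# Collect the atom indices of every feature, grouped by family.
family_atoms = {}
for feat in factory.GetFeaturesForMol(mol):
    family_atoms.setdefault(feat.GetFamily(), set()).update(feat.GetAtomIds())

for family, atoms in sorted(family_atoms.items()):
    print(family, sorted(atoms))
```

In a notebook, one family at a time could then be highlighted with something like `Draw.MolToImage(mol, highlightAtoms=sorted(family_atoms["Donor"]))`; the post's own drawing function may differ.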

Applicability Domain on Deep Neural Networks #JCIM #chemoinformatics

I read an interesting article in JCIM: Dissecting Machine-Learning Prediction of Molecular Activity: Is an Applicability Domain Needed for Quantitative Structure−Activity Relationship Models Based on Deep Neural Networks? The URL is below. The advantage of DNNs is feature extraction, and there are many articles that use DNNs for molecular activity prediction. But is it true that…
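For readers new to the term, one common and simple applicability-domain check is sketched here: a query compound is treated as "in domain" if its nearest training neighbor exceeds a Tanimoto similarity threshold. This is a generic similarity-based approach, not necessarily the method used in the paper; the function names, the random binary "fingerprints" and the 0.4 threshold are all hypothetical.

```python
import numpy as np


def tanimoto_matrix(A, B):
    """Tanimoto similarity between rows of two binary fingerprint matrices."""
    A = A.astype(int)
    B = B.astype(int)
    inter = A @ B.T                      # shared on-bits
    counts_a = A.sum(axis=1)[:, None]
    counts_b = B.sum(axis=1)[None, :]
    union = counts_a + counts_b - inter
    # Define similarity as 0 where both fingerprints are empty.
    return np.divide(inter, union, out=np.zeros(inter.shape), where=union > 0)


def in_domain(train_fps, query_fps, threshold=0.4):
    """Nearest-neighbor similarity of each query and whether it passes the threshold."""
    nn_sim = tanimoto_matrix(query_fps, train_fps).max(axis=1)
    return nn_sim, nn_sim >= threshold


rng = np.random.default_rng(0)
train = rng.random((50, 128)) < 0.1           # hypothetical sparse 128-bit fingerprints
query = np.vstack([train[0], ~train[0]])      # a training compound and its bit complement
nn_sim, inside = in_domain(train, query)
print(nn_sim.round(3), inside)
```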

Generate possible list of SMILES with RDKit #RDKit

In computer vision, data augmentation is often used to obtain a large data set. In chemoinformatics, on the other hand, canonical SMILES representations are used. At the RDKit UGM last year, Dr. Esben proposed a new approach for RNNs with SMILES. He expanded 602 training molecules to almost 8000 molecules with different SMILES representations…
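A minimal sketch of SMILES enumeration, assuming the approach of randomly renumbering atoms and then writing non-canonical SMILES (traversal starts from the new atom 0). The helper name `enumerate_smiles` and its parameters are hypothetical; it is not necessarily the code from the post or from Dr. Esben's work.

```python
import random

from rdkit import Chem


def enumerate_smiles(smiles, n_variants=10, seed=0, max_tries=1000):
    """Return up to n_variants distinct SMILES strings for the same molecule."""
    mol = Chem.MolFromSmiles(smiles)
    rng = random.Random(seed)
    atom_ids = list(range(mol.GetNumAtoms()))
    variants = set()
    for _ in range(max_tries):
        if len(variants) >= n_variants:
            break
        rng.shuffle(atom_ids)
        shuffled = Chem.RenumberAtoms(mol, atom_ids)
        # canonical=False keeps the shuffled atom order in the output string.
        variants.add(Chem.MolToSmiles(shuffled, canonical=False))
    return sorted(variants)


variants = enumerate_smiles("CC(=O)Oc1ccccc1C(=O)O", n_variants=10)  # aspirin
for s in variants:
    print(s)
```

Every variant should map back to the same canonical SMILES, which is a quick sanity check for augmented training data. More recent RDKit releases also offer `Chem.MolToSmiles(mol, doRandom=True)` for the same purpose.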

Tracking progress of machine learning #MachineLearning

To conduct machine learning, hyperparameters need to be optimized. For example, scikit-learn provides a grid search method, and there are several packages for this, such as hyperopt and GPyOpt. How do you manage the models you have built? It is difficult for me. Recently I have been interested in mlflow. MLflow is an…
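For context, a minimal sketch of scikit-learn's grid search on synthetic data (the model and grid are hypothetical choices). `cv_results_` records every parameter combination and its cross-validated score, which is exactly the kind of information that becomes hard to manage by hand and that a tracking tool such as MLflow is meant to log.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Hypothetical grid: 2 x 2 = 4 combinations, each scored with 3-fold CV.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

# Every tried combination and its mean CV accuracy.
for params, score in zip(search.cv_results_["params"], search.cv_results_["mean_test_score"]):
    print(params, round(score, 3))
print("best:", search.best_params_)
```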

Ensemble learning with scikit-learn and XGBoost #MachineLearning

I often post about deep learning topics, but today I would like to post about ensemble learning. There are lots of documents describing ensemble learning, and I think the following one is very informative: Kaggle Ensembling Guide. I was interested in one of the methods, named ‘blending’. According to the above URL, the merit of ‘blending’…

Visualize important features of machine learning #RDKit

As you know, RDKit 2018.09.1 has very exciting new functions named ‘DrawMorganBit’ and ‘DrawMorganBits’. They can render the bit information of a fingerprint, as described in the following blog post. And if you can read Japanese, excellent posts are also available. What I want to do in this blog post is…
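The information these drawing functions use is the `bitInfo` dictionary filled while computing a Morgan fingerprint: for each on-bit it lists `(center atom, radius)` pairs. The sketch below turns one bit into the SMILES of its atom environment, following the pattern from the RDKit "Getting Started" guide. The helper name `bit_to_smiles` and the example molecule are hypothetical; in practice the bit IDs of interest might come from a model's feature importances. `Draw.DrawMorganBit(mol, bit_id, bit_info)` renders the same environment as an image. Newer RDKit releases steer users toward `rdFingerprintGenerator`, but the call below still works there.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1")  # arbitrary example molecule
bit_info = {}
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048, bitInfo=bit_info)


def bit_to_smiles(mol, bit_id, bit_info):
    """SMILES of the atom environment behind one Morgan bit (first occurrence)."""
    atom_idx, radius = bit_info[bit_id][0]
    if radius == 0:
        # A radius-0 bit describes a single atom.
        return mol.GetAtomWithIdx(atom_idx).GetSymbol()
    env = Chem.FindAtomEnvironmentOfRadiusN(mol, radius, atom_idx)
    amap = {}
    submol = Chem.PathToSubmol(mol, env, atomMap=amap)
    # Root the SMILES at the center atom so the environment reads outward from it.
    return Chem.MolToSmiles(submol, rootedAtAtom=amap[atom_idx], canonical=False)


for bit_id in sorted(bit_info):
    print(bit_id, bit_to_smiles(mol, bit_id, bit_info))
```

Fragments cut out of an aromatic ring are written with lowercase atoms and may not parse back as standalone molecules; they are meant for inspection, not as new structures.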

Convert RDKit mol object to Schrodinger’s mol object #RDKit #Chemoinformatics

I posted a memo about how to read the Maestro file format with RDKit. That means rdkitters can use the “mae” format from RDKit. ;-) By the way, Schrodinger provides a Python API. I would like to know how to pass molecules between RDKit and the Schrodinger Python API. I read the API documentation during my lunch break and tested…

Read maestro format file from RDKit

RDKitters know, I think, that Schrodinger contributes to RDKit. Schrodinger provides many computational tools for drug discovery, not only GUI tools but also a Python API. Many of the tools can be called from Python, and RDKit can be used as well. RDKit can read Maestro files, and vice versa. It is as easy as reading SDFiles. I am…

Run rdkit and deep learning on Google Colab! #RDKit

If you cannot use a GPU on your PC, it is worth knowing that you can use a GPU and/or TPU on Google Colab, which can now be used free of charge. So I would like to use RDKit on Google Colab and run deep learning there. Today I tried it. At first…