Generate a list of possible SMILES with RDKit #RDKit

In computer vision, data augmentation is often used to obtain a larger data set. In chemoinformatics, on the other hand, canonical SMILES representations are generally used. At last year's RDKit UGM, Dr. Esben proposed a new approach for RNNs with SMILES: he expanded 602 training molecules to almost 8000 molecules by using different SMILES representations…

Tracking progress of machine learning #MachineLearning

To conduct machine learning, we need to optimize hyperparameters. For example, scikit-learn provides a grid search method, and there are several other packages for this, such as hyperopt and GPyOpt. But how do you manage the models you build? It is difficult for me. Recently I have become interested in MLflow. MLflow is an…
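
Grid search itself is simple: evaluate every combination in a parameter grid and keep the best one. As a minimal stdlib-only sketch of the bookkeeping that a tracker like MLflow automates, the snippet below runs a grid search over a toy objective and records every run's parameters and score. `toy_score` and the grid values are hypothetical stand-ins for training and validating a real model.

```python
import itertools


def toy_score(max_depth, learning_rate):
    """Hypothetical validation score; a real run would train and evaluate a model."""
    # Peaks at max_depth=4, learning_rate=0.1 so the best run is easy to check.
    return -((max_depth - 4) ** 2) - 100 * (learning_rate - 0.1) ** 2


def grid_search(grid, score_fn):
    """Evaluate every parameter combination and record each run."""
    names = sorted(grid)
    runs = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        runs.append({"params": params, "score": score_fn(**params)})
    best = max(runs, key=lambda run: run["score"])
    return best, runs


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
best, runs = grid_search(grid, toy_score)
print(len(runs), best["params"])
```

In MLflow, the body of the loop would typically sit inside `with mlflow.start_run():` and call `mlflow.log_param` and `mlflow.log_metric` instead of appending to a list, so the runs can be compared later in the MLflow UI.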

Ensemble learning with scikit-learn and XGBoost #machine learning

I often post about deep learning, but today I would like to post about ensemble learning. There are lots of documents describing ensemble learning, and I think the following one is very informative: Kaggle Ensembling Guide. I was interested in one of its methods, named ‘blending’. According to that guide, the merit of ‘blending’…
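
In blending, base models are trained on one part of the training data, and a second-level model is trained only on their predictions for a held-out part. Below is a minimal stdlib-only sketch of that idea. The two base models (a least-squares line and 1-nearest-neighbour regression), the toy data, and the single linear blending weight are illustrative assumptions standing in for the scikit-learn and XGBoost models named in the title.

```python
import random


def fit_line(xs, ys):
    """Base model 1: ordinary least-squares line y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_nearest(xs, ys):
    """Base model 2: 1-nearest-neighbour regression on the training points."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def blend(train, holdout):
    """Fit base models on `train`; learn the blending weight on `holdout` only."""
    (xt, yt), (xh, yh) = train, holdout
    model_a, model_b = fit_line(xt, yt), fit_nearest(xt, yt)
    pa = [model_a(x) for x in xh]
    pb = [model_b(x) for x in xh]
    # Second-level model: closed-form weight w minimising the holdout MSE
    # of w * pa + (1 - w) * pb (a one-parameter linear regression).
    num = sum((a - b) * (y - b) for a, b, y in zip(pa, pb, yh))
    den = sum((a - b) ** 2 for a, b in zip(pa, pb))
    w = num / den
    blended = [w * a + (1 - w) * b for a, b in zip(pa, pb)]
    return w, mse(pa, yh), mse(pb, yh), mse(blended, yh)


rng = random.Random(0)
xs = [rng.uniform(0, 3) for _ in range(200)]
ys = [x * x + rng.gauss(0, 0.3) for x in xs]
train, holdout = (xs[:100], ys[:100]), (xs[100:], ys[100:])
w, mse_line, mse_nn, mse_blend = blend(train, holdout)
print(f"w={w:.2f} line={mse_line:.3f} 1-NN={mse_nn:.3f} blend={mse_blend:.3f}")
```

Because the weight is fitted on the holdout set, the blended holdout MSE can never exceed that of the better base model; a fair comparison needs a separate test set.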

Convert RDKit mol object to Schrodinger’s mol object #RDKit #Chemoinformatics

I posted a memo about how to read the Maestro file format with RDKit. It means that RDKitters can use the “mae” format from RDKit. ;-) BTW, Schrodinger’s site provides a Python API, and I would like to know how to work with RDKit from the Schrodinger Python API. https://www.schrodinger.com/pythonapi I read the API documentation during my lunch break and tested…

Read Maestro format files with RDKit

RDKitters know, I think, that Schrodinger contributes to RDKit. https://www.schrodinger.com/news/schr%C3%B6dinger-contributes-rdkit Schrodinger provides many computational tools for drug discovery, not only GUI tools but also a Python API. Many of the tools can be called from Python, and so can RDKit. RDKit can read Maestro files and vice versa, and it is as easy as reading SDFiles. I am…

Run rdkit and deep learning on Google Colab! #RDKit

If you cannot use a GPU on your PC, it is worth knowing that you can use a GPU and/or TPU on Google Colab, which is now free of charge. So I wanted to use RDKit on Google Colab and run deep learning there. Today I tried it. First…

Make predictive models with small data and visualize them #Chemoinformatics

I enjoyed the chemoinformatics conference held in Kumamoto this week. On the first day of the conference, I heard a very interesting lecture. It was a very basic tutorial on data handling and visualization, but useful for chemoinformatics newbies. I wanted to reproduce the code example, so I tried it. First, visualize the training data. It…

Standardization of tautomers #RDKit

One of the hot topics of the new version of RDKit is the integration of MolVS, a tool for molecular standardization. Molecular standardization is important not only for chemists but also for chemoinformaticians, because different tautomers give different representations of the same molecule, and this affects the accuracy of QSAR models. I wrote about the molecular standardization tool named ‘MolVS’…

New fingerprint calculation method in RDKit #RDKit

It’s good news for RDKitters! A new version of RDKit has been released, and it can be installed with Anaconda! There are many new implementations and enhancements; you can find the details at the URL below. https://github.com/rdkit/rdkit/blob/master/ReleaseNotes.md One interesting feature is a function for rendering fingerprint bit information. There is a nice blog post about it. http://rdkit.blogspot.com/2018/10/using-new-fingerprint-bit-rendering-code.html And…

Draw similarity network #RDKit #Cyjupyter

Recently Kei Ono, a developer of Cytoscape, developed cyjupyter. https://pypi.org/project/cyjupyter/0.2.0/ It looks attractive to me because the library can draw network diagrams in a Jupyter notebook. There is a lot of network-structured data in chemoinformatics, for example molecules, molecular similarity maps, MMPs, etc. Today I used the library to draw a similarity map of molecules. I…
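
A similarity network is just nodes (molecules) plus edges between pairs whose similarity passes a cutoff. Here is a minimal stdlib-only sketch of building that edge list, assuming fingerprints are represented as sets of on-bit indices; the bit sets, molecule names, and the 0.5 threshold are hypothetical. The edge list is what a drawing library such as cyjupyter would then render; the exact input format cyjupyter expects is not shown here.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_edges(fps, threshold=0.5):
    """Return (name1, name2, similarity) for every pair at or above threshold."""
    edges = []
    for (n1, f1), (n2, f2) in combinations(fps.items(), 2):
        sim = tanimoto(f1, f2)
        if sim >= threshold:
            edges.append((n1, n2, round(sim, 3)))
    return edges


# Hypothetical on-bit sets; in practice these would come from RDKit fingerprints.
fps = {
    "mol_a": {1, 5, 9, 12, 20},
    "mol_b": {1, 5, 9, 12, 33},
    "mol_c": {2, 7, 40},
    "mol_d": {2, 7, 40, 41},
}
for edge in similarity_edges(fps, threshold=0.5):
    print(edge)
```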

New fingerprint/MinHash fingerprint #RDKit #Chemoinformatics

Recently I found an article that describes a new method for fast fingerprint calculation. You can read it on ChemRxiv; the URL is below. https://chemrxiv.org/articles/A_Probabilistic_Molecular_Fingerprint_for_Big_Data_Settings/7176350 The authors used the MinHash method, which estimates Jaccard similarity very efficiently. They developed MHFP (MinHash Fingerprint) and compared its performance with ECFP4…
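
The idea behind MinHash: for a random permutation of the elements (approximated here by random hash functions), the probability that two sets share the same minimum equals their Jaccard similarity, so the fraction of agreeing minima over many hash functions estimates it. Below is a minimal, generic MinHash sketch in stdlib Python, not the MHFP implementation from the article. The set elements are arbitrary strings standing in for substructure identifiers, and the hash family ((a * x + b) mod p over CRC32 codes) and the 256 hash functions are assumptions for illustration.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit CRC value


def make_hash_params(num_perm, seed=0):
    """Random (a, b) pairs defining hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]


def minhash(tokens, params):
    """Signature: for each hash function, the minimum hash value over the set."""
    codes = [zlib.crc32(t.encode()) for t in tokens]
    return [min((a * c + b) % PRIME for c in codes) for a, b in params]


def estimate_jaccard(sig1, sig2):
    """Fraction of signature positions where the two minima agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


params = make_hash_params(256, seed=1)
set_a = {f"frag{i}" for i in range(100)}
set_b = {f"frag{i}" for i in range(50, 150)}
true_j = len(set_a & set_b) / len(set_a | set_b)
est_j = estimate_jaccard(minhash(set_a, params), minhash(set_b, params))
print(f"true={true_j:.3f} estimated={est_j:.3f}")
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make signatures irreproducible between runs.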

3D Alignment function of RDKit #RDKit

During the UGM, I was interested in Ben Tehan & Rob Smith’s great work. They showed a nice example of molecular alignment with RDKit. RDKit has several functions for 3D alignment. In drug discovery, 3D alignment of ligands is important not only for comp chem but also for med chem. After their presentation, I…

What is probabilistic programming?

I did not know what a PPL (probabilistic programming language) is. Recently I learned about probabilistic programming and found a nice article on arXiv (arxiv18-deep-ppl.pdf). A deep probabilistic programming language is a language for specifying both deep neural networks and probabilistic models. Probabilistic programming creates systems that help make decisions in the face of uncertainty. In this article, the author…

Calculate HOMO and LUMO with Psi4 revised #RDKit #Psi4

Yesterday, I got comments from a reader. According to the comments, the correct way to calculate HOMO and LUMO with Psi4 is below. Next, I calculated the HOMO and LUMO of benzene with the function and Psi4. After the calculation, I could access the HOMO and LUMO; the code is below. Check the log file. That’s all. I am happy because I get many responses through…
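
Once the SCF calculation is done, HOMO and LUMO are just two entries in the sorted orbital energies: for a closed-shell molecule the HOMO is orbital number n_electrons/2 and the LUMO is the next one. Here is a minimal sketch of that bookkeeping, assuming you already have the orbital energies in Hartree. The numbers below are made up, not benzene results; in Psi4 the energies would come from the wavefunction returned by `psi4.energy(..., return_wfn=True)`, a step not shown here.

```python
HARTREE_TO_EV = 27.211386245988  # CODATA 2018 conversion factor


def homo_lumo(orbital_energies, n_occupied):
    """Return (HOMO, LUMO) energies for a closed-shell system.

    orbital_energies: orbital energies in Hartree (any order).
    n_occupied: number of doubly occupied orbitals (n_electrons // 2).
    """
    eps = sorted(orbital_energies)
    if not 0 < n_occupied < len(eps):
        raise ValueError("need at least one occupied and one virtual orbital")
    return eps[n_occupied - 1], eps[n_occupied]


# Hypothetical orbital energies (Hartree) for a 10-electron closed-shell molecule.
energies = [-20.55, -1.34, -0.70, -0.57, -0.49, 0.19, 0.26, 0.85]
homo, lumo = homo_lumo(energies, n_occupied=5)
gap_ev = (lumo - homo) * HARTREE_TO_EV
print(f"HOMO={homo} Eh, LUMO={lumo} Eh, gap={gap_ev:.2f} eV")
```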

Calculate HOMO and LUMO with Psi4 #RDKit #Psi4

As you know, Psi4 is an open-source suite of ab initio quantum chemistry programs designed for efficient, high-accuracy simulations of a variety of molecular properties. It is very easy to use and has an optional Python interface. I think it is useful for us: because Psi4 can be used from Python, we can integrate many…

Get 3D distance matrix with rdkit #RDKit

I updated RDKit in my env from 20180301 to 20180303 with Anaconda. ;-) When I want to get the 3D distance matrix of a molecule, I use the Get3DDistanceMatrix method. But I found that rdDistGeom.GetMoleculeBoundsMatrix returns almost the same results. A 3D distance matrix is a useful feature for 3D QSAR, so I would like to use these methods. And also…
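
Conceptually, a 3D distance matrix is the Euclidean distance between every pair of atom coordinates in one conformer, which is what `Get3DDistanceMatrix` returns. A stdlib-only sketch of the same calculation, with made-up coordinates standing in for a conformer's atom positions, is below. By contrast, a distance-geometry bounds matrix stores lower and upper limits on each distance rather than the distances of one specific conformer.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distance matrix for a list of (x, y, z) coordinates."""
    n = len(coords)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dm[i][j] = dm[j][i] = d  # the matrix is symmetric
    return dm


# Hypothetical coordinates (in Angstrom) for three atoms.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0)]
for row in distance_matrix(coords):
    print([round(d, 2) for d in row])
```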

AMES classification with WL graph kernel #RDKit

I often find it difficult to implement an algorithm from scratch… I need more practice. ;-) BTW, recently I have found many articles about applying graph theory to chemoinformatics. Some of them are interesting, and they provide useful packages on GitHub! Today, I tried a library named GraKeL. You can find the original…
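
GraKeL provides WL kernels out of the box; to show what the Weisfeiler-Lehman subtree kernel computes, here is a minimal stdlib-only sketch (not GraKeL's implementation). Each iteration relabels every atom with its own label plus the sorted labels of its neighbours, and the kernel value is the dot product of the label-count histograms. The toy graphs (heavy atoms of ethanol and dimethyl ether, hydrogens omitted) and the use of full strings instead of compressed integer labels are simplifications for readability.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations=2):
    """Weisfeiler-Lehman subtree features: label counts over all iterations.

    adjacency: dict node -> list of neighbour nodes
    labels: dict node -> initial label (e.g. element symbol)
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = old label plus the sorted multiset of neighbour labels.
        current = {
            node: current[node] + "(" + ",".join(sorted(current[nb] for nb in adjacency[node])) + ")"
            for node in adjacency
        }
        features.update(current.values())
    return features


def wl_kernel(g1, g2, iterations=2):
    """Kernel value = dot product of the two WL feature count vectors."""
    f1, f2 = wl_features(*g1, iterations), wl_features(*g2, iterations)
    return sum(count * f2[label] for label, count in f1.items())


# Heavy-atom graphs: ethanol (C-C-O) and dimethyl ether (C-O-C).
ethanol = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
dimethyl_ether = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "O", 2: "C"})
print(wl_kernel(ethanol, dimethyl_ether, iterations=1))
```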

Molecular set profiling with pandas_profiling #RDKit

Molecular descriptors are good indicators for molecular profiling. Visualizing and analyzing these descriptors is important for getting a bird’s-eye view of a given set of molecules. I often use “pandas” and “seaborn” to do it. Seaborn is a powerful tool for making cool visualizations, but it is difficult to obtain statistics with it. Yesterday, I found an interesting tool to analyze pandas…
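
Part of what a profiling report gives you is a per-descriptor statistical summary. Here is a minimal stdlib-only sketch of that summary, assuming descriptors have already been calculated (for example with RDKit) and stored as one dict per molecule; the values below are made up. pandas' `DataFrame.describe()` produces a similar table.

```python
import statistics


def profile(rows):
    """Per-column summary statistics for a list of descriptor dicts."""
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        summary[column] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary


# Hypothetical descriptor values for four molecules.
rows = [
    {"MolWt": 180.2, "LogP": 1.3, "HBD": 1},
    {"MolWt": 151.2, "LogP": 0.5, "HBD": 2},
    {"MolWt": 206.3, "LogP": 3.5, "HBD": 1},
    {"MolWt": 194.2, "LogP": -1.0, "HBD": 0},
]
for name, stats in profile(rows).items():
    print(name, stats)
```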

mol2graph and graph2mol #rdkit #igraph

I posted about converting a mol to a graph object before. In that blog post, I wrote an example that converts an RDKit mol object to an igraph object. It was one-way only; there was no method to convert an igraph object back to an RDKit mol object. So I wrote a very simple converter from graph to molecule. First, import the modules. Then define the two conversion functions, mol2graph…

Make DrugCentral ER diagram with python #chemoinfo

Recently I learned about a useful database, “DrugCentral”. From its About page: “DrugCentral provides information on active ingredients chemical entities, pharmaceutical products, drug mode of action, indications, pharmacologic action. We monitor FDA, EMA, and PMDA for new drug approval on regular basis to ensure currency of the resource.” Using the site, users can search for a lot of information on the web…
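
An ER diagram boils down to tables, their columns, and the foreign keys between them. One generic way to draw it with Python is to read those from the database catalog and emit Graphviz DOT text. The sketch below assumes an in-memory SQLite database with a hypothetical two-table schema purely as a stand-in; it is not DrugCentral's schema, and the tooling the post actually uses is not shown in this excerpt.

```python
import sqlite3


def er_diagram_dot(conn):
    """Emit a Graphviz DOT description of tables and foreign-key relations."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    lines = ["digraph ER {", "  node [shape=box];"]
    for table in tables:
        # Table names come from the catalog itself, so interpolation is safe here.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        lines.append(f'  {table} [label="{table}\\n' + "\\n".join(columns) + '"];')
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            # fk[2] is the referenced table, fk[3] the local column.
            lines.append(f'  {table} -> {fk[2]} [label="{fk[3]}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-table schema used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE structures (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, struct_id INTEGER REFERENCES structures(id));
""")
print(er_diagram_dot(conn))
```

The printed DOT text can be rendered with Graphviz (for example `dot -Tpng`).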