Try the new LLM phi3 #memo #LLM

As the name LLM (large language model) suggests, these kinds of models need a lot of GPU memory, so running them is not very cost-effective for personal use ;) Many technologies have been developed, and are still being developed, to overcome this limitation. LLAMA-cpp is one of them. Today I would like to share new…

Easy FW analysis with the RDKit Contrib FW package #rdkit #chemoinformatics #memo

Last week I enjoyed RDKit UGM 2022. It was really great and exciting, even though I participated online. I hope I can attend RDKit UGM 2023 in person ;) As you know, RDKit is one of the most useful OSS packages for chemoinformaticians. It has a nice community and is actively developed. I respect the community and…

Useful package for plotting chemical space rapidly #chemoinformatics #memo

Visualizing chemical space is an important task for chemoinformaticians, and there are many ways to represent it. One common approach is PCA, and more recently tSNE and UMAP have been used. I wrote template code for plotting these data in my own work, but never turned it into a package. Today I found a useful package…
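A minimal sketch of the PCA approach (not the package the post introduces), assuming RDKit and NumPy are installed. The Morgan fingerprint settings (radius 2, 1024 bits) and the example SMILES are my own choices. PCA is done with a plain SVD of the mean-centered fingerprint matrix, so each row of `coords` is one molecule's 2D point.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator


def fingerprint_matrix(smiles_list, radius=2, n_bits=1024):
    """Stack Morgan bit vectors into an (n_molecules, n_bits) 0/1 matrix."""
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    rows = []
    for smi in smiles_list:
        fp = gen.GetFingerprint(Chem.MolFromSmiles(smi))
        rows.append([fp.GetBit(i) for i in range(fp.GetNumBits())])
    return np.array(rows, dtype=float)


def pca_2d(X):
    """Project the rows of X onto the first two principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # PCA works on mean-centered data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T  # coordinates along PC1 and PC2


smiles = ["c1ccccc1", "Cc1ccccc1", "CCO", "CCCO", "c1ccncc1"]
coords = pca_2d(fingerprint_matrix(smiles))
print(coords.shape)  # (5, 2)
```

`coords` can go straight into a matplotlib scatter plot; swapping `pca_2d` for a t-SNE or UMAP embedding keeps the rest of the pipeline the same.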

Generate molecules from molecular formula #Chemoinformatics #memo #jcheminf

Most chemoinformaticians will think that C6H6 means benzene, whose SMILES string is ‘c1ccccc1’. But how many possible structures can be generated from the molecular formula C6H6? ….. Yeah, it’s an interesting but difficult question. Recently I read an interesting article published in the Journal of Cheminformatics. The title is ‘Surge: a…
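Surge enumerates the actual structures, but a quick first look at why C6H6 allows so many candidates is the degree of unsaturation, DBE = C + 1 + (N - H - X)/2, where X is the number of halogens and divalent O and S do not contribute. This is a small pure-Python sketch of that textbook formula, not of Surge's algorithm; the parser only handles simple formulas without parentheses or charges. For C6H6 it gives 4 rings plus pi bonds, which benzene satisfies (one ring and three C=C), but so do many other arrangements.

```python
import re

HALOGENS = {"F", "Cl", "Br", "I"}  # each halogen counts like one H


def parse_formula(formula):
    """Parse a simple formula such as 'C6H6' into {'C': 6, 'H': 6}."""
    counts = {}
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(num) if num else 1)
    return counts


def degree_of_unsaturation(formula):
    """Rings + pi bonds implied by a molecular formula (O and S are ignored)."""
    c = parse_formula(formula)
    carbons = c.get("C", 0)
    hydrogens = c.get("H", 0) + sum(c.get(x, 0) for x in HALOGENS)
    nitrogens = c.get("N", 0)
    return carbons + 1 + (nitrogens - hydrogens) / 2


print(degree_of_unsaturation("C6H6"))  # 4.0
```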

Easy way to visualize SMARTS #chemoinformatics #memo

SMARTS, a language for describing molecular patterns much like regular expressions in NLP, is really useful for chemoinformaticians. However, SMARTS queries are hard to understand because they are difficult to visualize. As far as I know, few software tools can render SMARTS patterns nicely. BioSolveIT provides a unique SMARTS editor, but it requires a commercial…

A memo about a new drug discovery approach from ACS Med Chem Letters #memo #journal #RIBOTAC

Recently there have been many publications and patents about PROTACs (proteolysis targeting chimeras). As the name indicates, PROTACs target a specific protein of interest (POI) for degradation, so the approach is called chemical knockdown. Compared to inhibitors, PROTACs sometimes show strong biological activity. It’s an interesting approach. And I found another interesting approach in ACS…

Get environment SMILES around cutting points #chemoinformatics #memo #RDKit

This week I’m on summer vacation but can’t travel because of the COVID-19 pandemic and heavy rain. It’s a really unusual summer vacation. I hope everyone stays safe. BTW, I often use R-group decomposition and matched molecular pairs, and these methods generate many fragment SMILES that have [*] at the attachment points. And I would like…
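One way to get the environment of an attachment point, sketched under the assumption that RDKit is available (not necessarily the method the post uses): find each dummy atom (atomic number 0), collect the bonds within a given radius with `Chem.FindAtomEnvironmentOfRadiusN`, and write that sub-graph out via `Chem.PathToSubmol`. The radius of 2 and the phenyl fragment are illustrative choices. `FindAtomEnvironmentOfRadiusN` returns an empty result when the fragment is too small for the requested radius, so those attachment points are skipped.

```python
from rdkit import Chem


def environment_smiles(fragment_smiles, radius=2):
    """SMILES of the radius-N bond environment around each [*] attachment point."""
    mol = Chem.MolFromSmiles(fragment_smiles)
    results = []
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() != 0:  # dummy atoms ([*]) have atomic number 0
            continue
        bond_ids = Chem.FindAtomEnvironmentOfRadiusN(mol, radius, atom.GetIdx())
        if len(bond_ids) == 0:
            continue  # fragment too small for this radius
        submol = Chem.PathToSubmol(mol, bond_ids)
        results.append(Chem.MolToSmiles(submol))
    return results


print(environment_smiles("[*]c1ccccc1", radius=2))
```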

Comparison of rdMMPA cut rules #RDKit #Chemoinformatics #memo

RDKit has code for making MMPs in its Contrib folder. RDKit also provides the rdMMPA module, which can generate MMPs based on user-defined cutting rules. Today I checked the rule with GetSubstructMatches and modified it. The default cutting rule is described in the rdMMPA documentation and is defined as a SMARTS pattern: pattern='[#6+0;!$(*=,#[!#6])]!@!=!#[*]' >> It means…
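A small sketch of checking the default rule with GetSubstructMatches, assuming a recent RDKit. The pattern string is the `FragmentMol` default; ethylbenzene is my own example. Only the two acyclic single bonds (CH3-CH2 and CH2-ring) match, because `!@` excludes ring bonds and `!=!#` excludes double and triple bonds.

```python
from rdkit import Chem
from rdkit.Chem import rdMMPA

# Default cutting rule of rdMMPA.FragmentMol: an acyclic, non-double, non-triple
# bond from a neutral carbon that is not double/triple bonded to a heteroatom.
DEFAULT_PATTERN = "[#6+0;!$(*=,#[!#6])]!@!=!#[*]"

mol = Chem.MolFromSmiles("CCc1ccccc1")
patt = Chem.MolFromSmarts(DEFAULT_PATTERN)
matches = mol.GetSubstructMatches(patt)  # atom-index pairs of cuttable bonds
print(matches)

# Single cuts only; results are (core, chains) SMILES pairs.
frags = rdMMPA.FragmentMol(mol, maxCuts=1, pattern=DEFAULT_PATTERN, resultsAsMols=False)
for core, chains in frags:
    print(repr(core), chains)
```

A modified rule can be tested the same way: check its matches with GetSubstructMatches first, then pass it to `FragmentMol` through the `pattern` argument.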

Read SDF with multiple threads #RDKit #memo #chemoinformatics

In chemoinformatics tasks, I often use SDFiles and read them with SDMolSupplier. BTW, since RDKit version 2020.09.1 a multithreaded file reader for SMILES and SDF has been implemented, but I had never used it. So I tried it and compared its speed against the default SDMolSupplier. Here is an example. First, I got compound data from…
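A minimal timing sketch, assuming RDKit 2020.09.1 or later built with thread support. The temporary file, the repeated test molecules and the thread count are arbitrary choices, not the post's data. Worker threads can return records out of order, so the sketch only counts molecules; on a file this small, thread start-up overhead may well make the multithreaded reader slower.

```python
import os
import tempfile
import time

from rdkit import Chem


def write_test_sdf(path, smiles_list):
    """Write a throwaway SDF file for benchmarking."""
    writer = Chem.SDWriter(path)
    for smi in smiles_list:
        writer.write(Chem.MolFromSmiles(smi))
    writer.close()


def count_serial(path):
    return sum(1 for m in Chem.SDMolSupplier(path) if m is not None)


def count_multithreaded(path, n_threads=4):
    # Records may arrive out of order, so only count the parsed molecules.
    suppl = Chem.MultithreadedSDMolSupplier(path, numWriterThreads=n_threads)
    return sum(1 for m in suppl if m is not None)


with tempfile.TemporaryDirectory() as tmp:
    sdf = os.path.join(tmp, "test.sdf")
    write_test_sdf(sdf, ["CCO", "c1ccccc1", "CC(=O)O"] * 100)
    for fn in (count_serial, count_multithreaded):
        t0 = time.perf_counter()
        n = fn(sdf)
        print(fn.__name__, n, f"{time.perf_counter() - t0:.3f} s")
```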

Control targeted gene transcription with small molecule #journal #memo

As is well known, PROTACs (PROteolysis TArgeting Chimeras) are an interesting approach to targeted protein degradation. A PROTAC has a warhead (a ligand of the targeted protein), a linker and a ligand of an E3 ligase. This bifunctional ‘A-Linker-B’ motif is widely used in drug design. And I read a very interesting article that I found on my Twitter timeline. The URL is…

Which is better Graph based or descriptor based model for QSAR prediction? #journal #memo #chemoinformatics

Many graph convolutional network (GCN) models have been applied to QSAR tasks in place of traditional descriptor-based models. The interesting point of GCNs, I think, is that they don’t need feature engineering: during training, a GCN learns molecular features from the given molecular graph. On the other hand, descriptor-based models…

Lilly’s Chemoinformatics Tool Kit #memo #chemoinformatics

Almost 7 years ago, I posted about Lilly’s MedChem filters, which were disclosed on GitHub. https://iwatobipen.wordpress.com/2013/08/06/lilly%E3%81%AE%E3%83%95%E3%82%A3%E3%83%AB%E3%82%BF/ It’s interesting to me because most of the code in Lilly’s repository is written not in Python but in C++, R, etc. One of the interesting pieces of code is LillyMol. Its license is Apache 2.0. I felt…

Difference between sanitized mol and non-sanitized mol #memo #rdkit

I posted about fast compound search with RDKit, and in that post I used the pattern fingerprint. Today I checked the behavior of this fingerprint. The pattern fingerprint can be calculated for molecules that are not sanitized; however, that fingerprint differs from the one calculated from the sanitized mol. Here is a simple example. The output…
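A small sketch for reproducing this kind of comparison, assuming RDKit. Toluene written in Kekulé form is my own example. For the unsanitized mol I call `UpdatePropertyCache(strict=False)` and `FastFindRings` so that implicit valences and ring information exist; whether a given RDKit version strictly needs these calls is an assumption on my side. Comparing the bit strings shows whether the two fingerprints differ for a given input.

```python
from rdkit import Chem


def pattern_fp(smiles, sanitize=True):
    """Pattern fingerprint of a SMILES string, with or without sanitization."""
    mol = Chem.MolFromSmiles(smiles, sanitize=sanitize)
    if not sanitize:
        # Minimal setup for an unsanitized mol: property cache and ring info.
        mol.UpdatePropertyCache(strict=False)
        Chem.FastFindRings(mol)
    return Chem.PatternFingerprint(mol)


smi = "CC1=CC=CC=C1"  # toluene in Kekulé form; sanitization makes it aromatic
fp_san = pattern_fp(smi, sanitize=True)
fp_raw = pattern_fp(smi, sanitize=False)
print(fp_san.GetNumOnBits(), fp_raw.GetNumOnBits())
print("identical:", fp_san.ToBitString() == fp_raw.ToBitString())
```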

Build an accurate model with small training data and quantum chemistry #memo #from_ChemRxiv

Recently I read a nice article on ChemRxiv. Here is the link ;) The title is ‘Machine Learning Meets Mechanistic Modelling for Accurate Prediction of Experimental Activation Energies’. I don’t have experience in this area, but I have found and read publications that use mechanistic DFT. The author mentioned that DFT-based approaches have difficulty calculating reaction…

What is a scaffold? / Medicinal chemists’ feeling #memo

Recently I’ve been interested in the following article. https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c00204 The author tried to detect chemical series (scaffolds) the way medicinal chemists do. In drug discovery projects, the chemical series / scaffold is a very important concept for analyzing SAR, but it is fuzzy. As chemoinformaticians know, the Bemis-Murcko scaffold is one solution for systematic detection of chemical series…
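A sketch of the Bemis-Murcko scaffold and its generic (all-carbon, single-bond) framework using RDKit's `MurckoScaffold` module; the two example molecules are my own. They share the same generic framework but have different Murcko scaffolds (benzene vs. pyridine ring), a small illustration of how a systematic definition can split or merge what a chemist might call one series.

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold


def murcko_smiles(smiles, generic=False):
    """Bemis-Murcko scaffold SMILES; generic=True also maps atoms to C and bonds to single."""
    mol = Chem.MolFromSmiles(smiles)
    core = MurckoScaffold.GetScaffoldForMol(mol)  # strip side chains, keep rings + linkers
    if generic:
        core = MurckoScaffold.MakeScaffoldGeneric(core)
    return Chem.MolToSmiles(core)


for smi in ["CCOc1ccc(Cc2ccccc2)cc1", "Oc1ccc(Cc2ccncc2)cc1"]:
    print(smi, murcko_smiles(smi), murcko_smiles(smi, generic=True))
```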

Make a pandas DataFrame with R-group information #memo

I often forget things…. So the same topics sometimes get posted on my blog more than once, and sometimes a post is updated because of a package version change or other reasons. I posted very similar code previously, but I’m posting it again to remember the procedure myself. It’s just a memo… PandasTools of RDKit makes it easy to…
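A sketch of the same idea that calls `rdRGroupDecomposition.RGroupDecompose` directly (not the PandasTools route the post mentions), assuming RDKit and pandas. The core and the three substituted benzenes are hypothetical examples. With `asRows=True` each matched molecule becomes a dict with `Core` and `R1` keys, which pandas turns into one row per molecule; `unmatched` holds the indices of molecules that did not match the core.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem import rdRGroupDecomposition

# Hypothetical core: benzene with one labelled attachment point R1.
core = Chem.MolFromSmiles("c1ccc([*:1])cc1")
mols = [Chem.MolFromSmiles(s) for s in ["Cc1ccccc1", "CCc1ccccc1", "Oc1ccccc1"]]

rows, unmatched = rdRGroupDecomposition.RGroupDecompose(
    [core], mols, asSmiles=True, asRows=True
)
df = pd.DataFrame(rows)  # one row per matched molecule, columns Core, R1, ...
# Keep the input SMILES next to its decomposition (matched molecules only).
df.insert(0, "SMILES", [Chem.MolToSmiles(m) for i, m in enumerate(mols) if i not in unmatched])
print(df)
```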

Useful experimental method for removing solvent #memo #organic_synthesis

Many kinds of solvents are used in organic synthesis, such as THF, Et2O, EtOAc, DCM, DMF, DMSO, water, etc. Most solvents can be easily removed with a rotary evaporator. When I was a bench chemist, I liked solvents with low boiling points because they are easy to remove. The…

Embed interactive plot in jupyter notebook with panel #chemoinformatics #RDKit #memo #panel

As you know, Jupyter Notebook is a very useful tool for data scientists. It lets us analyze scientific data with nice views, and there are lots of packages for data visualization. I often use matplotlib and seaborn in my work. However, a few days ago I found an interesting package named Panel, which is a high-level app…

Example DGL code for chemoinformatics tasks #DGL #chemoinformatics #RDKit #memo

There are many publications about graph-based approaches in chemoinformatics. I can’t cover all of them, but I’m still interested in this area. I think pytorch_geometric (PyG) and Deep Graph Library (DGL) are very attractive and useful packages for chemoinformaticians. I wrote some posts about DGL and PyG. Recent DGL is more chemoinformatics-friendly, so…

Make molecule mesh data #RDKit #chemoinformatics #meshlab

I’m interested in building predictive models with 3D compound information. Pytorch3d and open3d seem to be attractive packages for this. However, to use them, I need to convert molecular information into 3D data such as point clouds. At first I tried openbabel, because recent versions of openbabel can convert molecules from…