Make quantum accumulator #quantum computing #qiskit

It’s not related to today’s topic but I’m sick with back pain. I hope I will get well soon..

Somedays ago I wrote post about qiskit. I think qiskit is very interesting package for quantum computing. And I stared to learn qiskit and quantum computer.

One of the interesting point of quantum computer is that it uses qubit instead of bit and the qubit can have an entangle state which means 0 or 1.

Today I made very simple quantum accumulator with qiskit.

Following code uses 4 qubit and 4 classical register.
Classical register used for results measurement.
And 4 kinds of gates.
Pauli X gate: x, this gate change |0> to |1>
hadamard gate: h, this gate generates entangle state |0> => 1/sq(2) [|0> + |1>]
control not gate: cx, this gate is same as NOT gate of classical computer
And gate: ccx, this gate is same ans AND gate.

Image of quantum circuit for 1 + 0 is below. First two qubits are used for input and third and forth qubits are used for output.

The code is below.

 
%matplotlib inline
import numpy as np
from qiskit import QuantumCircuit
from qiskit import QuantumRegister
from qiskit import execute
from qiskit import BasicAer
from qiskit import ClassicalRegister
from qiskit.visualization import plot_histogram
# I used simulator instead of real quantum computer
backend_sim = BasicAer.get_backend('qasm_simulator')

# 1 + 0 = 1
# Made 4 qubits and 4 classical bits
q = QuantumRegister(4)
c = ClassicalRegister(4)

# make quantum circuit object and add gates.
qc = QuantumCircuit(q,c)
qc.x(q[0])
qc.ccx(q[0],q[1],q[2])
qc.cx(q[0],q[3])
qc.cx(q[1],q[3])
qc.barrier(q)
qc.measure(q[3],c[0])
qc.measure(q[2],c[1])
qc.measure(q[1],c[2])
qc.measure(q[0],c[3])
qc.draw(output='mpl') # draw the circuit above.

OK let’s run the calculation.

 
# shots means number of trials because results of quantum computing depends on probability.
 
job = execute(qc, backend_sim, shots=80)
result = job.result()
plot_histogram(result.get_counts(qc))

1001 meants qubit1(1) + qubit(0) = 01 with 100% probability! It seems work well.

Next, how about 1 + 1? I added additional Pauli X gate to q[1].

 
# 1 + 1 = 2
q = QuantumRegister(4)
c = ClassicalRegister(4)
c = ClassicalRegister(4)

qc = QuantumCircuit(q,c)
qc.x(q[0])
qc.x(q[1])
qc.ccx(q[0],q[1],q[2]) # and gate count up when q[0] and q[1] is 1
qc.cx(q[0],q[3])
qc.cx(q[1],q[3])
qc.barrier(q)
qc.measure(q[3],c[0])
qc.measure(q[2],c[1])
qc.measure(q[1],c[2])
qc.measure(q[0],c[3])
qc.draw(output='mpl')

The results 1110 with 100% probability. It means qubit(1) + qubit(1) = 10, Binary 10 means 2 in decimal.

At last, write code with entangle state.

 
# 0/1 + 0/1 = 0, 1, 2
q = QuantumRegister(4)
c = ClassicalRegister(4)
c = ClassicalRegister(4)

qc = QuantumCircuit(q,c)
#qc.x(q[0])
#qc.x(q[1])
qc.h(q[0])
qc.h(q[1])
qc.ccx(q[0],q[1],q[2])
qc.cx(q[0],q[3])
qc.cx(q[1],q[3])
qc.barrier(q)
qc.measure(q[3],c[0])
qc.measure(q[2],c[1])
qc.measure(q[1],c[2])
qc.measure(q[0],c[3])
qc.draw(output='mpl')
 
job = execute(qc, backend_sim, shots=80)
result = job.result()
plot_histogram(result.get_counts(qc))

Result showed four state with almost same probability (25%).

0 + 0 = 0, 0 + 1 = 1, 1+0=1, 1+1 = 10(2)

I got more unbalanced data when I run code with low shots.

It is difficult for me to make the quantum circuit for complicated problem.

But technology goes very fast. Quantum computer is used for drug discovery in the feature. So I need to keep my eyes open.

Today’s code is uploaded following URL.

https://nbviewer.jupyter.org/github/iwatobipen/quantum_computing/blob/master/basics_sum.ipynb

Advertisements

Psikit update/Draw ESP, HOMO LUMO #RDKit #Chemoinformatics #quantumchemistry

I just updated psikit which is package for quantum-chemoinformatics ;)

It can be installed from conda / pypi :)

I added and updated new function for molecular property rendering.

Current version of psikit can draw not only frontier orbital but also ESP and dual descriptor. Dual descriptor is calculated by psi4. What is dual descriptor? From original ducment.

Calculates the dual descriptor from frontier orbitals: 𝑓2(𝐫)=𝜌LUMO(𝐫)−𝜌HOMO(𝐫)f2(r)=ρLUMO(r)−ρHOMO(r). The dual descriptor is a good measure of nucleophilicity and electrophilicity, containing information essentially equivalent to both Fukui functions combined. More details on the dual descriptor itself can be found in [Morell:2005:205], while the current implementation is described in [Martinez-Araya:2015:451]. This feature is currently only supported for closed shell systems.

It is very easy to get these images!

Let’s test the functions with acetic acid as an example.

Following code is almost borrowed form the UGM material. Thanks for sharing nice code. Import packages, read Reaction data and reaction objects at first. For convenience, I recommend to install ipymol at first.

 
import ipymol
from psikit import Psikit
pk = Psikit()
v = ipymol.viewer
v.start() # pymol will launch

Calculate energy of acetic acid.

Then call getMOview() for getting some cube files.


pk.getMOview()

Now data preparation is finished. I can get several views just call view_on_pymol with target option which I would like to draw in pymol.

 
pk.view_on_pymol(target='ESP')
pk.view_on_pymol(target='DUAL', maprange=0.001)
pk.view_on_pymol('FRONTIER')

Now I could get 3 views in pymol. Following images are results.

ESP
DUAL
LUMO
HOMO

Visualization of quantum chemistry properties are useful for medchem I think. Any comments, requests and suggestions are greatly appreciated.

code example

Today’s code is uploaded my repo and URL is below.

https://github.com/iwatobipen/playground/blob/master/new_function_psikit.ipynb

Enumerate partial heteroaromatic rings in a molecule #RDKit #Chemoinformatics

I posted hetero shuffling before. It worked well but redundant. There is a nice code in RDKit UGM2017 material. URL is below.

https://github.com/rdkit/UGM_2017/blob/master/Notebooks/Cole-Enumerate-Heterocycles.ipynb

The code defined transformation with hard coding and seems nice.

In case of real project, we sometime would like to do enumeration against partial substructure not all structure. I thought how to do it.

Fortunately RDKit can do it by setting “_protected” property of Atoms. It is worth to know (you know, the approach is described in RDKit document of course!).

Following code is almost borrowed form the UGM material. Thanks for sharing nice code. Import packages, read Reaction data and reaction objects at first.

 
from __future__ import print_function
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
import copy
import numpy as np

import pandas as pd
csvfile = './data/heterocycle_reactions.csv'

import csv
smarts_reader = csv.DictReader(open(csvfile))
REACTIONS = []
for row in smarts_reader:
    smarts = row['SMARTS']
    if not smarts:
        continue

    for product in row['CONVERT_TO'].split(','):
        reaction = smarts + '>>' + product
        REACTIONS.append(AllChem.ReactionFromSmarts(reaction))

Then define some functions. I used mol object as an input directly instead of SMILES.

, isomericSmiles=True)
if isosmi in unique:
continue
unique.add(isosmi)
Chem.SanitizeMol(newmol[0])
yield newmol[0]

def enumerate_heterocycles(mol):
start = mol
starting_points = [start]
seen = set()
while starting_points:
for newmol in get_unique_products(starting_points.pop()):
newmol_smiles = Chem.MolToSmiles(newmol)
if newmol_smiles in seen:
continue
starting_points.append(newmol)
seen.add(newmol_smiles)
yield newmol

Now ready to check it.

I used capivasertib which is kinase inhibitor as an example.

 
rwmol = Chem.RWMol(mcs_mol)

rwconf = Chem.Conformer(rwmol.GetNumAtoms())
matches = rwmol.GetSubstructMatch(mcs_mol)

ref_conf = mol1.GetConformer()
for i, match in enumerate(matches):
    print(ref_conf.GetAtomPosition(ref_match[i]).x)
    # Added atom position information from reference molecule
    rwconf.SetAtomPosition(match, ref_conf.GetAtomPosition(ref_match[i]))
rwmol.AddConformer(rwconf)

Check reference molecule and query molecule structure. I made two molobjects one is non protected and the other is protected atom excepting phenyl ring.

 
capivasertib = Chem.MolFromSmiles('c1cc(ccc1[C@H](CCO)NC(=O)C2(CCN(CC2)c3c4cc[nH]c4ncn3)N)Cl')

protected_capivasertib = copy.deepcopy(capivasertib)
atoms = protected_capivasertib.GetAtoms()
phenyl = Chem.MolFromSmiles('c1ccccc1')
mactches = protected_capivasertib.GetSubstructMatches(phenyl)
arr = np.array(mactches)
matches = arr.flatten()
for atom in atoms:
    if atom.GetIdx() not in matches:
        atom.SetProp('_protected', '1')
capivasertib
capivasertib

Let’s check it.

Enumerated hetero shuffled derivative from non protected and protected molecules. Then use ConstrainEmbed method.

Lots of molecules are generated from non protected molecule!

 
enume1 = list(enumerate_heterocycles(capivasertib))
enume2 = list(enumerate_heterocycles(protected_capivasertib))
print(len(enume1), len(enume2))
> 2592 9

And following results shows effect of “_protected” prop. It is very useful I think. RDKit has many cool features for chemoinformatics.

sourcecode language=”python” wraplines=”false” collapse=”false”]
Draw.MolsToGridImage(enume1[:10], molsPerRow=5)
Draw.MolsToGridImage(enume2[:10], molsPerRow=5)
[/sourcecode]

NON PROTECTED
PROTECTED

Lower figure shows hetero shuffled molecules at only phenyl rings.

I uploaded today’s code to my gist and repo.

https://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/Protect%20and%20enumerate%20heterocycles.ipynb

example

Current version Openforcefield supports rdkit #RDKit #Openforcefield #chemoinformatics

I posted about openforcefield(OpenFF) before. You know, old version of openff supports only OpenEyeTK but current version supports RDKit too.

It is worth to know that we can use openff with open source tool kit. I really appreciate developer’s work! It is great. Today I use the package and ipymol which can control pymol in ipython session. It means that you can control pymol from jupyter notebook! My example code is below.

import openforcefield as off
from rdkit import Chem
from rdkit.Chem import AllChem
from simtk import openmm, unit
from simtk.openmm import app
from openforcefield.topology import Topology
from openforcefield.topology import Molecule
from openforcefield.typing.engines.smirnoff import ForceField
from rdkit.Chem.Draw import IPythonConsole

Load SMIRNOF99FROSS forcefield.

ff = ForceField('test_forcefields/smirnoff99Frosst.offxml')

Then define get_energy function.

def get_energy(system, positions):
    integrator = openmm.VerletIntegrator(1.0 * unit.femtoseconds)
    context = openmm.Context(system, integrator)
    context.setPositions(positions)
    state = context.getState(getEnergy=True)
    energy = state.getPotentialEnergy().in_units_of(unit.kilocalories_per_mole)
    return energy

Make sample mol object with rdkit and convert openff mol object. It is easy, just call from_rdkit!.

rdmol = Chem.MolFromSmiles('c1c(c2sccc2)c(c3c[nH]cc3)oc1')
rdmol = Chem.AddHs(rdmol)
rdmol
ofmol = Molecule.from_rdkit(rdmol)

Then generate conformer with openff function. And get topology.

ofmol.generate_conformers(n_conformers=10)
topology = ofmol.to_topology()

Next, create openMM system with generated topology and get position of one conformer.

org_system = ff.create_openmm_system(topology)
pos = ofmol.conformers[0]

At first, calculate energy with default conformation.

get_energy(org_system, pos)
> Quantity(value=80.93719044789302, unit=kilocalorie/mole)

Next I tried to minimize energy with openMM method.

new_system = ff.create_openmm_system(topology)
new_energy = get_energy(new_system, pos)
from sys import stdout
def minimizeOpenMM(Topology, System, Positions):
    integrator = openmm.LangevinIntegrator(
                                        300.0 * unit.kelvin,
                                        1.0 / unit.picosecond,
                                        2.0 * unit.femtosecond)
                                        #.002 * unit.picoseconds)
    simulation = app.Simulation(Topology, System, integrator)
    simulation.context.setPositions(Positions)
    simulation.minimizeEnergy(tolerance=5.0E-9, maxIterations=2000)
    state =  simulation.context.getState(getPositions=True, getEnergy=True)
    positions =state.getPositions(asNumpy=True)
    energy = state.getPotentialEnergy().in_units_of(unit.kilocalories_per_mole)
    
    simulation.reporters.append(app.StateDataReporter(stdout, 1000, step=True, potentialEnergy=True, temperature=True))
    simulation.step(30000)
    print(energy)
    positions = positions / unit.angstroms
    coordlist = list()
    for atom_coords in positions:
        coordlist += [i for i in atom_coords]
    return coordlist, positions

cl, pos=minimizeOpenMM(topology, org_system, pos)

Now I could get minimized position by calling minimizeOpenMM.

Next I updated atom position with optimized atom geometries.

from rdkit.Chem import rdGeometry
from rdkit.Chem.rdchem import Conformer
AllChem.EmbedMolecule(rdmol, useExpTorsionAnglePrefs = True , useBasicKnowledge = True)

conf = rdmol.GetConformer()
w=Chem.SDWriter("test1.sdf")
w.write(rdmol)
w.close()
for i in range(rdmol.GetNumAtoms()):
    conf.SetAtomPosition(i, rdGeometry.Point3D(pos[i][0], pos[i][1],pos[i][2],))
w=Chem.SDWriter("test2.sdf")
w.write(rdmol)
w.close()

I made two SDF one is default conformer and another is minimized conformer.

OK, I use ipymol to communicate pymol and visualize in jupyter notebook.

from ipymol import viewer
viewer.start() # this method launches pymol
viewer.load("test1.sdf")
viewer.load("test2.sdf")
viewer.align("test1","test2")
from ipymol import viewer
viewer.start() # this method launches pymol
viewer.load("test1.sdf")
viewer.load("test2.sdf")
viewer.align("test1","test2")

viewer.ray()
viewer.display()
viewer.deleteAll()
viewer.load("test2.sdf")
viewer.ray()
viewer.display()
viewer.deleteAll()
viewer.load("test1.sdf")
viewer.ray()
viewer.display()

The result seems different to original article. https://www.biorxiv.org/content/early/2018/07/13/286542.full.pdf

Hmm why???

BTW, OpenFF is very attractive package for chemoinformatics I think.

My code can access below.

https://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/openff.ipynb

Comparison of rdChemReactions and EditableMol #RDKit #chemoinformatics

In this year I moved from MedChem team to CompChem team. And now I need to learn SBDD. Today I struggled mol object that has 3D information.

I would like to replace hydrogen which attached aromatic carbon to some atoms. I thought it is easy if I use rdChemReactions method. But I found that it was not good approach. Because RunReactants method generates products but the method can not keep 3D information of reacted atoms.

It is very interesting and good information for me. Let see example code.

Following code is very simple example.

code example

At first, I tried to replace atom with rdChemReactions method. It worked well but the position of fluorine atom was 0, 0, 0. It indicates that the method can not keep 3D information. I think it is reasonable because some chemical reaction dramatically changes molecular conformation.

The second example used Editable mol.

This approach could keep 3D information, it means that the method do just replace atom!

Of course bond length of C-H and C-F is different but the approach is more suitable for atom scanning I think.

I always surprised that RDKit has many useful function for drug design.

Molecular encoder/decoder (VAE) with python3.x (not new topic) #rdkit #chemoinformatics

The first day of 10-day holiday is rainy. And I and my kid will go to dodge ball tournament.

Two years ago, I tried to modify the keras-molecule which is code of molecular encoder decoder. The code is written for python 2.x. So I would like to run the code on python 3.6. I stopped the modification because I often use pytorch and found molencoder for pytorch. https://github.com/cxhernandez/molencoder

Today I tried to modify keras-molecule again.
I think almost done and the code will run on python 3.x, with keras 2.x tensorflow backend.

https://github.com/iwatobipen/keras-molecules

I checked the code on google colab. Because google colab provides GPU environment. It is useful for deep learning. At first, I installed conda and several packages which are used in keras-molecule.

Following code was written on google colab.

# install conda
!wget https://repo.continuum.io/miniconda/Miniconda3-4.4.10-Linux-x86_64.sh
!chmod +x ./Miniconda3-4.4.10-Linux-x86_64.sh
!time bash ./Miniconda3-4.4.10-Linux-x86_64.sh -b -f -p /usr/local

import sys
import os
sys.path.append('/usr/local/lib/python3.6/site-packages/')
# install packages via conda command
!time conda install -y -c conda-forge rdkit
!conda install -y -c conda-forge h5py
!conda install -y -c conda-forge scikit-learn
!conda install -y -c conda-forge pytables
!conda install -y -c conda-forge progressbar
!conda install -y -c conda-forge pandas
!conda install -y -c conda-forge keras

For in my case, rdkit installation took very long time, 20min or more…. I waited patiently.

After installation, check rdkit. Installed old version. But it does not matter.

from rdkit import Chem
from rdkit import rdBase
print(rdBase.rdkitVersion)
> 2018.09.1

Then clone code from github repo, change current dir and download sample smiles. Keras-molecule uses git-lfs for data strage. So files which are stored is not real .h5 files. I skipped installation of git-lfs to colab.

!git clone https://github.com/iwatobipen/keras-molecules
os.chdir('keras-molecules/')
# original data is stored with git-lft. So it can't get directly with git.
# So I uploaded small sample data to my repo.
!wget https://github.com/iwatobipen/playground/raw/master/smiles_50k.h5

Now ready, preprocess is needed at first. It is easy to do it.
After finished processed.h5 file will be stored in data/ .

!python preprocess.py smiles_50k.h5 data/processed.h5

Then load the data and do training. I did it on notebook, so following code is different from original README section.

import train
data_train, data_test, charset = train.load_dataset('data/processed.h5')
model = train.MoleculeVAE()
model.create(charset,  latent_rep_size=292)
checkpointer = train.ModelCheckpoint(filepath = 'model.h5',
                                   verbose = 1,
                                   save_best_only = True)

reduce_lr = train.ReduceLROnPlateau(monitor = 'val_loss',
                                  factor = 0.2,
                                  patience = 3,
                                  min_lr = 0.0001)
model.autoencoder.fit( data_train,
                  data_train,
                  shuffle = True,
                  epochs = 20,
                  batch_size = 600,
                  callbacks = [checkpointer, reduce_lr],
                  validation_data = (data_test, data_test)
            )

It took 20-30min for training on google colab with GPU. I don’t recommend run the code on PC which does not have GPU. I will take too long time for training.

After training, I ran interpolate. Interpolate samples molecules between SORUCE and DEST from latent space. Ideally the approach can change molecule gradually because latent space is continuous space. It sounds interesting.

import interpolate
h5f = interpolate.h5py.File('data/processed.h5' ,'r')
charset = list(h5f['charset'][:])
charset = [ x.decode('utf-8') for x in charset ]
h5f.close()

LATENT_DIM = 292
WIDTH = 120
model.load(charset, 'model.h5', latent_rep_size = LATENT_DIM)

I set two molecules for SOURCE and DEST.

SOURCE = 'CC2CCN(C(=O)CC#N)CC2N(C)c3ncnc1[nH]ccc13'
DEST = 'CCS(=O)(=O)N1CC(C1)(CC#N)N2C=C(C=N2)C3=C4C=CNC4=NC=N3'
STEPS = 10
results = interpolate.interpolate(SOURCE, DEST, STEPS, charset, model, LATENT_DIM, WIDTH)
for result in results:
  print(result[2])

The result was….

Cr1cccccCCCCCCCCCCCCCCCcccccccccccccccc Cr1cccccCCCCCCCCCCCCCCCcccccccccccccccc Cr1cccccCCCCCCCCCCCCCCCCCccccccccccccccc Cr1cccccCCCCCCCCCCCCCCCCCCCcccccccccccccc Cr1cccccCCCCCCCCCCCCCCCCCCCCCccccccccccccc Cr1ccccCCCCCCCCCCCCCCCCCCCCCCCCCCcccccccccc Cr1CcccCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCccccccc Cr1CccCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCcccc CC1CccCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CC1CccCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

Opps! All molecules are not molecules. It just almost C/c……

The model will be improved if I set more large number of epochs and give more learning data.

Naive VAE approach is difficult to generate valid molecules. Now there are many approaches about molecule generation are reported. JT-VAE, Grammer-VAE, etc, etc…. What kind of generator do you like?

Today’s code is redundant but uploaded my github repo.

https://github.com/iwatobipen/playground/blob/master/mol_vae.ipynb

Visualize Molecular Orbital with pymol and psikit #RDKit #psi4 #psikit #pymol

I posted how to visualize MO with VMD, the data was generated from psikit.

I used VMD because psi4 provides visualize function for cubefile and I could not find the same method on pymol. Pymol is familier for me than VMD. So I would like to do same thing on pymol. And today I could do it.

It is very easy! Let’s try it. At first, geometry optimize and generate cubefile of acetic acid and tetrazole.

import sys
sys.path.append('/Users/iwatobipen/develop/chemoenv/psikit/psikit')
from psikit import Psikit
from rdkit import Chem
pk = Psikit()
# acetic acid
pk.read_from_smiles("CC(=O)[O-]")
pk.optimize()
Chem.MolToMolFile(pk.mol, 'acetic_acid.mol')
pk.getMOview()

# tetrazole
pk.read_from_smiles("CC1=NN=N[N-]1")
pk.optimize()
pk.getMOview()

The getMOview() to generate MO cube files. The function generates 4 files named Psi_a/b_n_n_A.cube. n is number of alpha electrons and number of alpha electrons + 1. It means HOMO and LUMO.

After file generation, launch pymol and load mol file and cubefiles. For acetic acid, acetic_acid.mol, Psi_a_16_16-A.cube and Psi_b_16_16-A.cube files are loaded for HOMO drawing.

Then, draw isomesh and set color from pymol command line.

isomesh HOMO_A,  Psi_a_16_16-A, -0.02
isomesh HOMO_B,  Psi_b_16_16-A, 0.02
color blue, HOMO_A
color red, HOMO_B

Now I could get following image on pymol.

And I could get HOMO of tetrazole with same manner.

Pymol can draw ligand-protein complex easily. I think it is interesting for rendering ligand MO on protein-ligand complex and think about new drug design.