Make an interactive chemical space plot in Jupyter notebook #cheminformatics #Altair

I often use seaborn for data visualization. With the library, users can make beautiful visualizations.
BTW, today I tried another library that can make interactive plots in a Jupyter notebook.
The name of the library is ‘altair’.
https://altair-viz.github.io/index.html
The library can be installed from pip or conda. Altair is built on Vega and Vega-Lite, which are declarative visualization grammars (JSON specifications) rather than Python packages; Altair generates Vega-Lite specs from Python code.

According to the tutorial, Altair can make beautiful plots with very simple code. I wrote an example that plots the chemical space of cdk2.sdf, which is distributed with RDKit.

The following code was run on Google Colab.
First, install RDKit in the environment. Fortunately, Altair can be imported directly on Colab without installation.

!wget https://repo.anaconda.com/miniconda/Miniconda3-4.5.1-Linux-x86_64.sh
!chmod +x Miniconda3-4.5.1-Linux-x86_64.sh
!time bash ./Miniconda3-4.5.1-Linux-x86_64.sh -b -f -p /usr/local
!time conda install -q -y -c conda-forge rdkit

After installing RDKit, append the rdkit path to sys.path:

import sys
import os
sys.path.append('/usr/local/lib/python3.6/site-packages/')

Now we’re ready. Let’s import some libraries.

import pandas as pd
import numpy as np
import altair as alt

from rdkit import Chem
from rdkit import rdBase
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
from rdkit.Chem import PandasTools
from rdkit.Chem import RDConfig
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole

from sklearn.decomposition import PCA

Then load the dataset and define a helper function that converts a molecule to a fingerprint array.

moldf = PandasTools.LoadSDF(os.path.join(RDConfig.RDDocsDir, 'Book/data/cdk2.sdf'))
moldf['SMILES'] = moldf.ROMol.apply(Chem.MolToSmiles)
def mol2fparr(mol):
    # Morgan fingerprint (radius 2, default 2048 bits) converted to a numpy array
    arr = np.zeros((0,))
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr
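
As a quick sanity check, the helper should return a 2048-dimensional array, the RDKit default bit length:

# default nBits for GetMorganFingerprintAsBitVect is 2048
print(mol2fparr(moldf.ROMol.iloc[0]).shape)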

Then conduct PCA on the molecular fingerprints to plot the chemical space, and make a new data frame. cdk2.sdf has a Cluster property, so I used that annotation for coloring.

pca = PCA(n_components=2)
X = np.asarray([mol2fparr(mol) for mol in moldf.ROMol])
print(X.shape)
res = pca.fit_transform(X)
print(res.shape)
moldf['PCA1'] = res[:,0]
moldf['PCA2'] = res[:,1]
moldf2 = moldf[['ID', 'PCA1', 'PCA2', 'SMILES']].copy()
# zero-pad the cluster id so it behaves as a categorical label
moldf2['Cluster'] = ["{:0=2}".format(int(cls)) for cls in moldf.loc[:, 'Cluster']]
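
It can also be worth checking how much of the fingerprint variance the first two components actually capture (the exact values depend on the dataset):

# fraction of variance explained by PC1 and PC2
print(pca.explained_variance_ratio_)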

Making a scatter plot in Altair is easy: just call ‘alt.Chart(dataframe).mark_point()’ on a pandas data frame.
The mark_*() methods give access to many kinds of plots.
From the documentation, the following marks are provided.

Mark Name | Method | Description | Example
area | mark_area() | A filled area plot. | Simple Stacked Area Chart
bar | mark_bar() | A bar plot. | Simple Bar Chart
circle | mark_circle() | A scatter plot with filled circles. | One Dot Per Zipcode
geoshape | mark_geoshape() | A geographic shape. | Choropleth Map
line | mark_line() | A line plot. | Simple Line Chart
point | mark_point() | A scatter plot with configurable point shapes. | Multi-panel Scatter Plot with Linked Brushing
rect | mark_rect() | A filled rectangle, used for heatmaps. | Simple Heatmap
rule | mark_rule() | A vertical or horizontal line spanning the axis. | Candlestick Chart
square | mark_square() | A scatter plot with filled squares. | N/A
text | mark_text() | A scatter plot with points represented by text. | Bar Chart with Labels
tick | mark_tick() | A vertical or horizontal tick mark. | Simple Strip Plot

Now I would like to make a scatter plot, so I call mark_point().
The “interactive()” method makes the plot interactive in Jupyter.
So after running the code, I see an interactive plot in the notebook; the plot shows a tooltip when the mouse hovers over a point.

alt.Chart(moldf2).mark_point().encode(
           x = 'PCA1',
           y = 'PCA2',
           color = 'Cluster',
           tooltip = ['ID', 'SMILES']).interactive()
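
The chart object can also be exported; in recent Altair versions, saving to a self-contained HTML file is built in (the filename here is arbitrary):

chart = alt.Chart(moldf2).mark_point().encode(
           x = 'PCA1',
           y = 'PCA2',
           color = 'Cluster',
           tooltip = ['ID', 'SMILES']).interactive()
# HTML export keeps the interactivity and tooltips
chart.save('chemical_space.html')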

This library is interesting to me because it is easy to implement tooltips. I tried to embed an SVG image in the tooltip but it did not work. I would like to know how to embed an image in the tooltip, if it is possible.
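
One approach that may work in newer Altair/vega-tooltip versions (untested here; vega-tooltip renders a tooltip field named ‘image’ holding an image URL as a picture, though whether a data URI is accepted depends on the version) is to encode a depiction into an ‘image’ column:

import base64
from io import BytesIO

def mol2imgdatauri(mol):
    # hypothetical helper: encode a PNG depiction of the molecule as a data URI
    img = Draw.MolToImage(mol)
    buf = BytesIO()
    img.save(buf, format='PNG')
    return 'data:image/png;base64,' + base64.b64encode(buf.getvalue()).decode()

moldf2['image'] = moldf.ROMol.apply(mol2imgdatauri)
alt.Chart(moldf2).mark_point().encode(
           x = 'PCA1',
           y = 'PCA2',
           color = 'Cluster',
           tooltip = ['ID', 'image']).interactive()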

How you visualize your data is important, because the same data can give a different impression depending on the visualization.

Readers who are interested in this post can find the whole code on Google Colab or GitHub. ;-)
https://colab.research.google.com/drive/1hKcWRBcQG51eGsbpDBF2gl6CoZsmVvTs
https://github.com/iwatobipen/playground/blob/master/plot_chemicalspace.ipynb


Build a stacking classification QSAR model with mlxtend #chemoinformatics #mlxtend #RDKit

I posted about the ML method named ‘blending’ some days ago, and a reader recommended that I try “mlxtend”.
I had found the package when learning about ensemble learning in Python, but had never used it.
So I tried the library to build a model.
Mlxtend is easy to install, and good documentation is provided at the following URL.
http://rasbt.github.io/mlxtend/

The following code is an example of stacking.
In an IPython notebook…
Use bace.csv as test data and load some functions.

%matplotlib inline

import warnings
warnings.filterwarnings('ignore')
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
import numpy as np
import pandas as pd

df = pd.read_csv("bace.csv")

The dataset has pIC50 as the objective value.

mols = [Chem.MolFromSmiles(smi) for smi in df.mol]
fps = [AllChem.GetMorganFingerprintAsBitVect(mol,2, nBits=1024) for mol in mols]
pIC50 = [i for i in df.pIC50]
Draw.MolsToGridImage(mols[:10], legends=["pIC50 "+str(i) for i in pIC50[:10]], molsPerRow=5)

Images of the first ten compounds are shown below.


Then convert the fingerprints to numpy arrays, and make a binary activity array, y_bin (active if pIC50 > 7).

X = []
for fp in fps:
    arr = np.zeros((1,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    X.append(arr)
X = np.array(X)
y = np.array(pIC50)
y_bin = np.asarray(y > 7, dtype=int)
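
A fixed pIC50 cutoff can give imbalanced classes, so it is worth a quick look at the class counts before modeling (the exact numbers depend on the dataset snapshot):

# counts of inactive (0) and active (1) compounds
print(np.bincount(y_bin))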

Then load some classifier models from sklearn and split the data into training and test sets.

from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix
from sklearn.decomposition import PCA
from xgboost import XGBClassifier
from mlxtend.classifier import StackingClassifier
from mlxtend.plotting import plot_decision_regions
from mlxtend.plotting import plot_confusion_matrix
import numpy as np
x_train, x_test, y_train, y_test = train_test_split(X,y_bin, test_size=0.2)

Making a stacking classifier is very simple: just call StackingClassifier and pass classifiers and a meta_classifier as arguments.
I use SVC as the meta_classifier.

clf1 = RandomForestClassifier(random_state=794)
clf2 = GaussianNB()
clf3 = XGBClassifier(random_state=0)
clf4 = SVC(random_state=0)
clflist = ["RF", "GNB", "XGB", "SVC", "SCLF"]

sclf = StackingClassifier(classifiers=[clf1,clf2,clf3], meta_classifier=clf4)
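
mlxtend can also stack on the base models’ predicted class probabilities instead of their hard labels, via the use_probas option; a minimal sketch with the same estimators:

# stack on predicted probabilities rather than predicted labels
sclf_proba = StackingClassifier(classifiers=[clf1, clf2, clf3],
                                meta_classifier=clf4,
                                use_probas=True,
                                average_probas=False)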

Then let’s fit the models to the data!

skf = StratifiedKFold(n_splits=5)
for j, (train_idx,test_idx) in enumerate(skf.split(x_train, y_train)):
    for i, clf in enumerate([clf1, clf2, clf3, clf4, sclf]):
        clf.fit(x_train[train_idx],y_train[train_idx])
        ypred = clf.predict(x_train[test_idx])
        acc = accuracy_score(y_train[test_idx], ypred)
        b_acc = balanced_accuracy_score(y_train[test_idx], ypred)
        print("round {}".format(j))
        print(clflist[i])
        print("accuracy {}".format(acc))
        print("balanced accuracy {}".format(b_acc))
        print("="*20)

> output
round 0
RF
accuracy 0.8148148148148148
balanced accuracy 0.8026786943947115
====================
round 0
GNB
accuracy 0.6625514403292181
balanced accuracy 0.680450351191296
====================
round 0
XGB
accuracy 0.8271604938271605
balanced accuracy 0.8136275995042005
====================
round 0
SVC
accuracy 0.7325102880658436
balanced accuracy 0.7072717256576229
====================
round 0
SCLF
accuracy 0.8148148148148148
balanced accuracy 0.8026786943947115
====================
round 1
RF
accuracy 0.7603305785123967
balanced accuracy 0.7534683684794672
====================
round 1
GNB
accuracy 0.640495867768595
balanced accuracy 0.6634988901220866
====================
round 1
XGB
accuracy 0.8140495867768595
balanced accuracy 0.8127081021087681
====================
round 1
SVC
accuracy 0.756198347107438
balanced accuracy 0.7414678135405106
===================
.....
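
The loop above only reports cross-validation scores within the training split. As a final step, one would refit the stacking classifier on the whole training set and score it on the held-out test set; a minimal sketch:

# refit on the full training split and evaluate on the held-out 20%
sclf.fit(x_train, y_train)
y_pred = sclf.predict(x_test)
print("test accuracy {}".format(accuracy_score(y_test, y_pred)))
print("test balanced accuracy {}".format(balanced_accuracy_score(y_test, y_pred)))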

Readers who are interested in stacking can find a nice document here:
http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/#example-1-simple-stacked-classification

And all my code is uploaded to my repo on GitHub.
http://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/mlxtend_test.ipynb

Mlxtend has many functions for building, analyzing, and visualizing machine learning models and data. I will use the package more and more.

Vote Vote Vote #chemoinformatics

Some days ago, I posted about an ensemble classification method named ‘blending’. The method is not implemented in scikit-learn, so I am implementing the function now.
By the way, scikit-learn has an ensemble classification method named ‘VotingClassifier’.

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html#sklearn.ensemble.VotingClassifier
The following explanation is from the sklearn documentation:

The idea behind the VotingClassifier is to combine conceptually different machine learning classifiers and use a majority vote or the average predicted probabilities (soft vote) to predict the class labels. Such a classifier can be useful for a set of equally well performing models in order to balance out their individual weaknesses.

The classifier can combine many classifiers very easily.
The function has two modes: one is ‘hard’ and the other is ‘soft’.
From the documentation:
If ‘hard’, uses predicted class labels for majority rule voting. Else if ‘soft’, predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.

I used the classifier for QSAR modeling.

In the following code, I used four classifiers and the bace.csv dataset from MoleculeNet as the test data.
The code is very simple! Just build a dictionary of estimators and pass (name, estimator) pairs to VotingClassifier.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from xgboost import XGBClassifier

clf_dict = {'RF': RandomForestClassifier(n_estimators=100),
        'ETC': ExtraTreesClassifier(n_estimators=100),
        'GBC': GradientBoostingClassifier(learning_rate=0.05),
        'XGB': XGBClassifier(n_estimators=100),
        'SVC': SVC(probability=True, gamma='auto')}

voting_clf = VotingClassifier(estimators=[("RF", clf_dict["RF"]),
                                        ("GBC", clf_dict["GBC"]),
                                        ("XGB", clf_dict["XGB"]),
                                        ("SVC", clf_dict["SVC"])        
                                    ], voting='hard')
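
For the ‘soft’ mode, every base estimator must support predict_proba (which is why the SVC above is built with probability=True), and optional per-estimator weights can be given; a minimal sketch with arbitrary illustrative weights:

soft_voting_clf = VotingClassifier(estimators=[("RF", clf_dict["RF"]),
                                               ("GBC", clf_dict["GBC"]),
                                               ("XGB", clf_dict["XGB"]),
                                               ("SVC", clf_dict["SVC"])],
                                   voting='soft',
                                   weights=[2, 1, 1, 1])  # hypothetical weights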

dataset = np.load("train.npz")['arr_0']
X = dataset[:,:-1]
y = dataset[:,-1]
idx = np.random.permutation(y.size)
X = X[idx]
y = y[idx]
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=794)
nfolds = 10
skf = StratifiedKFold(nfolds)
for i, (train, val) in enumerate(skf.split(train_X, train_y)):
    #print('fold {}'.format(i))
    # note: fit() retrains the model on each fold, so the evaluation below
    # uses the fit from the last fold only
    X_train = train_X[train]
    y_train = train_y[train]
    X_val = train_X[val]
    y_val = train_y[val]
    voting_clf.fit(X_train, y_train)
y_pred = voting_clf.predict(test_X)
print("Voting!")
print(confusion_matrix(test_y, y_pred))
print(classification_report(test_y, y_pred))

rf_clf = RandomForestClassifier(n_estimators=100)
 
for i, (train, val) in enumerate(skf.split(train_X, train_y)):
    #print('fold {}'.format(i))
    X_train = train_X[train]
    y_train = train_y[train]
    X_val = train_X[val]
    y_val = train_y[val]
    rf_clf.fit(X_train, y_train)
y_pred = rf_clf.predict(test_X)
print("Random Forest!")
print(confusion_matrix(test_y, y_pred))
print(classification_report(test_y, y_pred))

svc_clf = SVC(probability=True, gamma='auto')
for i, (train, val) in enumerate(skf.split(train_X, train_y)):
    #print('fold {}'.format(i))
    X_train = train_X[train]
    y_train = train_y[train]
    X_val = train_X[val]
    y_val = train_y[val]
    svc_clf.fit(X_train, y_train)
y_pred = svc_clf.predict(test_X)
print("SV!")
print(confusion_matrix(test_y, y_pred))
print(classification_report(test_y, y_pred)) 

Then run the code.
In this example the voting method does not outperform the random forest or the support vector classifier, but I think it is worth knowing that sklearn provides a useful feature for ensemble learning. ;-)

iwatobipen$ python voting.py 
Voting!
[[10  0  0]
 [ 0 11  0]
 [ 0  1  8]]
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        10
         1.0       0.92      1.00      0.96        11
         2.0       1.00      0.89      0.94         9

   micro avg       0.97      0.97      0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Random Forest!
[[10  0  0]
 [ 0 11  0]
 [ 0  1  8]]
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        10
         1.0       0.92      1.00      0.96        11
         2.0       1.00      0.89      0.94         9

   micro avg       0.97      0.97      0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

SV!
[[10  0  0]
 [ 0 11  0]
 [ 0  1  8]]
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        10
         1.0       0.92      1.00      0.96        11
         2.0       1.00      0.89      0.94         9

   micro avg       0.97      0.97      0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Visualize pharmacophore in RDKit #RDKit

RDKit has a pharmacophore feature assignment function. The function retrieves molecular features based on a pre-defined feature definition file.
And RDKit’s IPythonConsole can draw molecules in an IPython notebook.
Today I tried to visualize pharmacophore features in the notebook.
The code is very simple.

from rdkit import Chem
from rdkit.Chem import ChemicalFeatures
from rdkit import rdBase
from rdkit.RDPaths import RDDocsDir
from rdkit.RDPaths import RDDataDir
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
import os
print(rdBase.rdkitVersion)
IPythonConsole.ipython_useSVG=True
> 2018.09.1

First, load the feature definitions.

fdefFile = os.path.join(RDDataDir,'BaseFeatures.fdef')
featFact = ChemicalFeatures.BuildFeatureFactory(fdefFile)
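
As a quick check, the factory can list the feature families defined in the fdef file (the exact tuple depends on the file version):

# families defined in BaseFeatures.fdef, e.g. Donor, Acceptor, Aromatic, ...
print(featFact.GetFeatureFamilies())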

Then calculate the pharmacophore features and compute 2D coordinates.

mols = [m for m in Chem.SDMolSupplier(os.path.join(RDDocsDir,"Book/data/cdk2.sdf"))]
featslists = [featFact.GetFeaturesForMol(mol) for mol in mols]
for mol in mols:
    AllChem.Compute2DCoords(mol)

Next I defined a drawing function. To highlight the pharmacophore features, highlightAtomLists and legends are used as optional arguments of MolsToGridImage.

def drawp4core(mol, feats):
    atoms_list = {}
    for feat in feats:
        atom_ids = feat.GetAtomIds()
        feat_type = feat.GetType()
        # note: if the molecule has several features of the same type,
        # later ones overwrite earlier ones in this dict
        atoms_list[feat_type] = atom_ids
    return Draw.MolsToGridImage([mol]*len(atoms_list), legends=list(atoms_list.keys()), highlightAtomLists=list(atoms_list.values()))

Test the function.

im = drawp4core(mols[1], featslists[1])
im

I got the following image.

The function worked, and it could pick up the pharmacophore features of a given molecule. ;-)

Generate a list of possible SMILES with RDKit #RDKit

In computer vision, data augmentation techniques are often used to enlarge a data set. In chemoinformatics, on the other hand, canonical SMILES representations are commonly used.
At the RDKit UGM last year, Dr. Esben Bjerrum proposed a new approach for RNNs with SMILES: he expanded 602 training molecules to almost 8000 by enumerating different SMILES representations of the same molecules.
https://github.com/rdkit/UGM_2017/blob/master/Presentations/Bjerrum_RDKitUGM_Smiles_Enumeration_for_RNN.pdf
This approach seems to work well.
At this year’s UGM hackathon, this random SMILES generation function was implemented, and it can now be called from the new version of RDKit!
I appreciate the RDKit developers!

It is very easy to use; please see the code below.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
print(rdBase.rdkitVersion)
>2018.09.1

I used a kinase inhibitor as an example.

testsmi = 'CC(C1=C(C=CC(=C1Cl)F)Cl)OC2=C(N=CC(=C2)C3=CN(N=C3)C4CCNCC4)N'
mol = Chem.MolFromSmiles(testsmi)
mol


The default output of MolToSmiles is the canonical SMILES.

print(Chem.MolToSmiles(mol))
>CC(Oc1cc(-c2cnn(C3CCNCC3)c2)cnc1N)c1c(Cl)ccc(F)c1Cl

But if you call MolToSmiles with the doRandom=True option, the function returns a random but valid SMILES.

mols = []
for _ in range(50):
  smi = Chem.MolToSmiles(mol, doRandom=True)
  print(smi)
  m = Chem.MolFromSmiles(smi)
  mols.append(m)

>Fc1c(Cl)c(C(Oc2cc(-c3cn(nc3)C3CCNCC3)cnc2N)C)c(cc1)Cl
>O(c1cc(-c2cnn(c2)C2CCNCC2)cnc1N)C(c1c(Cl)c(ccc1Cl)F)C
>--snip--
>c1(N)ncc(-c2cnn(C3CCNCC3)c2)cc1OC(c1c(Cl)ccc(F)c1Cl)C
#check molecules
Draw.MolsToGridImage(mols, molsPerRow = 10)

Different SMILES but same molecule!

There are many deep learning approaches which use SMILES as input. I think this kind of enumeration is useful for augmenting the input data of such models.
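
Building on this, a small helper (my sketch, not an RDKit API) can collect a fixed number of unique random SMILES per molecule for augmentation:

def enumerate_smiles(mol, n_variants=10, max_tries=100):
    # collect up to n_variants distinct random SMILES of one molecule
    variants = set()
    tries = 0
    while len(variants) < n_variants and tries < max_tries:
        variants.add(Chem.MolToSmiles(mol, doRandom=True))
        tries += 1
    return sorted(variants)

print(len(enumerate_smiles(mol, n_variants=5)))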

I uploaded my example code on google colab and github my repo.

Colab
https://colab.research.google.com/drive/1dMmgCpskrfI1afh3qmPdv8ixkGTuOCDH

github
https://github.com/iwatobipen/playground/blob/master/random_smiles_rdkit.ipynb

nbviewer
http://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/random_smiles_rdkit.ipynb

Tracking progress of machine learning #MachineLearning

To conduct machine learning, you need to optimize hyper parameters.
For example, scikit-learn provides a grid search method, and there are several packages for this such as hyperopt or GPyOpt. But how do you manage the models you have built? It is difficult for me.
Recently I have been interested in mlflow. MLflow is an open source platform for managing the end-to-end machine learning lifecycle; it tackles three primary functions: tracking experiments, packaging code, and managing and deploying models.
MLflow can track the hyper parameters of each model, serve the models, and also provides a good web UI.
I used it in a very simple example.
Code is below.
First I got sample data and visualized the data set with PCA.

# on the ipython notebook
%matplotlib inline
!wget https://raw.githubusercontent.com/mlflow/mlflow/master/examples/sklearn_elasticnet_wine/wine-quality.csv -P ./data/
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cm
import pandas as pd
import numpy as np

from sklearn.decomposition import PCA
data = pd.read_csv('./data/wine-quality.csv')

cmap = plt.get_cmap("Blues", len(data.quality.unique()))
pca = PCA()
wine_pca = pca.fit_transform(data.iloc[:,:-1])
plt.scatter(wine_pca[:,0], wine_pca[:,1], c=data.quality, cmap=cmap)
plt.xlim(np.min(wine_pca[:,0]), np.max(wine_pca[:,0]))
plt.ylim(np.min(wine_pca[:,1]), np.max(wine_pca[:,1]))
plt.colorbar()

Next, a train function is defined.
The mlflow.log_param function tracks the hyper parameters, mlflow.log_metric records the scores, and mlflow.sklearn.log_model stores the model.
After running the code, an mlruns folder is generated in the current directory with the stored data.


def train():

    import os
    import warnings
    import sys
    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVR
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold
    from sklearn.model_selection import cross_val_score
    import mlflow
    import mlflow.sklearn

    def eval_metrics(actual, pred):
        rmse = np.sqrt(mean_squared_error(actual, pred))
        mae = mean_absolute_error(actual, pred)
        r2 = r2_score(actual, pred)
        return rmse, mae, r2

    warnings.filterwarnings("ignore")
    np.random.seed(40)
    data = pd.read_csv('./data/wine-quality.csv')
    train, test = train_test_split(data)

    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]
    param = {'C':[0.01, 0.1, 1, 10, 100, 1000, 10000 ],
             'gamma':[1.0, 1e-1, 1e-2, 1e-3, 1e-4]}
    for c in param['C']:
        for g in param['gamma']:
            with mlflow.start_run():
                print(c,g)
                skf = StratifiedKFold(n_splits=5)
                svr = SVR(C=c, gamma=g)
                # CV score is computed here but not logged in this example
                score = cross_val_score(svr, train_x, train_y, cv=skf, n_jobs=-1)
                svr.fit(train_x, train_y)
                predicted_qualities = svr.predict(test_x)
                (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)
                print("  RMSE: %s" % rmse)
                print("  MAE: %s" % mae)
                print("  R2: %s" % r2)
                mlflow.log_param("C", c)
                mlflow.log_param("gamma", g)
                mlflow.log_metric("r2", r2)
                mlflow.log_metric("rmse", rmse)
                mlflow.log_metric("mae", mae)
                mlflow.sklearn.log_model(svr, "model")

Run the function.

train()
0.01 1.0
  RMSE: 0.876717955304063
  MAE: 0.6586558965180616
  R2: 0.007250505904323745
0.01 0.1
  RMSE: 0.872902609847314
  MAE: 0.6523680676966712
  R2: 0.015872299345786156
--snip--
10000 0.0001
  RMSE: 0.7902872331540974
  MAE: 0.570097398346025
  R2: 0.19334133272639453

After running the code, MLflow provides a very useful web UI. To access the UI, just type the following command in a terminal ;-)
and then access http://127.0.0.1:5000/#/.

iwatobipen$ mlflow server

I can check a summary of the training runs with their metrics, like below.

And each model is stored; I can see the details of each model and access it, like below.

I think it is useful for managing many experiments and many models.
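
A stored model can also be loaded back for prediction; a minimal sketch, assuming the default mlruns layout and a run id copied from the web UI (the id below is a placeholder):

import mlflow.sklearn

run_id = '<run_id>'  # hypothetical: paste an actual run id from the UI
model = mlflow.sklearn.load_model('mlruns/0/{}/artifacts/model'.format(run_id))
print(model.predict(data.drop(['quality'], axis=1).iloc[:5]))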

Read Maestro format files with RDKit

RDKit users probably know that Schrodinger contributes to RDKit.
https://www.schrodinger.com/news/schr%C3%B6dinger-contributes-rdkit

Schrodinger provides many computational tools for drug discovery, not only GUI tools but also a Python API. Many of the tools can be called from Python, and RDKit can now read Maestro files as well.
It is as easy as reading SD files.
I am writing this blog post on my personal PC and do not have a Schrodinger license, so I got test files from Schrodinger’s GitHub repository.
https://github.com/schrodinger

In an IPython notebook

!wget https://raw.githubusercontent.com/schrodinger/maeparser/master/test/test.mae
!wget https://github.com/schrodinger/maeparser/raw/master/test/test2.maegz
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import rdmolfiles
from rdkit.Chem.Draw import rdDepictor
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
import gzip
rdDepictor.SetPreferCoordGen(True)
rdBase.rdkitVersion
>'2018.09.1'

Read mols from mae format.

maemols = rdmolfiles.MaeMolSupplier("test.mae")
mols = [m for m in maemols]
Draw.MolsToGridImage(mols)

Read mols from maegz

maemols2 = rdmolfiles.MaeMolSupplier(gzip.open("test2.maegz"))
mols2 = [m for m in maemols2]
Draw.MolsToGridImage(mols2)

Both cases work fine.
If you want to integrate RDKit and Schrodinger tools, this method will be useful.
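
For example, a round trip from Maestro to SD format is only a few lines (a minimal sketch using the molecules read above; the output filename is arbitrary):

# write the molecules read from test.mae out as an SD file
writer = Chem.SDWriter('from_mae.sdf')
for m in mols:
    writer.write(m)
writer.close()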
The notebook is uploaded to my GitHub repo.
https://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/maeparser.ipynb