SDF files get large when they hold a lot of molecules, so I often convert SDF to SMILES.
I usually use the MolToSmiles function to do that, but newer versions of RDKit have a convenient method for converting the file format.
Here is a sample snippet.
from rdkit import Chem
from rdkit.Chem.ChemUtils import SDFToCSV

suppl = Chem.SDMolSupplier('cdk2.sdf')
# convert sdf to smiles; the with block closes out.csv when done
with open('out.csv', 'w') as f:
    SDFToCSV.Convert(suppl, f)
Now I have the out.csv file. Let's check it.
iwatobipen$ head -n 10 out.csv
SMILES,id,Cluster,MODEL.SOURCE,MODEL.CCRATIO,r_mmffld_Potential_Energy-OPLS_2005,r_mmffld_RMS_Derivative-OPLS_2005,b_mmffld_Minimization_Converged-OPLS_2005
CC(C)C(=O)COc1nc(N)nc2[nH]cnc12,ZINC03814457,1,CORINA 3.44 0027 09.01.2008,1,-78.6454,0.000213629,1
Nc1nc(OCC2CCCO2)c2nc[nH]c2n1,ZINC03814459,2,CORINA 3.44 0027 09.01.2008,1,-67.4705,9.48919e-05,1
Nc1nc(OCC2CCC(=O)N2)c2nc[nH]c2n1,ZINC03814460,2,CORINA 3.44 0027 09.01.2008,1,-89.4303,5.17485e-05,1
Nc1nc(OCC2CCCCC2)c2nc[nH]c2n1,ZINC00023543,3,CORINA 3.44 0027 09.01.2008,1,-70.2463,6.35949e-05,1
Nc1nc(OCC2CC=CCC2)c2nc[nH]c2n1,ZINC03814458,3,CORINA 3.44 0027 09.01.2008,1,-72.9091,6.51479e-05,1
Cn1cnc2c(NCc3ccccc3)nc(NCCO)nc21,ZINC01641925,3,CORINA 3.44 0027 09.01.2008,1,-42.2404,0.000120409,1
CCC(CO)Nc1nc(NCc2ccccc2)c2ncn(C(C)C)c2n1,ZINC01649340,3,CORINA 3.44 0027 09.01.2008,1,-33.4734,7.14544e-05,1
COc1ccc(CNc2nc(N(CCO)CCO)nc3c2ncn3C(C)C)cc1,ZINC01487345,3,CORINA 3.44 0027 09.01.2008,1,-23.1357,8.18592e-05,1
Nc1nc(N)c(N=O)c(OCC2CCCCC2)n1,ZINC03814479,4,CORINA 3.44 0027 09.01.2008,1,-112.542,8.83166e-05,1
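If you would rather check the file from Python than with head, something like the following might work. It is only a rough sketch using the standard csv module: `read_smiles_table` is a helper name I made up for illustration, and the sample text is two rows copied from the output above with most columns dropped to keep it short.

```python
import csv
import io


def read_smiles_table(handle, id_column="id"):
    """Return (id, SMILES) pairs from a CSV file that has a 'SMILES' column.

    Hypothetical helper for illustration; it is not part of RDKit.
    """
    reader = csv.DictReader(handle)
    return [(row[id_column], row["SMILES"]) for row in reader]


# Two rows from the out.csv shown above stand in for the real file
# (only the first three columns are kept here).
sample = io.StringIO(
    "SMILES,id,Cluster\n"
    "CC(C)C(=O)COc1nc(N)nc2[nH]cnc12,ZINC03814457,1\n"
    "Nc1nc(OCC2CCCO2)c2nc[nH]c2n1,ZINC03814459,2\n"
)

pairs = read_smiles_table(sample)
for mol_id, smiles in pairs:
    print(mol_id, smiles)
```

For the real file, pass a handle opened with `open('out.csv', newline='')` instead of the `io.StringIO` sample.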
🎵🎵