Rust has become a popular language recently, and I am interested in it. The hurdle for me in moving from Python to another language is that I want to use RDKit from my coding environment for chemoinformatics tasks ;)
As many chemoinformaticians know, RDKit recently started providing a C Foreign Function Interface (CFFI). Many languages can call C functions, and Rust is one of them. That means we can use a minimal set of RDKit functions from Rust. Sounds great, doesn't it?
I also found a really cool project on GitHub named 'rdkitcffi', which is an RDKit wrapper for Rust. https://github.com/chrissly31415/rdkitcffi
So I tried out the crate ;)
First, install Rust and the other required packages, then clone rdkitcffi.
# https://www.rust-lang.org/tools/install
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# my os is ubuntu 20.04 LTS
$ sudo apt-get install build-essential
$ sudo apt-get install libclang-dev
$ gh repo clone chrissly31415/rdkitcffi
The build.rs in the original code defines the shared library directory as a relative path. To use the package as a library from another project, I changed it to an absolute path. I also added the 'rdkitcffi_linux/linux-64/' directory to the LD_LIBRARY_PATH environment variable.
// rdkitcffi/build.rs
// from relative path
let shared_lib_dir = "./lib/rdkitcffi_linux/linux-64/";
// to absolute path
let shared_lib_dir = "/home/user/hogehoge/lib/rdkitcffi_linux/linux-64/";
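Hard-coding an absolute path ties the build to one machine. As a possible alternative, here is a minimal sketch of a build script that derives the same directory from CARGO_MANIFEST_DIR, which Cargo sets for build scripts. The helper name shared_lib_dir_from is hypothetical, and I am assuming the directory is only used as a native link search path; the real build.rs in rdkitcffi may do more than this.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper: build the shared library directory from the crate root.
fn shared_lib_dir_from(manifest_dir: &Path) -> PathBuf {
    manifest_dir
        .join("lib")
        .join("rdkitcffi_linux")
        .join("linux-64")
}

fn main() {
    // Cargo sets CARGO_MANIFEST_DIR when it runs a build script.
    // Fall back to the current directory when the variable is missing.
    let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap_or_else(|_| ".".to_string());
    let shared_lib_dir = shared_lib_dir_from(Path::new(&manifest_dir));

    // Assumption: the directory is passed to the linker as a native search path.
    println!("cargo:rustc-link-search=native={}", shared_lib_dir.display());
}
```

With this approach the path stays correct wherever the repository is cloned, but LD_LIBRARY_PATH still has to point to the same directory at run time.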
Almost there. Let's make a new project and add rdkitcffi to the dependencies in Cargo.toml. rdkitcffi is not published on crates.io, so I use the local crate through a path dependency.
$ cargo new rdkrust
$ cat rdkrust/Cargo.toml
[package]
name = "rdkrust"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
rdkitcffi = {path="/home/hogehoge/rusttest/rdkitcffi"}
Then let's write src/main.rs to calculate compound descriptors from an SDF file.
// ./src/main.rs
use std::env;
use rdkitcffi::{read_sdfile, Molecule};
fn main() {
    // The first command-line argument is the path to the SD file.
    let sd_filename = env::args().nth(1).expect("usage: rdkrust <file.sdf>");
    println!("filename is {}", sd_filename);
    let mol_opt_list: Vec<Option<Molecule>> = read_sdfile(&sd_filename);
    // Keep only the entries that actually hold a molecule (the Some values).
    let mut mol_list: Vec<Molecule> = mol_opt_list.into_iter().flatten().collect();
    mol_list.iter_mut().for_each(|m| m.remove_all_hs());
    for m in mol_list {
        // Print the descriptors of each molecule on one line.
        let desc = m.get_descriptors();
        println!("{}", desc);
    }
    println!("Done");
}
OK, let's build the code.
$ cargo build
# builds the rdkitcffi wrapper and the sample program
After the build finishes, the rdkrust binary is generated under target/debug. The tool calculates molecular descriptors and prints them as shown below.
rust_rdkit$ time ./target/debug/rdkrust cdk2.sdf
filename is cdk2.sdf
{"exactmw":235.10692,"amw":235.247,"lipinskiHBA":7.0,"lipinskiHBD":3.0,"NumRotatableBonds":4.0,"NumHBD":2.0,"NumHBA":6.0,"NumHeavyAtoms":17.0,"NumAtoms":30.0,"NumHeteroatoms":7.0,"NumAmideBonds":0.0,"FractionCSP3":0.4,"NumRings":2.0,"NumAromaticRings":2.0,"NumAliphaticRings":0.0,"NumSaturatedRings":0.0,"NumHeterocycles":2.0,"NumAromaticHeterocycles":2.0,"NumSaturatedHeterocycles":0.0,"NumAliphaticHeterocycles":0.0,"NumSpiroAtoms":0.0,"NumBridgeheadAtoms":0.0,"NumAtomStereoCenters":0.0,"NumUnspecifiedAtomStereoCenters":0.0,"labuteASA":97.42084,"tpsa":106.78,"CrippenClogP":0.53899,"CrippenMR":61.4361,"chi0v":9.59729,"chi1v":5.19743,"chi2v":2.25934,"chi3v":2.25934,"chi4v":1.23253,"chi0n":9.59729,"chi1n":5.19743,"chi2n":2.25934,"chi3n":2.25934,"chi4n":1.23253,"hallKierAlpha":-2.17999,"kappa1":11.3097,"kappa2":4.36167,"kappa3":2.32463,"Phi":2.90171}
---snip---
real 0m0.085s
user 0m0.077s
sys 0m0.004s
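Each molecule's descriptors are printed as one flat JSON object with numeric values, so picking out single values takes only a little parsing. Below is a minimal sketch that uses only the standard library. The function name parse_flat_descriptors is hypothetical, and it assumes exactly the shape shown in the output above (string keys, plain numbers, no nesting). For anything beyond that, a real JSON crate such as serde_json would be the safer choice.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: parse a flat JSON object whose values are all numbers,
/// e.g. {"amw":235.247,"NumRings":2.0}. Returns None for any other shape.
fn parse_flat_descriptors(json: &str) -> Option<BTreeMap<String, f64>> {
    // Remove the surrounding braces.
    let inner = json.trim().strip_prefix('{')?.strip_suffix('}')?;
    let mut map = BTreeMap::new();
    for pair in inner.split(',') {
        let pair = pair.trim();
        if pair.is_empty() {
            continue; // tolerate an empty object "{}"
        }
        // Each entry looks like "name":value
        let (key, value) = pair.split_once(':')?;
        let key = key.trim().strip_prefix('"')?.strip_suffix('"')?;
        let value: f64 = value.trim().parse().ok()?;
        map.insert(key.to_string(), value);
    }
    Some(map)
}

fn main() {
    // A shortened copy of the descriptor line printed above.
    let desc = r#"{"exactmw":235.10692,"amw":235.247,"NumRings":2.0,"hallKierAlpha":-2.17999}"#;
    match parse_flat_descriptors(desc) {
        Some(map) => {
            for (name, value) in &map {
                println!("{}\t{}", name, value);
            }
        }
        None => eprintln!("unexpected descriptor format"),
    }
}
```

A BTreeMap keeps the descriptor names sorted, which makes the printed table stable between runs.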
rdkitcffi supports not only descriptor calculation but also other RDKit functions. And Rust has lots of useful packages comparable to Python's pandas, plotly, scikit-learn, etc.
I would like to write more code in Rust.
Today's code is uploaded to my GitHub repo. Thanks for reading.