I previously introduced rdkit-sys, a Rust wrapper for RDKit. The package is under active development, so its code changes frequently. To use it, you need to build RDKit with the static-link option turned ON. Linking rdkit-sys against static libraries (.a) is good for the portability of the resulting binary, but it makes the RDKit from a conda env hard to use, because a conda env provides only dynamic libraries (.so).
So I wanted to build rdkit-sys and call RDKit from Rust through dynamic libraries (dylib).
After some trial and error, I found a way to do it, and I would like to share what I did.
First, I cloned rdkit-sys to my PC.
$ gh repo clone rdkit-rs/rdkit-sys
$ cd rdkit-sys
Then I edited the build.rs script as shown below.
fn main() {
    if std::env::var("DOCS_RS").is_ok() {
        return;
    }
    env_logger::init();
    let library_root = match (std::env::consts::OS, std::env::consts::ARCH) {
        ("macos", "x86_64") => "/usr/local",
        ("macos", "aarch64") => "/opt/homebrew",
        //("linux", _) => "/usr",
        // the following path is my conda env's path
        ("linux", _) => "/home/iwatobipen/miniconda3/envs/chemo_py310",
        (unsupported_os, unsupported_arch) => panic!(
            "sorry, rdkit-sys doesn't support {} on {} at this time",
            unsupported_os, unsupported_arch
        ),
    };
    let brew_lib_path = format!("{}/lib", library_root);
    let include = format!("{}/include", library_root);
    let rdkit_include = format!("{}/include/rdkit", library_root);
    let dir = std::fs::read_dir("src/bridge").unwrap();
    let rust_files = dir
        .into_iter()
        .filter_map(|p| match p {
            Ok(p) => {
                if p.metadata().unwrap().is_file() {
                    Some(p.path())
                } else {
                    None
                }
            }
            Err(_) => None,
        })
        .filter(|p| !p.ends_with("mod.rs"))
        .collect::<Vec<_>>();
    let mut cc_paths = vec![];
    let wrapper_root = std::path::PathBuf::from("wrapper");
    for file in &rust_files {
        let file_name = file.file_name().unwrap();
        let file_name = file_name.to_str().unwrap();
        // strip the trailing ".rs" to get the base name
        let base_name = &file_name[0..file_name.len() - 3];
        let cc_path = wrapper_root.join("src").join(format!("{}.cc", base_name));
        let meta = std::fs::metadata(&cc_path).unwrap();
        if !meta.is_file() {
            panic!("{} must exist", cc_path.display())
        }
        cc_paths.push(cc_path);
        let h_path = wrapper_root
            .join("include")
            .join(format!("{}.h", base_name));
        let meta = std::fs::metadata(&h_path).unwrap();
        if !meta.is_file() {
            panic!("{} must exist", h_path.display())
        }
    }
    cxx_build::bridges(rust_files)
        .files(cc_paths)
        .include(include)
        .include(rdkit_include)
        .include(std::env::var("CARGO_MANIFEST_DIR").unwrap())
        .flag("-std=c++14")
        .warnings(false)
        // rdkit has warnings that blow up our build. we could enumerate all those warnings and tell
        // the compiler to allow them... .warnings_into_errors(true)
        .compile("rdkit");
    println!("cargo:rustc-link-search=native={}", brew_lib_path);
    // println!("cargo:rustc-link-lib=static=c++");
    for lib in &[
        "Catalogs",
        "ChemReactions",
        "ChemTransforms",
        "DataStructs",
        "Descriptors",
        "FileParsers",
        "Fingerprints",
        "GenericGroups",
        "GraphMol",
        "MolStandardize",
        "RDGeneral",
        "RDGeometryLib",
        "RingDecomposerLib",
        "SmilesParse",
        "Subgraphs",
        "SubstructMatch",
    ] {
        // switch from static link to dynamic link!!!
        //println!("cargo:rustc-link-lib=static=RDKit{}_static", lib);
        println!("cargo:rustc-link-lib=dylib=RDKit{}", lib);
    }
    println!("cargo:rustc-link-lib=static=boost_serialization");
}
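The two changes I made by hand (pointing the library root at the conda env and switching the link directive from static to dylib) could also be written as small helpers, so that the path is not hard-coded. The following is only a sketch of that idea, not part of rdkit-sys: the names library_root, link_directive and LinkKind are hypothetical, and I am assuming that the CONDA_PREFIX environment variable is set in the shell where cargo runs.

```rust
use std::path::PathBuf;

/// How the RDKit libraries should be linked (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum LinkKind {
    Static,
    Dynamic,
}

/// Pick the library root: a non-empty conda prefix wins,
/// otherwise fall back to the given default (e.g. "/usr").
fn library_root(conda_prefix: Option<String>, fallback: &str) -> PathBuf {
    match conda_prefix {
        Some(prefix) if !prefix.is_empty() => PathBuf::from(prefix),
        _ => PathBuf::from(fallback),
    }
}

/// Build the cargo link directive for one RDKit component,
/// using the same two naming patterns that appear in build.rs above.
fn link_directive(lib: &str, kind: LinkKind) -> String {
    match kind {
        LinkKind::Static => format!("cargo:rustc-link-lib=static=RDKit{}_static", lib),
        LinkKind::Dynamic => format!("cargo:rustc-link-lib=dylib=RDKit{}", lib),
    }
}

// Possible wiring inside build.rs (assumption, not the upstream code):
//   let root = library_root(std::env::var("CONDA_PREFIX").ok(), "/usr");
//   println!("cargo:rustc-link-search=native={}", root.join("lib").display());
//   println!("{}", link_directive("GraphMol", LinkKind::Dynamic));
```

With helpers like these, switching back to a static build would mean changing only the LinkKind value instead of editing the println! lines.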
After the modification, I set up LD_LIBRARY_PATH.
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
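Because the binary now depends on the .so files at run time, it can be handy to confirm that the conda lib directory really is one of the entries in LD_LIBRARY_PATH. Below is a minimal sketch of such a check; path_list_contains is a hypothetical helper name, and the example assumes a Unix-like system where the entries are separated by a colon.

```rust
use std::env;
use std::path::Path;

/// Returns true if `dir` is one of the entries of a PATH-style list
/// such as the value of LD_LIBRARY_PATH.
fn path_list_contains(list: &str, dir: &Path) -> bool {
    // split_paths uses the platform separator (':' on Linux and macOS)
    env::split_paths(list).any(|entry| entry.as_path() == dir)
}

// Possible usage (assumption: both variables are set in the current shell):
//   let list = std::env::var("LD_LIBRARY_PATH").unwrap_or_default();
//   let prefix = std::env::var("CONDA_PREFIX").unwrap_or_default();
//   let found = path_list_contains(&list, &Path::new(&prefix).join("lib"));
```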
After that, the cargo build command succeeded in the rdkit-sys folder ;). Next, I cloned rdkit-rs/rdkit, which depends on this crate. To make it use my modified rdkit-sys, I edited its Cargo.toml as shown below.
$ gh repo clone rdkit-rs/rdkit
$ cd rdkit
$ vim Cargo.toml
[package]
name = "rdkit"
version = "0.2.6"
edition = "2021"
authors = ["Xavier Lange <xrlange@gmail.com>"]
license = "MIT"
description = "High level RDKit functionality for rust"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
cxx = "1"
log = "0.4"
#rdkit-sys = "0.2.7"
rdkit-sys = { path = "../rdkit-sys"} # I used modified version of rdkit-sys
flate2 = "1"
After that, I wrote some code with these packages.
My rdkit test code is below.
#Cargo.toml
[package]
name = "rdkrust"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
rdkit = { path = "/home/iwatobipen/dev/sandbox/rusttest/rdkit" }
#src/main.rs
use std::env;
use rdkit;
fn main() {
    let args: Vec<String> = env::args().collect();
    let sdgz_filename = &args[1];
    println!("filename is {}", sdgz_filename);
    let mol_block_iter =
        rdkit::MolBlockIter::from_gz_file(sdgz_filename, false, false, false).unwrap();
    let mol_list = mol_block_iter.collect::<Vec<_>>();
    for m in mol_list {
        let smi = m.unwrap().as_smile();
        println!("{}", smi)
    }
    println!("Done");
}
The code reads an sdf.gz file, retrieves the molecules, and then converts each molecule to SMILES.
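One thing to note is that m.unwrap() will panic on the first record that fails to parse. If you would rather skip bad records and count them, the loop could be written along the following lines. This is only a sketch: collect_ok is a hypothetical helper, and plain strings stand in for the molecules here so that the example does not need RDKit at all. I am assuming only that the iterator yields one Result per record, as the unwrap in the code above suggests.

```rust
/// Split an iterator of per-record results into the successful values
/// and the number of records that failed.
fn collect_ok<T, E, I>(records: I) -> (Vec<T>, usize)
where
    I: IntoIterator<Item = Result<T, E>>,
{
    let mut ok = Vec::new();
    let mut failed = 0;
    for record in records {
        match record {
            Ok(value) => ok.push(value),
            // skip the broken record instead of panicking
            Err(_) => failed += 1,
        }
    }
    (ok, failed)
}

// Possible usage with stand-in data (strings instead of molecules):
//   let records: Vec<Result<&str, &str>> = vec![Ok("CCO"), Err("bad mol block")];
//   let (smiles, failed) = collect_ok(records);
```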
OK, let’s check the code.
$ cargo build
#Then
$ target/debug/rdkrust cdk2.sdf.gz
filename is cdk2.sdf.gz
[H]C1=NC2=C(N=C(N([H])[H])N=C2OC([H])([H])C(=O)C([H])(C([H])([H])[H])C([H])([H])[H])N1[H]
[H]C1=NC2=C(OC([H])([H])C3([H])OC([H])([H])C([H])([H])C3([H])[H])N=C(N([H])[H])N=C2N1[H]
'''snip
'''
Done
It seems to work fine. In summary, rdkit-rs is a really cool project for chemoinformatics because Rust is a really efficient language. I would like to keep learning Rust and develop useful apps for chemoinformatics.
Thanks for reading.