Call RDKit from Rust with conda env #RDKit #Rust #Chemoinformatics

I introduced rdkit-sys which is wrapper of rdkit for rust. The package development is ongoing so code is chaged frequentry. To use the package, user need to build rdkit with static link ON option. Build rdkit-sys with static link (.a) is efficient way for portability of the code however it means that it’s difficult to use conda env rdkit because, conda env provides only dynamic link (.so).

So I would like to build rdkit-sys and call rdkit from rust with dylib.

After several try and error, I found solution to do it. I would like to share my trial.

At first, I cloned rdkit-sys in my PC.

$ gh repo clone rdkit-rs/rdkit-sys
$ cd rdkit-sys

Then edit build.rs script show below.

fn main() {
    if std::env::var("DOCS_RS").is_ok() {
        return;
    }

    env_logger::init();

    let library_root = match (std::env::consts::OS, std::env::consts::ARCH) {
        ("macos", "x86_64") => "/usr/local",
        ("macos", "aarch64") => "/opt/homebrew",
        //("linux", _) => "/usr",
        //following path is my conda env's path
        ("linux", _) => "/home/iwatobipen/miniconda3/envs/chemo_py310",
        (unsupported_os, unsupported_arch) => panic!(
            "sorry, rdkit-sys doesn't support {} on {} at this time",
            unsupported_os, unsupported_arch
        ),
    };

    let brew_lib_path = format!("{}/lib", library_root);
    let include = format!("{}/include", library_root);
    let rdkit_include = format!("{}/include/rdkit", library_root);

    let dir = std::fs::read_dir("src/bridge").unwrap();
    let rust_files = dir
        .into_iter()
        .filter_map(|p| match p {
            Ok(p) => {
                if p.metadata().unwrap().is_file() {
                    Some(p.path())
                } else {
                    None
                }
            }
            Err(_) => None,
        })
        .filter(|p| !p.ends_with("mod.rs"))
        .collect::<Vec<_>>();

    let mut cc_paths = vec![];

    let wrapper_root = std::path::PathBuf::from("wrapper");
    for file in &rust_files {
        let file_name = file.file_name().unwrap();
        let file_name = file_name.to_str().unwrap();
        let base_name = &file_name[0..file_name.len() - 3];

        let cc_path = wrapper_root.join("src").join(format!("{}.cc", base_name));
        let meta = std::fs::metadata(&cc_path).unwrap();
        if !meta.is_file() {
            panic!("{} must exist", cc_path.display())
        }
        cc_paths.push(cc_path);

        let h_path = wrapper_root
            .join("include")
            .join(format!("{}.h", base_name));
        let meta = std::fs::metadata(&h_path).unwrap();
        if !meta.is_file() {
            panic!("{} must exist", h_path.display())
        }
    }

    cxx_build::bridges(rust_files)
        .files(cc_paths)
        .include(include)
        .include(rdkit_include)
        .include(std::env::var("CARGO_MANIFEST_DIR").unwrap())
        .flag("-std=c++14")
        .warnings(false)
        // rdkit has warnings that blow up our build. we could enumerate all those warnings and tell
        // the compiler to allow them... .warnings_into_errors(true)
        .compile("rdkit");

    println!("cargo:rustc-link-search=native={}", brew_lib_path);
    // println!("cargo:rustc-link-lib=static=c++");

    for lib in &[
        "Catalogs",
        "ChemReactions",
        "ChemTransforms",
        "DataStructs",
        "Descriptors",
        "FileParsers",
        "Fingerprints",
        "GenericGroups",
        "GraphMol",
        "MolStandardize",
        "RDGeneral",
        "RDGeometryLib",
        "RingDecomposerLib",
        "SmilesParse",
        "Subgraphs",
        "SubstructMatch",
    ] {
        //swich static link to dynamic link!!!
        //println!("cargo:rustc-link-lib=static=RDKit{}_static", lib);
        println!("cargo:rustc-link-lib=dylib=RDKit{}", lib);
    }
    println!("cargo:rustc-link-lib=static=boost_serialization");
}

After the modification, I set up LD_LIBRARY_PATH.

export LD_LIBRARY_PATH=$CONDA_PREFIX/lib;$LD_LIBRARY_PATH

After that, I could cargo build command in rdkit-sys folder ;). Then cloned rdkit-rs/rdkit with the crate. To do that I edited Cargo.toml like below.

$ gh repo clone rdkit-rs/rdkit
$ cd rdkit
$ vim Cargo.toml
[package]
name = "rdkit"
version = "0.2.6"
edition = "2021"
authors = ["Xavier Lange <xrlange@gmail.com>"]
license = "MIT"
description = "High level RDKit functionality for rust"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
cxx = "1"
log = "0.4"
#rdkit-sys = "0.2.7"
rdkit-sys = { path = "../rdkit-sys"} # I used modified version of rdkit-sys
flate2 = "1"

After that, I wrote code with these packages.

my rdkitest code is below.

#Cargo.toml
[package]
name = "rdkrust"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
rdkit = { path = "/home/iwatobipen/dev/sandbox/rusttest/rdkit" }
#src/main.rs
use std::env;
use rdkit;

fn main() {
    let args: Vec<String> = env::args().collect();
    let sdgz_filename = &args[1];

    println!("filename is {}", sdgz_filename);
    let mol_block_iter =
        rdkit::MolBlockIter::from_gz_file(sdgz_filename, false, false, false).unwrap();
    let mol_list = mol_block_iter.collect::<Vec<_>>();
    for m in mol_list {
        let smi = m.unwrap().as_smile();
        println!("{}", smi)
    }
    println!("Done");
}
    

The code will read sdf.gz and retrieve molecules and then convert molecules to SMILES.

OK, let’s check the code.

$ cargo build
#Then
$ target/debug/rdkrust cdk2.sdf.gz
filename is cdk2.sdf.gz
[H]C1=NC2=C(N=C(N([H])[H])N=C2OC([H])([H])C(=O)C([H])(C([H])([H])[H])C([H])([H])[H])N1[H]
[H]C1=NC2=C(OC([H])([H])C3([H])OC([H])([H])C([H])([H])C3([H])[H])N=C(N([H])[H])N=C2N1[H]
'''snip
'''
Done

It seems work fine. In summary, rdkit-rs activity is really cool project for chemoinformatics because rust is really efficient language. I would like to learn rust more and more and develop useful apps for chemoinformatics.

Thanks for reading.

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: