Most chemoinformaticians will read C6H6 as benzene and write its SMILES string as ‘c1ccccc1’.
But how many distinct structures do you think can be generated from the molecular formula C6H6?
…..
Yeah, it’s an interesting but difficult question.
Recently I read an interesting article published in the Journal of Cheminformatics. The title is ‘Surge: a fast open-source chemical graph generator’.
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-022-00604-9
The authors developed a fast chemical graph generator that generates molecules from a molecular formula.
Generating chemical graphs from a formula takes several steps: generating the graphs, checking automorphisms, and assigning bond multiplicities.
In the case of C6H6, surge generates over 200 molecules!
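One way to see why the count is so large is that C6H6 has four degrees of unsaturation, so every structure has to combine rings and multiple bonds in some way. Below is a minimal sketch of the standard degrees-of-unsaturation formula. It is only an illustration and not part of surge; the function name and arguments are hypothetical.

```python
def degrees_of_unsaturation(c: int, h: int, n: int = 0, x: int = 0) -> int:
    """Return rings + pi bonds for a formula given C, H, N and halogen (x) counts."""
    # Standard formula: (2C + 2 + N - H - X) / 2
    twice = 2 * c + 2 + n - h - x
    if twice < 0 or twice % 2:
        raise ValueError("no closed-shell structure matches these atom counts")
    return twice // 2


print(degrees_of_unsaturation(c=6, h=6))   # 4 for C6H6
print(degrees_of_unsaturation(c=6, h=14))  # 0 for hexane, C6H14
```

Benzene uses its four as one ring plus three double bonds, but the same four can also be spent as, for example, two triple bonds in an open chain.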
Fortunately, a binary version of surge is available from the following URL.
https://structuregenerator.github.io/
So I tried it. First, I downloaded the program from the URL above and generated molecules from the formula C6H6.
$ ./surge-linux-v1.0 -o hoge.sdf C6H6
This generated hoge.sdf, and I checked the generated structures.
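A quick way to check how many structures ended up in the file is to count the record terminators. In the SDF format each record ends with a line containing only $$$$. The helper below is a small sketch using just the standard library; the function name is hypothetical and the sample text is a placeholder, not real surge output.

```python
def count_sdf_records(sdf_text: str) -> int:
    """Count SDF records by counting the '$$$$' lines that terminate them."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


# Placeholder text with two dummy records, only to show the counting.
sample = "mol 1 (placeholder)\nM  END\n$$$$\nmol 2 (placeholder)\nM  END\n$$$$\n"
print(count_sdf_records(sample))  # 2
```

For the real file, the text read from hoge.sdf would be passed in instead of the sample string.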
As described in the article, surge has a limitation: the current version doesn’t perform a Hückel aromaticity test. This means surge will generate duplicate structures for the Kekulé versions of aromatic rings.
Still, it is a fast and interesting tool for molecular generation. BTW, in the drug discovery field it’s difficult to filter the generated molecules down to those with the desired compound properties.