SMILES is widely used in chemoinformatics because the strings are compact and easy to handle in tasks such as compound generation. However, a SMILES string can't keep much atom-level information beyond a few items such as chirality, charge, and atom mapping numbers.
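To make that limitation concrete, here is a minimal sketch with RDKit. The molecule and the property name `note` are made up for illustration. It shows that chirality, charge, and the atom map number survive in the SMILES output, while an arbitrary atom property does not appear in the string.

```python
from rdkit import Chem

# Hypothetical example: one chiral centre, one charged atom, one mapped atom
mol = Chem.MolFromSmiles("[CH3:1][C@H](F)[NH3+]")

# Attach an arbitrary atom property (the name "note" is just an illustration)
mol.GetAtomWithIdx(0).SetProp("note", "methyl")

# Plain SMILES keeps chirality (@), charge (+) and the atom map (:1) ...
plain = Chem.MolToSmiles(mol)
print(plain)

# ... but the custom property is not written anywhere in the string
print("note" in plain)
```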
ChemAxon developed an extended SMILES format called CXSMILES. The details are described at the following URL.
Recent versions of RDKit can handle this kind of molecular representation: Chem.MolToCXSmiles writes CXSMILES, and Chem.MolFromSmiles can read not only basic SMILES but also CXSMILES.
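A small round trip with these two functions might look like the sketch below. The molecule and the property name `label` are my own assumptions for illustration. The custom atom property is written into the extension part of the string and is set again on the atom when the string is parsed.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")  # hypothetical example molecule

# Tag the oxygen atom with a string property (name and value are arbitrary)
mol.GetAtomWithIdx(2).SetProp("label", "acceptor")

# Write CXSMILES: the basic SMILES followed by an extension block in |...|
cxsmi = Chem.MolToCXSmiles(mol)
print(cxsmi)

# MolFromSmiles parses the extension block by default
mol2 = Chem.MolFromSmiles(cxsmi)

# Collect (element, value) for every atom that carries the property
labelled = [(a.GetSymbol(), a.GetProp("label"))
            for a in mol2.GetAtoms() if a.HasProp("label")]
print(labelled)
```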
As an example, I compared default SMILES and CXSMILES. I defined a simple molecule and calculated atomic charges with extended Hückel theory, which is implemented in RDKit. Then I added the atom index and the atomic charge to each atom as properties.
After that, the molecule is converted to basic SMILES and to CXSMILES, and a molecule is reconstructed from each of these strings.
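The steps above can be sketched roughly as follows. This is not the original code of this post but a minimal reconstruction under some assumptions: the example molecule (acetic acid), the property names `orig_idx` and `eht_charge`, and the random seed are all hypothetical, and `rdEHTTools` is only available when RDKit is built with YAeHMOP support.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdEHTTools

mol = Chem.MolFromSmiles("CC(=O)O")  # hypothetical example molecule

# EHT needs explicit hydrogens and 3D coordinates
mh = Chem.AddHs(mol)
if AllChem.EmbedMolecule(mh, randomSeed=42) != 0:
    raise RuntimeError("3D embedding failed")

ok, res = rdEHTTools.RunMol(mh)
if not ok:
    raise RuntimeError("EHT calculation failed")
charges = res.GetAtomicCharges()

# AddHs appends hydrogens after the heavy atoms, so heavy-atom indices
# in mh match the indices in mol
for atom in mol.GetAtoms():
    idx = atom.GetIdx()
    atom.SetProp("orig_idx", str(idx))
    atom.SetProp("eht_charge", f"{charges[idx]:.3f}")

# Convert to both representations
smi = Chem.MolToSmiles(mol)
cxsmi = Chem.MolToCXSmiles(mol)
print(smi)
print(cxsmi)

# Rebuild a molecule from each string
mol_from_smi = Chem.MolFromSmiles(smi)
mol_from_cx = Chem.MolFromSmiles(cxsmi)

# Only the molecule rebuilt from CXSMILES still carries the properties
for atom in mol_from_cx.GetAtoms():
    print(atom.GetSymbol(), atom.GetProp("orig_idx"), atom.GetProp("eht_charge"))
print([a.HasProp("eht_charge") for a in mol_from_smi.GetAtoms()])
```

Note that the atom order in the rebuilt molecule follows the order of the written SMILES, so storing the original index as a property is a simple way to map the values back.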
As you can see, the molecule built from CXSMILES keeps the atomic properties that were defined before the conversion to CXSMILES. It's interesting to me that CXSMILES can hold so much information in a single string.
Have a nice weekend :)