Recently I’ve been interested in the following article.
https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c00204
The authors tried to detect chemical series (scaffolds) the way a medicinal chemist would. In a drug discovery project, the chemical series / scaffold is a very important concept for analyzing the SAR of compounds, but it is fuzzy. As chemoinformaticians know, the Bemis-Murcko scaffold is one solution for systematic detection of chemical series, but the BM scaffold sometimes doesn’t match a chemist’s intuition.
So a more chemist-friendly workflow for defining chemical series seems useful.
The authors used UPGMA (unweighted pair group method with arithmetic mean) to cluster the molecular structures, calculated the MCS (maximum common substructure) of each cluster to detect a scaffold, and then analyzed the frequency of the scaffolds. The final step is interesting because a very frequent MCS, such as benzene, is too common to be suitable as a scaffold.
So their approach seems very reasonable to me.
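To make the clustering and frequency-filter steps a bit more concrete, here is a minimal sketch in plain Python. It is not the authors’ code. The fingerprints are toy sets of made-up feature ids, the scaffold SMILES and reference counts are invented numbers, and the names `upgma`, `specific_scaffolds`, `cutoff` and `max_fraction` are my own hypothetical choices. The MCS step is skipped here; in a real workflow it would come from a cheminformatics toolkit (for example RDKit’s rdFMCS module).

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two sets of feature ids."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def upgma(fps, cutoff):
    """Average-linkage (UPGMA) clustering of fingerprints.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance and stops once that distance exceeds `cutoff`.
    """
    clusters = [[i] for i in range(len(fps))]

    def avg_dist(c1, c2):
        total = sum(tanimoto_distance(fps[i], fps[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]),
        )
        if avg_dist(clusters[x], clusters[y]) > cutoff:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


def specific_scaffolds(cluster_scaffolds, reference_counts, n_reference, max_fraction):
    """Drop scaffolds that occur in more than `max_fraction` of a reference set."""
    return {
        cid: smi
        for cid, smi in cluster_scaffolds.items()
        if reference_counts.get(smi, 0) / n_reference <= max_fraction
    }


# Toy fingerprints: molecules 0-2 share most features, 3-4 form a second group.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {10, 11, 12}, {10, 11, 13}]
clusters = upgma(fps, cutoff=0.6)
print(clusters)  # [[0, 1, 2], [3, 4]]

# Pretend these are the MCS of each cluster (invented for illustration).
cluster_scaffolds = {0: "c1ccccc1", 1: "c1ccc2[nH]ccc2c1"}
# Invented counts of each scaffold in a 1000-molecule reference set.
reference_counts = {"c1ccccc1": 800, "c1ccc2[nH]ccc2c1": 30}
kept = specific_scaffolds(cluster_scaffolds, reference_counts, 1000, max_fraction=0.1)
print(kept)  # {1: 'c1ccc2[nH]ccc2c1'}
```

In this toy run benzene is thrown away because it shows up in 80% of the reference set, while the indole survives as a candidate series scaffold. The cutoff values are arbitrary and would need tuning on real data.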
BTW, this approach seems effective for retrospective analysis, that is, detecting the chemical series of each project after the fact. But chemists define a chemical series at the beginning of their project, when there is still little compound data. So the chemist’s intuition/feeling/common sense is required to define the chemical series.
I feel this is a very interesting point. Recently AI-based drug discovery has been on the rise in this area. But current AI (ML) can’t go from 0 to 1, even if it can go from 1 to 2 or 10 ;) (this is my personal opinion...)
So the important point is the right person (tool) in the right place. It’s a difficult thing, but I should keep thinking about that point.