As you know, recurrent neural networks (RNNs) are widely used in machine learning for natural language, handwriting, speech and also chemistry.
Recently there have been lots of reports that apply RNNs to SMILES strings to solve chemoinformatics problems. Today I read a short article published by Prof. Gisbert Schneider’s group.
The URL is below.
They applied an RNN (LSTM) to the design of antimicrobial peptides (AMPs). The strategy is basic. First they added a tag to each peptide sequence and padded it to a fixed length. Then they encoded it as one-hot vectors.
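The encoding step can be sketched roughly as below. This is only my own minimal illustration, not the authors' code: the start tag "B", the padding symbol "_", the function name encode and the example sequence are all assumptions I made up for the sketch.

```python
# Minimal sketch of "tag, pad, one-hot" encoding for peptide sequences.
# START, PAD and the example peptide are hypothetical choices for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
START = "B"  # hypothetical start tag
PAD = "_"    # hypothetical padding symbol
VOCAB = START + AMINO_ACIDS + PAD
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}


def encode(sequence, max_len):
    """Prepend the start tag, pad to max_len, and return a list of one-hot rows."""
    tagged = START + sequence
    if len(tagged) > max_len:
        raise ValueError("sequence is longer than max_len")
    padded = tagged.ljust(max_len, PAD)
    one_hot = []
    for ch in padded:
        row = [0] * len(VOCAB)
        row[CHAR_TO_INDEX[ch]] = 1  # exactly one position is set per character
        one_hot.append(row)
    return one_hot


encoded = encode("GLFDIVKKVV", max_len=12)
print(len(encoded), len(encoded[0]))
```

Each peptide becomes a max_len x vocabulary-size matrix, so every sequence in a batch has the same shape for the LSTM.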
I think the key point of their method is the selection of training peptides. They removed the sequences containing Cys, because Cys residues can form S-S bridges, which would complicate the problem.
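The filtering itself is simple. Here is a small sketch of how I imagine it, assuming the peptides are given as one-letter-code strings; the function name and the three example sequences are made up for illustration and are not taken from the article.

```python
# Minimal sketch: drop every training sequence that contains cysteine ("C").
# The function name and the example peptides are hypothetical.
def remove_cys_sequences(sequences):
    """Keep only sequences without a Cys residue (one-letter code)."""
    return [seq for seq in sequences if "C" not in seq.upper()]


peptides = ["GLFDIVKKVV", "ACYCRIPACIAGEKKY", "KWKLFKKIEK"]
filtered = remove_cys_sequences(peptides)
print(filtered)
```

Note that the whole sequence is discarded, not just the Cys residue, so the remaining training set contains only peptides that cannot form S-S bridges.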
Finally they evaluated the trained model, and the model generated novel peptides that have suitable hydrophobicity and length.
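To check generated peptides in this way, something like the sketch below could be used. Please note the assumptions: I use the Kyte-Doolittle hydropathy scale and a simple average here just as an example, which may differ from the descriptors used in the article, and the function names are my own.

```python
# Minimal sketch: length and mean hydropathy of a peptide.
# Assumption: Kyte-Doolittle scale, swapped in for illustration only.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average Kyte-Doolittle value over all residues of the sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def summarize(sequence):
    """Return (length, mean hydropathy) for one peptide."""
    return len(sequence), mean_hydropathy(sequence)


length, hydropathy = summarize("GLFDIVKKVV")  # hypothetical generated peptide
print(length, round(hydropathy, 2))
```

Comparing the distribution of these values between the training set and the generated set gives a quick sanity check of the model.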
I think their strategy (removing Cys-containing sequences) is nice and fits RNNs well.
BTW, with this method the machine learns peptides as a bunch of strings but does not learn the features of each amino acid. This is the same as with SMILES in the chemoinformatics area.
I have no answer to that.
If you are interested in the approach, you can get the source code from the following URL.