Generate dataset for deep learning

I discussed the issues of deep learning in drug discovery with experts. As I understand it, there are two major problems. First, we can't use a large amount of data for building a model in the early stage of a project. Second, we need to find suitable descriptors of molecules for DL. BTW, in the image classification area, there