I recently discussed the challenges of deep learning in drug discovery with some experts.
As I understand it, there are two major problems. First, we can't use a large dataset to build a model in the early stage of a project. Second, we need to find suitable molecular descriptors for DL.
BTW, in the image classification area, there is a practical method for fighting overfitting when only a small amount of data is available.
Keras has a handy class, ImageDataGenerator, for generating additional data for image classification.
Today, I tried the code with a penguin image. ;-)
The following code runs in a Jupyter notebook.
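    # image generator
    import matplotlib.pyplot as plt
    from keras.preprocessing.image import ImageDataGenerator, array_to_img, load_img, img_to_array

    # random rotations, shifts, shears, zooms and flips; new pixels are filled from the nearest edge
    datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

    img = load_img('imdata/Emperor_penguins.jpg')
    x = img_to_array(img)  # float array of shape (height, width, 3), values 0-255
    plt.imshow(x / 255.0)  # imshow expects float RGB values in [0, 1]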
Looping over the flow method generates the additional data.
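    # flow() expects a batch, so add a leading batch dimension: (1, height, width, 3)
    x = x.reshape((1,) + x.shape)

    i = 0
    # flow() yields batches forever, so stop manually; each batch saves one augmented image to imdata/
    for batch in datagen.flow(x, batch_size=1,
                              save_to_dir='imdata', save_prefix='pen', save_format='jpeg'):
        i += 1
        if i > 20:
            break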
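To get a feeling for what the generator does to each image, here is a minimal pure-Python sketch of two of the transforms (horizontal flip and a width shift with nearest-edge fill). This is only an illustration under my own assumptions, not how Keras implements it: the helper names (horizontal_flip, shift_right_nearest, augment) are hypothetical, the "image" is a tiny list of rows, and the shift goes in one direction only.

```python
import random

def horizontal_flip(image):
    """Mirror each row of a 2-D image given as a list of rows."""
    return [row[::-1] for row in image]

def shift_right_nearest(image, pixels):
    """Shift every row right by `pixels` (0 <= pixels < width).

    The gap on the left is filled with the nearest edge value,
    similar in spirit to fill_mode='nearest'.
    """
    return [[row[0]] * pixels + list(row[:len(row) - pixels]) for row in image]

def augment(image, n_copies, max_shift=1, seed=0):
    """Return n_copies randomly flipped and shifted variants of image (hypothetical helper)."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        out = [list(row) for row in image]
        if rng.random() < 0.5:  # flip about half of the copies
            out = horizontal_flip(out)
        out = shift_right_nearest(out, rng.randint(0, max_shift))
        copies.append(out)
    return copies

tiny_image = [[1, 2, 3],
              [4, 5, 6]]
variants = augment(tiny_image, n_copies=5)
```

Every variant keeps the shape of the original, so one labelled image turns into several slightly different training examples with the same label.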
Hmm, it's useful for image classification. Can I apply this method to QSAR?
The open question is how to describe a molecule.
The code above comes from the Keras blog.
The following URL was very useful and impressive for me.