I did this project for an Image Recognition class in 2018.
You can view the complete project here.

This is the output of one of the final layers of the model. This layer seems to specialize in complex objects such as structures, plants, vehicles, and even people.
Convolutional Neural Networks (CNNs) have revolutionized the field of image recognition and image classification. CNNs have been used to segment images, identify multiple objects in images, and classify images as a whole. However, recent research has begun to formalize the use of the generative properties of CNNs. Google Deep Dream is one such generative model based on CNNs; it generates images by amplifying the categories the network was trained to recognize.
The Deep Dream algorithm provides insight into the 'black box' of deep learning models by letting us inspect the outputs of each layer of a neural network and see what it has learned. This is valuable because it shows researchers which features the model is learning; in other words, it reveals what part of an image the network 'thinks' marks it as a picture of a cat or a dog, for instance.
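For concreteness, here is a minimal sketch of the gradient-ascent loop at the heart of Deep Dream, assuming PyTorch and a pretrained torchvision VGG16. The layer index, step count, and learning rate are illustrative assumptions, not the values used in this project.

```python
# A minimal Deep Dream sketch: nudge the input image in the direction that
# maximizes a chosen layer's activations, so the patterns that layer has
# learned to detect get amplified in the image itself.
import torch
import torchvision.models as models

model = models.vgg16(pretrained=True).features.eval()

def deep_dream(image, layer_index=28, steps=20, lr=0.05):
    """Amplify what one layer responds to via gradient ascent on the input."""
    image = image.clone().requires_grad_(True)
    for _ in range(steps):
        # Forward pass up to (and including) the chosen layer.
        activation = image
        for i, module in enumerate(model):
            activation = module(activation)
            if i == layer_index:
                break
        # Maximizing the activation norm pushes the image toward whatever
        # features this layer has learned to recognize.
        loss = activation.norm()
        loss.backward()
        with torch.no_grad():
            # Normalize the gradient so the step size is stable across layers.
            image += lr * image.grad / (image.grad.abs().mean() + 1e-8)
            image.grad.zero_()
    return image.detach()

# Usage: start from a (normalized) input image or from random noise.
dream = deep_dream(torch.randn(1, 3, 224, 224))
```

Iterating this loop, optionally across multiple image scales, is what produces the hallucinatory textures Deep Dream is known for; the choice of layer determines whether low-level textures or high-level objects get amplified.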
Past research has found that inverting a CNN trained to classify images can help reconstruct different aspects of the original image, because different layers of the network retain different information about it. While the Deep Dream framework has been used to informally explore the outputs of various neurons and layers, a more formal exploration of what these layers produce across various image datasets could shed light on what the individual layers are learning in a more general sense. Such a formalization could also help verify the correctness of representations in other networks.
Motivated by this, I was interested in answering the following question:
Do categories of objects that appear most in the training dataset appear most during the generation phase as well?
In other words, given that different layers remember different properties of images, what do they actually seem to learn? I also wanted to see whether, irrespective of this, higher-level layers tend to remember certain higher-level objects (e.g. cars, structures, animals) much better than others, and whether these objects are the ones that occur most frequently in the original dataset.
This is interesting because it can indicate whether image recognition models are biased towards certain objects in the dataset and identify which objects those are. This can then guide researchers to modify their algorithms to account for sparse object categories, or to choose images that supplement their dataset and produce a more balanced model. Additionally, this information could guide digital artists who wish to generate art using the Deep Dream framework.