Neural style transfer

Neural style transfer needs two things: a content representation and a style representation. A pretrained CNN is used as the feature extractor for the content representation. The image is passed through the network, and the output of an intermediate layer is taken as a feature map that carries information about the general content rather than the exact pixel values of the input image. The feature maps of the content image and the output image are then compared in the content loss function, which we minimize so that the output image produces the same response at that layer of the CNN as the content image.
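As a rough illustration of the content loss described above, here is a minimal NumPy sketch. It assumes the loss is a mean squared difference between feature maps, which is a common choice but not confirmed by this project. The name `content_loss` and the random arrays standing in for CNN feature maps are hypothetical.

```python
import numpy as np

def content_loss(content_features, output_features):
    """Mean squared difference between two feature maps of the same shape."""
    content_features = np.asarray(content_features, dtype=np.float64)
    output_features = np.asarray(output_features, dtype=np.float64)
    if content_features.shape != output_features.shape:
        raise ValueError("feature maps must have the same shape")
    return float(np.mean((output_features - content_features) ** 2))

# Stand-in feature maps with shape (channels, height, width). In practice they
# would come from an intermediate layer of a pretrained CNN (assumption: random
# values are used here only to keep the example self-contained).
rng = np.random.default_rng(0)
content_features = rng.normal(size=(8, 4, 4))
output_features = content_features + 0.1  # output that is slightly off everywhere

print(content_loss(content_features, content_features))
print(content_loss(content_features, output_features))
```

The loss is zero only when the output image triggers exactly the same response as the content image at the chosen layer, which is what the minimization pushes towards.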


For the style representation, a feature space is built on top of the filter responses in multiple layers. It consists of the correlations between different filter responses (feature maps), obtained as the Gram matrix of the vectorised feature maps in each layer. Using multiple layers gives a multi-scale representation of the style image that captures its texture information but not its spatial arrangement. The Gram matrices of the feature maps of the style image and the output image are compared in the style loss function.
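The Gram matrix computation can be sketched in a few lines of NumPy. This is an illustrative sketch, not this project's implementation: the names `gram_matrix` and `style_loss`, the normalisation by the number of spatial positions, and the unweighted average over layers are all assumptions, and random arrays stand in for real CNN feature maps.

```python
import numpy as np

def gram_matrix(features):
    """Correlations between the channels of a (channels, height, width) feature map."""
    features = np.asarray(features, dtype=np.float64)
    channels = features.shape[0]
    flat = features.reshape(channels, -1)   # vectorise each feature map
    return flat @ flat.T / flat.shape[1]    # assumed normalisation: average over positions

def style_loss(style_layers, output_layers):
    """Average over layers of the mean squared difference between Gram matrices."""
    if len(style_layers) != len(output_layers):
        raise ValueError("both images need features from the same layers")
    per_layer = [
        np.mean((gram_matrix(out) - gram_matrix(sty)) ** 2)
        for sty, out in zip(style_layers, output_layers)
    ]
    return float(np.mean(per_layer))

rng = np.random.default_rng(0)
style_features = rng.normal(size=(8, 4, 4))

# Shuffle the spatial positions identically in every channel: the layout changes,
# but the channel correlations, and therefore the Gram matrix, do not.
perm = rng.permutation(16)
shuffled = style_features.reshape(8, -1)[:, perm].reshape(8, 4, 4)
other = rng.normal(size=(8, 4, 4))

print(np.allclose(gram_matrix(style_features), gram_matrix(shuffled)))
print(style_loss([style_features], [other]))
```

The shuffling check illustrates why this representation keeps texture but discards spatial information: the Gram matrix sums over all positions, so it cannot tell where in the image a pattern occurred.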


The task of neural style transfer now becomes an optimisation problem: minimizing the joint loss function, which we solve using gradient descent.


Scroll down for the examples.


centered image

Iterations of the gradient descent algorithm. We used gradient descent to optimize the joint loss function, obtained as a linear combination of the content loss and the style loss, with the content weight (α) and the style weight (β) as the coefficients of the linear combination. By varying these parameters, we can tweak our result image to look more like the style image or more like the content image.
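To make the joint loss and its weights concrete, here is a toy NumPy sketch of gradient descent on `alpha * content_loss + beta * style_loss`. It is a heavy simplification and not this project's code: the "image" is treated directly as a flattened feature map of shape (channels, positions), so no CNN is involved, and the gradient is written out by hand. A real implementation would backpropagate through the pretrained network with an autodiff framework. All names (`gram`, `joint_loss_and_grad`, `stylize`) and the default step size are hypothetical.

```python
import numpy as np

def gram(f):
    """Gram matrix of a flattened feature map f with shape (channels, positions)."""
    return f @ f.T / f.shape[1]

def joint_loss_and_grad(x, content, style_gram, alpha, beta):
    """Return alpha * content_loss + beta * style_loss and its gradient w.r.t. x."""
    c, n = x.shape
    diff_c = x - content
    content_loss = np.mean(diff_c ** 2)
    grad_content = 2.0 * diff_c / diff_c.size

    diff_g = gram(x) - style_gram
    style_loss = np.mean(diff_g ** 2)
    # Gradient of mean((x x^T / n - A)^2); relies on diff_g being symmetric.
    grad_style = 4.0 * diff_g @ x / (n * c * c)

    loss = alpha * content_loss + beta * style_loss
    grad = alpha * grad_content + beta * grad_style
    return loss, grad

def stylize(content, style, alpha=1.0, beta=1.0, lr=0.5, steps=200):
    """Plain gradient descent on the joint loss, starting from the content image."""
    x = content.copy()
    style_gram = gram(style)
    history = []
    for _ in range(steps):
        loss, grad = joint_loss_and_grad(x, content, style_gram, alpha, beta)
        history.append(float(loss))
        x -= lr * grad
    return x, history

rng = np.random.default_rng(0)
content = rng.normal(size=(4, 16))   # toy stand-ins for content and style features
style = rng.normal(size=(4, 16))

result, history = stylize(content, style, alpha=1.0, beta=1.0)
print(history[0], history[-1])
```

With `beta=0` the result stays equal to the content input, because the content loss is already zero at the starting point. Raising `beta` relative to `alpha` lets the optimizer trade content fidelity for a closer match of the Gram matrices.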


centered image

The effect of content weight (α) and style weight (β).

centered image

When the content representation is taken from the output of blocks closer to the input, more content is preserved in the style transfer, since the content representation is less compressed.

centered image

When fewer blocks are used for the style comparison, style information is lost and the style-transferred image looks more like the content image.



What if Picasso had painted the Mona Lisa?

centered image