Learn to transfer the artistic style of an image using neural style transfer


Learning about neural style transfer is a great place to start if you want to deepen your understanding of Convolutional Neural Networks (CNNs).
In this article, we will implement neural style transfer in Python using the VGG19 model and apply the style of one image to another. The code follows the mathematics of the paper "A Neural Algorithm of Artistic Style" by Gatys et al.
Let's start!
What is neural style transfer?
Neural style transfer is a technique that takes two input images, a content image and a style reference image, and produces an output image called the stylized image. This output keeps the content of the content image while adopting a style similar to the style image.
The architecture of the VGG-19 model
We will be using this model for our style transfer, so let's first understand what happens in its architecture.
Extracting content
Along the processing hierarchy of the network, the input image is transformed into representations that care increasingly about the actual content of the image rather than its exact pixel values.
We refer to the feature responses in the higher layers of the network as the content representation.
The second convolutional layer of the fifth block (conv5_2) of the pre-trained VGG-19 network is used as the content extractor.
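To make this concrete, here is a minimal sketch of pulling out the conv5_2 activations for an image (the layer is named block5_conv2 in Keras); the random tensor is only a placeholder for a real preprocessed image batch:
import tensorflow as tf
from tensorflow.keras.applications import vgg19
from tensorflow.keras.models import Model
# Load VGG19 without its classifier head, with ImageNet weights
vgg = vgg19.VGG19(weights="imagenet", include_top=False)
# Build a sub-model that outputs only the block5_conv2 activations
content_extractor = Model(inputs=vgg.inputs, outputs=vgg.get_layer("block5_conv2").output)
# Any preprocessed image batch fed through this model yields its content representation
dummy_batch = tf.random.uniform((1, 400, 400, 3))
print(content_extractor(dummy_batch).shape)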
Extracting style
To obtain a representation of the style of an input image, we use the correlations between the different filter responses over the spatial extent of the feature maps. This gives a stationary, multi-scale representation of the input image, which captures its texture information but not the global arrangement.
These correlations between feature maps are collected in what is known as the Gram matrix.
The layers used for the calculation of the gram matrix are conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1.
Style transfer using VGG-19 in Tensorflow
Preparing environment
Note: If you already know how to enable the GPU in Google Colab, you can skip this part.
In my case, I am programming in Google Colab so that I can train the neural network on a GPU for free. Using a GPU reduces the training time by a huge amount.
To enable GPU on Colab you have to:
Go to "change runtime type" under Runtime:
Select GPU as a hardware accelerator:
With that we will have access to a GPU. Now we have to make Tensorflow use it. To do so we have to run the following code.
# Make sure TensorFlow can see the GPU
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
Now everything we do on Colab will run on GPU.
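As an optional sanity check (not required for the rest of the article), you can also ask TensorFlow to list the physical GPU devices it can see:
import tensorflow as tf
# Should print a non-empty list when a GPU runtime is attached
print(tf.config.list_physical_devices('GPU'))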
Importing libraries and images
import numpy as np
import matplotlib.pyplot as plt
import os
%matplotlib inline
from tensorflow import keras
from keras import optimizers
from keras.optimizers import schedules
import PIL
import cv2
For the content and style images, I mounted my Google Drive and then used the file paths of the images stored there.
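If you also keep your images on Drive, the standard Colab mounting snippet is the following (the image paths below are specific to my Drive, so adjust them to wherever your files live):
# Mount Google Drive inside the Colab runtime
from google.colab import drive
drive.mount('/content/drive')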
# Load content and style images
content_path = '/content/drive/MyDrive/Images/chicago.jpg'
style_path = '/content/drive/MyDrive/Images/great-wave.jpg'
# Read the image files into numpy arrays
content = plt.imread(content_path)
style = plt.imread(style_path)
# Display the images
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(15,15))
ax1.imshow(content)
ax1.set_title('Content image')
ax2.imshow(style)
ax2.set_title('Style image')
plt.show()
Gram matrix and loss functions
Gram matrix
To get the correlation of all the channels with respect to each other, we need to calculate the Gram matrix. We will use the Gram matrix to measure the degree of correlation between channels, which will later act as our measure of style itself.
In simple words, a Gram matrix is created by multiplying a matrix by its own transpose: the feature maps of a layer are flattened into vectors and stacked into a matrix, and the dot product of that matrix with its transpose gives the Gram matrix.
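In the notation of the paper, if $F^l$ is the matrix whose $i$-th row is the flattened feature map of channel $i$ in layer $l$, the Gram matrix is
$$G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$$
which is exactly what the function below computes: it reshapes the feature tensor so that each row is one channel and multiplies the result by its own transpose.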
def gram_matrix(input_tensor):
    # Move the channel dimension first, then flatten each channel into a row vector
    input_tensor = tf.transpose(input_tensor, (2, 0, 1))
    features = tf.reshape(input_tensor, [tf.shape(input_tensor)[0], -1])
    gram = tf.matmul(features, tf.transpose(features))
    return gram
Now that we have the Gram matrix, we can calculate the style loss, which measures how far the channel correlations of the generated image are from those of the style image within a layer.
Style cost
To calculate the style loss, we compute the Gram matrix of both the style image and the generated (combination) image and take the mean squared error between them.
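From the paper, the style loss contributed by a single layer $l$ is
$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left(G^l_{ij} - A^l_{ij}\right)^2$$
where $N_l$ is the number of feature maps in the layer and $M_l$ is the number of spatial positions in each feature map.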
Where "A" and "G" are the style representations in layer l which are calculated using the gram matrix function. This cost function will later get used to calculate style loss.
def style_cost(style, combination):
    G = gram_matrix(style)
    A = gram_matrix(combination)
    channels = 3
    size = img_nrows * img_ncols
    return tf.reduce_sum(tf.square(G - A)) / (4.0 * (channels ** 2) * (size ** 2))
Content cost
The content loss function is much simpler than the style loss. Images with similar content tend to produce similar activations in the deeper layers of the network, so we can compare the deep-layer activations of the two images directly.
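From the paper, the content loss for a chosen layer $l$ is
$$L_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} \left(F^l_{ij} - P^l_{ij}\right)^2$$
where $F^l$ and $P^l$ are the feature representations of the generated image and the content image in that layer.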
Where "p" is the input content image, "x" is generated combined image and "l" is the layer whose activation we are going to use to compute loss. In my function below there won't be any necessity to send an activation layer.
def content_cost(content, combination):
    return tf.reduce_sum(tf.square(combination - content)) / 2
Loading the model
# Loading VGG19 model
from keras.applications import vgg19
from keras.utils import plot_model
model = vgg19.VGG19(weights="imagenet", include_top=False)
The "include_top" parameter is set to false because we are using our model for feature extraction so we won't need the classifier part. And we are using the weights used by VGG19 on imagenet dataset.
Calculation of loss function
First, let us create a feature extractor from a pre-trained model.
from keras.models import Model
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])
feature_extractor = Model(inputs=model.inputs, outputs=outputs_dict)
The first line creates a Python dictionary that maps the name of each layer in the pre-trained model to its output tensor. The second line creates a new model named feature_extractor. This new model takes the same inputs as the original pre-trained model (model.inputs), but its outputs are given by the dictionary created in the previous step, so calling it returns the activations of every layer at once.
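As a quick illustration of what this returns, you can feed any batch of images through the extractor and get back a dictionary keyed by layer name; the random tensor below is only a placeholder for a real preprocessed image batch:
# The extractor returns a dictionary of activations, one entry per layer
dummy_batch = tf.random.uniform((1, 400, 400, 3))
features = feature_extractor(dummy_batch)
print(features["block5_conv2"].shape)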
Now, let's define which layers we will use to calculate the style loss and which layer we will use for the content loss.
style_layers = [
"block1_conv1",
"block2_conv1",
"block3_conv1",
"block4_conv1",
"block5_conv1",
]
content_layer = "block5_conv2"
content_weight = 2.5e-8
style_weight = 1e-6
def loss_function(combination_image, content_image, style_image):
    # 1. Combine all the images in the same tensor
    input_tensor = tf.concat(
        [content_image, style_image, combination_image],
        axis=0
    )
    # 2. Get the values in all the layers for the three images
    features = feature_extractor(input_tensor)
    # 3. Initialize the loss
    loss = tf.zeros(shape=())
    # 4. Extract the content layer + content loss
    layer_features = features[content_layer]
    content_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]
    loss = loss + content_weight * content_cost(
        content_image_features, combination_features
    )
    # 5. Extract the style layers + style loss
    for layer_name in style_layers:
        layer_features = features[layer_name]
        style_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        sl = style_cost(style_features, combination_features)
        loss += (style_weight / len(style_layers)) * sl
    return loss
Learning of the Neural Style Transfer network
We will now write another function which will:
Calculate the gradients of the loss function we just defined.
Use these gradients to update the target image.
With tf.GradientTape, we can take advantage of automatic differentiation, which computes the gradients of a function from its composition. We will also use the tf.function decorator to speed up the operations.
When a function is decorated with @tf.function, TensorFlow converts the Python function into a TensorFlow graph, which can be executed much more efficiently than the original Python code.
@tf.function
def compute_loss_and_grads(combination_image, content_image, style_image):
    with tf.GradientTape() as tape:
        loss = loss_function(combination_image, content_image, style_image)
    grads = tape.gradient(loss, combination_image)
    return loss, grads
With this we have the learning phase done.
Image processing and generation
Preprocess image
The preprocessing of the images consists of giving the images the format that our network requires.
def preprocess_image(image_path):
    # Util function to open, resize and format images into appropriate tensors
    img = keras.preprocessing.image.load_img(
        image_path, target_size=(img_nrows, img_ncols)
    )
    img = keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return tf.convert_to_tensor(img)
Deprocess image
To deprocess the images we will have to follow an almost reverse process to the one we have used to process the images.
# Deprocess image
def deprocess_image(x):
    # Reshape the flat array back into an (img_nrows, img_ncols, 3) image
    x = x.reshape((img_nrows, img_ncols, 3))
    # Add back the ImageNet mean pixel values removed by vgg19.preprocess_input
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # Convert BGR to RGB
    x = x[:, :, ::-1]
    # Make sure the values are between 0 and 255
    x = np.clip(x, 0, 255).astype("uint8")
    return x
Training our Neural Style Transfer network
Now that we have all the functions ready, creating the training loop is quite simple. But first, we are going to create a simple function that displays our generated image.
# Function to display the generated image
def show_result(iteration):
    img = deprocess_image(combination_image.numpy())
    img = np.array(img, dtype=np.uint8)
    if np.ndim(img) > 3:
        assert img.shape[0] == 1
        img = img[0]
    img = PIL.Image.fromarray(img)
    plt.imshow(np.array(img))
    plt.show()
Now that we have everything prepared, let's code the main training loop of our neural style transfer network. We will use Stochastic Gradient Descent (the SGD optimizer), an iterative method for optimizing an objective function with suitable smoothness properties. Other optimizers can be used as well, but with the Adam optimizer I couldn't get results as good as with SGD, so I stuck with SGD.
from keras.optimizers import SGD

width, height = tf.keras.utils.load_img(content_path).size
img_nrows = 400
img_ncols = int(width * img_nrows / height)

optimizer = SGD(
    keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=100.0, decay_steps=100, decay_rate=0.96
    )
)

content_image = preprocess_image(content_path)
style_image = preprocess_image(style_path)
combination_image = tf.Variable(preprocess_image(content_path))

iterations = 4000
for i in range(1, iterations + 1):
    loss, grads = compute_loss_and_grads(
        combination_image, content_image, style_image
    )
    grads_and_vars = [(grads, combination_image)]
    optimizer.apply_gradients(grads_and_vars)
    if i % 500 == 0:
        print("Iteration %d: loss=%.2f" % (i, loss))
    if i == iterations:
        show_result(i)
Conclusion
Our style transfer implementation ends here. You can play with the hyper-parameters and also experiment with other networks such as AlexNet to produce different effects. This was my code implementation project as a part of understanding CNNs. For the full Colab notebook, you can look here. I hope this article helped you.
Do you have any suggestions for improving my code? Let me know.
See you later!
References
https://anderfernandez.com/en/blog/how-to-code-neural-style-transfer-in-python/
https://towardsdatascience.com/a-brief-introduction-to-neural-style-transfer-d05d0403901d