How to Create Custom Layers in TensorFlow?
In TensorFlow, layers are the fundamental building blocks of machine learning models. TensorFlow offers a wide variety of ready-to-use built-in layers, but sometimes these standard layers are not enough for a specific need. For example, custom layers become essential when implementing advanced architectures such as residual networks or transformers, or when a layer needs unique behavior. By defining custom layers, we can incorporate specialized computations, parameter sharing, or custom initialization schemes that standard layers do not provide.
What is a layer?
A layer is a callable object that takes input tensors, performs computations on them, and returns output tensors. To create our own layer, we subclass TensorFlow's base layer class, `tf.keras.layers.Layer`.
(A callable object is an instance that can be called like a function, because its class implements the `__call__` method.)
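As a minimal illustration in plain Python (the `Scale` class is hypothetical and unrelated to TensorFlow), any class that defines `__call__` produces instances that can be invoked like functions. Keras layers rely on the same mechanism: the base class's `__call__` handles bookkeeping, such as building the weights on first use, and then runs the layer's `call` method.

```python
class Scale:
    """Hypothetical example: a callable object that multiplies its input by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        # Runs when an instance is used like a function, e.g. double(21)
        return x * self.factor


double = Scale(2)
print(callable(double))  # True
print(double(21))        # 42
```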
Let's see how to create a custom layer and use it for MNIST handwritten digit classification, a task with 10 classes (the digits 0 to 9).
Imports and data loading
First, we import TensorFlow as `tf` and load the MNIST dataset, which is built into TensorFlow through `tf.keras.datasets`.
```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
```
The training and test sets contain 60,000 and 10,000 samples, respectively. Each sample is a 28×28 grayscale image with pixel values from 0 to 255.
Custom Dense Layer
Now we create our own dense (fully connected) layer by subclassing the Keras base class `tf.keras.layers.Layer`.
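```python
class CustomLayer(tf.keras.layers.Layer):
    """A dense layer that computes activation(inputs @ kernel + bias)."""

    def __init__(self, units, activation=None, **kwargs):
        # Forward standard Layer arguments (name, dtype, trainable, ...) to the base class.
        super().__init__(**kwargs)
        self.units = units                                      # number of output neurons
        self.activation = tf.keras.activations.get(activation)  # 'relu' -> function; None -> linear

    def build(self, input_shape):
        # Called automatically the first time the layer is called, once the input shape is known.
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="glorot_uniform",
            trainable=True,
            name="kernel",
        )
        self.bias = self.add_weight(
            shape=(self.units,),
            initializer="zeros",
            trainable=True,
            name="bias",
        )

    def call(self, inputs):
        # Forward pass: affine transform followed by the activation function.
        return self.activation(tf.matmul(inputs, self.kernel) + self.bias)
```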
A custom layer typically implements three methods: `__init__`, `build`, and `call`. In `__init__`, we store the layer's configuration, which in our case is the number of neurons (`units`) and the activation function. We could also create the weight variables there if the input shape were known in advance, but the recommended approach is lazy initialization in `build`: Keras calls `build` automatically the first time the layer is used, passing it the input shape, so the weight shapes can be derived from the data. The `call` method implements the layer's forward computation.
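To see lazy initialization in action, the sketch below repeats the `CustomLayer` definition in compact form (so the snippet runs on its own) and checks that no weights exist until the layer is first called. The batch of ones and the sizes (5 input features, 3 units) are arbitrary choices for illustration.

```python
import tensorflow as tf


class CustomLayer(tf.keras.layers.Layer):
    """Compact copy of the dense layer defined above."""

    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # The kernel shape depends on the last input dimension, which is only known now.
        self.kernel = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer="glorot_uniform", name="kernel")
        self.bias = self.add_weight(shape=(self.units,), initializer="zeros", name="bias")

    def call(self, inputs):
        return self.activation(tf.matmul(inputs, self.kernel) + self.bias)


layer = CustomLayer(units=3, activation="relu")
print(layer.built, len(layer.weights))  # False 0: build() has not run yet

x = tf.ones((2, 5))  # a batch of 2 samples with 5 features each
y = layer(x)         # the first call triggers build() with input shape (2, 5)

print(layer.built, len(layer.weights))  # True 2
print(tuple(layer.kernel.shape))        # (5, 3)
print(tuple(y.shape))                   # (2, 3)
```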
How to Define Weights in Custom Layers?
The recommended way to define weights in a custom layer is the `add_weight()` method. Its main arguments are `shape`, `initializer`, `dtype`, `trainable` (a boolean), and `name`, along with other optional parameters.
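As a small sketch of these arguments, the hypothetical `ScaleShift` layer below creates one trainable weight with the `"ones"` initializer and an explicit `dtype`, plus one weight with `trainable=False`, which Keras tracks separately and the optimizer does not update. Both weights start as an identity mapping, so the output initially equals the input.

```python
import tensorflow as tf


class ScaleShift(tf.keras.layers.Layer):
    """Hypothetical example layer: outputs inputs * scale + shift, feature-wise."""

    def build(self, input_shape):
        num_features = input_shape[-1]
        # Trainable per-feature scale, initialized to 1.
        self.scale = self.add_weight(
            shape=(num_features,),
            initializer="ones",
            dtype="float32",
            trainable=True,
            name="scale",
        )
        # Non-trainable per-feature shift, initialized to 0; it receives no gradient updates.
        self.shift = self.add_weight(
            shape=(num_features,),
            initializer=tf.keras.initializers.Zeros(),  # an initializer object also works
            trainable=False,
            name="shift",
        )

    def call(self, inputs):
        return inputs * self.scale + self.shift


layer = ScaleShift()
out = layer(tf.constant([[1.0, 2.0, 3.0]]))
print(out.numpy())  # [[1. 2. 3.]]
print(len(layer.trainable_weights), len(layer.non_trainable_weights))  # 1 1
```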
Creating our model
The next step is to create a model using our custom layer.
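```python
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),               # each MNIST image is 28x28
    tf.keras.layers.Flatten(),                    # 28x28 -> 784-element vector
    CustomLayer(units=128, activation='relu'),    # hidden layer
    CustomLayer(units=10, activation='softmax'),  # one output per digit class
])
```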
A Keras `Flatten` layer first reshapes each 28×28 input image into a 784-element vector so that it can be fed to the dense layers. Our first custom layer has 128 neurons with ReLU activation, and the second has 10 neurons (one per class) with softmax activation.
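As a worked check of the model's size: each `CustomLayer` holds a kernel of shape (inputs, units) and a bias of shape (units,), so it has inputs × units + units parameters, while `Flatten` has none. The plain-Python sketch below does that arithmetic (`dense_params` is a hypothetical helper name); `model.summary()` should report the same total for the model above.

```python
def dense_params(n_inputs, units):
    """Parameters in a dense layer: kernel (n_inputs x units) plus bias (units)."""
    return n_inputs * units + units


flattened = 28 * 28                     # Flatten turns each 28x28 image into 784 values
hidden = dense_params(flattened, 128)   # 784 * 128 + 128 = 100,480
output = dense_params(128, 10)          # 128 * 10 + 10 = 1,290
print(flattened, hidden, output, hidden + output)  # 784 100480 1290 101770
```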
Compiling and fitting the model
We use the Adam optimizer, sparse categorical cross-entropy as the loss (the labels are integers from 0 to 9, not one-hot vectors), and accuracy as the metric, and train the model for 10 epochs.
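```python
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer labels 0-9
              metrics=['accuracy'])

# Train for 10 epochs, reporting loss and accuracy on the test set after each epoch.
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```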
After 10 epochs, the model reaches an accuracy of approximately 96%. Note that this number can vary with factors such as the random seed and the batch size.
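If you want runs to be more repeatable, one option is to fix the random seeds before building the model. The sketch below uses `tf.keras.utils.set_random_seed` (available in TensorFlow 2.7 and later), which seeds Python's `random`, NumPy, and TensorFlow in one call, and shows that the same seed yields the same Glorot-uniform initial weights. The helper name `initial_kernel` is hypothetical. Even with a fixed seed, some GPU operations can remain nondeterministic, so the final accuracy may still differ slightly.

```python
import numpy as np
import tensorflow as tf


def initial_kernel(seed):
    """Draw a 3x2 Glorot-uniform weight matrix right after seeding all RNGs."""
    tf.keras.utils.set_random_seed(seed)  # seeds Python, NumPy, and TensorFlow
    return np.asarray(tf.keras.initializers.GlorotUniform()(shape=(3, 2)))


a = initial_kernel(42)
b = initial_kernel(42)
print(np.array_equal(a, b))  # True: same seed, same initial weights
```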
Inputs to the Layer
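Some of the keyword arguments accepted by the base layer's constructor are:
trainable: A boolean that indicates whether the layer's weights are trainable.
name: The name of the layer.
dtype: The data type of the layer's weights, `float32` by default.
A custom layer can accept these only if its `__init__` forwards `**kwargs` to `super().__init__()`.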
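A sketch of passing these arguments, using a hypothetical `AddBias` layer whose constructor forwards `**kwargs` to the base class. Here the layer is created in `float64` and frozen from the start.

```python
import tensorflow as tf


class AddBias(tf.keras.layers.Layer):
    """Hypothetical example layer that adds a learned bias to its input."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # forwards name, dtype, trainable, ... to the base Layer

    def build(self, input_shape):
        self.bias = self.add_weight(shape=(input_shape[-1],), initializer="zeros", name="bias")

    def call(self, inputs):
        return inputs + self.bias


layer = AddBias(name="my_bias", dtype="float64", trainable=False)
out = layer(tf.ones((2, 4), dtype=tf.float64))

print(layer.name)                                     # my_bias
print(tf.as_dtype(layer.bias.dtype).name)             # float64
print(layer.trainable, len(layer.trainable_weights))  # False 0
```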
Important Attributes
Some important layer attributes are listed below. All of them can be read, and `trainable` can also be set after the layer is created:
name: The name of the layer.
dtype: The data type of the weights.
trainable_weights: A list of trainable variables.
non_trainable_weights: A list of non-trainable variables.
weights: A combination of both trainable and non-trainable variables.
trainable: A boolean indicating whether the layer is trainable or not.
Note: The `trainable` attribute is especially important for transfer learning. When we want to train only the top layers of a pretrained model, we set `trainable = False` on the pretrained layers to freeze their weights.
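A minimal sketch of freezing, using built-in `Dense` layers for brevity (our `CustomLayer` should behave the same, since its weights are created with `add_weight`). The model, its sizes, and the layer names `"base"` and `"head"` are made up for illustration. Freezing the base moves its kernel and bias from `trainable_weights` to `non_trainable_weights`, while `weights` still lists all four variables. After changing `trainable`, recompile the model so that `fit` picks up the change.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu", name="base"),    # stands in for pretrained layers
    tf.keras.layers.Dense(2, activation="softmax", name="head"),  # new top layer to train
])

print(len(model.trainable_weights))  # 4: kernel + bias of both layers

model.get_layer("base").trainable = False  # freeze the base layer

print(len(model.trainable_weights))      # 2: only the head's kernel and bias
print(len(model.non_trainable_weights))  # 2: the frozen base's kernel and bias
print(len(model.weights))                # 4: weights lists every variable
```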
The code used in this blog is available on GitHub.
Written by
Rashid Ul Haq
I am a passionate AI and machine learning expert with extensive experience in deep learning, TensorFlow, and advanced data analytics. Having completed numerous specializations and projects, I have a wealth of knowledge and practical insights into the field. I am sharing my journey and expertise through detailed articles on neural networks, deep learning frameworks, and the latest advancements in AI technology.