Deep Dive into UserData
Introduction
Have you ever wondered how to customize cloud infrastructure to fit your specific needs? User data is the key, and in this article, we'll take a deep dive into it, covering everything you need to know with complete clarity.
At its core, user data is a set of instructions or scripts that can be run on a new instance to configure its behaviour and settings. With the help of cloud-init, a powerful open-source tool, users can automate the initialization process and define scripts and commands to run when an instance is launched.
In this article, we'll explore the importance of user data in AWS, how it works, and how to use cloud-init to get the most out of it. So, buckle up and get ready for a deep dive into the ins and outs of user data in AWS.
What is UserData?
User data in AWS is a set of instructions or scripts that can be provided to a new instance during its launch. These instructions can be used to customize the instance's behaviour and settings, such as installing software packages, configuring network settings, or running custom scripts. By using user data, you can automate the initialization process and quickly set up instances with the configurations you need. User data can be passed to instances using the AWS Management Console, CLI, or SDKs.
Let's look at a simple user data script that installs and enables an Apache HTTP server on an AWS EC2 instance:
#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>Hello World from $(hostname -f)</h1>" > /var/www/html/index.html
Notes:
'#!' is called a shebang. It tells the machine which interpreter to use for executing the user data script. Here that is the Bash shell.
Here we have installed an Apache HTTP server on the EC2 instance. It serves the index.html page, displaying "Hello World from <hostname>", when requested on port 80.
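If you prefer the CLI over the console, the launch could look roughly like the sketch below. It assumes the script above is saved as userdata.sh. The AMI ID and instance type are placeholders, and a real launch would usually also need a key pair and a security group that allows inbound traffic on port 80. The helper (a made-up name) only prints the command so you can review it before running it.

```shell
#!/bin/bash
# Build an `aws ec2 run-instances` command that passes a user data file.
# All values below are placeholders, not taken from the article.

build_launch_cmd() {
  ami_id="$1"
  instance_type="$2"
  user_data_file="$3"
  printf 'aws ec2 run-instances --image-id %s --instance-type %s --user-data file://%s\n' \
    "$ami_id" "$instance_type" "$user_data_file"
}

# Print the command for review; run it yourself once the values are real.
build_launch_cmd "ami-0123456789abcdef0" "t2.micro" "userdata.sh"
```

The file:// prefix tells the AWS CLI to read the user data from a local file instead of taking the argument as the literal text.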
See it for yourself...
Fig-1: AWS EC2 Instance
Fig-2: Displaying the index.html page
Note:
- You need to open the EC2 instance's address over HTTP on port 80, so it will look like "http://<IPv4 of EC2>:80". HTTPS won't work by default, because HTTPS requires a valid SSL/TLS certificate and EC2 instances don't come with one pre-installed.
But how does all of this happen? Cloud-init comes into the picture
According to the docs of Cloud-Init:
"Cloud-init is the industry standard multi-distribution method for cross-platform cloud instance initialisation. It is supported across all major public cloud providers, provisioning systems for private cloud infrastructure, and bare-metal installations.
During boot, cloud-init identifies the cloud it is running on and initialises the system accordingly. Cloud instances will automatically be provisioned during first boot with networking, storage, SSH keys, packages and various other system aspects already configured.
Cloud-init provides the necessary glue between launching a cloud instance and connecting to it so that it works as expected.
For cloud users, cloud-init provides no-install first-boot configuration management of a cloud instance. For cloud providers, it provides an instance setup that can be integrated with your cloud."
A bit confusing, right? Let me explain. Cloud-init is basically "cloud initialisation", meaning you can initialise your cloud instances with a specific configuration during boot. The cloud-init software package is what runs the user data and sets up the necessary configuration.
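When a user data script misbehaves, the output that cloud-init captured is usually the first place to look. Here is a minimal sketch, assuming the log lives at /var/log/cloud-init-output.log (the usual location on Amazon Linux and Ubuntu images; other images may differ). The function name is made up for illustration.

```shell
#!/bin/bash
# Show the last lines of the cloud-init output log, where the stdout and
# stderr of user data scripts normally end up.
# Assumption: default log path; pass another path as the first argument.

show_user_data_log() {
  log_file="${1:-/var/log/cloud-init-output.log}"
  lines="${2:-20}"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  tail -n "$lines" "$log_file"
}

# Usage on the instance:
#   show_user_data_log                   # last 20 lines of the default log
#   show_user_data_log /path/to/log 50   # custom path and line count
```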
But is that all...
UserData has a very special characteristic: by default, it runs only once, when the instance is launched for the first time. But you might have seen that you are allowed to change the user data after stopping the instance, so "it runs only once" might not seem to make sense. Well, it does.
You will be able to change the user data, and the console will show you the modified user data as well. But it will not actually apply these changes to the machine.
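You can confirm what the instance currently sees as its user data by asking the instance metadata service from inside the instance. This is a sketch assuming IMDSv2 (token-based access) is enabled and curl is installed; fetch_user_data is a name chosen here for illustration. It only reads the stored user data and does not execute it.

```shell
#!/bin/bash
# Print the user data currently attached to this instance (read-only).
# Assumptions: run on the EC2 instance itself, IMDSv2 enabled, curl available.

IMDS_BASE="http://169.254.169.254/latest"

fetch_user_data() {
  # IMDSv2: request a short-lived session token first.
  token="$(curl -s -X PUT "$IMDS_BASE/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")" || return 1
  # Then read the user data using that token.
  curl -s -H "X-aws-ec2-metadata-token: $token" "$IMDS_BASE/user-data"
}

# Usage on the instance:
#   fetch_user_data
```

Seeing the modified script here while the web page stays unchanged is exactly the "stored but not applied" situation described above.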
See for yourself...
Below is the updated user data, with a changed index.html message. We stopped the running instance, modified the user data and started the instance again.
#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>Changed User Data Hello World from $(hostname -f)</h1>" > /var/www/html/index.html
Let's check whether this makes a difference...
Well, it does not, which confirms the characteristic we just learnt about user data. Also, note that rebooting the instance won't make a difference either, because a reboot does not reinitialise the machine with user data. A reboot differs from a stop and start in other ways, for example the machine keeps its public IP address.
Going behind the scenes...
Let's discuss what happens behind the scenes. To ensure that user data runs only once, a flag is set during the initial boot process to indicate that the user data has been executed. This prevents the user data from running again if the instance is stopped and restarted or if it undergoes a reboot. But who sets this flag on the instance? It's cloud-init, not AWS itself. Cloud-init sets the flag and checks whether the user data has already run or not. Note that running user data only once is a deliberate design choice, and we'll come back to the reason. Cloud-init keeps a file called "config_scripts_user", a semaphore file it creates to track whether the user data has already been executed on an instance. It typically lives under /var/lib/cloud/instances/<instance-id>/sem/, and its presence records that the user data script has already run.
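To see the flag for yourself, you can check for the semaphore file on the instance. This is a minimal sketch, assuming the usual cloud-init layout where /var/lib/cloud/instance points at the current instance's data directory. The function name is illustrative.

```shell
#!/bin/bash
# Report whether cloud-init has already marked the user data scripts as run.
# Assumption: the semaphore lives at <cloud dir>/sem/config_scripts_user.

user_data_status() {
  cloud_dir="${1:-/var/lib/cloud/instance}"
  if [ -e "$cloud_dir/sem/config_scripts_user" ]; then
    echo "already-run"
  else
    echo "not-run"
  fi
}

# On an instance that has booted once, this should print "already-run".
user_data_status
```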
So is there nothing we can do to have our user data changes applied when it's not the first launch of the instance? Well, it's pretty obvious by now that the main control is with cloud-init, since it's the one responsible for running the user data.
There are ways in which we can run our user data again just once or even on every boot. Let's look into that.
Running UserData script multiple times...
You might have guessed already: if "config_scripts_user" is the state file that cloud-init checks to determine whether the user data has run or not, then simply deleting this file will effectively reset the flag, and starting the instance again should run the user data script.
This is true and it does happen. See it for yourself...
We can see that after removing the file and restarting the instance, our changes have been applied successfully.
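The reset described above boils down to removing one file and restarting. Here is a sketch, assuming the semaphore sits at /var/lib/cloud/instance/sem/config_scripts_user and that you run it as root on the instance. Treat the helper name as hypothetical.

```shell
#!/bin/bash
# Remove cloud-init's "user scripts already ran" marker so that the
# user data script is executed again on the next boot.

reset_user_data_flag() {
  cloud_dir="${1:-/var/lib/cloud/instance}"
  sem_file="$cloud_dir/sem/config_scripts_user"
  if [ -e "$sem_file" ]; then
    rm -f "$sem_file" && echo "removed $sem_file"
  else
    echo "nothing to remove at $sem_file"
  fi
}

# Usage on the instance (as root), followed by a restart:
#   reset_user_data_flag
```

This is a one-shot reset: cloud-init recreates the semaphore on the next boot.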
But what if we change the user data again? Will it apply these changes?
You can see it for yourself, but trust me, it does not apply the changes. The reason is the same: cloud-init saves the state again and you're stuck once more.
So is there a way to execute the user data on every boot? Well, there is. The answer is pretty simple: you'll need to change how cloud-init is configured. Instead of removing the state file, you can override the default configuration and make the user data run on every boot.
Cloud-init operates in 5 stages to configure the system completely, and the one that is of use to us is the final stage. In the final stage, cloud-init executes the configurations and scripts specified in the cloud_final_modules section of the cloud.cfg file, allowing for post-boot tasks, package installations, and additional user scripts. You can enable, disable, or add modules in this section.
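To see which modules your image runs in the final stage, you can list the entries under cloud_final_modules. This sketch assumes the file is at /etc/cloud/cloud.cfg and uses the common layout of one "- module" item per line; the function name is made up, and it is a quick text filter, not a real YAML parser.

```shell
#!/bin/bash
# List the modules configured under cloud_final_modules in a cloud.cfg file.
# Assumption: one "- module" list item per line under the section key.

list_final_modules() {
  cfg="${1:-/etc/cloud/cloud.cfg}"
  if [ ! -r "$cfg" ]; then
    echo "cannot read $cfg" >&2
    return 1
  fi
  awk '
    /^[ \t]*#/               { next }                 # skip comment lines
    /^cloud_final_modules:/  { in_section = 1; next } # section starts here
    in_section && /^[^ \t-]/ { in_section = 0 }       # next top-level key ends it
    in_section && /^[ \t]*-/ { sub(/^[ \t]*-[ \t]*/, ""); print }
  ' "$cfg"
}

# Usage on the instance:
#   list_final_modules
```

On a typical image you would expect scripts-user to be somewhere in that list.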
Now we need to override the scripts-user entry so that it runs on every boot. For this, we need to change the user data a bit. It gets a little more complex here, but this is what we do to make it work:
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash
# YOUR USER DATA SCRIPT GOES HERE
--//--
The provided code snippet represents a multipart MIME message that includes two parts: a cloud-config file and a shell script as user data.
The cloud-config section contains configuration directives written in the YAML format and specifies the cloud_final_modules section with the value [scripts-user, always]. This configuration ensures that the scripts-user module is executed during the final stage of cloud-init, regardless of whether it was already executed before.
The userdata.txt section contains a shell script written in plain text. This script will be executed as part of the user data processing during the instance launch. It can include any custom commands or actions you want to perform on the instance.
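If you already have a plain script, you don't have to write the MIME wrapper by hand. The sketch below wraps an existing script file in the multipart layout described above. The function name and file names are hypothetical. Note the blank line after each header block and the closing --//-- boundary, which MIME parsing relies on.

```shell
#!/bin/bash
# Wrap a plain shell script in multipart user data that asks cloud-init
# to run user scripts on every boot.

make_always_run_user_data() {
  script_file="$1"
  if [ ! -r "$script_file" ]; then
    echo "cannot read $script_file" >&2
    return 1
  fi
  cat <<'EOF'
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

EOF
  cat "$script_file"
  # Closing boundary; the leading newline guards against a script file
  # that does not end with one.
  printf '\n%s\n' '--//--'
}

# Usage:
#   make_always_run_user_data userdata.sh > userdata-always.txt
```

You would then paste the generated file into the instance's user data in place of the plain script.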
See it for yourself...
Stop the instance, change the user data and restart the system.
Conclusion
We learnt about user data and how cloud-init comes into the picture for running the script. But we need to understand that altering the default configuration to make the user data run again and again is not good practice. User data is designed to run only once. This behaviour is intended to prevent unintended consequences or repetitive executions of potentially disruptive or non-idempotent user data scripts. If you need to execute certain actions or configurations repeatedly, it's recommended to use other mechanisms such as configuration management tools or automation frameworks to handle the ongoing management of the instance's configuration.
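To make the idempotency concern concrete: a script that appends to a file adds a duplicate line on every run, while a guarded version can run any number of times with the same result. Here is a small sketch; the helper name and the sample setting are made up for illustration.

```shell
#!/bin/bash
# Append a line to a file only if it is not already there, so that
# running the script repeatedly leaves the file unchanged.

ensure_line() {
  line="$1"
  file="$2"
  # -q: quiet, -x: match the whole line, -F: treat the line as plain text
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a throwaway file.
demo_file="$(mktemp)"
ensure_line "export APP_ENV=prod" "$demo_file"
ensure_line "export APP_ENV=prod" "$demo_file"   # second run is a no-op
wc -l < "$demo_file"                             # one line, not two
rm -f "$demo_file"
```

A script built from guards like this is much safer to run on every boot than one that blindly appends or reinstalls.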
So what do we do if we want to set up configurations with the help of code (Infrastructure as Code, IaC) and modify them along the way as needed? Is there a way we can do that without modifying and tampering with user data and its default configurations?
Well, yes, there is. "Metadata" comes to our rescue and can do everything we needed and discussed. It also brings certain additional benefits compared to user data. We'll look into Metadata and CloudFormation initialisation in another blog.
That brings us to the end of the article.
Hope you enjoyed...Happy Learning :)
Written by Pratyush Misra