How to Automate EC2 Backups Using Python: A Complete Guide
In today's cloud-centric world, automating backups is crucial for maintaining data integrity and ensuring business continuity.
In this guide, I'll walk you through creating an automated backup solution for your Amazon EC2 instances using Python (boto3), the AWS CLI, and cron jobs.
We'll cover everything from setting up the environment to deploying a Python script that manages AMI creation and retention.
Prerequisites
Before we dive into the script, make sure you have the following:
An AWS Account: You’ll need access to Amazon EC2 and IAM services.
An AWS IAM User: Create a user with full EC2 access and generate an access key for it. The key is required to configure the AWS CLI.
A Linux-based Server: For this guide, we'll assume you're using Ubuntu.
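Full EC2 access is broader than the backup script needs. As a rough starting point, the sketch below builds a narrower policy document from the five EC2 API calls the script in Step 3 makes (create_image, create_tags, describe_images, deregister_image, delete_snapshot). The action list is my assumption, inferred from those calls rather than a verified minimum, and build_backup_policy is a hypothetical helper name; AWS may require additional permissions in your account, so test the policy before relying on it.

```python
import json

# EC2 actions matching the boto3 calls used by the backup script in Step 3.
# Assumption: this list is inferred from those calls and may be incomplete.
BACKUP_ACTIONS = [
    "ec2:CreateImage",
    "ec2:CreateTags",
    "ec2:DescribeImages",
    "ec2:DeregisterImage",
    "ec2:DeleteSnapshot",
]


def build_backup_policy(actions=BACKUP_ACTIONS):
    """Return an IAM policy document (as a dict) allowing only the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }


# Print the JSON so it can be pasted into the IAM policy editor.
policy_json = json.dumps(build_backup_policy(), indent=2)
print(policy_json)
```

If the script later fails with an authorization error, the error message names the missing action, which you can then add to the list.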
Step 1: Install and Configure AWS CLI
First, we need to install and configure the AWS CLI on your server.
Download the AWS CLI Installer:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
Install Unzip (if not already installed):
sudo apt install unzip
Unzip the Installer:
unzip awscliv2.zip
Run the Installer:
sudo ./aws/install
Configure AWS CLI:
aws configure
You'll be prompted to enter your AWS Access Key ID, Secret Access Key, region, and output format.
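Before moving on, it can help to confirm that the credentials you just entered actually work. The sketch below wraps the STS get_caller_identity call, which returns the account ID and ARN of the caller. The helper names whoami and format_identity are hypothetical, and the client is passed in as a parameter so the function can be tried without AWS access. A real client comes from boto3, which Step 2 installs.

```python
def whoami(sts_client):
    """Return (account_id, arn) for the credentials behind sts_client."""
    identity = sts_client.get_caller_identity()
    return identity["Account"], identity["Arn"]


def format_identity(account_id, arn):
    """Build a one-line summary suitable for printing or logging."""
    return f"Authenticated as {arn} in account {account_id}"
```

With boto3 installed, print(format_identity(*whoami(boto3.client('sts')))) should show the IAM user from the prerequisites. An exception at this point usually means there is a problem with the access key.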
Step 2: Prepare Your Environment
Next, create the necessary files and install the required packages.
Install the Python venv Package:
sudo apt-get install python3-venv -y
Create a Python Virtual Environment (the cron entries in Step 4 use its interpreter):
python3 -m venv /home/ubuntu/myenv
Activate the Virtual Environment:
source /home/ubuntu/myenv/bin/activate
Install Required Python Packages:
pip install boto3
Create a CSV Configuration File:
Create a file named instance_config.csv in /home/ubuntu/ with the following format (replace the values with your own instance details):
INSTANCE_ID,REGION,BACKUP_TYPE,RETENTION_PERIOD,INSTANCE_NAME
i-080736c2d00ca9de2,ap-south-1,hourly,5 days,MyInstanceName
i-080736c2d00ca9de2,ap-south-1,monthly,3 months,MyInstanceName
You can add more instances and adjust the RETENTION_PERIOD and BACKUP_TYPE values as needed. The retention period is a number followed by days or months (a month is treated as 30 days), and monthly is the only backup type the script handles specially.
When you add a new instance to instance_config.csv, run the script manually once to confirm the AMI is created; after that it runs on schedule.
If you remove an instance from instance_config.csv, the script stops creating AMIs for it and also stops cleaning up after it, because it only processes instances listed in the file. The existing AMIs and snapshots for that instance stay in your account until you delete them yourself.
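A typo in this file only surfaces when the backup script runs, so a quick check before scheduling can save a failed run. Here is a minimal sketch of such a check. The function validate_config and both regular expressions are my own assumptions: the instance ID pattern accepts "i-" followed by 8 to 17 hexadecimal characters, and the retention pattern accepts only the day and month units described above.

```python
import csv
import io
import re

REQUIRED_COLUMNS = ["INSTANCE_ID", "REGION", "BACKUP_TYPE", "RETENTION_PERIOD", "INSTANCE_NAME"]
# Assumed formats, see the note above the code.
INSTANCE_ID_RE = re.compile(r"^i-[0-9a-f]{8,17}$")
RETENTION_RE = re.compile(r"^\d+\s+(day|days|month|months)$")


def validate_config(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    problems = []
    # Data rows start at line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if not INSTANCE_ID_RE.match(row["INSTANCE_ID"] or ""):
            problems.append(f"line {line_no}: bad instance ID {row['INSTANCE_ID']!r}")
        if not RETENTION_RE.match((row["RETENTION_PERIOD"] or "").strip().lower()):
            problems.append(f"line {line_no}: bad retention period {row['RETENTION_PERIOD']!r}")
    return problems


# The second data row uses an unsupported unit on purpose.
sample = """INSTANCE_ID,REGION,BACKUP_TYPE,RETENTION_PERIOD,INSTANCE_NAME
i-080736c2d00ca9de2,ap-south-1,hourly,5 days,MyInstanceName
i-080736c2d00ca9de2,ap-south-1,monthly,3 weeks,MyInstanceName
"""
for problem in validate_config(sample):
    print(problem)
```

To check the real file, pass its contents instead of the sample, for example validate_config(open('/home/ubuntu/instance_config.csv').read()). An empty list means no problems were found.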
Step 3: Create the Backup Script
Save the following Python script as backup.py in /home/ubuntu/:
import boto3
import csv
from datetime import datetime, timedelta
import sys

# Log file path
LOG_FILE = '/home/ubuntu/backup.log'


def log_message(message, level="INFO"):
    color_codes = {
        "INFO": "\033[94m",     # Blue
        "SUCCESS": "\033[92m",  # Green
        "WARNING": "\033[93m",  # Yellow
        "ERROR": "\033[91m"     # Red
    }
    reset_code = "\033[0m"
    color = color_codes.get(level, "\033[94m")  # Default to blue for INFO
    with open(LOG_FILE, 'a') as log_file:
        log_file.write(f"{datetime.now()}: {color}{message}{reset_code}\n")


# Initialize boto3 client for EC2
def initialize_client(region):
    return boto3.client('ec2', region_name=region)


# Create AMI for the instance
def create_ami(ec2_client, instance_id, instance_name, backup_type):
    try:
        # AMI name format: Instance-ID_Backup-Type_Instance-Name_Date&Time
        ami_name = f"{instance_id}_{backup_type}_{instance_name}_{datetime.now().strftime('%Y%m%d%H%M%S')}"
        response = ec2_client.create_image(
            InstanceId=instance_id,
            Name=ami_name,
            NoReboot=True  # Skip the reboot; the image is taken from the running instance
        )
        ami_id = response['ImageId']
        log_message(f"AMI created: {ami_name}, AMI ID: {ami_id}", "SUCCESS")
        # Tag the AMI with InstanceId, InstanceName, and BackupType
        ec2_client.create_tags(
            Resources=[ami_id],
            Tags=[
                {'Key': 'InstanceId', 'Value': instance_id},
                {'Key': 'InstanceName', 'Value': instance_name},
                {'Key': 'BackupType', 'Value': backup_type}
            ]
        )
        log_message(f"Tagged AMI {ami_id} with InstanceId: {instance_id}, InstanceName: {instance_name}, BackupType: {backup_type}")
        return ami_id
    except Exception as e:
        # Log at ERROR level so failures show up in red in the log
        log_message(f"Error creating AMI for {instance_id}: {e}", "ERROR")
        sys.exit(1)
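

# Parse retention period, e.g. "5 days" or "3 months"
def parse_retention_period(retention_str):
    try:
        # Expect exactly two parts: a whole number and a unit
        num_str, unit = retention_str.split()
        num = int(num_str)
    except ValueError:
        log_message(f"Invalid retention period: {retention_str}", "ERROR")
        sys.exit(1)
    unit = unit.lower()
    if 'day' in unit:
        retention_delta = timedelta(days=num)
    elif 'month' in unit:
        retention_delta = timedelta(days=num * 30)  # Approximate month as 30 days
    else:
        log_message(f"Invalid retention period: {retention_str}", "ERROR")
        sys.exit(1)
    return retention_delta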


# Delete old AMIs of the same backup type based on retention period
def delete_old_amis(ec2_client, instance_id, backup_type, retention_delta):
    try:
        # Filter on BackupType as well, so the short retention of an hourly row
        # never removes the monthly AMIs of the same instance
        images = ec2_client.describe_images(
            Filters=[
                {'Name': 'tag:InstanceId', 'Values': [instance_id]},
                {'Name': 'tag:BackupType', 'Values': [backup_type]}
            ],
            Owners=['self']
        )
        if not images['Images']:
            log_message(f"No {backup_type} AMIs found for instance {instance_id}")
        else:
            log_message(f"Found {len(images['Images'])} {backup_type} AMIs for instance {instance_id}")
        for image in images['Images']:
            # CreationDate is in UTC; %z parses the trailing "Z" as a UTC offset (Python 3.7+)
            creation_time = datetime.strptime(image['CreationDate'], '%Y-%m-%dT%H:%M:%S.%f%z')
            # Compare in UTC so the server's local time zone does not skew the age
            if datetime.now(creation_time.tzinfo) - creation_time > retention_delta:
                log_message(f"Deleting AMI: {image['ImageId']} created on {creation_time}")
                ec2_client.deregister_image(ImageId=image['ImageId'])
                # Deleting associated snapshots
                for device in image['BlockDeviceMappings']:
                    snapshot_id = device.get('Ebs', {}).get('SnapshotId')
                    if snapshot_id:
                        ec2_client.delete_snapshot(SnapshotId=snapshot_id)
                        log_message(f"Deleted associated snapshot: {snapshot_id}")
            else:
                log_message(f"AMI {image['ImageId']} created on {creation_time} is still within retention period")
    except Exception as e:
        log_message(f"Error deleting old AMIs for {instance_id}: {e}", "ERROR")
        sys.exit(1)


def main():
    # Path to the CSV file
    csv_file = '/home/ubuntu/instance_config.csv'
    current_day = datetime.now().day
    with open(csv_file, mode='r') as file:
        reader = csv.DictReader(file)
        for row in reader:
            instance_id = row['INSTANCE_ID']
            region = row['REGION']
            backup_type = row['BACKUP_TYPE']
            retention_period = row['RETENTION_PERIOD']
            instance_name = row.get('INSTANCE_NAME', 'Unknown')
            # Initialize EC2 client
            ec2_client = initialize_client(region)
            # Ensure monthly backups are only created on the 1st of the month
            if backup_type == 'monthly' and current_day != 1:
                log_message(f"Skipping monthly backup for {instance_id} as today is not the 1st of the month")
                continue
            # Create AMI
            ami_id = create_ami(ec2_client, instance_id, instance_name, backup_type)
            # Parse retention period and delete old AMIs of this backup type
            retention_delta = parse_retention_period(retention_period)
            delete_old_amis(ec2_client, instance_id, backup_type, retention_delta)


if __name__ == "__main__":
    main()
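The retention check is the part of this script most worth testing without touching AWS. The following sketch isolates it as a pure function. Both parse_retention and is_expired are hypothetical helpers rather than part of backup.py, and the sketch assumes CreationDate values are UTC strings in the %Y-%m-%dT%H:%M:%S.%fZ form that the script parses. Comparing in UTC keeps the server's local time zone from affecting the result.

```python
from datetime import datetime, timedelta, timezone


def parse_retention(retention_str):
    """Turn '5 days' or '3 months' into a timedelta (a month counts as 30 days)."""
    count, unit = retention_str.split()
    unit = unit.lower()
    if unit.startswith("day"):
        return timedelta(days=int(count))
    if unit.startswith("month"):
        return timedelta(days=int(count) * 30)
    raise ValueError(f"Invalid retention period: {retention_str!r}")


def is_expired(creation_date, retention_str, now=None):
    """Return True when an AMI created at creation_date is past its retention."""
    created = datetime.strptime(creation_date, "%Y-%m-%dT%H:%M:%S.%fZ")
    created = created.replace(tzinfo=timezone.utc)  # CreationDate is in UTC
    now = now or datetime.now(timezone.utc)
    return now - created > parse_retention(retention_str)


# An AMI that is nine days old: past "5 days", well within "3 months".
now = datetime(2024, 8, 10, tzinfo=timezone.utc)
print(is_expired("2024-08-01T00:00:00.000Z", "5 days", now))    # True
print(is_expired("2024-08-01T00:00:00.000Z", "3 months", now))  # False
```

Passing now explicitly, as in the two calls above, makes the function deterministic and easy to test.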
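Check Logs: You can monitor the backup operations and any issues by checking the log file located at /home/ubuntu/backup.log. Each entry starts with a timestamp, and the message is wrapped in ANSI color codes, so view the file with a tool that renders them, such as cat in a terminal or less -R.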
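Because log_message wraps every message in ANSI color codes, the raw file can look noisy in tools that do not render colors. A small sketch for reading the log as plain text follows. The names strip_ansi and read_plain_log are hypothetical, and the regular expression only targets the color sequences this script writes.

```python
import re

# Matches color sequences such as "\033[94m" and the reset "\033[0m".
ANSI_COLOR_RE = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(line):
    """Remove ANSI color codes from one log line."""
    return ANSI_COLOR_RE.sub("", line)


def read_plain_log(path):
    """Yield each line of the log file at path without color codes."""
    with open(path, encoding="utf-8") as log_file:
        for line in log_file:
            yield strip_ansi(line.rstrip("\n"))


colored = "\033[92mAMI created\033[0m"
print(strip_ansi(colored))  # AMI created
```

For the real log, loop over read_plain_log('/home/ubuntu/backup.log') and print or filter the lines as needed.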
Step 4: Set Up a Cron Job
To ensure your script runs automatically, set up a cron job:
Edit the Crontab File:
crontab -e
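Add the Line That Matches How Often You Want the Script to Run:
The script processes every row of instance_config.csv on each run, so one entry is enough; adding several makes the runs overlap at midnight. Rows marked monthly are skipped unless the run falls on the 1st, which means an hourly schedule creates a monthly AMI on every run that day.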
Hourly Backup:
0 * * * * /home/ubuntu/myenv/bin/python /home/ubuntu/backup.py
Daily Backup (if needed):
0 0 * * * /home/ubuntu/myenv/bin/python /home/ubuntu/backup.py
Monthly Backup (on the 1st day of the month):
0 0 1 * * /home/ubuntu/myenv/bin/python /home/ubuntu/backup.py
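Since one script serves every row, the schedule logic lives partly in cron and partly in the backup_type check in main(). One way to make that explicit is a small function that decides, per row, whether a backup is due at the current time. This is a sketch and not part of backup.py: is_backup_due is a hypothetical name, the daily type is my assumption (the script itself only special-cases monthly), and it presumes the script is started at minute 0 of every hour by the hourly cron entry.

```python
from datetime import datetime


def is_backup_due(backup_type, now):
    """Decide whether a row of the given type should be backed up at `now`."""
    backup_type = backup_type.strip().lower()
    if backup_type == "hourly":
        return True  # every run of the hourly cron job
    if backup_type == "daily":
        return now.hour == 0  # only the midnight run
    if backup_type == "monthly":
        return now.day == 1 and now.hour == 0  # midnight on the 1st only
    raise ValueError(f"Unknown backup type: {backup_type!r}")


midnight_on_first = datetime(2024, 9, 1, 0, 0)
afternoon_on_first = datetime(2024, 9, 1, 15, 0)
print(is_backup_due("monthly", midnight_on_first))   # True
print(is_backup_due("monthly", afternoon_on_first))  # False
print(is_backup_due("hourly", afternoon_on_first))   # True
```

With a check like this in main(), the single hourly crontab entry would be enough for all three frequencies, and a monthly row would produce one AMI on the 1st instead of one per run.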
With the hourly entry in place, the AMIs created each hour appear under Images > AMIs in the EC2 console.
Conclusion
By following these steps, you’ve automated the backup process for your EC2 instances using Python, the AWS CLI, and cron. The setup creates AMIs on your chosen schedule and removes them, along with their snapshots, once they exceed the retention period you set for each instance.
Feel free to reach out with questions or comments, and happy automating!
Written by Sachin Yalagudkar