Automating Log Management with Bash and S3


One of the most common tasks when working with the cloud is creating scripts to automate processes. These scripts are typically written in either Bash or Python. In this project, I’ll guide you through setting up the infrastructure needed to host an application while also implementing a solution to back up the logs to an S3 bucket.
Before we start, here is the plan:
Create a new VPC.
Set up an S3 bucket to store the logs.
Create the right IAM policy and role to keep things a little more secure.
Provision the EC2 instance that will host our application.
Enable CloudTrail for monitoring and auditing.
Install the application on the EC2 instance.
Monitor activity with CloudTrail.
Automate the log backup with a cron job.
Now that we have an overview of the steps, let’s get started.
Create the VPC
Let’s kick things off by laying the foundation: creating a new VPC. Start by selecting the "VPC and more" option to simplify the process. Give your VPC a name that’s easy to identify, then configure the following settings:
IPv4 CIDR block: Choose your range.
Availability Zone: Set it to 1.
Subnets: Add 1 Public Subnet and 1 Private Subnet.
Gateways: Include a NAT Gateway and an S3 Gateway.
Once everything’s set up, click "Create VPC".
The S3 Gateway option creates a gateway endpoint, which lets resources inside the VPC reach S3 without sending traffic over the public internet. We are implementing this solution so that the EC2 instance can access the S3 bucket while keeping the traffic internal, and so that the bucket policy can restrict access to requests coming through this endpoint.
Create the S3 Bucket
Head over to S3 to create a bucket. I will be naming mine myapachelogs. Once the bucket is created, open it, navigate to the Permissions tab, and update the Bucket Policy with the following:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::your-s3-bucket-name",
                "arn:aws:s3:::your-s3-bucket-name/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:SourceVpce": "your-s3-vpce-id"
                }
            }
        }
    ]
}
(Replace your-s3-bucket-name with the name of the bucket you just created, and replace your-s3-vpce-id with the ID of the S3 gateway endpoint that was created with the VPC.)
You can easily grab the endpoint ID by searching for "Endpoints" in the console search bar, or from the CLI as sketched below.
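Here is a minimal sketch of that lookup and the policy attachment from the CLI. It assumes the AWS CLI is configured with permissions to manage the VPC and the bucket, that the bucket is named myapachelogs, and that the policy above is saved locally as bucket-policy.json; the VPC ID and the region in the service name are placeholders.
# Find the ID of the S3 gateway endpoint created with the VPC
aws ec2 describe-vpc-endpoints \
  --filters Name=vpc-id,Values=vpc-xxxxxxxx Name=service-name,Values=com.amazonaws.us-east-1.s3 \
  --query 'VpcEndpoints[].VpcEndpointId' \
  --output text
# Attach the bucket policy saved as bucket-policy.json
aws s3api put-bucket-policy --bucket myapachelogs --policy file://bucket-policy.json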
Create the IAM policy and IAM role
Next, we’ll create an IAM role with an S3 policy using the Policy Editor.
For S3 access, we’ll grant only the actions the instance actually needs (uploading, reading, and listing objects) and deliberately leave out any delete actions. Use the following to create the policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::your-s3-bucket-name",
                "arn:aws:s3:::your-s3-bucket-name/*"
            ]
        }
    ]
}
(Again, make sure to replace your-s3-bucket-name with your bucket name.)
After editing the policy, go ahead and click Create policy and name it ApacheLogsPolicy.
Now we have to create the IAM role. Please click on Create role.
Select EC2 as the trusted entity, attach the ApacheLogsPolicy, and create the role with the name ApacheS3LogsRole. If you prefer the CLI, a rough equivalent is sketched below.
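This sketch assumes the policy JSON above is saved as apache-logs-policy.json and that <account-id> is replaced with your account ID; the names match the ones used in the console steps.
# Trust policy that lets EC2 assume the role
cat > ec2-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
# Create the policy and the role, then attach the policy to the role
aws iam create-policy --policy-name ApacheLogsPolicy --policy-document file://apache-logs-policy.json
aws iam create-role --role-name ApacheS3LogsRole --assume-role-policy-document file://ec2-trust-policy.json
aws iam attach-role-policy --role-name ApacheS3LogsRole --policy-arn arn:aws:iam::<account-id>:policy/ApacheLogsPolicy
# EC2 needs an instance profile wrapping the role (the console creates this automatically)
aws iam create-instance-profile --instance-profile-name ApacheS3LogsRole
aws iam add-role-to-instance-profile --instance-profile-name ApacheS3LogsRole --role-name ApacheS3LogsRole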
Launch the EC2 Instance
We have pretty much taken care of all the prerequisites. Head over to the EC2 console and launch an Ubuntu instance. Give it the name “Apache HTTP Server” and keep everything within the free tier (AMI, instance type).
Create a new key pair and, in Network settings, make sure the right VPC and the private subnet are selected. Also, select the default security group in the Firewall section.
Scroll down to Advanced details and expand it. Click on IAM instance profile and select the ApacheS3LogsRole. We can launch the instance now.
The next step is to log into the instance. The easiest way to do this without a bastion host is to select the instance, choose Actions, then Connect.
On the Connect to instance screen, click Connect using EC2 Instance Connect Endpoint. Locate the EC2 Instance Connect Endpoint section, click Select an endpoint, then Create an endpoint.
On the next screen, provide a name tag, change type to EC2 Instance Connect Endpoint and select the proper VPC.
Now select the default security group and private subnet and create the endpoint.
Wait for the endpoint to reach Available status before attempting to access the instance.
We can return to the Connect to instance screen after the endpoint is ready. The endpoint that was just created should be listed in the dropdown box. Select the endpoint and click on Connect.
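As an aside, recent AWS CLI v2 releases also include a helper that opens the same tunnel from your own terminal instead of the console. This is just an optional alternative; the instance ID below is a placeholder.
# Connect through the EC2 Instance Connect Endpoint from your local machine
aws ec2-instance-connect ssh \
  --instance-id i-0123456789abcdef0 \
  --os-user ubuntu \
  --connection-type eice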
After successfully logging into the instance, please run the following commands:
# Update the package index
sudo apt update
# Install the required dependencies
sudo apt install -y curl unzip
# Download the AWS CLI installer
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
# Unzip the installer
unzip awscliv2.zip
# Run the installer
sudo ./aws/install
# Verify the installation
aws --version
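Before touching S3, it’s worth confirming that the instance actually picked up the ApacheS3LogsRole credentials. A quick check:
# Should return an ARN containing assumed-role/ApacheS3LogsRole
aws sts get-caller-identity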
Before we install the application and set up the logs, we need to make sure that the EC2 instance has access to S3. Let’s upload a test file.
echo "This is a test file" | aws s3 cp - s3://myapachelogs/testfile.txt
Then go ahead and attempt to view the bucket contents.
aws s3 ls s3://myapachelogs
The screenshot above shows that the EC2 instance is able to list the test file.
Enable CloudTrail
We should use a separate bucket just for the CloudTrail logs, so let’s create one more called myapachelogscloudtrail. We can take care of this from the CloudTrail console: click Create a trail, and on the next screen click Create trail.
Follow the screenshot below for the rest of the CloudTrail configuration. Leave everything else not in the screenshot as is.
On the next screen, make sure to enable the following:
Management events - Tracks the creation or deletion of S3 buckets, security configuration changes, and logging setup.
Data events - Tracks object-level API operations in S3, such as GetObject and PutObject.
Network activity events - Tracks VPC endpoint actions from the private VPC to AWS services.
In the Data events section, choose S3 as the resource type and select Log all events. For Network activity events, select EC2 and again Log all events. Click Next, then create the trail.
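For reference, the management-event portion of this trail could also be created from the CLI along the lines of the sketch below. The trail name is just an example, and the target bucket needs a CloudTrail bucket policy that the console normally adds for you; the data and network activity event selectors are easiest to configure in the console as described above.
# Create the trail, point it at the dedicated bucket, and start logging
aws cloudtrail create-trail --name apache-logs-trail --s3-bucket-name myapachelogscloudtrail
aws cloudtrail start-logging --name apache-logs-trail
# Confirm the trail is logging
aws cloudtrail get-trail-status --name apache-logs-trail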
Install the Application (Apache)
Run the following commands to install Apache.
# Install Apache
sudo apt install apache2 -y
# Start the Apache service
sudo systemctl start apache2
# Enable Apache to start at boot
sudo systemctl enable apache2
# Check the service status
sudo systemctl status apache2
Apache log files are stored in /var/log/apache2. Let’s take a quick look at what is currently in the logs.
For now, the access.log is empty and error.log has a few lines.
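If you’d like access.log to have something in it before we back it up, you can generate a little traffic locally; this step is optional.
# Hit the default Apache page a few times to produce access log entries
for i in $(seq 1 5); do curl -s -o /dev/null http://localhost/; done
# Confirm the requests were logged
sudo tail -n 5 /var/log/apache2/access.log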
Log Management setup
Using a built-in Linux editor such as nano or vi, create the script with the code below to set up the log management:
#!/bin/bash

# Set variables
APACHE_LOG_DIR="/var/log/apache2"   # Directory containing Apache logs
S3_BUCKET="s3://myapachelogs"       # S3 bucket created earlier
RETENTION_DAYS=7                    # Number of days to retain logs locally

# Check if the Apache log directory exists
if [ ! -d "$APACHE_LOG_DIR" ]; then
    echo "Apache log directory $APACHE_LOG_DIR does not exist. Exiting."
    exit 1
fi

# Upload Apache logs (including rotated logs) to S3
echo "Uploading Apache logs to S3 bucket: $S3_BUCKET..."
aws s3 cp "$APACHE_LOG_DIR" "$S3_BUCKET" --recursive --exclude "*" --include "*.log" --include "*.log.gz"
if [ $? -eq 0 ]; then
    echo "Apache logs successfully uploaded to S3."
else
    echo "S3 upload failed. Please check AWS CLI configuration and bucket permissions."
    exit 1
fi

# Delete logs older than RETENTION_DAYS
echo "Deleting Apache logs older than $RETENTION_DAYS days from $APACHE_LOG_DIR..."
find "$APACHE_LOG_DIR" -type f -mtime +"$RETENTION_DAYS" \( -name "*.log" -o -name "*.log.gz" \) -exec rm -v {} \;

# Notify the user
echo "Apache log management completed. Logs uploaded to S3 and old logs deleted."

# Exit script
exit 0
Copy and paste the script and save it as apache_log_management.sh. Then make it executable with:
chmod +x apache_log_management.sh
Execute the shell script and verify that the upload was successful. (Run it with sudo if your user does not have permission to delete files under /var/log/apache2.)
./apache_log_management.sh
As you can see from both screenshots, the log uploads were successful.
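If you want to double-check from the instance itself rather than the console, a recursive listing of the bucket should show the uploaded log files.
# List everything the script uploaded
aws s3 ls s3://myapachelogs --recursive --human-readable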
Monitor with CloudTrail
If we check the CloudTrail S3 bucket, we should see some data already present.
Digging through the log files, I found the event showing when the script executed the PutObject action against S3.
CloudTrail is a very good way to keep tabs on all API activity. In this case we are using it to monitor EC2 and S3 actions.
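You can also peek at the delivered log files directly from the CLI. Data events can take several minutes to show up, and the exact key prefix includes your account ID.
# CloudTrail delivers gzipped JSON log files under AWSLogs/<account-id>/CloudTrail/
aws s3 ls s3://myapachelogscloudtrail/AWSLogs/ --recursive | head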
Automate the log backup process
A very simple way to automate the log management process is to create a cron job. For example:
crontab -e
# this line runs the script at 2 AM
0 2 * * * /path/to/apache_log_management.sh >> /var/log/apache_log_management.log 2>&1
This cron job executes the script apache_log_management.sh located at /path/to/ every day at 2:00 AM. Here's a breakdown of its components:
0 2 * * *: Specifies the schedule for the job.
0 means it runs at minute 0, the start of the hour.
2 indicates the hour: 2:00 AM.
The three *s mean it runs every day of the month, every month, and every day of the week.
/path/to/apache_log_management.sh: The path to the script being executed.
>> /var/log/apache_log_management.log: Appends the output of the script to a log file at /var/log/apache_log_management.log.
2>&1: Redirects any errors (stderr) to the same log file, combining both standard output (stdout) and errors into a single log.
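Two small caveats worth checking: /var/log is normally writable only by root, and deleting rotated logs under /var/log/apache2 also requires root, so either install the entry with sudo crontab -e or point the output at a path your user can write to. To confirm the job is in place and later see what it did:
# List the installed cron entries (use sudo crontab -l if you added it as root)
crontab -l
# After the next scheduled run, review the backup log for output and errors
tail -n 20 /var/log/apache_log_management.log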
Conclusion
So in this project we tackled key DevOps and cloud engineering tasks: standing up and configuring infrastructure, implementing log management, automating it with Bash, and integrating it with S3 for secure storage. We also kept the networking tight with a private subnet, an S3 gateway endpoint, and an EC2 Instance Connect Endpoint, and set up CloudTrail to keep a close watch on what is going on in the environment.
One last thing, it’s important to clean up resources to avoid unnecessary costs and maintain a tidy AWS environment. Thanks for reading!!!