Use Python & Boto3 to Backup files/logs to AWS S3

Pushp Vashisht

Introduction

Suppose our server has a folder where the various services behind our application write their logs. What if we want to back up those logs to an AWS S3 bucket every day at 00:00? This guide shows how to do exactly that. Let’s dive in!

Getting the S3 bucket ready


Let’s Write Some Code!

1. Create the project directory & Python virtual environment

$ mkdir 'Backup Logs S3'
$ cd 'Backup Logs S3'
$ python3 -m venv env
$ source env/bin/activate

2. Create a requirements.txt file listing the packages used in this project.

schedule==0.6.0
boto3==1.13.20

3. Install the requirements using pip

(env)$ pip install -r requirements.txt

4. Create a function to upload a file to an S3 bucket

Use your favourite editor to create backup_logs_s3.py as follows:

import boto3
from botocore.exceptions import ClientError


def upload_file_to_s3(file_name, bucket, object_name=None, folder_name=None):
    """
    Upload a file to an S3 bucket.
    Params:
        file_name: File to upload
        bucket: Bucket to upload to
        object_name: S3 object name. If not specified then file_name is used
        folder_name: Folder name in which file is to be uploaded
    """

    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name.split('/')[-1]
        # If folder_name was specified, upload in the folder
        if folder_name is not None:
            object_name = f'{folder_name}/{object_name}'

    # Upload the file
    try:
        s3_client = boto3.client(
            service_name='s3',
            aws_access_key_id='YOUR_AWS_ACCESS_KEY_ID',
            aws_secret_access_key='YOUR_AWS_SECRET_ACCESS_KEY'
        )
        response = s3_client.upload_file(file_name, bucket, object_name)
        print(response)
    except ClientError as e:
        print(e)

The function accepts 4 parameters:

  • file_name: Name of the file with the absolute path

  • bucket: Name of the bucket to which the file is to be uploaded

  • object_name: (Optional) To specify the name of the object when the file is uploaded to the bucket

  • folder_name: (Optional) Name of the folder under which the file will be uploaded. If the folder doesn’t already exist, it is created implicitly, because S3 treats the folder name as a prefix of the object key.

In this function, if object_name is not given as a parameter, we first set it to the name of the file by stripping the directory path from file_name.

Then, still in that case, if folder_name is given, we set object_name to ‘folder_name/object_name’. Note that folder_name is ignored when object_name is passed explicitly.
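
To make the naming rules above concrete, here is a minimal sketch that isolates just the key-building logic so it can be tried without an AWS account. The helper name build_object_name is hypothetical and not part of the script; it only mirrors the first few lines of upload_file_to_s3().

```python
def build_object_name(file_name, object_name=None, folder_name=None):
    """Return the S3 object key that upload_file_to_s3() would end up using."""
    if object_name is None:
        # Keep only the last path component, e.g. 'server1.log'
        object_name = file_name.split('/')[-1]
        if folder_name is not None:
            # The "folder" is just a prefix of the object key
            object_name = f'{folder_name}/{object_name}'
    return object_name


print(build_object_name('/home/pushp/logs/server1.log'))
# server1.log
print(build_object_name('/home/pushp/logs/server1.log', folder_name='server_logs'))
# server_logs/server1.log
print(build_object_name('/home/pushp/logs/server1.log',
                        object_name='custom.log', folder_name='server_logs'))
# custom.log
```

The last call shows the case that is easy to miss: an explicit object_name wins and folder_name is not applied.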

In the try block, we create a client by calling the client method of the boto3 package. Make sure to replace ‘YOUR_AWS_ACCESS_KEY_ID’ and ‘YOUR_AWS_SECRET_ACCESS_KEY’ with your actual keys which I asked you to keep handy earlier.

This client is then used to call upload_file() to upload the file to our S3 bucket. upload_file() returns None on success, so the printed response is simply None; failures are reported through exceptions instead.
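
If you would rather not paste the access keys into the source file, one possible alternative is to read them from environment variables. The sketch below is only an illustration: the helper name aws_client_kwargs is hypothetical, and it assumes the keys have been exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before the script starts.

```python
import os


def aws_client_kwargs(environ=os.environ):
    """Build keyword arguments for boto3.client() from environment variables."""
    access_key = environ.get('AWS_ACCESS_KEY_ID')
    secret_key = environ.get('AWS_SECRET_ACCESS_KEY')
    if not access_key or not secret_key:
        # Fail early with a readable message instead of a failed upload later
        raise RuntimeError('Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first')
    return {
        'service_name': 's3',
        'aws_access_key_id': access_key,
        'aws_secret_access_key': secret_key,
    }


# Hypothetical usage inside upload_file_to_s3():
#     s3_client = boto3.client(**aws_client_kwargs())
```

The function takes the environment as a parameter so that it can be exercised with a plain dictionary, without touching the real environment.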

5. Create a function to append the date to log files (Optional)

This step is optional. If you simply want to upload your files to S3, feel free to skip it. Suppose I have a log file named ‘server.log’ to which every request the server receives is appended. If my server has been running for a week, all requests for the whole week are logged to the same file, which makes checking the logs for a particular day troublesome. To resolve this, each day at 00:00, before backing up the logs to the S3 bucket, we add the previous day’s date to the file name (for example, ‘server.log’ becomes ‘server.DD-MM-YYYY.log’) and then upload the file to S3, so that we can browse the logs date-wise.

import boto3
from botocore.exceptions import ClientError
import os

def upload_file_to_s3(file_name, bucket, object_name=None, folder_name=None):
    """
    Upload a file to an S3 bucket.
    Params:
        file_name: File to upload
        bucket: Bucket to upload to
        object_name: S3 object name. If not specified then file_name is used
        folder_name: Folder name in which file is to be uploaded
    """

    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name.split('/')[-1]
        # If folder_name was specified, upload in the folder
        if folder_name is not None:
            object_name = f'{folder_name}/{object_name}'

    # Upload the file
    try:
        s3_client = boto3.client(
            service_name='s3',
            aws_access_key_id='YOUR_AWS_ACCESS_KEY_ID',
            aws_secret_access_key='YOUR_AWS_SECRET_ACCESS_KEY'
        )
        response = s3_client.upload_file(file_name, bucket, object_name)
        print(response)
    except ClientError as e:
        print(e)


def append_text_to_file_names(files, text):
    """
    Appends given text to the name of the files.
    Params:
        files: List(str): list of file paths
        text: str: Text that is to be appended
    Returns:
        files: List(str): list of file paths with text appended
    """
    for i, old_path in enumerate(files):
        directory, file_name = os.path.split(old_path)
        # splitext only splits off the last extension, so names with no dot
        # or with several dots are handled as well
        stem, extension = os.path.splitext(file_name)
        new_path = os.path.join(directory, f'{stem}.{text}{extension}')
        os.rename(old_path, new_path)
        files[i] = new_path
    return files

The function append_text_to_file_names() accepts 2 parameters:

  • files: List of file names with their absolute path

  • text: String that is to be appended to the file names

This function renames each file by inserting the given text between the file’s base name and its extension, then returns the list of files with their new names.
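
To see what the renaming produces without touching real logs, here is a small sketch that works on a throwaway file in a temporary directory. The helper name dated_name and the sample date are made up for illustration; the helper uses os.path.splitext, which is one way (an assumption, not the only option) to cope with file names that have no extension or more than one dot.

```python
import os
import tempfile


def dated_name(path, text):
    """Return path with text inserted before the file extension."""
    directory, file_name = os.path.split(path)
    stem, extension = os.path.splitext(file_name)
    return os.path.join(directory, f'{stem}.{text}{extension}')


# Try the rename on a throwaway file instead of a real log
with tempfile.TemporaryDirectory() as tmp_dir:
    original = os.path.join(tmp_dir, 'server.log')
    with open(original, 'w') as handle:
        handle.write('GET /health 200\n')

    renamed = dated_name(original, '31-05-2020')
    os.rename(original, renamed)
    print(os.path.basename(renamed))  # server.31-05-2020.log
    print(os.path.exists(original))   # False
```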

6. Create a function that will use the above functions

This function calls the two functions above and will serve as the task that we schedule later on.

import boto3
from botocore.exceptions import ClientError
import os
from datetime import datetime, timedelta

def upload_file_to_s3(file_name, bucket, object_name=None, folder_name=None):
    """
    Upload a file to an S3 bucket.
    Params:
        file_name: File to upload
        bucket: Bucket to upload to
        object_name: S3 object name. If not specified then file_name is used
        folder_name: Folder name in which file is to be uploaded
    """

    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name.split('/')[-1]
        # If folder_name was specified, upload in the folder
        if folder_name is not None:
            object_name = f'{folder_name}/{object_name}'

    # Upload the file
    try:
        s3_client = boto3.client(
            service_name='s3',
            aws_access_key_id='YOUR_AWS_ACCESS_KEY_ID',
            aws_secret_access_key='YOUR_AWS_SECRET_ACCESS_KEY'
        )
        response = s3_client.upload_file(file_name, bucket, object_name)
        print(response)
    except ClientError as e:
        print(e)


def append_text_to_file_names(files, text):
    """
    Appends given text to the name of the files.
    Params:
        files: List(str): list of file paths
        text: str: Text that is to be appended
    Returns:
        files: List(str): list of file paths with text appended
    """
    for i, old_path in enumerate(files):
        directory, file_name = os.path.split(old_path)
        # splitext only splits off the last extension, so names with no dot
        # or with several dots are handled as well
        stem, extension = os.path.splitext(file_name)
        new_path = os.path.join(directory, f'{stem}.{text}{extension}')
        os.rename(old_path, new_path)
        files[i] = new_path
    return files


def rename_and_backup_logs_s3():
    """
    Backs up log files to the S3 bucket
    """
    today = datetime.now()
    yesterday = today - timedelta(days=1)
    text = yesterday.strftime('%d-%m-%Y')

    log_files = [
        '/home/pushp/logs/server1.log',
        '/home/pushp/logs/server2.log',
        '/home/pushp/logs/server3.log',
        '/home/pushp/logs/server4.log'
    ]

    print('Appending date to log files...')
    log_files = append_text_to_file_names(log_files, text)
    print('Appended date to log files...')

    print('Uploading logs to S3...')
    for log_file in log_files:
        upload_file_to_s3(
            file_name=log_file,
            bucket='YOUR_BUCKET_NAME',
            folder_name='server_logs'
        )
    print('Uploaded logs to S3...')

In the function rename_and_backup_logs_s3(), the previous day’s date is calculated and converted to a string in ‘DD-MM-YYYY’ format. The log_files list holds all the files that we want to back up every day. We call append_text_to_file_names() with the list of files and the previous day’s date to add that date to the file names. Then upload_file_to_s3() is called for each renamed file in the list to upload it to the S3 bucket. Remember to replace YOUR_BUCKET_NAME with the actual name that you chose when creating the bucket.
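
The date arithmetic described above can be checked in isolation. The following is a minimal sketch with a fixed timestamp so that the result is reproducible; the helper name previous_day_stamp and the sample date are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def previous_day_stamp(now, fmt='%d-%m-%Y'):
    """Format the day before now using fmt."""
    return (now - timedelta(days=1)).strftime(fmt)


# Pretend the job fires at midnight on 1 June 2020
run_time = datetime(2020, 6, 1, 0, 0)
print(previous_day_stamp(run_time))              # 31-05-2020
print(previous_day_stamp(run_time, '%Y-%m-%d'))  # 2020-05-31
```

The second call is only a design option: a year-first format makes the uploaded files sort chronologically when listed by name, whereas ‘DD-MM-YYYY’ groups them by day of the month.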

7. Final Step, Scheduling the task

We schedule the task ‘rename_and_backup_logs_s3’ to run daily at ‘00:00’.

import boto3
from botocore.exceptions import ClientError
import os
from datetime import datetime, timedelta
import schedule
import time

def upload_file_to_s3(file_name, bucket, object_name=None, folder_name=None):
    """
    Upload a file to an S3 bucket.
    Params:
        file_name: File to upload
        bucket: Bucket to upload to
        object_name: S3 object name. If not specified then file_name is used
        folder_name: Folder name in which file is to be uploaded
    """

    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name.split('/')[-1]
        # If folder_name was specified, upload in the folder
        if folder_name is not None:
            object_name = f'{folder_name}/{object_name}'

    # Upload the file
    try:
        s3_client = boto3.client(
            service_name='s3',
            aws_access_key_id='YOUR_AWS_ACCESS_KEY_ID',
            aws_secret_access_key='YOUR_AWS_SECRET_ACCESS_KEY'
        )
        response = s3_client.upload_file(file_name, bucket, object_name)
        print(response)
    except ClientError as e:
        print(e)


def append_text_to_file_names(files, text):
    """
    Appends given text to the name of the files.
    Params:
        files: List(str): list of file paths
        text: str: Text that is to be appended
    Returns:
        files: List(str): list of file paths with text appended
    """
    for i, old_path in enumerate(files):
        directory, file_name = os.path.split(old_path)
        # splitext only splits off the last extension, so names with no dot
        # or with several dots are handled as well
        stem, extension = os.path.splitext(file_name)
        new_path = os.path.join(directory, f'{stem}.{text}{extension}')
        os.rename(old_path, new_path)
        files[i] = new_path
    return files


def rename_and_backup_logs_s3():
    """
    Backs up log files to the S3 bucket
    """
    today = datetime.now()
    yesterday = today - timedelta(days=1)
    text = yesterday.strftime('%d-%m-%Y')

    log_files = [
        '/home/pushp/logs/server1.log',
        '/home/pushp/logs/server2.log',
        '/home/pushp/logs/server3.log',
        '/home/pushp/logs/server4.log'
    ]

    print('Appending date to log files...')
    log_files = append_text_to_file_names(log_files, text)
    print('Appended date to log files...')

    print('Uploading logs to S3...')
    for log_file in log_files:
        upload_file_to_s3(
            file_name=log_file,
            bucket='YOUR_BUCKET_NAME',
            folder_name='server_logs'
        )
    print('Uploaded logs to S3...')


if __name__ == "__main__":
    schedule.every().day.at("00:00").do(rename_and_backup_logs_s3)
    while True:
        schedule.run_pending()
        time.sleep(60)  # wait one minute
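
The loop above wakes up once a minute and lets the schedule package decide whether the job is due. For intuition about the timing only, here is a standard-library sketch that computes how long it is until the next 00:00. It is not how schedule works internally, and the helper name seconds_until_next_midnight is hypothetical.

```python
from datetime import datetime, timedelta


def seconds_until_next_midnight(now):
    """Return the number of seconds from now until the next 00:00."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


print(seconds_until_next_midnight(datetime(2020, 6, 1, 23, 59, 0)))  # 60.0
print(seconds_until_next_midnight(datetime(2020, 6, 1, 0, 0, 0)))    # 86400.0
```

Because the script checks for pending jobs every 60 seconds, the backup may start up to about a minute after midnight, which is fine for a daily job.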

Run the script:

(env)$ python3 backup_logs_s3.py

In a production environment, use supervisord to start the script.


Written by

Pushp Vashisht

Pushp Vashisht is studying MSc in Computing Science at University College Cork, Ireland. For more information pay a visit at: pushp.ml/