Read, write and copy files in S3 with Python Boto3

Objective:

I wanted to read a file in S3, process it, store the data in a database, and move the file to another “location” in S3.

I needed to take care of three things.

Credentials

It is important to have an IAM user and generate an access key for them. I needed this access key and secret key to access the files. The secret key is only visible/available when you create the access key.

Access keys can be created as follows:

AWS Admin Console -> IAM -> User -> <user_name> -> Security Credentials -> Access Keys -> Create Access Key
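Rather than hardcoding the access key and secret key in source, a safer sketch is to pull them from the standard AWS environment variables (the variable names below are the ones the AWS SDKs conventionally look for; adjust if your setup differs):

```python
import os

# Assumed standard AWS environment variable names; falls back to "" if unset.
AWS_ACCESS_KEY = os.environ.get("AWS_ACCESS_KEY_ID", "")
AWS_SECRET_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")
```

These two variables are what the session-creation code further below expects.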

Permissions

The S3 bucket I was working with had the ‘BucketOwnerEnforced’ object-ownership setting enabled. This means object-level ACLs are ignored and file access is controlled by the bucket policy instead. So, I needed to put the right permissions into the bucket policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGetObject",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<aws_account_id>:user/<user_name>"
                ]
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::<bucket_name>",
                "arn:aws:s3:::<bucket_name>/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalOrgID": "<org-id>"
                }
            }
        }
    ]
}

Note: The org ID can be obtained from the AWS account console. Click the user name @ <account_id> link at the top right -> Organization -> Organization ID.

Code

Step 1: Created an S3 session

import boto3

session = boto3.Session(
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY,
)
s3_resource = session.resource('s3')

Step 2: Read file(s) as dataframe

Use the session to get a client object and read the file. I had an Excel file to read, so I used the method(s) below.

import io
import pandas as pd

s3_client = session.client('s3')
obj = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=file_name)
# Wrap the raw bytes in BytesIO; passing bare bytes to read_excel is deprecated.
dataframe = pd.read_excel(io.BytesIO(obj['Body'].read()), sheet_name='<sheet-name>')

I got the entire worksheet in a dataframe to process, transform and store in database.

This is where your business function goes — do what you want with the data. I am not delving into that.
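That said, purely for illustration (the table name, columns and rows below are made up), here is a minimal sketch of the “store in database” step using Python’s built-in sqlite3. In the real flow, `rows` would come from the dataframe, e.g. `dataframe.itertuples(index=False)`:

```python
import sqlite3

# Hypothetical rows standing in for the processed dataframe contents.
rows = [("alice", 100), ("bob", 200)]

conn = sqlite3.connect(":memory:")  # swap for your real database connection
conn.execute("CREATE TABLE IF NOT EXISTS report (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO report VALUES (?, ?)", rows)
conn.commit()
```

Swap sqlite3 for your actual database driver; the shape of the step stays the same.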

Finally, as is usually the case, you may want to move the file you read to another “location”. Note: The location is in air quotes because S3 doesn’t really have a folder structure; it is individual objects all the way down, but that is a different discussion.

Step 3: Copied a file object from one “location” to another (under the hood this clones the file into a new object and then deletes the old one)

s3_resource.Object(S3_BUCKET_NAME, new_name).copy_from(CopySource=S3_BUCKET_NAME + "/" + old_name)
s3_resource.Object(S3_BUCKET_NAME, old_name).delete()  # deleting the original completes the "move"
💡 In the copy_from method, including the bucket name in CopySource is crucial. I spent a lot of time trying to figure out the permission errors that were getting spit out, especially the one below.

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CopyObject operation: Access Denied
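For reference, a small sketch of the two CopySource forms that copy_from accepts (the bucket and key names here are placeholders):

```python
# Placeholder names for illustration; substitute your real bucket and keys.
S3_BUCKET_NAME = "my-bucket"
old_name = "incoming/report.xlsx"

# copy_from accepts either a "bucket/key" string or an explicit dict.
# The dict form makes the bucket name impossible to omit by accident.
copy_source_str = S3_BUCKET_NAME + "/" + old_name
copy_source_dict = {"Bucket": S3_BUCKET_NAME, "Key": old_name}
```

Either value can be passed as CopySource=...; leaving the bucket prefix out of the string form is what typically triggers the AccessDenied error above.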

Conclusion

💡 Once you take care of the above three, i.e. credentials, permissions and code (API), you should be pretty much done.
Written by

Ramesh Lakshmipathy