Read, write and copy files in S3 with Python Boto3


Objective:
I wanted to read a file in S3, process it, store the data in a database, and then move the file to another “location” in S3.
To do that, I needed to take care of three things: credentials, permissions and code.
Credentials
It is important to have an IAM user and generate an access key for that user. I needed the access key ID and the secret access key to access the files. The secret key is only visible when you create the access key, so save it at that point.
Access keys can be created as follows:
AWS Admin Console -> IAM -> User -> <user_name> -> Security Credentials -> Access Keys -> Create Access Key
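The keys do not have to be hardcoded in the script. As a minimal sketch, assuming the two keys have been exported as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (the names boto3 itself looks for), the helpers below collect them and pass them to boto3.Session. The function names session_kwargs_from_env and make_session are hypothetical and not part of boto3.

```python
import os


def session_kwargs_from_env(env=None):
    """Collect the access key pair from environment variables.

    Returns a dict that can be passed to boto3.Session(**kwargs).
    """
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        raise RuntimeError(
            "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set"
        )
    return {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def make_session():
    """Create a boto3 session from the environment (hypothetical helper)."""
    import boto3  # imported lazily so the helper above works without boto3 installed

    return boto3.Session(**session_kwargs_from_env())
```

Calling boto3.Session() with no arguments would also pick up the same two variables; the explicit version only makes a missing key fail early with a clear message.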
Permissions
The S3 bucket I was working on had the Object Ownership setting ‘BucketOwnerEnforced’ enabled. This setting disables ACLs, so instead of object-level permissions, file access is controlled by policies such as the bucket policy. So, I needed to put the right permissions in the bucket policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGetObject",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<aws_account_id>:user/<user_name>"
                ]
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::<bucket_name>",
                "arn:aws:s3:::<bucket_name>/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalOrgID": "<org-id>"
                }
            }
        }
    ]
}
Note: The org id can be obtained from AWS account console. Click on the user name @ <account_id> link at the top right -> Organization -> Organization Id.
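The policy above grants s3:* for simplicity. As a hedged sketch, assuming the workflow only needs to list the bucket and to read, write and delete objects, a narrower policy could be generated and attached from Python. The names build_bucket_policy and apply_bucket_policy, the Sid, and the chosen action list are assumptions for illustration, not what my bucket actually uses. The list may need additions, for example for tagged or KMS-encrypted objects. put_bucket_policy is the boto3 client call that attaches a policy document to a bucket.

```python
import json


def build_bucket_policy(account_id, user_name, bucket_name, org_id):
    """Return a bucket policy dict limited to the actions this workflow uses."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadWriteMove",
                "Effect": "Allow",
                "Principal": {
                    "AWS": [f"arn:aws:iam::{account_id}:user/{user_name}"]
                },
                # Assumed minimal set: list the bucket, read objects,
                # write objects (the copy target) and delete the originals
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }


def apply_bucket_policy(s3_client, bucket_name, policy):
    """Attach the policy to the bucket; s3_client is a boto3 S3 client."""
    s3_client.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
```

The identity that calls apply_bucket_policy needs permission to change the bucket policy itself, so this step would normally be run by an administrator rather than by the IAM user created above.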
Code
Step 1: Created an S3 session
import boto3
# AWS_ACCESS_KEY and AWS_SECRET_KEY hold the access key ID and secret key from the Credentials section
session = boto3.Session(aws_access_key_id=AWS_ACCESS_KEY, aws_secret_access_key=AWS_SECRET_KEY)
s3_resource = session.resource('s3')
Step 2: Read file(s) as dataframe
Use the session to get a client object and read the file. I had an Excel file to read, so I used the approach below.
import io
import pandas as pd
s3_client = session.client('s3')
obj = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=file_name)
# Wrap the downloaded bytes in a file-like object before handing them to pandas
dataframe = pd.read_excel(io.BytesIO(obj['Body'].read()), sheet_name='<sheet-name>')
I got the entire worksheet in a dataframe to process, transform and store in the database.
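The read step can also be wrapped in small functions. This is a sketch under assumptions: read_object_bytes and read_excel_from_s3 are hypothetical names, s3_client is a boto3 S3 client like the one used in this step, and pandas needs an Excel engine such as openpyxl installed to parse .xlsx files.

```python
import io


def read_object_bytes(s3_client, bucket, key):
    """Download one S3 object and return its content as bytes."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a stream; read() pulls the whole file into memory
    return response["Body"].read()


def read_excel_from_s3(s3_client, bucket, key, sheet_name=0):
    """Load one worksheet of an Excel file stored in S3 into a DataFrame."""
    import pandas as pd  # imported lazily; only needed for the Excel parsing

    data = read_object_bytes(s3_client, bucket, key)
    # pandas expects a path or file-like object, so wrap the bytes in BytesIO
    return pd.read_excel(io.BytesIO(data), sheet_name=sheet_name)
```

Passing the client in as an argument keeps the functions easy to try out with a stand-in object, without touching a real bucket.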
This is where your business function goes; do what you want with the data. I am not delving into that.
Finally, as is usually the case, you may want to move the file you read to another “location”. Note: The location is in quotes because S3 doesn’t really have a folder structure. It is individual files all the way down, each identified by its full key, and a “folder” is just a shared key prefix, but that is a different discussion.
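Since S3 has no real folders, moving a file comes down to giving it a new key. A minimal sketch, assuming hypothetical prefixes such as incoming/ and processed/ (illustrative names, not the ones in my bucket) and a hypothetical helper destination_key:

```python
import posixpath


def destination_key(old_key, dest_prefix):
    """Return the key for the same file name under a different prefix."""
    file_name = posixpath.basename(old_key)
    # posixpath always joins with "/", the delimiter conventionally used in S3 keys
    return posixpath.join(dest_prefix.strip("/"), file_name)


print(destination_key("incoming/report.xlsx", "processed"))  # processed/report.xlsx
```

The returned value can then be used as the new object name in the copy step.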
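Step 3: Moved the file object from one location to another (this actually copies the file into a new object and then deletes the old one)
s3_resource.Object(S3_BUCKET_NAME, new_name).copy_from(CopySource={'Bucket': S3_BUCKET_NAME, 'Key': old_name})
s3_resource.Object(S3_BUCKET_NAME, old_name).delete()  # remove the original to complete the move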
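If the permissions described in the Permissions section are missing, the copy fails with an error like this:
“botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CopyObject operation: Access Denied”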
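The move can also be sketched with the client API as a copy followed by a delete, plus a small check for the AccessDenied case. Assumptions: move_object and is_access_denied are hypothetical helper names, and s3_client is a boto3 S3 client. The error code lookup relies on the response dictionary that botocore's ClientError carries.

```python
def is_access_denied(error):
    """Return True if a ClientError-like exception reports AccessDenied."""
    response = getattr(error, "response", None) or {}
    return response.get("Error", {}).get("Code") == "AccessDenied"


def move_object(s3_client, bucket, old_key, new_key):
    """Copy an object to new_key in the same bucket, then delete the original."""
    s3_client.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": old_key},
    )
    # The delete only runs if the copy did not raise,
    # so a failed copy never removes the original file
    s3_client.delete_object(Bucket=bucket, Key=old_key)
```

In practice I would wrap the move_object call in a try block that catches botocore.exceptions.ClientError and, when is_access_denied returns True, points the reader of the log back to the bucket policy.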
Written by Ramesh Lakshmipathy
