Migrating from Google Cloud Storage to AWS S3: A Comprehensive Guide


You ever start a "simple" task (just sync a GCP bucket to AWS S3) and suddenly find yourself staring at the terminal at 1 AM, questioning your life choices?
Let's rewind.
🛠️ The gsutil Nightmare
I had a GCP bucket full of data and figured, "No big deal, just gsutil rsync to S3 and I'm out." Big mistake.
Command:
gsutil rsync -r gs://<gcp_bucket> s3://<aws_bucket>
Everything seemed fine… until it wasn’t:
Caught non-retryable exception while listing s3://<bucket>/: AccessDeniedException: 403 InvalidAccessKeyId
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message>…
CommandException: Caught non-retryable exception - aborting rsync
What? I checked .boto and my AWS creds: fresh service account, correct perms, but no dice. Hours wasted. Frustration exploding.
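For context, gsutil reads its S3 credentials from the [Credentials] section of the boto config file, so that's what I kept re-checking. A minimal sketch of the relevant bit (the key values here are placeholders, not real credentials):

# ~/.boto (excerpt): placeholder keys for illustration only
[Credentials]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = your-aws-secret-access-key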
🎯 Why AWS DataSync Was the Answer
That’s when I came across AWS DataSync. It felt like one of those “where have you been all my life?” moments.
Think of DataSync as AWS’s way of making data transfers between different storage systems easier, faster, and more secure without all the manual effort. Sounds good, right?
AWS DataSync gives you:
⚡ Built-in encryption and integrity checks
🧠 Monitoring, logging, and CloudWatch integration
🗓️ Task scheduling and audit trails
🌍 Faster data transfers
That was enough to convince me. And if you're wondering how it all fits together, let me walk you through it. But first, a little about DataSync.
Think of a DataSync location like a "home address" for your data. It can point to an AWS S3 bucket, an NFS share, or even a GCP Cloud Storage bucket (yes, cross-cloud!). You create a source location (where your data lives) and a destination location (where you want it to go). This setup tells AWS DataSync where to look and where to drop the files securely and efficiently.
The task is the actual moving truck. Once your source and destination locations are ready, a DataSync task defines what to transfer and how. You can choose to sync everything, skip files that haven’t changed, or even schedule it to run daily. It’s all automated: no more messy CLI scripts or failed gsutil retries at midnight.
📝 AWS DataSync Setup
Here’s exactly how I got it working:
1. Create a GCS HMAC key
In the GCP Console, navigate to:
IAM & Admin → Service Accounts
Create or select a service account.
Grant it Storage Object Viewer access.
Then head to:
Cloud Storage → Settings → Interoperability
Under Access keys for service accounts, select your service account and generate a new key.
Save the access key ID and secret key securely. You'll need these soon.
Copy your GCS endpoint URI:
storage.googleapis.com
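If you prefer the command line, gsutil can mint the HMAC key for you as well. A quick sketch (the service account email is a placeholder):

# Create an HMAC key for the service account; prints the access key ID and secret
gsutil hmac create datasync-reader@my-project.iam.gserviceaccount.com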
2. Launch a DataSync agent on EC2
Open a terminal and copy the following AWS CLI command to get the latest DataSync Amazon Machine Image (AMI) ID for the Region where you want to deploy your Amazon EC2 agent.
aws ssm get-parameter --name /aws/service/datasync/ami --region your-region
Then launch an instance (m5.2xlarge or t2.2xlarge works) in your VPC, with port 80 open inbound for activation.
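Roughly, those two steps look like this from the CLI (the Region, subnet, and security group IDs below are placeholders; the security group needs inbound port 80 from wherever you'll activate the agent):

# Look up the latest DataSync agent AMI for your Region
AMI_ID=$(aws ssm get-parameter \
  --name /aws/service/datasync/ami \
  --region us-east-1 \
  --query 'Parameter.Value' --output text)

# Launch the agent instance inside your VPC
aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type m5.2xlarge \
  --subnet-id subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --region us-east-1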
3. Activating your AWS DataSync agent
When activating your agent in the DataSync console, DataSync can get the activation key for you by using the Automatically get the activation key from your agent option. To use this option, your browser must be able to reach your agent on port 80.
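If the browser route doesn't fit your network setup, you can also register the agent from the CLI once you have its activation key (the key and name below are placeholders):

# Register the agent with DataSync using its activation key
aws datasync create-agent \
  --activation-key ABCDE-12345-FGHIJ-67890-KLMNO \
  --agent-name gcs-to-s3-agent \
  --region us-east-1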
4. Register Source Location (GCS)
Go to DataSync → Locations → Create location
Choose:
Type: Object storage
Server: storage.googleapis.com
Bucket name: Your GCS bucket
Paste your GCP service account HMAC key (access + secret)
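The console works fine here, but for the record this is roughly the equivalent CLI call (bucket name, HMAC key pair, and agent ARN are placeholders):

# Register the GCS bucket as an object-storage location via its S3-compatible endpoint
aws datasync create-location-object-storage \
  --server-hostname storage.googleapis.com \
  --server-protocol HTTPS \
  --server-port 443 \
  --bucket-name my-gcs-bucket \
  --access-key GOOG1EXAMPLEACCESSKEY \
  --secret-key my-hmac-secret \
  --agent-arns arn:aws:datasync:us-east-1:111111111111:agent/agent-0123456789abcdef0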
5. Create Destination Location (S3)
Type: Amazon S3
Select your S3 bucket and optional prefix
Choose an IAM role with proper S3 write permissions.
{ "Version": "2012-10-17", "Statement": [ { "Action": [ "s3:GetBucketLocation", "s3:ListBucket", "s3:ListBucketMultipartUploads" ], "Effect": "Allow", "Resource": "arn:aws:s3:::amzn-s3-demo-bucket" "Condition": { "StringEquals": { "aws:ResourceAccount": "1111111111" } } }, { "Action": [ "s3:AbortMultipartUpload", "s3:DeleteObject", "s3:GetObject", "s3:GetObjectTagging", "s3:GetObjectVersion", "s3:GetObjectVersionTagging", "s3:ListMultipartUploadParts", "s3:PutObject", "s3:PutObjectTagging" ], "Effect": "Allow", "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*" "Condition": { "StringEquals": { "aws:ResourceAccount": "1111111111" } } } ] }
Ensure the trust policy includes datasync.amazonaws.com as the service principal.
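For reference, a minimal trust policy for that role looks like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "datasync.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}

And if you'd rather script it, the destination location can be created from the CLI too (bucket ARN, role ARN, and prefix are placeholders):

# Register the S3 bucket (with optional prefix) as the destination location
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::amzn-s3-demo-bucket \
  --s3-config BucketAccessRoleArn=arn:aws:iam::111111111111:role/datasync-s3-access \
  --subdirectory /incoming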
6. Create and Configure the Task
Source → your GCS location
Destination → S3 location
In the task settings, choose “Enhanced” (no agent required) or “Basic” with an agent. I kept the agent-based setup.
Optionally set include/exclude filters (a CLI sketch of the task follows below).
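Here's a rough CLI equivalent of the task I ended up with (the ARNs and the exclude pattern are placeholders, and plenty of other options exist):

# Create the task: skip unchanged files, verify only what was transferred, exclude temp paths
aws datasync create-task \
  --name gcs-to-s3-migration \
  --source-location-arn arn:aws:datasync:us-east-1:111111111111:location/loc-0123456789abcdef0 \
  --destination-location-arn arn:aws:datasync:us-east-1:111111111111:location/loc-abcdef0123456789a \
  --options VerifyMode=ONLY_FILES_TRANSFERRED,OverwriteMode=ALWAYS,TransferMode=CHANGED \
  --excludes FilterType=SIMPLE_PATTERN,Value="/tmp/*"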
7. Run the Task & Monitor
Click Start task
Watch transfer logs, performance stats, and error messages in the DataSync console
Use CloudWatch Logs or the Monitoring tab to observe bandwidth, file count, duration, and error rates
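The same start-and-watch loop is available from the CLI (ARNs are placeholders):

# Kick off a run of the task
aws datasync start-task-execution \
  --task-arn arn:aws:datasync:us-east-1:111111111111:task/task-0123456789abcdef0

# Check status, bytes transferred, and file counts for that run
aws datasync describe-task-execution \
  --task-execution-arn arn:aws:datasync:us-east-1:111111111111:task/task-0123456789abcdef0/execution/exec-0123456789abcdef0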
💬 Final Thoughts
Moving data across clouds should be simple, but in reality it rarely is. After fighting with gsutil and permission errors, AWS DataSync turned out to be the tool that just worked.
Appreciate you making it to the end. Feel free to leave a comment or a like if this helped you out!