Migrating from Google Cloud Storage to AWS S3: A Comprehensive Guide

Kanak Vyas

You ever start a "simple" task, just syncing a GCP bucket to AWS S3, and suddenly find yourself staring at the terminal at 1 AM, questioning your life choices?

Let's rewind.

🛠️ The gsutil Nightmare

I had a GCP bucket full of data and figured, "No big deal, just gsutil rsync to S3 and I'm out." Big mistake.

Command:

gsutil rsync -r gs://<gcp_bucket> s3://<aws_bucket>

Everything seemed fine… until it wasn’t:

Caught non-retryable exception while listing s3://<bucket>/: AccessDeniedException: 403 InvalidAccessKeyId
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message>…
CommandException: Caught non-retryable exception - aborting rsync

What? I checked my .boto file and my AWS credentials: fresh service account, correct permissions, but no dice. Hours wasted. Frustration exploding.
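For anyone retracing my steps: gsutil reads the AWS side of an S3 transfer from the ~/.boto config file. The snippet below is just a minimal sketch of the section it expects, with obviously fake placeholder keys; mine looked exactly like this and the 403 still wouldn't go away.

# Sketch: the [Credentials] section gsutil reads for S3 access (placeholder values)
cat >> ~/.boto <<'EOF'
[Credentials]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = your-aws-secret-access-key
EOF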

🎯 Why AWS DataSync Was the Answer

That’s when I came across AWS DataSync. It felt like one of those “where have you been all my life?” moments.

Think of DataSync as AWS’s way of making data transfers between different storage systems easier, faster, and more secure without all the manual effort. Sounds good, right?

AWS DataSync gives you:

  • ⚡ Built-in encryption and integrity checks

  • 🧠 Monitoring, logging, and CloudWatch integration

  • 🗓️ Task scheduling and audit trails

  • 🌍 Faster data movement across clouds

That was enough to convince me. And if you're wondering how it all fits together, let me walk you through it. But first, a little about DataSync.

Think of a DataSync location like a "home address" for your data. It can point to an AWS S3 bucket, an NFS share, or even a GCP Cloud Storage bucket (yes, cross-cloud!). You create a source location (where your data lives) and a destination location (where you want it to go). This setup tells AWS DataSync where to look and where to drop the files securely and efficiently.

The task is the actual moving truck. Once your source and destination locations are ready, a DataSync task defines what to transfer and how. You can choose to sync everything, skip files that haven't changed, or even schedule it to run daily. It's all automated: no more messy CLI scripts or failed gsutil retries at midnight.

📝 AWS DataSync Setup

Here’s exactly how I got it working:

1. Create a GCS HMAC key

  • In the GCP Console, navigate to:

    • IAM & Admin → Service Accounts

    • Create or select a service account.

    • Grant it Storage Object Viewer access.

  • Then head to:

    • Cloud Storage → Settings → Interoperability

    • Under Access keys for service accounts, select your service account and generate a new key.

    • Save the access key ID and secret key securely. You'll need these soon.

    • Copy your GCS endpoint URI:

storage.googleapis.com
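If you prefer to script this step, the same HMAC setup can be done from the command line. This is a rough sketch with a placeholder project and service account name, not the exact commands I ran:

# Grant the service account read access to objects (placeholder project/account names)
gcloud projects add-iam-policy-binding my-gcp-project \
    --member="serviceAccount:datasync-reader@my-gcp-project.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"

# Create an HMAC key for that service account; save the access ID and secret it prints
gsutil hmac create -p my-gcp-project \
    datasync-reader@my-gcp-project.iam.gserviceaccount.com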

2. Launch a DataSync agent on EC2

Open a terminal and copy the following AWS CLI command to get the latest DataSync Amazon Machine Image (AMI) ID for the Region where you want to deploy your Amazon EC2 agent.

aws ssm get-parameter --name /aws/service/datasync/ami --region your-region

Then launch an instance from that AMI (m5.2xlarge or t2.2xlarge works) in your VPC, with port 80 open inbound for activation.
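For reference, the launch can also be done entirely from the CLI. The region, subnet, security group, and key pair below are placeholders; swap in your own values, and make sure the security group allows inbound port 80 from wherever you'll activate the agent:

# Look up the latest DataSync agent AMI and launch it (placeholder network/key values)
AMI_ID=$(aws ssm get-parameter \
    --name /aws/service/datasync/ami \
    --region us-east-1 \
    --query 'Parameter.Value' --output text)

aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type m5.2xlarge \
    --subnet-id subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --key-name my-keypair \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=datasync-agent}]'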

3. Activating your AWS DataSync agent

When activating your agent in the DataSync console, DataSync can get the activation key for you by using the Automatically get the activation key from your agent option. To use this option, your browser must be able to reach your agent on port 80.
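If you already have the activation key in hand, you can also register the agent from the CLI instead of the console. The agent name and key below are placeholders:

# Register (activate) the agent with DataSync using its activation key (placeholder values)
aws datasync create-agent \
    --agent-name gcs-to-s3-agent \
    --activation-key AAAAA-BBBBB-CCCCC-DDDDD-EEEEE \
    --region us-east-1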

4. Register Source Location (GCS)

  • Go to DataSync → Locations → Create location

  • Choose:

    • Type: Object storage

    • Server: storage.googleapis.com

    • Bucket name: Your GCS bucket

    • Paste your GCP service account HMAC key (access + secret)
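The console form above maps onto a single CLI call if you'd rather script it. GCS is registered as a generic "object storage" location pointing at its S3-compatible endpoint; the bucket name, HMAC keys, and agent ARN here are placeholders:

# Register the GCS bucket as an object storage source location (placeholder values)
aws datasync create-location-object-storage \
    --server-hostname storage.googleapis.com \
    --server-protocol HTTPS \
    --server-port 443 \
    --bucket-name my-gcs-bucket \
    --access-key GOOGXXXXXXXXXXXXXXXX \
    --secret-key my-gcs-hmac-secret \
    --agent-arns arn:aws:datasync:us-east-1:111111111111:agent/agent-0123456789abcdef0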

5. Create Destination Location (S3)

  • Type: Amazon S3

  • Select your S3 bucket and optional prefix

  • Choose an IAM role with proper S3 write permissions.

  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Action": [
                  "s3:GetBucketLocation",
                  "s3:ListBucket",
                  "s3:ListBucketMultipartUploads"
              ],
              "Effect": "Allow",
              "Resource": "arn:aws:s3:::amzn-s3-demo-bucket",
              "Condition": {
                  "StringEquals": {
                      "aws:ResourceAccount": "1111111111"
                  }
              }
          },
          {
              "Action": [
                  "s3:AbortMultipartUpload",
                  "s3:DeleteObject",
                  "s3:GetObject",
                  "s3:GetObjectTagging",
                  "s3:GetObjectVersion",
                  "s3:GetObjectVersionTagging",
                  "s3:ListMultipartUploadParts",
                  "s3:PutObject",
                  "s3:PutObjectTagging"
              ],
              "Effect": "Allow",
              "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
              "Condition": {
                  "StringEquals": {
                      "aws:ResourceAccount": "1111111111"
                  }
              }
          }
      ]
  }
    
  • Ensure the trust policy includes datasync.amazonaws.com as the service principal
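Put together, the destination side looks roughly like this from the CLI. The role name, account ID, and prefix are placeholders; the inline trust policy is the part that names datasync.amazonaws.com as the service principal:

# Create the bucket-access role that DataSync will assume (placeholder names and IDs)
aws iam create-role \
    --role-name datasync-s3-access \
    --assume-role-policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": { "Service": "datasync.amazonaws.com" },
            "Action": "sts:AssumeRole"
        }]
    }'

# After attaching the S3 permissions policy shown above, register the destination location
aws datasync create-location-s3 \
    --s3-bucket-arn arn:aws:s3:::amzn-s3-demo-bucket \
    --subdirectory /migrated/ \
    --s3-config BucketAccessRoleArn=arn:aws:iam::111111111111:role/datasync-s3-access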

6. Create and Configure the Task

  • Source → your GCS location

  • Destination → S3 location

  • In the task settings, choose “Enhanced” mode (no agent required) or “Basic” mode with an agent. I kept the agent-based option.

    • Optionally set include/exclude filters.
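The equivalent CLI call just ties the two location ARNs together. The ARNs and the exclude pattern below are placeholders, and the options shown (skip unchanged files, verify only what was transferred) are one reasonable combination rather than the only choice:

# Create the task linking source and destination locations (placeholder ARNs and filter)
aws datasync create-task \
    --name gcs-to-s3-migration \
    --source-location-arn arn:aws:datasync:us-east-1:111111111111:location/loc-aaaaaaaaaaaaaaaaa \
    --destination-location-arn arn:aws:datasync:us-east-1:111111111111:location/loc-bbbbbbbbbbbbbbbbb \
    --options VerifyMode=ONLY_FILES_TRANSFERRED,TransferMode=CHANGED \
    --excludes FilterType=SIMPLE_PATTERN,Value="*/tmp/*"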

7. Run the Task & Monitor

  • Click Start task

  • Watch transfer logs, performance stats, and error messages in the DataSync console

  • Use CloudWatch Logs or the Monitoring tab to observe bandwidth, file count, duration, and error rates
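Starting and watching the task from the CLI looks roughly like this; the task ARN is a placeholder:

# Kick off the transfer and poll its status (placeholder task ARN)
EXEC_ARN=$(aws datasync start-task-execution \
    --task-arn arn:aws:datasync:us-east-1:111111111111:task/task-0123456789abcdef0 \
    --query 'TaskExecutionArn' --output text)

# Status moves through LAUNCHING, PREPARING, TRANSFERRING, VERIFYING, then SUCCESS or ERROR
aws datasync describe-task-execution \
    --task-execution-arn "$EXEC_ARN" \
    --query '{Status:Status,Files:FilesTransferred,Bytes:BytesTransferred}'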


💬 Final Thoughts

Moving data across clouds should be simple, but in reality it rarely is. After fighting with gsutil and permission errors, AWS DataSync turned out to be the tool that just worked.

I appreciate you making it to the end. Feel free to leave a comment or a like if this helped you out!
