GCP to AWS Migration – Part 1: Architecture, Data Transfer & Infrastructure Setup

Cyril Sebastian

🧭 Why We Migrated: Business Drivers Behind the Move

Our platform, serving millions of daily users, was running smoothly on GCP. However, evolving business goals, pricing considerations, and long-term cloud ecosystem alignment led us to migrate to AWS.

Key components of our GCP-based stack:

  • Web Tier: Next.js frontend + Django backend

  • Databases: MongoDB replica sets, MySQL clusters

  • Asynchronous Services: Redis, RabbitMQ

  • Search: Apache Solr for full-text search

  • Infrastructure: GCP Compute Engine VMs, managed instance groups, HTTPS Load Balancer

  • Storage: 21 TB of data in Google Cloud Storage (GCS)


πŸ“‹ Step 0: Creating a Migration Runbook

We treated this as a mission-critical project. Our runbook included:

  • Stakeholders: CTO, DevOps Lead, Database Architect, Application Owners

  • Timeline: 8 weeks from planning to cutover

  • Phases: Network Setup β†’ Data Migration β†’ Database Sync β†’ Application Migration β†’ Cutover

  • Rollback Plan: Prepared and rehearsed with timelines for failback


πŸ› οΈ Infrastructure Mapping: GCP vs AWS

Challenges in Mapping:

  • GCP allows custom CPU and RAM configurations, whereas AWS offers fixed instance types (t3, m6i, r6g, etc.)

  • IOPS differences between GCP SSD persistent disks and AWS EBS gp3 volumes required tuning

  • Cost models differ significantly (especially egress charges from GCP)

We used AWS Pricing Calculator and GCP Pricing Calculator to simulate monthly billing and select cost-optimized instance types.
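
To make the shape-to-type mapping repeatable across dozens of VM groups, the first pass can be scripted: pick the smallest AWS type that covers a GCP custom shape, then sanity-check the result against the pricing calculators. Below is a minimal sketch of that idea; the candidate list and its vCPU/RAM specs are an illustrative subset, not our actual mapping table.

```python
# Illustrative sketch: pick the smallest AWS instance type that covers a
# GCP custom machine shape. The candidates below are a hypothetical,
# smallest-first subset, not a complete or authoritative catalog.
AWS_TYPES = [
    # (name, vCPU, RAM GiB)
    ("t3.medium", 2, 4),
    ("t3.xlarge", 4, 16),
    ("m6i.2xlarge", 8, 32),
    ("m6i.4xlarge", 16, 64),
    ("r6g.4xlarge", 16, 128),
]

def map_gcp_shape(vcpu: int, ram_gib: float) -> str:
    """Return the smallest candidate AWS type covering the given shape."""
    for name, cpus, ram in AWS_TYPES:  # list is sorted smallest-first
        if cpus >= vcpu and ram >= ram_gib:
            return name
    raise ValueError(f"No candidate fits {vcpu} vCPU / {ram_gib} GiB")

# Example: a GCP custom shape with 6 vCPU / 24 GiB lands on m6i.2xlarge.
print(map_gcp_shape(6, 24))
```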


🌐 Phase 1: AWS Network Infrastructure Setup

AWS Network Infrastructure (ap-south-1)

                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                            β”‚    GCP / DC VPC      β”‚
                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                     β”‚
                           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                           β”‚ Site-to-Site VPN  β”‚
                           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                     β”‚
                           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                           β”‚      VPC          β”‚
                           β”‚  (ap-south-1)     β”‚
                           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                     β”‚
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚                          β”‚                             β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Public Subnet AZ1 β”‚     β”‚ Public Subnet AZ2   β”‚       β”‚ Public Subnet AZ3   β”‚
β”‚ - Bastion Host    β”‚     β”‚ - NAT Gateway       β”‚       β”‚ - Internet Gateway  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚                          β”‚                             β”‚
          β–Ό                          β–Ό                             β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Private Subnet 1β”‚        β”‚Private Subnet 2β”‚              β”‚Private Subnet 3β”‚
β”‚App / DB Tier   β”‚        β”‚App / DB Tier   β”‚              β”‚App / DB Tier   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

                 Security Groups + NACLs as per GCP mapping
                 VPC Flow Logs β†’ CloudWatch Logs

🧩 Components Breakdown

Component         Purpose
3 AZs             High availability and fault tolerance
Public Subnets    Bastion, NAT, IGW for ingress/egress
Private Subnets   Isolated app and DB tiers
VPN               Secure hybrid GCP–AWS connectivity
Security          Security Groups + NACLs derived from GCP firewall rules
Monitoring        VPC Flow Logs + CloudWatch Metrics for visibility
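
As an example of the monitoring row above, here is a minimal boto3 sketch for enabling VPC Flow Logs delivered to CloudWatch Logs. The VPC ID, log group name, and IAM role ARN are placeholders, not our real identifiers.

```python
import boto3

# Sketch: enable VPC Flow Logs delivered to CloudWatch Logs for the new VPC.
ec2 = boto3.client("ec2", region_name="ap-south-1")

ec2.create_flow_logs(
    ResourceType="VPC",
    ResourceIds=["vpc-0123456789abcdef0"],   # hypothetical VPC ID
    TrafficType="ALL",                       # capture accepted + rejected traffic
    LogDestinationType="cloud-watch-logs",
    LogGroupName="/vpc/flow-logs",           # assumed log group name
    DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/vpc-flow-logs",
)
```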

πŸ“¦ Phase 2: Data Migration (GCS β†’ S3)

We migrated over 21 TB of user-generated and application asset data from Google Cloud Storage (GCS) to Amazon S3. Given the scale, this phase required surgical precision in planning, execution, and cost control.

Tools & Techniques Used

  • AWS DataSync:
    Chosen for its efficiency, security, and ability to handle large-scale object transfers.

  • Service Account HMAC Credentials:
    Used for secure bucket-to-bucket authentication between GCP and AWS.

  • Phased Sync Strategy:

    • Initial Full Sync β€” Began 3 weeks before cutover

    • Delta Syncs β€” Repeated every 2–3 days

    • Final Cutover Sync β€” During the 6-hour cutover window

Note: We carefully validated checksums and object counts after each sync phase to ensure data integrity and avoid overwriting unchanged files.
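
For the object-count part of that validation, a lightweight script comparing both sides works well. Here is a sketch, assuming the google-cloud-storage and boto3 SDKs and placeholder bucket names; our real check also compared per-object checksums.

```python
import boto3
from google.cloud import storage  # pip install google-cloud-storage

# Sketch of the post-sync sanity check: compare object counts and total
# bytes between the source GCS bucket and the target S3 bucket.

def gcs_inventory(bucket_name: str):
    client = storage.Client()
    count = size = 0
    for blob in client.list_blobs(bucket_name):
        count += 1
        size += blob.size or 0
    return count, size

def s3_inventory(bucket_name: str):
    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    count = size = 0
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            count += 1
            size += obj["Size"]
    return count, size

src = gcs_inventory("example-assets-gcs")   # placeholder bucket names
dst = s3_inventory("example-assets-s3")
assert src == dst, f"Mismatch after sync: GCS={src} S3={dst}"
```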


πŸ’‘ Smart Optimization Decisions

  • Selective Data Migration:

    • Identified several GCS buckets containing temporary compliance logs with auto-expiry policies.

    • Instead of migrating them and incurring egress charges, we chose to let them expire in GCP.

    • This alone saved several thousand dollars in unnecessary transfer costs.

  • Delta Awareness:

    • Designed the sync jobs to be delta-aware to prevent redundant data movement.

    • Ensured that only modified/new objects were transferred during delta and final syncs.
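
Putting the pieces together, a DataSync job along these lines captures both the HMAC-based authentication and the delta-aware transfer mode. This is a hedged sketch of the approach, not our exact job definition; every ARN, hostname, key, and bucket name below is a placeholder.

```python
import boto3

datasync = boto3.client("datasync", region_name="ap-south-1")

# Register GCS as a generic object-storage location, authenticated with the
# service account's HMAC credentials (all values are placeholders).
src = datasync.create_location_object_storage(
    ServerHostname="storage.googleapis.com",
    BucketName="example-assets-gcs",
    AccessKey="GOOG1E...",            # HMAC access key (placeholder)
    SecretKey="...",                  # HMAC secret (placeholder)
    AgentArns=["arn:aws:datasync:ap-south-1:123456789012:agent/agent-0abc"],
)["LocationArn"]

dst = datasync.create_location_s3(
    S3BucketArn="arn:aws:s3:::example-assets-s3",
    S3Config={"BucketAccessRoleArn": "arn:aws:iam::123456789012:role/datasync-s3"},
)["LocationArn"]

task = datasync.create_task(
    SourceLocationArn=src,
    DestinationLocationArn=dst,
    Name="gcs-to-s3-delta",
    Options={
        "TransferMode": "CHANGED",    # move only new/modified objects
        "VerifyMode": "ONLY_FILES_TRANSFERRED",
    },
)

# Re-run one execution per delta sync window, and once more at cutover.
datasync.start_task_execution(TaskArn=task["TaskArn"])
```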

Post-Migration S3 Tuning

After the bulk migration was completed, we fine-tuned our S3 environment for cost optimization, data hygiene, and long-term sustainability.

  • Lifecycle Policies Implemented:

    • Automatic archival of infrequently accessed data to S3 Glacier.

    • Expiry rules for:

      • Temporary staging files.

      • Orphaned or abandoned data older than a defined threshold.

  • Configured S3 Incomplete Multipart Upload Aborts:

    • Any incomplete uploads are now automatically aborted after 7 days, preventing unnecessary storage billing from partial uploads caused by network or user errors.
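
A boto3 sketch of the lifecycle rules described above; the bucket name, prefixes, and day thresholds are illustrative, not our production values.

```python
import boto3

s3 = boto3.client("s3")

# Sketch: Glacier archival for cold data, expiry for staging objects, and a
# 7-day abort for incomplete multipart uploads (values are placeholders).
s3.put_bucket_lifecycle_configuration(
    Bucket="example-assets-s3",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-cold-data",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            },
            {
                "ID": "expire-staging",
                "Status": "Enabled",
                "Filter": {"Prefix": "staging/"},
                "Expiration": {"Days": 30},
            },
            {
                "ID": "abort-incomplete-mpu",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
        ]
    },
)
```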

Lessons Learned

  • Data Volume ≠ Data Complexity:
    Even with the right tools, coordinating syncs across staging, pre-prod, and production environments required careful orchestration and monitoring.

  • Egress and DTO Costs:
    Data Transfer Out from GCP was a hidden but substantial cost centerβ€”plan ahead for this when budgeting.

  • S3 Behavior Is Not GCS:
    We had to adjust application logic and IAM policies post-migration to align with S3's object handling, access policies, and permissions model.


πŸ—„οΈ Phase 3: Database Migration

MongoDB Migration

Migrating MongoDB from GCP to AWS was one of the most sensitive components of the move due to its role in powering real-time operations and user sessions.

Our Strategy:

  • Replica Set Initialization: Set up MongoDB replica sets on AWS EC2 instances to mirror the topology running in GCP.

  • Oplog-Based Sync: Enabled oplog-based replication between AWS and GCP MongoDB nodes to ensure near real-time data synchronization without full data dumps.

  • Hybrid Node Integration: Deployed a MongoDB node in AWS, directly connected to the GCP replica set, acting as a bridge before full cutover.

  • iptables for Controlled Access: Used iptables rules to restrict write access during the sync period. This allowed inter-DB synchronization traffic only, blocking application-level writes and ensuring data consistency before switchover.

  • Failover Testing: Conducted multiple failover and promotion drills to validate readiness, with rollback plans in place.

Key Takeaway: Setting up a hybrid node and controlling access at the OS level allowed us to minimize data drift and test production-grade failovers without service disruption.
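
To quantify data drift during the sync window, we watched replication lag on every node. Here is a minimal pymongo sketch of that kind of check, with a placeholder connection string for the hybrid replica set.

```python
from pymongo import MongoClient  # pip install pymongo

# Sketch: estimate each secondary's replication lag by comparing its optime
# against the primary's. The connection string is a placeholder.
client = MongoClient("mongodb://gcp-node1,gcp-node2,aws-node1/?replicaSet=rs0")

status = client.admin.command("replSetGetStatus")
primary_optime = next(
    m["optimeDate"] for m in status["members"] if m["stateStr"] == "PRIMARY"
)
for m in status["members"]:
    if m["stateStr"] == "SECONDARY":
        lag = (primary_optime - m["optimeDate"]).total_seconds()
        print(f"{m['name']}: {lag:.0f}s behind primary")
```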

MySQL Migration

The MySQL component required careful orchestration to ensure transactional consistency and minimal downtime.

Our Approach:

  • Master-Slave Topology: Established a classic master-slave setup on AWS EC2 instances to replicate data from the GCP-hosted MySQL master.

  • Replication Lag Challenges: One of the major blockers encountered was replication lag during promotion drills, especially under active write-heavy workloads.

  • Controlled Write Freeze: We implemented iptables-based rules at the OS level to block application write traffic, allowing replication to catch up safely before cutover.

  • Promotion Strategy:

    • Executed a time-based cutover window.

    • Promoted the AWS slave node to master using a custom validation script to check replication offsets and ensure data integrity.

    • All secondary nodes were reconfigured to follow the new AWS master, ensuring consistency across the cluster.

Key Takeaway: Blocking writes via iptables provided a clean buffer for promotion without the risk of in-flight transactions, making the cutover smooth and predictable.
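
Below is a simplified sketch of the kind of pre-promotion check our validation script performed, using pymysql with placeholder host and credentials; the real script also compared binlog file/position between the GCP master and the AWS replica.

```python
import pymysql  # pip install pymysql

# Sketch: confirm the AWS replica has fully caught up before promotion.
conn = pymysql.connect(
    host="aws-mysql-replica",            # placeholder host
    user="repl_check",                   # placeholder credentials
    password="...",
    cursorclass=pymysql.cursors.DictCursor,
)

with conn.cursor() as cur:
    cur.execute("SHOW SLAVE STATUS")
    row = cur.fetchone()
    lag = row["Seconds_Behind_Master"]
    io_ok = row["Slave_IO_Running"] == "Yes"
    sql_ok = row["Slave_SQL_Running"] == "Yes"
    # Only promote once writes are frozen and the replica reports zero lag.
    assert io_ok and sql_ok and lag == 0, f"Not safe to promote: lag={lag}"
    print("Replica caught up; safe to freeze writes and promote.")
```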


End of Part 1: Setting the Stage for Migration

You’ve seen how we architected an AWS environment from scratch, replicated critical systems like MongoDB and MySQL, and seamlessly migrated over 21 TB of assets from GCP to S3β€”all while optimizing for cost, security, and scalability.

But this was just the calm before the storm.

"Give me six hours to chop down a tree and I will spend the first four sharpening the axe."
β€” Abraham Lincoln

We were well-prepared. But would the systemsβ€”and the teamβ€”hold up during live cutover?

In Part 2: The Real Cutover & Beyond, we’ll step into the fire:

  • What went wrong,

  • What we had to patch live,

  • And what we did to walk away from it stronger.

πŸ‘‰ Don't miss it. Follow me on LinkedIn for more deep-dive case studies and real-world DevOps/CloudOps stories like this.
