Terraform Associate: Query Data Sources

Chintan BogharaChintan Boghara
5 min read

Terraform is renowned for its ability to manage infrastructure as code, not only by creating resources but also by integrating with existing infrastructures. One of the most powerful features of Terraform is data sources. They allow you to query and retrieve information about existing resources from various providers, making your configurations more dynamic and adaptable. In this guide, we’ll delve into what data sources are, explore several practical examples, and share best practices for using them effectively.

What is a Data Source?

In Terraform, a data source is used to fetch existing information from infrastructure providers (such as AWS, Azure, or Google Cloud Platform) or external services. Unlike resource blocks that create or manage infrastructure, data sources simply retrieve and reference information. This makes them invaluable for scenarios such as:

  • Integrating with Pre-existing Infrastructure: Access details of resources not created by Terraform.

  • Dynamic Configuration: Retrieve dynamic values like the latest AMI or existing VPCs to use them in your deployments.

  • Centralized Secrets Management: Query sensitive information like passwords from services such as AWS Secrets Manager without hardcoding them.

Real-World Examples of Data Sources

1. Querying an Existing AWS VPC

Imagine you have an existing VPC tagged with Name=my-vpc and want to reference its ID in your Terraform configuration. Here’s how you can do it:

data "aws_vpc" "existing_vpc" {
  filter {
    name   = "tag:Name"
    values = ["my-vpc"]
  }
}

output "vpc_id" {
  value = data.aws_vpc.existing_vpc.id
}

In this example, the data source filters AWS VPCs by a tag and outputs the corresponding VPC ID. This approach avoids duplication and helps maintain a single source of truth for resource identifiers.

2. Retrieving the Latest Amazon Linux AMI

Keeping your infrastructure updated with the latest Amazon Machine Images (AMIs) is a common requirement. Use a data source to dynamically fetch the most recent Amazon Linux 2 AMI:

data "aws_ami" "latest_amazon_linux" {
  most_recent = true

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["amazon"]
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t2.micro"
}

output "ami_id" {
  value = data.aws_ami.latest_amazon_linux.id
}

This configuration ensures that your EC2 instance always launches with the latest AMI, improving security and performance without manual intervention.

3. Accessing an Existing Security Group

Sometimes, you may need to reference an already established security group. For example, to attach a specific security group to an EC2 instance:

data "aws_security_group" "web_sg" {
  filter {
    name   = "group-name"
    values = ["web-server-sg"]
  }
}

resource "aws_instance" "example" {
  ami                    = "ami-12345678"
  instance_type          = "t2.micro"
  vpc_security_group_ids = [data.aws_security_group.web_sg.id]
}

Here, Terraform retrieves the security group with the name web-server-sg and uses its ID in the EC2 instance configuration.

4. Querying AWS Secrets Manager

Managing sensitive information securely is critical. Data sources can retrieve secrets from AWS Secrets Manager without exposing them in your code:

data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "my-db-password"
}

variable "db_password" {
  type      = string
  sensitive = true
  default   = data.aws_secretsmanager_secret_version.db_password.secret_string
}

output "db_password" {
  value     = var.db_password
  sensitive = true
}

This example retrieves a database password securely and marks it as sensitive, ensuring it isn’t inadvertently exposed.

5. Data Sources Across Different Providers

Terraform’s versatility extends beyond AWS. Here are examples for Azure and Google Cloud:

Azure: Get an Existing Resource Group

data "azurerm_resource_group" "existing_rg" {
  name = "my-resource-group"
}

output "resource_group_location" {
  value = data.azurerm_resource_group.existing_rg.location
}

Google Cloud: Get an Existing GCS Bucket

data "google_storage_bucket" "existing_bucket" {
  name = "my-bucket"
}

output "bucket_location" {
  value = data.google_storage_bucket.existing_bucket.location
}

Both examples demonstrate how to query existing infrastructure elements, enabling you to integrate them seamlessly into your Terraform configurations.

Best Practices for Using Data Sources

  1. Avoid Duplication:
    Use data sources to reference existing resources rather than recreating them in your configuration.

  2. Minimize API Calls:
    Cache frequently used values in variables to reduce the number of API calls, improving performance and reducing potential rate limits.

  3. Combine with Outputs:
    Use outputs to display data source results for debugging and transparency in automation pipelines.

  4. Dynamic Values:
    Leverage data sources for dynamic and ever-changing resources like AMIs, VPCs, and secrets. This ensures that your infrastructure always adapts to the latest configurations and standards.

Conclusion

Data sources in Terraform provide a powerful mechanism to integrate and interact with your existing infrastructure. By querying external resources—be it an AWS VPC, the latest Amazon Linux AMI, a security group, or even secrets stored in AWS Secrets Manager—you can build more dynamic, maintainable, and secure Terraform configurations. Applying best practices further enhances the reliability and efficiency of your infrastructure as code approach.

Embrace the power of data sources to streamline your Terraform workflows and maintain a cohesive infrastructure environment across multiple cloud providers.

Happy coding and automating!

Reference

  1. Terraform Data Sources Documentation
    https://www.terraform.io/language/data-sources
    Detailed official documentation on how data sources work in Terraform.

  2. Terraform AWS VPC Data Source
    https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/vpc
    Explains querying existing AWS VPCs and integrating them into your Terraform configurations.

  3. Terraform AWS AMI Data Source
    https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ami
    Provides guidance on retrieving the latest Amazon Linux AMI dynamically.

  4. Terraform AzureRM Resource Group Data Source
    https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/data-sources/resource_group
    Covers how to query and use existing Azure resource groups in your Terraform projects.

  5. Terraform Google Cloud Storage Bucket Data Source
    https://registry.terraform.io/providers/hashicorp/google/latest/docs/data-sources/storage_bucket
    Demonstrates how to access existing Google Cloud Storage buckets using Terraform data sources.

10
Subscribe to my newsletter

Read articles from Chintan Boghara directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Chintan Boghara
Chintan Boghara

Exploring DevOps ♾️, Cloud Computing ☁️, DevSecOps 🔒, Site Reliability Engineering ⚙️, Platform Engineering 🛠️, Machine Learning Operations 🤖, and AIOps 🧠