Mastering AWS Security - Post 5: Amazon Macie – Classify and Protect Sensitive Data

1. Introduction

In today’s cloud-first world, data is your crown jewel and, if not protected properly, your greatest liability. From personally identifiable information (PII) to intellectual property, the data you store in AWS must be secured against leaks, breaches, and compliance failures. Enter Amazon Macie.

Amazon Macie is a fully managed data security and data privacy service that uses machine learning (ML) and pattern matching to discover and protect your sensitive data in AWS. It’s purpose-built for identifying sensitive data at scale, especially in Amazon S3, and integrates seamlessly with other AWS services for alerting and remediation.

Whether you’re just getting started in cloud security or preparing for the AWS Security Specialty certification, this blog will walk you through how Macie works, its powerful capabilities, real-world use cases, and how to get the most out of it.


2. How Amazon Macie Works

At its core, Macie continuously scans Amazon S3 buckets to identify and classify sensitive data. It uses pre-trained ML models and pattern matching to detect:

  • PII: Names, addresses, phone numbers, national IDs

  • Financial data: Credit card numbers, bank account details

  • Credentials: Access keys, secrets

  • Custom data patterns you define

Supported Sources: Currently, Macie only supports scanning Amazon S3. It doesn't work with EBS, RDS, DynamoDB, or other AWS data stores.

Process Overview:

  1. Macie evaluates your S3 inventory for security risks (e.g., unencrypted or publicly accessible buckets).

  2. You define discovery jobs to scan buckets for sensitive data.

  3. Macie classifies the data and generates findings.

  4. Findings can be forwarded to AWS Security Hub or Amazon EventBridge, or processed with Lambda.


3. Key Features and Capabilities

S3 Bucket Inventory and Risk Analysis

Macie gives a high-level view of all your S3 buckets, highlighting those with potential risks:

  • Public access

  • Unencrypted data

  • Access shared with other AWS accounts or granted through permissive bucket policies and ACLs

This is your first checkpoint to understand where to focus.

Sensitive Data Discovery Jobs

Discovery jobs are how Macie scans data:

  • One-time: Great for audits or initial scans.

  • Recurring: For continuous monitoring.

You can scope jobs by:

  • Bucket names

  • Object prefixes (like folders)

  • Object age (e.g., only files created in the last 90 days)

  • Tags (e.g., tag sensitive workloads with data:sensitive=true)
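The scoping options above map onto the `scoping` section of a job's S3 definition. The following is a minimal sketch, assuming the field names of the Macie `CreateClassificationJob` API (`simpleScopeTerm`, `tagScopeTerm`, `OBJECT_KEY`, `OBJECT_LAST_MODIFIED_DATE`); the helper name `build_scoping` and the sample prefix and tag are made up for illustration, so check the structure against the API reference before relying on it.

```python
from datetime import datetime, timedelta, timezone


def build_scoping(prefix, max_age_days, tag_key, tag_value, now=None):
    """Build the `scoping` part of an s3JobDefinition as a plain dict."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=max_age_days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "includes": {
            "and": [
                # Only objects whose key starts with the given prefix.
                {"simpleScopeTerm": {
                    "key": "OBJECT_KEY",
                    "comparator": "STARTS_WITH",
                    "values": [prefix],
                }},
                # Only objects last modified on or after the cutoff.
                {"simpleScopeTerm": {
                    "key": "OBJECT_LAST_MODIFIED_DATE",
                    "comparator": "GTE",
                    "values": [cutoff],
                }},
                # Only objects that carry the given tag.
                {"tagScopeTerm": {
                    "key": "TAG",
                    "comparator": "EQ",
                    "tagValues": [{"key": tag_key, "value": tag_value}],
                    "target": "S3_OBJECT",
                }},
            ]
        }
    }
```

The returned dict would sit next to `bucketDefinitions` inside the job's `s3JobDefinition`. Note that the age filter here is based on the last-modified timestamp, which is the property Macie exposes for this kind of condition.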

Custom and Managed Data Identifiers

Macie detects sensitive data using Managed Data Identifiers (MDIs) and Custom Data Identifiers (CDIs), and each discovery job specifies which of them it applies.

Enabling Macie does not by itself create any discovery jobs. To run a targeted scan, you create a classification job and choose the MDIs and CDIs it should use.

Managed Data Identifiers (MDI)

Managed Data Identifiers (MDIs) are pre-built detection rules provided by AWS. These identifiers use a combination of machine learning, context-based logic, and pattern recognition to find common types of sensitive data like:

  • Email addresses

  • Credit card numbers

  • Social Security numbers (SSNs)

  • Passport numbers

  • AWS credentials

  • IP addresses and MAC addresses

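Important: Choose the MDIs for each job deliberately. The job's managed data identifier selector (ALL, INCLUDE, EXCLUDE, NONE, or RECOMMENDED) controls which ones run; if you omit it, Macie applies a default set rather than a selection of your own.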

To include all MDIs in a job using the AWS CLI:

aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name "FullScanJob" \
  --s3-job-definition '{"bucketDefinitions":[{"accountId":"123456789012","buckets":["my-bucket"]}]}' \
  --managed-data-identifier-selector ALL

Or, to specify a subset, switch the selector to INCLUDE and list the identifier IDs (run aws macie2 list-managed-data-identifiers to see the valid IDs):

  --managed-data-identifier-selector INCLUDE \
  --managed-data-identifier-ids CREDIT_CARD_NUMBER USA_SOCIAL_SECURITY_NUMBER

These identifiers are backed by machine learning and contextual analysis to reduce false positives. They're regularly updated by AWS to reflect real-world data formats and are ideal for:

  • Compliance-driven scans (PCI-DSS, HIPAA, GDPR)

  • Broad coverage of universally sensitive data

  • Quick deployments when you need fast insights

You can select which managed identifiers to include or exclude in a job, giving you control over scan scope and cost.
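For teams that create jobs from code rather than the CLI, the same selection can be expressed as a request dict. This is a sketch under the assumption that the request follows the Macie `CreateClassificationJob` API field names; `build_job_request` is a hypothetical helper, and the bucket and account values are placeholders.

```python
import uuid


def build_job_request(name, account_id, buckets, selector="ALL", identifier_ids=None):
    """Return keyword arguments for Macie's CreateClassificationJob API."""
    if selector in ("INCLUDE", "EXCLUDE") and not identifier_ids:
        raise ValueError("INCLUDE/EXCLUDE need at least one managed identifier ID")
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token for the request
        "jobType": "ONE_TIME",
        "name": name,
        "managedDataIdentifierSelector": selector,
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ]
        },
    }
    if identifier_ids:
        request["managedDataIdentifierIds"] = list(identifier_ids)
    return request
```

With boto3 installed and credentials configured, the result could be passed along as `boto3.client("macie2").create_classification_job(**request)`. Building the dict separately keeps the selection logic easy to review and test without touching AWS.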

Custom Data Identifiers (CDI)

While managed identifiers cover most common data types, there are cases when your organization deals with proprietary or industry-specific data. That’s where custom data identifiers come in.

Custom identifiers allow you to define specific patterns using:

  • Regular expressions (Regex): Match complex, structured data

  • Keywords: Additional context to improve match precision

  • Proximity rules: How close keywords must be to a regex match

Example: Employee ID Custom Identifier

Say your internal Employee ID format is EMP123456. You can create a custom identifier as follows:

{
  "name": "EmployeeID",
  "regex": "EMP[0-9]{6}",
  "keywords": ["employee", "staff"],
  "maximumMatchDistance": 50
}

Why Use Custom Identifiers?

  • Detect internal formats like customer account numbers, case IDs, or contract codes

  • Tighten precision for proprietary data detection

  • Avoid false positives in noisy datasets
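Before registering a custom identifier, it can help to try the regex and keywords against sample text locally. The sketch below is only a rough approximation of the idea (a regex match counts when a keyword appears within a fixed number of characters before it); Macie's real matching rules are more detailed, and the function name `find_employee_ids` is made up for this example.

```python
import re

EMPLOYEE_ID = re.compile(r"EMP[0-9]{6}")
KEYWORDS = ("employee", "staff")
MAX_DISTANCE = 50  # characters, mirroring the maximum match distance above


def find_employee_ids(text):
    """Return IDs that have a keyword within MAX_DISTANCE characters before them."""
    hits = []
    lowered = text.lower()  # keyword comparison is case-insensitive here
    for match in EMPLOYEE_ID.finditer(text):
        window_start = max(0, match.start() - MAX_DISTANCE)
        window = lowered[window_start:match.start()]
        if any(keyword in window for keyword in KEYWORDS):
            hits.append(match.group())
    return hits
```

For instance, `find_employee_ids("Employee record: EMP123456")` returns the ID, while the same ID in a sentence with no nearby keyword is ignored. That is the effect keywords and proximity are meant to have on false positives.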

The best practice is to combine both. Start with managed identifiers for wide coverage, and layer in custom identifiers to align Macie to your specific environment and risk profile.

Findings and Alerts

When Macie completes a discovery job and identifies sensitive data or risk indicators, it generates findings. These findings contain rich metadata including:

  • Data type found (e.g., credit card number, AWS key)

  • S3 object metadata (name, bucket, region, etc.)

  • Severity (low/medium/high)

  • Resource permissions (e.g., public access, cross-account access)

Macie keeps findings in its own console for 90 days. How they reach other AWS services depends on the service:

  • Amazon EventBridge: Enabled automatically

    • Macie publishes findings to EventBridge as events without extra setup.

    • You can build custom automation using EventBridge rules and targets (e.g., trigger a Lambda function).

  • AWS Security Hub: Depends on your publication settings

    • Security Hub must be enabled in each account and Region where you want to see Macie findings.

    • By default, Macie publishes policy findings to Security Hub; publishing sensitive data findings is a setting you turn on explicitly.

    • Once published, Macie findings appear in Security Hub alongside GuardDuty, Inspector, and more.

  • Amazon GuardDuty: Does not ingest Macie findings directly

    • There is no native direct integration.

    • However, both services can be correlated in Security Hub or via custom automation.

NOTE: Currently, Macie findings are not pushed to services such as AWS Config, CloudTrail, or Amazon Detective. Correlation with those services is indirect and typically goes through Security Hub.

So, if you need centralized insight and correlation, Security Hub is your best option, and EventBridge is your go-to for automating responses.

Be sure to enable these integrations explicitly where needed for full visibility and automated protection workflows.
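Which finding categories Macie publishes to Security Hub is controlled by its findings publication configuration. Here is a minimal sketch, assuming the `PutFindingsPublicationConfiguration` operation and its `securityHubConfiguration` fields; the two helper names are hypothetical, and `client` stands for a Macie client such as `boto3.client("macie2")`.

```python
def security_hub_publication(policy=True, sensitive_data=True):
    """Build the securityHubConfiguration for PutFindingsPublicationConfiguration."""
    return {
        "publishPolicyFindings": policy,
        "publishClassificationFindings": sensitive_data,
    }


def apply_publication(client, config):
    """Apply the configuration using a Macie client passed in by the caller."""
    return client.put_findings_publication_configuration(
        securityHubConfiguration=config
    )
```

The setting applies per account and Region, so in a multi-Region setup the call would be repeated with a client for each Region.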

Scalability and Multi-account Support

Macie integrates with AWS Organizations to manage multiple accounts.

  • Use a delegated admin account to manage Macie across org units.

  • Centralize findings and discovery job configurations.
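The delegated administrator setup can also be scripted. This is a sketch assuming the Macie operations `EnableOrganizationAdminAccount` and `UpdateOrganizationConfiguration`; the helper names are hypothetical, and each `client` argument stands for a Macie client (for example `boto3.client("macie2")`) created with credentials for the account named in the docstring.

```python
def designate_macie_admin(management_client, admin_account_id):
    """Run with credentials for the organization's management account."""
    return management_client.enable_organization_admin_account(
        adminAccountId=admin_account_id
    )


def auto_enable_new_members(admin_client):
    """Run with credentials for the delegated Macie administrator account."""
    return admin_client.update_organization_configuration(autoEnable=True)
```

Splitting the two calls mirrors the two roles involved: the management account delegates, and the delegated administrator then decides whether Macie is turned on automatically for accounts that join the organization.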


4. Integration with Broader AWS Security Stack

Macie + EventBridge + Lambda (Automated Remediation)

Step-by-step:

  1. Enable Macie and start a discovery job.

  2. Create a rule in Amazon EventBridge that matches Macie findings.

  3. Trigger a Lambda function that:

    • Notifies security via SNS

    • Quarantines the S3 object

    • Tags the file for review

Example AWS CLI Setup:

# Enable Macie
aws macie2 enable-macie --status ENABLED

# Create the EventBridge rule from the pattern file
aws events put-rule \
  --name "MacieSensitiveDataFound" \
  --event-pattern file://macie-event-pattern.json \
  --region us-east-1

# Add the Lambda function as the rule target
aws events put-targets \
  --rule "MacieSensitiveDataFound" \
  --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:<account-id>:function:MacieQuarantineLambda"

# Grant EventBridge permission to invoke the Lambda function
aws lambda add-permission \
  --function-name MacieQuarantineLambda \
  --statement-id EventBridgeInvoke \
  --action "lambda:InvokeFunction" \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:us-east-1:<account-id>:rule/MacieSensitiveDataFound

macie-event-pattern.json

{
  "source": ["aws.macie"],
  "detail-type": ["Macie Finding"]
}
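The pattern above matches every Macie finding. If the Lambda function should react only to some of them, the pattern can be narrowed on fields inside `detail`. The sketch below assumes the finding event carries `category` (`CLASSIFICATION` or `POLICY`) and `severity.description` (`Low`, `Medium`, `High`); the helper name `macie_event_pattern` is made up for this example.

```python
import json


def macie_event_pattern(categories=None, severities=None):
    """Build an EventBridge event pattern for Macie findings."""
    pattern = {"source": ["aws.macie"], "detail-type": ["Macie Finding"]}
    detail = {}
    if categories:
        # "CLASSIFICATION" = sensitive data findings, "POLICY" = bucket policy findings
        detail["category"] = list(categories)
    if severities:
        detail["severity"] = {"description": list(severities)}
    if detail:
        pattern["detail"] = detail
    return pattern


if __name__ == "__main__":
    # Print a pattern that matches only high-severity sensitive data findings.
    print(json.dumps(macie_event_pattern(["CLASSIFICATION"], ["High"]), indent=2))
```

The printed JSON could be saved as the pattern file in place of the broader one. Filtering in the rule rather than in the function means the Lambda is invoked less often.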

lambda_function.py

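import boto3

# Create the client once so warm invocations can reuse it
s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Extract bucket and object key from the Macie finding
    detail = event['detail']
    resources = detail.get('resourcesAffected', {})
    bucket = resources.get('s3Bucket', {}).get('name')
    key = resources.get('s3Object', {}).get('key')

    # Policy findings describe a bucket and carry no object key, so skip them
    if not bucket or not key:
        return {'status': 'skipped', 'reason': 'no S3 object in finding'}

    # Example action: add a quarantine tag to the object.
    # Note: put_object_tagging replaces the object's existing tag set.
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={
            'TagSet': [
                {
                    'Key': 'quarantine',
                    'Value': 'true'
                }
            ]
        }
    )
    return {'status': 'tagged', 'bucket': bucket, 'key': key}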

Macie + GuardDuty

Macie findings about credentials or sensitive data exposure can be correlated with GuardDuty to detect threats like:

  • Compromised access keys

  • Data exfiltration attempts

For example:

  • Macie detects unencrypted PII in a publicly exposed S3 bucket

  • GuardDuty simultaneously detects suspicious access to that same bucket from an unusual IP address

  • Security Hub aggregates both findings to help analysts prioritize response
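The correlation in this example boils down to grouping findings from both services by the bucket they refer to. Here is a minimal sketch of that idea, assuming each finding has already been reduced to a small record with hypothetical `id` and `bucket` fields (real Macie and GuardDuty findings are much larger and structured differently).

```python
from collections import defaultdict


def correlate_by_bucket(macie_findings, guardduty_findings):
    """Return buckets named by both services, with the finding IDs from each."""
    by_bucket = defaultdict(lambda: {"macie": [], "guardduty": []})
    for finding in macie_findings:
        by_bucket[finding["bucket"]]["macie"].append(finding["id"])
    for finding in guardduty_findings:
        by_bucket[finding["bucket"]]["guardduty"].append(finding["id"])
    # Keep only buckets that appear in findings from both services.
    return {
        bucket: ids
        for bucket, ids in by_bucket.items()
        if ids["macie"] and ids["guardduty"]
    }
```

A bucket that shows up in both lists is a stronger signal than either finding alone, which is why it makes a reasonable first filter when prioritizing response.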

Macie + Security Hub

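Macie findings appear in Security Hub as findings in the AWS Security Finding Format (ASFF), not as security standards, enabling:

  • Centralized visibility

  • Triage alongside findings from other services and Security Hub's own compliance checks

  • Cross-service automation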


5. Compliance and Governance Use Cases

Macie supports compliance work under:

  • GDPR: Right to access, data minimization, and breach reporting

  • HIPAA: PHI discovery and access control

  • PCI-DSS: Cardholder data detection

  • SOC 2: Data security and privacy controls

How it helps:

  • Keep a record of where sensitive data is stored

  • Alert on unencrypted or publicly exposed data

  • Integrate into audits and risk assessments


6. Cost Optimization and Management

Macie is priced by:

  • Number of S3 buckets evaluated for inventory and security monitoring

  • GB of data inspected for sensitive data

Cost Control Tips:

  • Filter jobs using object prefixes or age

  • Use object tags to target sensitive data only

  • Avoid scanning buckets with logs or non-sensitive data

Example CLI to scan only buckets tagged data=sensitive:

aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --s3-job-definition '{"bucketCriteria":{"includes":{"and":[{"tagCriterion":{"comparator":"EQ","tagValues":[{"key":"data","value":"sensitive"}]}}]}}}' \
  --name "TargetedSensitiveScan"

7. Real-World Use Cases and Scenarios

Preventing Data Leakage in a SaaS Company

A SaaS company stores tenant data in S3. A misconfigured bucket policy exposed it publicly. Macie:

  • Flagged the bucket as public

  • Discovered PII data (email, phone numbers)

  • Sent a finding to EventBridge

  • Triggered a Lambda that:

    • Locked down the bucket policy

    • Sent a Slack alert to SecOps

Financial Institution Detecting Secrets in Logs

Logs from various systems were stored in S3. Macie detected AWS Access Keys in raw logs:

  • Created an alert

  • Lambda quarantined the file

  • The exposed access key was deactivated and rotated

  • Finding pushed to Security Hub

Global Enterprise with GDPR Obligations

A company with EU customers needs to map all PII across S3:

  • Recurring Macie job scans tagged buckets monthly

  • Reports sent to DPO for compliance

  • Alerts for any new unencrypted or public data


8. Hands-On: How to Get Started with Macie

Step 1: Enable Macie

aws macie2 enable-macie --status ENABLED

Step 2: View Your S3 Inventory

aws macie2 describe-buckets

Step 3: Create a Discovery Job

aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --s3-job-definition '{"bucketDefinitions":[{"accountId":"123456789012","buckets":["my-bucket"]}]}' \
  --name "InitialSensitiveScan"

Step 4: Review Findings

aws macie2 list-findings
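The four steps above can also be sketched in Python. This version takes the Macie client as a parameter (for example `boto3.client("macie2")`) so it can be exercised with a stub; the function name `run_initial_scan` is hypothetical, and the calls assume the boto3 Macie2 operations `enable_macie`, `describe_buckets`, `create_classification_job`, `list_findings`, and `get_findings`.

```python
import uuid


def run_initial_scan(client, account_id, bucket_name):
    """Run the four getting-started steps against a Macie client."""
    # Step 1: enable Macie (this call may fail if Macie is already enabled).
    client.enable_macie(status="ENABLED")

    # Step 2: list the buckets Macie has inventoried so far.
    response = client.describe_buckets()
    buckets = [b["bucketName"] for b in response.get("buckets", [])]

    # Step 3: create a one-time discovery job for a single bucket.
    job = client.create_classification_job(
        clientToken=str(uuid.uuid4()),
        jobType="ONE_TIME",
        name="InitialSensitiveScan",
        s3JobDefinition={
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": [bucket_name]}
            ]
        },
    )

    # Step 4: list finding IDs, then fetch details for up to 50 of them.
    finding_ids = client.list_findings().get("findingIds", [])
    findings = []
    if finding_ids:
        findings = client.get_findings(findingIds=finding_ids[:50]).get("findings", [])

    return {"buckets": buckets, "jobId": job["jobId"], "findings": findings}
```

In a real account the inventory takes a while to populate after Macie is enabled, and findings only appear once the job has processed objects, so steps 2 and 4 would normally be repeated later rather than run back to back. Note that `list_findings` returns only finding IDs; `get_findings` is what returns the details.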

9. Best Practices and Common Pitfalls

  • Tag data at source: Helps target scanning jobs

  • Use custom identifiers wisely: Avoid overly broad regexes

  • Monitor costs: Don’t scan unnecessary buckets

  • Review false positives: Tune identifiers based on feedback

  • Limit access: Use IAM policies to restrict who can view Macie findings


10. Macie in AWS Security Specialty Certification

Macie falls under the Data Protection domain of the AWS Certified Security - Specialty exam.

Key topics:

  • How Macie identifies PII in S3

  • Integration with other services

  • Role in compliance strategy

  • Types of findings

Sample Scenario:
"You are notified that sensitive data may be publicly accessible. How can Macie help in this case?"

  • You should know: Bucket inventory + discovery job + EventBridge automation

11. Conclusion

Amazon Macie is not just a checkbox for compliance — it’s a powerful engine for discovering, classifying, and protecting sensitive data in AWS. For security teams, architects, and auditors alike, it provides essential visibility and control.

Getting started is simple, but using Macie effectively requires planning, scoping, and integration. With the right setup, Macie can be your automated watchdog, silently scanning and defending your data perimeter.

Stay secure. Stay smart.


Written by

Suman Thallapelly