Amazon Macie: Protecting Your Sensitive Data

1. Introduction

In today's cloud-first world, data is your crown jewel, and your greatest liability if it is not protected properly. From personally identifiable information (PII) to intellectual property, the data you store in AWS must be secured against leaks, breaches, and compliance failures. Enter Amazon Macie.

Amazon Macie is a fully managed data security and data privacy service that uses machine learning (ML) and pattern matching to discover and protect your sensitive data in AWS. It’s purpose-built for identifying sensitive data at scale, especially in Amazon S3, and integrates seamlessly with other AWS services for alerting and remediation.

Whether you’re just getting started in cloud security or preparing for the AWS Security Specialty certification, this blog will walk you through how Macie works, its powerful capabilities, real-world use cases, and how to get the most out of it.

2. How Amazon Macie Works

At its core, Macie continuously scans Amazon S3 buckets to identify and classify sensitive data. It uses pre-trained ML models and pattern matching to detect:

PII: Names, addresses, phone numbers, national IDs
Financial data: Credit card numbers, bank account details
Credentials: Access keys, secrets
Custom data patterns you define

Supported Sources: Currently, Macie only supports scanning Amazon S3. It doesn't work with EBS, RDS, DynamoDB, or other AWS data stores.

Process Overview:

Macie evaluates your S3 inventory for security risks (e.g., unencrypted or publicly accessible buckets).
You define discovery jobs to scan buckets for sensitive data.
Macie classifies the data and generates findings.
Findings can be forwarded to AWS Security Hub, EventBridge, or processed with Lambda.

3. Key Features and Capabilities

S3 Bucket Inventory and Risk Analysis

Macie gives a high-level view of all your S3 buckets, highlighting those with potential risks:

Public access
Unencrypted data
Access control policies

This is your first checkpoint to understand where to focus.

Sensitive Data Discovery Jobs

Discovery jobs are how Macie scans data:

One-time: Great for audits or initial scans.
Recurring: For continuous monitoring.

You can scope jobs by:

Bucket names
Object prefixes (like folders)
Object age (e.g., only objects last modified in the last 90 days)
Tags (e.g., tag sensitive workloads with data:sensitive=true)
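
As a rough illustration of the scoping options above, the sketch below builds the request for a job limited to one bucket, one prefix, and recently modified objects. It only constructs the arguments; the field names (scoping, simpleScopeTerm, OBJECT_KEY, OBJECT_LAST_MODIFIED_DATE, monthlySchedule) reflect my reading of the Macie API reference and should be checked against it. The function name and the scan-<bucket> job name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_job_request(account_id, bucket, prefix, max_age_days,
                      monthly_day=None, now=None):
    """Build keyword arguments for macie2 create_classification_job.

    Assumption: field names follow the Macie API reference as I understand it.
    """
    now = now or datetime.now(timezone.utc)
    # Only objects modified on or after this timestamp are in scope
    cutoff = (now - timedelta(days=max_age_days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    request = {
        "name": f"scan-{bucket}",  # hypothetical naming scheme
        "jobType": "SCHEDULED" if monthly_day else "ONE_TIME",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": [bucket]}
            ],
            "scoping": {
                "includes": {
                    "and": [
                        {"simpleScopeTerm": {
                            "comparator": "STARTS_WITH",
                            "key": "OBJECT_KEY",
                            "values": [prefix],
                        }},
                        {"simpleScopeTerm": {
                            "comparator": "GTE",
                            "key": "OBJECT_LAST_MODIFIED_DATE",
                            "values": [cutoff],
                        }},
                    ]
                }
            },
        },
    }
    if monthly_day:
        # Recurring jobs need a schedule; here, one run per month
        request["scheduleFrequency"] = {
            "monthlySchedule": {"dayOfMonth": monthly_day}
        }
    return request


if __name__ == "__main__":
    print(build_job_request("123456789012", "my-bucket", "reports/", 90))
```

With boto3 installed and credentials configured, the result could be passed as boto3.client("macie2").create_classification_job(**request). Keeping the builder separate from the API call makes the scope easy to review before any data is scanned.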

Custom and Managed Data Identifiers

Macie decides what to look for based on data identifiers: Managed Data Identifiers (MDIs) and Custom Data Identifiers (CDIs).

Enabling Macie does not by itself run a scan with identifiers of your choosing. When you create a classification job, you select which MDIs and CDIs it uses; if you leave the managed identifier selection unspecified, the job falls back to a default set chosen by Macie.

Managed Data Identifiers (MDI)

Managed Data Identifiers (MDIs) are pre-built detection rules provided by AWS. These identifiers use a combination of machine learning, context-based logic, and pattern recognition to find common types of sensitive data like:

Email addresses
Credit card numbers
Social Security numbers (SSNs)
Passport numbers
AWS credentials
IP addresses and MAC addresses

Important: Choose MDIs deliberately during classification job creation. If you do not specify a selection, the job uses Macie's default set rather than one tailored to your data.

To include all MDIs in a job using the AWS CLI:

aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name "FullScanJob" \
  --s3-job-definition 'bucketDefinitions=[{accountId="123456789012",buckets=["my-bucket"]}]' \
  --managed-data-identifier-selector ALL

Or, to specify a subset:

--managed-data-identifier-selector INCLUDE \
--managed-data-identifier-ids CREDIT_CARD_NUMBER USA_SOCIAL_SECURITY_NUMBER

These identifiers are backed by machine learning and contextual analysis to reduce false positives. They're regularly updated by AWS to reflect real-world data formats and are ideal for:

Compliance-driven scans (PCI-DSS, HIPAA, GDPR)
Broad coverage of universally sensitive data
Quick deployments when you need fast insights

You can select which managed identifiers to include or exclude in a job, giving you control over scan scope and cost.
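
To make the include/exclude choice concrete, here is a minimal sketch of a helper that assembles the managed identifier arguments for a job request and rejects inconsistent combinations. The selector values (ALL, EXCLUDE, INCLUDE, NONE, RECOMMENDED) are what I understand the API to accept; the helper name and its validation rules are my own assumptions, not part of any SDK.

```python
# Selector values as I understand them from the Macie API reference (assumption)
VALID_SELECTORS = {"ALL", "EXCLUDE", "INCLUDE", "NONE", "RECOMMENDED"}


def managed_identifier_args(selector, identifier_ids=()):
    """Return the managed identifier part of a create_classification_job request."""
    if selector not in VALID_SELECTORS:
        raise ValueError(f"unknown selector: {selector}")
    needs_ids = selector in {"INCLUDE", "EXCLUDE"}
    # INCLUDE and EXCLUDE only make sense with a list of IDs; the others without
    if needs_ids != bool(identifier_ids):
        raise ValueError(f"selector {selector} and identifier IDs do not match")
    args = {"managedDataIdentifierSelector": selector}
    if needs_ids:
        args["managedDataIdentifierIds"] = list(identifier_ids)
    return args


if __name__ == "__main__":
    print(managed_identifier_args("INCLUDE", ["CREDIT_CARD_NUMBER"]))
```

The returned dictionary can be merged into the other job arguments before calling the API, which keeps the identifier choice explicit in code review.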

Custom Data Identifiers (CDI)

While managed identifiers cover most common data types, there are cases when your organization deals with proprietary or industry-specific data. That’s where custom data identifiers come in.

Custom identifiers allow you to define specific patterns using:

Regular expressions (Regex): Match complex, structured data
Keywords: Additional context to improve match precision
Proximity rules: How close keywords must be to a regex match

Example: Employee ID Custom Identifier

Say your internal Employee ID format is EMP123456. You can create a custom identifier as follows:

{
  "name": "EmployeeID",
  "regex": "EMP[0-9]{6}",
  "keywords": ["employee", "staff"],
  "maximumMatchDistance": 50
}
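
Before creating an identifier like this, it can help to try the regex and keyword proximity idea on sample text locally. The sketch below is only a rough approximation written for illustration; it is not Macie's matching engine, and the function name and the way distance is measured (from the end of a keyword to the end of the match) are assumptions.

```python
import re


def find_custom_matches(text, pattern, keywords, max_distance):
    """Return regex matches that have a keyword ending at most max_distance
    characters before the end of the match (local approximation only)."""
    lowered = text.lower()
    # Positions where any keyword ends, compared case-insensitively
    keyword_ends = [
        m.end()
        for kw in keywords
        for m in re.finditer(re.escape(kw.lower()), lowered)
    ]
    hits = []
    for match in re.finditer(pattern, text):
        if any(0 <= match.end() - end <= max_distance for end in keyword_ends):
            hits.append(match.group(0))
    return hits


if __name__ == "__main__":
    sample = "New employee onboarding: EMP123456"
    print(find_custom_matches(sample, r"EMP[0-9]{6}", ["employee", "staff"], 50))
```

Running a few positive and negative samples this way is a cheap check that the regex is not overly broad before it is used in a paid scan.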

Why Use Custom Identifiers?

Detect internal formats like customer account numbers, case IDs, or contract codes
Tighten precision for proprietary data detection
Avoid false positives in noisy datasets

The best practice is to combine both. Start with managed identifiers for wide coverage, and layer in custom identifiers to align Macie to your specific environment and risk profile.

Findings and Alerts

When Macie completes a discovery job and identifies sensitive data or risk indicators, it generates findings. These findings contain rich metadata including:

Data type found (e.g., credit card number, AWS key)
S3 object metadata (name, bucket, region, etc.)
Severity (low/medium/high)
Resource permissions (e.g., public access, cross-account access)

Macie shows all findings in its own console. Forwarding them to other AWS services works differently per service, and some integrations need explicit configuration:

Amazon EventBridge: Auto-enabled
- Macie automatically sends all findings to EventBridge without extra setup.
- You can build custom automation using EventBridge rules and targets (e.g., trigger a Lambda).
AWS Security Hub: Requires manual enablement
- You must explicitly enable integration between Macie and Security Hub in each account/region.
- Once enabled, Macie findings appear in Security Hub alongside GuardDuty, Inspector, and more.
Amazon GuardDuty: Does not ingest Macie findings directly
- There is no native direct integration.
- However, both services can be correlated in Security Hub or via custom automation.

NOTE: Currently, Macie findings are not pushed to services such as AWS Config, CloudTrail, or Amazon Detective (only indirect correlation is possible, through Security Hub).

So, if you need centralized insight and correlation, Security Hub is your best option, and EventBridge is your go-to for automating responses.

Be sure to enable these integrations explicitly where needed for full visibility and automated protection workflows.

Scalability and Multi-account Support

Macie integrates with AWS Organizations to manage multiple accounts.

Use a delegated admin account to manage Macie across org units.
Centralize findings and discovery job configurations.

4. Integration with Broader AWS Security Stack

Macie + EventBridge + Lambda (Automated Remediation)

Step-by-step:

Enable Macie and start a discovery job.
Create a rule in Amazon EventBridge to catch Macie findings.
Trigger a Lambda function that:

Notifies security via SNS
Quarantines the S3 object
Tags the file for review

Example AWS CLI Setup :

# Enable Macie
aws macie2 enable-macie --status ENABLED

# Create the EventBridge rule
aws events put-rule \
  --name "MacieSensitiveDataFound" \
  --event-pattern file://macie-event-pattern.json \
  --region us-east-1

# Add the Lambda function as the rule target
aws events put-targets \
  --rule "MacieSensitiveDataFound" \
  --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:<account-id>:function:MacieQuarantineLambda"

# Grant EventBridge permission to invoke the Lambda function
aws lambda add-permission \
  --function-name MacieQuarantineLambda \
  --statement-id EventBridgeInvoke \
  --action "lambda:InvokeFunction" \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:us-east-1:<account-id>:rule/MacieSensitiveDataFound

macie-event-pattern.json

{
  "source": ["aws.macie"],
  "detail-type": ["Macie Finding"]
}
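
The pattern above matches every Macie finding. If the Lambda should only react to a subset, the pattern can be narrowed on fields inside detail. The sketch below builds such a pattern for sensitive data findings of chosen severities and includes a deliberately simplified local matcher for trying it on sample events. The matcher handles exact values only and is not EventBridge's real matching logic; both function names are hypothetical.

```python
import json


def build_pattern(severities, category="CLASSIFICATION"):
    """Build an EventBridge pattern for Macie findings of given severities."""
    return {
        "source": ["aws.macie"],
        "detail-type": ["Macie Finding"],
        "detail": {
            "category": [category],
            "severity": {"description": list(severities)},
        },
    }


def matches(pattern, event):
    """Simplified local check: every pattern field must equal one listed value."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    # Write this output to macie-event-pattern.json to use it with put-rule
    print(json.dumps(build_pattern(["High"]), indent=2))
```

Filtering in the rule rather than in the function avoids invoking Lambda for findings that would be ignored anyway.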

lambda_function.py

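import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Extract bucket and object key from the Macie finding
    resources = event['detail']['resourcesAffected']
    bucket = resources['s3Bucket']['name']

    # Policy findings describe a bucket only, so the object may be absent
    key = resources.get('s3Object', {}).get('key')
    if key is None:
        return {'status': 'skipped', 'bucket': bucket}

    # put_object_tagging replaces the whole tag set, so keep existing tags
    existing = s3.get_object_tagging(Bucket=bucket, Key=key)['TagSet']
    tags = [t for t in existing if t['Key'] != 'quarantine']
    tags.append({'Key': 'quarantine', 'Value': 'true'})

    # Example action: add a tag to the object for quarantine
    s3.put_object_tagging(Bucket=bucket, Key=key, Tagging={'TagSet': tags})
    return {'status': 'tagged', 'bucket': bucket, 'key': key}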
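
The notification step listed earlier (notify security via SNS) could be sketched as follows. The function only formats a subject and message from the finding event, so it can be tried without AWS access. The event fields used (type, severity.description, resourcesAffected) follow the finding structure the handler above relies on; the function name and message layout are assumptions.

```python
def format_alert(event):
    """Turn a Macie finding event into an SNS subject and message body."""
    detail = event["detail"]
    resources = detail["resourcesAffected"]
    bucket = resources["s3Bucket"]["name"]
    # Bucket-level (policy) findings carry no object key
    key = resources.get("s3Object", {}).get("key")
    location = f"s3://{bucket}/{key}" if key else f"s3://{bucket}"
    severity = detail["severity"]["description"]
    subject = f"[Macie][{severity}] {detail['type']}"
    message = "\n".join([
        f"Finding type: {detail['type']}",
        f"Severity: {severity}",
        f"Resource: {location}",
    ])
    # SNS subjects are limited to 100 characters
    return subject[:100], message


if __name__ == "__main__":
    sample = {
        "detail": {
            "type": "SensitiveData:S3Object/Financial",
            "severity": {"description": "High"},
            "resourcesAffected": {
                "s3Bucket": {"name": "tenant-data"},
                "s3Object": {"key": "exports/cards.csv"},
            },
        }
    }
    print(format_alert(sample))
```

Inside a Lambda, the result could be passed to boto3.client("sns").publish(TopicArn=..., Subject=subject, Message=message), with the topic ARN supplied through configuration.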

Macie + GuardDuty

Macie findings about credentials or sensitive data exposure can be correlated with GuardDuty to detect threats like:

Compromised access keys
Data exfiltration attempts

For example:

Macie detects unencrypted PII in a publicly exposed S3 bucket
GuardDuty simultaneously detects suspicious access to that same bucket from an unusual IP address
Security Hub aggregates both findings to help analysts prioritize response

Macie + Security Hub

Macie findings appear in Security Hub as findings in the AWS Security Finding Format (ASFF), alongside findings from other services, enabling:

Centralized visibility
Compliance scoring
Cross-service automation

5. Compliance and Governance Use Cases

Macie helps meet compliance for:

GDPR: Right to access, data minimization, and breach reporting
HIPAA: PHI discovery and access control
PCI-DSS: Cardholder data detection
SOC 2: Data security and privacy controls

How it helps:

Keep a record of where sensitive data is stored
Alert on unencrypted or publicly exposed data
Integrate into audits and risk assessments

6. Cost Optimization and Management

Macie is priced by:

Number of S3 buckets evaluated for inventory and security monitoring
GB scanned for sensitive data

Cost Control Tips:

Filter jobs using object prefixes or age
Use object tags to target sensitive data only
Avoid scanning buckets with logs or non-sensitive data

Example CLI to scan only buckets that carry a specific tag:

aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name "TargetedSensitiveScan" \
  --s3-job-definition '{"bucketCriteria":{"includes":{"and":[{"tagCriterion":{"comparator":"EQ","tagValues":[{"key":"data","value":"sensitive"}]}}]}}}'

7. Real-World Use Cases and Scenarios

Preventing Data Leakage in a SaaS Company

A SaaS company stores tenant data in S3. A misconfigured bucket policy exposed it publicly. Macie:

Flagged the bucket as public
Discovered PII data (email, phone numbers)
Sent a finding to EventBridge
Triggered a Lambda that:
- Locked down the bucket policy
- Sent a Slack alert to SecOps

Financial Institution Detecting Secrets in Logs

Logs from various systems were stored in S3. Macie detected AWS Access Keys in raw logs:

Created an alert
Lambda quarantined the file
The exposed access key was deactivated and rotated
Finding pushed to Security Hub

Global Enterprise with GDPR Obligations

A company with EU customers needs to map all PII across S3:

Recurring Macie job scans tagged buckets monthly
Reports sent to DPO for compliance
Alerts for any new unencrypted or public data

8. Hands-On: How to Get Started with Macie

Step 1: Enable Macie

aws macie2 enable-macie --status ENABLED

Step 2: View Your S3 Inventory

aws macie2 describe-buckets

Step 3: Create a Discovery Job

aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --s3-job-definition 'bucketDefinitions=[{accountId="123456789012",buckets=["my-bucket"]}]' \
  --name "InitialSensitiveScan"

Step 4: Review Findings

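# list-findings returns finding IDs only
aws macie2 list-findings

# Pass one or more IDs to get-findings for the full details
aws macie2 get-findings --finding-ids <finding-id>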
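
For scripted review, the same two-step flow (list IDs, then fetch details) can be sketched in Python. The function takes the client as a parameter, so it works with boto3.client("macie2") or with a stub. It assumes list_findings returns findingIds and nextToken and that get_findings accepts a batch of findingIds, which matches my understanding of the API; the severity filter key is likewise an assumption to verify.

```python
def fetch_findings(client, severity=None, batch_size=50):
    """Collect full finding details, optionally filtered by severity label."""
    kwargs = {}
    if severity:
        # Assumed filter shape for the findingCriteria parameter
        kwargs["findingCriteria"] = {
            "criterion": {"severity.description": {"eq": [severity]}}
        }
    finding_ids = []
    while True:
        page = client.list_findings(**kwargs)
        finding_ids.extend(page.get("findingIds", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token  # request the next page
    findings = []
    # Fetch details in batches to stay within the per-call ID limit
    for start in range(0, len(finding_ids), batch_size):
        batch = finding_ids[start:start + batch_size]
        findings.extend(client.get_findings(findingIds=batch)["findings"])
    return findings


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   high = fetch_findings(boto3.client("macie2"), severity="High")
```

Passing the client in keeps the pagination logic testable without calling AWS.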

9. Best Practices and Common Pitfalls

Tag data at source: Helps target scanning jobs
Use custom identifiers wisely: Avoid overly broad regexes
Monitor costs: Don’t scan unnecessary buckets
Review false positives: Tune identifiers based on feedback
Limit access: Use IAM conditions to restrict who can view Macie findings

10. Macie in AWS Security Specialty Certification

Macie falls under the Data Protection domain of the AWS Certified Security - Specialty exam.

Key topics:

How Macie identifies PII in S3
Integration with other services
Role in compliance strategy
Types of findings

Sample Scenario:
"You are notified that sensitive data may be publicly accessible. How can Macie help in this case?"

You should know: Bucket inventory + discovery job + EventBridge automation

11. Conclusion

Amazon Macie is not just a checkbox for compliance — it’s a powerful engine for discovering, classifying, and protecting sensitive data in AWS. For security teams, architects, and auditors alike, it provides essential visibility and control.

Getting started is simple, but using Macie effectively requires planning, scoping, and integration. With the right setup, Macie can be your automated watchdog, silently scanning and defending your data perimeter.

Stay secure. Stay smart.

Mastering AWS Security - Post 5: Amazon Macie – Classify and Protect Sensitive Data

Suman Thallapelly
