Mastering AWS Security - Post 5: Amazon Macie – Classify and Protect Sensitive Data


1. Introduction
In today’s cloud-first world, data is your crown jewel, and your greatest liability if it is not protected properly. From personally identifiable information (PII) to intellectual property, the data you store in AWS must be secured against leaks, breaches, and compliance failures. Enter Amazon Macie.
Amazon Macie is a fully managed data security and data privacy service that uses machine learning (ML) and pattern matching to discover and protect your sensitive data in AWS. It’s purpose-built for identifying sensitive data at scale, especially in Amazon S3, and integrates seamlessly with other AWS services for alerting and remediation.
Whether you’re just getting started in cloud security or preparing for the AWS Security Specialty certification, this blog will walk you through how Macie works, its powerful capabilities, real-world use cases, and how to get the most out of it.
2. How Amazon Macie Works
At its core, Macie continuously scans Amazon S3 buckets to identify and classify sensitive data. It uses pre-trained ML models and pattern matching to detect:
PII: Names, addresses, phone numbers, national IDs
Financial data: Credit card numbers, bank account details
Credentials: Access keys, secrets
Custom data patterns you define
Supported Sources: Currently, Macie only supports scanning Amazon S3. It doesn't work with EBS, RDS, DynamoDB, or other AWS data stores.
Process Overview:
Macie evaluates your S3 inventory for security risks (e.g., unencrypted or publicly accessible buckets).
You define discovery jobs to scan buckets for sensitive data.
Macie classifies the data and generates findings.
Findings can be forwarded to AWS Security Hub, EventBridge, or processed with Lambda.
3. Key Features and Capabilities
S3 Bucket Inventory and Risk Analysis
Macie gives a high-level view of all your S3 buckets, highlighting those with potential risks:
Public access
Unencrypted data
Access control policies
This is your first checkpoint to understand where to focus.
Sensitive Data Discovery Jobs
Discovery jobs are how Macie scans data:
One-time: Great for audits or initial scans.
Recurring: For continuous monitoring.
You can scope jobs by:
Bucket names
Object prefixes (like folders)
Object age (e.g., only objects last modified within the past 90 days)
Tags (e.g., tag sensitive workloads with data:sensitive=true)
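The scoping options above map onto the s3JobDefinition structure of Macie's CreateClassificationJob API. The following is a minimal sketch in Python that only builds that structure as a plain dictionary and never calls AWS. The helper name build_scoped_job_definition, the bucket name, the prefix, and the tag values are placeholders for illustration, and the 90-day window is expressed through the object's last-modified date.

```python
from datetime import datetime, timedelta, timezone


def build_scoped_job_definition(account_id, buckets, prefix=None,
                                max_age_days=None, tag=None, now=None):
    """Build an s3JobDefinition dict that narrows what a job analyzes.

    Hypothetical helper: it assembles the request structure only.
    """
    terms = []
    if prefix:
        # Only objects whose key starts with the given prefix.
        terms.append({"simpleScopeTerm": {
            "key": "OBJECT_KEY",
            "comparator": "STARTS_WITH",
            "values": [prefix],
        }})
    if max_age_days is not None:
        # Only objects modified on or after the cutoff timestamp (UTC).
        now = now or datetime.now(timezone.utc)
        cutoff = now - timedelta(days=max_age_days)
        terms.append({"simpleScopeTerm": {
            "key": "OBJECT_LAST_MODIFIED_DATE",
            "comparator": "GTE",
            "values": [cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")],
        }})
    if tag:
        # Only objects carrying the given tag key/value pair.
        tag_key, tag_value = tag
        terms.append({"tagScopeTerm": {
            "key": "TAG",
            "comparator": "EQ",
            "target": "S3_OBJECT",
            "tagValues": [{"key": tag_key, "value": tag_value}],
        }})
    definition = {
        "bucketDefinitions": [
            {"accountId": account_id, "buckets": list(buckets)}
        ]
    }
    if terms:
        definition["scoping"] = {"includes": {"and": terms}}
    return definition
```

The returned dictionary can be passed as the s3JobDefinition argument when creating a job with boto3, or dumped to JSON for the CLI.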
Custom and Managed Data Identifiers
What Macie looks for during a scan is determined by Managed Data Identifiers (MDIs) and Custom Data Identifiers (CDIs).
Enabling Macie does not, by itself, run a classification job. When you create a job, you choose which MDIs and CDIs it uses; if you make no explicit choice, the job falls back to the set of managed data identifiers that AWS recommends.
Managed Data Identifiers (MDI)
Managed Data Identifiers (MDIs) are pre-built detection rules provided by AWS. These identifiers use a combination of machine learning, context-based logic, and pattern recognition to find common types of sensitive data like:
Email addresses
Credit card numbers
Social Security numbers (SSNs)
Passport numbers
AWS credentials
IP addresses and MAC addresses
Important: The MDI selection is made per job. A job created without an explicit selection uses the set AWS recommends, so choose identifiers deliberately during classification job creation if you need broader or narrower coverage.
To include all MDIs in a job using the AWS CLI:
aws macie2 create-classification-job \
--job-type ONE_TIME \
--name "FullScanJob" \
--s3-job-definition '{"bucketDefinitions":[{"accountId":"123456789012","buckets":["my-bucket"]}]}' \
--managed-data-identifier-selector ALL
Or, to specify a subset, replace the last line with:
--managed-data-identifier-selector INCLUDE \
--managed-data-identifier-ids CREDIT_CARD_NUMBER USA_SOCIAL_SECURITY_NUMBER
These identifiers are backed by machine learning and contextual analysis to reduce false positives. They're regularly updated by AWS to reflect real-world data formats and are ideal for:
Compliance-driven scans (PCI-DSS, HIPAA, GDPR)
Broad coverage of universally sensitive data
Quick deployments when you need fast insights
You can select which managed identifiers to include or exclude in a job, giving you control over scan scope and cost.
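For readers who script Macie with boto3 rather than the CLI, the identifier choice can be sketched as below. The parameter names managedDataIdentifierSelector and managedDataIdentifierIds come from the CreateClassificationJob API; the helper names build_job_request and create_job, the job name, and the account and bucket values are placeholders, and any identifier IDs you pass should be checked against the current managed data identifier list.

```python
import uuid

VALID_SELECTORS = {"ALL", "EXCLUDE", "INCLUDE", "NONE", "RECOMMENDED"}


def build_job_request(name, account_id, buckets, selector="RECOMMENDED",
                      managed_ids=None, custom_ids=None):
    """Assemble keyword arguments for macie2.create_classification_job."""
    if selector not in VALID_SELECTORS:
        raise ValueError(f"unknown selector: {selector}")
    needs_ids = selector in {"INCLUDE", "EXCLUDE"}
    if needs_ids and not managed_ids:
        raise ValueError(f"{selector} needs at least one managed identifier ID")
    if not needs_ids and managed_ids:
        raise ValueError("identifier IDs only apply to INCLUDE or EXCLUDE")

    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "jobType": "ONE_TIME",
        "name": name,
        "managedDataIdentifierSelector": selector,
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ]
        },
    }
    if managed_ids:
        request["managedDataIdentifierIds"] = list(managed_ids)
    if custom_ids:
        request["customDataIdentifierIds"] = list(custom_ids)
    return request


def create_job(request):
    """Submit the request. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3
    return boto3.client("macie2").create_classification_job(**request)
```

Keeping the request builder separate from the AWS call makes it easy to review or unit-test the scan scope before any data is inspected and billed.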
Custom Data Identifiers (CDI)
While managed identifiers cover most common data types, there are cases when your organization deals with proprietary or industry-specific data. That’s where custom data identifiers come in.
Custom identifiers allow you to define specific patterns using:
Regular expressions (Regex): Match complex, structured data
Keywords: Additional context to improve match precision
Proximity rules: How close keywords must be to a regex match
Example: Employee ID Custom Identifier
Say your internal Employee ID format is EMP123456. You can create a custom identifier as follows:
{
  "name": "EmployeeID",
  "regex": "EMP[0-9]{6}",
  "keywords": ["employee", "staff"],
  "maximumMatchDistance": 50
}
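Before registering a custom identifier, it can help to try the pattern on sample text locally. The sketch below is a rough approximation in Python, not Macie's actual matching engine: it assumes a keyword must appear before the match, measures distance from the end of the keyword to the end of the regex match, and compares keywords case-insensitively. The names EMPLOYEE_ID, keyword_nearby, and find_matches are made up for this example.

```python
import re

EMPLOYEE_ID = {
    "regex": r"EMP[0-9]{6}",
    "keywords": ["employee", "staff"],
    "maximumMatchDistance": 50,
}


def keyword_nearby(text, match_end, keywords, max_distance):
    """True if any keyword ends within max_distance characters of match_end."""
    preceding = text[:match_end]
    for keyword in keywords:
        for found in re.finditer(re.escape(keyword), preceding, re.IGNORECASE):
            if match_end - found.end() <= max_distance:
                return True
    return False


def find_matches(text, identifier):
    """Return regex matches that have a keyword close enough before them."""
    pattern = re.compile(identifier["regex"])
    return [
        match.group()
        for match in pattern.finditer(text)
        if keyword_nearby(text, match.end(), identifier["keywords"],
                          identifier["maximumMatchDistance"])
    ]
```

Running find_matches on a handful of real and decoy samples is a cheap way to spot an overly broad regex before it produces noisy findings.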
Why Use Custom Identifiers?
Detect internal formats like customer account numbers, case IDs, or contract codes
Tighten precision for proprietary data detection
Avoid false positives in noisy datasets
The best practice is to combine both. Start with managed identifiers for wide coverage, and layer in custom identifiers to align Macie to your specific environment and risk profile.
Findings and Alerts
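Macie generates two kinds of findings: sensitive data findings, produced when a discovery job detects sensitive data in an S3 object, and policy findings, produced when bucket monitoring detects a risk indicator such as public access. These findings contain rich metadata including: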
Data type found (e.g., credit card number, AWS key)
S3 object metadata (name, bucket, region, etc.)
Severity (low/medium/high)
Resource permissions (e.g., public access, cross-account access)
By default, Macie shows all findings in its own console. How those findings reach other AWS services differs by service:
Amazon EventBridge: Auto-enabled
Macie automatically sends all findings to EventBridge without extra setup.
You can build custom automation using EventBridge rules and targets (e.g., trigger a Lambda).
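AWS Security Hub: Partly automatic
Security Hub must be enabled in the same account and Region. Macie then publishes policy findings by default; sensitive data findings are published only after you turn them on in Macie's publication settings.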
Once enabled, Macie findings appear in Security Hub alongside GuardDuty, Inspector, and more.
Amazon GuardDuty: Does not ingest Macie findings directly
There is no native direct integration.
However, both services can be correlated in Security Hub or via custom automation.
NOTE: Macie does not currently push findings to services like AWS Config, CloudTrail, or Amazon Detective. Any correlation with those services is indirect, for example through Security Hub.
So, if you need centralized insight and correlation, Security Hub is your best option, and EventBridge is your go-to for automating responses.
Be sure to enable these integrations explicitly where needed for full visibility and automated protection workflows.
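To make the finding metadata concrete, here is a small Python sketch that pulls the fields listed earlier out of a Macie finding as delivered by EventBridge. The function names summarize_finding and needs_immediate_action are hypothetical, the field paths follow the Macie finding schema as I understand it, and the triage rule is only an example policy, not an AWS recommendation.

```python
def summarize_finding(event):
    """Extract the fields most triage workflows need from a finding event."""
    detail = event["detail"]
    resources = detail.get("resourcesAffected", {})
    bucket = resources.get("s3Bucket", {})
    return {
        "id": detail.get("id"),
        "category": detail.get("category"),  # CLASSIFICATION or POLICY
        "type": detail.get("type"),
        "severity": detail.get("severity", {}).get("description"),
        "bucket": bucket.get("name"),
        # Policy findings describe a bucket, so s3Object may be absent.
        "key": resources.get("s3Object", {}).get("key"),
        "public": bucket.get("publicAccess", {}).get("effectivePermission")
        == "PUBLIC",
    }


def needs_immediate_action(summary):
    """Example rule: escalate high severity or anything in a public bucket."""
    return summary["severity"] == "High" or summary["public"]
```

A summary like this is a convenient payload for an SNS notification or a ticket, since it avoids forwarding the full finding document.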
Scalability and Multi-account Support
Macie integrates with AWS Organizations to manage multiple accounts.
Use a delegated admin account to manage Macie across org units.
Centralize findings and discovery job configurations.
4. Integration with Broader AWS Security Stack
Macie + EventBridge + Lambda (Automated Remediation)
Step-by-step:
Enable Macie and start a discovery job.
Create a rule in Amazon EventBridge to catch Macie findings.
Trigger a Lambda function that:
Notifies security via SNS
Quarantines the S3 object
Tags the file for review
Example AWS CLI Setup:
# Enable Macie
aws macie2 enable-macie --status ENABLED
# Create the EventBridge rule
aws events put-rule \
--name "MacieSensitiveDataFound" \
--event-pattern file://macie-event-pattern.json \
--region us-east-1
# Add the Lambda function as the rule's target
aws events put-targets \
--rule "MacieSensitiveDataFound" \
--targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:<account-id>:function:MacieQuarantineLambda"
# Grant EventBridge permission to invoke the Lambda function
aws lambda add-permission \
--function-name MacieQuarantineLambda \
--statement-id EventBridgeInvoke \
--action "lambda:InvokeFunction" \
--principal events.amazonaws.com \
--source-arn "arn:aws:events:us-east-1:<account-id>:rule/MacieSensitiveDataFound"
macie-event-pattern.json
{
"source": ["aws.macie"],
"detail-type": ["Macie Finding"]
}
lambda_function.py
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    # Extract bucket and object key from the Macie finding
    detail = event['detail']
    resources = detail['resourcesAffected']
    bucket = resources['s3Bucket']['name']
    # Policy findings describe a bucket only, so there may be no object to tag
    s3_object = resources.get('s3Object')
    if not s3_object:
        return {'status': 'skipped', 'bucket': bucket}
    key = s3_object['key']
    # Example action: Add a tag to the object for quarantine
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={
            'TagSet': [
                {
                    'Key': 'quarantine',
                    'Value': 'true'
                }
            ]
        }
    )
    return {'status': 'tagged', 'bucket': bucket, 'key': key}
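One caveat with the handler above: S3's PutObjectTagging call replaces the object's entire tag set, so any tags already on the object would be lost. A hedged sketch of a merge step is shown below; merge_tag_set and quarantine_tag are hypothetical helper names, and the s3 argument is assumed to be a boto3 S3 client (or any object exposing the same two methods).

```python
def merge_tag_set(existing, key, value):
    """Return a new TagSet with `key` set to `value`, keeping other tags."""
    merged = [tag for tag in existing if tag["Key"] != key]
    merged.append({"Key": key, "Value": value})
    return merged


def quarantine_tag(s3, bucket, key):
    """Tag an object for quarantine without dropping its existing tags."""
    # Read the current tags first, because the write below overwrites them.
    current = s3.get_object_tagging(Bucket=bucket, Key=key).get("TagSet", [])
    merged = merge_tag_set(current, "quarantine", "true")
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": merged},
    )
    return merged
```

The Lambda execution role would need s3:GetObjectTagging in addition to s3:PutObjectTagging for this variant.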
Macie + GuardDuty
Macie findings about credentials or sensitive data exposure can be correlated with GuardDuty to detect threats like:
Compromised access keys
Data exfiltration attempts
For example:
Macie detects unencrypted PII in a publicly exposed S3 bucket
GuardDuty simultaneously detects suspicious access to that same bucket from an unusual IP address
Security Hub aggregates both findings to help analysts prioritize response
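The correlation described above can be approximated in a few lines of Python once findings from both services are available in the AWS Security Finding Format (for example, exported from Security Hub). This is only an illustrative sketch: correlate_by_bucket and bucket_from_resource_id are hypothetical names, the ProductName values "Macie" and "GuardDuty" are assumed, and the ARN parsing assumes the standard aws partition.

```python
from collections import defaultdict

S3_ARN_PREFIX = "arn:aws:s3:::"


def bucket_from_resource_id(resource_id):
    """Return the bucket name from an S3 bucket or object ARN, else None."""
    if not resource_id.startswith(S3_ARN_PREFIX):
        return None
    return resource_id[len(S3_ARN_PREFIX):].split("/", 1)[0]


def correlate_by_bucket(findings, products=("Macie", "GuardDuty")):
    """Return buckets that have findings from every product in `products`."""
    seen = defaultdict(lambda: defaultdict(list))
    for finding in findings:
        product = finding.get("ProductName")
        if product not in products:
            continue
        for resource in finding.get("Resources", []):
            bucket = bucket_from_resource_id(resource.get("Id", ""))
            title = finding.get("Title")
            # Skip duplicates when one finding lists bucket and object.
            if bucket and title not in seen[bucket][product]:
                seen[bucket][product].append(title)
    return {
        bucket: dict(by_product)
        for bucket, by_product in seen.items()
        if all(name in by_product for name in products)
    }
```

Buckets returned by correlate_by_bucket hold sensitive data and show suspicious activity at the same time, which makes them natural candidates for the top of the response queue.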
Macie + Security Hub
Macie findings appear in Security Hub as findings in the AWS Security Finding Format (ASFF), not as security standards, enabling:
Centralized visibility
Prioritization alongside findings from other services
Cross-service automation
5. Compliance and Governance Use Cases
Macie helps meet compliance for:
GDPR: Right to access, data minimization, and breach reporting
HIPAA: PHI discovery and access control
PCI-DSS: Cardholder data detection
SOC 2: Data security and privacy controls
How it helps:
Keep a record of where sensitive data is stored
Alert on unencrypted or publicly exposed data
Integrate into audits and risk assessments
6. Cost Optimization and Management
Macie is priced by:
Number of S3 buckets evaluated for inventory and security monitoring
GB scanned for sensitive data
Cost Control Tips:
Filter jobs using object prefixes or age
Use object tags to target sensitive data only
Avoid scanning buckets with logs or non-sensitive data
Example CLI to scan only tagged buckets:
aws macie2 create-classification-job \
--job-type ONE_TIME \
--name "TargetedSensitiveScan" \
--s3-job-definition '{"bucketCriteria":{"includes":{"and":[{"tagCriterion":{"comparator":"EQ","tagValues":[{"key":"data","value":"sensitive"}]}}]}}}'
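Because discovery jobs are billed by the amount of data inspected, it can be worth estimating how much data a scoped job would cover before running it. The Python sketch below sums object sizes for a given prefix and age window. It assumes input shaped like the Contents entries returned by S3 ListObjectsV2 (Key, Size, LastModified); estimate_scope_gb is a hypothetical helper, and the result is only a rough figure since Macie analyzes supported file types only.

```python
from datetime import datetime, timedelta, timezone


def estimate_scope_gb(objects, prefix="", max_age_days=None, now=None):
    """Sum the size (in GiB) of objects a prefix/age-scoped job would cover.

    `objects` is an iterable of dicts with Key, Size (bytes), and
    LastModified (timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = None
    if max_age_days is not None:
        cutoff = now - timedelta(days=max_age_days)

    total_bytes = 0
    for obj in objects:
        if not obj["Key"].startswith(prefix):
            continue  # outside the prefix scope
        if cutoff is not None and obj["LastModified"] < cutoff:
            continue  # older than the age window
        total_bytes += obj["Size"]
    return total_bytes / 1024 ** 3
```

Multiplying the result by the current per-GB rate from the Macie pricing page gives a ballpark cost for the job.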
7. Real-World Use Cases and Scenarios
Preventing Data Leakage in a SaaS Company
A SaaS company stores tenant data in S3. A misconfigured bucket policy exposed it publicly. Macie:
Flagged the bucket as public
Discovered PII data (email, phone numbers)
Sent a finding to EventBridge
Triggered a Lambda that:
Locked down the bucket policy
Sent a Slack alert to SecOps
Financial Institution Detecting Secrets in Logs
Logs from various systems were stored in S3. Macie detected AWS Access Keys in raw logs:
Created an alert
Lambda quarantined the file
Exposed access key was deactivated and rotated
Finding pushed to Security Hub
Global Enterprise with GDPR Obligations
A company with EU customers needs to map all PII across S3:
Recurring Macie job scans tagged buckets monthly
Reports sent to DPO for compliance
Alerts for any new unencrypted or public data
8. Hands-On: How to Get Started with Macie
Step 1: Enable Macie
aws macie2 enable-macie --status ENABLED
Step 2: View Your S3 Inventory
aws macie2 describe-buckets
Step 3: Create a Discovery Job
aws macie2 create-classification-job \
--job-type ONE_TIME \
--s3-job-definition '{"bucketDefinitions":[{"accountId":"123456789012","buckets":["my-bucket"]}]}' \
--name "InitialSensitiveScan"
Step 4: Review Findings
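aws macie2 list-findings
# list-findings returns finding IDs only; pass them to get-findings for the details
aws macie2 get-findings --finding-ids "<finding-id>"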
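The same job creation and review steps can be scripted with boto3. The sketch below takes the Macie client as a parameter so it can be exercised with a stub; start_one_time_scan and fetch_findings are hypothetical helper names, and the account ID and bucket name are the same placeholders used in the CLI steps.

```python
import uuid


def start_one_time_scan(macie, account_id, bucket,
                        name="InitialSensitiveScan"):
    """Create a one-time classification job and return its job ID."""
    response = macie.create_classification_job(
        clientToken=str(uuid.uuid4()),  # idempotency token
        jobType="ONE_TIME",
        name=name,
        s3JobDefinition={
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": [bucket]}
            ]
        },
    )
    return response["jobId"]


def fetch_findings(macie, batch_size=50):
    """Yield full finding records, following pagination.

    list_findings returns only IDs, so each page of IDs is resolved
    with get_findings.
    """
    token = None
    while True:
        kwargs = {"maxResults": batch_size}
        if token:
            kwargs["nextToken"] = token
        page = macie.list_findings(**kwargs)
        ids = page.get("findingIds", [])
        if ids:
            yield from macie.get_findings(findingIds=ids)["findings"]
        token = page.get("nextToken")
        if not token:
            break
```

In real use, the client would come from boto3.client("macie2") in an account and Region where Macie is already enabled.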
9. Best Practices and Common Pitfalls
Tag data at source: Helps target scanning jobs
Use custom identifiers wisely: Avoid overly broad regexes
Monitor costs: Don’t scan unnecessary buckets
Review false positives: Tune identifiers based on feedback
Limit access: Use IAM conditions to restrict who can view Macie findings
10. Macie in AWS Security Specialty Certification
Macie falls under the Data Protection domain (Domain 5 in the SCS-C02 exam guide) of the AWS Certified Security - Specialty exam.
Key topics:
How Macie identifies PII in S3
Integration with other services
Role in compliance strategy
Types of findings
Sample Scenario:
"You are notified that sensitive data may be publicly accessible. How can Macie help in this case?"
- You should know: Bucket inventory + discovery job + EventBridge automation
11. Conclusion
Amazon Macie is not just a checkbox for compliance — it’s a powerful engine for discovering, classifying, and protecting sensitive data in AWS. For security teams, architects, and auditors alike, it provides essential visibility and control.
Getting started is simple, but using Macie effectively requires planning, scoping, and integration. With the right setup, Macie can be your automated watchdog, silently scanning and defending your data perimeter.
Stay secure. Stay smart.
Written by

Suman Thallapelly