Tagging, Optimizing, and Managing Docker Images in Amazon ECR


Hello everyone!
Are images piling up in your ECR repositories? Is storage reaching numbers you're not comfortable with? Don't worry! Let's look at some tips that can help us keep our Docker images under control. This post is aimed at people who are getting started with building Docker images and with AWS's registry service, ECR. If you're already experienced, you probably know much of what you'll read here.
What is Docker?
Docker presents itself as a platform designed to help developers build, share, and run containerized applications. Basically, it helps mitigate that famous phrase we've all heard at some point: "It works on my machine." Sounds familiar?
What is ECR?
Amazon presents ECR as a secure, scalable, and reliable container image registry service managed by AWS. The best part is that it supports both public and private repositories and works with Docker images, Open Container Initiative (OCI) images, and OCI-compatible artifacts. It's like having your own Docker Hub, but with AWS's security and scalability!
Why is this important?
When using cloud services, we must be aware of their costs (nobody wants surprises in their bill!). In Amazon ECR's case, the cost is driven by the storage your images consume and by the data transferred out of the registry. Therefore, it's important to account for this when projecting and planning our projects.
And this is where optimization plays a super important role. Let's see how we can do it!
Optimization Tips
Use lightweight, specific base images: minimalist variants such as alpine or slim reduce image size. Always make sure they are official images.
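For instance, a minimal sketch assuming a Node.js service (the tag is illustrative; use whichever official variant matches your runtime):
# Full official image: convenient, but noticeably heavier
# FROM node:20
# Alpine variant of the same official image: same runtime, far smaller footprint
FROM node:20-alpine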
Minimize the number of layers whenever possible:
Combine instructions to reduce the number of layers created:
RUN apt-get update && apt-get install -y \
    curl
Avoid splitting them into separate, redundant instructions:
RUN apt-get update
RUN apt-get install -y curl
Instead of the above, combine them into a single instruction.
Use .dockerignore to reduce the build context, speeding up builds and excluding unnecessary files:
.git
node_modules
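A slightly fuller .dockerignore could look like this (the exact entries depend on your project; these are just common candidates):
.git
.gitignore
node_modules
dist
*.log
.env
Dockerfile
.dockerignore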
Don't forget to set the working directory to avoid problems with relative paths:
WORKDIR /app
Use COPY instructions instead of ADD, as COPY is more predictable and secure. Only use ADD if you need to extract a compressed archive or download something from a URL:
COPY . .
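As a quick illustration (the archive name is hypothetical): COPY handles the everyday cases, and ADD stays reserved for what it was designed for:
# COPY for ordinary files: predictable, no hidden behavior
COPY package.json package-lock.json ./
# ADD only when you need its extra behavior, e.g. auto-extracting a local archive
ADD vendor-tools.tar.gz /opt/vendor-tools/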
Remove unnecessary dependencies. Install and remove packages within the same layer so they don't end up baked into the final image.
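For example, a sketch that installs a tool only for the duration of a single RUN instruction and cleans it up before the layer is committed (the URL and package are placeholders):
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    curl -fsSL https://example.com/some-tool.tar.gz | tar -xz -C /usr/local/bin && \
    apt-get purge -y curl && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*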
Properly tag the image, using latest for the most recent version along with semantic versioning.
Only expose what's necessary (e.g., expose only the ports you need):
EXPOSE 4002
Use tools like trivy, snyk, and the docker scan command to detect security issues. You can also leverage AWS:
aws ecr start-image-scan \
  --repository-name my-ecr-repo \
  --image-id imageDigest=sha256:5a587965e4428d4fe318113e402d25145db6c261eb3a64ec13dbe186367ebf8b
The output should look something like:
{ "registryId": "012345678910", "repositoryName": "my-ecr-repo", "imageId": { "imageDigest": "sha256:5a587965e4428d4fe318113e402d25145db6c261eb3a64ec13dbe186367ebf8b" }, "imageScanStatus": { "status": "IN_PROGRESS" } }
Then check the results with:
aws ecr describe-image-scan-findings \
  --repository-name my-ecr-repo \
  --image-id imageDigest=sha256:5a587965e4428d4fe318113e402d25145db6c261eb3a64ec13dbe186367ebf8b
This will provide detailed security analysis results, including detected vulnerabilities, severity, and recommendations.
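If you prefer to catch issues locally before pushing, a quick sketch with trivy (the image name is just the one used in the examples above):
trivy image --severity HIGH,CRITICAL my-ecr-image:latest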
What to do with the scan results?
The security scan results are pure gold for keeping our images secure. Here's what to do with them:
Identify critical vulnerabilities and prioritize their solution.
Update your Dockerfile dependencies to secure versions.
Integrate scanning into your CI/CD pipeline to prevent deployments with known vulnerabilities (see the sketch below).
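As a rough sketch of that last point, a pipeline step could wait for the ECR scan to finish and fail the build when critical findings appear (repository name and digest reused from the examples above; adjust the threshold to your own policy):
# Wait for the scan started earlier to finish
aws ecr wait image-scan-complete \
  --repository-name my-ecr-repo \
  --image-id imageDigest=sha256:5a587965e4428d4fe318113e402d25145db6c261eb3a64ec13dbe186367ebf8b
# Count CRITICAL findings ("None" means the key is absent, i.e. no critical findings)
CRITICAL=$(aws ecr describe-image-scan-findings \
  --repository-name my-ecr-repo \
  --image-id imageDigest=sha256:5a587965e4428d4fe318113e402d25145db6c261eb3a64ec13dbe186367ebf8b \
  --query 'imageScanFindings.findingSeverityCounts.CRITICAL' \
  --output text)
# Fail the pipeline when critical vulnerabilities are present
if [ "$CRITICAL" != "None" ] && [ "$CRITICAL" -gt 0 ]; then
  echo "Found $CRITICAL critical vulnerabilities. Failing the build."
  exit 1
fi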
Let's Automate Tagging!
Now that we've mentioned tagging, what else can we do? As I always say: why not invest a bit more time automating something that takes less than a minute? It's not a very complex script, but by including it in our continuous deployment pipelines, we can tag our images without worrying about it. Automation is your friend!
#!/bin/bash
# Exit immediately if a command exits with a non-zero status
set -e
# Trap errors for better debugging
trap 'echo "Error occurred at line $LINENO. Aborting."; exit 1;' ERR
# Ensure required tools are available
command -v jq >/dev/null 2>&1 || { echo "jq is required but not installed. Aborting." >&2; exit 1; }
command -v aws >/dev/null 2>&1 || { echo "AWS CLI is required but not installed. Aborting." >&2; exit 1; }
command -v docker >/dev/null 2>&1 || { echo "Docker is required but not installed. Aborting." >&2; exit 1; }
# Read the version property from package.json
VERSION=$(jq -r '.version' package.json)
[ -z "$VERSION" ] && { echo "Version could not be retrieved from package.json. Aborting."; exit 1; }
# Define environment variable (change this to match your deployment environment)
ENVIRONMENT="staging"
# Declare variables for tagging the image
REGION="my-region"
ECR_REGISTRY="111111111.dkr.ecr.$REGION.amazonaws.com"
IMAGE_NAME="my-ecr-image"
# Log in to Amazon ECR
echo "Logging in to Amazon ECR..."
aws ecr get-login-password --region "$REGION" | docker login --username AWS --password-stdin "$ECR_REGISTRY"
# Build the Docker image
echo "Building Docker image..."
docker build -t "$IMAGE_NAME" .
# Tag and push the Docker image with all tags
for TAG in latest "$ENVIRONMENT" "$VERSION"; do
FULL_TAG="$ECR_REGISTRY/$IMAGE_NAME:$TAG"
echo "Tagging image as $FULL_TAG"
docker tag "$IMAGE_NAME" "$FULL_TAG"
echo "Pushing image $FULL_TAG to ECR..."
docker push "$FULL_TAG"
done
echo "Docker image successfully pushed to ECR."
Version Control and Lifecycle
Now, let's imagine the following scenario (quite common, by the way):
Given our current git flow, every time we merge changes into our develop branch, a new image gets tagged and uploaded to ECR. And so on, over and over. The result? If we don't have adequate policies in place, our repository ends up holding n versions of our project (versions that have probably gone through breaking changes, or maybe not).
But don't worry! Implementing a lifecycle for images in Amazon ECR is easier than it seems. Let's see how to do it with Terraform:
resource "aws_ecr_repository" "api" {
name = "${var.app_name}"
}
resource "aws_ecr_lifecycle_policy" "api_lifecycle_policy" {
repository = aws_ecr_repository.api.name
policy = jsonencode({
rules = [
{
rulePriority = 1
description = "Retain only last 5 images"
selection = {
tagStatus = "tagged"
countType = "imageCountMoreThan"
countNumber = 10
}
action = {
type = "expire"
}
}
]
})
}
Note: While this part only shows how to create a new resource, we can also bring existing repositories under Terraform's management, either with import blocks (available since Terraform v1.5.0) or with the terraform import CLI command.
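For instance, with Terraform v1.5.0 or later, a declarative import block could look like this (the repository name is taken from the earlier CLI examples and is an assumption); the classic terraform import command achieves the same thing:
# Terraform >= 1.5.0: declarative import block
import {
  to = aws_ecr_repository.api
  id = "my-ecr-repo"
}
# Equivalent CLI alternative:
#   terraform import aws_ecr_repository.api my-ecr-repo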
Conclusion
We've reached the end! Optimizing Docker images in ECR is not just a good practice, it's a necessity to keep our services efficient and costs under control. With the tips we saw today, we can:
Significantly reduce storage and transfer costs.
Improve the security of our images (super important!).
Automate those repetitive processes that bore us so much.
Maintain a clean and organized repository (your future self will thank you).
Additional Resources
Want to dive deeper? Here are some super useful resources!
Have you tried any of these tips before? Do you have any other tricks up your sleeve you'd like to share?
Share them in the comments! I'd love to hear about your experiences and learn from them.
Until the next post, let's keep coding and learning together!