AWS Essentials 3: Exploring Amazon S3 - Simple Storage Service
Table of contents
- What is Amazon S3?
- S3 Security
- S3 Versioning
- S3 Replication
- S3 Storage Classes
- S3 Performance Optimization
- Frequently Asked Questions (FAQ)
- 1. What is Amazon S3?
- 2. What is a bucket in Amazon S3?
- 3. How is data stored in S3?
- 4. What is S3 Versioning?
- 5. What are S3 Storage Classes?
- 6. How does S3 Replication work?
- 7. What is S3 Intelligent-Tiering?
- 8. How can I optimize S3 performance?
- 9. Is my data secure in S3?
- 10. Can I host a static website on S3?
Welcome back to the AWS Essentials Series! In this blog post, we'll delve into Amazon Simple Storage Service (Amazon S3), a fundamental component of AWS that provides secure, durable, and highly scalable object storage. Amazon S3 is essential for a wide range of use cases, including data backup, content storage, and serving static websites.
What is Amazon S3?
Amazon S3 (Simple Storage Service) is a cloud-based object storage service offered by AWS that provides highly scalable, secure, and durable storage for objects and files. You can store and retrieve files, documents, images, videos, and other types of data. Think of it as an online hard drive to which you can upload, and from which you can download, your files from anywhere, at any time, using an internet connection. Amazon S3 is a cost-effective solution for data storage and backup that scales automatically as the volume of data increases; AWS advertises it as an “infinite scaling” storage solution.
S3 is known for its scalability, durability, and security. You can store a virtually unlimited amount of data, and S3 is designed to provide 99.999999999% (11 9’s) durability of objects. This means that if you store 10,000,000 objects on S3, you can expect to lose a single object on average once every 10,000 years. This high durability keeps your data well protected and available whenever you need it.
Amazon S3 is widely used for a variety of purposes, such as backup and disaster recovery, big data analytics, content delivery, data lakes, media hosting, and static website hosting. It is easy to use and integrates seamlessly with other AWS services, making it a powerful tool for businesses of all sizes.
In Amazon S3, data is stored in containers called buckets, and you cannot store any data on S3 without first creating a bucket. Think of a bucket as a folder where you can store and manage your files. Each bucket has a globally unique name (unique across all AWS regions and accounts in the same partition), and you can have multiple buckets to organize your data in different ways. When creating a bucket, two details must be specified: the bucket name and the region in which you want the bucket to be created, because buckets are defined at the regional level.
There are naming conventions (rules) for S3 buckets. The naming rules are:
Bucket names must be between 3–63 characters long.
Bucket names can only contain lowercase letters, numbers, dots, and hyphens; uppercase letters and underscores are not allowed.
They must begin and end with a letter or number.
Bucket names cannot be IP addresses.
They must not start with the prefix xn-- or end with the suffix -s3alias.
And as we have already seen, a bucket name must be globally unique.
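To make this concrete, here is a minimal sketch of creating a bucket with the AWS SDK for Python (boto3). The bucket name and region are hypothetical placeholders; the name must satisfy the rules above and be globally unique.

import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# Hypothetical bucket name; for any region other than us-east-1,
# the region must be passed as a LocationConstraint.
s3.create_bucket(
    Bucket="my-example-bucket-2024",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)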
I hinted in the introductory paragraph that S3 is a cloud-based object storage service. What this means is that in S3, data is stored as objects. An object is the fundamental unit of storage, and it represents a file, which can be any kind of data: a text file, an image, a video, a database backup, or any other type of file. Each object within an S3 bucket is uniquely identified by its key, a string that can be up to 1,024 characters long. The key serves as the object’s name and also provides a way to organize and retrieve objects within a bucket. Individual objects stored in S3 can be up to 5 TB (5,000 GB) in size.
It is important to note that S3 stores objects in a flat structure; there are no actual folders or directories as in a traditional file system. However, S3 allows you to create a logical hierarchy of objects using prefixes in object keys. For example, if you upload an object with the key “folder/subfolder/object.txt”, S3 creates the prefix “folder/subfolder/”, which behaves like a folder or directory. You can use the AWS Management Console, the AWS CLI, or an SDK to create and manage these prefixes as if they were directories.
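Here is a short boto3 sketch of this idea, uploading an object under a prefixed key and then listing only the objects that share that prefix. The bucket name is the same hypothetical placeholder as before.

import boto3

s3 = boto3.client("s3")

# The key contains "/" separators; S3 stores the object in a flat
# namespace, but the console renders the prefixes as folders.
s3.put_object(
    Bucket="my-example-bucket-2024",
    Key="folder/subfolder/object.txt",
    Body=b"hello from S3",
)

# List only the objects under the "folder/subfolder/" prefix.
response = s3.list_objects_v2(
    Bucket="my-example-bucket-2024",
    Prefix="folder/subfolder/",
)
for obj in response.get("Contents", []):
    print(obj["Key"])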
S3 Security
The security posture of any IT resource or service is very important and must be taken seriously at all times. Amazon S3 enforces security in several ways. You can enforce user-based security with IAM policies, or resource-based security with bucket policies, object Access Control Lists (ACLs), and bucket ACLs. We will only cover S3 bucket policies here because they are the most widely used mechanism.
S3 Bucket Policies
S3 bucket policies are JSON-based documents that specify permissions for S3 buckets and their contents (i.e. the objects within them). They let you define rules that allow or deny access to your bucket and its contents for specific users, groups, or even the public. They can control various actions on a bucket, such as read, write, or delete. These policies can be applied at the bucket level or at the object level within a bucket, giving you granular control over access to your data. Below is a sample S3 bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::examplebucket/*"
    }
  ]
}
The policy above allows anyone to retrieve objects (files) from the “examplebucket” S3 bucket. The Principal field is set to "*" to indicate that any AWS account or user can perform the s3:GetObject action. The Resource field specifies the ARN (Amazon Resource Name) of the objects in the bucket, and the Sid field provides a name for the policy statement to help identify it.
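A policy like this can be attached programmatically; here is a minimal boto3 sketch under the same assumptions as the example above. Note that a policy granting public access like this will be rejected unless the bucket's Block Public Access settings permit it.

import json
import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::examplebucket/*",
        }
    ],
}

# The policy document must be passed as a JSON string.
s3.put_bucket_policy(Bucket="examplebucket", Policy=json.dumps(policy))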
S3 Versioning
S3 Versioning is a feature of Amazon S3 that allows you to keep multiple versions of an object in the same bucket. Versioning is a setting enabled at the bucket level. When versioning is enabled for a bucket, every object uploaded to that bucket gets a unique version ID. You upload a new version of an object simply by uploading a file with the same key (name) as the existing object. Each version of the object has its own version ID, and you can access and manage all versions of an object using the S3 API or the AWS Management Console. If versioning is enabled after an object was uploaded, that pre-existing version of the object has a version ID of “null”.
It is a best practice to enable versioning for your S3 buckets: it helps prevent unintended deletes by letting you restore deleted objects, and it also allows you to roll back to a previous version of an object. An S3 bucket can be in one of three states at any given point: unversioned (the default), versioning-enabled, or versioning-suspended. Once you enable versioning on a bucket, it can never return to the unversioned state; you can, however, suspend versioning.
Note: Enabling versioning may increase storage costs, as each version of an object is stored as a separate object in the bucket. So keep this in mind when enabling versioning.
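As a quick illustration, here is a boto3 sketch that enables versioning on a bucket and then lists the versions of a key; bucket and key names are the hypothetical placeholders used earlier.

import boto3

s3 = boto3.client("s3")

# Turn versioning on for an existing bucket
# (use "Suspended" to suspend it later).
s3.put_bucket_versioning(
    Bucket="my-example-bucket-2024",
    VersioningConfiguration={"Status": "Enabled"},
)

# Every subsequent upload to the same key creates a new version.
response = s3.list_object_versions(
    Bucket="my-example-bucket-2024",
    Prefix="folder/subfolder/object.txt",
)
for version in response.get("Versions", []):
    print(version["VersionId"], version["IsLatest"])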
S3 Replication
S3 Replication is a feature that automatically replicates objects from one S3 bucket to another bucket in the same or a different AWS region. Replication improves the durability and availability of data and also helps you comply with regulatory requirements for data replication and disaster recovery. To use S3 replication, versioning must be enabled on both the source and destination buckets, because replication relies on version IDs to determine which objects have changed and need to be replicated. S3 supports both Same-Region Replication (SRR) and Cross-Region Replication (CRR).
Cross-Region Replication is used to replicate objects automatically and asynchronously across different AWS regions. With cross-region replication, you can create a replica of your S3 objects in a different region than the source bucket for data redundancy, disaster recovery, and lower latency access to data.
Same-Region Replication allows you to automatically and asynchronously replicate objects between buckets that reside in the same AWS region. SRR helps improve the durability and availability of data by creating an exact copy of objects in a different bucket within the same region. It can be useful for a variety of use cases, such as creating backups for disaster recovery purposes, distributing content to multiple locations for better access times, or complying with regulations that require data redundancy.
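Here is a minimal sketch of configuring a replication rule with boto3. The bucket names and the IAM role ARN are hypothetical; both buckets must already have versioning enabled, and the role must grant S3 permission to read from the source and replicate into the destination.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        # Hypothetical IAM role that S3 assumes to perform replication.
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix = all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
            }
        ],
    },
)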
Here are some key points to take note of about replication:
After replication is enabled, only new objects uploaded to the bucket are replicated. If you would like to replicate existing objects, use S3 Batch Replication.
There is no chaining of replication. Objects that arrive in a destination bucket through replication are not replicated onward: if bucket A replicates to bucket B, and bucket B replicates to bucket C, objects created in bucket A are copied to B but are not then replicated from B to C.
S3 Storage Classes
An S3 storage class is a way of categorizing the data you store in S3 based on usage patterns, access frequency, durability, and availability requirements. Amazon S3 offers a range of storage classes to help customers optimize their costs and performance based on the access patterns of their data. These storage classes include S3 Standard, S3 Standard-Infrequent Access (IA), S3 One Zone-Infrequent Access, S3 Intelligent-Tiering, S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive. We will now explore each of these storage classes along with some use cases for each; a short upload sketch follows the list.
S3 Standard: This is the default storage class for S3. It is designed for frequently accessed data that requires low latency and high throughput. S3 Standard is ideal for use cases such as web applications, content distribution, and data analytics.
S3 Standard Infrequent Access (IA): This storage class is designed for data that is accessed less frequently but still requires rapid access when needed. S3 Standard-IA is ideal for use cases such as backups, disaster recovery, and long-term storage of data that is infrequently accessed but must be retrieved quickly when needed.
S3 One Zone-Infrequent Access: This storage class is similar to S3 Standard-IA but stores data in a single Availability Zone (AZ) instead of multiple AZs. This makes it a lower-cost option for infrequently accessed data that does not require the same level of durability and availability as S3 Standard-IA. S3 One Zone-IA is ideal for use cases such as storing secondary backups, data that can be easily re-created, or data that does not need to be highly durable.
S3 Intelligent-Tiering: This storage class automatically moves data between access tiers (frequent access and infrequent access) based on changing access patterns. S3 Intelligent-Tiering is ideal for use cases where data access patterns are unpredictable or changing, such as data lakes, data analytics, and machine learning.
S3 Glacier Instant Retrieval: This storage class is designed for data that is rarely accessed but requires immediate access when needed. S3 Glacier Instant Retrieval is ideal for use cases such as long-term storage of data that is rarely accessed but must be retrieved quickly when needed.
S3 Glacier Flexible Retrieval: This storage class is designed for data that is rarely accessed and can tolerate retrieval times of minutes to hours. S3 Glacier Flexible Retrieval is ideal for use cases such as archival storage, regulatory compliance, and long-term storage of data that is rarely accessed but must be retained for long periods.
S3 Glacier Deep Archive: This storage class is designed for data that is rarely accessed and can tolerate retrieval times of 12 hours or more. S3 Glacier Deep Archive is the lowest-cost storage option for long-term retention of data that is rarely accessed but must be retained for regulatory or compliance purposes.
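The storage class is chosen per object at upload time (or later, via lifecycle rules). Here is a boto3 sketch, again with hypothetical bucket and key names.

import boto3

s3 = boto3.client("s3")

# StorageClass selects the class for this object. Valid values include
# STANDARD, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING,
# GLACIER_IR, GLACIER, and DEEP_ARCHIVE.
s3.put_object(
    Bucket="my-example-bucket-2024",
    Key="backups/db-dump.sql",
    Body=b"-- database dump --",
    StorageClass="STANDARD_IA",
)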
S3 Performance Optimization
Amazon S3 is designed to provide high performance, but there are some best practices that can help you optimize the performance of your S3 storage. These include:
Use of Multipart Upload: Multipart upload lets you upload a large object in smaller parts, which can be uploaded in parallel to improve upload performance. This is especially useful for large files or data sets (a sketch follows this list).
Prefixing: Using a logical hierarchy of prefixes in your object keys can help distribute load evenly across S3 partitions, which can improve performance for read and write operations.
Request Rate Optimization: Amazon S3 automatically scales to handle high request rates, but there are some best practices you can follow to optimize request rates, such as spreading read and write requests evenly across your S3 buckets and using random or hashed prefixes for object keys to distribute load evenly.
Use of Amazon CloudFront: Amazon CloudFront is a content delivery network (CDN) that can be used to cache and deliver content stored in S3 to users around the world with low latency and high throughput. Using CloudFront with S3 can help improve the performance of content delivery for web applications and other use cases.
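As promised above, here is a minimal multipart upload sketch using boto3's transfer configuration; the local file name and the thresholds are hypothetical choices, not recommended values.

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Files larger than multipart_threshold are split into a multipart
# upload; up to max_concurrency parts are uploaded in parallel.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # 64 MB
    multipart_chunksize=16 * 1024 * 1024,  # 16 MB per part
    max_concurrency=8,
)

s3.upload_file(
    "large-dataset.bin",           # hypothetical local file
    "my-example-bucket-2024",
    "datasets/large-dataset.bin",
    Config=config,
)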
Frequently Asked Questions (FAQ)
1. What is Amazon S3?
Amazon S3 (Simple Storage Service) is a cloud-based object storage service provided by AWS. It is highly scalable, durable, and secure, allowing you to store and retrieve any amount of data from anywhere on the web.
2. What is a bucket in Amazon S3?
A bucket is a container in Amazon S3 where data is stored. Each bucket has a globally unique name, and you can create multiple buckets to organize your data.
3. How is data stored in S3?
Data is stored as objects within buckets. Each object consists of the data itself, metadata, and a unique identifier (key).
4. What is S3 Versioning?
S3 Versioning is a feature that allows you to keep multiple versions of an object in the same bucket. It helps protect against accidental deletions and overwrites.
5. What are S3 Storage Classes?
S3 Storage Classes are different types of data storage offered by S3, optimized for different use cases such as frequent access, infrequent access, and archival storage.
6. How does S3 Replication work?
S3 Replication automatically replicates objects from one bucket to another within the same or different AWS region. It helps improve data durability and availability.
7. What is S3 Intelligent-Tiering?
S3 Intelligent-Tiering is a storage class that automatically moves data between frequent and infrequent access tiers based on changing access patterns, optimizing storage costs.
8. How can I optimize S3 performance?
You can optimize S3 performance by using Multipart Upload for large files, using logical prefixes in object keys, optimizing request rates, and integrating S3 with Amazon CloudFront for content delivery.
9. Is my data secure in S3?
Yes, Amazon S3 provides multiple security features, including IAM policies, bucket policies, ACLs, encryption at rest and in transit, and more to ensure data security.
10. Can I host a static website on S3?
Yes, you can host a static website on S3 by enabling static website hosting for your bucket and uploading your website files.
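For reference, here is a boto3 sketch of enabling static website hosting; bucket and document names are hypothetical, and the objects must be publicly readable (for example via a bucket policy) for the website endpoint to serve them.

import boto3

s3 = boto3.client("s3")

# index.html is served at the root; error.html for missing pages.
s3.put_bucket_website(
    Bucket="my-example-bucket-2024",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)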