AWS S3 Performance Optimization

Jaison
Hello there! Continuing our series, let's look at AWS S3 performance optimization with respect to uploads.

Single PUT upload

This is the default S3 upload method.

A file becomes an object and is uploaded with the PUT Object API, with all of the data sent to S3 in a single stream.

  • The problem: if the stream fails, the entire upload fails and has to restart from the beginning, wasting internet bandwidth and time.

  • By contrast, many download tools split a transfer across multiple streams

  • A single stream of data is not reliable when data is transferred over long distances

  • Speed and reliability are the limitations of a single stream of data.

When data is transferred between two points, the slowest link on the path limits the overall speed. For example, a 1 Gbps connection feeding into a 50 Mbps link transfers at no more than 50 Mbps.

Data transfer protocols like BitTorrent were developed for fast, distributed transfer of data.

A single PUT upload can transfer an object of at most 5 GB.

The solution, multipart upload, improves speed and reliability by splitting the data into individual parts.

Multipart Upload

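  • Multipart upload is recommended once an object reaches about 100 MB; this is a guideline rather than a hard minimum

  • A multipart upload can be split into a maximum of 10,000 parts, and each part can be between 5 MB and 5 GB in size

  • The last part holds whatever is left over and can be smaller than 5 MB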

  • Multipart upload is so effective because each part is treated as an individual upload.
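The limits in the list above can be turned into a small part-planning calculation. This is only a sketch: `plan_parts` and the 8 MiB default part size are hypothetical choices for illustration, and the constants use binary units (MiB, GiB) as an assumption about how the 5 MB and 5 GB limits are counted.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_PART_SIZE = 5 * MIB   # minimum size of every part except the last
MAX_PART_SIZE = 5 * GIB   # maximum size of any single part
MAX_PARTS = 10_000        # maximum number of parts in one upload


def plan_parts(object_size, part_size=8 * MIB):
    """Return (part_count, last_part_size) for a hypothetical upload plan."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    # Grow the part size if the requested one would need more than 10,000 parts.
    smallest_allowed = -(-object_size // MAX_PARTS)  # integer ceiling division
    part_size = max(part_size, smallest_allowed)
    if part_size > MAX_PART_SIZE:
        raise ValueError("object does not fit in 10,000 parts of 5 GiB")
    part_count = -(-object_size // part_size)
    last_part_size = object_size - (part_count - 1) * part_size
    return part_count, last_part_size


# A 100 MiB object in 8 MiB parts: 12 full parts plus a 4 MiB leftover.
print(plan_parts(100 * MIB))  # (13, 4194304)
```

Note how the leftover part is below the 5 MiB minimum, which is allowed only because it is the last one.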

If a part fails, only that part needs to be re-sent, which significantly reduces the risk.
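The idea of re-sending only the failed part can be sketched without touching S3 at all. In the simulation below, `upload_with_retries` and `make_flaky_uploader` are hypothetical helpers, and the fake uploader stands in for a real per-part upload call; it is not how any AWS SDK implements retries.

```python
def upload_with_retries(parts, upload_part, max_attempts=3):
    """Upload every part, re-sending only the parts that failed.

    `parts` maps part number -> bytes. `upload_part` is a caller-supplied
    callable that raises ConnectionError on failure.
    Returns a dict of part number -> attempt on which it succeeded.
    """
    attempts = {}
    pending = sorted(parts)
    for attempt in range(1, max_attempts + 1):
        failed = []
        for number in pending:
            attempts[number] = attempt
            try:
                upload_part(number, parts[number])
            except ConnectionError:
                failed.append(number)
        if not failed:
            return attempts
        pending = failed  # next round retries only what failed
    raise RuntimeError(f"parts still failing after {max_attempts} attempts: {pending}")


def make_flaky_uploader(fail_once):
    """Build a fake uploader that drops the first attempt of the given parts."""
    remaining = set(fail_once)
    sent = []

    def upload_part(number, data):
        if number in remaining:
            remaining.discard(number)
            raise ConnectionError(f"part {number} dropped")
        sent.append(number)

    return upload_part, sent


parts = {1: b"a" * 10, 2: b"b" * 10, 3: b"c" * 10}
uploader, sent = make_flaky_uploader(fail_once={2})
attempts = upload_with_retries(parts, uploader)
print(attempts)  # {1: 1, 2: 2, 3: 1}
```

Parts 1 and 3 are sent once; only part 2 is sent again, which is the saving compared with restarting a single-stream upload from the beginning.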

When parts are uploaded in parallel, the transfer rate of the entire upload is the sum of the transfer rates of the individual parts.
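As a rough back-of-the-envelope model, the combined rate can be estimated by adding up the per-part rates and capping the result at the slowest link on the path. The function name and all the numbers below are made up for illustration; real throughput depends on many more factors.

```python
def aggregate_rate(part_rates_mbps, bottleneck_mbps):
    """Estimate the total rate of parallel part uploads, capped by the slowest link."""
    return min(sum(part_rates_mbps), bottleneck_mbps)


# Four parallel parts at 20 Mbps each on a 100 Mbps link.
print(aggregate_rate([20, 20, 20, 20], bottleneck_mbps=100))  # 80

# Eight parallel parts would exceed the link, so the link becomes the limit.
print(aggregate_rate([20] * 8, bottleneck_mbps=100))  # 100
```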

S3 Transfer Acceleration

Distributed teams around the world use the public internet to upload data to a bucket in a given AWS region. We have no control over the path the data takes, and it can be an indirect one.

Transfer Acceleration uses the network of AWS edge locations.

The S3 bucket needs to be enabled for Transfer Acceleration.

By default, it's switched off. There are some restrictions on enabling it.

The bucket name must be DNS-compatible and cannot contain periods.
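The naming restriction is easy to check up front. The sketch below uses hypothetical helper names and assumes the general S3 naming rules (3 to 63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit) as the meaning of "DNS-compatible". The `s3-accelerate.amazonaws.com` hostname is the accelerate endpoint format as I understand it from the AWS documentation, so treat it as an assumption to verify.

```python
import re

# Lowercase letters, digits, and hyphens only; 3 to 63 characters;
# must start and end with a letter or digit. Periods are not allowed.
_ACCELERATE_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def can_use_transfer_acceleration(bucket_name):
    """Return True if the bucket name meets the restrictions described above."""
    return _ACCELERATE_NAME.fullmatch(bucket_name) is not None


def accelerate_endpoint(bucket_name):
    """Build the accelerate endpoint URL for an eligible bucket name."""
    if not can_use_transfer_acceleration(bucket_name):
        raise ValueError(f"{bucket_name!r} cannot be used with Transfer Acceleration")
    return f"https://{bucket_name}.s3-accelerate.amazonaws.com"


print(can_use_transfer_acceleration("my-team-uploads"))  # True
print(can_use_transfer_acceleration("my.team.uploads"))  # False
print(accelerate_endpoint("my-team-uploads"))
```

The check only covers the name; the bucket itself still has to have Transfer Acceleration switched on.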

With Transfer Acceleration, the data is transferred to the nearest AWS edge location, and from there it travels over the AWS global network, which tends to use more direct links.

The internet is a multipurpose public network built for flexibility and resilience, not for speed. The AWS network is built to connect region to region, so it is much faster and has lower latency (delay).
