Data Engineering — AWS Data & Analytics — Storage Classes in Amazon S3
Amazon S3 is an object storage service used in all kinds of applications, such as big data processing (building data lakes), cloud-native applications, and mobile apps. It offers high durability and high availability, and it encrypts data by default.
With its cost-effective storage classes and easy-to-manage features, S3 is used and recommended for almost every storage solution.
Key features
1. Amazon S3 can store any kind of data, whether it is structured, semi-structured, or unstructured.
2. The total volume of data and number of objects you can store in Amazon S3 are unlimited. Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB. The largest object that can be uploaded in a single PUT is 5 GB. For objects larger than 100 MB, customers should consider using the multipart upload capability. Multipart upload can speed up the transfer, although actual throughput also depends on the customer's network bandwidth.
3. Customers can use a few mechanisms for controlling access to Amazon S3 resources, including AWS Identity and Access Management (IAM) policies, bucket policies, access point policies, access control lists (ACLs), Query String Authentication, Amazon Virtual Private Cloud (Amazon VPC) endpoint policies, service control policies (SCPs) in AWS Organizations, and Amazon S3 Block Public Access.
4. By default, you can create 10,000 access points per Region per account on buckets in your account and cross-account. Unlike S3 buckets, there is no hard limit on the number of access points per AWS account. Visit AWS Service Quotas to request an increase in this quota.
For more details, please see the AWS S3 FAQ document — https://aws.amazon.com/s3/faqs/?nc=sn&loc=7
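The size limits above (5 GB per single PUT, 10,000 parts per multipart upload, and a 5 MB minimum part size) can be sketched as simple arithmetic. The helper below is an illustrative planning function, not part of any AWS SDK:

```python
import math

# S3 limits referenced above (in bytes)
MAX_SINGLE_PUT = 5 * 1024**3          # 5 GB: largest object in one PUT
MULTIPART_THRESHOLD = 100 * 1024**2   # 100 MB: AWS-recommended multipart cutoff
MIN_PART_SIZE = 5 * 1024**2           # 5 MB: minimum size for all parts but the last
MAX_PARTS = 10_000                    # maximum parts in one multipart upload

def plan_upload(object_size: int, part_size: int = 100 * 1024**2):
    """Return ('single', 1) or ('multipart', part_count) for a given object size."""
    if object_size <= MULTIPART_THRESHOLD:
        return ("single", 1)
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size is below the 5 MB S3 minimum")
    parts = math.ceil(object_size / part_size)
    if parts > MAX_PARTS:
        raise ValueError("increase the part size: S3 allows at most 10,000 parts")
    return ("multipart", parts)
```

For example, a 1 GB object uploaded with the default 100 MB part size needs 11 parts, while a 5 TB object would exceed the 10,000-part cap at 100 MB parts and needs larger parts.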
In this blog we will discuss different storage classes available in S3.
Amazon S3 offers a range of storage classes that you can choose from based on the data access, resiliency, and cost requirements of your workloads. S3 storage classes are purpose-built to provide the lowest cost storage for different access patterns. S3 storage classes are ideal for virtually any use case, including those with demanding performance needs, data residency requirements, unknown or changing access patterns, or archival storage. Each S3 storage class charges a fee to store data and fees to access data. In deciding which S3 storage class best fits your workload, consider the access patterns and retention time of your data to optimize for the lowest total cost over the lifetime of your data.
S3 Storage Classes can be configured at the object level and a single bucket can contain objects stored across all of the storage classes. You can also use S3 Lifecycle policies to automatically transition objects between storage classes without any application changes.
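A lifecycle transition like the one described above can be expressed as a configuration document. The sketch below uses the shape that boto3's put_bucket_lifecycle_configuration expects; the "logs/" prefix, the day counts, and the bucket name in the comment are illustrative assumptions, not recommendations:

```python
import json

# Minimal S3 Lifecycle configuration: move objects under "logs/" to
# Standard-IA after 30 days, to Glacier Flexible Retrieval after 90 days,
# and expire them after a year. All values here are illustrative.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-down-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

# With boto3 (not executed here), this would be applied as:
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=lifecycle_configuration,
#   )
print(json.dumps(lifecycle_configuration, indent=2))
```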
The Amazon S3 storage classes allow you to reduce the cost of data storage and choose the right tier based on your data's characteristics.
Amazon S3 Standard
1. For active, frequently accessed data.
2. Millisecond access
3. S3 Standard is designed for performance-sensitive use cases, such as data lakes, cloud-native applications, dynamic websites, content distribution, mobile and gaming applications, analytics, and machine learning models.
4. S3 Standard is designed for 99.99% availability in a given year and 99.999999999% (11 9s) durability of objects across multiple Availability Zones.
5. You can use S3 Lifecycle policies to control exactly when data is transitioned between S3 Standard and lower-cost storage classes without any application changes.
6. Data in S3 Standard is replicated across three or more Availability Zones.
Amazon S3 Intelligent Tiering
1. This storage class is a good fit for data with unknown or changing access patterns.
2. Millisecond access
3. Data is replicated across three or more Availability Zones.
4. It automatically moves data to the most cost-effective access tier based on access frequency, without performance impact, retrieval fees, or operational overhead.
5. S3 Intelligent-Tiering delivers millisecond latency and high throughput performance for frequently, infrequently, and rarely accessed data in the Frequent, Infrequent, and Archive Instant Access tiers.
6. For a small monthly object monitoring and automation charge, S3 Intelligent-Tiering monitors the access patterns and moves the objects automatically from one tier to another.
7. There are no retrieval charges in S3 Intelligent-Tiering, so you won't see unexpected increases in storage bills when access patterns change.
8. S3 Intelligent-Tiering can be used as the default storage class for virtually any workload, especially data lakes, data analytics, machine learning, new applications, and user-generated content.
9. There is no minimum object size for S3 Intelligent-Tiering, but objects smaller than 128 KB are not eligible for auto-tiering. These smaller objects may still be stored in S3 Intelligent-Tiering, but they are always charged at the Frequent Access tier rates and are not charged the monitoring and automation charge.
(See the link — https://aws.amazon.com/s3/faqs/?nc=sn&loc=7#S3_Intelligent-Tiering for more details)
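The 128 KB rule above can be summarized as a small decision function. This is an illustrative sketch of the billing behavior described in the FAQ, not an AWS API:

```python
AUTO_TIERING_MIN_SIZE = 128 * 1024  # 128 KB cutoff described above

def intelligent_tiering_billing(object_size: int) -> dict:
    """Sketch how the 128 KB rule affects an object in S3 Intelligent-Tiering."""
    eligible = object_size >= AUTO_TIERING_MIN_SIZE
    return {
        # Only objects of 128 KB or more are moved between tiers automatically.
        "auto_tiering": eligible,
        # Smaller objects are always billed at Frequent Access tier rates.
        "billed_tier": "varies" if eligible else "frequent_access",
        # Smaller objects are exempt from the monitoring and automation charge.
        "monitoring_charge": eligible,
    }
```

For instance, a 64 KB object stays at Frequent Access rates without a monitoring charge, while a 256 KB object is monitored and tiered automatically.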
Amazon S3 Standard — Infrequent Access
1. Amazon S3 Standard-Infrequent Access (S3 Standard-IA) is an Amazon S3 storage class for data that is accessed less frequently but requires rapid access when needed.
2. S3 Standard-IA offers the high durability, throughput, and low latency of the Amazon S3 Standard storage class, with a low per-GB storage price and per-GB retrieval charge. This combination of low cost and high performance makes S3 Standard-IA ideal for long-term storage, backups, and as a data store for disaster recovery.
3. The S3 Standard-IA storage class is set at the object level and can exist in the same bucket as the S3 Standard or S3 One Zone-IA storage classes, allowing you to use S3 Lifecycle policies to automatically transition objects between storage classes without any application changes.
Amazon S3 One Zone — Infrequent Access
1. S3 One Zone-IA is suitable for re-creatable, infrequently accessed data.
2. Data resides in a single AWS Availability Zone; if that Availability Zone is destroyed, the data stored in this storage class will be lost.
3. Same low-latency and high-throughput performance as S3 Standard.
4. S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes
Amazon S3 Glacier Instant Retrieval
1. Amazon S3 Glacier Instant Retrieval is an archive storage class that delivers the lowest-cost storage for long-lived data that is rarely accessed and requires retrieval in milliseconds.
2. With S3 Glacier Instant Retrieval, you can save up to 68% on storage costs compared to using the S3 Standard-Infrequent Access (S3 Standard-IA) storage class, when data is accessed once per quarter.
3. S3 Glacier Instant Retrieval delivers the fastest access to archive storage, with the same throughput and milliseconds access as the S3 Standard and S3 Standard-IA storage classes.
4. S3 Glacier Instant Retrieval is ideal for archive data that needs immediate access, such as medical images, news media assets, or user-generated content archives.
5. You can upload objects directly to S3 Glacier Instant Retrieval, or use S3 Lifecycle policies to transition data from the other S3 storage classes.
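A direct upload to S3 Glacier Instant Retrieval just sets a storage class on the PUT. The sketch below builds the keyword arguments in the shape boto3's put_object expects; the bucket name and key are hypothetical:

```python
# Keyword arguments for a direct upload to S3 Glacier Instant Retrieval,
# in the shape boto3's put_object expects.
put_object_kwargs = {
    "Bucket": "my-archive-bucket",   # hypothetical bucket name
    "Key": "images/scan-0001.dcm",   # hypothetical object key
    "Body": b"object bytes here",    # the object's content
    "StorageClass": "GLACIER_IR",    # S3 Glacier Instant Retrieval
}

# Applied with boto3 (not executed here):
#   boto3.client("s3").put_object(**put_object_kwargs)
```

Omitting StorageClass defaults the object to S3 Standard; the same pattern works with "STANDARD_IA", "ONEZONE_IA", "INTELLIGENT_TIERING", "GLACIER", or "DEEP_ARCHIVE".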
Amazon S3 Glacier Flexible Retrieval (Formerly S3 Glacier)
1. S3 Glacier Flexible Retrieval delivers low-cost storage, up to 10% lower cost (than S3 Glacier Instant Retrieval), for archive data that is accessed 1–2 times per year and is retrieved asynchronously.
2. For archive data that does not require immediate access but needs the flexibility to retrieve large sets of data at no cost, such as backup or disaster recovery use cases, S3 Glacier Flexible Retrieval (formerly S3 Glacier) is the ideal storage class.
3. S3 Glacier Flexible Retrieval delivers the most flexible retrieval options that balance cost with access times ranging from minutes to hours and with free bulk retrievals.
4. This is an ideal solution for backup, disaster recovery, and offsite data storage needs, and for when some data occasionally needs to be retrieved in minutes without worrying about costs.
5. S3 Glacier Flexible Retrieval is designed for 99.999999999% (11 9s) data durability and 99.99% availability in a given year, by redundantly storing data across multiple physically separated AWS Availability Zones.
6. It supports the S3 PUT API for direct uploads to S3 Glacier Flexible Retrieval, and S3 Lifecycle management for automatic migration of objects.
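Because retrieval from S3 Glacier Flexible Retrieval is asynchronous, you issue a restore request and pick a retrieval tier that trades cost for speed: Expedited (minutes), Standard (hours), or Bulk (free, longest). The sketch below uses the shape boto3's restore_object expects; the bucket and key in the comment are hypothetical:

```python
# Restore request for an object archived in S3 Glacier Flexible Retrieval,
# in the shape boto3's restore_object expects.
restore_request = {
    "Days": 7,  # how long to keep the temporary restored copy available
    "GlacierJobParameters": {"Tier": "Bulk"},  # free bulk retrieval tier
}

# Applied with boto3 (not executed here):
#   boto3.client("s3").restore_object(
#       Bucket="my-backup-bucket",    # hypothetical bucket name
#       Key="backups/2023-01.tar",    # hypothetical object key
#       RestoreRequest=restore_request,
#   )
```

Swapping "Bulk" for "Expedited" or "Standard" selects the faster, paid retrieval options mentioned above.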
Amazon S3 Glacier Deep Archive
1. S3 Glacier Deep Archive is Amazon S3’s lowest-cost storage class and supports long-term retention and digital preservation for data that may be accessed once or twice in a year.
2. This is designed for customers — particularly those in highly-regulated industries, such as financial services, healthcare, and public sectors — that retain data sets for 7–10 years or longer to meet regulatory compliance requirements.
3. S3 Glacier Deep Archive can also be used for backup and disaster recovery use cases, and is a cost-effective and easy-to-manage alternative to magnetic tape systems, whether they are on-premises libraries or off-premises services.
4. S3 Glacier Deep Archive complements S3 Glacier Flexible Retrieval, which is ideal for archives where data is regularly retrieved and some of the data may be needed in minutes.
5. All objects stored in S3 Glacier Deep Archive are replicated and stored across at least three geographically dispersed Availability Zones, are designed for 99.999999999% durability, and can be restored within 12 hours.
6. It supports the S3 PUT API for direct uploads to S3 Glacier Deep Archive, and S3 Lifecycle management for automatic migration of objects.
S3 on Outposts
1. Amazon S3 on Outposts delivers object storage to your on-premises AWS Outposts environment.
2. Using the S3 APIs and features available in AWS Regions today, S3 on Outposts makes it easy to store and retrieve data on your organization's Outpost, as well as secure the data, control access, tag, and report on it.
3. S3 on Outposts provides a single Amazon S3 storage class, named ‘OUTPOSTS’, which uses the S3 APIs, and is designed to durably and redundantly store data across multiple devices and servers on Outposts.
4. The S3 Outposts storage class is ideal for workloads with local data residency requirements, and to satisfy demanding performance needs by keeping data close to on-premises applications.
5. You can transfer data to AWS Regions using AWS DataSync.
6. It supports S3 Lifecycle expiration actions
More details on different storage classes of S3 can be found — https://aws.amazon.com/s3/storage-classes/