Simple Storage Service (S3)
Table of contents
Amazon Simple Storage Service (S3) lets you store and retrieve unlimited amounts of data from anywhere in the world at any time. You can use S3 to store any kind of file. Although AWS calls it storage for the internet
you can implement access controls and encryption to restrict access to your files to specific individuals and IP addresses. S3 opens up a variety of uses, both inside and outside of AWS. Many AWS services use S3 to store logs or retrieve data for processing, such as with analytics. You can even use S3 to host static websites!
Objects and Buckets
S3 differs from Elastic Block Store (EBS) The Core Compute Services. Rather than storing blocks of raw data, S3 stores files, or, as AWS calls them, objects on disks in AWS data centers. You can store any kind of fi le, including text, images, videos, database files, and so on. Each object can be up to 5 TB
in size. The file name of an object is called its key and can be up to 1,024 bytes
long.
An object key can consist of alphanumeric characters and some special characters, including the following:
! - _ . * ( )
S3 stores objects in a container called a bucket that’s essentially a flat file system. A bucket can store an unlimited number
of objects. When you create a bucket, you must assign it a name between 3 and 63 characters long, and the name must be globally unique across AWS. Within a bucket, each object key must be unique. One reason for this is to make it easier to access objects using an S3 endpoint URL.
For example, the object text.txt stored in a bucket named benpiper would have a URL of https://benpiper.s3.amazonaws.com/text.txt .
Although bucket names must be globally unique, each bucket—and by extension any objects in that bucket—can exist in only one region. This helps with latency, security, cost, and compliance requirements, as your data is stored only in the region in which you create the bucket. You can use cross-region replication to copy the contents of an object from a bucket in one region to a bucket in another, but the objects are still uniquely separate objects. AWS never moves objects between regions.
Even though S3 functions as a fl at fi le system, you can organize objects into a folder structure by including a forward slash ( / ) delimiter in the name. For example, you could create an object with the name production/database.sql and another object named development/database.sql in the same bucket. Because the delimiter is part of the name, the objects are unique.
As you might expect, S3 comes with a cost. You’re not charged for uploading data to S3, but you may be charged when you download data. For example, in the US East region, you can download up to 1 GB of data per month at no charge. Beyond that, the rate is $0.09 or less per gigabyte. S3 also charges you a monthly fee based on how much data you store and the storage class you use.
S3 Storage Classes
All data is not created equal. Depending on its importance, certain fi les may require more or less availability or protection against loss than others. Some data you store in S3 is irreplaceable, and losing it would be catastrophic. Examples of this might include digital photos, encryption keys, and medical records.
Durability and Availability
Objects that need to remain intact and free from inadvertent deletion or corruption are said to need high durability, which is the percent likelihood that an object will not be lost over the course of a year. The greater the durability of the storage medium, the less likely you are to lose an object.
To understand how durability works, consider the following two examples. First, suppose you store 100,000,000,000 objects on storage with 99.999999999 percent durability. That means you could expect to lose only one (0.000000001 percent) of those objects over the course of a year! Now consider a different example. Suppose you store the same number of objects on storage with only 99.99 percent durability. You’d stand to lose 10,000,000 objects per year!
Different data also has differing availability requirements. Availability is the percent of time an object will be available for retrieval. For instance, a patient’s medical records may need to be available around the clock, 365 days a year. Such records would need a high degree of both durability and availability.
The level of durability and availability of an object depends on its storage class. S3 offers six different storage classes at different price points. S3 charges you a monthly storage cost based on the amount of data you store and the storage class you use. Table 8.1 gives a complete listing of storage classes.
S3 Storage Classes
With one exception that we’ll cover in the following section, you’ll want the highest degree of durability available. Your choice of storage class then hinges upon your desired level of availability, which depends on how frequently you’ll need to access your files. You may find it helpful to categorize your files in terms of the following three access patterns:
- Frequently accessed objects
- Infrequently accessed objects
- A mixture of frequently and infrequently accessed objects
Storage Classes for Frequently Accessed Objects
If you need to access objects frequently and with minimal latency, the following two storage classes fit the bill:
STANDARD
This is the default storage class. It offers the highest levels of durability and availability, and your objects are always replicated across at least three Availability Zones in a region.
REDUCED_REDUNDANCY
The REDUCED_REDUNDANCY (RRS) storage class is meant for data that can be easily replaced, if it needs to be replaced at all. It has the lowest durability of all the classes, but it has the same availability as STANDARD. AWS recommends against using this storage class but keeps it available for people who have processes that still depend on it. If you see anyone using it, do them a favor and tell them to move to a storage class with higher durability!
Storage Classes for Infrequently Accessed Objects
Two of the storage classes designed for infrequently accessed objects are suffixed with the “IA” initialism for “infrequent access.” These IA classes offer millisecond-latency access and high durability but the lowest availability of all the classes. They’re designed for objects that are at least 128 KB in size. You can store smaller objects, but each will be billed as if it were 128 KB:
STANDARD_IA This class is designed for important data that can’t be re-created. Objects are stored in multiple Availability Zones and have an availability of 99.9 percent.
ONEZONE_IA
Objects stored using this storage class are kept in only one Availability Zone and consequently have the lowest availability of all the classes: only 99.5 percent. An outage of one Availability Zone could affect availability of objects stored in that zone. Although unlikely, the destruction of an Availability Zone could result in the loss of objects stored in that zone. Use this class only for data that you can re-create or have replicated elsewhere.
GLACIER
The GLACIER class is designed for long-term archiving of objects that rarely need to be retrieved. Objects in this storage class are stored using the S3 Glacier service, which you’ll read about later in this chapter. Unlike the other storage classes, you can’t retrieve an object in real time. Instead, you must initiate a restore request for the object and wait until the restore is complete. The time it takes to complete a restore depends on the retrieval option you choose and can range from 1 minute to 12 hours. Consequently, the availability of data stored in Glacier varies. Refer to the “S3 Glacier” section later in this chapter for information on retrieval options.
Storage Class for Both Frequently and Infrequently Accessed Objects
S3 currently offers only one storage class designed for both frequently and infrequently accessed objects:
INTELLIGENT_TIERING
This storage class automatically moves objects to the most cost-effective storage tier based on past access patterns. An object that hasn’t been accessed for 30 consecutive days is moved to the lower-cost infrequent access tier. Once the object is accessed, it’s moved back to the frequent access tier. Note that objects less than 128 KB are always charged at the higher-cost frequent access tier rate. In addition to storage pricing, you’re charged a monthly monitoring and automation fee.
Access Permissions
S3 is storage for the internet, but that doesn’t mean everyone on the internet can read your data. By default, objects you put in S3 are inaccessible to anyone outside of your AWS account.
S3 offers the following three methods of controlling who may read, write, or delete objects stored in your S3 buckets:
- Bucket policies
- User policies
- Bucket and object access control lists
Bucket Policies
A bucket policy is a resource-based policy that you apply to a bucket. You can use bucket policies to grant access to all objects within a bucket or just specific objects. You can also control which principals and accounts can read, write, or delete objects. You can also grant anonymous read access to make an object, such as a webpage or image, available to everyone on the internet.
User Policies
Identity and Access Management (IAM) user policies. You can use these policies to grant IAM principals access to S3 objects. Unlike bucket policies that you apply to a bucket, you can apply user policies only to an IAM principal. Keep in mind that you can’t use user policies to grant public (anonymous) access to an object.
Bucket and Object Access Control Lists
Bucket and object access control lists (ACLs) are legacy access control methods that have mostly been superseded by bucket and user policies. Nevertheless, you can still use bucket and
object ACLs to grant other AWS accounts and anonymous users access to your S3 resources. You can’t use ACLs to grant access to specifi c IAM principals. Due in part to this limitation, AWS recommends using bucket and user policies instead of ACLs whenever possible.
You can use any combination of bucket policies, user policies, and access control lists. They’re not mutually exclusive.
Encryption
S3 doesn’t change the contents of an object when you upload it. That means if you upload a document containing personal information, that document is stored unencrypted. Using appropriate access permissions can protect your data from unauthorized access, but to add an additional layer of security, you have the option of encrypting objects before storing them in S3. This is called encryption at rest . S3 gives you the following two options for encrypting objects at rest:
■✓ Server-side encryption—When you create an object, S3 encrypts the object and saves only the encrypted content. When you retrieve the object, S3 decrypts it and delivers the unencrypted object. Server-side encryption is the easiest to implement and doesn’t require you to keep track of encryption keys. Amazon manages the keys and therefore has access to your objects.
■✓ Client-side encryption—You encrypt the data prior to uploading it to S3. You must decrypt the object when you retrieve it from S3. This option is more complicated, as you’re responsible for encryption and decryption. If you lose the key used to encrypt an object, you won’t be able to decrypt it. Organizations with strict security requirements may choose this option to ensure Amazon doesn’t have the ability to read their encrypted objects.
Versioning
To further protect against accidentally deleting or overwriting the contents of your important fi les, you can use versioning. To understand how versioning works, consider this example. Without versioning, if you upload an object with the same name as an existing object in the same bucket, the contents of the original object will get overwritten. But if you enable versioning on the bucket and then upload an object with the same name as an existing object, S3 will simply create a new version of that object. The original version will remain intact and available.
If you delete an object in a bucket on which versioning is disabled, the contents of the object aren’t deleted. Instead, S3 adds a delete marker to the object and hides it from the S3 service console view.
Versioning is disabled by default when you create a bucket. You must explicitly enable versioning on a bucket to use it, and it applies to all objects in the bucket. There’s no limit to the number of versions of an object you can store. You can delete versions manually or automatically using object life cycle confi gurations.
AWS Storage Classes
- S3 storage classes are used to assist the concurrent loss of data in one or two facilities.
- S3 storage classes maintain the integrity of the data using checksums.
- S3 provides lifecycle management for the automatic migration of objects for cost savings.
S3 contains four types of storage classes:
- S3 Standard
- S3 Standard IA
- S3 one zone-infrequent access
- S3 Glacier
S3 Standard
- Standard storage class stores the data redundantly across multiple devices in multiple facilities.
- It is designed to sustain the loss of 2 facilities concurrently.
- Standard is a default storage class if none of the storage class is specified during upload.
- It provides low latency and high throughput performance.
- It designed for 99.99% availability and 99.999999999% durability
S3 Standard IA
- IA stands for infrequently accessed.
- Standard IA storage class is used when data is accessed less frequently but requires rapid access when needed.
- It has a lower fee than S3, but you will be charged for a retrieval fee.
- It is designed to sustain the loss of 2 facilities concurrently.
- It is mainly used for larger objects greater than 128 KB kept for atleast 30 days.
- It provides low latency and high throughput performance.
- It designed for 99.99% availability and 99.999999999% durability
S3 one zone-infrequent access
- S3 one zone-infrequent access storage class is used when data is accessed less frequently but requires rapid access when needed.
- It stores the data in a single availability zone while other storage classes store the data in a minimum of three availability zones. Due to this reason, its cost is 20% less than Standard IA storage class.
- It is an optimal choice for the less frequently accessed data but does not require the availability of Standard or Standard IA storage class.
- It is a good choice for storing the backup data.
- It is cost-effective storage which is replicated from other AWS region using S3 Cross Region replication.
- It has the same durability, high performance, and low latency, with a low storage price and low retrieval fee.
- It designed for 99.5% availability and 99.999999999% durability of objects in a single availability zone.
- It provides lifecycle management for the automatic migration of objects to other S3 storage classes.
- The data can be lost at the time of the destruction of an availability zone as it stores the data in a single availability zone.
S3 Glacier
- S3 Glacier storage class is the cheapest storage class, but it can be used for archive only.
- You can store any amount of data at a lower cost than other storage classes.
- S3 Glacier provides three types of models:
- Expedited: In this model, data is stored for a few minutes, and it has a very higher fee.
- Standard: The retrieval time of the standard model is 3 to 5 hours.
- Bulk: The retrieval time of the bulk model is 5 to 12 hours.
- You can upload the objects directly to the S3 Glacier.
- It is designed for 99.999999999% durability of objects across multiple availability zones.
Performance across the Storage classes
S3 Standard | S3 Standard IA | S3 One Zone-IA | S3 Glacier | |
---|---|---|---|---|
Designed for durability | 99.99999999% | 99.99999999% | 99.99999999% | 99.99999999% |
Designed for availability | 99.99% | 99.99% | 99.5% | N/A |
Availability SLA | 99.9% | 99% | 99% | N/A |
Availability zones | >=3 | >=3 | 1 | >=3 |
Minimum capacity charge per object | N/A | 128KB | 128KB | 40KB |
Minimum storage duration charge | N/A | 30 days | 30 days | 90 days |
Retrieval fee | N/A | per GB retrieved | per GB retrieved | per GB retrieved |
First byte latency | milliseconds | milliseconds | milliseconds | Select minutes or hours |
Storage type | Object | Object | Object | Object |
Lifecycle transitions | Yes | Yes | Yes | Yes |