This article will help you understand important concepts of AWS Identity and Access Management (IAM) and AWS S3. It will be helpful for users who are:
- Preparing for the AWS Certified Solutions Architect exam
- Already using IAM and S3 but want to understand the features available
- Looking to dive deeper into S3
Identity and Access Management (IAM) consists of the following:
- Users
- Groups
- Roles
- Policies
We apply policies to our users, groups, and roles. If we apply a policy to a group, the users within that group inherit that policy automatically.
We can't really use AWS without knowing IAM, so it is important to understand the core concepts of IAM and what users, groups, roles, and policies are.
Some important points for Identity and Access Management (IAM):
- IAM is universal
- The root account has complete administrator access
- New users have ‘NO’ permissions when first created. Whenever we create a new user, that user has no rights or privileges until we grant them some.
- An Access Key ID and Secret Access Key are assigned to new users when they are first created (see the sketch after this list)
- These are not the same as a password
- You cannot use the Access Key ID and Secret Access Key to log into the console
- You use them to access AWS via the APIs and the command line, however
- You only get to view your Access Key ID and Secret Access Key once
- If you lose them, you have to regenerate them
- So make sure you save them in a secure location
- Always set up multi-factor authentication on your AWS account
- On your root account, you can also create and customize your own password rotation policies
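To make this concrete, here is a minimal boto3 sketch of creating a user, granting permissions through a group, and generating the access keys discussed above. The user and group names are hypothetical, and your credentials and region are assumed to be configured already.

```python
import boto3

iam = boto3.client("iam")

# New users start with NO permissions.
iam.create_user(UserName="alice")  # hypothetical user name

# Grant permissions by adding the user to a group; the user inherits
# whatever policies are attached to that group.
iam.add_user_to_group(GroupName="developers", UserName="alice")

# The Access Key ID and Secret Access Key are returned only once,
# right here; store them securely, they cannot be viewed again.
resp = iam.create_access_key(UserName="alice")
print(resp["AccessKey"]["AccessKeyId"])
print(resp["AccessKey"]["SecretAccessKey"])
```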
Important points for AWS S3
- S3 is object-based. It allows you to upload files, and those files can be anywhere from zero bytes all the way up to five terabytes
- An S3 bucket has unlimited storage, and it's basically just a folder in the cloud
- S3 is a universal namespace, so your bucket names must be globally unique
- S3 is object-based storage, so it's not suitable for installing an operating system on, or for running a database, or anything like that
- When you upload objects or files to S3, your browser gets an HTTP 200 status code if the upload has been successful
- By default, all newly created buckets are private
- You set up access control to your bucket using bucket policies
- You can also use access control lists (ACLs), and these can go down to the individual files or objects in your bucket
- S3 buckets can be configured to create access logs, and these can be sent to another bucket in the same AWS account or even to a bucket in another AWS account (an example follows)
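Here is a minimal sketch of turning on access logging. Both bucket names are hypothetical, and the target bucket is assumed to already grant the S3 log delivery service permission to write to it.

```python
import boto3

s3 = boto3.client("s3")

# Send access logs for my-app-bucket to my-log-bucket under a prefix.
s3.put_bucket_logging(
    Bucket="my-app-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-log-bucket",
            "TargetPrefix": "access-logs/",
        }
    },
)
```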
Key fundamentals of S3
- Key: So basically for S3, you've got a key, and this is simply the name of the object
- Value: You've then got a value, which is the data itself
- So sometimes people refer to S3 as a key-value store
- Version ID: We then have the version ID, which we see when we turn on versioning
- Metadata: We have metadata, i.e. data about the data we're storing, and we add it through tags
- Subresources: S3 has subresources such as access control lists and torrent (the sketch below shows the key, value, metadata, and version ID in practice)
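A minimal upload sketch ties these fundamentals together. The bucket name is hypothetical, and versioning is assumed to be enabled on it.

```python
import boto3

s3 = boto3.client("s3")

resp = s3.put_object(
    Bucket="my-versioned-bucket",
    Key="reports/2021/summary.txt",       # key: the name of the object
    Body=b"hello, S3",                    # value: the data itself
    Metadata={"department": "finance"},   # metadata: data about the data
)

# The version ID is only returned once versioning is turned on.
print(resp.get("VersionId"))
```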
S3 Consistency model
- Read-after-write consistency for PUTs of new objects
- Eventual consistency for overwrite PUTs and DELETEs
- What does that mean in everyday language?
- If you put a new object into S3, you're going to be able to read it immediately
- If you overwrite or delete an object in S3 and read it immediately afterwards, you could get the old version or you could get the new version
- So you will eventually get the right version, but sometimes it takes time to propagate
Different Amazon S3 storage classes:
S3 Standard:
- 99.99% Availability
- 99.999999999% (11 nines) Durability
- This is what most people use
- Data is stored redundantly across multiple devices in multiple facilities, and is designed to sustain the loss of 2 facilities concurrently
S3 Standard-IA:
- Infrequently Accessed
- It is the best choice for data that is accessed less frequently but requires rapid access when needed
- It offers the same sort of durability and availability as S3 Standard
- Its storage cost is lower than S3 Standard, but you are charged a retrieval fee
S3 Intelligent Tiering:
- S3 Intelligent-Tiering uses machine learning to move your objects between access tiers depending on how often you use them
- It is designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead
Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
- This is useful where you want a lower-cost option for infrequently accessed data but do not require the multiple-Availability-Zone data resilience
- It's only going to be in one Availability Zone, so you have to plan for that
- But it comes at a much lower cost
Amazon S3 Glacier:
- It is for data archival, and you can configure your retrieval time from minutes to hours
- Glacier is a secure, durable, and low-cost storage class for data archiving
S3 Glacier Deep Archive:
- This is basically Amazon’s lowest cost storage class
- But you will have a retrieval time of 12 hours
Arrange all S3 storage classes from highest to lowest price:
- S3 Standard (most expensive)
- S3 Standard-IA
- S3 Intelligent-Tiering
- S3 One Zone-IA
- S3 Glacier
- S3 Glacier Deep Archive (lowest cost)
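The storage class is chosen per object when you upload it. Below is a minimal sketch; the bucket and key names are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="my-archive-bucket",
    Key="backups/2021-01.tar.gz",
    Body=b"backup bytes",
    # One of: STANDARD, STANDARD_IA, INTELLIGENT_TIERING,
    # ONEZONE_IA, GLACIER, DEEP_ARCHIVE
    StorageClass="STANDARD_IA",
)
```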
Encryption in S3
- Encryption in transit is achieved by using SSL/TLS
- All the traffic, all the files that you're uploading, is encrypted on the way to S3
- S3 also has encryption at rest, and this is achieved by both server-side encryption and client-side encryption
- On the server side, it's achieved in three different ways (examples follow this list):
- S3 managed keys (SSE-S3): This is where S3 just handles all our encryption for us. We don't have to worry about it
- AWS Key Management Service (SSE-KMS): Encryption using keys managed in KMS
- Server-side encryption with customer-provided keys (SSE-C)
- Client-side encryption
- This is where you encrypt the objects yourself and then upload them to S3
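Here is a minimal sketch of the first two server-side options (SSE-C would additionally require passing your own key with every request). The bucket, keys, and KMS key alias are hypothetical placeholders.

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: S3 manages the encryption keys for us.
s3.put_object(Bucket="my-bucket", Key="a.txt", Body=b"data",
              ServerSideEncryption="AES256")

# SSE-KMS: encrypt with a key managed in AWS KMS.
s3.put_object(Bucket="my-bucket", Key="b.txt", Body=b"data",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-app-key")
```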
AWS Key Management Service (SSE – KMS) limits
- If you are using SSE KMS to encrypt your objects in S3, you have to keep in mind the KMS limits
- Uploading and downloading will count towards the KMS quota
- The quota is region-specific; it is either 5,500, 10,000, or 30,000 requests per second
S3 Object Lock
- We use this to store objects using a write once, read many (WORM) model
- Object locks can be applied to individual objects or across the bucket as a whole
- Object locks come in two modes
- Governance mode
- Compliance mode
- In governance mode, users can't overwrite or delete an object version or alter its lock settings unless they have special permissions
- In compliance mode, however, a protected object's versions can't be overwritten or deleted by any user, including the root user of your AWS account (a configuration sketch follows)
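A minimal configuration sketch, assuming the bucket was created with Object Lock enabled; the bucket name and retention period are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Apply a default WORM retention rule across the whole bucket.
s3.put_object_lock_configuration(
    Bucket="my-worm-bucket",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {
                "Mode": "COMPLIANCE",  # or "GOVERNANCE"
                "Days": 365,
            }
        },
    },
)
```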
S3 Glacier Vault Lock
- This allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults
- So it's very similar to S3 Object Lock, except it's applied across Glacier
- You can specify controls such as a write once, read many (WORM) model in a vault lock policy and lock the policy from future edits
- Once locked, the policy can no longer be changed
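Locking a vault is a two-step process: initiate the lock, then complete it within 24 hours, after which the policy is immutable. Below is a minimal sketch; the vault name, account ID, and policy are hypothetical.

```python
import boto3
import json

glacier = boto3.client("glacier")

# A WORM-style policy: deny deletes until archives are a year old.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "deny-deletes-for-365-days",
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/my-vault",
        "Condition": {"NumericLessThan": {"glacier:ArchiveAgeInDays": "365"}},
    }],
}

# Step 1: attach the policy in an in-progress state.
resp = glacier.initiate_vault_lock(
    accountId="-",
    vaultName="my-vault",
    policy={"Policy": json.dumps(policy)},
)

# Step 2: complete the lock; the policy can no longer be changed.
glacier.complete_vault_lock(accountId="-", vaultName="my-vault",
                            lockId=resp["lockId"])
```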
To increase S3 performance
- Use multipart uploads to increase the performance when uploading files to S3
- As a rule, you should probably do this for any files over 100 MB, and it must be used for any files over 5 GB
- Use S3 byte-range fetches to increase performance when downloading files from S3 (both are sketched below)
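A minimal sketch of both techniques; the bucket, key, and file names are hypothetical.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Multipart upload: upload_file switches to multipart automatically
# once the file size crosses the threshold (here 100 MB).
config = TransferConfig(multipart_threshold=100 * 1024 * 1024)
s3.upload_file("big-video.mp4", "my-bucket", "videos/big-video.mp4",
               Config=config)

# Byte-range fetch: download only the first megabyte of the object.
resp = s3.get_object(Bucket="my-bucket", Key="videos/big-video.mp4",
                     Range="bytes=0-1048575")
first_mb = resp["Body"].read()
```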
AWS S3 Select
- This is used to retrieve only a subset of data from an object using simple SQL expressions
- You can get data by rows or columns using simple SQL expressions (see the example below)
- It helps save money on data transfer because you only retrieve the data you need, and that's also why it increases speed
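A minimal sketch of querying a CSV object with S3 Select. The bucket, key, and column names are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-data-bucket",
    Key="sales/2021.csv",
    ExpressionType="SQL",
    Expression="SELECT s.region, s.amount FROM s3object s "
               "WHERE CAST(s.amount AS INT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry matching rows.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```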
Some best practices with AWS Organizations
- Always remember to enable multi-factor authentication on the root or master account
- Always use strong and complex passwords on that root account
- The paying account should be used for billing purposes only. Do not deploy resources into the paying account
- Enable/disable AWS services using Service Control Policies (SCPs), either on organizational units or on individual accounts (a sketch follows)
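A minimal sketch of the SCP point: create a policy and attach it to an organizational unit. The policy content, names, and OU ID are hypothetical.

```python
import boto3
import json

org = boto3.client("organizations")

# An SCP that blocks all EC2 actions for whatever it is attached to.
scp = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": "ec2:*", "Resource": "*"}],
}

resp = org.create_policy(
    Name="DenyEC2",
    Description="Block EC2 usage",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp),
)

# Attach the SCP to an organizational unit (or an individual account).
org.attach_policy(PolicyId=resp["Policy"]["PolicySummary"]["Id"],
                  TargetId="ou-abcd-12345678")
```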
Three different ways to share S3 buckets across accounts
- We can do this using bucket policies and IAM. Because it's a bucket policy, it applies across the entire bucket, and it's programmatic access only (see the example below)
- We can do this using bucket ACLs and IAM. Because we're using ACLs, this works on individual objects, and again it's programmatic access only
- We can use cross-account IAM roles, which allow both programmatic and console access
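For the first option, here is a minimal bucket-policy sketch granting another account programmatic read access. The account ID and bucket name are hypothetical.

```python
import boto3
import json

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "cross-account-read",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-shared-bucket",     # for ListBucket
            "arn:aws:s3:::my-shared-bucket/*",   # for GetObject
        ],
    }],
}

s3.put_bucket_policy(Bucket="my-shared-bucket", Policy=json.dumps(policy))
```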
AWS S3 Cross Region Replication
- It’s a way of replicating objects across regions, but you can also replicate them within the same region
- So in order for cross region replication to work, you need versioning to be enabled on both the source and the destination buckets
- Files already in a bucket are not replicated automatically. If you turn replication on for an existing bucket, the objects that are already there will not be copied over
- However, all subsequently uploaded or updated files will be replicated automatically
- Remember that delete markers are not replicated. So if you delete an object in one bucket, it's not going to be deleted in the other, and deleting individual versions or delete markers will also not be replicated
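A minimal setup sketch: enable versioning on both buckets, then configure replication on the source. Bucket names, the role ARN, and the rule ID are hypothetical, and the IAM role is assumed to already have replication permissions.

```python
import boto3

s3 = boto3.client("s3")

# Versioning must be enabled on BOTH the source and destination buckets.
for bucket in ("source-bucket", "destination-bucket"):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Prefix": "",  # empty prefix: replicate all new objects
            "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
        }],
    },
)
```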
AWS S3 Lifecycle policies
- Basically it automates moving objects between the different storage tiers.
- They can be used in conjunction with versioning and they can be applied to current versions as well as previous versions.
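A minimal lifecycle sketch: move current versions down the tiers over time and expire old versions. The bucket name and the timings are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-down-and-clean-up",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to the whole bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Applies to previous versions when versioning is enabled.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
        }],
    },
)
```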
AWS S3 transfer acceleration
- We have users all around the world, and we have edge locations. Our users upload their files to the nearest edge location first, and then those files get uploaded to S3
- So if you need to increase file upload performance for your users, look at S3 Transfer Acceleration (enabled as sketched below)
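Enabling it is a single call; the bucket name is hypothetical. Once enabled, clients upload via the bucket's s3-accelerate endpoint.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_accelerate_configuration(
    Bucket="my-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Clients then use the accelerated endpoint:
# https://my-bucket.s3-accelerate.amazonaws.com
```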
AWS DataSync
- This is used to move large amounts of data from on-premises to AWS
- It is used with NFS and SMB compatible file systems
- Replication can be done either hourly, daily or weekly and all you need to do is install the DataSync agent to start the replication
AWS Snowball
- It is a physical appliance, basically a big disk, that you can use to move your data into and out of the AWS cloud, and Snowball data can be imported into S3
- You can also use Snowball to move large amounts of data out of S3
Storage Gateway
- We have different types of storage gateway
- File Gateway
- This is used for flat files that are stored directly on S3 and accessed via NFS
- Volume Gateway
- We have two different types of volume gateways
- Stored volumes
- This means that your entire data set is stored on site and is asynchronously backed up to S3
- Cached Volumes
- Entire data set is stored on S3 and the most frequently accessed data is stored on site
- Gateway Virtual Tape Library
- This is used for backups and uses popular backup applications like NetBackUp, Backup Exec, Veeam, etc.
Athena Tips
- It is an interactive query service
- Athena allows you to query data located in S3 using standard SQL
- It is serverless
- Athena is commonly used to analyze log data stored in S3 (see the sketch below)
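A minimal sketch of running an Athena query from code. The database, table, and results bucket are hypothetical, and the table is assumed to already exist in the catalog.

```python
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

# Queries run asynchronously; poll with the execution ID for results.
print(resp["QueryExecutionId"])
```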
Macie Tips
- Macie uses AI to analyze data in S3 and helps identify personally identifiable information (PII)
- It can also be used to analyze CloudTrail logs for suspicious API activity
- Macie includes dashboards, reports, and alerting
- It is great for PCI-DSS compliance and preventing identity theft
- So Athena is used for running SQL queries, while Macie is used as a security service to look for PII
Would you like to understand the basics of cloud computing? Click here