9. S3 Version Control
Okay, so here I am in the AWS console. Now, before we get started, I just want you to go over to Services, go down to Storage, and click on Glacier. Glacier is obviously used for data archiving. It’s not available in every region, however.
So if you click on Northern Virginia, you can see here that it’s not available for Singapore or South America. So if you are in either of those regions, please change over to somewhere where it is available. Otherwise, you won’t see the same screens as the rest of us. Now, if we click over to Services, we’ll go down to S3. I’m going to go ahead and create a new bucket, and we’ll just call this “a cloud guru lifecycle policy.” Let’s see if someone has already taken this name. Go ahead and hit “next.” No, no one has, and that’s great. Click on versioning.
I’m going to enable versioning for this. You can turn versioning on or off for this lab; it’s entirely up to you. I’m just going to have it on so that you get the full, complete experience. Go ahead and hit “next.” I’m going to leave my permissions as the default, so we won’t give anyone else any access. Go ahead and hit “create.” So that has now created our “a cloud guru lifecycle policy” bucket. Now we’ve got no objects in the bucket. What we might want to do is just go ahead and create the policies before we upload our objects. So click into Management.
We’ll then click on Lifecycle. It says here to use lifecycle rules to manage your objects. You can manage an object’s lifecycle by using a lifecycle rule, which defines how Amazon S3 manages objects during their lifetime. Basically, it’s a way to automate the transition to tiered storage. So you can move objects from S3 Standard to S3 Standard-Infrequent Access and then on to Glacier, or you can send them straight to Glacier. You can also use a lifecycle rule to automatically expire objects based on your retention period.
So it really depends on what you’re using it for. But a lot of people do use lifecycle rules. Okay, so we’ll enter a rule name. We’ll just call it “my lifecycle rule,” if I could type it. It’s a very creative name. We can add filters or tags here so that this lifecycle rule only applies to objects with those tags or with those particular names. I’m not going to do that. Let’s move on to the next step. You can now switch between current and previous versions. So let’s start with the current versions. We’re going to add a transition in here, so we need to select the transition.
So I’m going to transition the current version of my object to standard, infrequently accessed storage after 30 days. So 30 days after it’s created, it’s moved to a different storage class: Standard-Infrequent Access. Now, interestingly, the object itself has to be over 128 KB in size in order to be allowed to transition. I don’t know why that is, but that’s just the rule in here. Then we’ve got another transition: to Amazon Glacier 60 days after the object is created. So we put our object in S3; after 30 days, it moves to Standard-Infrequent Access; and 60 days after it was created, which is another 30 days after it’s been sitting in Standard-Infrequent Access storage, it moves to Amazon Glacier. Here we can do the same with previous versions. So we can add a transition. With previous versions, the language is a little bit tricky: the days are counted from the day the object becomes a previous version, which is the day a newer version of it is uploaded.
So when do we want it to move over? With previous versions, we could be a little bit cautious and move them to infrequently accessed storage after 30 days. We can then do exactly the same and move them down to Glacier after 60 days. So we’re always going to have a copy of our current version and the previous version in infrequently accessed storage on a 30-day cycle. And then, after two months, or after 60 days, it’s going to move over to Glacier. So let’s move on to the next step. We can configure expiration as well. So when do you want the current version or the previous version of the object to expire? You could say the current version should expire 425 days from object creation, and you could say the same for previous versions. I don’t know why it defaults to 425; it’s an interesting number. In here, it says you cannot enable clean-up of expired object delete markers if you enable expiration, so just bear that in mind. Go ahead and hit “next.” And that’s it. You can see your lifecycle rules here. This is a new UI; the old UI used to create this really awesome graphic showing you exactly how it all works, but you can understand it just by reading it here. Go ahead and hit “save,” and that’s it. That’s the lifecycle rule. It’s not very difficult, but there are a few hints and tips that you should really keep in mind.
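The console steps above can also be expressed in code. Here is a minimal sketch of the same lifecycle rule as the configuration dict that boto3’s `put_bucket_lifecycle_configuration()` accepts; building the dict makes no AWS calls, and the rule name and values are the ones from this lab.

```python
# Lifecycle rule sketch: current versions move to Standard-IA at 30 days and
# Glacier at 60; previous versions follow the same cycle, counted from the
# day the object *became* a previous version; both expire at the console's
# default of 425 days.
lifecycle_config = {
    "Rules": [
        {
            "ID": "my lifecycle rule",
            "Status": "Enabled",
            "Filter": {},  # empty filter = apply to every object in the bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 60, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "STANDARD_IA"},
                {"NoncurrentDays": 60, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 425},
            "NoncurrentVersionExpiration": {"NoncurrentDays": 425},
        }
    ]
}

# To actually apply it you would need credentials and a call like:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="a-cloud-guru-lifecycle-policy",  # placeholder bucket name
#     LifecycleConfiguration=lifecycle_config,
# )
```

Note that the API enforces the same 128 KB minimum size for Standard-IA transitions that the console mentions.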
10. Cross Region Replication
Let’s get started. What is a CDN? Well, a CDN is a content delivery network: a system of distributed servers that deliver web pages and other web content to a user based on the geographic location of that user, the origin of the web page, and a content delivery server. So what does that actually mean? Well, let’s take this world map, for example, and let’s assume that I’m running a website out of the UK and that I’m serving this website to users all over the world. So here are my users. Now, when the users go to access this web page, they’re making the request to my web server, which is in the UK, and you’re going to get different latencies. So people in Australia are going to have a much longer latency than, say, someone in India, someone in the United States, or even someone in South America. And don’t forget that internet backbones don’t run in direct, straight lines to individual countries. They go across undersea cables, which add even more latency.
So it used to be that South Africa would have terrible latency, but they’ve run internet backbones that make it a lot quicker now to connect to the UK. So this is how it works without a CDN: we’ve got our users around the world, they’re trying to access a web page or maybe a movie file in the UK, and they’re having to go directly to the UK, pull that file down, and then bring it back to their home country. So let’s go over some key CloudFront terminology. The first thing you have to know about CloudFront is what an edge location is. We talked about this in one of the previous sections, but an edge location is the location where content will be cached. Now, this is separate from an AWS region or availability zone. So there are edge locations all around the world at the moment, and they’ll be in areas where there might not be a region, but there will be an edge location. And you can see where the edge locations are by going to the AWS website. I think there are over 50 currently. Now, what’s an origin? Well, an origin is the source of all the files that the CDN is going to distribute. So in our case, the origin was that web server in the UK.
Now, in terms of AWS, an origin can be an S3 bucket, an EC2 instance, an Elastic Load Balancer, or even Route 53. But in the most common cases, it’s going to be an S3 bucket, an EC2 instance, or perhaps an Elastic Load Balancer with EC2 instances behind it. So our origin is simply where our original files are. Now, what’s a distribution? Well, this is the name that’s given to the CDN, which consists of a collection of edge locations. So when we create a new CDN network with AWS, it’s called a distribution, and we’re going to create a distribution in the lab in the next lecture of the course. So how does a CDN actually work? Again, we’ve got our UK file server here, we’ve got our users all around the world, and now I’m going to have multiple users in multiple countries, and you’ll see why in a second. We’ve also got our edge locations, which are spread all across the world; there are over 50 of them currently.
And let’s see what happens when the first user makes a request to get our content. That request is going to go to an edge location, and essentially you’re going to be using a distribution URL. Don’t worry if that doesn’t mean a lot to you; we’re going to cover it in the next lab. But you’re going to make a request, and that’s going to be routed first to your edge location. Your edge location is then going to check whether or not that object is cached at that particular edge location. So let’s say it’s a video file. A user is trying to play a video file; they make a request, it goes to your edge location, and your edge location does not currently have that file cached. So the edge location pulls it down from the S3 bucket, which is, let’s say, in Ireland, and caches it locally for the length of the time to live, or TTL. Again, we’re going to have a look at that in the next lab. So you might think, “Well, that’s no quicker than my previous example,” and you’re right, it’s not quicker for the first user. However, when the second user then goes to access that file, that file has already been downloaded to the edge location. It’s being cached at that edge location for the length of time specified by the TTL. So that user is now just pulling the file from their local edge location as opposed to pulling it down from the S3 bucket located in Ireland.
And for this reason, it speeds up the delivery of your image and video files considerably, provided they’re being regularly accessed by different users. So it’s only your first user who suffers that performance penalty. The rest of your users in regions that have edge locations can now access this file a lot quicker because it’s been cached at that edge location. So Amazon CloudFront can be used to deliver your entire website, including dynamic, static, streaming, and interactive content, using a global network of edge locations.
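The edge-location behaviour just described can be sketched as a toy cache simulation: the first request is a miss and fetches from the origin, later requests within the TTL are served locally, and after the TTL expires the edge fetches again. The origin contents and file name here are made up for illustration.

```python
# Toy model of a CloudFront edge location: a cache in front of an origin
# (our "S3 bucket in Ireland"), with entries that expire after a TTL.
class EdgeLocation:
    def __init__(self, origin, ttl_seconds):
        self.origin = origin        # dict standing in for the origin server
        self.ttl = ttl_seconds
        self.cache = {}             # key -> (content, expiry_time)

    def get(self, key, now):
        entry = self.cache.get(key)
        if entry is not None and now < entry[1]:
            return entry[0], "HIT"  # served from the edge cache
        content = self.origin[key]  # cache miss: pull from the origin
        self.cache[key] = (content, now + self.ttl)
        return content, "MISS"

origin = {"movie.mp4": b"video bytes"}
edge = EdgeLocation(origin, ttl_seconds=86400)   # 24-hour TTL

first = edge.get("movie.mp4", now=0)        # first user: fetched from origin
second = edge.get("movie.mp4", now=100)     # second user: served from the edge
expired = edge.get("movie.mp4", now=90000)  # past the TTL: origin fetch again
```

Only the first user pays the origin round trip; everyone else within the TTL window gets the cached copy.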
Requests for your content are automatically routed to the nearest edge location, so content is delivered with the best possible performance. Amazon CloudFront is optimised to work with other Amazon Web Services, like Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), which we’re going to cover in the next section of the course, as well as Amazon Elastic Load Balancing (again, covered in the next section) and Amazon Route 53. Amazon CloudFront also works seamlessly with any non-AWS origin server that stores the original, definitive versions of your files. So you can actually have an origin that’s not hosted by AWS itself; you can have your own custom origin server. And again, we’ll look at a little bit of that in the lab in the next lecture of the course.
So here’s some key terminology going into the next lecture: you’re going to need to understand what a web distribution is. A web distribution is essentially used for websites, as opposed to an RTMP distribution, which is used for media streaming, in particular Adobe Flash. We will cover that in the next lecture of the course.
So my exam tips include understanding what an edge location is. This is the location where content is going to be cached, and it’s separate from an AWS region or availability zone. Remember that there are currently over 50 edge locations around the world. Then remember what an origin is: this is the source of the files that the CDN will be distributing. It can be an S3 bucket, an EC2 instance, an Elastic Load Balancer, or Route 53.
Remember that an origin does not even have to be with AWS; you can have your own custom origin servers. A “distribution” is the name that’s given to the CDN, which consists of a collection of edge locations, and we will create a new distribution in the next lab. Now, there are two different types of distributions: you can have a web distribution, which is typically used for websites and is the most common one, or you can use RTMP, which is used for media streaming. In the next lab, we’re going to create our own web distribution.
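For reference, a web distribution with an S3 origin can be expressed as the configuration dict that boto3’s `cloudfront.create_distribution()` takes. This is an abbreviated sketch, not a complete, submittable config (the real API requires several more fields), the bucket domain name is a placeholder, and no AWS call is made here.

```python
# Abbreviated DistributionConfig sketch: one S3 origin plus a default cache
# behaviour that points at it. Values marked as placeholders are assumptions.
distribution_config = {
    "CallerReference": "my-first-distribution",  # any unique string
    "Comment": "Web distribution for the lab",
    "Enabled": True,
    "Origins": {
        "Quantity": 1,
        "Items": [
            {
                "Id": "my-s3-origin",
                "DomainName": "example-bucket.s3.amazonaws.com",  # placeholder
                "S3OriginConfig": {"OriginAccessIdentity": ""},
            }
        ],
    },
    "DefaultCacheBehavior": {
        # Must reference one of the origin Ids declared above
        "TargetOriginId": "my-s3-origin",
        "ViewerProtocolPolicy": "redirect-to-https",
        "MinTTL": 0,  # TTLs control how long objects stay cached at edges
    },
}
```

The key structural point is that every cache behaviour names the origin it serves, tying the distribution (the CDN) back to where the original files live.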
And there are a few other things you should know going into the exam. Edge locations are not read-only. So in our example, we were talking about reading a media file, but you can actually write new files to an edge location as well. So you can put an object to an edge location, and that object will then be replicated back to the origin server. The other thing you should know is that objects are cached for the life of the TTL.
So you set a TTL on your objects, and that determines how long they’ll be cached at your edge locations. And if you do have an object cached at an edge location that’s no longer relevant, you can clear the cached object, but you are going to be charged for that. So objects will automatically expire after the TTL has finished. If the TTL is 24 hours, the object will expire 24 hours later. If, however, you’ve just done a big update to your video and you want everyone to access the new version of the object, you’re going to have to go in and clear (invalidate) the cache in CloudFront, and you’re going to be charged for that. But it is definitely possible. Okay, that’s it for this lecture, guys. If you have any questions, please let me know. If not, feel free to move on to the next lecture, where we’re going to create our first CloudFront distribution.
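An invalidation like the one just described is a request against the distribution listing the paths to purge. Here is a sketch of the parameters boto3’s `cloudfront.create_invalidation()` takes; the distribution ID is a placeholder, and no AWS call is made (remember, invalidations are a chargeable operation).

```python
# Invalidation request sketch: purge every cached object under /videos/
# from all edge locations before the TTL expires.
invalidation_request = {
    "DistributionId": "EDFDVBD6EXAMPLE",  # placeholder distribution ID
    "InvalidationBatch": {
        "CallerReference": "video-update-001",  # any unique string per request
        "Paths": {
            "Quantity": 1,
            "Items": ["/videos/*"],  # wildcard covers the whole prefix
        },
    },
}

# With credentials, you would submit it via:
# import boto3
# boto3.client("cloudfront").create_invalidation(**invalidation_request)
```

The `CallerReference` must be unique per request so that retries of the same invalidation aren’t processed (and charged) twice.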
11. S3 Lifecycle Management & Glacier
So let’s start with security and how you can secure your buckets. By default, all newly created buckets are private. And you saw that in one of the last labs, when we created a bucket, uploaded a file to it, and then couldn’t access the file. We kept getting that XML error message that said “Access Denied.” So we had to go in there and make that bucket public, or actually make that file public. Now, there are a couple of things that you can do in order to set up access controls for your buckets.
The first one is using bucket policies. Bucket policies are bucket-wide, so the permissions are applied to the entire bucket. And then you can also use access control lists. Your access control lists can actually drill down to individual objects. And again, you saw that in the lab: we made a particular object public, but the rest of the bucket was private. So, going into the exam, it’s really important to understand that you can secure your buckets using those two features: bucket policies and access control lists. The other thing to remember is that S3 buckets can be configured to create access logs that log all requests made to the S3 bucket. The logs can be delivered to another bucket, and even to a bucket in another AWS account. We’ll talk a bit about how to set up cross-account access later on in this course. So let’s move on to encryption. Now, encryption is a very important thing to know going into the exam. You will be expected to know the four different methods of encryption for S3.
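To make the two access-control mechanisms concrete, here is a minimal bucket policy as a JSON document of the kind you attach bucket-wide. The bucket name is a placeholder; only the document is built here, nothing is sent to AWS.

```python
import json

# Bucket policy sketch: bucket-wide, so the Resource wildcard covers every
# object in the (placeholder) bucket. This one grants public read access,
# like making the whole bucket public in the lab.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",                               # anyone
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",    # whole bucket
        }
    ],
}
policy_json = json.dumps(bucket_policy)

# An ACL, by contrast, can target a single object. With boto3 and real
# credentials, making just one file public would look like:
# s3.put_object_acl(Bucket="example-bucket", Key="index.html", ACL="public-read")
```

So the policy operates at the bucket level, while the ACL call drills down to one object, exactly the distinction made above.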
Now, there are two types of encryption: there’s data in transit, and then there’s data at rest. In transit is simply when you’re sending information to and from your bucket, from your computer to the bucket itself. That’s secured using SSL or TLS encryption. For those who don’t know, TLS is basically the replacement for SSL, but the terms tend to be used interchangeably at the moment. So it’s just using HTTPS; that’s how you encrypt your data in transit. And then at rest, there are four different methods of encryption: three kinds of server-side encryption, and then client-side encryption. So we’ll start with server-side encryption. The first of the three methods is server-side encryption with Amazon S3 managed keys, and this is basically where each object is encrypted with a unique key, employing strong multi-factor encryption. Then, as an additional safeguard, Amazon encrypts the key itself with a master key, and it regularly rotates that master key.
Now, Amazon handles all the keys for you. It uses AES-256, the Advanced Encryption Standard with 256-bit keys, to encrypt your data, and basically you don’t have to worry about it. The way you use that service is what we saw in the lab, where you just click on the object and hit encrypt. So that’s the most common one, and it’s known as SSE-S3. The next one is server-side encryption with AWS Key Management Service managed keys, or SSE-KMS. KMS is a key management service, and this is very similar to SSE-S3, but it comes with a few additional benefits as well as some additional charges for using it. There are separate permissions for the use of an envelope key. An envelope key is simply a key that protects your data’s encryption key and provides additional protection against unauthorized access to your objects.
The other main advantage of SSE-KMS is that it also provides you with an audit trail of when your keys were used and who was actually using them. It gives you that additional level of transparency as to who is decrypting what and when. And then you also have the option to create and manage encryption keys yourself, or you can use the default key that’s unique to you, the service that you’re using, and the region that you’re working in.
And then finally, the third method is server-side encryption with customer-provided keys. This is referred to as SSE-C, and this is where you manage the encryption keys, while Amazon manages the encryption as it writes to disk and the decryption when you access your objects. The actual management of the keys is done by you yourself. So those are the three kinds of server-side encryption: SSE-S3, SSE-KMS, and SSE-C. And the fourth method is client-side encryption, which is effectively where you encrypt the data on the client side and then upload it to S3.
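In practice, the server-side options are selected per request via headers on the upload, while client-side encryption needs no special server-side header because the data is already ciphertext when it reaches S3. A sketch of the header each option adds to a PUT:

```python
# Header each S3 encryption-at-rest option adds to an upload request.
# The SSE-C key value is a placeholder; in practice it is your own
# base64-encoded 256-bit key, supplied on every request.
encryption_headers = {
    "SSE-S3": {"x-amz-server-side-encryption": "AES256"},
    "SSE-KMS": {"x-amz-server-side-encryption": "aws:kms"},
    "SSE-C": {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": "<base64-encoded-key>",
    },
    # Client-side: you encrypt before uploading, so S3 sees ordinary data
    "client-side": {},
}
```

This mapping is a handy way to remember the four methods for the exam: two values of the same header for SSE-S3 and SSE-KMS, customer-key headers for SSE-C, and nothing server-side for client-side encryption.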
So that’s it, Cloud Gurus; it’s really pretty simple. Just remember going into the exam that you can secure your buckets using bucket policies and access control lists. You can enable logging to see what other people are doing in your buckets, and you can log to that bucket itself, to another bucket, or even to a bucket in another AWS account entirely. You can secure data travelling to and from S3 using SSL/TLS. And then, in terms of when the data is stored in S3, you can encrypt the data using SSE-S3, SSE-KMS, SSE-C, and also client-side encryption. If you have any questions, please let me know. If not, feel free to move on to the next lecture. Thank you.