5. Storage Basics Demo Part 2
So again, whatever you want to do with that file, I can go over here and edit permissions, rename it, and so on. That's how you upload files: you create a bucket and add your objects to it. It's extremely simple. Now let's go back over to the storage section for one more thing. If we go over here, you can see the other types of storage that are available: Cloud SQL, Cloud Storage, Datastore, and Bigtable, for example. If I go to SQL, I have nothing set up in this project yet. For the purposes of this class, we won't go over each option in detail; the goal is just to give you an idea of what's there. You create an instance, and you can choose either generation. And remember, there's a cost to this. I'm using my free credits, and this will eat through them if I forget to shut it down. So, if I create a MySQL instance, I can select the generation if I want.
I could use either the first or second generation, and the major differences are highlighted here. You would only want to use first-generation MySQL if the application you'll be supporting requires it; other than that, there's no good reason to use it, because if you need high availability or anything like that, it's better to go with the current generation. For the instance ID, you go ahead and enter a name; I'm going to call it gcp-test-123. I could set a password here. Again, you don't want to run this with no password. Even if it's only ever going to live inside the GCP cloud, it's still a best practice to set one, and if you click Generate here, it will generate a password for you. Now, location: be very conscious of the location.
If you're a national company and the majority of your customers are on or near the east coast, you should use an east region in most cases. There are tools out there, like CloudHarmony, that you can use to figure out whether, say, central would be better than east or west, depending on your location. At the moment there are four regions here for the US, and you can see us-central1, us-east1, us-west1, and so on; not every region appears, because for MySQL this particular capability is only supported in certain regions and zones. Another thing to be cautious of: say I set up shop on the east coast of the United States, but my customers are mainly California-based. You again want to pay attention to that, and maybe I should use a west region, say Oregon. Next, let's go to the configuration options. There are numerous options to consider, but in this case I don't really care about them, and we're not doing SQL training right now; we just want to give you an idea of how to get up and running. So I'll create it, and you can see that it is launching that SQL instance.
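As a side note, the same instance can be created programmatically instead of through the console. Here's a hedged sketch using the Cloud SQL Admin API (v1beta4) through the google-api-python-client discovery client; the project ID, tier, and region are assumptions, and the instance name just mirrors the one typed into the console above.

```python
# A hedged sketch: create a second-generation MySQL instance via the
# Cloud SQL Admin API. Assumes application default credentials are set up
# and that "my-project-id" and the tier/region are placeholder values.
from googleapiclient import discovery

sqladmin = discovery.build("sqladmin", "v1beta4")

instance_body = {
    "name": "gcp-test-123",             # the instance ID from the demo
    "region": "us-east1",               # pick a region close to your users
    "databaseVersion": "MYSQL_5_7",     # second (current) generation MySQL
    "settings": {"tier": "db-n1-standard-1"},
}

request = sqladmin.instances().insert(project="my-project-id", body=instance_body)
operation = request.execute()           # returns a long-running operation
print(operation.get("status"))
```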
So if I go back, you can see that this can take up to a minute or so; it really just depends on the latency and the workload. This is a very small instance, so it's not going to consume too many resources. While that runs, let's take a quick look at Cloud Spanner and see what we can find. It says I have to enable the Cloud Spanner API for my project, so that's what we'd have to do first. This is a good exercise: if we go to create an instance, you can see the estimated cost, which in my case would eat into my credit, roughly $79 of it. I probably don't want to do this right now, but play around with it if you like; it's all covered by the free credits. Note that storage is billed as an additional cost on top of the nodes, so take a look at that as well. So that's Cloud Spanner: essentially what you want to use when you need a relational, transactional database service that scales horizontally. You can go to Learn More if you want; I just wanted to highlight it. Lastly, let's go back to Cloud SQL. It takes a little bit of time, but the instance will come up, and once it's up you can select it and take a look at its configuration. You can also see Create Instance if you want to create another one. And notice this icon here, the spinning circle: that's telling you the operation is still in progress, so you don't have to sit and watch this screen. So, once again, there is a lot of great storage capability available in Google Cloud. You really need a whole day to get into it deeply, but for what we need to know, this is a good start.
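Since Cloud Spanner came up here, this is roughly what creating a small instance looks like in code once the API is enabled. A minimal sketch with the google-cloud-spanner Python client; the project ID, instance ID, display name, and node count are all assumptions.

```python
# A minimal sketch: create a one-node Cloud Spanner instance.
# Requires the Cloud Spanner API to be enabled for the project.
from google.cloud import spanner

client = spanner.Client(project="my-project-id")   # hypothetical project

# Instance configs are referenced by their full resource name.
config = "projects/my-project-id/instanceConfigs/regional-us-east1"

instance = client.instance(
    "test-spanner-1",                  # hypothetical instance ID
    configuration_name=config,
    display_name="Test Spanner instance",
    node_count=1,                      # smallest footprint; nodes bill hourly
)

operation = instance.create()          # long-running operation
operation.result(300)                  # wait up to 5 minutes for creation
```

Remember to delete the instance afterwards, since Spanner nodes are billed for as long as they exist.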
6. Cloud Storage Overview Part 1 of 2
Now, let's get started by discussing what cloud storage is. First, we're going to talk about what object storage is and why it's important. We'll also make some comparisons to Amazon S3; I hope this helps those of you who are familiar with AWS relate to it and understand the capabilities of Google Cloud Storage in comparison to AWS.
When it comes to storage classes, I'm going to talk about the classes of storage that you're going to want to know and the terminology you need to know too. Now, if you do take either the Data Engineer course or the Cloud Architect course and decide to pursue either of those certifications, this module is certainly one you want to pay attention to. So, while this course may not be entirely focused on either of those, this module could certainly correlate to either of those exams. Now, Cloud Storage is unified object storage for developers and enterprises. You can use object storage in Google Cloud for serving live data, for data analytics, machine learning, and data archiving; essentially, it has a wide range of uses. Just to clarify for folks who don't know what "immutable" means: in object storage, or in development generally, "immutable" means that once something is created, it can't be modified.
Mutable is the opposite, of course: you can modify the object, add metadata to it, et cetera. So that's essentially the main difference between immutable and mutable when it comes to object storage. We're going to talk about what buckets and objects are coming up, and we'll go through the demos and cover all of this in fair detail. But essentially, you create a bucket and you add objects to that bucket. It's no different from Amazon, for example, when it comes to buckets and objects; it's essentially the same terminology. Now, when it comes to globally unique URLs, this is again a fairly similar approach: every bucket gets a globally unique URL, and you can, for example, generate a signed URL that grants temporary access to an object or a set of objects, whatever you need. This is basically the common storage layer for the Google Cloud Platform.
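To make the bucket, object, and signed URL ideas concrete, here's a minimal sketch with the google-cloud-storage Python client. It assumes credentials are already configured (for example via GOOGLE_APPLICATION_CREDENTIALS), and the bucket and object names are hypothetical.

```python
# A minimal sketch: create a bucket, upload an object, and hand out a
# signed URL that grants temporary read access to that object.
from datetime import timedelta
from google.cloud import storage

client = storage.Client()

# Bucket names are globally unique, so this name is just a placeholder.
bucket = client.create_bucket("gcp-test-bucket-storage-course-1")

# Objects ("blobs") live inside the bucket.
blob = bucket.blob("reports/hello.txt")
blob.upload_from_string("Hello, Cloud Storage!")

# Anyone with this URL can read the object until the link expires.
url = blob.generate_signed_url(expiration=timedelta(hours=1), method="GET")
print(url)
```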
I like to think of Cloud Storage as the entry point for your data services. In a lot of cases it can be the dumping ground or the starting point; depending on who you talk to, you may get a different viewpoint, but Google likes to refer to Cloud Storage as essentially the ingest point for object storage. That object storage can be used to store unstructured data that will be used further down the data services lifecycle. We're going to talk a lot more about this. Now, how do you access Cloud Storage? You can use the REST APIs or the client libraries, you can access it through the console, and you can also use the gsutil command-line tool. One of the notes up here says that Cloud Storage is not a file system, and that's worth repeating: Cloud Storage is not a file system, although tools such as Cloud Storage FUSE can be used to access it as one. Amazon Web Services has a native file system service called Amazon EFS. Google does not have a direct equivalent of EFS.
EFS is Amazon's native file storage. Google, at this point, does not have native file storage. If you want file-style storage in Google Cloud, you use GCS, which is Google Cloud Storage, and then you add what's called the FUSE adapter. The FUSE adapter essentially mimics or emulates a file system: it allows you to mount buckets and folders from Cloud Storage so that they effectively behave like a file system. So you add the FUSE adapter and then configure it to present your storage as a file system. There is a separate module on FUSE later in the class, so just be aware that we'll cover it in more detail there. I wanted to mention it now because we're going to keep talking about Google's object storage, which is Cloud Storage.
And that's a common use for object storage as well: using GCS with the FUSE adapter. We'll focus more on that in that module. Okay. Now, you can use Cloud Storage for online or offline imports. You can, of course, use all of the storage classes, and across those storage classes you use the same API, which is actually pretty convenient. It's a very simple pricing model. Now, I'll be honest, there's a lot of talk on the Internet; organizations such as RightScale have done pricing comparisons. Is Google Cloud Storage less expensive than Amazon S3 or Microsoft Azure? I just think it really depends on your use case. In a lot of cases I think it's an exercise in futility,
because my experience with cloud architectures is that you're going to run into what's called "scope creep." Scope creep, for those who aren't familiar with the term, is essentially when you design a cloud for one use case and then, over time, that use case gets expanded. For example, you design your Google Cloud to handle data analytics and big data services, and then you decide, "Okay, we're also going to expand the cloud to provide infrastructure as a service for our virtual desktops." What you've just done is add to the use case of that cloud, and that is essentially scope creep. As for ingress and egress, for those folks who aren't familiar with the terms: when you're putting data into the cloud, that is ingress; when you're taking data out of the cloud, that is egress. There are different charging schemes for each, but you generally get a certain amount for free. Once again, trying to pin down pricing exactly is an exercise in futility given what you're trying to do. My whole thought on pricing is that it's important to get a good idea, but you should never expect to get it 100% correct in an enterprise. If you're running a WordPress site and that's about it, that's a different story. But if you're an enterprise and you've got petabytes or terabytes of data and thousands of virtual machines, it's just a moving target. Speaking of pricing, there is a module on pricing as well.
We'll talk more about that. Okay, let's talk about cloud storage classes. This is actually one of the more important areas if you take any of the exams, specifically the architect exam, though you'll get a couple of questions on it in the data engineer exam as well. Now, there are four storage classes. When we talk about classes, it means exactly what it sounds like: there are different levels of service. Just like on an aeroplane for an international flight, you generally have at least two, if not three, classes: first class, business, and coach, and now they have economy, super economy, and, well, stupid economy. I think you get the point. With object storage it's really no different. We have different classes, and what that means is different levels of availability, performance, cost structure, and capability, but also different use cases. So let's talk about multiregional first. Multiregional is essentially geo-redundant storage. This is the highest level of availability and performance, and it's ideal for serving applications that require very low latency. Another thing to bring up: if you're an international organisation, this is probably one of the choices you're going to want to look at, because the data is distributed as well. Regional also offers a high level of availability and performance, except that it lives in one region; it's more localised. This is storage that you want to keep in a more regional footprint, such as the United States, Asia, or Europe, whatever region you're using. Then there are Nearline and Coldline, which are essentially the very low-cost classes. They're certainly a lot cheaper, and again, I won't cover pricing in detail.
There is a pricing calculator at the end that I go over and that you can experiment with. The pricing is also on the page, and you can see that they do give estimated storage pricing. When I go through the calculator, I'll also show you, for example, that different regions and zones have different pricing. So I like to call it a moving target, because if you're an enterprise, you're never going to guess this exactly right; it's just not going to happen. Now, when it comes to Nearline, a good use case is data that needs to be available for a set period of time. After that length of time, say one or two months, you'll generally want to move it to Coldline if you're not going to access it. The reason is that, once again, there's a significant price difference in most cases, and it doesn't make sense to pay a higher price for data you're not going to access anyway. Coldline, in many cases, is storage that is used primarily for compliance or archival purposes. Think of the storage classes as subsets of Google Cloud Storage: you use the same APIs to access all of them. The questions are: what kind of availability do you need, what kind of performance do you need, do you need to access the data right now, and what kind of cost structure do you need? Now, how does Cloud Storage compare to the other storage capabilities in Google Cloud?
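(Quick aside before that comparison: moving an object from Nearline down to Coldline can be done per object with the client library, although in practice you'd usually automate it with a lifecycle rule. A minimal sketch, assuming hypothetical bucket and object names.)

```python
# A minimal sketch: rewrite an existing object into the Coldline class
# once it has aged out of Nearline. Names are hypothetical placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-reports-bucket")
blob = bucket.get_blob("archive/2017-q1-report.csv")

# update_storage_class rewrites the object into the new storage class.
blob.update_storage_class("COLDLINE")
```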
As you can see, we've got Cloud SQL, Datastore, Bigtable, BigQuery, et cetera. We're going to talk about each of these individually; that seemed like the best way to cover them. Cloud Storage capacity is measured in petabytes: you can store pretty much endless amounts of data. And what's good about Cloud Storage is that you don't have to worry about managing storage arrays or whether you have enough space; Google handles that for you. You just have to make sure you have the right credit card or purchase order to pay for it. Now, I'm also going to talk about AWS and how Cloud Storage compares to it. What I did want to point out for those taking either exam is that both exams will essentially test whether you know the right use case for the right storage service. For example, if you get a test question about scaling SQL horizontally, you should be asking yourself whether to use Cloud SQL or Cloud Spanner (Spanner didn't make it onto this slide, but keep it in mind). If you need a data warehouse that you can query with SQL, you may need to look at BigQuery. There is just so much more to talk about, but for this course I'm going to focus on storage and make sure we compare the different types for you. So let's finish up cloud storage and continue on. Okay, for our AWS folks, let's make sure you understand the similarities and differences between AWS and Google Cloud Platform storage. AWS has a solution called S3; this is object storage. Google Cloud has Cloud Storage: object storage, as we know it.
As far as availability, it's essentially the same SLA. Now, Google has hot storage, cool storage, and cold storage as well, and you can see that here. This comparison is from RightScale. If you don't know who RightScale is, they're one of my favourite cloud organisations out there, vendors in that sense, because they have a platform that lets you manage all of these different cloud platforms in one view and gives you control over everything from pricing to services to scalability. It gives you, like I said, a really nice aggregation of your services, if that's what you need. But they also do a tonne of great research, reporting, and surveys. They have an annual cloud report; there's a specific name they give it that I don't remember off the top of my head, but I'll leave the link for that as well. RightScale does a lot of good research, and this comparison is theirs, so I have to give them credit for it. And, as you can see, the two compare quite well. You can see, for example, the size limits, and this is still true today: it's five terabytes per object, but you can still scale indefinitely, so you can keep dumping data into Cloud Storage and it really doesn't matter. When we talk about archive storage, AWS has Glacier and Google has Coldline. For cool storage, meaning infrequently accessed data, the Google Cloud equivalent is Nearline. Your hot storage will be S3 Standard, and on the GCS side that will typically be Regional or Multiregional storage.
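To tie the storage class names back to code before we move into the demo, here's a minimal sketch of creating buckets in different classes with the google-cloud-storage Python client. The bucket names and locations are hypothetical, and the class identifiers are the ones used in this course.

```python
# A minimal sketch: one geo-redundant bucket for hot data and one
# Coldline bucket for archival data. Names and locations are placeholders.
from google.cloud import storage

client = storage.Client()

# Multiregional: frequently accessed, geo-redundant production data.
hot = client.bucket("example-hot-bucket")
hot.storage_class = "MULTI_REGIONAL"
client.create_bucket(hot, location="US")

# Coldline: compliance or archive data you rarely expect to read.
archive = client.bucket("example-archive-bucket")
archive.storage_class = "COLDLINE"
client.create_bucket(archive, location="us-east1")
```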
7. Cloud Storage Demo Part 1 of 2
I'm over here at the Google Cloud Platform dashboard. What I'd like to do now is go ahead and create a storage bucket, and we'll talk about some of the features and functions that are available when you create a storage bucket in the Google Cloud Platform. As with most Google services, there are several paths you can take to get to Cloud Storage.
The first option is to navigate to Products and Services on the left-hand side, go to Storage, and select Storage from the menu. Alternatively, we can go over to the search box and type in cloud storage; you can see that this gets us there too, and it also brings up the APIs for Cloud Storage. Let's select Cloud Storage. We are now in the Cloud Storage dashboard, and when we look at it, we can see that there are already two storage buckets. These belong to one of my App Engine applications that I created, just a Hello World app. Looking at the buckets that have already been created, you can see that we have a name; we have the default storage class, which is Regional; we have the location, which is us-east1; and we have the lifecycle. Remember that lifecycle rules are where you specify actions that will manage the objects in that bucket. For example, if you want to downgrade the storage class, you can do that; or let's say anything older than six months or a year is something you want downgraded or deleted. Whatever you'd like to do, you go ahead and create a lifecycle rule for it.
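Those lifecycle rules can also be set outside the console. Here's a minimal sketch with the Python client that matches the example above: downgrade older objects, then delete them after a year. The bucket name is a hypothetical placeholder.

```python
# A minimal sketch: attach lifecycle rules to an existing bucket.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("gcp-test-bucket-storage-course-1")

# Move objects to Coldline once they are about six months old ...
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=180)
# ... and delete them entirely after a year.
bucket.add_lifecycle_delete_rule(age=365)

bucket.patch()  # push the updated lifecycle configuration to the bucket
```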
Labels are important to create as well, especially in a production environment. They allow you to search and find things and also to use some of the services more efficiently. Then there's Requester Pays, which is an interesting one: if you set up a Cloud Storage share, you have the ability to have the requester of those files pay for that service. This is good for a subscription service, for example. Let's go over and take a look at some of the options. On the left side we have the Browser, which is where we are by default. We have Transfer. This is a transfer service you can use to bring in your files from Amazon S3, or you can pull them from other buckets; for example, in Google Cloud, let's say you want to do some kind of cross-regional migration, you could do that too. You can also set up in-house applications to use this service. Then you have the Transfer Appliance. This is an additional service that you can sign up for and subscribe to, and it's primarily intended for transferring large amounts of data, generally petabytes. You could certainly use it for less in most cases, but that, of course, is your call. We'll talk more about transferring data in that module. And then, lastly, we have Settings. This is where you can specify project access. Remember that when you're creating a bucket, you generally need to be aware of the project you're working in; what matters with these resources is where you're placing them, which is the project. So be aware of the project that you're working in.
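Here's a quick sketch of the Requester Pays option mentioned above, again with the Python client and a hypothetical bucket name. Once it's enabled, whoever downloads the objects supplies their own billing project and pays for the access.

```python
# A minimal sketch: enable Requester Pays on an existing bucket.
from google.cloud import storage

client = storage.Client()

bucket = client.get_bucket("gcp-test-bucket-storage-course-1")
bucket.requester_pays = True
bucket.patch()

# A consumer would then access the bucket with their own billing project:
# client.bucket("gcp-test-bucket-storage-course-1", user_project="their-project-id")
```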
There's interoperability as well. You can see that it picks up the project; again, this is the default project, the one we're logged into. You can also create a new storage access key here. This is important when you're building applications that need to drop off or pick up objects in Cloud Storage from, say, an external application; you'll probably want to use your own keys. Let's go over to the Browser and create a new bucket. On the right side of the interface, you can see that it says Browser, and we have several options at this point: Create Bucket, Refresh, and so on. If I hit Refresh, nothing happens. But if I select this staging bucket, you can see that an option is now enabled, or highlighted, that says Delete. In this case, I don't want to delete that; I'd like to create a new bucket. So let's go ahead and select Create Bucket. As you're aware, there are various storage classes, and you have to decide which storage class to use based on your availability, access, and cost requirements. But before we select the storage class, let's really talk about bucket names.
Now, the bucket name needs to be unique. This is a name that is unique not only to you but on a global scale. So if we type in, let's say, "test," you can see that that bucket is already in use; someone took it, and it reminds me that bucket names must be unique globally. Just be aware of that. So let's call this something that will probably be unique, and it also needs to be in lower case. I'm going to go with gcp-test-bucket-storage-course-1. So, once again, the name will simply be GCP test bucket storage course one. Now, one of the things I've noticed is that, typically, the shorter the name, the more likely it is to be taken. Also, you'll want to pay attention when you create your buckets: identify each bucket in a way that makes sense for your organization. For example, let's say this is for your Hadoop ingestion processes, where you're using Dataproc and other big data services, Hadoop clusters, whatever.
You name it after the specific application or use case, for example "big data one" or "mobile application one," or maybe an Oracle database dump; let's just say you're dumping metadata or something. Whatever makes sense in your situation, just pay attention to it. Of course, you wouldn't use this for the Oracle databases themselves, but it's possible you just want to save some redo logs, backups, or anything else that can be stored as an object. Now, a couple of other things are important to point out before we proceed. As a best practice, pay attention to how you name things. Typically, at least for me, I like to put a number on the end, because if you set up object versioning, this makes incrementing easier, at least from a visual perspective, and it can also be used for your lifecycle management. You're not required to do so, but for me it's just easier. Now, on to the default storage class. Multiregional is typically used for production data that you don't want to lose access to, where you want higher availability and geo-redundancy. With Regional, if you're going to select this, be aware that you could, for example, set it up in Iowa or South Carolina, and the data is replicated across the zones within that region.
Essentially, Regional is good for data that is, as the name implies, regional. If your customers are based in the US or Canada, it probably makes sense to go Regional; if your customer base is more international, you probably want to choose multiregional. Once again, this is where you want to plan your storage classes fairly cautiously, because you're likely to find that the cost for multiregional is somewhat higher than Regional, and you're likely going to select Regional or multiregional for your production data. The difference between the two really depends on the use case: is it more national or international, distributed or not? Then we have what's called Nearline, and we have Coldline. For those familiar with Amazon, Coldline is Google's answer to Glacier. Nearline is good for data that you expect to access less than about once every 30 days, which is the usual rule of thumb, because there is a cost to get at the data when you do need it; Google calls that a retrieval cost. The same is true for Coldline, which is basically meant for data you'll access less than once a year. Personally, based on what I've seen, you could certainly use Coldline in many cases for data you only touch every six or seven months. Generally, I like to think of Coldline as basically your tape archive: you don't want to have to restore from it if you don't need to. Of course, it isn't actually tape, but it sort of acts like that, so just be aware. Now, you can see that when you select the different options, it says Location. Pay attention to this as well: if most of your users are in Northern Virginia, then you probably don't want to select Europe, and you probably don't want to select a west region, for example.
As you can see, too, not every class is available in every location, so just be aware of that as well. If I select Nearline, this is where you get to select the region. If I select multiregional, you can see that it lets you select a multi-region location, such as the United States. If I select Regional, I'm going to pick us-east1, let's say, or maybe us-east4; so do I want South Carolina, or do I want Northern Virginia? So let's go ahead here and specify labels. Labels are really important to identify ahead of time, if you can, because they can make your processes, your management, and your monitoring more efficient. Let me think of something and type it in here: I guess I could just use "test app bucket" as the key, and then I can assign a value. The thing about choosing a value is that you can pick pretty much anything you want: a number, a word, whatever. You can see that it added this key with that value, and you can keep adding key/value pairs as you choose, or remove one if you change your mind. Let's go ahead and finish up: I hit Create, and you can see that it now takes me to the bucket I just created.
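To wrap up the demo, here's a minimal sketch that pulls these steps together in the Python client: creating the bucket while handling the case where the globally unique name is already taken, turning on object versioning, and applying the label key/value pair. The names, label key, and value are hypothetical stand-ins for whatever you chose in the console.

```python
# A wrap-up sketch of the demo steps in code. Assumes default credentials.
from google.api_core.exceptions import Conflict
from google.cloud import storage

client = storage.Client()

bucket = client.bucket("gcp-test-bucket-storage-course-1")
bucket.storage_class = "REGIONAL"

try:
    client.create_bucket(bucket, location="us-east1")
except Conflict:
    # Bucket names are globally unique, so this fires if anyone owns the name.
    print("Name already taken; choose something more specific.")
    raise

# Keep old generations when objects are overwritten (object versioning),
# and apply the label key/value pair from the console demo.
bucket.versioning_enabled = True
bucket.labels = {"test-app-bucket": "1"}
bucket.patch()
```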