DP-203: Data Engineering on Microsoft Azure Certification Video Training Course Outline
Introduction
Design and implement data storag...
Design and implement data storag...
Design and implement data storag...
Design and Develop Data Processi...
Design and Develop Data Processi...
Design and Develop Data Processi...
Design and Develop Data Processi...
Design and Implement Data Security
Monitor and optimize data storag...
DP-203: Data Engineering on Microsoft Azure Certification Video Training Course Info
Gain in-depth knowledge for passing your exam with Exam-Labs DP-203: Data Engineering on Microsoft Azure certification video training course. The most trusted and reliable name for studying and passing with VCE files which include Microsoft Azure DP-203 practice test questions and answers, study guide and exam practice test questions. Unlike any other DP-203: Data Engineering on Microsoft Azure video training course for your certification exam.
Design and implement data storage – Basics
12. Lab - Authorizing to Azure Data Lake Gen 2 - Access Keys - Storage Explorer
Hi and welcome back. Now in this chapter, I just want to show you how you can use a tool known as Azure Storage Explorer to explore your storage accounts. So if you have employees in an organisation who only need to access storage accounts within an Azure account, without actually logging into the Azure Portal, and they only want to look at the data, they can make use of Azure Storage Explorer. This is a free tool that is available for download, and it's available for a variety of operating systems. I've already gone ahead and downloaded and installed the tool. It's a very simple installation.

Now, as soon as you open up Microsoft Azure Storage Explorer, you might be prompted to connect to your resource. Here, you can log in using the subscription option. In case you don't get that screen, this is what Azure Storage Explorer looks like: you can go on to the Manage Accounts section over here, click on Add an Account, and you'll get the same screen. I'll choose Subscription, I'll choose Azure, and I'll go on to Next. You will need to sign in to your account, so I'll use my Azure admin account information. Now, once we are authenticated, I'll just choose my test environment subscription and hit "Apply." I have many subscriptions in place. Now under my test environment subscription, I can see all of my storage accounts. I can see my blob containers: if I go to Data Store 2000, I can go on to my data container and see all of my image files. If I go on to Data Lake 2000, onto that storage account, onto blob containers, onto my data container, onto my raw folder, I can see my JSON file; I can download the file. I can upload new objects into the container. So Azure Storage Explorer is an interface that allows you to work with not only your Azure Storage accounts but also your Data Lake storage accounts.

We've now logged in as the Azure administrator. There are other ways you can authorise yourself to work with storage accounts. One way is to use access keys. See, here we are seeing all of the storage accounts. But let's say you want a user to only see a particular storage account. One way is to make use of the access keys that are linked to a storage account. If I go back onto my Data Lake Gen 2 storage account here and scroll down to the Security and Networking section, there is something known as access keys. I go on to the access keys, and let me go ahead and just hide this. I click on "Show keys," and here I have key 1. So we have two keys in place for a storage account: you have key 1, and you have key 2. A person can authorise themselves to use the storage account using this access key. So here I can take key 1 and copy it onto the clipboard. In Azure Storage Explorer, I'll go back on to Manage Accounts. Here, I'll add an account. I'll select the storage account option, where the account name and key option is shown below. I'll go on to Next. I'll paste in the account key. I need to give the account name, so I'll go back to Azure, copy the account name from there, and place it over here, the same as the display name. Go on to Next and hit Connect. Now, here in the Local and Attached storage accounts, I can see my Data Lake Gen 2 storage account. So I can still have a view of all of the storage accounts that are part of my account as the Azure admin over here, but at the same time, I can also see only my Data Lake Gen 2 storage account.
If I go onto my blob containers, onto my data container, onto my raw folder, I can see my JSON file. If you want, you can even download the JSON file locally: you can select the location, click on Select Folder, and the file will be downloaded from the Data Lake Gen 2 storage account onto your machine. So this is one way of allowing users to authorise themselves to use the Data Lake Gen 2 storage account.
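To make the access-key idea a bit more concrete, here is a minimal sketch of the same connection done in code rather than through Storage Explorer, using the azure-storage-blob Python SDK. The account name, key, container, and file names below are placeholders standing in for the ones used in the lab, not values from the course.

```python
from azure.storage.blob import BlobServiceClient

# Placeholders: substitute your own storage account name and key 1
# from Security + networking > Access keys in the portal.
account_name = "datalake2000"
account_key = "<access-key-1>"

service = BlobServiceClient(
    account_url=f"https://{account_name}.blob.core.windows.net",
    credential=account_key,
)

# List the blobs under the raw folder of the data container,
# then download a JSON file locally, as done in Storage Explorer.
container = service.get_container_client("data")
for blob in container.list_blobs(name_starts_with="raw/"):
    print(blob.name)

with open("downloaded.json", "wb") as f:
    f.write(container.download_blob("raw/customer.json").readall())
```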
13. Lab - Authorizing to Azure Data Lake Gen 2 - Shared Access Signatures
Now, in the prior chapter, I showed how we could connect to a storage account, which is basically our Data Lake storage account, using access keys. As I mentioned before, there are different ways in which you can authorise yourself to a Data Lake storage account. Now, when it comes to security, if you look at the objectives for the exam, the security for the services actually falls under the section "design and implement data security." But at this point in time, when we look at Azure Synapse, we are going to see how to use access keys and shared access signatures to connect and pull data out of an Azure Data Lake Gen 2 storage account. That is why, at this time, I'd like to demonstrate how we can use shared access signatures to authorise ourselves to use an Azure Data Lake Gen 2 storage account.

So going back onto our resources, I'll go on to our Data Lake storage account. Now here, if I scroll down, in addition to access keys under Security and Networking, we also have something known as a shared access signature. I'll go on to it. Let me go ahead and hide this. Now, with the help of shared access signatures, you can grant selective access to the services that are present in your storage account, unlike with an access key. Remember how in the previous chapter we connected to a storage account using an access key? With the access key, the user can work with not only the Blob service but also file shares, queues, and the Table service. These are all the services that are available as part of the storage account. But if you want to limit access to just a particular service, let's say you are going to give the shared access signature to a user and you want that user to only have the ability to access the Blob service in the storage account, then you can make use of shared access signatures. What you'll do is, in the allowed services, just unselect the File, Queue, and Table services so that the shared access signature can only be used for the Blob service. In the allowed resource types, I need to give access onto the service itself, give the user the ability to see the containers in the Blob service, and also give access onto the objects themselves, so I'll select all of them. In terms of the allowed permissions, I can give selective permissions. Here, I just want the user to have the ability to list the blobs and read the blobs in the Azure Data Lake Gen 2 storage account. I won't give any permissions when it comes to enabling deletion of versions, so I'll leave that as is. With the shared access signature, you can also specify a start and end date. That means that after the end date and time, this shared access signature will not be valid anymore. You can also specify which IP addresses will be valid for this shared access signature. At the moment, I'll leave everything as it is. I'll scroll down here. It will use one of the access keys of the storage account to generate the shared access signature. So here I'll go ahead and click on the button "Generate SAS and connection string." And here we have something known as the connection string, the SAS token, and the Blob service SAS URL. The SAS token is something that we are going to use when we look at connecting onto the Data Lake Gen 2 storage account from Azure Synapse.
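For reference, the same kind of SAS can also be generated outside the portal. Below is a hedged sketch using the generate_account_sas helper from the azure-storage-blob Python SDK; the account name and key are placeholders, and the settings are meant to mirror the blob-service-only, list-and-read configuration chosen above.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import (
    AccountSasPermissions,
    ResourceTypes,
    generate_account_sas,
)

# Placeholders: the SAS is signed with one of the two account access keys.
account_name = "datalake2000"
account_key = "<access-key-1>"

# Service, container and object resource types are allowed; only read and
# list permissions are granted; the token expires after seven days.
sas_token = generate_account_sas(
    account_name=account_name,
    account_key=account_key,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=7),
)
print(sas_token)
```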
At this point, let's see how to connect to this Azure Data Lake Gen 2 storage account using a shared access signature. If I go back onto Storage Explorer, what I'll do first is just right-click on the attached storage account, which we had attached earlier via the access key. I'll right-click on this, click on "Detach," and say yes. Now I want to again connect to the storage account, but this time using the shared access signature. So I'll go on to Manage Accounts and add an account. Here I'll choose the storage account, and here I'll choose the shared access signature option. I'll go on to Next. I need to give the SAS connection string, so I'll either copy the entire connection string or copy the Blob service SAS URL. Let me go ahead and copy the Blob service SAS URL. I'll place it over here and just paste it, and you can see the display name. I'll go on to Next and hit Connect. So here you can see now that, in terms of the data lake, I am connected via the SAS, the shared access signature. And here, you can see that I only have access to the blob containers. I don't have access to the Table service, the Queue service, or the File Share service. As a result, we're restricting access to the Blob service. At the same time, remember that I mentioned that this particular shared access signature will not be valid after this date and time. So if you want to give some sort of validity period to this particular shared access signature, this is something that you can specify over here. The main point of this particular chapter was to explain to students what the concept of a shared access signature is. So there are different ways in which you can authorise yourself to use a storage account. When it comes to Azure services, there are a lot of security features available that control how you can access a service. It should not be the case that the service is open to everyone; there has to be some security in place. And there are different ways in which you can authorise yourself to use a particular service in Azure. Right? So this marks the end of this chapter. As I mentioned before, we will be looking at using a shared access signature in later chapters when we look at Azure Synapse.
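And here is a short sketch, with placeholder names, of how an application could attach to the account with that SAS token instead of an access key; the client is then limited to whatever services, resource types, and permissions the signature allows.

```python
from azure.storage.blob import BlobServiceClient

# Placeholders: use your own account URL and the SAS token generated
# in the portal (or with generate_account_sas).
account_url = "https://datalake2000.blob.core.windows.net"
sas_token = "<sas-token>"

service = BlobServiceClient(account_url=account_url, credential=sas_token)

# Read and list operations succeed because the SAS grants them;
# write or delete calls would be rejected with an authorization error.
container = service.get_container_client("data")
for blob in container.list_blobs(name_starts_with="raw/"):
    print(blob.name)
```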
14. Azure Storage Account – Redundancy
Hi and welcome back. Now in this chapter, I want to go through the concept of Azure Storage account redundancy. When it comes to Azure services, they are always built with high availability in mind, and the same is true for the Azure Storage account. When you store data in an Azure Storage account, for example using the Blob service, multiple copies of your data are actually stored. This helps to protect against any planned or unplanned events. You see, when you upload your data to an Azure Storage account, in the end it's going to be stored on some sort of storage device in the underlying Azure data centre. The data centre holds all of the physical infrastructure for hosting your data and providing services. And no one can guarantee that all physical infrastructure will be available at all times. Something can go wrong; something can go down, because there are points of failure. There could be a network failure, there could be a hard drive failure, there could be a power outage. There are so many things that can happen. So for such events, there are different redundancy options to keep your data safe.

We actually saw these redundancy options when creating an Azure Data Lake Gen 2 storage account. So if I go back onto Azure quickly, go ahead and create a new resource, and scroll down and choose a storage account, here, when it came to redundancy, there were many options in place: you had locally redundant storage, geo-redundant storage, zone-redundant storage, and geo-zone-redundant storage. So many options in place. And I'm going to give an overview of what all of these options mean.

So, first, we have locally redundant storage. When you have an Azure Storage account, let's assume the storage account is in the Central US location. When you upload an object onto the storage account, three copies of your data are made. All of this data is within one data centre. So this helps to protect against server rack or drive failures. If there is any sort of drive failure, let's say one storage device were to go down within the data centre, the other storage devices would still be available and have copies of your data, which means that in the end, your data is still available. So the lowest redundancy option that is available is locally redundant storage. But obviously, companies are looking for much more redundancy when it comes to critical data, so there are other options in place as well.

One option that is available is zone-redundant storage. With locally redundant storage, what happens if the entire data centre were to go down? That means your object will not be available. But in the case of zone-redundant storage, your data is replicated synchronously across three availability zones. Now, an availability zone is just a separate physical location which has independent power, cooling, and networking. So now your object is actually distributed across different data centres, and these data centres are spread across these different availability zones. So now, even if one data centre were to go down, you would still have your object in place. But now let's say that the entire Central US region goes down. That means, again, all your availability zones are no longer available. And as I mentioned, for companies that are hosting critical data, it is very important for them to have their data in place all the time, so they can opt for something known as geo-redundant storage.
Now, what happens here is that your data is replicated in another region altogether. So, if your primary location is in the Central US, three copies of your data are created using the LRS technique, that is, the locally redundant storage technique. Simultaneously, your data is copied to another paired location. Over here, the Central US location is actually paired by Azure with the East US location. So now your data is also available in another region. And over here again, in this secondary region, your data is copied three times using the LRS technique. As a result, even if the Central US location went down, you could still access your data in the East US location. So in the background, the storage service will actually do a switchover from the Central US location to the East US location. So we have a lot of replication and redundancy options in place.

But remember, in all of these options, cost is also a factor. Over here, you'll be paying twice the cost for storage, because you are storing your data in a primary location and also storing your data in a secondary location. You will also be paying for bandwidth costs: the data transfer that happens from the primary to the secondary location is something that you also need to pay for. That said, for large organisations that require data to be available at all times in order to function properly, the benefit of having this in place far outweighs the cost of having geo-redundant storage. So it all depends on the business requirements.

Now, another type of geo-redundant storage is read-access geo-redundant storage. Here, the primary difference is that with plain geo-redundant storage, the data in the secondary location is only made available if the primary region goes down. With read-access geo-redundant storage, your data is available in both the primary and the secondary location at the same time. So your applications can read data not only from the primary location, but from the secondary location as well. That is the biggest difference. And then we have something known as geo-zone-redundant storage and read-access geo-zone-redundant storage. In geo-zone-redundant storage, the key point is that in the primary region itself, your data is distributed across different availability zones. If you look at plain geo-redundant storage, in the primary region your data was copied three times using LRS. But in geo-zone-redundant storage, in the primary region your data is copied across multiple availability zones. So over here, the data is made much more available in the primary region, whereas in the secondary region it is again just replicated using LRS. So again, there are different options when it comes to data redundancy.

As I said, if you go onto your storage account, you can actually go ahead and choose the redundancy option that you want for an existing storage account. If I go on to All Resources and go on to my Data Lake Gen 2 storage account, currently the replication technique is locally redundant storage. If I go ahead and scroll down and go on to Configuration, this is under Settings over here. In terms of the replication, I can change it onto either geo-redundant storage or read-access geo-redundant storage. Over here, I can't see zone-redundant storage because there are some limitations when it comes to changing from one replication technique to another.
There are ways in which you can accomplish this, but at this point in time, when it comes to our Data Lake Gen 2 storage account, these are the options that we have for changing the replication technique. Right, so in this chapter, I just wanted to go through data redundancy.
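As a rough illustration of the configuration change described above, the sketch below uses the azure-mgmt-storage management SDK to read and change an account's replication SKU, much like the portal's Configuration blade does. The subscription, resource group, and account names are placeholders, and this assumes the target SKU conversion is supported for the account.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Placeholders: subscription ID, resource group and account names.
client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

account = client.storage_accounts.get_properties("my-rg", "datalake2000")
print(account.sku.name)  # e.g. Standard_LRS

# Switch the replication to geo-redundant storage; Standard_RAGRS would
# additionally allow reads from the secondary (paired) region.
client.storage_accounts.update(
    resource_group_name="my-rg",
    account_name="datalake2000",
    parameters={"sku": {"name": "Standard_GRS"}},
)
```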
15. Azure Storage Account - Access tiers
Hi and welcome back. Now, in this chapter, I want to go through the Access Tier feature, which is available for storage accounts. Please note that it's also available for Data Lake Gen 2 storage accounts. If I go ahead and create a storage account, scroll down and choose Storage account, go on to the Advanced section, and scroll down, here we have something known as the Access Tier feature. Here we have two options: we have the Hot access tier, which is used for frequently accessed data, and we have the Cool access tier, which is used for infrequently accessed data. We also have a third option known as the Archive access tier. This is good for archiving your data. It is not available as a basic feature at the storage account level; it is available at each individual blob level. So if I go on to our existing Data Lake Gen 2 storage account, go on to my containers, go on to an existing container, go on to a directory, and go on to one of the files that I have here, I have the option of changing the tier. And in the tier, I have the Hot, the Cool, and the Archive access tier. So the Archive tier is an additional tier that is available at the blob level but not at the storage account level. If I return to the storage account, go to the configuration settings for the storage account, and scroll down, here in the blob access tier the default is Hot, and as I mentioned, we could go ahead and select the Cool access tier instead.

So what exactly are these different access tiers that are available for this particular storage account? When it comes to your storage account, and as I said, this is also applicable to your Data Lake Gen 2 storage account, one thing that you actually pay for is the amount of storage that you consume. Here I am showing a snapshot of the pricing page for storing your objects in an Azure Storage account. You can see the different access tiers, and you can also see that the price becomes less when you are storing objects in either the Cool or the Archive access tier. In fact, it's very low when it comes to the Archive access tier. And when it comes to a data lake, remember that I mentioned that companies will store lots of data, so you're probably talking about terabytes and even petabytes of data in a storage account. And storage becomes critical at that point; the storage cost becomes very important. So that's why you have these different access tiers in place, where companies can look at reducing their storage costs. If they have an object that is not accessed that frequently, they can change the access tier of that object to the Cool access tier. And if they feel that the object is not going to be accessed at all, but they still need to have a backup of the object in place, they can choose the Archive access tier for that particular object. And I mentioned that the Archive access tier can only be enabled at the individual blob level. So then you might ask yourself, why can't we just archive all of our objects, since the storage cost is less? And that is because of a caveat that comes with storing an object in the Archive access tier: if you want to go ahead and access that object again, you must perform a process known as rehydration. So you have to rehydrate that file in order to access it.
So you need to go ahead and change the access tier of the file to either the Hot or the Cool access tier, and it takes time to rehydrate the file. So if you need the file at that point in time, you should not choose the Archive access tier; you should choose either the Hot or the Cool access tier.

Next, when it comes to the pricing of objects in the Hot, Cool, or Archive access tiers: for your storage account, there are different aspects to the cost. One aspect is the underlying storage cost. The other aspect is the operations that are performed on your objects. For example, over here again I am showing a snapshot of the documentation page on pricing. Here, you can see that the read operation on an object in the Cool access tier costs much more than on an object in the Hot access tier, and it gets even more expensive for objects in the Archive access tier. Next is a concept known as the early deletion fee. The Cool access tier is only meant for data that is accessed infrequently and stored for at least 30 days. If you have a blob in the Cool access tier and you switch it to the Hot access tier before 30 days, you will be charged an early deletion fee. The same thing goes for the Archive access tier. This is used for rarely accessed data that is stored for at least 180 days, and the same concept applies here: if you have a blob in the Archive access tier and you change the access tier of the blob earlier than 180 days, you are charged an early deletion fee. So when you're deciding upon the access tier of a particular object, you have to decide based on how frequently that object is being used. If the object is being used on a daily basis, you should choose the Hot access tier. If your objects are not being accessed that frequently, you can choose the Cool access tier. If you want to archive objects, you can do so by selecting the Archive access tier.

Now let's quickly go on to Azure. I'll go on to my Data Lake Gen 2 storage account, on to my containers, on to my data container, on to my raw folder, and on to my object, and here I'll just change the tier to the Archive access tier and click on Save. So remember that we are now saving money on the storage cost for this file. But here you can see that this blob is currently being archived and can't be downloaded. You have to rehydrate the blob in order to access it, and that's what you need to do if you want to access the file again, because even if I go on to Edit, I will not be able to see the contents of the file. So I have to go back onto my file, and here I have to change the tier to either the Hot or the Cool access tier. If I choose either tier, you can see that there is a rehydrate priority: you have "Standard" and you have "High." With Standard, the object will take some time to be converted back to the Cool access tier; you can see that it may take up to 7 hours to complete. If you choose "High," then it could be completed at a much faster pace. But in either case, it will still take time. So if you have an object that needs to be accessible at any point in time, don't choose the Archive access tier. I'll just go ahead and cancel this. So in this chapter, I wanted to go through the different access tiers that are available for the Blob service in your storage accounts.
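To tie the tiering and rehydration steps together, here is a minimal sketch using the azure-storage-blob Python SDK; the account, container, and blob names are placeholders, and the rehydrate_priority keyword is meant to reflect the Standard/High choice shown in the portal.

```python
from azure.storage.blob import BlobServiceClient

# Placeholders: account URL, credential, container and blob names.
service = BlobServiceClient(
    account_url="https://datalake2000.blob.core.windows.net",
    credential="<access-key-or-sas>",
)
blob = service.get_blob_client(container="data", blob="raw/customer.json")

# Move the blob to the Archive tier: cheapest storage, but it can no
# longer be read or downloaded until it is rehydrated.
blob.set_standard_blob_tier("Archive")

# Rehydrate it back to the Hot tier; this can take hours, with "High"
# priority generally completing faster than "Standard".
blob.set_standard_blob_tier("Hot", rehydrate_priority="High")

# While rehydration is pending, the blob's archive status reflects it.
print(blob.get_blob_properties().archive_status)
```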
16. Azure Storage Account - Lifecycle policy
Now, in the last chapter, I talked about the different access tiers when it comes to objects in an Azure Storage account. In this chapter, I want to go through a feature that is available when it comes to data management, and that's lifecycle management. When a company has millions of objects in a storage account, if they want to ensure that some of those objects are moved from, let's say, the Hot access tier to the Cool access tier, they don't want to manually go and change the access tier of each object individually. They also don't want the burden of creating a script that will check when an object was last accessed and then change its access tier. What they can do instead is make use of this lifecycle management feature. The lifecycle management feature can automatically check when an object was last accessed or modified, and based on that time frame, it can change the access tier of that particular object, and it will do this on a regular basis. So you don't need any manual intervention to change the access tier of a particular object. And this is very useful if you have a lot of objects in your Data Lake Gen 2 storage account, which you most probably will have.

I'll just show you what goes into creating a rule in lifecycle management. So here I'll add a rule. Here, I can give a name for the rule, and in terms of the rule scope, I can apply the rule to all blobs in the storage account. In terms of the blob type, I can leave it as is with block blobs; these are the blobs that we actually upload onto our storage account. When it comes to blobs, you can also create versions, different versions of your blobs, and even snapshots, and these rules can apply both to your base blobs and to your snapshots and versions. I'll go on to Next, and here is where you can add the particular logic. So here, you can say that if the blob was not modified during the last 30 days, that means the blob is not being frequently accessed. Then we can have logic saying, "Please move that blob onto the Cool access tier," and then we can go ahead and click on Add. And likewise, you can have multiple rules in place. You could have another rule that moves the blob onto the Archive access tier. If you go on to the code view here, you will see the code representation of that particular rule. Here you can see that the action is to move the blob onto the Cool access tier if it has not been modified in the last 30 days. So the lifecycle management rules just offer you a better, much easier, and more automated way to move your objects from one access tier to the next.
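As a rough sketch of the rule's code view described above, here is what an equivalent policy could look like expressed as a Python dictionary and applied with the azure-mgmt-storage management SDK; the subscription, resource group, account, and rule names are placeholders and not values from the course.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Placeholders: subscription ID, resource group and account names.
client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# One rule: if a block blob has not been modified for 30 days,
# move it to the Cool access tier.
policy = {
    "policy": {
        "rules": [
            {
                "enabled": True,
                "name": "move-to-cool-after-30-days",
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": 30}
                        }
                    },
                },
            }
        ]
    }
}

# Lifecycle rules live under the account's management policy named "default".
client.management_policies.create_or_update(
    resource_group_name="my-rg",
    account_name="datalake2000",
    management_policy_name="default",
    properties=policy,
)
```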
Pay a fraction of the cost to study with the Exam-Labs DP-203: Data Engineering on Microsoft Azure certification video training course. Passing the certification exams has never been easier. With the complete self-paced exam prep solution, including the DP-203: Data Engineering on Microsoft Azure certification video training course, practice test questions and answers, exam practice test questions, and study guide, you have nothing to worry about for your next certification exam.