1. Data Geo-Replication
So one of the big advantages of cloud computing is its global nature. When you create a virtual machine, you can create it in any region of the world. In fact, you can create multiple virtual machines in multiple regions of the world, and really get global distribution for your applications, your storage files, and other things using cloud computing. It's very difficult to achieve that kind of scale if you're running your own data centre within your own company, unless you're a global multinational with that kind of money. Even if you work with hosting providers, you'd need to work with multiple data centres and multiple providers.
And, without a doubt, some coordination is required. One of the difficulties of being a global application is having your data, your databases, located and replicated globally. This is the concept of geographic data storage. Now, even storage accounts, which are for file storage rather than database storage, give you the option of having your files globally replicated. The advantage of this is that if there's a regional outage, let's say a massive storm knocks out a significant part of the United States in terms of electricity, power, and Internet connectivity, and you're using geo-redundant storage, your files are also located in another region of the world that is hopefully not affected by that localised storm and power outage. Presumably, you can then get your applications up and running again in the new region while you're waiting for the eastern United States to come back online.
The same ability to do redundant storage that you get with storage accounts is available for Azure SQL Database and Cosmos DB as well. What Azure SQL Database can do for you is let you specify a second, third, fourth, or even more regions around the world where Azure will keep a copy of your entire database, and Azure will also replicate the data from your primary location to those backup locations. It's the same thing with Cosmos DB: you choose your primary location and where to put the secondary locations, and Azure takes care of keeping them synchronised with each other. You can also do this in what they call a multi-master configuration, where the North American Cosmos DB receives data, inserts it, and stores it in the table, and the European Cosmos DB does the same, and Azure takes care of making sure that all data is replicated everywhere and of handling conflicts if they do happen. We can see on the screen an application design that shows this type of configuration. Now, this is an active-standby configuration. At the top of the screen, we have an active region that contains a web service. It contains a web app, a queue, a function app, a Redis cache, and it also contains a SQL database and a Cosmos DB.
Now, that's all located in one region. Like we said, if we have a regional outage, some sort of massive power outage that takes out that whole region, or even just a bad deployment, then we want to have a standby region. That's handled through traffic management on the front end: customers coming in get directed into the standby region if the active region is no longer accessible. The same goes for the databases. Now, the app service plan and the function app are pretty easy because you're not changing that code very often, and when you do, you can just deploy to both locations. Databases, however, are constantly changing by nature, and so keeping them replicated is not something you want to be doing by hand.
You configure that geo-replication within SQL Database or Cosmos DB, as we can see here. Once you've configured it, you have your active region and your standby region, and when the active region fails, the data has already been replicated right up to the moment of failure. Now, it does take a few minutes for Traffic Manager to recognise the failure and switch people over to the standby region. So you may experience five or ten minutes of downtime, but hopefully you're not losing any data just because your active region is no longer available; there's going to be very little downtime and very little lost data. So you can fail over from your active region to your standby region, and the standby region just picks up and continues on with an up-to-date SQL database, an up-to-date Cosmos DB, et cetera.
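To make that failover concrete from the application's point of view, here is a minimal Python sketch using pyodbc with hypothetical server names: the app prefers the connection in the active region and falls back to the standby replica. In practice, an Azure SQL failover group can expose a single listener endpoint so the client doesn't have to manage two addresses itself.

```python
import pyodbc

# Hypothetical connection strings for the active and standby regions; a SQL failover
# group would normally give you a single listener address instead of two servers.
ACTIVE = ("Driver={ODBC Driver 17 for SQL Server};"
          "Server=tcp:myapp-eastus.database.windows.net,1433;"
          "Database=orders;Uid=appuser;Pwd=<password>;Encrypt=yes;")
STANDBY = ("Driver={ODBC Driver 17 for SQL Server};"
           "Server=tcp:myapp-westeurope.database.windows.net,1433;"
           "Database=orders;Uid=appuser;Pwd=<password>;Encrypt=yes;")

def get_connection():
    """Prefer the active region; fall back to the geo-replicated standby."""
    for conn_str in (ACTIVE, STANDBY):
        try:
            return pyodbc.connect(conn_str, timeout=5)  # 5-second login timeout
        except pyodbc.Error:
            continue  # region unreachable, try the next one
    raise RuntimeError("Neither region is currently reachable")
```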
2. Data Encryption
So we all know how important security is in this day and age, especially when it comes to data. And so in this section of the course, we're going to be talking about data protection strategy. One of the first things that comes to mind when we talk about data protection is how we are encrypting the data, so that's where we'll start: encryption strategy. Now, Microsoft already does a pretty good job with Azure SQL Database and Cosmos DB, because the data is already encrypted on the disk. This is what's called "transparent data encryption," or TDE. What that means is that when you execute an insert statement against Azure SQL Database, you send that data as plain text, although the connection itself is HTTPS; you don't encrypt it yourself. Azure SQL Database stores it in its data files, and those data files are encrypted as they sit on the disk.
So if someone were able to grab that hard drive, find one in a dumpster, or take a copy of it, it's not going to do them much good, because all of those files are encrypted. They would need a copy of the decryption key, which they don't have; Microsoft keeps that decryption key separate. Now, there is a way for you to control the encryption key yourself. As a higher level of security, you're probably going to want to set up your SQL databases, Cosmos DB, and even your storage accounts so that you control the encryption key. You put that key into Azure Key Vault, and that way the key is under your control; not even Microsoft can decrypt your files, only you can, with your Azure Key Vault. So that's data at rest. For data in transit, you should always use HTTPS, and you're going to want to set a setting that says we only ever connect to Azure SQL Database, Cosmos DB, and storage accounts over an SSL connection; we won't even allow a non-SSL connection. A sketch of what that looks like from the client side follows.
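Here is a minimal example, assuming Python with pyodbc and ODBC Driver 17, of a connection string that insists on an encrypted channel; the server and database names are hypothetical.

```python
import pyodbc

# Hypothetical server and database names. Encrypt=yes forces TLS for the session,
# and TrustServerCertificate=no makes the driver validate the server certificate,
# so the connection fails outright rather than falling back to an insecure channel.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=customers;"
    "Uid=appuser;Pwd=<password-from-key-vault>;"
    "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
)
```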
That way, the data travelling between Azure and your own client network, your own application, is encrypted and can't be intercepted along the way. So you've got your data encrypted at rest using transparent data encryption, and encrypted in transit between Azure and the client, so nobody along the way can read it. Now, for another whole level of paranoia, there is this thing called "Always Encrypted." Always Encrypted requires a special client to read the data from the SQL database, but in exchange, the data is never unencrypted until it gets onto your computer. So if you're using the Always Encrypted setting, data goes from your computer to Azure encrypted and into the database encrypted, and if you want to read it, it comes back from the database over the wire encrypted. It never sits in the database, sits in the server's memory, or goes over the wire in any kind of unencrypted state. It does require a specialised client, as I said, but that is the Always Encrypted setting.
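As a sketch of what that special client can look like, ODBC Driver 17 can act as the Always Encrypted client by adding the ColumnEncryption keyword, assuming the column master key and column encryption keys have already been set up on the database; the names below are hypothetical.

```python
import pyodbc

# Sketch only: the column master key and column encryption keys must already exist
# on the database, and this client must be able to reach the key store (for example
# Azure Key Vault) that protects them.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=customers;"
    "Uid=appuser;Pwd=<password>;"
    "Encrypt=yes;TrustServerCertificate=no;"
    "ColumnEncryption=Enabled;"  # this is the 'special client' switch
)

cursor = conn.cursor()
# The driver decrypts protected columns on the client, so this query returns
# readable values even though they never exist unencrypted on the server.
cursor.execute("SELECT FirstName, Salary FROM dbo.Employees")
print(cursor.fetchall())
```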
Now, another layer of security for databases is called dynamic data masking. So if there are certain fields, say you're looking at a data table and you think, "Well, that's the credit card number, we never want to show that," you can set up dynamic data masking on that field so it's only ever unmasked for administration purposes or in extremely rare circumstances. Now, I'm going to step back and say: never, ever store a credit card number in a data table in plain text. That's bad practise in any way, shape, or form, so let's use the customer's email address as the example instead. Do you want email addresses to be exposed any time someone runs a select statement, or should that column always be masked with dynamic data masking and only unmasked in certain circumstances? So for really sensitive fields (and again, you shouldn't be storing credit cards this way at all), you may want to enable dynamic data masking at the data table level so that certain columns are hardly ever exposed, even to people running queries in the query window or to the applications. A sketch of how a mask gets applied follows.
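As an illustration, the mask is defined with T-SQL on the column itself; the sketch below runs that T-SQL from Python over a hypothetical admin connection (ADMIN_CONNECTION_STRING and the ReportingAdmins role are assumptions for the example, not something from the course).

```python
import pyodbc

ADMIN_CONNECTION_STRING = "..."  # hypothetical admin connection string

conn = pyodbc.connect(ADMIN_CONNECTION_STRING)
cursor = conn.cursor()

# Mask the Email column for anyone who lacks the UNMASK permission; 'email()' is
# one of the built-in masking functions (others include 'default()' and 'partial()').
cursor.execute(
    "ALTER TABLE dbo.Customers "
    "ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');"
)

# Ordinary users now see something like aXXX@XXXX.com; only this trusted role
# (hypothetical) sees the real values.
cursor.execute("GRANT UNMASK TO ReportingAdmins;")
conn.commit()
```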
3. Data Scaling
So, next up, we want to talk about scaling. How do you scale databases like SQL Database and Cosmos DB? Now, this is a graphic taken from the Azure website. It's slightly inaccurate because there's a lot of overlap between the Standard and Premium tiers, so really those lines should run side by side rather than one after the other. But you can start with a simple 5-DTU database and work your way up. All the way on the right, the graphic shows 4,000 DTUs, which is 800 times more powerful than the basic 5-DTU database. Now, as I said previously, DTUs are not the only pricing model for SQL databases; you can also go with the vCore model, where you pay for virtual cores rather than a bundled DTU measure. But we can see that the SQL database is basically designed to be scaled. So if you're sitting at one of these levels, say P4 at 500 DTUs, and you want to get to the next level, you go to P6, which is 1,000 DTUs, and that should be roughly double the performance.
Now, this is a slightly disruptive operation when you go from 500 to 1,000 DTUs: there's going to be a small amount of downtime while the server switches over. So it's working at 500 at one point, and then, after the switchover, it's working at 1,000. So it is a little bit disruptive, but Azure SQL Database can be scaled by simply switching plans. Now, this is manual scaling, not automatic. You actually have to go and make the decision that you want to go from 500 to 1,000. It's not like an App Service, which can detect CPU utilisation and automatically grow and shrink. Another popular method of scaling a database, rather than simply growing to a larger plan, is the read scale-out concept. That means you have a copy of the database, which ties back to the geo-redundancy we were just talking about. So you have a geo-redundant replica of the SQL database, and you use that replica whenever you know you only need to read from the database. That way you can focus your writes, updates, and inserts on the primary database and direct your reads to the secondary database, as sketched below.
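Here's a minimal sketch of that read/write split from the application side, with hypothetical names. On Premium and Business Critical tiers a read replica can be reached by adding ApplicationIntent=ReadOnly to the same connection string; with geo-replication you would point the read connection at the secondary server's address instead.

```python
import pyodbc

# Hypothetical names. Writes go to the primary; reads go to a readable secondary,
# reached here by adding ApplicationIntent=ReadOnly to the same connection string.
WRITE_CONN = ("Driver={ODBC Driver 17 for SQL Server};"
              "Server=tcp:shop-primary.database.windows.net,1433;"
              "Database=shop;Uid=appuser;Pwd=<password>;Encrypt=yes;")
READ_CONN = WRITE_CONN + "ApplicationIntent=ReadOnly;"

def save_order(sql, params):
    # Inserts and updates stay on the primary (read-write) database.
    with pyodbc.connect(WRITE_CONN) as conn:
        conn.cursor().execute(sql, params)

def load_homepage_products():
    # Frequent, read-only selects are pushed to the secondary replica.
    with pyodbc.connect(READ_CONN) as conn:
        return conn.cursor().execute(
            "SELECT TOP 20 Name, Price FROM dbo.Products").fetchall()
```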
And in this way, you can take some pressure off the primary database. Displaying values on the home page and other web pages just needs a select statement, and that kind of read is what a lot of applications do most frequently, so you leave the primary data store alone and go to the secondary data store for those reads. This is something you need to handle within your application: you're going to need to store both connection strings, and you're going to need to make the determination that this is my read server and this is my write server. Another option for scaling databases is what's called "sharding." Now, the concept of sharding means that you make a logical division in the data and store some of it in one database and another set of data in another database.
So, for instance, you can store all of your North American customers in your North American database and all of your European customers in your European database, et cetera. The advantage is that you've got multiple servers, each of them storing, hopefully, only part of the data. If your application is intelligent and knows it's dealing with a North American user, it goes to the North American database, or it knows to go to the other one. The other idea is to have some kind of mapping where you can look up a customer number or account and it tells you where the rest of their data is stored from that point forward, as in the sketch below. So breaking your database up into multiple databases, and then storing those in other geographic regions or even just in secondary data stores, is another way of extending the performance of your database without having to scale up to a bigger plan.
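A minimal sketch of that kind of shard map is below; all the names are hypothetical, and Azure's Elastic Database tools can manage a shard map like this for you rather than keeping it in application code.

```python
# The application looks up which regional database owns a given customer before
# running any queries against it.
SHARD_MAP = {
    "NA": "Server=tcp:shop-na.database.windows.net,1433;Database=shop;...",
    "EU": "Server=tcp:shop-eu.database.windows.net,1433;Database=shop;...",
}

def shard_for(customer_region: str) -> str:
    """Return the connection string for the database holding this customer's rows."""
    try:
        return SHARD_MAP[customer_region]
    except KeyError:
        raise ValueError(f"No shard configured for region {customer_region!r}")

# Example: a request identified as a North American user queries the NA database.
print(shard_for("NA"))
```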
4. Data Security
Now, it's all well and good to have your data protected, encrypted, and scaled, but how are you going to access it securely, right? How are you, as someone working on client applications, going to let your application get access to it while protecting it from other applications that aren't authorised, keeping it protected from hackers, et cetera? So one option is to realise that you don't need to have your data on the Internet at all, right?
So even though Azure SQL databases have a public endpoint by default, if you don't anticipate ever needing to access that public endpoint, it's actually a security risk to have it exposed. You might want to investigate the concept called "virtual network service endpoints," where you're basically restricting your Azure SQL Database or Cosmos DB to a specific virtual network. You're effectively adding your database as a resource on that network, and with NSG security settings, only resources on that network, or traffic that the NSG rules allow in, can get access to that SQL database. So if you have a web app and you have data in an Azure SQL database, both of those things can be attached to a virtual network, and you can ensure those two things have secure access to each other while no one outside the network has access.
That could be a scenario. Another way to protect data in a SQL database is the firewall concept, which exists both at the server level and at the database level. You can basically deny all access from anywhere in the world and only whitelist certain IP ranges. So you can say, "Well, we're in this office and our traffic always comes from here," so that's the only IP range allowed access. That's not perfect security, but cutting off access to your SQL database for 99.9% of the world does significantly improve it. Next, authentication: we know from the SQL Server world that there's a difference between SQL authentication and Windows authentication. Within Azure, you don't necessarily have Windows authentication, but you do have Azure AD. So you can create an application that has a managed service identity, in effect a service principal within Azure, and that identity is what gets authenticated to use the database. There are no user IDs or passwords at all, just the managed service identity, as in the sketch below.
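As a rough sketch of what that looks like from application code, assuming Python with the azure-identity package and pyodbc (the server name is hypothetical, and the token-passing attribute is the one documented for ODBC Driver 17):

```python
import struct
import pyodbc
from azure.identity import DefaultAzureCredential

# Ask Azure AD for a token on behalf of the app's managed identity; no user ID or
# password appears in the code or the connection string.
credential = DefaultAzureCredential()
token = credential.get_token("https://database.windows.net/.default").token

# ODBC Driver 17 accepts the token through pre-connect attribute 1256
# (SQL_COPT_SS_ACCESS_TOKEN) as a length-prefixed UTF-16-LE byte string.
token_bytes = token.encode("utf-16-le")
token_struct = struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"  # hypothetical server name
    "Database=customers;Encrypt=yes;TrustServerCertificate=no;",
    attrs_before={1256: token_struct},
)
```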
With SQL authentication, there's a user ID and password, the traditional logging in with user X and password Y that gets stored in a connection string, et cetera. So your choice of authentication method can also affect the security of your data. SQL Server and SQL Database also have a concept of "row-level security," which is pretty granular, right? So if you're looking at a data table and you're saying, "Well, user Joe only has access to customers in Ontario," you write a rule that says the province code must be ON, and user Joe is granted read access to those rows and denied access to all the others. That's row-level security. I haven't seen it a lot in practise, and in terms of designing your applications, are you really letting users into your database and relying on row-level security to block their access? To me, it's a bit of an odd setup; you probably want an API that sits in the middle, and that API makes the decisions about who sees the data. A lot of people don't know this, but Azure also has a fancy security feature for databases called Advanced Threat Protection, or ATP.
Now, this is an additional option; you have to pay for it. It's not free, and it's available in a bunch of other products as well, but SQL Database is one of them. What it does is check the traffic coming through against known patterns of bad behaviour. As an example of what ATP can do, let's say there's a list of known hackers: a group that has been attempting to break into databases around the world, and their IP addresses have become known to Microsoft. Well, then ATP will block access from those IP addresses to your SQL database. So that's a known-hacker list, and it's one way Advanced Threat Protection can protect you; again, it's pattern matching and intelligence. But basically, if you want Microsoft's help stopping the bad guys from getting access to your databases, ATP could be one of the options. Now, all of these databases, SQL Database and Cosmos DB, push out logs, and Azure Monitor integrates with them so you can actually run reports on these things. You can put up charts and set alerts, so if you get five incorrect passwords in the past hour, it sends you a text message or something like that, and you can see something weird is going on. So go to Azure Monitor.
Azure Monitor has a hookup for SQL Database, and you can actually look at the diagnostic logs and the event logs coming off of it. As was stated in the previous video, make sure that all of your database and data access options require an SSL connection. Really, there's no excuse in 2019 or 2020 to allow insecure connections; there are very few places where HTTPS traffic can't cross the router or leave the network. So turn on SSL: that will stop man-in-the-middle attacks, or people along the way, from logging your traffic to see what you're passing back and forth. And we talked in the last video about using dynamic data masking to protect sensitive fields. The benefit of this is that if you have a low-level account, say a standard user, you can run a report against the data table, but some of the columns will be masked with asterisks and unavailable to you. You won't be able to see the email address, you won't be able to see the phone number; maybe you can see the first name but not the last name. So data masking keeps data hidden from anyone who doesn't require it to do their job.
5. Data Loss Prevention (DLP)
So we'll wrap up this section by talking about strategies to protect your data. In the AZ-301 requirements, Azure refers to this as Data Loss Prevention, to which I will sarcastically respond: well, these are policies designed to prevent data loss, of course. So one tip to prevent data loss is to look at your data, catalogue it across your organisation, and identify what is sensitive. If there's an anonymised log that just contains the times people logged in and out but no identifying information, that might not be sensitive at all, whereas if you have a table of customers' names, emails, addresses, phone numbers, and a historical list of everything they've ever ordered, well, that could be extremely sensitive.
And so knowing what's sensitive and what's not will help you determine what steps you have to take to protect that data, et cetera. Now, once you've identified it, you may want to look at certain standards that exist. Personally identifiable information (PII) is basically any piece of data that can identify a specific person. A customer's email address would identify them: if you knew my email address, you'd know who I was. Whereas a postal code by itself is not personally identifying at all. So we can look at different pieces of information and ask: will they identify a person, or won't they? The other standard is the Payment Card Industry Data Security Standard (PCI DSS), and any time you're dealing with credit cards, you should be looking at the PCI standards for handling them. I have a personal rule when I'm designing systems: I abhor the idea of storing a credit card. I think that through my career I have successfully avoided ever having to store a credit card number, even working with retail sites and some really big brands, even working with Visa, which I did for a couple of years; I never had to create a system that stored the credit card number.
And so don't store what you don't need, right? Even if you're going to have access to a lot of information, be realistic about what your application actually needs and what you may reasonably need in the future. Some software even sets short expiration dates on data, or avoids storing passwords at all. I mean, passwords have been the embarrassment of many system designers over the past ten or twenty years whenever there's a major leak. When you hear the news that some big company had its database leaked, the first thing people ask is whether the passwords were in plain text, and it's a very, very sensitive subject. If your database was leaked, that's bad enough, but if your passwords are properly encrypted and salted, and it's going to be very difficult or impossible to reverse-engineer your passwords out of that encryption, well, good for you. You get some small bonus points for having an impossible-to-crack algorithm protecting that data: people's email addresses and other information got released, but at least no one will ever know the passwords. Whatever you do, don't keep the security key for your password encryption in your code, in your config file, or on the server somewhere. If you're going to encrypt something, that's great, but set it up so that the key to decrypt it is not stored alongside the data. In fact, when dealing with stuff like passwords, it might be even better to hash them instead of encrypting them, as in the sketch below.
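To make the hashing idea concrete, here is a minimal sketch using only Python's standard library (PBKDF2 with a random salt); a dedicated scheme such as bcrypt or Argon2 would be an even stronger choice in production.

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 200_000):
    """Return (salt, digest); the random salt is stored alongside the hash."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes,
                    iterations: int = 200_000) -> bool:
    """Hash the login attempt the same way and compare the two hashes."""
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(attempt, stored)

# Usage: store salt and digest at sign-up, then compare at login time.
salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
assert not verify_password("wrong guess", salt, digest)
```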
Hashing is a one-way function: it can turn a string into another string, but it's not possible to go back from the result to the original. So with a password, you hash it and store the hash, and when someone tries to log in, you hash what they typed in and compare the two hashes. Hashing is a good strategy there. Security is best done on a need-to-know basis, following the principle of least privilege, so avoid giving people more permissions than they need. We've mentioned it before in this course, but minimise permissions on a person-by-person basis, and use Azure's access review feature to ensure that everyone who has access to a resource actually requires it. And it might be preferable to grant temporary access to someone who needs to get into a system but only for that afternoon: don't give them full permissions that never expire; give them, say, seven days of access to that resource and have it automatically expire after that time. So take advantage of temporary permissions rather than a permanent escalation every time someone needs one-time access to something. Finally, Azure has a technology called Azure Information Protection.
And Azure Information Protection is sort of like a DRM (digital rights management) system for your information, your documents, and your data. It does try to enforce restrictions: if you set a document to disallow email forwarding, then within Outlook 365 the recipient won't be able to take that document and forward it on to another person, and it can likewise prevent printing. Azure Information Protection is basically a digital rights management system that can be attached to documents, and you can look at it as another way of protecting your information: if you make it difficult to do something, then only really malicious people will go around that protection. Now, the GDPR has been in effect for a couple of years, and we actually saw in the news the other day that the first cases are starting to come to court over GDPR, basically cases of not taking the protection of data seriously. Under GDPR you need to have a data controller, and there are data processing rules, reporting requirements, and disclosure requirements. So this is law, and it's starting to become more serious in terms of how Europe handles people's data.