4. Big Data (OBJ 1.8)
In this lesson, we’re going to discuss the impact of big data on enterprise security and privacy. So what is big data? Well, big data refers to data sets that are so large or so complex that traditional data processing applications are not sufficient to handle them. This data can be either structured or unstructured, and in our enterprise networks, we’re often buried in the amount of data that we could potentially collect in the operation of our businesses. When enterprises try to tackle the challenges associated with big data, they have to consider the three Vs: volume, velocity, and variety, because these do a good job of explaining the challenges that we’re going to face. The first V is volume. This refers to the sheer amount of data that’s being collected, processed, and stored by our systems.
The second V is velocity. This refers to the rapid influx of the data being collected, processed, and stored by all of our systems. The third V is variety. This refers to the structured and unstructured nature of the data being collected, processed, and stored by our different systems. This can include text, audio, images, videos, or even encrypted data sources. Now, big data moves through three main phases in its life cycle: it goes from generation to storage to processing. First, we have data generation, and this occurs from multiple sources and in multiple formats. Data generation can occur actively or passively. In active generation, the data owner gives the data to a third party.
But in passive generation, the data is simply generated as the data owner performs normal online actions, such as browsing a website while that website collects the time spent on a given page, the buttons clicked, or other actions like that. To aid in the protection of privacy during this phase, access restrictions and data falsification techniques like anonymization can be used. Now, access restrictions can be used by the data owner if they believe the data could uncover sensitive information that should not be shared. In this case, they’re going to refuse to provide that data to somebody else. For example, when a website asks whether you want to allow only functional cookies, feature cookies, or all cookies, you’re allowing or rejecting each level of cookies by setting those access restrictions.
If the data is being passively generated, though, the data owner is going to have to take additional actions to ensure their privacy, such as using anti-tracking extensions, script blockers, encryption tools, and anonymized browsing. Now, falsifying data involves distorting the data before it is received by that third party. For example, if you’re browsing the Internet using a Tor client, you’re effectively obscuring your real identity as your packets are routed and rerouted through different relays across the Tor network. Now, the second phase of the big data lifecycle occurs when the data is stored. Storing big data used to be a big challenge before the migration to the cloud and the advancements in data storage technologies that have come along with it.
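Tor is one way to distort what a third party receives; another, separate technique is random perturbation, where noise is added to each value before it is shared. The sketch below illustrates that idea under stated assumptions: the noise is Laplace-distributed, and the function name `distort` and its `scale` parameter are hypothetical, not part of any particular privacy tool. Individual values become unreliable, while an average over many users stays close to the truth.

```python
import random


def distort(value: float, scale: float, rng: random.Random) -> float:
    """Return value plus Laplace(0, scale) noise (hypothetical helper).

    The difference of two independent exponential samples, each with
    mean `scale`, is Laplace-distributed with location 0 and scale `scale`.
    """
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return value + noise


# Hypothetical example: seconds each of 1,000 users spent on a page.
rng = random.Random(42)
true_times = [30.0, 45.0, 60.0, 20.0, 55.0] * 200
shared = [distort(t, scale=10.0, rng=rng) for t in true_times]

true_mean = sum(true_times) / len(true_times)
shared_mean = sum(shared) / len(shared)
print(f"true mean: {true_mean:.1f}s, mean of distorted data: {shared_mean:.1f}s")
```

Each shared value is off by roughly ten seconds on average, so a single record says little about its owner, yet the two means land close together.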
Now, these days, the bigger concern that enterprises need to consider is how best to protect their big data stores, since a compromise of one of those stores could be truly disruptive because of the amount of personal information contained in most of them. Often, big data is segregated into different categories for storage. This creates a more distributed environment spread across several data sets, which adds to the complexity of implementing privacy protections for this data, but it does separate the data into pieces. To best protect the data, data owners need to implement file-level, database-level, and media-level security schemes, as well as application-level encryption schemes.
If you’re using the cloud to store your big data, you should also consider using attribute-based encryption, homomorphic encryption, or storage path encryption, or relying on a hybrid cloud model for additional privacy. The third phase of the big data lifecycle occurs when data is being processed. Processing big data is usually done in one of four system categories: batch, stream, graph, and machine learning. Privacy protection during big data processing is divided into two phases. The first phase is to safeguard the information from inadvertent or unsolicited disclosure, since the big data being processed could contain sensitive or private information about the data owner.
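Homomorphic encryption is the least intuitive item in that list, so here is a toy sketch of the idea using the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts produces an encryption of the sum of the plaintexts, so a cloud provider could total values it cannot read. The tiny hard-coded primes and the salary figures are illustrative assumptions and offer no real security; a production system would use a vetted library and far larger keys.

```python
import math
import secrets

# Toy key generation with tiny primes (insecure, illustration only).
P, Q = 1789, 1867
N = P * Q
N_SQ = N * N
G = N + 1                                          # common choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)


def _L(x: int) -> int:
    """Paillier's L function: L(x) = (x - 1) / n."""
    return (x - 1) // N


MU = pow(_L(pow(G, LAM, N_SQ)), -1, N)  # modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N using fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    return (_L(pow(c, LAM, N_SQ)) * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ


# The data owner encrypts two hypothetical salaries; the "cloud" sums them blind.
c_a, c_b = encrypt(52_000), encrypt(61_000)
c_sum = add_encrypted(c_a, c_b)
print(decrypt(c_sum))  # 113000
```

The storage provider only ever handles `c_a`, `c_b`, and `c_sum`; only the key holder, who knows `LAM` and `MU`, can recover the total.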
The second phase is to extract meaningful information from the big data being processed without violating the privacy of the data owner. To achieve this, the process of de-identification can be used. De-identification protects an individual’s privacy by sanitizing the data, generalizing some values and suppressing others, before the data is released for processing or data mining. With de-identification, though, you do have to worry about re-identification if you don’t adequately de-identify your big data before you begin to process it.
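To make generalization and suppression concrete, the sketch below de-identifies a few hypothetical records: it suppresses the name, generalizes age into ten-year bands, and truncates ZIP codes to a three-digit prefix. It then computes k, the size of the smallest group of records sharing the same quasi-identifiers. This k-anonymity check is one common way to gauge re-identification risk, used here as an assumption rather than something the lesson prescribes.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical raw records: name is a direct identifier, while age and ZIP
# are quasi-identifiers that could be linked with outside data.
records = [
    {"name": "Alice", "age": 34, "zip": "30301", "diagnosis": "flu"},
    {"name": "Bob",   "age": 37, "zip": "30302", "diagnosis": "asthma"},
    {"name": "Cara",  "age": 31, "zip": "30309", "diagnosis": "flu"},
    {"name": "Dan",   "age": 52, "zip": "30412", "diagnosis": "diabetes"},
    {"name": "Eve",   "age": 58, "zip": "30419", "diagnosis": "flu"},
]


def deidentify(rec: dict) -> dict:
    """Suppress the name and generalize the quasi-identifiers."""
    decade = rec["age"] // 10 * 10
    return {
        "age": f"{decade}-{decade + 9}",  # generalization: 10-year band
        "zip": rec["zip"][:3] + "**",     # generalization: 3-digit prefix
        "diagnosis": rec["diagnosis"],    # value kept for analysis
        # "name" is suppressed: it is simply not copied over.
    }


def k_anonymity(rows: list[dict], quasi_ids: tuple[str, ...]) -> int:
    """Size of the smallest group sharing identical quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(groups.values())


released = [deidentify(r) for r in records]
print(k_anonymity(records, ("age", "zip")))   # 1: every raw row is unique
print(k_anonymity(released, ("age", "zip")))  # 2
```

A k of 1 means at least one person can be singled out by age and ZIP alone; raising k, by generalizing more aggressively, lowers that re-identification risk at the cost of less precise data.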
5. Blockchain & Distributed Consensus (OBJ 1.8)
In this lesson, we’re going to discuss the blockchain and the concept of distributed consensus. Most of us have heard of the blockchain by now, as it seems to have become the big buzzword in security over the last five years. When people hear the word blockchain, most of us immediately conjure up thoughts of Bitcoin and cryptocurrency. A cryptocurrency is a digital asset designed to function as a medium of exchange, essentially a digital version of money, and it is a modern application of cryptography that relies on the blockchain.
Cryptocurrency has become quite popular in recent years because it is decentralized and does not rely on any one government or economy for its strength. Cryptocurrencies rely on a peer-to-peer network to process transactions through a distributed ledger known as the blockchain. Ownership of each unit of a digital currency is established and transferred using public key cryptography: the owner signs each transaction with a private key, and anyone can verify that signature with the matching public key. It’s important to realize, though, that the blockchain is used for more than just cryptocurrencies. Before we explain how blockchain can be used in enterprises and other organizations, let’s take a quick look at how the blockchain works in cryptocurrencies. Let’s assume that I want to send some money to you.
First, I’m going to create my transaction, and it’s going to be packaged up, along with other pending transactions, into a block. Next, that block is sent out over the peer-to-peer network, and the other peers on the network approve the transaction. This is why we call cryptocurrencies decentralized currencies: each transaction is processed by other peers on the network, not by a single central authoritative server. The block that records the money I’m sending from me to you is then added to the blockchain, which is the ledger of every transaction that has ever been made in this particular cryptocurrency.
This new block is added to the blockchain, where it’s open for everyone to see, but its integrity is maintained because of this decentralized approval process. Once the transaction is added to the blockchain, the money that I sent is credited to your digital wallet and debited from mine. This is a huge oversimplification of the process, but those are the basic steps.
Each cryptocurrency uses its own method of calculating and adding blocks to the blockchain, based upon its architecture and algorithms. But as I said, the blockchain isn’t just for cryptocurrencies. The blockchain can be used to record any type of data we want, because it’s essentially just a specific type of database. Instead of saving everything to a centralized database, the blockchain stores data in blocks that are chained together in chronological order and are considered immutable. Immutable simply means that anything stored on the blockchain is irreversible: it’s going to exist there forever, and it’s open to inspection by anybody who has access to the blockchain. This creates a type of permanent ledger. Because no single user controls the ledger, everything is decentralized: all users collectively retain control, and nothing can be changed. Because of this immutability, blockchain technology has been implemented in many different enterprise environments where stored data must not change. For example, IBM Blockchain was created to add greater visibility and efficiency to supply chains for large logistics companies by keeping an immutable record of every shipment, its contents, its location, its destination, and more. Another common application is recording smart contracts between different organizations, again because the blockchain is immutable and irreversible.
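The chaining that makes a blockchain tamper-evident can be sketched in a few lines: each block stores the hash of the block before it, so altering any earlier block breaks every link that follows. This is a minimal sketch assuming SHA-256 over a JSON encoding of each block, with hypothetical shipment records as data; real blockchains layer Merkle trees, signatures, and consensus on top of this basic structure.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Link the new block to the hash of the current last block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis placeholder
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})


def is_valid(chain: list) -> bool:
    """Every block must reference the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


ledger: list = []
for shipment in ["pallet 17 left Rotterdam", "pallet 17 at customs", "pallet 17 delivered"]:
    append_block(ledger, shipment)

print(is_valid(ledger))               # True
ledger[1]["data"] = "pallet 17 lost"  # tamper with history
print(is_valid(ledger))               # False
```

On its own, the hash chain only makes tampering detectable. The practical immutability comes from every peer holding a copy and rejecting any chain that fails this check.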
It becomes a ledger of all the contracts that have ever been recorded. One of the strengths of the blockchain is that there is no centralized database to hack into or put at risk. Instead, the blockchain is protected against unauthorized intervention and fraudulent data manipulation through distributed consensus. The blockchain’s distributed architecture also removes the need for a trusted central authority by shifting that trust from a centralized server to a peer-to-peer group of participants. Distributed consensus is the process by which members of a group collectively reach agreement about a single data value without relying on a centralized authority.
This fault-tolerance mechanism is used to achieve the necessary agreement on a single data value, or a single state of the network, among distributed processes or multi-agent systems. Since the blockchain is built as a distributed system and doesn’t rely on a single centralized authority, the peers need to agree on the validity of all transactions by using a consensus algorithm. This agreement usually comes in the form of proof of work, proof of stake, or proof of authority, depending on the algorithm used by a particular blockchain application. When it comes to privacy, the blockchain can either support privacy or be at odds with it. To support privacy, the blockchain uses public and private key pairs to secure the data stored within it for each user and each node.
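Proof of work, the first of those consensus mechanisms, can be illustrated with a toy miner: a peer searches for a nonce that makes the block’s hash begin with a set number of zero hex digits. Finding that nonce is expensive, but every other peer can verify it with a single hash. The difficulty of four hex digits and the header string are arbitrary assumptions for this sketch; Bitcoin, for example, instead compares a double SHA-256 hash against a numeric target.

```python
import hashlib

DIFFICULTY = 4  # required leading zero hex digits (toy value)


def pow_hash(header: str, nonce: int) -> str:
    return hashlib.sha256(f"{header}|{nonce}".encode("utf-8")).hexdigest()


def mine(header: str) -> int:
    """Brute-force the smallest nonce whose hash meets the difficulty (costly)."""
    nonce = 0
    while not pow_hash(header, nonce).startswith("0" * DIFFICULTY):
        nonce += 1
    return nonce


def verify(header: str, nonce: int) -> bool:
    """Any peer can check the work with a single hash (cheap)."""
    return pow_hash(header, nonce).startswith("0" * DIFFICULTY)


# Hypothetical block header: previous block hash plus a transaction summary.
header = "prev_hash=9f2c|tx=alice->bob:5"
nonce = mine(header)
print(nonce, pow_hash(header, nonce))
print(verify(header, nonce))  # True
```

With four hex digits, the miner needs about 65,536 attempts on average, while verification takes one. Changing any transaction in the header changes every hash, so the old nonce will almost certainly stop meeting the target and the work must be redone, which is what makes rewriting history costly.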
But because the blockchain uses a distributed peer-to-peer architecture, it is at odds with the General Data Protection Regulation (GDPR), which was written with a traditional, centralized, controller-based data processing model in mind rather than a consensus-based blockchain. While some blockchain enthusiasts claim that public and private key encryption helps to preserve anonymity and privacy, this view of personal and private information does not really hold up under GDPR and similar regulations. This is because the public key, which is available to every peer in the blockchain, serves as a tokenized form of personal information about the owner of that block. That makes it pseudonymized rather than truly anonymized data, since it could be de-anonymized by collecting additional information over time.
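The de-anonymization risk described above can be sketched with a hypothetical public ledger. No names appear in it, only pseudonymous addresses, yet because each address stays the same across transactions, a single outside fact tying one address to a person exposes that person’s entire history. The addresses, amounts, and the “exchange record” linking an address to a name are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical public ledger, visible to every peer: (sender, receiver, amount).
ledger = [
    ("addr_7f3a", "addr_91bc", 0.50),
    ("addr_7f3a", "addr_c402", 1.25),
    ("addr_5d10", "addr_7f3a", 3.00),
    ("addr_91bc", "addr_c402", 0.10),
]

# Build each address's history; the pseudonym is stable, so activity links up.
history = defaultdict(list)
for sender, receiver, amount in ledger:
    history[sender].append(("sent", receiver, amount))
    history[receiver].append(("received", sender, amount))

# Additional information collected over time, e.g. a hypothetical exchange
# record linking one address to a named customer.
known_owner = {"addr_7f3a": "Pat Doe"}

for address, person in known_owner.items():
    print(person, "->", history[address])
```

One linked address reveals three transactions and three counterparties, which is why regulators treat such identifiers as personal data rather than anonymous data.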
6. Passwordless Authentication (OBJ 1.8)
In this lesson, we’re going to discuss passwordless authentication and an attack known as biometric impersonation. Passwordless authentication is an authentication mechanism in which a user can log into a computer system without entering, or having to remember, a password or other knowledge-based secret. There are many different types of passwordless authentication in use today, most relying on a public key cryptography infrastructure with a private key stored on the user’s device. In general, passwordless authentication factors fall into two types: ownership factors and inherence factors. Ownership factors include something that the user has in their possession, such as their smartphone, a one-time PIN token, a smart card, or another type of hardware token. Inherence factors include something the user is, usually a biometric identifier such as their fingerprints, a retinal scan, facial recognition, voice recognition, or other things like that.
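Many one-time PIN tokens and authenticator apps implement the time-based one-time password (TOTP) algorithm from RFC 6238, which builds on HOTP from RFC 4226. The sketch below derives a code from a shared secret and the current 30-second time step using only the standard library. It illustrates the general technique; it is not the proprietary algorithm of any specific vendor’s key fob, and the helper names are hypothetical.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP whose counter is Unix time // step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, code: str, at: Optional[float] = None,
                step: int = 30, drift: int = 1) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, t, step, len(code)), code)
        for t in (now + k * step for k in range(-drift, drift + 1))
        if t >= 0
    )


# RFC 4226/6238 test secret; a real token holds a random per-device secret.
secret = b"12345678901234567890"
print(totp(secret, at=59, digits=8))  # 94287082 (RFC 6238 test vector)
print(totp(secret))                   # six-digit code for the current window
```

The token and the server share the secret once, at enrollment. After that, nothing the user types is reusable for long, which is what lets the token serve as an ownership factor rather than a knowledge factor.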
While these are the most common replacements for a password, any type of authentication system that no longer relies on a password could be considered a passwordless authentication system. For example, some organizations have begun to implement a passwordless system in which the user enters their email address to log in. The system then sends a one-time-use login link to that email address that’s valid for only five minutes. If the user doesn’t click the link within those five minutes, it becomes invalid and can no longer be used. If they do click the link, it logs them in. Here, the user is not required to enter a password, only their email address, and access to that email inbox acts as an ownership factor. This makes it a passwordless authentication system.
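The emailed login link can be sketched as follows: the server issues a random, single-use token, stores only a hash of it along with an expiry time five minutes out, and accepts it at most once. The in-memory dictionary, the function names, and the `example.com` URL are hypothetical placeholders; a real deployment would use a persistent store and an actual email service.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_LIFETIME = 5 * 60  # seconds: the five-minute validity window

# Hypothetical in-memory store: SHA-256(token) -> (email, expiry timestamp).
# Storing only the hash means a leaked store doesn't reveal usable links.
_pending: Dict[str, Tuple[str, float]] = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_login_link(email: str, now: Optional[float] = None) -> str:
    """Create a single-use token and return the link that would be emailed."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _pending[_digest(token)] = (email, now + TOKEN_LIFETIME)
    return f"https://example.com/login?token={token}"  # placeholder URL


def redeem(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the email to log in, or None if the token is unknown, used, or expired."""
    now = time.time() if now is None else now
    entry = _pending.pop(_digest(token), None)  # pop: a token works at most once
    if entry is None:
        return None
    email, expires_at = entry
    return email if now <= expires_at else None


t0 = time.time()
token = issue_login_link("user@example.com", now=t0).split("token=")[1]
print(redeem(token, now=t0 + 60))  # user@example.com
print(redeem(token, now=t0 + 61))  # None (already used)

late = issue_login_link("user@example.com", now=t0).split("token=")[1]
print(redeem(late, now=t0 + 301))  # None (expired)
```

The `now` parameter is there only so the expiry logic can be exercised without waiting five minutes; in normal use, both functions read the current time.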
Passwordless authentication systems have many benefits, such as greater security, a better user experience, reduced IT costs, better visibility into who’s using a specific credential, and scalability. In general, passwordless authentication systems are more secure than password-based systems because you’re removing the knowledge factor, which can easily be stolen or guessed by an attacker. Stolen and weak passwords are consistently reported as a top attack vector for hackers, and they’ve been responsible for a tremendous number of security breaches over the past few years. Passwordless authentication systems also provide a better user experience because end users don’t have to remember complicated passwords or change them at frequent intervals.
These systems can also reduce IT costs because there’s no longer a need for password storage or management, along with the back-end auditing involved in validating the security of your password management program. Passwordless authentication also allows for better visibility into credential use, because the user’s credentials are more tightly bound to a specific device or to an inherent attribute that stays with the user at all times, like their fingerprints. Finally, passwordless authentication leads to better scalability, because users can manage multiple logins without having to remember additional passwords, since a single token can be used across multiple systems, platforms, or applications.
Unfortunately, there are a few downsides to relying on passwordless authentication, such as higher implementation costs, the additional training and expertise needed, and the fact that it can create a single point of failure, since all of your systems rely on a single passwordless authentication factor, like a token or a similar device. For example, if you’re relying on a biometric factor for your passwordless authentication system, you now have to worry about biometric impersonation, as this could render all of your security useless. Biometric impersonation is the act of pretending to be another user to bypass a biometric-based passwordless authentication system.
For example, if you’ve ever watched a spy movie, you’ve probably seen a spy collect an authorized user’s fingerprints from a soda can or a glass, and then try to recreate that user’s unique fingerprint, with all of its ridges and valleys, in order to bypass a fingerprint scanner and steal whatever secrets lie beyond the locked door. This is the Hollywood version of biometric impersonation, but it’s essentially what we’re talking about here. If you’re going to rely on a passwordless authentication system with only a single factor, such as a fingerprint reader, a facial scanner, or a retinal scanner, you need to ensure that the system has a very low false acceptance rate (the rate at which it wrongly accepts an unauthorized user) in order to prevent biometric impersonation from succeeding. Overall, though, fingerprint and facial recognition are considered fairly secure. For example, if you compare the iPhone’s Touch ID and Face ID systems to a standard PIN for authentication, you’ll see that they are decently secure.
A standard four-digit PIN has 10,000 possible combinations, so a single random guess has a one in 10,000 chance of being correct. But if you’re using Touch ID with your fingerprint to unlock your device, the chance that somebody else’s fingerprint will unlock it drops to one in 50,000, making it five times more secure. If you’re using Face ID with its facial recognition scanning, the chance of somebody else’s face unlocking your device drops to one in 1,000,000. But for the most secure implementation of a passwordless authentication system, you should still combine two or more factors to create a multifactor authentication (MFA) scheme. For example, if you combine facial recognition with a one-time PIN from an RSA key fob, the overall security of your systems increases tremendously, because MFA is much more secure than any single factor.
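The comparison in this paragraph is simple arithmetic, and it is worth checking. The sketch below uses the single-attempt figures quoted above (1 in 10,000 for a four-digit PIN, 1 in 50,000 for Touch ID, 1 in 1,000,000 for Face ID). The multifactor line is an assumption-laden estimate: it treats the two factors as independent and assumes a six-digit one-time PIN guessed at random, so it is a simplification rather than a measured figure.

```python
from fractions import Fraction

pin4 = Fraction(1, 10 ** 4)       # one random guess at a 4-digit PIN
touch_id = Fraction(1, 50_000)    # random fingerprint match (figure quoted above)
face_id = Fraction(1, 1_000_000)  # random face match (figure quoted above)
otp6 = Fraction(1, 10 ** 6)       # assumed: random guess at a 6-digit one-time PIN

print(pin4 / touch_id)  # 5: Touch ID is 5x harder to beat by chance
print(pin4 / face_id)   # 100: Face ID is 100x harder

# Assuming independent factors, an attacker must defeat both at once,
# so the probabilities multiply.
print(face_id * otp6)   # 1/1000000000000
```

Using `Fraction` keeps the ratios exact, which makes it easy to see where the "five times more secure" figure comes from and why stacking factors shrinks the odds so dramatically.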