93. Around the Corner OpenStack
The final topic in this particular section is OpenStack. OpenStack was originally developed by NASA and Rackspace, which you can see here, in 2010. Why did they create it? Because they wanted open-source software to build public and private clouds at scale. So that was the reason to develop OpenStack. Okay, now what is good about OpenStack is that it is open-source software, so anyone can contribute to it. Another point to note is that OpenStack can be managed via a web-based dashboard or the OpenStack API. Okay? And finally, the third point is that an OpenStack implementation consists of the deployment of multiple services, which are developed as individual development projects. So that is the same thing that I told you: companies can build their own form of OpenStack. It is also a scalable type of cloud, for both public and private use. So we have some terms related to OpenStack. Now, when we studied AWS, I told you that whatever cloud you study, no worries.
On all those clouds, you will find services related to network, storage, compute, database, security, and other things. But these one, two, three, four, five services will be there; in all the clouds, you have at least these five. So, we can see Keystone here, providing the identity service. The identity service falls under security. Nova provides compute. Glance coordinates the images that are used to boot virtual and physical instances. Then we have Neutron, which is actually very popular. Neutron controls networking resources to support the cloud. So Neutron is networking. Then we have Swift, which offers data storage.
So this falls under storage. Then we have Cinder. This also belongs to storage. Then we have Heat, which is used to manage the entire orchestration of infrastructure and applications. So this is something related to orchestration. Then we have Trove, which provides a scalable and reliable database service. So you can see here that everything falls under these categories: network, security, compute, storage, management, database, even monitoring. So all these services are taken one by one. If you compare, you'll find that these are the names used for the services inside OpenStack. Then we have Sahara, which provisions data-intensive application clusters. Then there's Zaqar, which provides multi-tenant cloud messaging, and Barbican, which is built for secure storage.
Designate provides DNS. In AWS, we have something called Route 53 that provides DNS; here we have Designate. Then we have Manila, which provides shared file systems, and Magnum, which orchestrates Linux containers. So we have a service related to containers at our disposal. Then Congress offers governance and compliance services. Finally, we have Mistral for workflow. So if you learn the entire cloud paradigm, the entire cloud infrastructure, and the different vendors, say OpenStack or Amazon or Cisco or some other vendor, they are all doing the same thing, but to achieve that same target, each has its own services. So the use of Designate inside OpenStack will be much the same as the use of, say, Route 53 in Amazon. Similarly, you can compare the various services offered by various service providers. Okay, so let us close here.
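To make the comparison idea concrete, here is a minimal sketch in Python that groups the OpenStack services named above by the kind of job they do. The category labels (for example, putting Glance under compute and Designate under network) are an illustrative assumption for study purposes, not an official OpenStack grouping, and the only cross-cloud pairing filled in is the one mentioned in the lecture.

```python
# Service names come from the walkthrough above. The category labels are an
# illustrative grouping (an assumption), not an official OpenStack taxonomy.
OPENSTACK_SERVICES = {
    "Keystone": "security",
    "Nova": "compute",
    "Glance": "compute",
    "Neutron": "network",
    "Swift": "storage",
    "Cinder": "storage",
    "Heat": "orchestration",
    "Trove": "database",
    "Designate": "network",
}

# Counterparts in another cloud; only the pairing named in the lecture is listed.
AWS_COUNTERPART = {"Designate": "Route 53"}


def services_in(category):
    """Return the service names that fall under one category, sorted by name."""
    return sorted(name for name, cat in OPENSTACK_SERVICES.items() if cat == category)


def describe(name):
    """Build a one-line summary, adding the AWS counterpart when one is known."""
    line = f"{name}: {OPENSTACK_SERVICES[name]}"
    if name in AWS_COUNTERPART:
        line += f" (compare with AWS {AWS_COUNTERPART[name]})"
    return line


storage_services = services_in("storage")
designate_summary = describe("Designate")
```

You could extend the second dictionary yourself as you study other clouds; the point is only that every vendor fills the same categories with its own service names.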
94. 1.7 Software upgrade – non-disruptive
So far, you have seen that most of the sessions are long because I wanted to cover each and everything related to the topic. This section is going to be shorter than the previous sections because here we have to understand the software upgrade and its impacts. This section is divided into three parts: disruptive and non-disruptive upgrade, EPLD, and patches. So we are going to learn this in this particular video and in the upcoming videos: what do we mean by software upgrade and hardware upgrade? You can see that EPLD appears as part B and the patches as part C. But anyway, I'm going to follow the sequence 1.7 A, then C, and then B. Okay? So let's start with disruptive and non-disruptive upgrades. Let's begin with the best practices for software.
Before we begin, you should be aware that the Nexus operating system is designed in a modular manner. So what do we mean by "modular approach"? A modular approach means that whenever you want to use a feature, for example a protocol such as OSPF, you have to go inside the box and enable that feature. Since we are enabling the feature explicitly, that means features are fitted inside the operating system in such a way that if one is not required, it is simply not running and you don't need to do anything. Now, in the diagram, you can see, and let me highlight as well, what the overall structure is. This is the software architecture of the Nexus operating system. So obviously, you have the kernel, you have the drivers, and you have the chassis manager. We are going to discuss chassis upgrades and line card hardware upgrades as well in the EPLD section. On top, you can see that you have protocols: L2 protocols, L3 protocols, storage protocols, and we also have other services. So what's the key here?
This segment is crucial: this is the manager and PSS layer. Overall, this is a broad topic, and if you work in TAC at Cisco, you'll learn the big picture about what MTS, PSS, the managers, and so on are. When you have to do some sort of hardware-level troubleshooting, or deep troubleshooting with respect to software, you have to understand all these transactions, messages, et cetera. For now we'll concentrate primarily on the topic agenda. The agenda tells us to describe the disruptive and non-disruptive upgrades. Okay? So before going there, I have two or three slides just to get our fundamentals right, so that we can understand those concepts correctly. It's as if we're building the framework, and within that framework, we'll learn disruptive and non-disruptive upgrades.
So let's try to understand this, okay? Assume you are using the OSPF feature; what will happen if your OSPF crashes? OSPF is one of the processes in the Nexus operating system. It works not as a patch, but as a modular protocol. If you look here, you can see that we have the L3 protocols, and inside them OSPF, correct? So you have features or packages; you have software inside software. OSPF is a feature, a package contained within the larger operating system, NX-OS. What would happen if it crashed or if something happened to OSPF? That is one of the concerns you may have: what if your protocols suddenly stop working? Obviously, we'll go and raise the issue and try to figure out why it's happening, whether it's due to memory, a hardware problem, a traffic problem, a bug, or something else. Now suppose you have to upgrade the image for any reason: maybe software end of life, maybe a bug, et cetera. What will we do, given that this is a critical environment? This is a data center. You don't get any downtime.
That means if your core switches are down, or rebooting, or have any issue, your entire production, your revenue, will go down, correct? So the question is how you will go about doing the software upgrade, and we will learn about the features that Cisco offers in terms of disruptive and non-disruptive upgrades. One key thing we should understand is the persistent storage service, PSS. Actually, this is nothing but a database. That's why it has "storage" in the name. Let me go here; I just want to make sure that whenever we are talking about PSS, you think of it as a database or data store. Now, what is the use of a database? When you are doing any transaction, some ID gets generated, and inside the database you are tracking that ID. Correct? That is the purpose of PSS, the persistent storage service: to act as a checkpoint. Suppose you're reading a 100-page book, and after the 37th page you put in some sort of marker; that is nothing but a type of checkpoint. Or you're using a green or red marker on some pages, correct?
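The bookmark analogy can be sketched in a few lines of Python. This is only a toy model of the checkpoint idea, assuming a simple in-memory dictionary as the "database"; the class and function names are hypothetical and say nothing about how PSS is actually implemented inside NX-OS.

```python
class CheckpointStore:
    """Toy stand-in for a persistent store: keeps the last saved state per process."""

    def __init__(self):
        self._saved = {}

    def save(self, process, state):
        # Store a copy so later changes in the process do not alter the checkpoint.
        self._saved[process] = dict(state)

    def restore(self, process):
        # Return the last checkpoint, or an empty state if nothing was saved.
        return dict(self._saved.get(process, {}))


def read_pages(store, reader, first_page, last_page):
    """Read a range of pages, leaving a bookmark after each page."""
    for page in range(first_page, last_page + 1):
        store.save(reader, {"last_page": page})


store = CheckpointStore()
read_pages(store, "reader", 1, 37)   # reading stops after page 37

# After an interruption, the reader resumes from the bookmark, not from page 1.
resume_from = store.restore("reader")["last_page"] + 1
```

The same idea applies to a restarted process: it reads its last checkpoint and carries on from there rather than starting from nothing.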
Okay, so we have understood at this point that PSS is nothing but a database. What's the use of it when you're doing the upgrade? Your protocols are running; your services are running. There are two things here, and I'm talking about them a lot in my SD-WAN classes right now as well. So remember, you have two things: a control plane and a data plane. The actual movement of data happens in the data plane, on the line card modules, et cetera. And you have your sup engines, the supervisors, which work as the control plane. So what will happen is that if you're doing a non-disruptive upgrade with the help of the persistent storage service, PSS, the database, your data plane will keep forwarding the data. You obviously have two sups, sup one and sup two. Again, we are going to discuss this more, but your data plane will not have any issues. Data keeps flowing in the data plane while the control plane is doing the upgrade, since you have two control planes.
So, first of all, the load goes to one control plane. One sup functions as the active, while the other functions as the standby. The standby is upgraded first; then the active becomes the standby and the standby becomes the active, and the upgrade continues until both are done. And since there is no problem in the data plane, that's why it's non-disruptive. So to perform the non-disruptive type of upgrade, obviously you need two sups. In the diagram you can see that you have one active and one standby, and they will swap their roles, correct? That's why you have the stateful switchover. This is the technique by which whatever state the transactions or the data plane are in is preserved while the role of the active supervisor is passed to the standby supervisor. Correct. Again, when your active becomes standby, that state gets transferred; that is called a stateful switchover. And for that, in terms of protocols, you have nonstop forwarding; you can go and enable that. You have your PSS, which is a database; put simply, with PSS assistance, some checkpointing and bookmarking is taking place.
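Here is a small Python sketch of the role swap just described: upgrade the standby, hand the state over and swap roles, then upgrade the old active. It is a simplified model under stated assumptions; the Supervisor class, the version strings, and the state dictionary are hypothetical illustrations, not NX-OS data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Supervisor:
    name: str
    role: str                       # "active" or "standby"
    version: str
    state: dict = field(default_factory=dict)


def stateful_switchover(active, standby):
    """Hand the control-plane state to the standby, then swap the two roles."""
    standby.state = dict(active.state)
    active.role, standby.role = "standby", "active"
    return standby, active          # (new active, new standby)


def upgrade_pair(active, standby, new_version):
    """Upgrade the standby first, switch over, then upgrade the old active."""
    standby.version = new_version
    active, standby = stateful_switchover(active, standby)
    standby.version = new_version   # the old active is now the standby
    return active, standby


sup1 = Supervisor("sup1", "active", "old", {"ospf_neighbors": 3})
sup2 = Supervisor("sup2", "standby", "old")
active, standby = upgrade_pair(sup1, sup2, "new")
```

At no point in this sequence are both supervisors out of service together, which is the reason the data plane can keep forwarding.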
Correct. Now, what are the triggers for this? One trigger is ISSU, the software upgrade that you are doing. Another is a manual switchover. Assume you have two sups: this one is active, this one is standby. There are CLI commands if you want to manually make the standby active, correct? Or there is the HA policy-initiated switchover: imagine that your services are crashing; a service crashes three times, and then this stateful switchover occurs. Now, since we have two supervisor engines, it is non-disruptive; there is no problem in the data plane. Correct. Again, we're talking about the same thing here, and it's the same discussion. Assume your BGP or OSPF crashes; you have the marker, you have your PSS doing the marking, but you also have a high availability manager. Correct. With the help of the HA manager (high availability manager), the switchover will happen, and there will be no problem because, behind the scenes, PSS is doing the marking. During a stateful switchover, the PSS state is exchanged. Okay, so now we are talking about ISSU. ISSU, the in-service software upgrade, provides the capability to perform transparent software upgrades on platforms with redundant supervisors. So this is the key to whatever we have talked about in the last ten to twelve minutes. You can highlight this line, and for interviews also, it's an important thing. So, if that's what you're doing, you should have a standby supervisor if you're using the Nexus platform: Nexus 7K, 5K, and so on. Let's focus on the Nexus 7K, the modular box with two supervisor engines, for now. In that case, you know that you have two different types of images.
You have one system image, and you have one kickstart image. Correct. You can see system and kickstart here on the Nexus 7K. Even though you have two supervisor engines, when you are doing the upgrade you give one CLI command: install all kickstart, then bootflash: and your kickstart image name, then system, then bootflash: and your system image name. Correct. So the whole upgrade is taken care of. So far, we've discussed the non-disruptive upgrade; before moving on to the disruptive upgrade, let's review what we've learned. We are doing an in-service software upgrade, and we need dual supervisor engines to do the failover, the stateful switchover. There are the three triggers, and assuming that we are doing an ISSU upgrade, that means we have initiated the ISSU. So what are the steps, from step one to seven? First of all, upgrade the BIOS on the active sup, the standby sup, and the line cards. So first of all, your firmware will be upgraded.
Step number two: bring up the standby sup with the new image. Okay, so it is upgrading the standby sup. The third step is to switch over from the active to the standby. Obviously, if you have an active and a standby, first of all the standby gets upgraded, then the switchover happens, your active becomes the standby, and the standby becomes the active, correct? Step four: bring up the old active with the new image. Now that it has become the standby, it is upgraded as well. Correct. So at the end of point number four, you can see that all your supervisors, both of them, have been upgraded. Step five: upgrade the CMP. CMP is nothing but the connectivity management processor. So you are upgrading that CMP. Step six: the hitless line card upgrade is done, one line card at a time. Obviously, there are some variations in the way the Nexus 7K and 9K upgrade.
But here's the bottom line, whether it is the 7K type of architecture or the 9K type of modular architecture. In both places, first your supervisors get upgraded; in other words, your control plane gets upgraded. Then your data plane is upgraded. The data plane is your line cards. And then finally, step seven, your upgrade is done. This is the flow from point number one to seven, and all that we have discussed so far is related to the non-disruptive upgrade. Correct? Assume you're upgrading from one version 6 release to a later version 6 release. In that case, you can go and do an ISSU non-disruptive upgrade. And obviously you should follow the vPC best practices as well. So let's just stop here. In the following section, we'll go over the disruptive upgrade: in the event that you have only one supervisor engine, what will happen?
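As a revision aid, the seven-step ISSU flow from this section can be written out as an ordered plan in Python. This is just a sketch of the sequence as described in the lecture; the step wording and the slot numbers are illustrative, and the function is hypothetical rather than anything the switch exposes.

```python
def issu_plan(line_card_slots):
    """Return the ISSU steps in order, with line cards upgraded one at a time."""
    steps = [
        "1. upgrade BIOS on active sup, standby sup, and line cards",
        "2. bring up standby sup with the new image",
        "3. switch over from active to standby",
        "4. bring up old active with the new image",
        "5. upgrade CMP",
    ]
    # Step 6 repeats per line card: the data plane is touched only after
    # both supervisors (the control plane) are already on the new image.
    steps += [f"6. hitless upgrade of line card in slot {slot}" for slot in line_card_slots]
    steps.append("7. upgrade complete")
    return steps


plan = issu_plan([1, 2, 3])   # slot numbers are made up for the example
```

Printing the plan line by line gives a quick checklist that shows the control plane finishing before any line card starts.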
95. Disruptive & SMU or Patches
In this section, we are going to discuss the disruptive upgrade and the patch upgrades. So let's get this straight: so far, we have discussed the non-disruptive upgrade, where we have two sups. Now one question you may ask is, "Okay, what about the readiness?" Do we have any CLI command that will tell us the readiness of the two sups? Yes, the CLI command is show system redundancy status. This is one of the CLI commands that will tell you the readiness of your sup engines and whether they are ready for ISSU or not. Correct? That's one very important command. Even if you have a problem and open a TAC case, they will first run that command. Obviously, you should be familiar with the image, your sup versions, and so on. But this is one of the commands that will give you all that information prior to ISSU. So let's talk about the disruptive upgrade. Consider the case where you only have one supervisor engine, which is true for switches with only one sup. In that case, if you go and do ISSU, you'll find that there will be a minimum of 120 seconds of lag, a communication disruption between your control plane, that's your sup, and the data plane, that's your line cards. If we use enhanced ISSU, or LXC ISSU, which is a Linux container-based ISSU, we can reduce that disruption. Okay, so what does it mean?
Now, when we are doing the upgrade with physical supervisor engines, where you have sup one and sup two, obviously one is active and one is standby. They are taking roles, correct? The standby becomes active, the active becomes standby, they do a stateful switchover, et cetera. That is, they settle the roles and then perform the upgrade. But suppose we only have one sup engine, one physical supervisor; then how will you upgrade it, and which one will become the secondary? So, what happens in this enhanced ISSU, or LXC-based ISSU, when you only have one supervisor engine? One virtual instance of the supervisor is created; this is a virtualization technique that drastically reduces the disruption between the control plane and the data plane, and this time it will be less than 6 seconds. So from 120 seconds, it is reduced to 6 seconds or less. Obviously, when we're doing virtualization, the virtual instance needs resources that are not physically dedicated to it, so some sort of virtual capability is required on the box. Now, to do this, we have to go and set the boot mode to LXC, with boot mode lxc, and then save the configuration. So that's one very important thing.
Again, when we talk about the Nexus 9K platform, it is not like the Nexus 7K platform. When we talk about upgrading the Nexus 9K, it is done in batches, or rather in groups; that's the correct word. So the upgrade is done in groups. You can see here in the list that groups one to six get upgraded in order. First, the first half of the line cards. Then the first half of the fabric modules. Then the second half of the line cards. Then the second half of the fabric modules. Then the first system controller; SC here means system controller. Then the second system controller. Correct. So it is doing a sort of parallel upgrade process. The 9K supports not only virtualization, but also parallel processing. And that's the innovation. We know that Cisco is improving its products all the time, and whatever is the best and latest in the market, they use inside their products as well. Great. So the next thing that we have to learn in this session is the patches. So let's try to understand the patches.
Now, say we have decided that we have to upgrade for some reason: a software crash, some limitation, some feature, a bug, et cetera. Perhaps Cisco TAC has suggested it. But what if you need to upgrade and you don't have the downtime? The question here is that you want to do an upgrade, and it's important, but your data center devices are hitting just one bug, or a few bugs, and maybe not a critical one. Correct? Even if it is a critical bug, the same use case applies; maybe it is a critical bug that must be corrected. But you have very little downtime, even though the bug is affecting you and you are aware of its severity. Correct. It's very difficult for you as a data center engineer to get that downtime, because once you tell your manager, "I want to upgrade, I need ten minutes of downtime," then the war room gets started, everyone comes on the call, and everyone has to explain why it is happening, why it was not happening before, et cetera, et cetera. So many things. So rather than doing the full upgrade, we can go with an SMU and do some patching. SMU is nothing but patching. So what does it mean? SMU is a software maintenance upgrade. Suppose you have a critical bug that is affecting your platform; there may be an SMU for it, and TAC will suggest that you not do a full upgrade. We have the patches, the software maintenance upgrades.
So as per the running image plus the bug that hit this image, this is the SMU that they will provide you. In that case, it's another hitless style of upgrade. You can apply the SMU, and again, we'll see the lifecycle and the usability, but we go for a small patch rather than a complete operating system upgrade. Okay? So first of all, let's see the SMU, or patching, lifecycle. We go to the Cisco website, select our operating system, and then select the critical bug. According to that, we select the SMU, the patch. Now, we copy that patch to our device and add it with the install add command. Once it is added on the device, it will check the memory and processes inside the device, and then we can go and activate it with install activate. Once you activate it, that means it's ready to go. Finally, once committed, the change is persistent. Now the patch is applied on top of the operating system, so you have OS plus SMU. Okay, so now you have your patch in place, which will fix the critical bug. And to finish, you must run install commit.
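The SMU lifecycle can be pictured as a small state machine. The Python sketch below uses the install commands named in the lecture (install add, install activate, install commit, and also install deactivate and install remove for taking a patch back out), but the state names are my own labels for illustration, an assumption rather than what the switch prints.

```python
# (current state, command) -> next state. State names are illustrative labels.
TRANSITIONS = {
    ("absent", "install add"): "inactive",
    ("inactive", "install activate"): "active, uncommitted",
    ("active, uncommitted", "install commit"): "active, committed",
    ("active, committed", "install deactivate"): "inactive, uncommitted",
    ("inactive, uncommitted", "install commit"): "inactive",
    ("inactive", "install remove"): "absent",
}


def run(commands, state="absent"):
    """Apply the commands in order and return the final patch state."""
    for command in commands:
        key = (state, command)
        if key not in TRANSITIONS:
            raise ValueError(f"{command!r} is not valid in state {state!r}")
        state = TRANSITIONS[key]
    return state


apply_patch = ["install add", "install activate", "install commit"]
back_out = ["install deactivate", "install commit", "install remove"]

after_apply = run(apply_patch)
after_back_out = run(back_out, state=after_apply)
```

Notice that a commit follows both the activate and the deactivate; trying to activate a patch that was never added raises an error in this model.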
Now, if for any reason, some time in the future, you want to deactivate it, you can use the install deactivate command so that the SMU gets deactivated, but you still have to do the commit. So commit after activate, commit after deactivate, and finally, if you want to throw it in the garbage, you can do install remove. There are some show commands as well: show install active, committed, inactive, packages, and so on, just to verify the state. Now, let's quickly discuss the different types of SMU. There are others, but the two most important are the restart SMU and the ISSU SMU. The ISSU SMU is the in-service software upgrade type. That means the patch is applied similarly to an ISSU: with dual sups, it goes through the ISSU process.
So with dual sups you have to follow the ISSU process. If it's a single sup, you reload it. Correct. Then we have the restart SMU as well. This one restarts the process, and it restarts it in all the VDCs as well. The process is restarted in all VDCs and we're back up and running; however, patching on the Nexus is not available per VDC. So suppose you have four VDCs; you have to apply it on the admin or default VDC, correct? And then it applies to all the VDCs. Correct. Now, there are obviously multiple patches (one, two, three, four, whatever), and you don't need to apply all of them. But what is very important is this: suppose you have to use p2. If p2 is dependent on p1, you must first install p1 and then p2. Correct? Some SMUs have only a single package; others may have multiple packages, resulting in dependencies. The point is that you should understand SMU, and if required, obviously, Cisco TAC is there. You can raise a case with Cisco TAC, and they can work as a proactive backup for you. Great. Finally, SMU is TAC-supported. That's why I told you that TAC support is there, and you should utilize it. Whenever we are upgrading in the data center, you should have one proactive case open with TAC. Then, SMUs are synced to the standby supervisor, so if a supervisor is replaced, the patches will be synchronized. That's a normal thing. And finally, an SMU is not a feature.
Suppose you want some new feature in your data center. For that, you have to check with your Cisco account manager. An SMU is not something that adds a feature to the operating system. For feature-specific things, you can check with your Cisco account manager, and they will provide you with the valid path. According to that, you can get another image with the new features. Great. So we have completed two parts so far. In the previous section, we completed 1.7 A, disruptive and non-disruptive upgrades. Now we have completed C, which is SMU, or patches. In the following section, we'll learn about EPLD, which is also very interesting and has to do with hardware upgrades.
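One more note on the patch dependency point from this section (if p2 depends on p1, install p1 first). Working out a safe install order is a topological sort, and Python's standard library can do it. This is a minimal sketch; the patch names p1 and p2 are the placeholders from the lecture, and the dependency map is something you would fill in from the release notes.

```python
from graphlib import TopologicalSorter   # standard library, Python 3.9+


def install_order(depends_on):
    """Return an order in which every patch comes after the patches it needs.

    depends_on maps each patch name to the set of patches it depends on.
    """
    return list(TopologicalSorter(depends_on).static_order())


# p2 depends on p1, so p1 has to be installed first.
order = install_order({"p2": {"p1"}})
```

If the dependency map ever contains a loop, the sorter raises an error instead of returning an order, which is a useful sanity check before a maintenance window.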
96. Hardware upgrade – EPLD
In this subsection, we are going to discuss EPLD. This is important because so far we have discussed the software upgrade. What about hardware upgrades? And first of all, what is this EPLD? You can see the full form here: electronic programmable logic device. For our purposes, you can think of EPLD as the hardware, ASIC-level side of the box. So what about the ASIC upgrade, the hardware-level upgrade? Now, when we are doing a hardware upgrade, intuitively that means we need downtime. So be mindful that when you are doing this upgrade, you need a downtime of at least 30 minutes. And then, when you are ready to do this upgrade, you can go and run this command: install module 1 epld, then bootflash: and the image name, correct? It will ask whether you want to continue.
Obviously, you want to install it. So you press "yes" to continue, and then your installation will start. So, what are the main points about EPLD? It is nothing but your ASIC side. Now, when we are thinking about an ASIC upgrade, we want a new type of hardware placed inside the data center that can support new features as well. Again, this will necessitate extensive planning because hardware upgrades are always expensive. It's not a low-cost upgrade, and you should have multiple emails from your managers, directors, and so on confirming their approval for it. Once you have the approval, it has its own lifecycle. The requirement comes first, then the cost estimation, then the approval, then the hardware arrives at your location, then you take the downtime, and then you do the upgrade, correct? So everything is linked. Continuing with the hardware upgrade, there are multiple types of hardware upgrade.
Say you want to do the line card upgrade. Let's say you have an M1 line card and an F1 line card, which means you have a line card mismatch. What can you do at that point? We have OIR, online insertion and removal, for line cards. Yes, you have the option of a hot swap, that is, a hard removal. You also have the soft removal option. What does that mean? Simply pulling out the module, the line card, is the hard type of reset or removal. "Soft" means you run some software command and shut down that module or that interface with the CLI first. That is the gentle type of removal. Okay, so considering this, what's important here is that hard OIR is supported for the line card, but it's not recommended. What Cisco is telling you is that if you have to remove and upgrade a line card, first of all use the out-of-service module command to take it out of service, then remove it, put the new line card in the slot, and then start the upgrade.
Now there is one hidden command. If you are in mixed mode with vPC, the hidden command is under the vPC domain: the bypass module check. Let me highlight this. You can see this bypass module check, which means that if you have a mix of hardware, of line cards, the vPC domain bypasses the module check and accepts the mix. However, once your removal and migration are complete, remove this command. It is not recommended for normal operation, and obviously you should take Cisco TAC's suggestion: tell them you have this situation and need to perform this type of upgrade, and kindly ask them to verify this internal command.
Okay, great. So, what kinds of hardware maintenance will you encounter? One is the supervisor hardware upgrade, correct? We talked about it a little bit in the previous section when we were doing ISSU. That is a software upgrade, so you are not changing the physical thing, the physical entity. But suppose you have an old supervisor and you want to change to a new supervisor; how can you do it? That comes under major hardware upgrade or maintenance, and you have to physically remove the old one and insert the new one. You should follow the proper lifecycle because you are changing the control plane. So what types of hardware upgrade come into the picture? First, the supervisor hardware upgrade. Correct? Once again, you may have two supervisors or one. We discussed under ISSU what happens if you have one supervisor engine and how you can improve ISSU in that case, so you can see that these topics are linked. You are doing a hardware upgrade now, and in the future, after three to four months, you may have to do a software upgrade as well. You can use EPLD to upgrade your firmware and hardware logic, and you can use ISSU or other methods to upgrade the software.
What other use cases do we have? We can do the complete chassis hardware upgrade, correct? Then we can upgrade the fabric module hardware. Again, these upgrades come with their own use cases and their own recommendations. If you have a major problem, if you have a lot of load on your data center devices and you want to increase the fabric modules, port density, module density, and so on, those are the use cases, and then you have to do the hardware upgrade, correct? So in the case of a fabric module hardware upgrade, you should check how much your fabric utilization is. Because once you remove the redundant fabric module, say you have two fabric modules and you're taking one out, then obviously the load on the remaining fabric module will increase. And while doing the upgrade, you don't want it to become disruptive, correct?
So before doing the hardware upgrade, you should check, you must check, what your overall utilization is in the case of the fabric module upgrade. The same thing is true with the power supply hardware upgrade. Suppose you have four or five slots for power modules. If you are removing one power supply module, what will be the overall load on the rest of the power modules? In this case, we can check with show environment power, and then we can estimate. It is recommended that you run the power supplies in combined mode during this work, because you are removing a few power modules and then updating the hardware. Also plan which commands will be supported during the migration, and which commands you want after the migration for optimization. Correct? So each and every detail should be written down and taken care of before the actual upgrade date. All right, great. So we've finished Section 1.7.
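The two pre-checks from this section, fabric utilization and power headroom, are simple arithmetic, and a short Python sketch makes the reasoning explicit. All the numbers below are made-up examples, and the model assumes load spreads evenly across fabric modules and that power supplies pool their capacity (as in combined mode); real values should come from the switch itself.

```python
def fabric_utilization_after_removal(utilization, modules):
    """Estimate per-module utilization once one fabric module is pulled.

    Assumes the same traffic spreads evenly over the remaining modules.
    """
    if modules < 2:
        raise ValueError("need at least two modules to remove one")
    return utilization * modules / (modules - 1)


def power_covers_load(unit_capacity_w, units, load_w):
    """True if the remaining supplies, pooled together, still cover the load."""
    remaining_capacity = (units - 1) * unit_capacity_w
    return remaining_capacity >= load_w


# Hypothetical figures: two fabric modules at 40% each, four 3000 W supplies.
fabric_after = fabric_utilization_after_removal(0.40, 2)
safe_at_8000w = power_covers_load(3000, 4, 8000)
safe_at_9500w = power_covers_load(3000, 4, 9500)
```

With two fabric modules at 40 percent, pulling one roughly doubles the load on the other, so anything above 50 percent before the change is a warning sign in this simple model.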
97. Section 1.8 Starts…
Now we reach Subsection 1.8, where we have to understand and implement network configuration management. When we are talking about configuration management implementation, that means we have to know different things: how we are going to manage logins, like out-of-band management, and different management protocols as well, like syslog messages and SNMP configuration. Because here we have to deal with the implementation. So not only should you know the theory of syslog and SNMP, but you should also know the implementation steps, correct? Then there's the backup. What I have done here is take a few videos from one of my courses, the CCNP DC ACI; the paper code is 300-620. I want to add those videos, which are already there in that course, here. So I'm going to upload them in this section, and after this recording you will find eight videos related to configuration management. Okay? You'll find ACI management, SNMP, syslog-related theory, and then the lab and backup as well. So please complete these eight videos, and then we'll move to the next section. Section 1.8 will cover this.