8. Service Router Active/Standby Availability
In this video, we’ll take a look at the high-availability configuration of our edge cluster. As you can see here, I’m logged into the NSX user interface, and again, I’m using the free hands-on labs available at hol.vmware.com. So let’s start by browsing to “System” and “Fabric.” We’re going to click on Nodes, and we’re going to take a look at our edge transport nodes. In this lab environment, we’ve got four edge nodes, and we’ve got a single edge cluster, edge cluster one. Contained within edge cluster one are edge nodes one and two. This is all pre-built into the hands-on lab environment. And if we examine the clusters themselves over here on the far right, we can see a little link for the edge transport nodes, and we can view which edge nodes are part of which cluster. So now that we have a good idea of what kind of edge nodes we’ve got set up, let’s go back to the Networking tab and browse over to our Tier 0 gateway. I’m going to click on the ellipsis to edit this Tier 0 gateway. And within the configuration of this NSX Tier 0 gateway, let’s click on Interfaces. You can see right now that we currently have one interface listed under External and Service Interfaces.
If I click on the link, this is uplink1, and it exists on NSX Edge 1. So what would happen if we experienced a failure on NSX Edge 1? Well, our only uplink would go down, and therefore we would lose north-south connectivity through this Tier 0 gateway. So I want to add a second interface on a different edge node. I’m going to click on the Add Interface button here, I’m going to name this interface uplink2, and I’ll give it an IP address. And you’ll notice in the hands-on lab environment that all of these steps are referenced in the lab manual here. So if you want to just follow along with the lab manual, you can do that as well, but I like to walk you through this and explain why you’re doing each step. Either way, the configuration that I’m setting up here aligns with the configuration that’s referenced in the lab manual.
So I’ll go ahead and assign an IP address to the second uplink here. For my segment, I’m going to pick my uplink segment, which is a VLAN-backed segment. This is going to give my Tier 0 gateway access to a segment that has connectivity to a VLAN, and that VLAN is going to give me connectivity to the external routers. Then I’ll choose the edge node that this interface is going to be created on. My first uplink is on edge node one, so for my second uplink, I’m going to put that on edge node two. Now my Tier 0 gateway is going to have one interface on edge node one and a second interface on edge node two. And here I’ll input the MTU, the maximum transmission unit. I’m going to make my MTU here 1524. Now remember, this is not overlay traffic; this traffic is going to be flowing out on a VLAN segment, so I don’t need the extra MTU that I typically need with Geneve-encapsulated traffic. Because this is VLAN traffic, I can set my MTU to a value below 1600 if I want to. So that’s my configuration for the new interface on my Tier 0 gateway. I’m going to go ahead and click on Save, and here we can see the resulting configuration for my Tier 0 gateway: I’ve got one uplink on edge node one and a second uplink on edge node two. I’ve set the MTU on my second uplink; on the first uplink, it looks like it’s not set. And each of those uplinks has a unique IP address as well. So now that I’m finished, I’ll click Close, and I’ll scroll down here and click Close Editing. Now I’m going to go to the desktop of my console, and I’m going to launch PuTTY. If you watched some of my previous videos, I was connecting to NSX Edge 1, and I issued some commands against Edge 1.
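Before moving on, the MTU reasoning from a moment ago can be sketched numerically. This is a rough budget using standard base header sizes (Geneve options add more, which is why 1600 or higher is the usual recommendation for the overlay underlay); it is an illustration, not an exact NSX calculation:

```python
# Why a VLAN uplink can use MTU 1524 while overlay transport needs >= 1600:
# Geneve-encapsulated traffic adds extra headers on the wire, so the
# underlay MTU must exceed the guest MTU. Plain VLAN traffic has no
# such overhead, so 1524 comfortably carries 1500-byte packets.
OUTER_IPV4 = 20      # outer IPv4 header
OUTER_UDP = 8        # outer UDP header
GENEVE_BASE = 8      # Geneve base header (variable-length options not included)
INNER_ETHERNET = 14  # inner Ethernet frame header

def min_underlay_mtu(guest_mtu=1500):
    """Smallest underlay MTU that fits a full-size guest packet over Geneve."""
    return guest_mtu + OUTER_IPV4 + OUTER_UDP + GENEVE_BASE + INNER_ETHERNET

print(min_underlay_mtu())  # 1550 before Geneve options, hence the >= 1600 guidance
```

For the VLAN-backed uplink there is no encapsulation at all, so any MTU of at least 1500 works, and 1524 leaves a little headroom.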
Now I’m going to connect to NSX Edge 2. To log into the NSX Edge, the password is VMware1! So here, within the CLI of the edge, I’m going to issue the get logical-routers command. Here I can see that I’ve got my service router for the Tier 0 gateway running on this edge node, and I’ve also got the distributed router for my Tier 1 gateway and my Tier 0 gateway. In a prior video, I created the Tier 1 gateway, so if you don’t see this in your hands-on lab environment, that’s okay; we don’t really need the Tier 1 gateway for this particular demo. But I do see the service router and the distributed router for my Tier 0 gateway. What I really wanted to get out of this command, though, was the VRF ID. Here we can see my service router has a VRF of 1, so I’m going to type vrf 1. Basically, what I’ve done is change my command prompt. Now any subsequent commands that I issue will be carried out against the Tier 0 service router.
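Sketched as a CLI session, that sequence looks roughly like this (the router names, UUIDs, and exact column layout are illustrative and will vary by lab; only the commands and the prompt change are the point here):

```
nsx-edge-2> get logical-routers
Logical Router
UUID      VRF   LR-ID   Name             Type
...       0     ...     TUNNEL           TUNNEL
...       1     ...     SR-T0-Gateway    SERVICE_ROUTER_TIER0
...       2     ...     DR-T0-Gateway    DISTRIBUTED_ROUTER_TIER0

nsx-edge-2> vrf 1
nsx-edge-2(tier0_sr)>
```

Note the changed prompt: we are now inside VRF 1, the Tier 0 service router context, and subsequent commands run against that logical router.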
So this is the centralised north-south routing component that only exists on my edge nodes, and any future commands that I issue after entering that VRF will be issued against that specific logical router. So let’s put in a quick command here: get bgp neighbor summary. Again, that command will be applied to my Tier 0 service router, and you can see at this moment that there are no BGP neighbors. The reason that we don’t have any BGP neighbors here is that we have not completely configured this interface yet. So here is the user interface. I’m going to go to my Tier 0 gateway, click the little ellipsis next to it to edit the Tier 0 gateway, and scroll down to BGP. You’ll notice here that it shows we have one BGP neighbor. I’m going to click on that, and here you can see the BGP neighbor.
This is my physical router. So I’m going to edit this and choose the source addresses that should establish a neighbor relationship with this particular BGP neighbor, 192.168.100.1. I’m going to add my new interface there. So that’s the configuration that I needed to modify to establish this BGP neighbor relationship with 192.168.100.1 from the second uplink interface. I’m going to go ahead and close this section, and then I’m going to close editing. Now I’ll switch back to my PuTTY session and try get bgp neighbor summary one more time. And there it is: now we see that neighbor added to the service router running on the second edge node. So at this moment I have interfaces on edge node one and edge node two for my Tier 0 gateway, and both of those interfaces have established a BGP neighbor relationship with the physical router. What that now means is that if one of these edge nodes fails, the BGP routing protocol is going to detect that and redirect incoming traffic to a different edge node.
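The before-and-after check on the edge might look something like this (a hedged sketch: the AS number, timers, and column layout are illustrative, and the neighbor address is as reconstructed from the demo):

```
nsx-edge-2(tier0_sr)> get bgp neighbor summary
BGP summary information for VRF default for address-family: ipv4Unicast
(no neighbors)

... after adding the second uplink as a source address ...

nsx-edge-2(tier0_sr)> get bgp neighbor summary
Neighbor         AS      State    Up/DownTime
192.168.100.1    65002   Estab    00:00:37
```

Once the state shows Established, this service router is exchanging routes with the physical router, just like the one on the first edge node.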
So now let’s verify that this is working properly. I’m going to do a traceroute from the console of my desktop to 172.16.10.11; that’s one of my web servers. And we can see right now which route it’s utilizing: 192.168.100.4. If we take a look at our diagram here of the network topology in the NSX user interface, we can see the Tier 0 gateway’s uplinks, 192.168.100.3 and 192.168.100.4, with 192.168.100.4 belonging to the service router running on edge node two. So let’s knock that one down. I’m going into my vSphere client here, and I’m going to pull up my edge nodes; here’s edge node two. Let’s take that edge node, right-click it, and do a hard power-off to take the edge node down immediately, and let’s see what the end result of that is. Let’s do a traceroute to 172.16.10.11. I’m going to give it a moment, because it’s going to take a moment for the BGP routing protocol to detect that 192.168.100.4 is no longer responding and redirect traffic around it to the surviving edge node. Since the destination host is currently unavailable, I may need to wait for the routing protocol to converge and find the correct route before traffic to 172.16.10.11 flows again. And there we go. Fast-forward about two minutes, and now my pings are responding again. We have successfully failed over to the remaining edge node. Let’s do a traceroute and see what happens: the routing protocol has converged and is now sending traffic through 192.168.100.3.
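Why did the failover take around two minutes? With common default BGP timers, a peer that dies silently, as with our hard power-off, is only declared down when the hold timer expires. The values below are typical defaults (check your own NSX BGP configuration); the sketch just makes the timer arithmetic concrete:

```python
# Typical default BGP timers: keepalives every 60 s, and a peer is
# declared dead after 180 s of silence (usually 3x the keepalive).
KEEPALIVE = 60
HOLD_TIME = 180

def detection_time(graceful):
    # A graceful shutdown tears the TCP session down immediately,
    # but a crash or hard power-off sends nothing, so the neighbor
    # must wait out the full hold timer before withdrawing routes.
    return 0 if graceful else HOLD_TIME

print(detection_time(graceful=False))  # 180 seconds, roughly the gap in the demo
```

This is also why deployments that need sub-second north-south convergence pair BGP with a fast failure detector such as BFD rather than relying on hold timers alone.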
9. Demo – High Availability Deep Dive
10. NSX-T Edge Node
So the edge nodes are there to run your network services that cannot be distributed to the hypervisors. Those are things like north-south connectivity and also centralised services, including NAT, DHCP, the gateway firewall, the load balancer, Layer 2 bridging, service interfaces, and VPN. All of these centralised services run on the SR, or service router, component of a Tier 0 or Tier 1 gateway. As soon as any of these services are configured, or as soon as an external interface is defined on the Tier 0 gateway, a service router is automatically instantiated on the edge node. So here’s a little diagram that incorporates some of the edge nodes into our picture. On the left, we have a couple of ESXi hosts; those are transport nodes. On the right, we have EN1 and EN2; those are our edge nodes. In a lot of ways, the edge node is actually pretty similar to the transport nodes, and it starts with the simple fact that, much like the transport nodes, the edge node is going to have a TEP, a tunnel endpoint.
And the edge node can also belong to multiple transport zones. It will belong to at least two transport zones if you need northbound connectivity: an overlay transport zone and one or more VLAN transport zones. The VLAN transport zones are usually used to connect to the physical network. So I’ve got a physical router in my data centre that is present on the same VLAN that my edge nodes have uplinks to. And the overlay transport zone allows the edge node to receive encapsulated traffic, decapsulate it, and send it northbound. The edge nodes themselves can be deployed in different sizes. A small edge node uses four gigabytes of memory, two vCPUs, and 200 GB of disk space. And the small edge node size is really for proof of concept only; it doesn’t do things like load balancing, for example.
So when you think small edge node, think proof of concept only. A medium edge node, on the other hand, has eight gigabytes of memory, four vCPUs, and 200 gigabytes of disk space, and this is suitable for production services. It can do NAT; it can do the gateway firewall; it can do the load balancer; but on the medium-sized edge node, we should only be using the load balancer for proof of concept. If I actually want to run a production load balancer, I should be running it on a large edge node, which has 32 gigs of memory, eight vCPUs, and, again, 200 GB of disk space. This is for production services, including the load balancer. And then finally, we can also run a bare-metal edge as well. Again, this is suitable for production workloads, including NAT, the gateway firewall, and the load balancer, and it’s typically deployed where higher performance is necessary and we need sub-second north-south convergence, so really fast failover in the event that an edge node goes down. But most of the time, you’ll be deploying these edge nodes as virtual machines running on an ESXi host, deploying them from OVA, OVF, or ISO files the same way you’ve deployed virtual machines for years. And the edge node does not need to run on an ESXi host that is prepared for NSX.
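The VM form factors just described can be collected in one place. The figures are as stated in this lesson; consult the NSX documentation for the exact values in your release:

```python
# Edge node VM form factors as described in the lesson:
# name -> (memory in GB, vCPUs, disk in GB, intended use)
EDGE_VM_SIZES = {
    "small":  (4,  2, 200, "proof of concept only"),
    "medium": (8,  4, 200, "production services; load balancer for PoC only"),
    "large":  (32, 8, 200, "production services, including the load balancer"),
}

for name, (mem, cpu, disk, use) in EDGE_VM_SIZES.items():
    print(f"{name:>6}: {mem:>2} GB RAM, {cpu} vCPU, {disk} GB disk - {use}")
```

The bare-metal edge sits outside this table: same role, but deployed on physical hardware when sub-second convergence and maximum throughput are required.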
So if I’m running the edge node as a virtual machine, it can run on any ESXi host, even an ESXi host that’s not configured for NSX. Because remember, the edge node itself has a TEP built right into it, so it can handle the encapsulation and decapsulation of traffic. It can encapsulate traffic and send it out on the correct port group with the right VLAN. So think of the edge node as its own little node that the service routers can run on. Okay, so now that we understand that the edge node is its own little node with its own TEP, let’s dig a little bit deeper. The edge node has multiple interfaces. The first interface on the edge node is eth0; that interface is reserved for management. Also present on the edge node are fast-path Ethernet interfaces, fp-eth0, fp-eth1, and so on. Think of these as the virtual NICs of the edge node. Just like any other virtual machine, the edge node is going to have these virtual NICs that allow it to utilise the physical adapters of the ESXi host. So, kind of underneath the surface, let’s assume that our edge node is a virtual machine running on an ESXi host. Here in the diagram we see our physical ESXi host, and that physical ESXi host has its vmnic physical adapters. Also created on this ESXi host is a vSphere standard or vSphere distributed switch; it doesn’t matter which type of virtual switch you use. But I have not configured NSX on this ESXi host, and I want to make a note of that: there’s no need to prepare this ESXi host for NSX or to give it a TEP or anything like that. And so now we add the NSX edge node itself, which runs as a vSphere virtual machine on that ESXi host. The NSX Edge is going to have a management interface; this is going to connect to our management network, and this is what we’re going to use, as you can tell, to manage this NSX edge node. And then we’ve also got an fp-eth0 interface on the NSX Edge.
Remember, this is one of the NICs of this virtual machine. Traffic that is flowing over the overlay network is going to be encapsulated by a TEP. The TEP itself exists on the NSX edge node, and then traffic will flow out of this NSX edge node, or in, depending on whether the traffic is inbound or outbound. The traffic will hit either the vSphere standard or the vSphere distributed virtual switch on a certain VLAN, and then that traffic can flow out over the physical adapters of the host onto our underlay network and reach other hosts that are in the NSX domain. So, for example, let’s go back to a slide from a previous lesson. Remember, we have this transit segment where the distributed routers and the service routers can communicate with one another. Well, that’s an overlay segment, so my edge nodes need to be able to send traffic in and out over that overlay segment. That’s an example of where the TEP of the edge node is needed: to carry transit traffic between distributed routers and service routers. So do the edge nodes have TEPs? The answer is yes. How many of these TEPs will exist? Well, that depends on how we’ve configured our uplink profile. When you install an NSX Edge as a virtual appliance, you should use the default uplink profile, and each uplink profile created for a VM-based NSX Edge should specify only one active uplink and no standby uplink. So basically, stick with the default uplink profile option for your edge nodes.
And then we’ve also got some VLAN uplinks configured here that connect up to the physical network. In this case, you can see we’ve got two interfaces with two different VLAN uplinks coming out of the NSX edge node. Maybe those two different VLAN uplinks are doing something like we see in this diagram here: we’ve got a service router running on an edge node, with multiple uplinks connected to different physical routers, maybe on different VLANs. Okay, so in this diagram, I’ve made it very clear that this is an ESXi host that has not been configured for NSX. Well, what if I want it configured for NSX? What if I want to run virtual machines on this ESXi host that are connected to Layer 2 segments, that have TEPs, and that can communicate on the overlay network for NSX? The short answer is that you absolutely can. What you’ll do is just set up one vSphere virtual switch that is strictly for the NSX Edge, and then you can have other virtual machines on other segments just like you could with any other host. The fact that you’ve configured NSX doesn’t mean you can’t create other vSphere standard or vSphere distributed switches and use those for other things. And that’s basically all we’re doing here, right? We’re creating a virtual switch dedicated to the NSX Edge, and then I can run NSX alongside it. The only thing you can’t do is run NSX-T and NSX-V on the same host; you absolutely cannot do that. So now we’re starting to think about all the things that these NSX Edges can be doing in terms of the services that they can perform and in terms of north-south routing. These edges are very important.
They’re part of our traffic path; they’re part of the data plane. So I may want to create a cluster and join edge nodes to it. Some of the edge nodes in a cluster could be bare metal and some of them could be virtual machines, or we could have all virtual machines or all bare metal; those different types of edge nodes can be in the same cluster. And we can have anywhere from one to ten nodes per cluster and up to 16 edge clusters running in an NSX domain. And yeah, the big thing that the NSX Edge node does is north-south routing. That’s the really foundational thing that we need it for. There are other services as well, but north-south routing is probably the one that absolutely everybody uses it for. So here in this diagram, we see a bunch of transport nodes; these are all ESXi hosts. And then we’ve got these two edge nodes over here on the right. Let’s just assume that these are four different physical racks in my data center.
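Before going further into the topology, the cluster limits just mentioned, one to ten nodes per cluster and up to 16 edge clusters per NSX domain, can be captured in a quick sanity check:

```python
# Edge cluster limits as stated in the lesson: 1-10 nodes per cluster,
# up to 16 edge clusters per NSX domain. Limits may vary by release.
MAX_NODES_PER_CLUSTER = 10
MAX_EDGE_CLUSTERS = 16

def valid_edge_design(node_counts):
    """node_counts: one entry per edge cluster, giving that cluster's node count."""
    return (0 < len(node_counts) <= MAX_EDGE_CLUSTERS
            and all(1 <= n <= MAX_NODES_PER_CLUSTER for n in node_counts))

print(valid_edge_design([2]))   # True: the lab's single two-node edge cluster
print(valid_edge_design([11]))  # False: too many nodes in one cluster
```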
So here I’ve got a top-of-rack switch in each of these racks inside my data center. And again, I’ve got the edge nodes in different racks as well. I’ve interconnected all of my top-of-rack switches with spine switches. This is a traditional leaf-spine topology where all of my top-of-rack switches have redundant physical connections to all of my spine switches. And the purpose of this sort of design is to say, hey, we can have a failure of any of these components. In a real deployment we may have multiple top-of-rack switches per rack as well. But let’s just assume that a spine switch goes down. Well, the other spine switch still has connectivity to all of the top-of-rack switches. Or let’s assume that a physical connection goes down. Well, this top-of-rack switch still has a connection to the other spine. So this is a redundant design. And we’ve got our distributed router that’s going to be present on all of these nodes, including the edge nodes, but the service router is only going to exist on the edges. And for our edge nodes, we’ve got connectivity to an external physical network, represented by the red lines. So we basically have east-west and north-south routing.
At this point, any traffic within my NSX domain, for example from a VM here that wants to communicate with a VM over here, is going to hit this distributed router. It’s going to get routed east-west; it’s going to flow through the physical network and reach that virtual machine. That’s my east-west routing. But north-south, if the destination is out here on the internet, that traffic has to flow out of the virtual machine, and the first-hop routing will be done by the distributed router, which says, hey, this is destined for the internet. The traffic is going to get encapsulated, flow over a spine, and hit one of these NSX edge nodes. On the NSX edge node, we’re going to have a service router; the service router receives that traffic across the inter-tier transit link, and the service router has a next hop out there on the physical network somewhere. So that’s the main function of this north-south routing piece: if traffic is flowing out of a VM, the distributed router does the first-hop routing, but the service router, running centrally on these edge nodes, handles the north-south routing of that traffic.
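The first-hop decision described above can be sketched as a toy classifier: the distributed router (DR) routes traffic between internal segments east-west on the local transport node, and everything else is handed to a service router (SR) on an edge node. The segment prefixes here are invented for the example:

```python
import ipaddress

# Hypothetical internal NSX segments for the illustration.
INTERNAL_SEGMENTS = [
    ipaddress.ip_network("172.16.10.0/24"),  # e.g. a web segment
    ipaddress.ip_network("172.16.20.0/24"),  # e.g. an app segment
]

def first_hop_component(dst_ip):
    """Return which routing component handles the destination first."""
    dst = ipaddress.ip_address(dst_ip)
    if any(dst in segment for segment in INTERNAL_SEGMENTS):
        return "DR"  # east-west: routed on the local hypervisor
    return "SR"      # north-south: forwarded to a service router on an edge node

print(first_hop_component("172.16.20.5"))  # DR
print(first_hop_component("8.8.8.8"))      # SR
```

The real DR is of course a full routing table, not a prefix list, but the split is the same: distributed east-west forwarding everywhere, centralised north-south forwarding only on the edges.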