5. Demo – Configure North-South Routing
In this video, we’ll learn about north-south routing in a multi-tier routing environment. If you haven’t already done so, I recommend you watch the previous video where we configured east-west routing, because we’re going to be building on that here. All of these tasks are going to be demonstrated using the hands-on labs available at hol.vmware.com. So, as a first step, if we want to do tier-zero routing, we have to have an edge node. You can think of the edge node as the border between our NSX environment and the outside world. We can have multiple edge nodes in a cluster for scalability and high availability, and we can put them in active/active or active/standby configurations. So at the top of our NSX-T user interface here, let’s click on System, and then under Fabric, let’s click on Transport Zones. You can see here that we’ve got a few transport zones already created. The two that I want to focus on are the overlay and the VLAN transport zones. The overlay transport zone is essentially where my segments reside.
And if we look at the networking area, in the last video we saw three segments connected to the overlay transport zone that are now connected to my tier-one gateway: LS-App, LS-DB, and LS-Web. We actually created another one called LS-Ric, which I’m going to very quickly delete here. So we’ve got all of these segments, and these segments all exist within my overlay transport zone. Let’s go back to System, back to our transport zones here, and notice that we’ve also got a VLAN transport zone established as well. Now, the NSX edge node is going to participate in the overlay transport zone, and it’s also going to participate in the VLAN transport zone. The fact that the edge node exists within a VLAN transport zone gives it the ability to connect to the upstream physical network; it’s going to be on a VLAN, just like the systems in our physical network. So under Fabric here, let’s click on Nodes, and under Nodes, let’s click on Edge Transport Nodes. You can see here that we’ve got four edge transport nodes that were automatically created for us in this lab environment. These particular edge nodes are deployed as virtual machines running on an ESXi host, but they could also potentially be physical standalone servers. And remember, all of these edge nodes are connected to both the overlay and the VLAN transport zones.
You can see here that the top two edge nodes are part of a cluster called Edge Cluster 1. So basically, we’re clustering together edge nodes one and two, and that’s going to give us high availability. And so here’s our Tier-0 gateway, which is, of course, going to be the north-south router for the NSX domain. If we go down to interfaces, we can see that there is one external interface and one service interface. The external interface is an uplink interface, and it’s going to connect to our VLAN. This is the interface that the Tier-0 gateway is going to use to send traffic outbound to external machines and to the Internet. If we expand this uplink interface here, you can see the edge node that this interface exists on, and that the interface is connected to the uplink segment. If we close this out and go to Segments, you can see here that we’ve got an uplink segment that is connected to the VLAN transport zone. And if we take a closer look at this uplink segment, we can see that it’s got VLAN 0 configured. I could just as easily configure some other VLAN here if I wanted to, though. So basically, the uplink of the tier-zero gateway is connecting to a VLAN-backed segment, and that VLAN-backed segment can have access to physical adapters on the transport node that it’s running on. This edge node is running on an ESXi host, so we can potentially send traffic out on a VLAN on that ESXi host to the physical network. Let’s take a look at a couple of the settings here. BGP is enabled on this Tier-0 gateway. That means this tier-zero gateway could be dynamically exchanging routing information with routers on the physical network.
And as a matter of fact, if I click on the BGP neighbors here, you can see the IP address of the physical router that we’ve established a neighbor relationship with. So my tier-zero gateway is exchanging routing table updates with a router in my physical network. So what is the Tier-0 gateway actually going to advertise to these external physical routers? Looking at route redistribution, we can see that we have configured it to send routes related to our load balancers, as well as routes related to NAT. And then we’ve also got tier-0 and tier-1 connected segments here. So for tier-zero subnets, we are redistributing connected segments, external interface subnets, and service interface subnets, and also the connected interfaces of our tier-one gateway. So remember, we’ve got three segments currently connected to that tier-one gateway.
We’ve got app, web, and DB. And in the last video, we were able to ping all of those from an external machine. That’s because the tier-zero gateway was redistributing those networks into BGP. So I’m just going to go ahead and close this here, and let’s use the network topology feature in the user interface to lay the groundwork for where we are. You can see here that we have our Tier-0 gateway with this interface, 192.168.100.3. That’s its northbound interface that’s pointed at the uplink segment. And then on the southbound side, tier zero has an interface, 100.64.28.4, on what we call the inter-tier transit segment, which it is using to communicate with the Tier-1 gateway. So as traffic flows through this tier-one gateway from the multiple segments that are connected to it, the tier-one router is going to have a default route sending that traffic over this inter-tier transit segment up to the tier-zero gateway, so that it can move northbound. Or, if I were to have other tenants, that traffic could be routed to those other tenants. So what I now have at this point is basically two logical router ports: one that is being used for all of my northbound and southbound traffic, and one that is being used to communicate with the tier-one gateway. And just one final note to refresh your memory.
The BGP neighbor that we saw was 192.168.100.1. So that’s a physical router up here that this Tier-0 gateway has a BGP neighbor relationship with. And also bear in mind that we have configured the Tier-1 gateway to advertise its directly connected networks. So the Tier-1 gateway is saying, “Hey, I’ve got 172.16.10.0, 172.16.20.0, and 172.16.30.0. Over this segment, I’m going to advertise all of those networks to the Tier-0 gateway.” And this is an overlay segment. So that’s how the Tier-0 gateway is learning about all of those directly connected networks. And as a reminder, that is not BGP between tier one and tier zero; that is an internal routing mechanism that allows tier one to advertise those routes to tier zero. And this inter-tier segment was automatically created for me. You’ll notice I never set up this segment to communicate between these two components; it was just automatically created. And if we click on these different routing components, we can see things like what interfaces are on each router, and we can download the routing table and the BGP forwarding table, so we can get all kinds of information about these routers right from the network topology view. The network topology view is a great way to visualize what you’ve configured here. So while we’re looking at this diagram, let’s think about inbound traffic. Let’s say, for example, that there’s some traffic bound for one of the virtual machines connected to my web segment. That traffic is going to flow in from the external network and hit the Tier-0 gateway. And the tier-zero gateway is going to have a route for the 172.16.10.0 network, specifying that the next hop is the Tier-1 gateway, and that the next hop is reachable over this segment right here. So it’ll forward that traffic along to the Tier-1 gateway.
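The automatically created inter-tier link behaves like a point-to-point subnet carved from a pool. Here is a minimal sketch of that allocation idea, assuming NSX-T’s default 100.64.0.0/16 internal transit pool and /31 links (one address for the tier-0 side, one for the tier-1 side); it is an illustration, not NSX code:

```python
import ipaddress

# Carve /31 point-to-point links out of the transit pool, the way the
# platform hands one out per tier-0/tier-1 connection.
transit_pool = ipaddress.ip_network("100.64.0.0/16")
links = transit_pool.subnets(new_prefix=31)  # generator of /31 links

first_link = next(links)
# A /31 has exactly two usable addresses (RFC 3021): one per router.
tier0_ip, tier1_ip = first_link.hosts()

print(first_link, tier0_ip, tier1_ip)
```

Each gateway pair consumes one such /31, which is why you never configure these addresses yourself.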
And of course, the Tier-1 gateway will have a route table with all of its local, directly connected networks, including the LS-Web segment. So the Tier-1 gateway will receive that traffic, look up the destination IP, and forward it to the appropriate segment, where it can reach the virtual machine it is destined for. And in the opposite direction: let’s say traffic is coming out of Web-1, destined for something on the Internet.
Well, Web-1 is going to send that traffic to its default gateway. The default gateway was configured on the segment itself: 172.16.10.1. And we saw that when we create that address on a segment and associate it with a gateway, the corresponding interface is automatically created on that gateway. So if Web-1 is trying to send traffic out to the Internet, it’s going to send it over this segment to its default gateway, which is the Tier-1 gateway. The Tier-1 gateway is going to do a route-table lookup and say, “Hey, this is some traffic destined for the Internet; the next hop is 100.64.28.4.” It’ll forward that traffic over this overlay segment towards the Tier-0 gateway so that it can reach the Internet. Okay, so now that we’ve covered some of the basic elements of how this is configured, let’s do a little bit of experimentation. So here I am at the desktop of the console in this lab environment, and again, assume that this desktop, this console, is essentially an external machine, right? This is some machine that is outside of my NSX domain.
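The outbound lookup just described can be sketched as a longest-prefix match against the tier-1 distributed router’s table. The prefixes and next hop below mirror the lab topology as I read it (172.16.x.0/24 segments, transit next hop 100.64.28.4); this is an illustration, not NSX code:

```python
import ipaddress

def lookup(table, dest):
    """Longest-prefix match: the most specific matching route wins."""
    d = ipaddress.ip_address(dest)
    best = max((net for net in table if d in net), key=lambda net: net.prefixlen)
    return table[best]

n = ipaddress.ip_network
tier1_dr = {
    n("172.16.10.0/24"): "connected: LS-Web",
    n("172.16.20.0/24"): "connected: LS-App",
    n("172.16.30.0/24"): "connected: LS-DB",
    n("0.0.0.0/0"): "100.64.28.4",   # default route: tier-0 over the transit segment
}

print(lookup(tier1_dr, "8.8.8.8"))       # Internet-bound -> the tier-0 next hop
print(lookup(tier1_dr, "172.16.30.5"))   # east-west -> stays on a connected segment
```

Anything that doesn’t match a connected segment falls through to the default route and heads north over the transit segment.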
And if I type ipconfig here, you can see the IP address 192.168.110.10, which is not connected to any of the segments of my NSX domain. So this is something outside of that NSX environment. And the default gateway for this machine is 192.168.110.1. Let’s assume that that’s an interface on a physical router. Now, in the last video, we determined that we can do things like ping virtual machines that are inside of our NSX domain. For example, here I’m going to try pinging one of the web servers connected to the web segment, and it’s working just fine. So now I’m just going to quickly clear my screen, and we’re going to run a trace route. I’m going to type tracert 172.16.10.11, and I’m going to observe the hops that this traffic hits as it makes its way from this external machine to my web server. So the first hop is the default gateway for this machine, 192.168.110.1. The next hop is 192.168.100.3. That might look a little familiar if you remember our diagram: 192.168.100.3 is the uplink interface of the Tier-0 gateway. So the physical router is routing that traffic into the uplink interface of the Tier-0 gateway. The tier-zero gateway is then forwarding that traffic to the tier-one gateway; the tier-one interface on the transit segment is 100.64.28.5, and from there the traffic can eventually reach the Web-1 virtual machine. So you can see now that we have functional north-south routing, and we’ve completed the basic elements of setting up a multi-tiered topology with north-south routing. In the next video, we’ll dig deeper, open up the command line, and examine the route tables.
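The hop sequence from that tracert can be modeled as each router in turn doing a route lookup and handing the packet to its next hop. The addresses below mirror the lab as I read it (192.168.110.1 physical router, 192.168.100.3 tier-0 uplink, 100.64.28.5 tier-1 transit interface); it is a toy model, not real device behavior:

```python
import ipaddress

n = ipaddress.ip_network

# Each router's table maps prefixes to the next hop; None means the
# destination network is directly connected (last hop).
ROUTERS = {
    "192.168.110.1": {n("172.16.0.0/16"): "192.168.100.3"},   # physical router
    "192.168.100.3": {n("172.16.0.0/16"): "100.64.28.5"},     # tier-0 gateway
    "100.64.28.5":   {n("172.16.10.0/24"): None},             # tier-1: connected
}

def traceroute(first_hop, dest):
    """Walk next hops until the destination is directly connected."""
    dest_ip = ipaddress.ip_address(dest)
    hops, current = [], first_hop
    while current is not None:
        hops.append(current)
        table = ROUTERS[current]
        match = max((net for net in table if dest_ip in net),
                    key=lambda net: net.prefixlen)
        current = table[match]
    return hops

print(traceroute("192.168.110.1", "172.16.10.11"))
```

The resulting hop list matches what the demo shows on screen: physical router, tier-0 uplink, tier-1 transit interface.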
6. Demo – Using the CLI to View Routes
In this video, we’ll take a closer look at the Tier-0 gateway, and we’re going to do so using the command line. So here I am in the free labs at hol.vmware.com, and I have just finished setting up east-west routing on tier one and north-south routing on tier zero. What I now want to do is connect to one of my NSX edges, NSX Edge 1, using the command line. These connections are all pre-built into the lab environment here. So I’ll just load up the NSX Edge 1 connection parameters and click Open, and the password for this interface will be “VMware1!” typed twice; make sure you do a capital V and a capital M. Okay, so now I’ve established an SSH session to the NSX edge, and the first command I’m going to issue is get logical-routers. And here we’ve got a few routers listed. The one that I’m most concerned with in this particular demo is the service router for tier zero; that’s the router that actually runs on the edge. I’ve also got a distributed router for tier zero that runs on all of my transport nodes, and I’ve got a distributed router for tier one. Notice there is no service router for tier one. That’s because I did not enable any services on tier one, so it does not need a service router component. And so all of these router instances are running on this edge node.
It’s participating in the distributed router for tier zero and the distributed router for tier one, and it also has the service router for tier zero. So I’m just going to capture the UUID for my Tier-0 service router; I’ll click on it here to copy that text. Then I’ll put in a similar command, get logical-router, only this time I’ll paste in the UUID of the Tier-0 service router, and because I specifically want to look at some of the BGP information, I’ll add bgp to the end. So now I’m getting information about this specific logical router, the one I designated using the UUID, and I’m seeing the BGP details. What it’s showing me here is a bunch of different network prefixes and the next hop for each of them, along with other BGP parameters related to those next hops and paths: the local preference, the weight, and the autonomous system path. Now let’s pay special attention to 172.16.10.0. That’s the network for my web logical switch. And we can see the next hop here: the next hop is the interface of the Tier-1 distributed router that is reachable through that automatically created inter-tier overlay segment. And you’ll notice that the next hops for 172.16.20.0 and 172.16.30.0 are the same, as are the weights and the autonomous system paths. All of those are the networks that are directly connected to tier one, and so the traffic for all of them is being routed to that Tier-1 distributed router.
In this table, we’ve also got a default route, 0.0.0.0/0. The next hop is 192.168.100.1. That is the BGP neighbor of this router; that’s my external physical router. So if the Tier-0 gateway doesn’t know where to send something, it’ll send it to that next hop, the physical router. So now I’m just going to hit the up arrow, append the word neighbor to the end of my command, and hit Enter. And as you can see, it lists the physical router, 192.168.100.1. It’s showing me the autonomous system of the neighbor and the local autonomous system of the Tier-0 gateway, and I can see how the BGP neighbor relationship is doing. Then I can see more information about the BGP relationship between these two routers, and I can hit Q to get out of that. And I can add 192.168.100.1 routes to the end of this bgp neighbor command and see all of the routes that are specific to that one particular BGP neighbor. And it’s simply my default route. Tier zero basically knows how to get to those internal networks that sit behind tier one, and it knows how to get everywhere else through this physical router; that’s the default route. So let’s stop looking at BGP for the moment; I’m just going to get rid of some of this BGP stuff, and instead let’s look at the interfaces of the Tier-0 gateway.
And so here I can see all of the interfaces for this NSX edge and which network each one is connected to. This is a good way for me to list all of my interfaces and the networks those interfaces are connected to. Let’s focus on the uplink interface. This is the uplink interface that’s being used to connect to the external physical router, the one we have a BGP neighbor relationship with. I can see that there is a logical interface on my Tier-0 gateway, and it is connected to my VLAN-backed segment, my uplink segment. So those are a few of the simple get commands that we can issue in the CLI of the NSX edge. It’s a great way for me to take a closer look at my interfaces, at BGP attributes, or at my route tables. And if you want to take a closer look at these commands, I suggest finding the NSX-T 3.0 command-line reference.
7. Service Router Active/Active Availability
Service routers operate on edge nodes. So we have to have edge nodes for our tier-zero gateways; this is where our service routers are going to run, and those service routers are going to act as the north-south router for our NSX environment. Any centralized service, including north-south routing, requires a service router on an edge node. And for high availability, we’re going to have multiple edge nodes in a cluster. We also need to have service routers running on edge nodes for tier one if tier-one services are used, and we’ll talk about some of those services in a couple of lessons; we’re going to delve into a lot of the services that run on the edge. But for the moment, just be aware that if we’re enabling anything at tier one other than east-west routing, we’re going to have service routers for tier one running on the edge nodes as well.
And so we have active/active or active/standby options when it comes to our service routers. In either case, you’re going to have multiple service router instances running on the edge nodes, and the service routers are in the data plane. They are actively forwarding traffic, and that’s why it’s so important to make sure that they’re highly available. Our traffic is actually flowing through these service routers: any traffic coming in from the external network, or going out to the external network, is going to flow through them. Now, if we are running active/active, we cannot enable stateful services.
So things like the edge firewall, source NAT, destination NAT, load balancing, and VPN configurations cannot be enabled on the Tier-0 gateway. You can enable stateless services; those will work just fine in an active/active configuration, but stateful services do not. So if you require things like the firewall at the edge, you’re going to be using an active/passive configuration, or active/standby, whatever you want to call it. That’s the way we’ll have to set up our service routers if we’re going to use those stateful services. So let’s build out a diagram here and explain how all of this works together. On the left we’ve got two transport nodes; these are ESXi hosts, and these are the transport nodes where some of our virtual machines will actually run. And on the right, we’ve got two edge nodes. Notice that the edge nodes actually have distributed router components as well. So you want to think of the edge node, in a lot of ways, as being similar to a transport node; I really think of the edge node as a transport node for the SRs. It’s going to have distributed routing components, and it’s going to have its own TEP.
So I’ve got these two edge nodes, and I’m going to have an active/active configuration with SRs running on each of those edge nodes. The thought process here is that if one of these edge nodes were to fail for whatever reason, I’m still going to have a service router up and running. So let’s build out our diagram a little bit more and add the inter-tier transit segment; that’s what we see in this diagram. Now, if traffic needs to flow to the external physical network, let’s highlight that real quick. Let’s say I have a virtual machine here in my diagram, and the virtual machine is generating some traffic that is headed for the Internet. So here’s my VM. The VM generates traffic for the Internet. It hits the default gateway, which is the distributed router. The DR looks at its route table, and the SR is the next hop. So it forwards the packet onto the transit segment, where it can reach the service router. That’s essentially how this is going to work, and we’re just assuming that this one is the active service router. And by the way, if we’re going in the opposite direction, if traffic is coming inbound from the Internet, it’s going to hit the service router. The service router is going to look at the destination network and forward the traffic to its next hop, which happens to be the distributed router. And then the TEP will be used to encapsulate that traffic and send it to whatever host it needs to go to. So we have the service router components running on the edge nodes, and distributed router components on both the edge nodes and the transport nodes. And then our service routers have these external connections to the physical network on a VLAN segment. You’ll notice here that the two external interfaces have different IP addresses: on the left, the address ends in .243, and on the right, .253. So I’ve got these two different IP addresses for this service router.
One IP on one edge node, and one IP on the other. And the basic idea here is that the routing protocol is going to be used to detect failures. So if service router one, here on this first edge node, is active but then fails, the physical router will detect that failure. And when a routing protocol like BGP detects a failure, it tries to converge.
It tries to see if there are alternate routes to get to the same networks. So, if I’ve configured my physical network in a highly available manner, BGP will be able to detect that failure and route around it. We’ll talk about that more in just a moment. But what I want to do now is take a little time to talk about services. I mentioned that if I enable stateful services, I won’t be able to do active/active. Let’s talk about why that is. Assume once more that I have a virtual machine running on this host here on the left, ESXi-1. The virtual machine generates some traffic that’s bound for the Internet, the distributed router routes it over to a service router running on edge node 1, and we’ve enabled the firewall on that service router. This firewall is a stateful firewall, which basically means this: let’s say the firewall has a rule that allows that traffic to go outbound. With a stateful firewall, when it allows traffic outbound, it will automatically allow any corresponding return traffic inbound. I don’t need a special inbound rule. So here comes the return traffic from whatever we were communicating with; it flows through this edge firewall, because again, it’s a stateful firewall, and it can reach the destination. That’s how a stateful firewall works.
So let’s think about what could possibly go wrong here. If I’m utilizing a stateful firewall at the edge and I’ve got this automatically generated rule to dynamically allow return traffic in, that rule exists only on this one service router instance. What if the destination on the Internet responds, and the traffic flows through this other physical router over here and hits the other service router? That service router lacks the corresponding rule to dynamically allow the return traffic in. So basically, what we’re looking at in this active/active configuration is that we could have something like equal-cost multipathing, where the traffic from our physical routers is being evenly distributed to these two service routers. Or maybe somewhere in the physical network, traffic is being evenly distributed across these two physical routers. Either way, the possibility exists that this VM may send traffic out one way and have the return traffic come back in another. And that doesn’t work with our stateful services.
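The asymmetric-path problem above comes down to where the connection state lives. Here is a minimal sketch of a per-instance connection-tracking table, with made-up addresses, showing why the reply is dropped when it lands on the other service router; it illustrates the concept only, not NSX’s firewall:

```python
class StatefulFirewall:
    """Toy stateful firewall: each SR instance keeps its own state table."""

    def __init__(self):
        self.conntrack = set()  # allowed (src, dst) reply flows on THIS instance

    def outbound(self, src, dst):
        # Allowing a flow outbound auto-permits the corresponding reply.
        self.conntrack.add((dst, src))
        return "allowed"

    def inbound(self, src, dst):
        # Replies are only allowed if this instance created the state.
        return "allowed" if (src, dst) in self.conntrack else "dropped"

sr1, sr2 = StatefulFirewall(), StatefulFirewall()

sr1.outbound("172.16.10.11", "8.8.8.8")        # VM -> Internet leaves via SR1
print(sr1.inbound("8.8.8.8", "172.16.10.11"))  # reply via SR1: state exists
print(sr2.inbound("8.8.8.8", "172.16.10.11"))  # reply via SR2: no state, dropped
```

With ECMP spreading flows across both SRs, nothing guarantees the reply returns through the instance holding the state, which is why stateful services force active/standby.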
So that’s why we have to configure active/standby if we want to do things like enable the firewall at the edge. Okay, so let’s take a look at how the routing piece of this works. The SRs are constantly exchanging routing information with the physical routers. So, from the perspective of the physical routers, they’re getting advertisements for all of the segments in our NSX domain. I may have created multiple layer-2 segments here. Those layer-2 segments could be connected to a tier-one distributed router, or they could be connected to a tier-zero distributed router. Either way, the service router for tier zero is going to find out about those networks and advertise them to the physical routers. So the physical routers are getting those route advertisements from the service routers. And in this case, we’ve got two physical routers, so we’ve got a very nice redundant design here. Physical Router One is learning about all of the routes from one service router, Physical Router Two is learning about all of the routes from the other service router, and maybe Physical Router One and Physical Router Two are also exchanging route tables with each other as well. That could be a kind of redundant configuration. But let’s think about another scenario, one from the NSX-T reference design. What if we have a situation in which we have two physical routers and their route tables are not an exact match? Maybe one of the physical routers is the way out to the Internet, and it has a routing adjacency to one service router, while the other physical router connects to the corporate LAN and has a routing adjacency to the other service router.
So to get to those different places, we have to use different physical routers. Well, in this situation, we can do something called inter-SR routing. With inter-SR routing, an iBGP session is going to be established between the two service routers. This is just an option that I can enable on my Tier-0 gateway. And now we have the two service routers peering with different physical routers, but anything that service router one learns, it advertises to service router two, and vice versa. What this means for the inside of my network is that the appropriate service router is going to be used for the appropriate external destinations. So maybe some traffic is destined for the corporate LAN, and it hits the service router instance on edge node one. That service router is simply going to forward the traffic to the service router on edge node two. So this is kind of an unusual scenario, but one that definitely comes up, and it is referenced in the NSX-T reference design guide.
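The inter-SR exchange can be sketched as a simple merge of what each service router has learned, with anything learned remotely resolving through the peer SR. The prefixes and labels below are illustrative assumptions, not lab data:

```python
# Each SR starts knowing only what its own physical neighbor advertises.
sr1_routes = {"0.0.0.0/0": "internet-router"}    # SR1 peers with the Internet edge
sr2_routes = {"10.0.0.0/8": "corporate-router"}  # SR2 peers with the corporate LAN

def inter_sr_sync(local, remote, via_peer):
    """Merge the peer SR's routes; locally learned routes stay preferred."""
    merged = dict(local)
    for prefix in remote:
        merged.setdefault(prefix, via_peer)  # reachable via the peer SR
    return merged

sr1_routes = inter_sr_sync(sr1_routes, sr2_routes, "via-SR2")

# Corporate-bound traffic hitting SR1 now takes one extra hop through SR2.
print(sr1_routes["10.0.0.0/8"])
print(sr1_routes["0.0.0.0/0"])
```

After the sync, either SR can deliver traffic to both external destinations, at the cost of a possible extra hop across the iBGP-joined pair.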
Let’s take a look at a more common scenario, in which we’ve just established physical redundancy overall. So now we’ve got the same two physical routers, one for the Internet and one for the corporate LAN, but both of them have physical connectivity to edge node one and edge node two. Now each of my service routers is learning about the routes to the Internet and the routes to the corporate network. So if either service router fails, there’s still connectivity to both of these networks, and the traffic can just route through the surviving service router. And what happens when that failure does occur? Let’s say, for example, that edge node one fails.
As a result, this service router goes down, and Physical Router One is going to detect, “Hey, this neighbor is down; it’s not working anymore. But I do still have a route, so I’ll just send all of the traffic towards this other router for anything inside the NSX domain.” And Physical Router Two is going to do the same thing. This is part of BGP: it’s going to detect that this neighbor is down and use the other route that it has. So the healing mechanism here for active/active is the routing protocol. That’s how failures are going to be detected, and that’s how we’re going to redirect traffic around a failed edge node. With active/standby, things are a little bit different; we’re not going to use the routing protocol as the failover mechanism in that case. And you’ll see active/standby demonstrated in some of the lab demos that I’m going to provide. We’ll take a deep look at the route table, the addressing, and how all of that works in the demo.
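The convergence behavior described above can be sketched as a physical router holding one path per SR neighbor and withdrawing a neighbor’s paths when it dies. The uplink addresses here are illustrative placeholders, not the lab’s actual values, and this is a toy model rather than a BGP implementation:

```python
# prefix -> {neighbor_name: next_hop}: the NSX networks are reachable
# via either edge node's service router.
rib = {
    "172.16.0.0/16": {"SR1": "192.168.100.3", "SR2": "192.168.100.4"},
}

def best_path(prefix):
    """Pick a surviving path (deterministic tie-break for the sketch)."""
    paths = rib[prefix]
    return sorted(paths.values())[0] if paths else None

def neighbor_down(name):
    """BGP-style withdrawal: drop every path learned from a dead neighbor."""
    for paths in rib.values():
        paths.pop(name, None)

print(best_path("172.16.0.0/16"))  # prefers the SR1 path while both are up
neighbor_down("SR1")               # edge node one fails
print(best_path("172.16.0.0/16"))  # traffic converges onto the SR2 path
```

This is the whole failover story for active/active: no special HA machinery, just route withdrawal and reconvergence in the routing protocol.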