1. Introduction to NSX-T Distributed Routing
Let’s start at a very high level here. The first concept that I want to explain is east-west routing. When we’re talking about east-west, what I want you to think about is routing between systems inside the same data center. So, for example, with NSX, we’re talking about routing between virtual machines that exist within our NSX domain. Here in this situation, we see two virtual machines, and they’re running on the same ESXi host. In this case, we’re going to assume there is no NSX present in this environment. So I’ve got two VMs, and you’ll notice the IP addresses of the virtual machines: they are on different networks. If I want to allow traffic between VM One and VM Two, I have to route that traffic, because the VMs are on different networks. So here you see VM One sending a packet to VM Two. Because that packet is intended for a different network, the 10.1.1 network, VM One will simply forward it to its default gateway. So the packet flows out through the physical adapter and hits a physical switch in the network, and maybe that’s as far as it needs to go.
That switch might actually be a layer 3 switch. But let’s just assume that the packet hits a physical router in the physical network. Now the router has the job of receiving this packet, analysing the destination, and routing that packet onto the network for that destination. So the router must have an interface on the same network as the destination virtual machine. The router will route the packet out of that interface. It’ll go through the physical network, through the physical adapter of the ESXi host, and eventually reach the virtual machine. This presents a few inefficiencies. Number one, the packet is unnecessarily traversing multiple physical hops. This is a packet that should just stay inside the ESXi host, but instead it’s going through a physical adapter, over a cable, hitting a switch, over another cable, hitting a router, going through the router, back through the switch, back through the physical adapter, and eventually to the virtual machine. We’re introducing latency that we really don’t need to introduce here. And number two, it’s a physical router and a physical switch, so we’re doing significant administration of our physical network here. If I want to create new networks, I have to modify the physical network hardware. So one potential solution is to deploy a router as a virtual machine and do east-west routing within that VM. Here you can see a couple of ESXi hosts, and there is a virtual machine router deployed on this ESXi host on the right. This really hasn’t changed our situation very much, though. By deploying it as a VM, it’s going to exist on one host or maybe a set of hosts. But the packet is still going to have to flow over the physical network to hit this virtual router that’s running on a different host.
And then the virtual router can look at the destination network, route the packet onto a different VLAN, and send that packet out onto the destination VLAN so that it can reach the destination virtual machine. So establishing east-west routing with a virtual machine does give us some benefits, simply in that we can now run our router as a VM and we get all of the benefits inherent to running a virtual machine, like being able to take snapshots and things like that. So now let’s take a look at how this changes with a distributed router. The distributed router is actually a kernel module that runs inside the ESXi host. It’s not a virtual machine. It’s installed using a vSphere Installation Bundle, or VIB, and it’s distributed to all of the ESXi hosts that are participating in NSX. So you’ve basically got this little router kernel module running on the ESXi hosts. Now, in the past, the distributed router has supported both OSPF and BGP, but now with NSX-T, that OSPF support has been removed, and the distributed router does support IP version 6. So we’ve got a real basic diagram here. Let’s break this down. I’ve got two virtual machines running on the same ESXi host. And again, you’ll notice these virtual machines are on different networks. VM One has an IP address that’s on one network. VM Two has an IP address that’s on a different network. So traffic from VM One to VM Two is going to need to be routed. VM One generates a packet bound for VM Two. That packet is sent to the default gateway, which in this case happens to be the distributed router. The distributed router is running as a kernel module locally on that ESXi host. So the distributed router receives this packet, analyses the destination address, and forwards it to the appropriate network without the traffic ever leaving that ESXi host. So now the traffic didn’t need to flow out of the host to hit a physical router.
It didn’t need to flow out of the host to hit a virtual machine or an edge node or anything like that. The routing is done right within the ESXi host itself. And here you can see a slightly more complex example, but it’s still the same concept. We’ve got virtual machines on multiple different VNIs, multiple different layer 2 segments, but they all exist on the same ESXi host. And so the distributed router can route between all of these networks without that traffic ever leaving the ESXi host. So what you could have is maybe something like an application tier, a web tier, and a database tier, and you enable routing between all of those different tiers using this distributed router. Okay, so let’s take a little bit of a deeper look here at how this distributed router works and follow the packet as it moves from VM One to VM Two. In this case, VM One is connected to a logical switch, a layer 2 segment that we’ve created with NSX. That layer 2 segment is also present on ESXi Two. So these two hosts are part of the same overlay transport zone.
And I’ve got a second layer 2 segment that is also present on both of these hosts. You can see here that I’ve got one VM, VM One, running on this segment, and I’ve got VM Two running on a different segment, and they’re also running on different ESXi hosts. So let’s say that VM One generates some kind of packet bound for VM Two, and let’s just say that it’s a ping. All right? So VM One generates this ping. The destination IP is the IP address of VM Two, which is on a different network. So VM One is going to send that ping towards its default gateway, and the default gateway is the distributed router. The distributed router essentially acts as what we call a “first hop” router. It’s going to get the packet from the source, and it’s going to do this first-hop routing. It’s going to say, “Okay, I’ve got a packet coming in, and it’s destined for some subnet. It’s destined for an IP address in a certain range.” And it’ll check its route table and determine, “Do I have a route for this?” And of course, the distributed router is connected to VNI 5001, so it’s got a directly connected interface for that network. So it uses its route table, selects the interface connected to that network, and routes the packet towards that network.
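If it helps to see that first-hop lookup as code, here is a minimal sketch in Python. It is not how the kernel module is implemented; it just shows the idea of matching a destination against directly connected interfaces. The 10.1.2.0/24 subnet for VM One and the VNI 5000 label are assumptions made up for the example; only the 10.1.1 network and VNI 5001 come from the walkthrough.

```python
import ipaddress

# Hypothetical logical interfaces on the distributed router, one per segment.
# 10.1.2.0/24 and "VNI 5000" are illustrative assumptions.
CONNECTED_ROUTES = {
    ipaddress.ip_network("10.1.2.0/24"): "LIF on VNI 5000",
    ipaddress.ip_network("10.1.1.0/24"): "LIF on VNI 5001",
}

def first_hop_lookup(dest_ip):
    """Return the interface of the most specific matching route, or None."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [net for net in CONNECTED_ROUTES if dest in net]
    if not matches:
        return None  # no route: this simple table has no default route
    best = max(matches, key=lambda net: net.prefixlen)
    return CONNECTED_ROUTES[best]

# VM One pings a hypothetical VM Two address on the 10.1.1 network.
print(first_hop_lookup("10.1.1.20"))
```

The lookup happens on the source host, which is the whole point: the packet is already on the destination segment before it ever touches the physical network.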
And at this point, the job of the distributed router is 100% complete. The packet was received on one network, got routed to a different network, and now it’s on the layer 2 segment that it should be on. It’s already on the destination layer 2 segment. It’s on VNI 5001, and that’s where the destination VM is. So the routing process is actually 100% complete. At this point, however, the packet is still sitting here on ESXi One, while the destination VM is on ESXi Two. So what happens at this point? Well, now our Geneve overlay network comes into play. This ping is destined for the MAC address of VM Two. A MAC table lookup is going to be done, and the TEP here is going to determine, “Okay, to get to the MAC address of VM Two, where do I send this?” And the MAC table will reflect that it needs to be sent to TEP Two. At that point, the frame gets wrapped up in a Geneve header, with the destination IP of that TEP and with the destination MAC of this router here. We’ve got a router in the physical network. So the packet is encapsulated, and it will flow over the physical network. It will hit this router. The router will look at the destination IP and forward it out through the appropriate physical interface towards TEP Two. TEP Two is going to begin to unwrap that outer header. TEP Two is going to determine, “Hey, I’m the destination IP.” And as it pulls off that outer header, one of the fields in the outer header is the VNI. Now the TEP knows which VNI to drop that packet into, and so that’s what it’ll do. So TEP Two receives the packet, decapsulates it by pulling off the outer header, and then places the traffic on VNI 5001, where it can reach the destination virtual machine. So that’s the distributed router in NSX. It’s performing the initial routing, the first-hop routing, right on the source ESXi host, right on the source transport node.
And in the case of east-west routing, that’s the only routing that needs to happen: that routing right at the source host. So it’s eliminating the unnecessary consumption of physical bandwidth, and the need to use a physical router, to get that packet from point A to point B.
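To make the encapsulation step a little more concrete, here is a rough Python sketch of what the tunnel endpoints (TEPs) do on each side. It only builds the 8-byte fixed Geneve header and leaves out the outer Ethernet, IP, and UDP headers entirely. The MAC address, the TEP IP, and the function names are all made up for illustration; this is a sketch of the idea, not NSX code.

```python
import struct

ETH_BRIDGING = 0x6558  # Geneve protocol type: the payload is an Ethernet frame

# Hypothetical MAC table on the source host: inner destination MAC -> remote TEP IP.
MAC_TABLE = {"00:50:56:aa:bb:02": "192.168.10.12"}

def geneve_header(vni):
    """Build the 8-byte fixed Geneve header (version 0, no options)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # byte 0: version + option length, byte 1: flags,
    # bytes 2-3: protocol type, bytes 4-6: VNI, byte 7: reserved
    return struct.pack("!BBHI", 0, 0, ETH_BRIDGING, vni << 8)

def encapsulate(inner_frame, dest_mac, vni):
    """Look up the remote TEP for the destination MAC, then prepend the header."""
    remote_tep = MAC_TABLE[dest_mac]
    return remote_tep, geneve_header(vni) + inner_frame

def decapsulate(packet):
    """Strip the Geneve header and return (vni, inner_frame)."""
    _, _, _, vni_field = struct.unpack("!BBHI", packet[:8])
    return vni_field >> 8, packet[8:]

# Source side: wrap the ping for VNI 5001. Destination side: unwrap it.
tep, wire = encapsulate(b"ping", "00:50:56:aa:bb:02", 5001)
vni, frame = decapsulate(wire)
print(tep, vni, frame)
```

Notice that the receiving side learns the VNI from the header itself, which is how it knows which segment to drop the inner frame onto.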
2. NSX-T Single Tier Routing Architecture
So for the purposes of this first lesson, we’re not going to talk about Tier 1 at all. We’re just going to explain how logical routing works with only a tier-zero router. Let’s start by breaking down the diagram that we’re going to be working from. You can see we’ve got three ESXi hosts. These are transport nodes in our NSX domain: ESXi 01, 02, and 03. On the far right, we’ve got something called an edge node. The edge node essentially represents the boundary between our NSX network and our external networks, or things that are not controlled by NSX. It could be our corporate LAN, the Internet, or anything else outside of the NSX domain. The edge node is essentially the boundary for those networks. And then you can see we’ve got a VLAN transport zone and an overlay transport zone created. The transport zones that we’ve established include not only these transport nodes but also the edge node as well. Now, the edge node could be a physical server, but most of the time you’ll see it as a virtual machine. So let’s assume that this edge node is a virtual machine running on an ESXi host. It’s got its own TEP, or tunnel endpoint, and it is participating in the same transport zones as my transport nodes. And then, at the bottom, we’ve got our physical adapters.
We’ve got a physical network. We’ve designated VLAN 10 as our transport VLAN. So any traffic that’s being encapsulated by the TEPs and sent onto the physical network is using VLAN 10. All right, so that’s the breakdown of our diagram here at the starting point. And now we start creating some layer 2 segments. Within our transport zone, we’ve now created these two layer 2 segments. We’ve got VNI 5001, and maybe that’s my web logical switch. As a matter of fact, I just updated my diagram to give those segments a little bit more of a descriptive name. So within this overlay transport zone, I’ve got a web logical switch and an app logical switch. I’ve got certain virtual machines that are connected to the web logical switch and other virtual machines that are connected to the app logical switch. At the moment, those layer 2 segments are isolated from each other because I don’t have any kind of router in place. Anything connected to the web logical switch cannot communicate with anything on the app logical switch. So I’ve got to introduce some kind of routing component to allow my web and app virtual machines to communicate with each other. And here it is. It’s a distributed router. I’ve got something called a “tier zero distributed router.” The Tier Zero distributed router consists of a distributed routing component and a centralised component called the services router, or SR. For now, what I want you to think of this as is a distributed router.
All of these ESXi hosts have a router component running as a kernel module, and my edge node has the same distributed routing component running. It will also perform first-hop routing of packets. If you’re not familiar with that concept, watch the last video. I broke that down there. Okay, so now on each of these hosts, I have a Tier Zero distributed router that can handle first-hop routing, right? So, for example, if Web One wanted to send a packet to, let’s say, App Two, the packet would come out, hit the logical switch, and go to the default gateway, which is an interface on the Tier Zero distributed router. That router would route it onto the app logical switch. It would get encapsulated by the TEP, flow over the physical network on VLAN 10, get decapsulated by this TEP, get delivered to the appropriate VNI, and then get delivered to the virtual machine. So adding the Tier Zero distributed router gives us that east-west routing functionality within our NSX domain. Now, when you create one of these layer 2 segments, you establish an IP address for the default gateway. And it’s kind of funny, because we’ve got distributed router modules running on each of these hosts, and we didn’t establish a different IP address for each of these hosts. How does this work? Think about it this way. If I vMotion Web One to ESXi 02, it still needs to see the same IP address and MAC address for its default gateway. I don’t want those things to change. So, basically, each host has a MAC table that sends the frame to the DR instance on that host.
If Web One generates some traffic bound for its default gateway, it’s not going to get Geneve encapsulated and sent to a logical routing instance on some other host. It’s going to be sent to whatever local kernel module exists on that particular ESXi host. And so there’s something called a VMAC, where essentially each one of these distributed router kernel modules has an identical MAC address, but that identical MAC address is localised to that particular ESXi host. That way, if I take a virtual machine and move it, it’s still going to see the same MAC address consistently across each host. That’s just a little concept to keep in mind. You may really never need to think about that. It may not even be important to you at all. But I just like to straighten that out for the folks who are used to working with networking. You’re used to thinking, “Hey, everything needs a unique MAC; everything needs a unique IP.” This is a little bit of a break from the way that we used to think about things with routing. We’re going to have this kernel module, and we’ll duplicate the IP and MAC across every instance of it. The distributed router also has something called logical interfaces, or LIFs. One logical interface on the distributed router is connected to this web logical switch, and one logical interface is connected to this app logical switch. When you configure your layer 2 segments, you are essentially establishing the logical interfaces that will connect those segments to this Tier Zero distributed router.
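Here is a small Python sketch of that idea, just to show why the VM never notices the move. The gateway IP and MAC below are placeholders I made up, not the addresses NSX actually uses, and the `Host` class is a toy model rather than anything from the product.

```python
from dataclasses import dataclass, field

# Placeholder values for illustration only.
GATEWAY_IP = "172.16.10.1"
GATEWAY_VMAC = "02:00:00:00:00:01"

@dataclass
class Host:
    name: str
    vms: set = field(default_factory=set)

    def resolve_gateway(self, vm):
        """Answer a VM's gateway lookup from the DR instance on this host."""
        if vm not in self.vms:
            raise LookupError(f"{vm} is not running on {self.name}")
        # Same IP and MAC on every host; only the answering instance differs.
        return (GATEWAY_IP, GATEWAY_VMAC, self.name)

def migrate(vm, src, dst):
    """Move a VM between hosts, the way a vMotion would."""
    src.vms.remove(vm)
    dst.vms.add(vm)

esxi01 = Host("esxi-01", {"web-1"})
esxi02 = Host("esxi-02")

before = esxi01.resolve_gateway("web-1")
migrate("web-1", esxi01, esxi02)
after = esxi02.resolve_gateway("web-1")
print(before, after)
```

The first two fields are identical before and after the move; only the host that answers changes, and the VM has no way to see that.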
So if I create a third layer 2 segment now, I’m going to give it an IP address for the default gateway. That IP address is going to get automatically established on the Tier Zero distributed router. Okay? So let’s switch over to the NSX-T reference design guide, my favourite reference for NSX-T. We’re going to take a look at a couple of the diagrams that they’ve provided, because I really like these two diagrams where you can see it from a logical perspective and a physical perspective. From a logical perspective, over here on the left, we see a tier-zero gateway, and we can see we’ve got multiple virtual machines connected to this one segment and some virtual machines connected to the app segment. From a logical perspective, it’s as if there are two different layer 2 switches, and they’re connected to a router, right? But from a physical perspective, those virtual machines may be distributed across multiple transport nodes. There are kernel modules running on each of those transport nodes to handle that routing. And in the same document, you’ll find some packet walks, essentially walking you through, “Hey, what happens if Web One generates a packet for App One?” Well, it’s going to send it to its default gateway, which is the distributed router. We have this interface that we set up on the distributed router when we created the layer 2 segment. The distributed router will do a route table lookup, find this 172.16.20 network on another directly connected logical interface, and route the packet out onto the app segment. Once it’s on the app segment, it can reach the destination virtual machine. The process gets a little bit more complex if you’ve got a destination VM that’s on a different hypervisor, but the routing isn’t more complex at all. The routing is exactly the same.
So Web One generates a packet destined for App Two. It hits the distributed router, and the distributed router, which has an interface on that same segment, does the routing on the source host. After that, it’s just encapsulation using Geneve and getting that packet over to the TEP on the destination host so that it can reach the destination VM. Okay? So now, as a little final note on the slide, remember that when we create a layer 2 segment, it is created on every host inside the associated transport zone. The distributed router is also configured on every node within that transport zone, so they automatically align with each other. Okay, so what we just looked at was the distributed router component that exists on all of our transport nodes. There’s also a services router component that is there to handle things like north-south routing and also to provide services like NAT, DHCP, load balancing, VPN, gateway firewall, and bridging. As we get deeper into this course, we’re going to have lessons on many of these functions that the services router performs so that we can learn how it actually does these things. But let’s start by breaking down how it exists in our diagram.
So our Tier Zero gateway has the distributed routing component that we’ve been looking at, but it also has this centralised services router. The distributed router is our east-west piece. This is for routing within the NSX domain. It has logical interfaces on all of our layer 2 segments, right? So that’s the purpose of the distributed router. Over here on the right, this is the edge node. And remember, the edge node can be a physical server or it can be a virtual machine. It has its own TEP as well, so that it has the ability to participate in our VLAN and overlay transport zones. So there’s still a distributed router on the edge node. But the job of the edge node is basically to host the services router. Whenever we enable one of those services that we saw on the previous slide, this services router needs to be instantiated. It is also necessary for north-south routing. And so the services that we configure on the services router are going to run on the edge node. This is what provides us connectivity to the external network. Now, this is not the underlay network. Remember, down here at the bottom of the screen, we see the physical underlay. This is what connects all of our nodes. The external network is a different network.
The tier-zero services router has connectivity to some external segment, some external VLAN, that my physical router exists on. And this interface will support static or BGP routing. So it’s basically an uplink to the external network. And then, interconnecting these components, we have something called an “intra-tier transit link.” The intra-tier transit link used to be something that you would manually create in NSX-V. We would create something called a transit network. This is an internal link between the distributed router components, connecting them up to the services router components. And so now we’ve got this segment essentially connecting to the services router. So if we’ve got traffic that needs to get routed out to some northbound network, the tier-zero distributed router has a way to send that transit traffic to the tier-zero services router. That’s what we’re going to break down in the next slide. Let’s look at a packet walk and really explore how this traffic flows. Our diagram is starting to get a little bit complex, so let’s dial it back a little bit here and just examine how traffic flows from a VM to some external network. And again, we want to think of this services router as the border between NSX and the external network. So I need to go outside of my east-west routing domain; I need to go outside of NSX. Maybe there is a physical server in my data center. Maybe I need to send traffic to the Internet. That’s where the services router comes in. So here in our diagram, we’ve got a virtual machine called VM-1 running on a transport node, an ESXi host. Within that ESXi host, we’ve got our Tier Zero distributed router. We’ve also got that on the edge node here as well. And the VM is connected to a layer 2 segment. So the VM generates some traffic bound for the Internet. That traffic is going to go to its default gateway. And in this case, the default gateway is a logical interface on the distributed router.
So the packet hits the distributed router.
The distributed router analyses its routing table and says, “This is bound for the Internet. Let me check the next hop.” In this case, the distributed router doesn’t have a route for every destination on the Internet. It doesn’t have a route table with every possible destination listed, because that would be too many routes. But what it does have is a default route. And the default route for this distributed router points to the services router. So the distributed router will now perform first-hop routing and send that packet to a network where it can reach the next hop. In this case, the next hop is the services router. So we have this intra-tier transit network that was generated automatically. That’s the network that the distributed router forwards this onto. The packet gets encapsulated by the TEP, moves over the physical underlay network, gets decapsulated by the receiving TEP on the edge node, and arrives on the local network of the edge node, where it can reach the services router.
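As a rough illustration of that decision, here is a Python sketch of a route table with two connected segments and a default route that points at the services router. The 172.16.10.0/24 web subnet and the next-hop labels are assumptions for the example; the real table is built by NSX, not by hand like this.

```python
import ipaddress

# Hypothetical route table for the Tier Zero distributed router.
DR_ROUTES = [
    (ipaddress.ip_network("172.16.10.0/24"), "connected: web segment"),
    (ipaddress.ip_network("172.16.20.0/24"), "connected: app segment"),
    (ipaddress.ip_network("0.0.0.0/0"), "services router via transit link"),
]

def next_hop(dest_ip):
    """Pick the matching route with the longest prefix."""
    dest = ipaddress.ip_address(dest_ip)
    # The default route matches every IPv4 address, so this list is never empty.
    candidates = [(net, hop) for net, hop in DR_ROUTES if dest in net]
    _, hop = max(candidates, key=lambda item: item[0].prefixlen)
    return hop

print(next_hop("172.16.20.5"))  # stays east-west on the app segment
print(next_hop("8.8.8.8"))      # falls through to the default route
```

East-west destinations match a connected route and never leave the distributed router; anything else falls through to the default route and heads for the services router.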
And so the services router is actually running on this edge node, and the services router performs a routing lookup. It determines that the route to this destination was learned from an external interface. Or maybe it’s just going to use its default route. One way or another, the services router has a route to send this out towards the external router, a physical router on our physical network. Okay? So in order to keep things somewhat simple here, and I know you may be thinking, “Rick, this doesn’t seem all that simple,” but it can get a lot more complex. To keep things simple here, I’ve only shown you a single edge node. We’re probably going to have multiple edge nodes for redundancy purposes. So we’re going to want at least two of these edge nodes, and there are different ways you can set them up for high availability. We’ll talk about those later. But what I really want you to understand now is the fact that the services router on the edge node is our north-south routing component. And when we set this up, there is an auto-generated transit link for intra-tier traffic between the tier-zero distributed router component and the tier-zero services router component. In this example scenario, single-tier routing works just fine. We don’t need multiple routing tiers. It gets the job done just fine with a single-tier routing model. In the next module, we’re going to break down multi-tier and explain what the use cases are. And it gets a little bit more complex once we move into multi-tier architecture.