In this post I am going to walk through the northbound frame flows in the Cisco UCS system. Hopefully this information will help you better understand UCS networking.
In other posts I will walk through the southbound and maybe east-west frame flows. I am breaking the content into multiple posts to make it easier for the reader to get through and understand.
I learned most of the information in this post from the Cisco Live 2010 session “Network Redundancy and Load Balancing Designs for UCS Blade Servers” presented by Sean McGee of Cisco. Some of the diagrams in this post were also taken from that same session.
Acronyms used in this post:
CNA – Converged Network Adapter on the mezzanine card
FEX – Fabric Extender 2104XP that lives in the chassis. There are usually two of these per chassis, one for Fabric A and one for Fabric B.
FI – Fabric Interconnect, the 6120s/6140s. Usually deployed as a pair in an HA configuration.
IOM – I/O Module, another name for the FEX.
First let's take a look at the physical components within the UCS system. Below is a logical and physical diagram of the components that make up UCS.
Now let's take a look at northbound frame flows. The following diagram lists the decision points that take place as the frame travels from the OS up through and out of UCS to the LAN.
We will start with the northbound traffic leaving the Operating System.
- OS creates the frame – This happens the same way in UCS as it does on any other hardware.
- The OS then has to decide which PCIe Ethernet interface to forward the frame out of. If there are multiple PCIe Ethernet interfaces, the OS NIC teaming software decides which interface to use based on a hashing algorithm or a round-robin mechanism (see the sketch after this list). In the case of VMware and the VIC (Palo) this could be anywhere from 1 to 58 choices; realistically you will probably have fewer than 10.
- If there is just 1 PCIe interface and Fabric Failover is enabled, then it is the UCS Menlo ASIC that makes the frame forwarding decision.
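To make the teaming decision concrete, here is a minimal Python sketch of the two common policies, hash-based and round robin. The function names, the CRC32 hash, and the vmnic names are my own illustration; actual teaming drivers use their own hash inputs and algorithms.

```python
# Illustrative sketch of a NIC teaming transmit decision (not vendor code).
from itertools import count
from zlib import crc32

def pick_interface_hash(dst_mac: str, interfaces: list) -> str:
    """Pick a PCIe Ethernet interface by hashing a header field
    (here, the destination MAC) modulo the interface count."""
    return interfaces[crc32(dst_mac.encode()) % len(interfaces)]

_rr = count()

def pick_interface_round_robin(interfaces: list) -> str:
    """Rotate through the interfaces one frame at a time."""
    return interfaces[next(_rr) % len(interfaces)]

vnics = ["vmnic0", "vmnic1"]  # e.g. two vNICs presented by the VIC
print(pick_interface_hash("00:25:b5:00:00:0a", vnics))
print(pick_interface_round_robin(vnics), pick_interface_round_robin(vnics))
```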
After the frame has entered the physical mezzanine CNA, another decision must be made as to which of the 2 physical CNA ports is used to send the frame to the FEX. This decision depends on how the vNIC was configured in UCSM. When configuring a vNIC as part of a Service Profile in UCSM, you must choose which fabric to place the vNIC on (A or B) and whether or not Fabric Failover is enabled. The screen shot below shows these configuration options.
With this configuration, Fabric A will be used unless for some reason it is down, in which case Fabric B will be used because Fabric Failover is enabled. For the purpose of this post we will assume the frame is exiting CNA port 1, which is physically pinned to FEX 1 (the Fabric Extender in the back of the chassis) on the mid-plane of the chassis. FEX 1 is the left-hand FEX as you are looking at the back of the chassis; see the diagram below.
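The fabric choice itself boils down to a simple rule. Here is a hedged Python sketch of that logic as I understand it; the function name and error handling are mine, not Cisco's:

```python
def choose_fabric(preferred: str, fabric_a_up: bool, fabric_b_up: bool,
                  failover_enabled: bool) -> str:
    """Pick the fabric (and thus the CNA port) a vNIC's traffic exits on.

    Mirrors the UCSM vNIC options above: the preferred fabric is used while
    it is up; with Fabric Failover enabled, traffic moves to the other
    fabric transparently to the OS.
    """
    up = {"A": fabric_a_up, "B": fabric_b_up}
    other = "B" if preferred == "A" else "A"
    if up[preferred]:
        return preferred
    if failover_enabled and up[other]:
        return other
    raise RuntimeError("no usable fabric path for this vNIC")

# Our example: vNIC prefers Fabric A and both fabrics are up,
# so the frame exits CNA port 1 toward FEX 1.
assert choose_fabric("A", True, True, True) == "A"
```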
Each FEX has 8 10GE KR back-end mid-plane ports, or traces, and 4 front-end 10GE SFP+ ports. The picture below shows the 8 back-end ports and the 4 front-end ports on a FEX. The back-end ports are not physically visible the way the picture shows them; it is just a good picture that logically represents the ports.
The back-end FEX port that the frame goes across depends on which blade this particular OS is running on; in our example we will say blade 3. There is a fixed pinning from each blade's CNA port to a FEX back-end port. For blade 3, CNA 1, the back-end FEX port will be port 3, or Eth1/1/3.
Eth X/Y/Z, where:
- X = chassis number
- Y = mezzanine card number or CNA number
- Z = FEX (IOM) port number
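Putting the blade-to-port pinning and the naming convention together, here is a small Python helper of my own (not a Cisco tool) that derives the back-end FEX interface for a blade:

```python
def backend_fex_port(chassis: int, cna: int, blade_slot: int) -> str:
    """Derive the back-end FEX interface name for a blade's CNA port.

    Assumes the pinning described above: each blade slot's CNA port is
    wired to the back-end KR trace with the same number as the slot.
    """
    return f"Eth{chassis}/{cna}/{blade_slot}"

# Blade 3, CNA 1, chassis 1 -> Eth1/1/3, matching the example above.
print(backend_fex_port(chassis=1, cna=1, blade_slot=3))
```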
Below is a screen shot of the output of the command “show platform software redwood sts”. This command must be executed from the fex-1# context. To get there from the UCSM CLI prompt, first type “connect local-mgmt” and then, from the ucs-A(local-mgmt)# prompt, “connect iom 1”. The 1 in this command is the chassis number.
Once the frame is in the FEX it has to be forwarded to the Fabric Interconnect across one of the 4 front-end 10GE SFP+ ports. The port the frame goes across depends on how many FEX-to-Fabric-Interconnect cables are connected and which blade it originally came from. There is a blade-to-IOM-uplink pinning that happens based on the number of uplinked SFP+ ports. This pinning is not user controllable.
Here is a table and diagram showing how the pinning works in the three different uplink options.
| 1 IOM-FI Cable | 2 IOM-FI Cables | 4 IOM-FI Cables |
| --- | --- | --- |
| All blades' CNA 1 traffic goes out this one cable; 8:1 oversubscription | Slots 1, 3, 5, 7 are pinned to one uplink and slots 2, 4, 6, 8 to the other; 4:1 oversubscription | Slots 1, 5 out port 1; slots 2, 6 out port 2; slots 3, 7 out port 3; slots 4, 8 out port 4; 2:1 oversubscription |
For our example there are 4 IOM-FI uplinks.
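The static pinning in the table reduces to a modulo over the number of uplinks. Here is a minimal Python sketch (the function name and the modulo formulation are mine):

```python
def pinned_uplink(blade_slot: int, num_uplinks: int) -> int:
    """Return the IOM front-end SFP+ port (1-based) a blade slot is pinned to.

    Implements the static pinning table above. The 2104XP pins over
    1, 2, or 4 uplinks; other cable counts are not valid pinning choices.
    """
    if num_uplinks not in (1, 2, 4):
        raise ValueError("static pinning uses 1, 2, or 4 IOM-FI uplinks")
    return (blade_slot - 1) % num_uplinks + 1

# Our example: blade 3 with 4 IOM-FI uplinks is pinned to SFP+ port 3.
print(pinned_uplink(3, 4))
```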
So, to summarize the path the frame has traveled up to this point:
- OS generated the frame.
- OS chose the PCIe Ethernet interface, normally via NIC teaming software. In our case there was only one PCIe Ethernet vNIC assigned to the Service Profile on this blade.
- The frame was forwarded to CNA port 1 because we set Fabric A as the preferred fabric. The Fabric Failover ASIC chose CNA port 1 because Fabric A is up.
- CNA 1 is pinned to FEX 1 back-end port 3 (Eth1/1/3) because it is blade 3.
- The IOM SFP+ uplink port 3 was used to forward the frame to Fabric Interconnect A because it is blade 3 and there are 4 SFP+ uplinks connected from the IOM to the FI.
The next step is for Fabric Interconnect A to forward the frame to a northbound L2 LAN switch. Before this can happen, the FI has to determine which northbound Ethernet port to send it out of. The decision process is as follows:
- Are there any LAN Pin Groups? If manual pinning is configured for this blade's vNIC 1 to go out a specific uplink or Port Channel, then that port or Port Channel is used to forward the frame. If it is a Port Channel, the specific member interface used is chosen by the Port Channel hashing algorithm.
- Are Port Channels in use? If so, the specific member interface used is again chosen by the Port Channel hashing algorithm.

If there are no LAN Pin Groups or Port Channels, then automatic pinning is used in a round-robin fashion. If there are 8 blades in the chassis and 4 northbound Ethernet uplinks, then there will be 2 blades pinned to each uplink. This works in much the same way as the default VMware vSwitch teaming, if you are familiar with that.
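As a rough illustration of that automatic pinning (my own sketch, not the FI's actual implementation), a round-robin assignment spreads the vNICs evenly across the available uplinks:

```python
def auto_pin(vnics: list, uplinks: list) -> dict:
    """Round-robin pin each vNIC (veth) to an FI uplink, as with dynamic
    pinning when no LAN Pin Groups or Port Channels are configured."""
    return {vnic: uplinks[i % len(uplinks)] for i, vnic in enumerate(vnics)}

# 8 blades' vNICs over 4 uplinks -> 2 vNICs pinned per uplink.
pins = auto_pin([f"veth{n}" for n in range(1, 9)],
                ["Eth1/1", "Eth1/2", "Eth1/3", "Eth1/4"])
for vnic, uplink in pins.items():
    print(vnic, "->", uplink)
```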
So I think that about covers the northbound Ethernet frame flows. As I stated earlier, I will post the corresponding southbound and east-west flows in follow-on posts.