See my post on configuration and migration to multihop FCoE for details on my lab setup – http://jeremywaldrop.wordpress.com/2013/04/11/cisco-ucs-fcoe-multihop-configuration-and-migration/
When I first configured UCS multihop FCoE I experienced terrible SAN performance. It was so bad that it took 20 minutes to boot a single virtual machine.
I didn’t have much time to troubleshoot as my co-workers needed the lab to be functional to use their test VMs. I posted the question on the Cisco UCS support community – https://supportforums.cisco.com/message/3898514
There were a few responses from folks that seemed to have the same issue so I chalked it up to a bug or something with the generation 1 hardware our lab is on.
After about a month I came to realize that either this wasn’t a bug or nobody else was implementing multihop FCoE. I asked around some more and came across some folks at Cisco who were successfully using multihop FCoE.
I did some more research and thinking and came to suspect that this could be a QoS issue. I considered this because we typically modify the default UCS QoS system classes and create QoS policies for every traffic type. We mostly do this to place some guard rails around vMotion traffic.
Our typical QoS configuration had us using the Platinum class for VM traffic and this turned out to be the root of the problem.
By default both the Platinum and FC QoS priorities have pause no-drop enabled. This configuration worked fine with native FC, but with multihop FCoE it became a problem because pause no-drop was then in use by two different QoS groups on the same interfaces.
As soon as I disabled no-drop on the Platinum QoS Priority in UCSM my SAN performance issues went away.
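To make the condition concrete, here is a minimal Python sketch of the check I would run against a QoS system class layout before enabling multihop FCoE. It is only a model of the settings described in this post: the `SystemClass` type, its field names, and the sample values are hypothetical and are not a UCSM API or an export of my actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemClass:
    """Hypothetical model of one QoS system class (not a UCSM object)."""
    name: str
    enabled: bool
    packet_drop: bool  # True = class may drop packets; False = pause no-drop


def no_drop_classes(classes):
    """Return the names of enabled classes that use pause no-drop."""
    return [c.name for c in classes if c.enabled and not c.packet_drop]


def has_no_drop_conflict(classes):
    """True when more than one enabled class is configured as no-drop."""
    return len(no_drop_classes(classes)) > 1


# Assumed values mirroring the situation in this post: Platinum and FC
# are both no-drop before the change.
before = [
    SystemClass("platinum", enabled=True, packet_drop=False),
    SystemClass("best-effort", enabled=True, packet_drop=True),
    SystemClass("fc", enabled=True, packet_drop=False),
]

# After the change, Platinum is allowed to drop and only FC is no-drop.
after = [
    SystemClass("platinum", enabled=True, packet_drop=True),
    SystemClass("best-effort", enabled=True, packet_drop=True),
    SystemClass("fc", enabled=True, packet_drop=False),
]

if __name__ == "__main__":
    print("before:", no_drop_classes(before), has_no_drop_conflict(before))
    print("after: ", no_drop_classes(after), has_no_drop_conflict(after))
```

The check only flags the pattern that bit me, which is more than one enabled no-drop class. It does not say anything about whether a second no-drop class is valid in other designs.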
Here is a screen shot of the UCSM QoS System configuration.