RoCE/RDMA/DCB what is it and how to configure it

Updated May 26th 2018 with HPE FlexFabric config

You have probably heard these acronyms somewhere. So what are they, and are they the same thing? In short: yes and no.

RoCE stands for RDMA over Converged Ethernet; RDMA stands for Remote Direct Memory Access.

RDMA allows network data to be handled by the network card and placed directly into memory, bypassing the host’s CPU. This leaves the CPU free for the host’s own workloads. With normal TCP networking, all network traffic goes through the CPU, and higher speeds take more CPU. On a 10 Gbit network it could take about 100% CPU on a 12-core Intel Xeon v4 CPU.

Mellanox has a good explanation for RDMA here.

DCB stands for Data Center Bridging.

It is a set of enhancements to the Ethernet protocol. Ethernet is a best-effort network that may experience packet loss when network devices are busy, which causes retransmissions. DCB allows selected traffic to have zero packet loss: it eliminates loss due to queue overflow and makes it possible to allocate bandwidth on links. DCB also allows packets sent over the network to be given different priorities.

 

In this post I will cover how to enable RDMA and DCB in Windows for SMB and on different switches. I will add more switches as I read through the different vendors’ configuration guides, as the setup varies a lot from vendor to vendor.

Switches and vendors covered in this post

Dell

N4000 series
Force 10 S4810p, S6000, S6000-ON (FTOS)

Cisco

Nexus NX-OS

Mellanox

SN2100

HPE

FlexFabric 5700/5900

Quanta

LB8

 

How to configure Windows Server 2012, 2012R2 and 2016 with RDMA and DCB

For SMB you will need to install the Windows feature Data-Center-Bridging.
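As a minimal sketch, the feature can be installed from an elevated PowerShell prompt. `Install-WindowsFeature` and `Get-WindowsFeature` are the standard Server Manager cmdlets; nothing else is assumed here.

```powershell
# Install the Data Center Bridging feature (provides the DCB/QoS cmdlets)
Install-WindowsFeature -Name Data-Center-Bridging

# Confirm the install state before moving on
Get-WindowsFeature -Name Data-Center-Bridging
```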

Reboot the server, then configure the DCB settings. SMB is conventionally put on priority 3; you can use another priority, but best practice is 3.
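A sketch of what the DCB/QoS settings for SMB on priority 3 could look like, using the built-in NetQos cmdlets. The policy and traffic class name "SMB", the adapter names NIC1/NIC2 and the 50% bandwidth share are assumptions for illustration, so adjust them to your environment.

```powershell
# Tag SMB Direct traffic (port 445) with 802.1p priority 3
New-NetQosPolicy -Name "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable Priority Flow Control for priority 3 only
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Reserve bandwidth for priority 3 with ETS (50 is an assumed share)
New-NetQosTrafficClass -Name "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

# Apply the QoS settings on the physical adapters (placeholder names)
Enable-NetAdapterQos -Name "NIC1","NIC2"

# Do not accept DCB settings pushed from the switch via DCBX
Set-NetQosDcbxSetting -Willing $false
```

The bandwidth percentage here should line up with whatever you allocate to the matching group on the switch side.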

After the QoS part is done, let’s configure a network team or a virtual switch. For S2D you use a SET switch (Switch Embedded Teaming).
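A minimal sketch of creating a SET switch, assuming Windows Server 2016 with the Hyper-V role installed. The switch name S2DSwitch and the adapter names NIC1/NIC2 are placeholders, not values from a real setup.

```powershell
# Create a virtual switch with Switch Embedded Teaming across two physical NICs.
# "S2DSwitch", "NIC1" and "NIC2" are hypothetical names.
New-VMSwitch -Name "S2DSwitch" -NetAdapterName "NIC1","NIC2" -EnableEmbeddedTeaming $true -AllowManagementOS $false
```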

Let’s create some virtual network adapters and enable RDMA on them. Once RDMA is enabled, DCB will also be enabled for SMB.
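As a sketch, this could look like the following, assuming a SET switch named S2DSwitch already exists. The switch name and the adapter names SMB_1 and SMB_2 are hypothetical.

```powershell
# Add two host (management OS) virtual adapters on the assumed switch "S2DSwitch"
Add-VMNetworkAdapter -SwitchName "S2DSwitch" -Name "SMB_1" -ManagementOS
Add-VMNetworkAdapter -SwitchName "S2DSwitch" -Name "SMB_2" -ManagementOS

# Host vNICs show up as "vEthernet (<name>)"; enable RDMA on both
Enable-NetAdapterRdma -Name "vEthernet (SMB_1)","vEthernet (SMB_2)"
```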

To check if RDMA is enabled, you can run Get-NetAdapterRdma.

Now that DCB and RDMA are configured in Windows, let’s move to the switch setup.
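For example, a hedged sketch of the check. The column selection is just one way to view the output, and the second and third commands are optional extras rather than part of the original steps.

```powershell
# List adapters and whether RDMA is enabled on each
Get-NetAdapterRdma | Format-Table Name, InterfaceDescription, Enabled

# Show which interfaces SMB itself considers RDMA capable
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable

# Show which priorities have PFC enabled
Get-NetQosFlowControl
```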

 

This is the hard part: figuring out the correct setup for your switch. Most switch vendors support DCB.

Dell N4000 series

Turn off link-level flowcontrol on all interfaces, so that PFC can handle flow control for the no-drop priority.

What this config does is put traffic class 3 into group 0 and set max and min bandwidth on the groups. The groups are 0, 1 and 2. This gives groups 0 and 1 a max bandwidth of 50% each. Then we enable the DCB config on the interfaces with mode on, and with priority 3 no-drop we enable no packet drop for traffic class 3.

Dell Force 10 S4810p

Turn off flowcontrol on all interfaces.

Dell Force 10 S6000, S6000-ON (FTOS)

Turn off flowcontrol on all interfaces.

Cisco Nexus NX-OS

By default, PFC (Priority Flow Control) is enabled on Cisco Nexus switches. To explicitly enable it, do the following.

Cisco Nexus 3132  NX-OS 6.0(2)U6(1)

By default, PFC (Priority Flow Control) is enabled on Cisco Nexus switches. To explicitly enable it, do the following.

 

Mellanox SN2100

 

HPE FlexFabric 5700/5900 series

Quanta

This is the basic how-to for enabling it; I have not had the chance to test it myself yet. It will be updated, as the manual is not straightforward.

 

21 thoughts on “RoCE/RDMA/DCB what is it and how to configure it”

  • June 9, 2018 at 7:51 pm

    Hi JT, what great information you have shared here. Thanks!

    But I have a question regarding my setup. I have 4 x 10GbE Mellanox ConnectX-3 Pro. 2 of the ports are teamed together using SET. I have enabled PFC for group 3 for my Live Migration traffic.

    The other 2 ports are not teamed and are used for SMB traffic. I have also created PFC priority 3 for this. On Windows 2016, I have also enabled the same priority 3 with 99% weight, as both ports are used solely for SMB traffic. The problem I’m facing is that when I run Test-RDMA.ps1, it keeps showing an error that the physical switch needs to be configured for PFC. I am lost and confused. Can you tell me what I did wrong here? I have disabled VLAN tagging for those ports as well.

    Also, can I have 2 different PFC on the same priority group 3 on my switches?

    Thanks in advance for sharing your knowledge.

    • June 10, 2018 at 3:52 pm

      What switches are you using? And do they support DCB and PFC?

      Jan-Tore

      • June 11, 2018 at 11:00 am

        Hi JT, it’s a Dell S4048-ON and it supports DCB and PFC. Thanks

        • June 12, 2018 at 1:36 pm

          Have you configured the Dell switches for DCB and PFC?

          Regards
          Jan-Tore

  • May 13, 2018 at 10:16 pm

    Hello,
    Does the HPE 5700 support RDMA/RoCE?

    I don’t see IEEE 802.1Qaz Enhanced Transmission Selection (ETS) available on this switch.

    However, ECN, DCB and PFC are available.

    Thanks

    • May 13, 2018 at 10:57 pm

      I am helping someone with a 5700 right now. I will update once I have been able to look into it. They are having RDMA issues, so I will let you know if it’s working or not.

      Unless you already have them, get something else. They are not too easy to configure. Consider some HPE Aruba or Dell S/Z series, or Lenovo NE series.

      Regards
      Jan-Tore

    • May 20, 2018 at 12:04 pm

      Hello Jan,

      Did you have a chance to check the HPE 5700 with RoCE?

      Thanks

      • May 26, 2018 at 5:23 pm

        The basic config should be similar. But there is very little info out there on the FlexFabric DCB/PFC config, I don’t have access to the HPE support site to check for more docs, and I have not gotten the complete config.

        But from what I could see, the specific config on the switch for DCB is this.

        priority-flow-control auto
        priority-flow-control no-drop dot1p 3
        qos trust dot1p

        But I would say that there might be some config missing. I can’t confirm, as I don’t have a full config example for the 5700. I did find one for the 5940, but I’m still a bit unsure about the HPE setup. I’ll go through this guide and see what I figure out.

        http://manualzz.com/doc/32098665/rdma-over-converged-ethernet–roce–design-guide

        • May 27, 2018 at 9:26 pm

          Thanks mate, that’s very useful.

  • March 19, 2018 at 8:34 pm

    Are you trying to turn on flow-control on these switches? The wording seems backwards from the configurations you’re showing.

    • March 20, 2018 at 12:49 pm

      Depends on which switch you are talking about. But yes, with RoCE you want PFC to be on 🙂

      Jan-Tore

      • March 23, 2018 at 6:09 pm

        To Patrick’s comment: regarding the Dell Force 10 S4810p, you said “turn off flowcontrol for all interfaces”. We have other servers plugged into other interfaces of the switches with flowcontrol enabled (flowcontrol rx on tx on). Will the configuration affect/conflict with these interfaces? DCB maps are not applied to those non-RDMA interfaces.

        -ken

  • March 14, 2018 at 4:22 am

    Hello JT.
    I wanted to let you know that we found this blog post extremely helpful. I do have a question if you have a moment. We have Cisco Nexus 9000 series switches with NX-OS 7. My network admins said that the values you provided were not allowed. You posted this: pause buffer-size 20000 pause-threshold 100 resume-threshold 1000 pfc-cos 3, but they said the minimum values they could set were these: pause buffer-size 27456 pause-threshold 12480 resume-threshold 12480 pfc-cos 3. Can you help me out here? I want to make sure I have it right. We are setting up a Storage Spaces Direct cluster.
    Thanks
    -Matthew

    • March 17, 2018 at 9:52 am

      Thanks for the feedback.

      My guide is a baseline for how to set it up. The OS might change as new versions come out. There is a guide for the Nexus 3132 in the official MS docs and it does not have these settings, as it’s a different NX-OS version, I believe. What I recommend is using my baseline, then searching for the documentation for the latest NX-OS and seeing what they put in there. I will update my post with the guidelines for the NX3132 switch.

      But if the minimum thresholds have changed, I do not see any reason not to use the new values. Always refer to the latest CLI guide for the OS you are running. If you get it to work, let me know and I’ll update the blog post.

      JT

  • March 12, 2018 at 11:37 pm

    Great post JT!

    Do you think RoCE/RDMA/DCB will work on Juniper EX4550s? If so, have you tried it and can you share the configs?

    KL

    • March 17, 2018 at 9:47 am

      To be honest, I do not know. I have no experience with Juniper. They do say that it supports DCB and PFC, but there is no mention of RoCE, only FCoE. I think you will need to dig deep to find the correct info/config. You could of course ask Juniper, but you would need to do a lot of googling 🙂

      Let me know if you figure it out.

      Oh, and remember to turn off DCBX, as it’s not supported with S2D.

      JT

      • May 28, 2018 at 10:29 am

        Hi,

        We have configured a Juniper QFX5100 to work with DCB and PFC; it only supports RoCEv1.
        To support RoCEv2 you need the 17.4 release and at least a QFX5110 model.
        I am working with JTAC on some PFC issues and I will ask them if the EX4550 is supported for RoCEv1.

        NS

        • May 28, 2018 at 12:15 pm

          Thanks for this 🙂

          Do you know how to change the settings in the OS to work on RoCEv1? If it’s Mellanox cards, it’s a registry key.

          Regards
          Jan-Tore Pedersen

          • May 28, 2018 at 12:28 pm

            Yes we are using Mellanox cards.

            # The following RoCE modes are supported:
            # •RoCE V1 MAC based (legacy) : 1
            # •RoCE V2 IP based (routable) : 2

            # Check status on Mellanox NIC
            Get-MlnxDriverCoreSetting

            # Set RoceMode
            Set-MlnxDriverCoreSetting -RoceMode 1

  • June 29, 2017 at 6:04 pm

    Just a note, I believe your Dell Force 10 S4810 config is slightly off. You are marking Priority 4 not Priority 3 with the current command of “priority-pgid 1 1 1 1 0 1 1 1”

    • June 30, 2017 at 10:29 am

      Thank you for pointing that out, you are absolutely correct 🙂 A typing mistake on my side.

      Thanks
      JT
