Enable L3 PFC + DCQCN for RoCE on Edgecore SONiC

Treat this article as a reference only: Edgecore SONiC is a customized variant with many proprietary commands, and it differs considerably from community SONiC. Some commands and configurations are also specific to certain switch ASICs, such as the Intel Tofino 2 I use right now. So do not throw the official guidebook away. Read it carefully, and it will save your life.

Concepts

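  • DSCP / 802.1p (dot1p) Tag: A priority value embedded in the packet header (DSCP in the IP header, 802.1p in the VLAN tag).
  • Buffer: Receive buffer (SRAM) on the switch, partitioned into a lossy pool and a lossless pool.
  • Traffic Class (TC): An intermediate value that the DSCP tag is mapped to, and from which the PG and the queue are selected.
  • Priority Group (PG): Receive (ingress) queue on the switch.
  • Queue: Send (egress) queue on the switch.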
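
DSCP occupies the upper six bits of the IP ToS (IPv4) or Traffic Class (IPv6) byte, and ECN occupies the lower two, so anything that expects a full ToS value needs the DSCP shifted left by two bits. A minimal sketch of that arithmetic follows; the function names dscp_to_tos and tos_to_dscp are hypothetical and exist only for illustration.

```shell
# DSCP sits in the upper 6 bits of the ToS byte; ECN uses the lower 2 bits.
# Both helpers are hypothetical and run on any host with a POSIX shell.
dscp_to_tos() {
    # $1 = DSCP (0-63), $2 = optional ECN bits (0-3, default 0)
    echo $(( ($1 << 2) | ${2:-0} ))
}

tos_to_dscp() {
    # $1 = ToS byte (0-255)
    echo $(( $1 >> 2 ))
}

dscp_to_tos 26      # ToS byte for DSCP 26 with no ECN bits set
dscp_to_tos 26 3    # same DSCP with both ECN bits set (Congestion Experienced)
tos_to_dscp 104     # back from a ToS byte to its DSCP value
```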

                 Map             =
                ┌────► Queue ID ───► PFC Priority
          Map   │      (Egress)      of RX PAUSE
DSCP Tag ─────► TC
                │               =
                └────► PG ID ──────► PFC Priority
                 Map   (Ingress)     of TX PAUSE
                         │ Bind
                         ├──────► Lossless
                         │        Buffer
                         │ Bind
                         └──────► Lossy
                                  Buffer

Fig. Numerical Relationship between terminologies
for Intel Tofino 2 on EdgeCore SONiC OS

              ┌───────────────────────────────────────────────────────────────────────┐
              │                                Switch                                 │
┌─────────┐   │   ┌─────────┐   ┌──────────┐   ┌────────┐   ┌─────────┐   ┌───────┐   │
│ Packet  │   │   │   PG0   │   │  Lossy   │   │        │   │ Queue0  │   │       │   │
│ DSCP=0  ├───┼──►│ of Eth0 ├──►│  Buffer  ├──►│        ├──►│ of Eth8 ├──►│       │   │   ┌─────────┐ ┌─────────┐
└─────────┘   │   └─────────┘   └──────────┘   │  XBAR  │   └─────────┘   │ Sche  │   │   │ Packet  │ │ Packet  │
              │       ...                      │ Switch │       ...       │ duler ├───┼──►│ DSCP=0  │ │ DSCP=26 │
┌─────────┐   │   ┌─────────┐   ┌──────────┐   │        │   ┌─────────┐   │       │   │   └─────────┘ └─────────┘
│ Packet  ├───┼──►│   PG3   ├──►│ Lossless ├──►│        ├──►│ Queue3  ├──►│       │   │
│ DSCP=26 │   │   │ of Eth0 │   │  Buffer  │   │        │   │ of Eth8 │   │       │   │
└─────────┘   │   └─────────┘   └──────────┘   └────────┘   └─────────┘   └───────┘   │
              │                                                                       │
              └───────────────────────────────────────────────────────────────────────┘

Fig. Ingress/Egress procedure of Intel Tofino 2 on EdgeCore SONiC OS

(Optional) Factory Reset Configuration

If you would like to discard your modifications to the system, try these commands.

Normal

sudo rm /etc/sonic/config_db.json
sudo config-setup factory
sudo config reload -y

Force

If the system is already stuck in a strange state, these commands may be able to force a reset.

sudo rm /etc/sonic/config_db.json
sudo config-setup factory
sudo config reload -y -f
sudo service swss restart

(Optional) Set up Ports

If you have trouble enabling L2 forwarding, the port speed or FEC configuration may be mismatched between the switch and the NICs.

Check Status of NICs

sudo mlxlink -d mlx5_0 -m -c -e

If State is not Active, there might be something wrong with the port configuration or physical connection.

Enable Auto-Negotiation on NICs

sudo ethtool -s enp216s0f0np0 autoneg on

Disable Auto-Negotiation on Switch

Disable Auto-Negotiation only when it does not work as you expect. If it works well, just leave it alone.

sudo config interface autoneg Ethernet0 disabled
sudo config interface autoneg Ethernet8 disabled

Force 100Gb Port Speed

Change the port speed as you wish. The supported port speeds vary depending on the switch and the NICs.

sudo config interface breakout Ethernet0 '1x100G[40G](4)'
sudo config interface breakout Ethernet8 '1x100G[40G](4)'

(Optional) Configure FEC (Forward Error Correction) to RS mode

Setting FEC to none is also fine. Just make sure switches and NICs use the same mode.

sudo config interface fec Ethernet0 rs
sudo config interface fec Ethernet8 rs

Bring up Ports

sudo config interface startup Ethernet0-8

(Optional) Enable L2 Forwarding

Unlike on most other switches, L2 switching is disabled by default on SONiC OS.

sudo config vlan add 1000
sudo config vlan member add -u 1000 Ethernet0
sudo config vlan member add -u 1000 Ethernet8
sudo config interface ip add Vlan1000 10.200.0.1/24

Enable L3 PFC

Since Intel Tofino 2 supports only 5 PGs, we use only 5 TCs, PGs, and queues, and map each TC 1:1 to a PG and a queue.

Load Buffer Profile

sudo config qos reload
sudo config save -y
sudo reboot

Set and Apply DSCP-to-TC Table for Ports

sudo config qos dscp-tc add dscp-tc-prof --dscp 0-15 --tc 1
sudo config qos dscp-tc update dscp-tc-prof --dscp 16-23 --tc 2
sudo config qos dscp-tc update dscp-tc-prof --dscp 24-31 --tc 3
sudo config qos dscp-tc update dscp-tc-prof --dscp 32-39 --tc 4
sudo config qos dscp-tc update dscp-tc-prof --dscp 40-63 --tc 5

sudo config interface qos dscp-tc bind Ethernet0 dscp-tc-prof
sudo config interface qos dscp-tc bind Ethernet8 dscp-tc-prof
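
As a sanity check for the table above, the following sketch mirrors the same DSCP ranges in plain shell, so you can confirm which TC a given DSCP value lands in before sending traffic. The function name dscp_to_tc is hypothetical; it runs on any host and does not query the switch.

```shell
# Hypothetical helper: mirrors the dscp-tc-prof ranges configured above.
# It runs on any host and does not query the switch.
dscp_to_tc() {
    dscp=$1
    if [ "$dscp" -lt 0 ] || [ "$dscp" -gt 63 ]; then
        echo "invalid DSCP: $dscp" >&2
        return 1
    elif [ "$dscp" -le 15 ]; then echo 1
    elif [ "$dscp" -le 23 ]; then echo 2
    elif [ "$dscp" -le 31 ]; then echo 3
    elif [ "$dscp" -le 39 ]; then echo 4
    else echo 5
    fi
}

dscp_to_tc 26   # traffic tagged DSCP 26 falls in the 24-31 range, i.e. TC 3
```

TC 3 is the class that the later steps bind to the lossless buffer profile and enable PFC on, so traffic that should be lossless must carry a DSCP value between 24 and 31.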

Set and Apply TC-to-Queue Table for Ports

sudo config qos tc-queue add tc-queue-prof --tc 1 --queue 1
sudo config qos tc-queue update tc-queue-prof --tc 2 --queue 2
sudo config qos tc-queue update tc-queue-prof --tc 3 --queue 3
sudo config qos tc-queue update tc-queue-prof --tc 4 --queue 4
sudo config qos tc-queue update tc-queue-prof --tc 5 --queue 5

Set and Apply TC-to-PG (Priority Group) Table

sudo config qos tc-pg add tc-pg-prof --tc 1 --pg 1
sudo config qos tc-pg update tc-pg-prof --tc 2 --pg 2
sudo config qos tc-pg update tc-pg-prof --tc 3 --pg 3
sudo config qos tc-pg update tc-pg-prof --tc 4 --pg 4
sudo config qos tc-pg update tc-pg-prof --tc 5 --pg 5

Specify Queue Scheduler (ETS)

sudo config scheduler add sched-dwrr-100 --sched_type DWRR --weight 100
sudo config scheduler add sched-strict --sched_type STRICT

sudo config interface scheduler bind queue Ethernet0 3 sched-dwrr-100
sudo config interface scheduler bind queue Ethernet8 3 sched-dwrr-100

Bind Lossless Buffer Profile

sudo config interface buffer bind priority-group Ethernet0 3 ingress_lossless_profile
sudo config interface buffer bind priority-group Ethernet8 3 ingress_lossless_profile

sudo config interface buffer bind queue Ethernet0 3 egress_lossless_profile
sudo config interface buffer bind queue Ethernet8 3 egress_lossless_profile

Enable PFC for Ports

sudo config interface pfc priority Ethernet0 3 on
sudo config interface pfc priority Ethernet8 3 on
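
With more than two ports, repeating each command by hand gets tedious. The sketch below prints the per-port lossless commands from this article for a list of ports, assuming priority 3 is the lossless one on every port. PORTS, LOSSLESS_PRIO, and pfc_commands are hypothetical names; the function only prints the commands so that you can review them before running anything.

```shell
PORTS="Ethernet0 Ethernet8"   # hypothetical port list; adjust to your cabling
LOSSLESS_PRIO=3               # the PG/queue/priority treated as lossless in this article

# Print (do not execute) the per-port lossless commands used in this article.
pfc_commands() {
    for port in $PORTS; do
        echo "sudo config interface buffer bind priority-group $port $LOSSLESS_PRIO ingress_lossless_profile"
        echo "sudo config interface buffer bind queue $port $LOSSLESS_PRIO egress_lossless_profile"
        echo "sudo config interface pfc priority $port $LOSSLESS_PRIO on"
    done
}

pfc_commands            # review the generated commands first
# pfc_commands | sh     # then, on the switch, execute them
```

Keeping generation and execution separate makes it easy to diff the output against what is already configured.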

Enable ECN

Weighted random early detection (WRED) was originally designed to drop packets at random, signaling the sender's congestion control algorithm to slow its sending rate. When WRED works in ECN mode, it randomly sets ECN marks on forwarded packets instead of dropping them, and the marking probability depends on the buffer usage.

sudo config wred add wred-prof --mode ecn --gmin 1048576 --gmax 2097152 --gdrop 5

sudo config interface wred bind queue Ethernet0 3 wred-prof
sudo config interface wred bind queue Ethernet8 3 wred-prof
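
To make the relationship between buffer usage and marking probability concrete, here is a minimal sketch of the textbook WRED curve using the values from wred-prof. It rests on assumptions that I have not confirmed for Edgecore SONiC: gmin and gmax are queue-depth thresholds in bytes, gdrop is the maximum marking probability in percent, the ramp between the two thresholds is linear, and every packet is marked once the depth reaches gmax. The function name ecn_mark_bp is hypothetical.

```shell
# Assumed semantics (not confirmed for Edgecore SONiC): gmin/gmax are queue-depth
# thresholds in bytes, gdrop is the maximum marking probability in percent.
GMIN=1048576
GMAX=2097152
GDROP=5

# Print the marking probability in basis points (1/100 of a percent).
ecn_mark_bp() {
    depth=$1
    if [ "$depth" -le "$GMIN" ]; then
        echo 0          # at or below gmin: never mark
    elif [ "$depth" -ge "$GMAX" ]; then
        echo 10000      # at or above gmax: mark every packet (assumption)
    else
        # linear ramp from 0 at gmin up to gdrop percent at gmax
        echo $(( GDROP * 100 * (depth - GMIN) / (GMAX - GMIN) ))
    fi
}

ecn_mark_bp 1572864   # halfway between gmin and gmax: 250 basis points, i.e. 2.5 %
```

Under these assumptions, lowering gmin makes the switch start marking earlier, and raising gdrop makes the ramp steeper.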

References