
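I'm trying to connect a Linux server with two 1Gbps NICs to a Netgear ProSafe GSM7248V2 switch using bonding, specifically 802.3ad mode. The results are very confusing, and I would be thankful for any hints on what to try next.
On the server side, this is my /etc/network/interfaces: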
auto bond0
iface bond0 inet static
address 192.168.1.15/24
gateway 192.168.1.254
dns-nameservers 8.8.8.8
dns-search my-domain.org
bond-slaves eno1 eno2
bond-mode 4
bond-miimon 100
bond-lacp-rate 1
bond-xmit_hash_policy layer3+4
hwaddress aa:bb:cc:dd:ee:ff
The switch is configured as follows:
(GSM7248V2) #show port-channel 3/2
Local Interface................................ 3/2
Channel Name................................... fubarlg
Link State..................................... Up
Admin Mode..................................... Enabled
Type........................................... Dynamic
Load Balance Option............................ 6
(Src/Dest IP and TCP/UDP Port fields)
Mbr    Device/       Port      Port
Ports  Timeout       Speed     Active
------ ------------- --------- -------
0/7    actor/long    Auto      True
       partner/long
0/8    actor/long    Auto      True
       partner/long
(GSM7248V2) #show lacp actor 0/7
       Sys      Admin Port     Admin
Intf   Priority Key   Priority State
------ -------- ----- -------- -----------
0/7    1        55    128      ACT|AGG|LTO
(GSM7248V2) #show lacp actor 0/8
       Sys      Admin Port     Admin
Intf   Priority Key   Priority State
------ -------- ----- -------- -----------
0/8    1        55    128      ACT|AGG|LTO
(GSM7248V2) #show lacp partner 0/7
       Sys System            Admin Prt Prt   Admin
Intf   Pri ID                Key   Pri Id    State
------ --- ----------------- ----- --- ----- -----------
0/7    0   00:00:00:00:00:00 0     0   0     ACT|AGG|LTO
(GSM7248V2) #show lacp partner 0/8
       Sys System            Admin Prt Prt   Admin
Intf   Pri ID                Key   Pri Id    State
------ --- ----------------- ----- --- ----- -----------
0/8    0   00:00:00:00:00:00 0     0   0     ACT|AGG|LTO
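I believe the xmit hash policy "layer3+4" is the most compatible with the switch's Load Balance option 6. The first surprising thing is that the switch does not see the MAC address of the LACP partner.
On the server side, this is the content of /proc/net/bonding/bond0: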
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: ac:1f:6b:dc:2e:88
Active Aggregator Info:
Aggregator ID: 15
Number of ports: 2
Actor Key: 9
Partner Key: 55
Partner Mac Address: a0:21:b7:9d:83:6a
Slave Interface: eno1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:dc:2e:88
Slave queue ID: 0
Aggregator ID: 15
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: ac:1f:6b:dc:2e:88
port key: 9
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 1
system mac address: a0:21:b7:9d:83:6a
oper key: 55
port priority: 128
port number: 8
port state: 61
Slave Interface: eno2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:dc:2e:89
Slave queue ID: 0
Aggregator ID: 15
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: ac:1f:6b:dc:2e:88
port key: 9
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 1
system mac address: a0:21:b7:9d:83:6a
oper key: 55
port priority: 128
port number: 7
port state: 61
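If I understand it correctly, this means that the Linux bonding driver correctly determined all the aggregator details (key, port numbers, system priority, port priority, etc.). Despite that, I get this in dmesg after restarting the networking service: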
[Dec14 20:40] bond0: Releasing backup interface eno1
[ +0.000004] bond0: first active interface up!
[ +0.090621] bond0: Removing an active aggregator
[ +0.000004] bond0: Releasing backup interface eno2
[ +0.118446] bond0: Enslaving eno1 as a backup interface with a down link
[ +0.027888] bond0: Enslaving eno2 as a backup interface with a down link
[ +0.008805] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[ +3.546823] igb 0000:04:00.0 eno1: igb: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ +0.160003] igb 0000:05:00.0 eno2: igb: eno2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ +0.035608] bond0: link status definitely up for interface eno1, 1000 Mbps full duplex
[ +0.000004] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[ +0.000008] bond0: first active interface up!
[ +0.000166] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ +0.103821] bond0: link status definitely up for interface eno2, 1000 Mbps full duplex
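Both interfaces are up and network connectivity seems quite normal; I just get that strange warning that there is no 802.3ad response from the link partner.
In addition, when I try to copy two large binary files (10GB each) simultaneously from two different machines connected to the same switch (each connected at 1Gbps), the overall throughput of the bond0 interface on the server is well below 1Gbps, although I would expect something close to 2Gbps (read speed etc. is not the limiting factor here: all SSD, well cached, etc.). When I copy the same files one after another from the same machines, I easily reach throughput close to 1Gbps.
Do you have any idea what could be wrong here? Regarding diagnostics, the confusing warnings appear in dmesg (no 802.3ad response from the link partner) and in the switch's sh lacp output (no MAC for the partner, although the regular port record shows the correct MAC address of the connected NIC). Regarding network performance, I can't see any aggregation across the two connections. I would be very thankful for any hints.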
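Answer 1
The switch is configured for the long LACP timeout: one LACPDU every 30 seconds.
The Linux system is configured with bond-lacp-rate 1.
I can't find what this actually does in Debian, but if it passes the lacp_rate=1 module option to bonding (reference), then that is the fast timeout: one LACPDU every 1 second.
This mismatch between the slow and fast LACP rates is a misconfiguration.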
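All the example documentation I can find suggests that Debian accepts bond-lacp-rate slow, which will hopefully correct this for you.
You could also simply remove the bond-lacp-rate line from your config file, since the default rate is slow, then unload the bonding module or reboot to apply the change.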
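The timeout mismatch can also be read from the "port state" values in the question's /proc/net/bonding/bond0 output (63 for the actor, 61 for the partner). The following is a minimal sketch, assuming the standard LACP state-byte layout in which bit 1 is the timeout flag; the flag names and the decode_port_state helper are illustrative, not part of any tool.

```python
# LACP port state bits, least significant bit first (IEEE 802.3ad layout).
# The names are illustrative labels chosen for this sketch.
LACP_STATE_BITS = [
    "activity",         # bit 0: 1 = active LACP, 0 = passive
    "short_timeout",    # bit 1: 1 = short (fast) timeout, 0 = long (slow)
    "aggregation",      # bit 2
    "synchronization",  # bit 3
    "collecting",       # bit 4
    "distributing",     # bit 5
    "defaulted",        # bit 6
    "expired",          # bit 7
]


def decode_port_state(value):
    """Return the set of flag names that are set in an LACP port state byte."""
    return {name for bit, name in enumerate(LACP_STATE_BITS) if value & (1 << bit)}


actor = decode_port_state(63)    # Linux side, value taken from the question
partner = decode_port_state(61)  # switch side, value taken from the question

print("actor  :", sorted(actor))
print("partner:", sorted(partner))
print("differs:", sorted(actor ^ partner))
```

Under that assumption, the only flag that differs between the two sides is the timeout bit: the Linux actor advertises the short timeout while the switch advertises the long one. All other flags (aggregation, synchronization, collecting, distributing) are set on both sides.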
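Don't test throughput with just two flows. The layer3+4 policy does not guarantee that any two flows each get a separate NIC, only that, given enough flows, the traffic should balance out reasonably evenly.
Test with 16 or 32 simultaneous iperf3 TCP streams. The total throughput across all streams should be close to 2Gbps.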
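To illustrate why two flows are not a meaningful test, here is a rough sketch of the layer3+4 formula as described in older versions of the kernel bonding documentation: (source port XOR destination port) XOR ((source IP XOR destination IP) AND 0xffff), modulo the slave count. Newer kernels compute the hash differently, and the switch applies its own hash to traffic it sends toward the server, so treat this only as a model. The client addresses and port numbers below are hypothetical.

```python
import ipaddress
from collections import Counter


def layer34_slave(src_ip, dst_ip, src_port, dst_port, slave_count=2):
    """Pick a slave index using the older documented layer3+4 formula."""
    sip = int(ipaddress.IPv4Address(src_ip))
    dip = int(ipaddress.IPv4Address(dst_ip))
    h = (src_port ^ dst_port) ^ ((sip ^ dip) & 0xFFFF)
    return h % slave_count


SERVER = "192.168.1.15"  # server address from the question

# Two flows to two hypothetical clients: nothing forces them apart.
flow_a = layer34_slave(SERVER, "192.168.1.20", 22, 50000)
flow_b = layer34_slave(SERVER, "192.168.1.21", 22, 50001)
print("two flows use slaves:", flow_a, flow_b)

# Many flows with varying client ports spread across both slaves.
many = Counter(
    layer34_slave(SERVER, "192.168.1.20", 22, port)
    for port in range(50000, 50032)
)
print("32 flows per slave:", dict(many))
```

In this model both example flows hash to the same slave, so their combined throughput is capped at a single 1Gbps link, while the 32 flows divide evenly between the two slaves.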