![A machine with bonded interfaces does not receive multicast packets on all slave interfaces](https://rvso.com/image/668060/%E5%85%B7%E6%9C%89%E7%B6%81%E5%AE%9A%E4%BB%8B%E9%9D%A2%E7%9A%84%E6%A9%9F%E5%99%A8%E4%B8%8D%E6%9C%83%E5%9C%A8%E6%89%80%E6%9C%89%E5%BE%9E%E5%B1%AC%E4%BB%8B%E9%9D%A2%E4%B8%8A%E6%8E%A5%E6%94%B6%E5%A4%9A%E6%92%AD%E5%B0%81%E5%8C%85.png)
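After upgrading our machines from RHEL 6.6 to RHEL 6.7, we noticed a problem: 4 of our 30 machines receive multicast traffic on only one of their two slave interfaces. It is unclear whether the upgrade itself is related or whether the reboot it involved triggered the behaviour; reboots are rare.
We expect to receive a large volume of multicast packets sent to group 239.0.10.200 on 4 different ports. If we check the ethtool statistics on one of the problematic machines, we see the following output:
Healthy interface: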
# ethtool -S eth0 |grep mcast
[0]: rx_mcast_packets: 294
[0]: tx_mcast_packets: 0
[1]: rx_mcast_packets: 68
[1]: tx_mcast_packets: 0
[2]: rx_mcast_packets: 2612869
[2]: tx_mcast_packets: 305
[3]: rx_mcast_packets: 0
[3]: tx_mcast_packets: 0
[4]: rx_mcast_packets: 2585571
[4]: tx_mcast_packets: 0
[5]: rx_mcast_packets: 2571341
[5]: tx_mcast_packets: 0
[6]: rx_mcast_packets: 0
[6]: tx_mcast_packets: 8
[7]: rx_mcast_packets: 9
[7]: tx_mcast_packets: 0
rx_mcast_packets: 7770152
tx_mcast_packets: 313
Broken interface:
# ethtool -S eth1 |grep mcast
[0]: rx_mcast_packets: 451
[0]: tx_mcast_packets: 0
[1]: rx_mcast_packets: 0
[1]: tx_mcast_packets: 0
[2]: rx_mcast_packets: 5
[2]: tx_mcast_packets: 304
[3]: rx_mcast_packets: 0
[3]: tx_mcast_packets: 0
[4]: rx_mcast_packets: 5
[4]: tx_mcast_packets: 145
[5]: rx_mcast_packets: 0
[5]: tx_mcast_packets: 0
[6]: rx_mcast_packets: 5
[6]: tx_mcast_packets: 10
[7]: rx_mcast_packets: 0
[7]: tx_mcast_packets: 0
rx_mcast_packets: 466
tx_mcast_packets: 459
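The two outputs above differ mainly in the interface-wide rx_mcast_packets total (7770152 on eth0 versus 466 on eth1). A minimal sketch for tracking that total over time, so you can tell whether a slave is still receiving multicast at all; the function names `rx_mcast_total` and `counter_delta` are made up for this illustration, and the counter name is the one the bnx2x driver prints above:

```shell
# Print the interface-wide rx_mcast_packets total from `ethtool -S` output on stdin.
# Per-queue lines start with "[n]:", so only the unprefixed total line matches.
rx_mcast_total() {
    awk '$1 == "rx_mcast_packets:" { print $2 }'
}

# Print how much a counter grew between two samples (first, second).
counter_delta() {
    echo $(( $2 - $1 ))
}

# Hypothetical usage on a live host (needs root and flowing traffic):
#   before=$(ethtool -S eth1 | rx_mcast_total); sleep 10
#   after=$(ethtool -S eth1 | rx_mcast_total)
#   counter_delta "$before" "$after"
```

A delta close to zero on one slave while the other keeps climbing would match the symptom described here.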
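Ten other machines send multicast. If we check (using tcpdump) which hosts a broken machine receives multicast from, it receives from only a subset (3-6) of the expected hosts.
Configuration
Linux version: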
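The per-sender check can be sketched roughly as follows, assuming tcpdump's default one-line IPv4 text layout (`<time> IP <src>.<port> > <dst>.<port>: ...`). The helper name `count_sources` is hypothetical, not part of any tool:

```shell
# Read tcpdump text output on stdin and print "<packets> <source-ip>" per sender.
# Assumes the default IPv4 layout: "<time> IP <src>.<port> > <dst>.<port>: ..."
count_sources() {
    awk '$2 == "IP" { split($3, a, "."); print a[1] "." a[2] "." a[3] "." a[4] }' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Hypothetical usage, one capture per slave (needs root):
#   tcpdump -n -i eth0 -c 1000 dst host 239.0.10.200 2>/dev/null | count_sources
#   tcpdump -n -i eth1 -c 1000 dst host 239.0.10.200 2>/dev/null | count_sources
```

Comparing the two lists shows which senders are missing on each slave.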
# uname -a
Linux ab31 2.6.32-573.3.1.el6.x86_64 #1 SMP Mon Aug 10 09:44:54 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
ifconfig:
# ifconfig -a
bond0 Link encap:Ethernet HWaddr 4C:76:25:97:B1:75
inet addr:10.91.20.231 Bcast:10.91.255.255 Mask:255.255.0.0
inet6 addr: fe80::4e76:25ff:fe97:b175/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:18005156 errors:0 dropped:0 overruns:0 frame:0
TX packets:11407592 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:10221086569 (9.5 GiB) TX bytes:2574472468 (2.3 GiB)
eth0 Link encap:Ethernet HWaddr 4C:76:25:97:B1:75
inet6 addr: fe80::4e76:25ff:fe97:b175/64 Scope:Link
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:13200915 errors:0 dropped:0 overruns:0 frame:0
TX packets:3514446 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9386669124 (8.7 GiB) TX bytes:339950822 (324.2 MiB)
Interrupt:34 Memory:d9000000-d97fffff
eth1 Link encap:Ethernet HWaddr 4C:76:25:97:B1:75
inet6 addr: fe80::4e76:25ff:fe97:b175/64 Scope:Link
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:4804241 errors:0 dropped:0 overruns:0 frame:0
TX packets:7893146 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:834417445 (795.7 MiB) TX bytes:2234521646 (2.0 GiB)
Interrupt:36 Memory:da000000-da7fffff
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:139908 errors:0 dropped:0 overruns:0 frame:0
TX packets:139908 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:210503939 (200.7 MiB) TX bytes:210503939 (200.7 MiB)
Network configuration:
# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=10.91.20.231
NETMASK=255.255.0.0
GATEWAY=10.91.1.25
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_OPTS="miimon=100 mode=802.3ad"
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
HWADDR="4C:76:25:97:B1:75"
BOOTPROTO=none
ONBOOT="yes"
USERCTL=no
MASTER=bond0
SLAVE=yes
# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE="eth1"
HWADDR="4C:76:25:97:B1:78"
BOOTPROTO=none
ONBOOT="yes"
USERCTL=no
MASTER=bond0
SLAVE=yes
Driver information (the same for eth1):
# ethtool -i eth0
driver: bnx2x
version: 1.710.51-0
firmware-version: FFV7.10.17 bc 7.10.11
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Adapters:
# lspci|grep Ether
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
/proc/net/bonding/bond0:
$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 33
Partner Key: 5
Partner Mac Address: 00:01:09:06:09:07
Slave Interface: eth0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 4c:76:25:97:b1:75
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: eth1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 4c:76:25:97:b1:78
Aggregator ID: 1
Slave queue ID: 0
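Additional information
Restarting the broken interface (ifconfig down, then ifconfig up) fixes the problem.
Occasionally, during boot, we see the following message in the system log (we do not use IPv6). However, the problem also occurs when this message is not logged: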
Oct 2 11:27:51 ab30 kernel: bond0: IPv6 duplicate address fe80::4e76:25ff:fe87:9d75 detected!
Syslog output during configuration:
Oct 5 07:44:31 ab31 kernel: bonding: bond0 is being created...
Oct 5 07:44:31 ab31 kernel: bonding: bond0 already exists
Oct 5 07:44:31 ab31 kernel: bond0: Setting MII monitoring interval to 100
Oct 5 07:44:31 ab31 kernel: bond0: Setting MII monitoring interval to 100
Oct 5 07:44:31 ab31 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Oct 5 07:44:31 ab31 kernel: bond0: Setting MII monitoring interval to 100
Oct 5 07:44:31 ab31 kernel: bond0: Adding slave eth0
Oct 5 07:44:31 ab31 kernel: bnx2x 0000:01:00.0: firmware: requesting bnx2x/bnx2x-e2-7.10.51.0.fw
Oct 5 07:44:31 ab31 kernel: bnx2x 0000:01:00.0: eth0: using MSI-X IRQs: sp 120 fp[0] 122 ... fp[7] 129
Oct 5 07:44:31 ab31 kernel: bnx2x 0000:01:00.0: eth0: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
Oct 5 07:44:31 ab31 kernel: bond0: Enslaving eth0 as a backup interface with an up link
Oct 5 07:44:31 ab31 kernel: bond0: Adding slave eth1
Oct 5 07:44:31 ab31 kernel: bnx2x 0000:01:00.1: firmware: requesting bnx2x/bnx2x-e2-7.10.51.0.fw
Oct 5 07:44:31 ab31 kernel: bnx2x 0000:01:00.1: eth1: using MSI-X IRQs: sp 130 fp[0] 132 ... fp[7] 139
Oct 5 07:44:31 ab31 kernel: bnx2x 0000:01:00.1: eth1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
Oct 5 07:44:31 ab31 kernel: bond0: Enslaving eth1 as a backup interface with an up link
Oct 5 07:44:31 ab31 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Oct 5 07:44:31 ab31 kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
The bond0 interface has joined the multicast group, as shown by ip maddr:
...
4: bond0
inet 239.0.10.200 users 16
...
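Beyond the IP-level membership on bond0, it may be worth checking that each slave also lists the matching link-layer multicast address, since the NIC filters on the MAC. An IPv4 group maps to 01:00:5e followed by the low 23 bits of the group address. A small sketch of that mapping follows; `mcast_mac` is a name invented here, and the assumption that the bonding driver pushes this address down to both slaves in 802.3ad mode should be verified on the affected hosts:

```shell
# Map an IPv4 multicast group to its Ethernet multicast MAC:
# 01:00:5e followed by the low 23 bits of the group address.
# The body runs in a subshell so the IFS change does not leak to the caller.
mcast_mac() (
    IFS=.
    set -- $1                      # split the dotted quad into $1..$4
    printf '01:00:5e:%02x:%02x:%02x\n' $(( $2 & 127 )) "$3" "$4"
)

# Hypothetical usage: look for the group's MAC on each slave.
#   ip maddr show dev eth0 | grep -i "$(mcast_mac 239.0.10.200)"
#   ip maddr show dev eth1 | grep -i "$(mcast_mac 239.0.10.200)"
```

For 239.0.10.200 this yields 01:00:5e:00:0a:c8. If that address were missing on the broken slave only, it would point at the host side rather than the switch.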
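Everything works on other machines on the same network. However, it appears (not 100% confirmed) that the working machines have a different network adapter: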
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
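When checking the switch statistics, we can see data being sent to both interfaces.
What we have tried so far
As suggested in "Linux kernel not passing through multicast UDP packets", we investigated whether we had an rp_filter problem. However, changing these flags made no difference for us. Downgrading the kernel to the one used before the RedHat upgrade also made no difference.
Any hints on how to troubleshoot further would be appreciated. If more information is needed, please let me know.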
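For reference, the rp_filter flags live under /proc/sys/net/ipv4/conf/, one per interface plus `all` and `default`. The kernel uses the higher of the `all` value and the per-interface value when validating the source address, so both have to be inspected. A minimal sketch for listing them; `show_rp_filter` is a hypothetical helper, and its optional directory argument exists only so it can be pointed at a copy of the tree:

```shell
# Print "<interface> <rp_filter value>" for every entry under a conf directory.
# The directory argument defaults to the real sysctl tree.
show_rp_filter() {
    conf_dir=${1:-/proc/sys/net/ipv4/conf}
    for f in "$conf_dir"/*/rp_filter; do
        [ -r "$f" ] || continue            # skip if the glob matched nothing
        iface=$(basename "$(dirname "$f")")
        echo "$iface $(cat "$f")"
    done
}

# Hypothetical usage:
#   show_rp_filter
```

Values are 0 (no source validation), 1 (strict) and 2 (loose).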
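Answer 1
We ran into this problem on the Dell blade servers we use. After working with Dell support, it appears that we were using IGMPv3 EXCLUDE filtering when joining the multicast group. Apparently the switch in the blade chassis does not support EXCLUDE mode. The recommendation is to switch to IGMPv3 INCLUDE filter mode.
However, we have since stopped using multicast on our platform, so we will probably never get around to trying these changes. I therefore cannot say for certain that this was the root cause.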
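EXCLUDE and INCLUDE filter modes exist only in IGMPv3; an ordinary join with no source list is reported as EXCLUDE with an empty source list. A first sanity check for the theory above is therefore whether bond0 is actually speaking IGMPv3. The sketch below reads the version column that Linux prints in /proc/net/igmp. The helper name `igmp_version` is made up, and the layout it assumes (device name in the second field, version in the last) should be confirmed against the running kernel:

```shell
# Read /proc/net/igmp-style text on stdin and print the IGMP version
# ("V1", "V2" or "V3") that the kernel reports for the given device.
igmp_version() {
    awk -v dev="$1" '$2 == dev { print $NF }'
}

# Hypothetical usage:
#   igmp_version bond0 < /proc/net/igmp
```

If this prints V2, the host is not sending IGMPv3 filter-mode records at all, and the EXCLUDE explanation would not apply as described.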