Zookeeper DNS names and leader election when migrating from Windows to Debian

I am migrating a Kafka/ZooKeeper cluster from Windows to Debian wheezy.

  • Java version: 1.7.0_80
  • Debian version: 7.9
  • Zookeeper version: 3.3.5+dfsg1-2 0
  • Kafka version: 2.10-0.8.2.1

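If I configure ZooKeeper on the Debian servers using the IP addresses of the other Debian servers, everything works. If I use DNS names instead, leader election fails on the Debian servers.

On the Debian servers I can look up the IP of any other Debian server with the "host" command, so DNS resolution works.

Everything is automated: server creation, Debian installation, ZooKeeper installation, ZooKeeper configuration. The window for manual configuration errors is therefore minimal, and the setup is easy to reproduce or change.

Using clientPortAddress=DNSNAME makes no difference; it still fails. Nothing is configured in iptables. There are no firewalls between these servers.

Servers 1-3 are Windows 2012R2 servers and servers 4-6 are Debian servers.

This configuration works: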

 server.1=testkafka400:2888:3888
 server.2=testkafka401:2888:3888
 server.3=testkafka402:2888:3888
 server.4=10.1.132.152:2888:3888
 server.5=10.1.132.153:2888:3888
 server.6=10.1.132.154:2888:3888

This configuration does not work:

 server.1=testkafka400:2888:3888
 server.2=testkafka401:2888:3888
 server.3=testkafka402:2888:3888
 server.4=testkafka403:2888:3888
 server.5=testkafka404:2888:3888
 server.6=testkafka405:2888:3888

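When I use DNS names I get the following output, in which the exceptions repeat. Note that the log below is from a cluster setup containing only the Debian servers, using DNS names, for testing purposes. If I switch to IPs, the cluster works and the election succeeds.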

[2015-11-03 13:55:52,309] INFO Reading configuration from: /etc/zookeeper/config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2015-11-03 13:55:52,322] INFO Defaulting to majority quorums (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2015-11-03 13:55:52,344] INFO autopurge.snapRetainCount set to 3 (org.apache.zookeeper.server.DatadirCleanupManager)
[2015-11-03 13:55:52,344] INFO autopurge.purgeInterval set to 24 (org.apache.zookeeper.server.DatadirCleanupManager)
[2015-11-03 13:55:52,345] INFO Purge task started. (org.apache.zookeeper.server.DatadirCleanupManager)
[2015-11-03 13:55:52,454] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)
[2015-11-03 13:55:52,472] INFO Starting quorum peer (org.apache.zookeeper.server.quorum.QuorumPeerMain)
[2015-11-03 13:55:52,581] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2015-11-03 13:55:52,601] INFO tickTime set to 3000 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2015-11-03 13:55:52,601] INFO minSessionTimeout set to -1 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2015-11-03 13:55:52,601] INFO maxSessionTimeout set to -1 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2015-11-03 13:55:52,601] INFO initLimit set to 20 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2015-11-03 13:55:52,626] INFO Reading snapshot /etc/zookeeper/data/version-2/snapshot.0 (org.apache.zookeeper.server.persistence.FileSnap)
[2015-11-03 13:55:52,675] INFO My election bind port: testkafka403.prod.local/127.0.1.1:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2015-11-03 13:55:52,713] INFO LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer)
[2015-11-03 13:55:52,715] INFO New election. My id =  4, proposed zxid=0x100000014 (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2015-11-03 13:55:52,717] INFO Notification: 1 (message format version), 4 (n.leader), 0x100000014 (n.zxid), 0x1 (n.round), LOOKING (n.state), 4 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2015-11-03 13:55:52,732] WARN Cannot open channel to 5 at election address testkafka404.prod.local/10.1.132.153:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.SocketTimeoutException
at java.net.SocksSocketImpl.remainingMillis(SocksSocketImpl.java:111)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:745)
[2015-11-03 13:55:52,737] WARN Cannot open channel to 6 at election address testkafka405.prod.local/10.1.132.154:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:745)
[2015-11-03 13:55:52,919] WARN Cannot open channel to 6 at election address testkafka405.prod.local/10.1.132.154:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)

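We would really like to be able to use DNS names, but we have no idea where to start looking for a solution. Perhaps we missed installing or enabling an essential Debian or Java feature?

Answer 1

OK, I know what is going on here. I saw the same problem when I tried to set up a 3-node Spring-XD cluster in Vagrant on Linux VMs.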

This configuration worked:

server.1=172.28.128.3:2888:3888
server.2=172.28.128.4:2888:3888
server.3=172.28.128.7:2888:3888

But this one did not:

server.1=spring-xd-1:2888:3888
server.2=spring-xd-2:2888:3888
server.3=spring-xd-3:2888:3888

The "smoking gun" was this line in my ZooKeeper logs:

2015-11-26 20:48:31,439 [myid:1] - INFO [Thread-2:QuorumCnxManager$Listener@504] - My election bind port: spring-xd-1/127.0.0.1:3888
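
The check for that log line can be automated. The following is a minimal sketch in Python, assuming the log contains a "My election bind port: host/ip:port" entry like the ones quoted in this thread; the function names are made up for illustration and are not part of ZooKeeper. Note that the 127.0.1.1 address in the question's log is also inside the 127.0.0.0/8 loopback range, so it is flagged as well.

```python
import ipaddress
import re

# Matches the address part of a line such as
# "My election bind port: spring-xd-1/127.0.0.1:3888" (IPv4 only).
BIND_LINE = re.compile(r"My election bind port: (\S*)/([0-9.]+):(\d+)")


def election_bind_address(log_text):
    """Return (hostname, ip, port) from the first election bind line, or None."""
    match = BIND_LINE.search(log_text)
    if match is None:
        return None
    host, ip, port = match.groups()
    return host, ipaddress.ip_address(ip), int(port)


def binds_to_loopback(log_text):
    """True if the election port is bound to a 127.0.0.0/8 address."""
    found = election_bind_address(log_text)
    return found is not None and found[1].is_loopback
```

Calling binds_to_loopback() on the contents of a node's log file gives a quick yes/no answer before digging into the stack traces.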

So why is ZooKeeper binding the election port to the loopback interface? Well...

The /etc/hosts file on one of my VMs looked like this:

127.0.0.1   spring-xd-1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

## vagrant-hostmanager-start
172.28.128.3    spring-xd-1
172.28.128.4    spring-xd-2
172.28.128.7    spring-xd-3
## vagrant-hostmanager-end

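I removed the hostname from the 127.0.0.1 line in /etc/hosts, bounced the ZooKeeper service on all 3 nodes, and bam! Everything came up roses. So now the hosts file on each machine looks like this: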

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

## vagrant-hostmanager-start
172.28.128.3    spring-xd-1
172.28.128.4    spring-xd-2
172.28.128.7    spring-xd-3
## vagrant-hostmanager-end
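
The difference between the two hosts files can be detected with a short script. This is a rough sketch, assuming the usual "address name [alias ...]" hosts format; loopback_hostnames is a hypothetical helper, and treating every name that starts with "localhost" as harmless is a simplifying heuristic.

```python
import ipaddress

# Name prefixes that are expected to appear on a loopback line.
LOCAL_NAMES = ("localhost", "ip6-localhost", "ip6-loopback")


def loopback_hostnames(hosts_text):
    """Return names mapped to a loopback address that are not localhost aliases."""
    suspicious = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first column is not an IP address; skip the line
        if is_loopback:
            suspicious += [n for n in names if not n.startswith(LOCAL_NAMES)]
    return suspicious
```

For the first hosts file above this returns ['spring-xd-1']; for the edited one it returns an empty list.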

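I am guessing you do not see this problem on Windows because the hosts file (C:\Windows\System32\drivers\etc\hosts) has no entries by default. You should be able to reproduce the problem on Windows by adding a similar 127.0.0.1 line.

I would call this a ZooKeeper bug. Editing the hosts file was enough to demonstrate the problem and fix it in Vagrant, but I would not recommend it for any "real" environment.

EDIT: According to http://ccl.cse.nd.edu/operations/condor/hostname.shtml, this seems to be a fairly common problem with clustered applications on Linux, and the recommendation is to edit the hosts file as I described above. However, the ZooKeeper documentation on cluster setup does not mention it.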

Answer 2

This problem is probably caused by the node's hostname being mapped to 127.0.0.1 in /etc/hosts. In that case ZK will bind the leader and election ports to the 127.0.0.1 address.
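
This also explains why a successful lookup with the "host" command does not rule the problem out: "host" queries the DNS servers, while applications normally go through the system resolver, which on a typical Linux setup consults /etc/hosts first. A small sketch of a check that uses the system resolver, assuming Python is available on the node (resolves_to_loopback is an illustrative name, and the assumption that the JVM sees the same answer holds only for a default resolver configuration):

```python
import ipaddress
import socket


def resolves_to_loopback(name):
    """Resolve name via the system resolver and report whether it is loopback."""
    # IPv4 only; raises socket.gaierror if the name cannot be resolved.
    address = socket.gethostbyname(name)
    return ipaddress.ip_address(address).is_loopback
```

Running resolves_to_loopback(socket.gethostname()) on each node shows whether the node's own name points at 127.x.x.x.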

The configuration parameter quorumListenOnAllIPs=true should solve this problem and bind the election and leader ports to 0.0.0.0.
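
Since the setup in the question is automated, the setting could be applied by the provisioning step. Below is a minimal sketch that edits the text of a properties file such as zookeeper.properties; set_property is a hypothetical helper, and whether the installed ZooKeeper version recognises quorumListenOnAllIPs is an assumption that should be checked against the admin guide for that version.

```python
def set_property(config_text, key, value):
    """Return config_text with key=value set, replacing an existing entry."""
    lines = config_text.splitlines()
    entry = f"{key}={value}"
    for i, line in enumerate(lines):
        # Compare only the part before the first "=" so values are ignored.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = entry
            break
    else:
        lines.append(entry)  # key not present yet: append it
    return "\n".join(lines) + "\n"
```

For example, set_property(text, "quorumListenOnAllIPs", "true") leaves the server.N lines untouched and is safe to run repeatedly.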

More options and their effects can be found in the ZK Administrator's Guide.

It is always a good idea to check the source code.
