Kubernetes cluster with incorrect DNS resolution

Problem description:

I have a Harvester HCI cluster (RKE2) in which Pods do not resolve internet domains to the correct IP address.

kubectl run debug --image=busybox -i --tty --rm -- sh

/ # ping serverfault.com
PING serverfault.com (<redacted IP address>): 56 data bytes
64 bytes from <redacted IP address>: seq=0 ttl=63 time=0.362 ms
64 bytes from <redacted IP address>: seq=1 ttl=63 time=0.312 ms
64 bytes from <redacted IP address>: seq=2 ttl=63 time=0.319 ms
64 bytes from <redacted IP address>: seq=3 ttl=63 time=0.449 ms
64 bytes from <redacted IP address>: seq=4 ttl=63 time=0.317 ms
64 bytes from <redacted IP address>: seq=5 ttl=63 time=0.363 ms
64 bytes from <redacted IP address>: seq=6 ttl=63 time=0.296 ms
64 bytes from <redacted IP address>: seq=7 ttl=63 time=0.361 ms
^C
--- serverfault.com ping statistics ---
8 packets transmitted, 8 packets received, 0% packet loss
round-trip min/avg/max = 0.296/0.347/0.449 ms

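In this case, <redacted IP address> happens to be the public IP address of the network the cluster is located in (and not one of the serverfault.com IP addresses).

In the same container, however, nslookup does list the correct IP addresses: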

/ # nslookup serverfault.com
Server:     10.53.0.10
Address:    10.53.0.10:53

Non-authoritative answer:
Name:   serverfault.com
Address: 104.18.23.101
Name:   serverfault.com
Address: 104.18.22.101

Non-authoritative answer:

This cannot be reproduced on the host node:

# ping serverfault.com
PING serverfault.com (104.18.23.101) 56(84) bytes of data.
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=1 ttl=57 time=1.27 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=2 ttl=57 time=1.30 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=3 ttl=57 time=1.33 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=4 ttl=57 time=1.29 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=5 ttl=57 time=1.23 ms
64 bytes from 104.18.23.101 (104.18.23.101): icmp_seq=6 ttl=57 time=1.28 ms
^C
--- serverfault.com ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5006ms
rtt min/avg/max/mdev = 1.231/1.284/1.333/0.030 ms

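The cluster itself is a fresh installation of Harvester HCI v1.2.0, with no additional configuration changes made after installation.

I am looking for further hints on how to troubleshoot this, and to find out why the wrong IP address is resolved.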


Context:

/etc/resolv.conf on the host:

### /etc/resolv.conf is a symlink to /var/run/netconfig/resolv.conf
### autogenerated by netconfig!

search harvester.<redacted domain> 1
nameserver 10.10.0.1

/etc/resolv.conf in the Pod container:

search default.svc.cluster.local svc.cluster.local cluster.local harvester.<redacted domain>
nameserver 10.53.0.10
options ndots:5

/etc/nsswitch.conf on the host:

#
# /etc/nsswitch.conf
#

passwd:     compat
group:      compat
shadow:     compat
# Allow initgroups to default to the setting for group.
# initgroups:   compat

hosts:      files mdns_minimal [NOTFOUND=return] dns
networks:   files dns

aliases:    files usrfiles
ethers:     files usrfiles
gshadow:    files usrfiles
netgroup:   files nis
protocols:  files usrfiles
publickey:  files
rpc:        files usrfiles
services:   files usrfiles

automount:  files nis
bootparams: files
netmasks:   files

/etc/nsswitch.conf in the Pod container:

# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.

passwd:         files
group:          files
shadow:         files
gshadow:        files

hosts:          files dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis

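/etc/hosts does not contain any additional or suspicious entries in either case.

Answer 1

I found that the problem is caused by the following ndots option in resolv.conf: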

options ndots:5

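This option means that a hostname containing fewer than 5 dots is first looked up with each search domain appended, before it is tried as-is. Only a hostname with 5 or more dots skips that step.

I suspect this option is necessary because Kubernetes internally uses a lot of hostnames with multiple dots.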

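However, serverfault.com, for example, contains only one dot, so it gets the local domain, harvester.<redacted domain>, appended, making it serverfault.com.harvester.<redacted domain>. We happen to have a wildcard (*) record on that domain, which points to the public IP address of the network. As a result, serverfault.com.harvester.<redacted domain> is resolved via the wildcard record, which explains the behavior.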
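The lookup order described above can be illustrated with a small Python sketch. It is only a model of how a glibc-style stub resolver builds its list of candidate names, not the resolver's actual code. The function name candidate_names is hypothetical, and harvester.example.org is a placeholder standing in for the redacted local domain.

```python
def candidate_names(name, search, ndots=1):
    """Return the names a glibc-style stub resolver would try, in order.

    This is a simplified model for illustration only.
    """
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified,
        # so the search list is not applied at all.
        return [name]
    with_search = [f"{name}.{domain}." for domain in search]
    as_is = [name + "."]
    if name.count(".") >= ndots:
        # Enough dots: try the literal name first, then the search list.
        return as_is + with_search
    # Too few dots: walk the search list first, literal name last.
    return with_search + as_is


# Search list from the Pod's resolv.conf; the last entry is a
# placeholder for the redacted local domain (assumption).
search = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "harvester.example.org",
]

for candidate in candidate_names("serverfault.com", search, ndots=5):
    print(candidate)
```

In this model, the three cluster.local candidates are expected to fail, and the fourth candidate is the one that matches the wildcard record, so the literal serverfault.com. at the end of the list is never reached. Passing the name with a trailing dot (serverfault.com.) yields a single candidate and bypasses the search list.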

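To work around this, we temporarily removed the DHCP entry for the local domain. As a result, the search line in resolv.conf no longer contains it, so internet domains are no longer appended to the local domain.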
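One way to confirm the change from inside a Pod is to read the search list and ndots value back out of resolv.conf. The following is a minimal sketch; parse_resolv_conf is a hypothetical helper, and the sample text is what the Pod's file would be expected to look like after the change, not captured output.

```python
def parse_resolv_conf(text):
    """Extract the search domains and the ndots value from resolv.conf text."""
    search, ndots = [], 1  # ndots defaults to 1 when the option is absent
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] == "search":
            search = fields[1:]  # a later search line replaces an earlier one
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


# Assumed contents of the Pod's resolv.conf after the local domain
# was removed from DHCP (illustrative, not real output).
sample = """\
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.53.0.10
options ndots:5
"""

search, ndots = parse_resolv_conf(sample)
print(search, ndots)
```

To check a real Pod, the same function can be given the contents of /etc/resolv.conf instead of the sample string.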

In the long term, we plan to remove the wildcard record.
