API server on the master stops after adding a second control plane

2024-6-28

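In my current test setup I have several virtual machines running Debian 11. Every node has a private IP and a second WireGuard interface. In the future the nodes will sit in different locations and on different networks, and WireGuard is used to "overlay" all of those network environments. I want to install Kubernetes on all nodes.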

node   private ip       wireguard ip
vm1    192.168.10.10    10.11.12.10
vm2    192.168.10.11    10.11.12.11
vm3    192.168.10.12    10.11.12.12
...

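So I installed docker and kubeadm/kubelet/kubectl version 1.23.5 on all nodes. I also installed haproxy on all nodes. It acts as a load balancer by listening on localhost:443 and forwarding requests to one of the control planes that is online.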
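A minimal sketch of the haproxy setup described above, assuming plain TCP pass-through and that the backends are the control planes' WireGuard addresses on the default API server port 6443. The section names `kube-apiserver` and `control-planes` and the output file name are hypothetical; the actual configuration is not shown in this question.

```shell
# Write an illustrative haproxy fragment to a local file (not to /etc/haproxy).
# Assumptions: TCP mode, backends on the WireGuard IPs, API server port 6443.
cat > haproxy-sketch.cfg <<'EOF'
frontend kube-apiserver
    bind 127.0.0.1:443
    mode tcp
    default_backend control-planes

backend control-planes
    mode tcp
    balance roundrobin
    server vm1 10.11.12.10:6443 check
    server vm2 10.11.12.11:6443 check
    server vm3 10.11.12.12:6443 check
EOF
```

The `check` keyword enables a TCP health check per server, so requests would only be forwarded to control planes that currently accept connections.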

Then I initialized the cluster with kubeadm:

vm01> kubeadm init --apiserver-advertise-address=10.11.12.10 --pod-network-cidr=10.20.0.0/16

After that I tested integrating either flannel or calico, by adding --iface=<wireguard-interface> or by setting ...nodeAddressAutodetectionV4.interface: <wireguard-interface> in the custom manifest.
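For the calico variant, the setting could look like the following sketch. It assumes the operator-based install with an `Installation` resource, a WireGuard interface named `wg0` (hypothetical, the real name is not given), and the pod CIDR from the kubeadm init command. The output file name is also hypothetical.

```shell
# Write an illustrative Calico Installation resource to a local file.
# Assumptions: operator-based install, interface name wg0, pod CIDR 10.20.0.0/16.
cat > calico-installation-sketch.yaml <<'EOF'
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
      - cidr: 10.20.0.0/16
    nodeAddressAutodetectionV4:
      interface: wg0
EOF
```

For flannel, the equivalent is the `--iface=wg0` argument on the flanneld container in the kube-flannel DaemonSet.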

When I add a normal node, everything is fine. The node is added, pods are created, and communication goes over the defined network interface.

When I add a control plane without using the WireGuard interface, I can also join additional control planes:

vm2> kubeadm join 127.0.0.1:443 --token ... --discovery-token-ca-cert-hash sha256:...  --control-plane

Of course, before that I had copied several files from /etc/kubernetes/pki on vm01 to vm02, such as ca.*, sa.*, front-proxy-ca.*, apiserver-kubelet-client.* and etcd/ca.*.
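The copy step might look like this sketch, run on vm02. It assumes root SSH access from vm02 to vm01 (the user and host name are assumptions) and only prints the commands unless `DRY_RUN`, a hypothetical variable, is set to 0.

```shell
PKI=/etc/kubernetes/pki
SRC=root@vm01            # assumption: SSH user and host of the first control plane
DRY_RUN=${DRY_RUN:-1}    # hypothetical switch: 1 prints the commands, 0 runs them

copy_pki() {
  # $1 is a glob relative to $PKI; it stays quoted so the remote side expands it.
  pattern=$1
  case "$pattern" in
    etcd/*) dest="$PKI/etcd/" ;;
    *)      dest="$PKI/" ;;
  esac
  if [ "$DRY_RUN" = 1 ]; then
    echo "scp $SRC:$PKI/$pattern $dest"
  else
    mkdir -p "$dest" && scp "$SRC:$PKI/$pattern" "$dest"
  fi
}

# The file patterns listed in the question.
for p in 'ca.*' 'sa.*' 'front-proxy-ca.*' 'apiserver-kubelet-client.*' 'etcd/ca.*'; do
  copy_pki "$p"
done
```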

But when I use the flannel or calico network together with the WireGuard interface, something strange happens after the join command.

root@vm02:~# kubeadm join 127.0.0.1:443 --token nwevkx.tzm37tb4qx3wg2jz --discovery-token-ca-cert-hash sha256:9a97a5846ad823647ccb1892971c5f0004043d88f62328d051a31ce8b697ad4a --control-plane
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost mimas] and IPs [192.168.10.11 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost mimas] and IPs [192.168.10.11 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local mimas] and IPs [10.96.0.1 192.168.10.11 127.0.0.1]
[certs] Using the existing "apiserver-kubelet-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
To see the stack trace of this error execute with --v=5 or higher

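After the timeout, the API server stops working even on vm01, and I can no longer run any kubeadm or kubectl commands. The HTTPS service on port 6443 is dead. I neither understand why the API server on vm01 stops working when a second API server is added, nor can I find a reason why the output mentions the 192.168.... IPs, because the cluster should communicate only over the 10.11.12.0/24 WireGuard network.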

Answer 1

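After finding a similar question, https://stackoverflow.com/questions/64227042/setting-up-a-kubernetes-master-on-a-different-ip, I think the same solution applies here. When I add --apiserver-advertise-address=<this-wireguard-ip>, the output changes (no 192.168.. IPs) and the node joins. What I still don't understand is why the API server on vm01 stopped working.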
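Put together, the working join on vm02 might look like the sketch below. The address is vm02's WireGuard IP from the node table in the question. The token and hash stay elided exactly as in the original command, so the snippet only assembles and prints the command instead of running it.

```shell
# vm02's WireGuard address (from the node table in the question).
WG_IP=10.11.12.11

# Token and CA hash are elided as in the question; substitute real values.
JOIN_CMD="kubeadm join 127.0.0.1:443 --token ... --discovery-token-ca-cert-hash sha256:... --control-plane --apiserver-advertise-address=${WG_IP}"

# Print for review instead of executing.
echo "$JOIN_CMD"
```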

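Whatever the join command does in the background, it needs to create an etcd service on the second control plane, and that service also has to run on the same IP as the flannel/calico network interface. If the primary network interface is used, this parameter is not needed on the second/third control plane.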
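One plausible explanation for the vm01 outage is etcd quorum. This is an inference from the "[etcd] Announced new etcd member joining to the existing etcd cluster" line in the log, not something confirmed in this thread. Once the second member is announced, the etcd cluster has two members and needs both to agree. If the new member never becomes reachable on the address it advertises, the existing member can no longer commit writes, and the API server that depends on it fails. The arithmetic, as a small sketch (the function name is hypothetical):

```shell
# Majority needed for an etcd cluster of N members: floor(N / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

for n in 1 2 3; do
  q=$(quorum "$n")
  echo "members=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```

With two members the quorum is 2, so no failure is tolerated until the second member is actually healthy.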
