I am trying to initialize a Kubernetes master node on a Debian GNU/Linux 11 (bullseye) system using kubeadm version 1.25.4-00.
I followed the official guidelines on kubernetes.io. I installed containerd and set SystemdCgroup = true in /etc/containerd/config.toml:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
runtime_engine = ""
runtime_root = ""
privileged_without_host_devices = false
base_runtime_spec = ""
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
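A quick way to confirm that this setting is really present in the file containerd reads is to check for it from the shell. The sketch below is only an illustration: the function name check_systemd_cgroup is made up for this example, and the simple pattern match assumes SystemdCgroup appears only once in the file, under the runc options table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of containerd or kubeadm).
# Prints "systemd" if the given config file contains "SystemdCgroup = true",
# "not-systemd" if it does not, and "unreadable" if the file cannot be read.
check_systemd_cgroup() {
  local config_file="$1"
  if [ ! -r "$config_file" ]; then
    echo "unreadable"
    return 0
  fi
  # Allow arbitrary indentation and spacing around the equals sign.
  if grep -Eq '^[ 	]*SystemdCgroup[ 	]*=[ 	]*true' "$config_file"; then
    echo "systemd"
  else
    echo "not-systemd"
  fi
}
```

For example, `check_systemd_cgroup /etc/containerd/config.toml` should print systemd for the configuration shown here. containerd reads the file at startup, so a change only takes effect after `sudo systemctl restart containerd`.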
containerd seems to be fine:
$ sudo systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-11-21 08:12:35 UTC; 1min 7s ago
Docs: https://containerd.io
Main PID: 7897 (containerd)
Tasks: 8
Memory: 10.5M
CPU: 470ms
CGroup: /system.slice/containerd.service
└─7897 /usr/bin/containerd
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.900148031Z" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.900245191Z" level=info msg=serving... address=/run/containerd/containerd.sock
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.900338622Z" level=info msg="containerd successfully booted in 0.046780s"
Nov 21 08:12:35 master-1 systemd[1]: Started containerd container runtime.
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.909836633Z" level=info msg="Start subscribing containerd event"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.909931756Z" level=info msg="Start recovering state"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910044670Z" level=info msg="Start event monitor"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910056885Z" level=info msg="Start snapshots syncer"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910069145Z" level=info msg="Start cni network conf syncer"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910079607Z" level=info msg="Start streaming server"
....
When I run kubeadm init, it hangs and times out after 4 minutes:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16 -v=9
There does not seem to be a firewall problem, and kubeadm appears to detect containerd and cgroups correctly:
I1121 08:16:46.935270 8096 initconfiguration.go:117] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
I1121 08:16:46.935936 8096 interface.go:432] Looking for default routes with IPv4 addresses
I1121 08:16:46.936037 8096 interface.go:437] Default route transits interface "eth0"
I1121 08:16:46.936268 8096 interface.go:209] Interface eth0 is up
I1121 08:16:46.936427 8096 interface.go:257] Interface "eth0" has 3 addresses :[x.x.y.y/32 .........::1/64 ......../64].
I1121 08:16:46.936525 8096 interface.go:224] Checking addr x.x.y.y/32.
I1121 08:16:46.936596 8096 interface.go:231] IP found x.x.y.y
I1121 08:16:46.936616 8096 interface.go:263] Found valid IPv4 address x.x.y.y for interface "eth0".
I1121 08:16:46.936710 8096 interface.go:443] Found active IP x.x.y.y
I1121 08:16:46.936803 8096 kubelet.go:218] the value of KubeletConfiguration.cgroupDriver is empty; setting it to "systemd"
I1121 08:16:46.948350 8096 version.go:186] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.txt
I1121 08:16:47.327247 8096 version.go:255] remote version is much newer: v1.25.4; falling back to: stable-1.24
I1121 08:16:47.327368 8096 version.go:186] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.25.txt
[init] Using Kubernetes version: v1.25.4
[preflight] Running pre-flight checks
I1121 08:16:47.716620 8096 checks.go:570] validating Kubernetes and kubeadm version
I1121 08:16:47.716770 8096 checks.go:170] validating if the firewall is enabled and active
I1121 08:16:47.731470 8096 checks.go:205] validating availability of port 6443
I1121 08:16:47.732017 8096 checks.go:205] validating availability of port 10259
....
While kubeadm waits for the kubelet to start, the following messages appear. They repeat until the timeout after 4 minutes:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I1121 08:17:12.320743 8096 round_trippers.go:466] curl -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.25.4 (linux/amd64) kubernetes/fdc7750" 'https://x.x.y.y:6443/healthz?timeout=10s'
I1121 08:17:12.321047 8096 round_trippers.go:508] HTTP Trace: Dial to tcp:x.x.y.y:6443 failed: dial tcp x.x.y.y:6443: connect: connection refused
I1121 08:17:12.321112 8096 round_trippers.go:553] GET https://x.x.y.y:6443/healthz?timeout=10s in 0 milliseconds
I1121 08:17:12.321157 8096 round_trippers.go:570] HTTP Statistics: DNSLookup 0 ms Dial 0 ms TLSHandshake 0 ms Duration 0 ms
I1121 08:17:12.321209 8096 round_trippers.go:577] Response Headers:
I1121 08:17:12.821526 8096 round_trippers.go:466] curl -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.25.4 (linux/amd64) kubernetes/fdc7750" 'https://x.x.y.y:6443/healthz?timeout=10s'
I1121 08:17:12.821882 8096 round_trippers.go:508] HTTP Trace: Dial to tcp:x.x.y.y:6443 failed: dial tcp x.x.y.y:6443: connect: connection refused
.....
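A "connection refused" reply, as opposed to a timeout, usually means that nothing is listening on the port yet, which would point at the kube-apiserver container not running rather than at a firewall. One way to check this is to look at the listening sockets. The helper below is a hypothetical sketch (the name port_is_listening is not part of any tool); it reads the output of `ss -ltn` on standard input and assumes the usual column layout, where the fourth column is "Local Address:Port".

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ss -ltn` output on stdin and reports whether
# any listening TCP socket uses the given port.
port_is_listening() {
  local port="$1"
  awk -v port="$port" '
    # Skip the header line. Column 4 is "Local Address:Port"; the port is
    # whatever follows the last colon (this also works for [::]:6443).
    NR > 1 {
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { print (found ? "listening" : "not listening") }
  '
}
```

Usage on the node would be `ss -ltn | port_is_listening 6443`. If it prints not listening while kubeadm is still waiting, the next place to look is why the API server container was never started.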
Checking the kubelet status shows:
$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Mon 2022-11-21 08:17:12 UTC; 4min 30s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 8228 (kubelet)
Tasks: 14 (limit: 4556)
Memory: 52.0M
CPU: 6.246s
CGroup: /system.slice/kubelet.service
└─8228 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-ru>
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.526642 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.626872 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.727919 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.829055 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.930002 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.959961 8228 eviction_manager.go:254] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"master->
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.029432 8228 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady >
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.030749 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.130874 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.231537 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
journalctl shows the following:
$ sudo journalctl -xeu kubelet
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.585238 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.685464 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.786279 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.887211 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.987526 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:38 master-1 kubelet[8228]: E1121 08:22:38.045350 8228 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady >
Nov 21 08:22:38 master-1 kubelet[8228]: E1121 08:22:38.088201 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
....
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.500610 8228 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://x.x.y.y:6443/apis/coordin>
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.512026 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.613041 8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:40 master-1 kubelet[8228]: I1121 08:22:40.700243 8228 kubelet_node_status.go:70] "Attempting to register node" node="master-1"
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.701021 8228 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://x.x.y.y:6443/api/v1/node>
...
.....
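Most of the kubelet lines in this output are the repeated "Error getting node" message, which is most likely a consequence of the API server being unreachable rather than a cause, and the lines ending in > have been cut off by the pager. A sketch of one way to get at the remaining messages, assuming the noise can be identified by that one string (the function name filter_kubelet_noise is invented for this example):

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads kubelet log lines on stdin and drops the
# repeated "Error getting node" entries so other errors become visible.
filter_kubelet_noise() {
  grep -v 'Error getting node'
}
```

It could be used as `sudo journalctl -u kubelet --no-pager | filter_kubelet_noise`; the --no-pager flag makes journalctl print full lines instead of truncating them. Listing the containers the runtime knows about with `sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a` may also show whether the control plane containers were ever created.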
How can I track down the root cause of this problem? The log files did not give me any particularly useful hints.
Note: if I install CRI-O instead of containerd, kubeadm works without problems.
Answer 1
I have the same problem with kubeadm v1.25.4 and containerd v1.4.13.
containerd also seems fine and the kubelet service is in the active state, but the Kubernetes API remains down, along with all the control plane pods.
kubectl get pods --all-namespaces
The connection to the server localhost:8080 was refused - did you specify the right host or port?
There are some additional logs in the syslog file:
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.517592 2809 kubelet.go:2448] "Error getting node" err="node \"master-1\" not found"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.618103 2809 kubelet.go:2448] "Error getting node" err="node \"master-1\" not found"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.718895 2809 kubelet.go:2448] "Error getting node" err="node \"master-1\" not found"
Nov 25 09:39:08 master-1 containerd[450]: time="2022-11-25T09:39:08.774397538Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-scheduler-master-1,Uid:c8fdb264532b280b4098380e628d113d,Namespace:kube-system,Attempt:0,}"
Nov 25 09:39:08 master-1 containerd[450]: time="2022-11-25T09:39:08.774397563Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-apiserver-master-1,Uid:e8e76556f3e67024151f36c60b85b622,Namespace:kube-system,Attempt:0,}"
Nov 25 09:39:08 master-1 containerd[450]: time="2022-11-25T09:39:08.800116714Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-scheduler-master-1,Uid:c8fdb264532b280b4098380e628d113d,Namespace:kube-system,Attempt:0,} failed, error" error="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800548 2809 remote_runtime.go:233] "RunPodSandbox from runtime service failed" err="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800620 2809 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument" pod="kube-system/kube-scheduler-master-1"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800653 2809 kuberuntime_manager.go:772] "CreatePodSandbox for pod failed" err="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument" pod="kube-system/kube-scheduler-master-1"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800729 2809 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-master-1_kube-system(c8fdb264532b280b4098380e628d113d)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-master-1_kube-system(c8fdb264532b280b4098380e628d113d)\\\": rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument\"" pod="kube-system/kube-scheduler-master-1" podUID=c8fdb264532b280b4098380e628d113d
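The message "container.Runtime.Name must be set" suggests that containerd's CRI plugin did not end up with a runtime name for the sandbox. One possible cause, stated here as an assumption rather than something these logs confirm, is that the runc runtime table in /etc/containerd/config.toml has no runtime_type, or that the file is in a format the installed containerd version interprets differently. The hypothetical helper below (the name check_runc_runtime_type is invented) checks only the first of these.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "set" if the runc runtime table in a containerd
# config file has a non-empty runtime_type, otherwise prints "missing".
check_runc_runtime_type() {
  local config_file="$1"
  awk '
    /^[ \t]*\[/ {
      # A new TOML table starts here. Remember whether it is the runc
      # runtime table itself (and not a sub-table such as runc.options).
      in_runc = ($0 ~ /containerd\.runtimes\.runc\][ \t]*$/)
      next
    }
    # Inside the runc table, accept only a quoted, non-empty value.
    in_runc && /^[ \t]*runtime_type[ \t]*=[ \t]*"[^"]+"/ { found = 1 }
    END { print (found ? "set" : "missing") }
  ' "$config_file"
}
```

If `check_runc_runtime_type /etc/containerd/config.toml` prints missing, regenerating the file with `containerd config default` and reapplying SystemdCgroup = true is one thing to try. Whether that resolves the problem on containerd v1.4.13 is not established here.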
If anyone has a solution or a clue, I will be following this thread.