kubeadm 1.25 init fails on Debian 11 with containerd -> connection refused

I am trying to bring up a Kubernetes master node on a Debian GNU/Linux 11 (bullseye) system with kubeadm version 1.25.4-00.

I followed the official guide on kubernetes.io. I installed containerd and set SystemdCgroup = true in /etc/containerd/config.toml:

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
      runtime_engine = ""
      runtime_root = ""
      privileged_without_host_devices = false
      base_runtime_spec = ""
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        SystemdCgroup = true
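
To double-check that the setting really is present and uncommented in the file, a small check along these lines could be used. This is only a sketch: the function name check_systemd_cgroup is hypothetical, and the default path is the one quoted above.

```shell
# Hypothetical helper: report whether a containerd config file contains an
# uncommented "SystemdCgroup = true" line. It does not verify which TOML
# table the line belongs to.
check_systemd_cgroup() {
    config_file="${1:-/etc/containerd/config.toml}"
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file"
        return 2
    fi
    # Match the key at the start of a line, ignoring indentation.
    if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$config_file"; then
        echo "SystemdCgroup is enabled"
    else
        echo "SystemdCgroup is not enabled"
        return 1
    fi
}
```

Calling check_systemd_cgroup with no argument inspects /etc/containerd/config.toml. containerd only reads the file at startup, so a change needs sudo systemctl restart containerd before it takes effect.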

containerd seems to be fine:

$ sudo systemctl status containerd
● containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-11-21 08:12:35 UTC; 1min 7s ago
       Docs: https://containerd.io
   Main PID: 7897 (containerd)
      Tasks: 8
     Memory: 10.5M
        CPU: 470ms
     CGroup: /system.slice/containerd.service
             └─7897 /usr/bin/containerd

Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.900148031Z" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.900245191Z" level=info msg=serving... address=/run/containerd/containerd.sock
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.900338622Z" level=info msg="containerd successfully booted in 0.046780s"
Nov 21 08:12:35 master-1 systemd[1]: Started containerd container runtime.
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.909836633Z" level=info msg="Start subscribing containerd event"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.909931756Z" level=info msg="Start recovering state"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910044670Z" level=info msg="Start event monitor"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910056885Z" level=info msg="Start snapshots syncer"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910069145Z" level=info msg="Start cni network conf syncer"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910079607Z" level=info msg="Start streaming server"
....

When I run kubeadm init, it hangs and times out after 4 minutes:

$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16 -v=9

There does not seem to be a firewall problem, and kubeadm appears to detect containerd and the cgroup driver correctly:

I1121 08:16:46.935270    8096 initconfiguration.go:117] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
I1121 08:16:46.935936    8096 interface.go:432] Looking for default routes with IPv4 addresses
I1121 08:16:46.936037    8096 interface.go:437] Default route transits interface "eth0"
I1121 08:16:46.936268    8096 interface.go:209] Interface eth0 is up
I1121 08:16:46.936427    8096 interface.go:257] Interface "eth0" has 3 addresses :[x.x.y.y/32 .........::1/64 ......../64].
I1121 08:16:46.936525    8096 interface.go:224] Checking addr  x.x.y.y/32.
I1121 08:16:46.936596    8096 interface.go:231] IP found x.x.y.y
I1121 08:16:46.936616    8096 interface.go:263] Found valid IPv4 address x.x.y.y for interface "eth0".
I1121 08:16:46.936710    8096 interface.go:443] Found active IP x.x.y.y 
I1121 08:16:46.936803    8096 kubelet.go:218] the value of KubeletConfiguration.cgroupDriver is empty; setting it to "systemd"
I1121 08:16:46.948350    8096 version.go:186] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.txt
I1121 08:16:47.327247    8096 version.go:255] remote version is much newer: v1.25.4; falling back to: stable-1.24
I1121 08:16:47.327368    8096 version.go:186] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.25.txt
[init] Using Kubernetes version: v1.25.4
[preflight] Running pre-flight checks
I1121 08:16:47.716620    8096 checks.go:570] validating Kubernetes and kubeadm version
I1121 08:16:47.716770    8096 checks.go:170] validating if the firewall is enabled and active
I1121 08:16:47.731470    8096 checks.go:205] validating availability of port 6443
I1121 08:16:47.732017    8096 checks.go:205] validating availability of port 10259
....

The following output appears while kubeadm waits for the kubelet to start the control plane. It repeats until the timeout after 4 minutes:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I1121 08:17:12.320743    8096 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.25.4 (linux/amd64) kubernetes/fdc7750" 'https://x.x.y.y:6443/healthz?timeout=10s'
I1121 08:17:12.321047    8096 round_trippers.go:508] HTTP Trace: Dial to tcp:x.x.y.y:6443 failed: dial tcp x.x.y.y:6443: connect: connection refused
I1121 08:17:12.321112    8096 round_trippers.go:553] GET https://x.x.y.y:6443/healthz?timeout=10s  in 0 milliseconds
I1121 08:17:12.321157    8096 round_trippers.go:570] HTTP Statistics: DNSLookup 0 ms Dial 0 ms TLSHandshake 0 ms Duration 0 ms
I1121 08:17:12.321209    8096 round_trippers.go:577] Response Headers:
I1121 08:17:12.821526    8096 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.25.4 (linux/amd64) kubernetes/fdc7750" 'https://x.x.y.y:6443/healthz?timeout=10s'
I1121 08:17:12.821882    8096 round_trippers.go:508] HTTP Trace: Dial to tcp:x.x.y.y:6443 failed: dial tcp x.x.y.y:6443: connect: connection refused
.....

Checking the kubelet status shows:

$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Mon 2022-11-21 08:17:12 UTC; 4min 30s ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 8228 (kubelet)
      Tasks: 14 (limit: 4556)
     Memory: 52.0M
        CPU: 6.246s
     CGroup: /system.slice/kubelet.service
             └─8228 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-ru>

Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.526642    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.626872    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.727919    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.829055    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.930002    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.959961    8228 eviction_manager.go:254] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"master->
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.029432    8228 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady >
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.030749    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.130874    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.231537    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"

Checking journalctl shows:

$ sudo journalctl -xeu kubelet
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.585238    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.685464    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.786279    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.887211    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.987526    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:38 master-1 kubelet[8228]: E1121 08:22:38.045350    8228 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady >
Nov 21 08:22:38 master-1 kubelet[8228]: E1121 08:22:38.088201    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
....
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.500610    8228 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://x.x.y.y:6443/apis/coordin>
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.512026    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.613041    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:40 master-1 kubelet[8228]: I1121 08:22:40.700243    8228 kubelet_node_status.go:70] "Attempting to register node" node="master-1"
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.701021    8228 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://x.x.y.y:6443/api/v1/node>
...
.....

How can I find the root cause of this problem? The log files have not given me any useful hint.
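
One way to narrow this down is to strip the repeated "Error getting node" lines from the kubelet log, since they bury the less frequent errors and the pager cuts long lines off at the ">" marker. The following is a minimal sketch; the function name filter_kubelet_errors is hypothetical, and the patterns assume the log format shown above.

```shell
# Hypothetical helper: read kubelet journal lines on stdin, drop the repeated
# "Error getting node" noise, and keep only the remaining error-level lines
# (klog marks errors with an "E" followed by the month and day, e.g. E1121).
filter_kubelet_errors() {
    grep -v '"Error getting node"' | grep -E 'kubelet\[[0-9]+\]: E[0-9]{4} '
}

# Usage, assuming a systemd host; --no-pager keeps the long lines intact:
#   sudo journalctl -u kubelet --no-pager | filter_kubelet_errors
```

The same pipe can be pointed at journalctl -u containerd (without the kubelet-specific second grep) to see whether the runtime rejects the control plane pods.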

Note: If I install CRI-O instead of containerd, kubeadm works perfectly.

Answer 1

I have the same problem here with kubeadm v1.25.4 and containerd v1.4.13.

containerd also seems to be fine and the kubelet service is active, but the API server stays down along with all the control plane pods.

kubectl get pods --all-namespaces
The connection to the server localhost:8080 was refused - did you specify the right host or port?

And I have additional logs in my syslog file:

Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.517592    2809 kubelet.go:2448] "Error getting node" err="node \"master-1\" not found"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.618103    2809 kubelet.go:2448] "Error getting node" err="node \"master-1\" not found"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.718895    2809 kubelet.go:2448] "Error getting node" err="node \"master-1\" not found"
Nov 25 09:39:08 master-1 containerd[450]: time="2022-11-25T09:39:08.774397538Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-scheduler-master-1,Uid:c8fdb264532b280b4098380e628d113d,Namespace:kube-system,Attempt:0,}"
Nov 25 09:39:08 master-1 containerd[450]: time="2022-11-25T09:39:08.774397563Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-apiserver-master-1,Uid:e8e76556f3e67024151f36c60b85b622,Namespace:kube-system,Attempt:0,}"
Nov 25 09:39:08 master-1 containerd[450]: time="2022-11-25T09:39:08.800116714Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-scheduler-master-1,Uid:c8fdb264532b280b4098380e628d113d,Namespace:kube-system,Attempt:0,} failed, error" error="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800548    2809 remote_runtime.go:233] "RunPodSandbox from runtime service failed" err="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800620    2809 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument" pod="kube-system/kube-scheduler-master-1"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800653    2809 kuberuntime_manager.go:772] "CreatePodSandbox for pod failed" err="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument" pod="kube-system/kube-scheduler-master-1"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800729    2809 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-master-1_kube-system(c8fdb264532b280b4098380e628d113d)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-master-1_kube-system(c8fdb264532b280b4098380e628d113d)\\\": rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument\"" pod="kube-system/kube-scheduler-master-1" podUID=c8fdb264532b280b4098380e628d113d
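
The "container.Runtime.Name must be set" message suggests that containerd did not find a runtime name for the sandbox. One possible cause, and this is an assumption rather than a confirmed diagnosis, is a config.toml whose runc section has no non-empty runtime_type entry. A rough check, with the hypothetical function name check_runtime_type:

```shell
# Hypothetical helper: report whether a containerd config file contains any
# non-empty runtime_type entry. It looks at the whole file and does not verify
# that the entry sits under the runc runtime table.
check_runtime_type() {
    config_file="${1:-/etc/containerd/config.toml}"
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file"
        return 2
    fi
    # Require at least one character between the quotes.
    if grep -Eq '^[[:space:]]*runtime_type[[:space:]]*=[[:space:]]*"[^"]+"' "$config_file"; then
        echo "runtime_type is set"
    else
        echo "runtime_type is missing or empty"
        return 1
    fi
}
```

If the entry turns out to be missing, containerd config default prints a complete default configuration that can be compared against the file in use.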

I am following this thread in case anyone has a solution or a clue.