
I deployed a basic cluster across two virtual machines (KVM), one of them designated as the master / control-plane node, using kubeadm init. Everything seems to start correctly, but when I try even the most basic checks with kubectl, I get a connection refused error.
UPDATE 2:
It works now, but I don't understand why. Originally I was running everything as a dedicated user with sudo. As soon as I switched to root (su root) and repeated the steps, everything started working. What caused the change? Is it somehow related to being in the root environment instead of the user's? A different home directory? Working directory? I'm lost here.
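My best guess (unverified): kubeadm prints post-init steps that copy the admin kubeconfig into the invoking user's home, and root can instead point KUBECONFIG straight at /etc/kubernetes/admin.conf. If that copy step is skipped, kubectl as a regular user reads a stale or empty ~/.kube/config. The steps, as printed by kubeadm itself:

# As a regular user, copy the admin kubeconfig that kubeadm generated:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Alternatively, as root:
export KUBECONFIG=/etc/kubernetes/admin.conf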
UPDATE 1: minimal failing example:
This time I set up another virtual machine running Ubuntu 20.04, trying to replicate this tutorial as closely as possible to its original example, but the last step fails exactly like my original problem. Is this tutorial actually complete?
Run step by step:
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install -y kubelet=1.20.0-00 kubeadm=1.20.0-00 kubectl=1.20.0-00
sudo apt-mark hold kubelet kubeadm kubectl
export VERSION=19.03 && curl -sSL get.docker.com | sh
cat <<EOF | sudo tee /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF
sudo systemctl enable docker
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo kubeadm init --ignore-preflight-errors=all
>>> At this point everything fails - is some service missing?
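A quick diagnostic I would run at this point (a sketch, not part of the tutorial) is to see which runtime-related services actually exist and are active, since kubeadm needs a working CRI behind the kubelet:

# List runtime-related services and their state
systemctl list-units --type=service --all | grep -E 'docker|containerd|kubelet'
# Confirm the CRI socket containerd exposes is actually there
ls -l /run/containerd/containerd.sock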
More about the environment:
- the host is Ubuntu 20.04 LTS, running KVM
- the guest (this is where I install k8s) is Ubuntu 22.04 server
- the host network is on 192.168.x.y
- the bridge for the VMs is on 10.0.1.0/24
- UFW is inactive
Network:
ip route show
- default via 10.0.1.254 dev ens3 proto static # this is the bridge that connects my VM to the outside world
- 10.0.1.0/24 dev ens3 proto kernel scope link src 10.0.1.10 # this is the machine's IP; the machines hosting nodes will have IPs ending in 11, 12, 13, etc.
- 172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown # generated automatically by Docker. I was never good with modern netmask notation, so I trust Docker's /16 to work fine; I haven't touched it.
cat /etc/hosts
I added primary to the hosts file; the rest is auto-generated.
127.0.0.1 localhost
127.0.0.1 primary
127.0.1.1 k8-vm
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
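For what it's worth, one thing I suspect (unconfirmed) is the primary entry pointing at loopback: anything resolving the node name gets 127.0.0.1 rather than the bridge address. If the node is supposed to be reachable at 10.0.1.10, the entry would presumably look like this instead:

127.0.0.1 localhost
10.0.1.10 primary
127.0.1.1 k8-vm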
Docker daemon configuration:
cat /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
me@k8-vm:~$ kubectl get pods
E0117 09:10:43.476675 13492 memcache.go:265] couldn't get current server API group list: Get "https://10.0.1.10:6443/api?timeout=32s": dial tcp 10.0.1.10:6443: connect: connection refused
E0117 09:10:43.477525 13492 memcache.go:265] couldn't get current server API group list: Get "https://10.0.1.10:6443/api?timeout=32s": dial tcp 10.0.1.10:6443: connect: connection refused
E0117 09:10:43.479198 13492 memcache.go:265] couldn't get current server API group list: Get "https://10.0.1.10:6443/api?timeout=32s": dial tcp 10.0.1.10:6443: connect: connection refused
E0117 09:10:43.479754 13492 memcache.go:265] couldn't get current server API group list: Get "https://10.0.1.10:6443/api?timeout=32s": dial tcp 10.0.1.10:6443: connect: connection refused
E0117 09:10:43.481822 13492 memcache.go:265] couldn't get current server API group list: Get "https://10.0.1.10:6443/api?timeout=32s": dial tcp 10.0.1.10:6443: connect: connection refused
The connection to the server 10.0.1.10:6443 was refused - did you specify the right host or port?
I set up the config in ~/.kube/config:
cat ~/.kube/config
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: secret_stuff
server: https://10.0.1.10:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: secret_stuff
client-key-data: secret_stuff
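Since connection refused means nothing is listening on the port, a minimal check (a sketch; the IP is taken from the kubeconfig above) is to see whether the apiserver port is open at all. Even a 403 from the health endpoint would prove something is listening:

# Is anything listening on the apiserver port?
sudo ss -tlnp | grep 6443
# If so, probe the health endpoint (-k because the serving cert is self-signed)
curl -k https://10.0.1.10:6443/healthz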
I tried flushing iptables, but something keeps restoring it. Suspicious iptables entries:
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- !localhost/8 localhost/8 /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
Chain KUBE-FORWARD (1 references)
target prot opt source destination
DROP all -- anywhere anywhere ctstate INVALID
Chain KUBE-SERVICES (2 references)
target prot opt source destination
REJECT tcp -- anywhere 10.96.0.1 /* default/kubernetes:https has no endpoints */ reject-with icmp-port-unreachable
REJECT udp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns has no endpoints */ reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns-tcp has no endpoints */ reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.96.0.10 /* kube-system/kube-dns:metrics has no endpoints */ reject-with icmp-port-unreachable
Could kubectl be getting blocked by iptables?
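For reference, a quick check (a sketch) for whether any rule touches the apiserver port directly; note that the KUBE-SERVICES REJECT entries above target the service VIPs (10.96.0.x), not port 6443 itself:

sudo iptables -S | grep 6443 || echo "no iptables rule touches 6443"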
Logs and outputs:
me@primary:~$ sudo kubeadm init --apiserver-advertise-address=172.105.148.95 --apiserver-cert-extra-sans=172.105.148.95 --pod-network-cidr=10.13.13.0/16 --node-name primary
[init] Using Kubernetes version: v1.29.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0119 09:35:48.771333 2949 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local primary] and IPs [10.96.0.1 172.105.148.95]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost primary] and IPs [172.105.148.95 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost primary] and IPs [172.105.148.95 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
@user23204747 – answering your question:
journalctl -fu kubelet
Jan 19 09:42:58 primary kubelet[3063]: E0119 09:42:58.312695 3063 kubelet_node_status.go:96] "Unable to register node with API server" err="Post \"https://172.105.148.95:6443/api/v1/nodes\": dial tcp 172.105.148.95:6443: connect: connection refused" node="primary"
Jan 19 09:42:59 primary kubelet[3063]: E0119 09:42:59.476195 3063 event.go:355] "Unable to write event (may retry after sleeping)" err="Post \"https://172.105.148.95:6443/api/v1/namespaces/default/events\": dial tcp 172.105.148.95:6443: connect: connection refused" event="&Event{ObjectMeta:{primary.17abb5f60468180b default 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Node,Namespace:,Name:primary,UID:primary,APIVersion:,ResourceVersion:,FieldPath:,},Reason:NodeHasSufficientPID,Message:Node primary status is now: NodeHasSufficientPID,Source:EventSource{Component:kubelet,Host:primary,},FirstTimestamp:2024-01-19 09:35:52.130377739 +0000 UTC m=+0.287533886,LastTimestamp:2024-01-19 09:35:52.130377739 +0000 UTC m=+0.287533886,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:kubelet,ReportingInstance:primary,}"
Jan 19 09:43:02 primary kubelet[3063]: E0119 09:43:02.208913 3063 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"primary\" not found"
Jan 19 09:43:05 primary kubelet[3063]: E0119 09:43:05.289531 3063 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://172.105.148.95:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/primary?timeout=10s\": dial tcp 172.105.148.95:6443: connect: connection refused" interval="7s"
Jan 19 09:43:05 primary kubelet[3063]: I0119 09:43:05.315765 3063 kubelet_node_status.go:73] "Attempting to register node" node="primary"
Jan 19 09:43:05 primary kubelet[3063]: E0119 09:43:05.424960 3063 kubelet_node_status.go:96] "Unable to register node with API server" err="Post \"https://172.105.148.95:6443/api/v1/nodes\": dial tcp 172.105.148.95:6443: connect: connection refused" node="primary"
Jan 19 09:43:05 primary kubelet[3063]: W0119 09:43:05.450439 3063 reflector.go:539] vendor/k8s.io/client-go/informers/factory.go:159: failed to list *v1.Service: Get "https://172.105.148.95:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 172.105.148.95:6443: connect: connection refused
Jan 19 09:43:05 primary kubelet[3063]: E0119 09:43:05.450569 3063 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://172.105.148.95:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 172.105.148.95:6443: connect: connection refused
Jan 19 09:43:09 primary kubelet[3063]: I0119 09:43:09.129122 3063 scope.go:117] "RemoveContainer" containerID="af6057fbd0e43b4628685f316cffef08e6ea4a6236223120ce825681b53a45ad"
Jan 19 09:43:09 primary kubelet[3063]: E0119 09:43:09.129843 3063 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=etcd pod=etcd-primary_kube-system(b8faf6a4553601b5610c06ed8c93f9e3)\"" pod="kube-system/etcd-primary" podUID="b8faf6a4553601b5610c06ed8c93f9e3"
Jan 19 09:43:09 primary kubelet[3063]: E0119 09:43:09.585849 3063 event.go:355] "Unable to write event (may retry after sleeping)" err="Post \"https://172.105.148.95:6443/api/v1/namespaces/default/events\": dial tcp 172.105.148.95:6443: connect: connection refused" event="&Event{ObjectMeta:{primary.17abb5f60468180b default 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Node,Namespace:,Name:primary,UID:primary,APIVersion:,ResourceVersion:,FieldPath:,},Reason:NodeHasSufficientPID,Message:Node primary status is now: NodeHasSufficientPID,Source:EventSource{Component:kubelet,Host:primary,},FirstTimestamp:2024-01-19 09:35:52.130377739 +0000 UTC m=+0.287533886,LastTimestamp:2024-01-19 09:35:52.130377739 +0000 UTC m=+0.287533886,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:kubelet,ReportingInstance:primary,}"
Jan 19 09:43:12 primary kubelet[3063]: I0119 09:43:12.128675 3063 scope.go:117] "RemoveContainer" containerID="a9185c62d88f3192935626d7213b4378816563580b4afa1ac14b74f4029b548c"
Jan 19 09:43:12 primary kubelet[3063]: E0119 09:43:12.209179 3063 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"primary\" not found"
Jan 19 09:43:12 primary kubelet[3063]: E0119 09:43:12.398807 3063 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://172.105.148.95:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/primary?timeout=10s\": dial tcp 172.105.148.95:6443: connect: connection refused" interval="7s"
Jan 19 09:43:12 primary kubelet[3063]: I0119 09:43:12.427248 3063 kubelet_node_status.go:73] "Attempting to register node" node="primary"
Jan 19 09:43:12 primary kubelet[3063]: E0119 09:43:12.551820 3063 kubelet_node_status.go:96] "Unable to register node with API server" err="Post \"https://172.105.148.95:6443/api/v1/nodes\": dial tcp 172.105.148.95:6443: connect: connection refused" node="primary"
Jan 19 09:43:14 primary kubelet[3063]: W0119 09:43:14.542312 3063 reflector.go:539] vendor/k8s.io/client-go/informers/factory.go:159: failed to list *v1.CSIDriver: Get "https://172.105.148.95:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 172.105.148.95:6443: connect: connection refused
Jan 19 09:43:14 primary kubelet[3063]: E0119 09:43:14.542441 3063 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://172.105.148.95:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 172.105.148.95:6443: connect: connection refused
Jan 19 09:43:17 primary kubelet[3063]: W0119 09:43:17.648986 3063 reflector.go:539] vendor/k8s.io/client-go/informers/factory.go:159: failed to list *v1.RuntimeClass: Get "https://172.105.148.95:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 172.105.148.95:6443: connect: connection refused
Jan 19 09:43:17 primary kubelet[3063]: E0119 09:43:17.649087 3063 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://172.105.148.95:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 172.105.148.95:6443: connect: connection refused
Jan 19 09:43:19 primary kubelet[3063]: E0119 09:43:19.507898 3063 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://172.105.148.95:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/primary?timeout=10s\": dial tcp 172.105.148.95:6443: connect: connection refused" interval="7s"
Jan 19 09:43:19 primary kubelet[3063]: I0119 09:43:19.554658 3063 kubelet_node_status.go:73] "Attempting to register node" node="primary"
Jan 19 09:43:19 primary kubelet[3063]: E0119 09:43:19.679757 3063 kubelet_node_status.go:96] "Unable to register node with API server" err="Post \"https://172.105.148.95:6443/api/v1/nodes\": dial tcp 172.105.148.95:6443: connect: connection refused" node="primary"
Jan 19 09:43:19 primary kubelet[3063]: E0119 09:43:19.694847 3063 event.go:355] "Unable to write event (may retry after sleeping)" err="Post \"https://172.105.148.95:6443/api/v1/namespaces/default/events\": dial tcp 172.105.148.95:6443: connect: connection refused" event="&Event{ObjectMeta:{primary.17abb5f60468180b default 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Node,Namespace:,Name:primary,UID:primary,APIVersion:,ResourceVersion:,FieldPath:,},Reason:NodeHasSufficientPID,Message:Node primary status is now: NodeHasSufficientPID,Source:EventSource{Component:kubelet,Host:primary,},FirstTimestamp:2024-01-19 09:35:52.130377739 +0000 UTC m=+0.287533886,LastTimestamp:2024-01-19 09:35:52.130377739 +0000 UTC m=+0.287533886,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:kubelet,ReportingInstance:primary,}"
Jan 19 09:43:20 primary kubelet[3063]: W0119 09:43:20.761463 3063 reflector.go:539] vendor/k8s.io/client-go/informers/factory.go:159: failed to list *v1.Node: Get "https://172.105.148.95:6443/api/v1/nodes?fieldSelector=metadata.name%3Dprimary&limit=500&resourceVersion=0": dial tcp 172.105.148.95:6443: connect: connection refused
Jan 19 09:43:20 primary kubelet[3063]: E0119 09:43:20.761607 3063 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.105.148.95:6443/api/v1/nodes?fieldSelector=metadata.name%3Dprimary&limit=500&resourceVersion=0": dial tcp 172.105.148.95:6443: connect: connection refused
Jan 19 09:43:21 primary kubelet[3063]: E0119 09:43:21.126858 3063 certificate_manager.go:562] kubernetes.io/kube-apiserver-client-kubelet: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Post "https://172.105.148.95:6443/apis/certificates.k8s.io/v1/certificatesigningrequests": dial tcp 172.105.148.95:6443: connect: connection refused
Jan 19 09:43:22 primary kubelet[3063]: E0119 09:43:22.209975 3063 eviction_manager.go:282] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"primary\" not found"
Jan 19 09:43:24 primary kubelet[3063]: I0119 09:43:24.129352 3063 scope.go:117] "RemoveContainer" containerID="af6057fbd0e43b4628685f316cffef08e6ea4a6236223120ce825681b53a45ad"
Jan 19 09:43:24 primary kubelet[3063]: E0119 09:43:24.130066 3063 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=etcd pod=etcd-primary_kube-system(b8faf6a4553601b5610c06ed8c93f9e3)\"" pod="kube-system/etcd-primary" podUID="b8faf6a4553601b5610c06ed8c93f9e3"
CRI output:
sudo crictl ps --all
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] validate service connection: validate CRI v1 runtime API for endpoint "unix:///var/run/dockershim.sock": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/run/dockershim.sock: connect: no such file or directory"
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] validate service connection: validate CRI v1 image API for endpoint "unix:///var/run/dockershim.sock": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/run/dockershim.sock: connect: no such file or directory"
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
58dcdbbbbdb57 53b148a9d1963 About a minute ago Exited kube-apiserver 17 aa7a744e3afca kube-apiserver-primary
af6057fbd0e43 a0eed15eed449 3 minutes ago Exited etcd 19 12ffe5fa72b5c etcd-primary
a20b25ee9f45e 406945b511542 3 minutes ago Running kube-scheduler 5 507b8c9d8b7ab kube-scheduler-primary
049545de2ba82 79d451ca186a6 3 minutes ago Running kube-controller-manager 4 75616efa726fd kube-controller-manager-primary
8fbf3e41714a7 79d451ca186a6 8 minutes ago Exited kube-controller-manager 3 ae5e4208a01ef kube-controller-manager-primary
4b13c766ae730 406945b511542 8 minutes ago Exited kube-scheduler 4 7e4f0405277e4 kube-scheduler-primary
Container logs: they are a bit messy, but toward the end they keep printing the same error message, as if retrying and failing:
W0119 09:43:29.388741 1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
W0119 09:43:30.103643 1 logging.go:59] [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
F0119 09:43:33.102160 1 instance.go:290] Error creating leases: error creating storage factory: context deadline exceeded
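The apiserver failing to reach 127.0.0.1:2379 matches the etcd CrashLoopBackOff in the kubelet log above, so etcd's own logs are the next thing to read. A sketch for pulling them (using the same runtime endpoint as the crictl output above; the --name filter assumes the container is named etcd, as it is in the listing):

# Find the most recent etcd container ID and dump its logs
ETCD_ID=$(sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a --name etcd -q | head -n1)
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs $ETCD_ID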
kubelet status:
sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Fri 2024-01-19 13:36:51 UTC; 12min ago
Docs: https://kubernetes.io/docs/home/
Main PID: 8539 (kubelet)
Tasks: 12 (limit: 37581)
Memory: 33.6M
CPU: 19.659s
CGroup: /system.slice/kubelet.service
└─8539 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/p>
Jan 19 13:49:18 primary kubelet[8539]: W0119 13:49:18.366543 8539 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://master-node:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup master-node on 127.0.0.53:53: server m>
Jan 19 13:49:18 primary kubelet[8539]: E0119 13:49:18.366695 8539 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://master-node:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup master-no>
Jan 19 13:49:19 primary kubelet[8539]: E0119 13:49:19.481598 8539 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://master-node:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/primary?timeout=10s\": dial tcp: lookup master-node on 127.0.0.53:>
Jan 19 13:49:19 primary kubelet[8539]: I0119 13:49:19.902400 8539 kubelet_node_status.go:70] "Attempting to register node" node="primary"
Jan 19 13:49:19 primary kubelet[8539]: E0119 13:49:19.904293 8539 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://master-node:6443/api/v1/nodes\": dial tcp: lookup master-node on 127.0.0.53:53: server misbehaving" node="primary"
Jan 19 13:49:21 primary kubelet[8539]: E0119 13:49:21.741936 8539 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"primary\" not found"
Jan 19 13:49:26 primary kubelet[8539]: E0119 13:49:26.261529 8539 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"primary.17abc31ca21e74b4", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:>
Jan 19 13:49:26 primary kubelet[8539]: E0119 13:49:26.483811 8539 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://master-node:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/primary?timeout=10s\": dial tcp: lookup master-node on 127.0.0.53:>
Jan 19 13:49:26 primary kubelet[8539]: I0119 13:49:26.907141 8539 kubelet_node_status.go:70] "Attempting to register node" node="primary"
Jan 19 13:49:26 primary kubelet[8539]: E0119 13:49:26.909431 8539 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://master-node:6443/api/v1/nodes\": dial tcp: lookup master-node on 127.0.0.53:53: server misbehaving" node="primary"
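Note that this later log points at https://master-node:6443 while the earlier one used 172.105.148.95, so /etc/kubernetes/kubelet.conf appears to be left over from an earlier init attempt. If so, a clean slate before retrying would look like this (a sketch, and destructive: it wipes the cluster state on this node; the extra rm lines are what kubeadm reset itself suggests cleaning up):

sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d $HOME/.kube/config
# then run kubeadm init again with the desired flags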
Answer 1
I would first check whether the apiserver pod is up and running, to rule out problems on that front. Try the steps below on the control-plane node.
First, is there anything suspicious in the kubelet logs?
journalctl -fu kubelet
Next, check the container runtime's logs for the pod.
sudo crictl ps --all # get container_id
sudo crictl logs container_id
You did not include the full kubeadm init command you ran, so make sure you set --apiserver-advertise-address correctly.
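For reference, a complete invocation might look like the sketch below. The addresses are assumptions taken from the question's bridge network, not values I can verify, and 10.244.0.0/16 is simply Flannel's default pod CIDR; use whatever your CNI expects:

sudo kubeadm init \
  --apiserver-advertise-address=10.0.1.10 \
  --pod-network-cidr=10.244.0.0/16 \
  --node-name primary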
Answer 2
A connection refused error usually means the request reaches the server, but no service is listening on the specified port. Are you sure the api-server started on your master node? Check the kube-apiserver status with the following command:
systemctl status kube-apiserver
Since the kubeconfig file is located at ~/.kube/config, make sure the KUBECONFIG environment variable points to the correct file. Make sure there is network connectivity between your client machine and the master node: verify that you can reach the master node's IP address from your machine.
Check the firewall and security groups. Make sure the ports required for Kubernetes API communication are open. By default, the Kubernetes API server listens on port 6443.
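If a host firewall were active (the question says UFW is inactive, so this is only for completeness), the control-plane ports documented by kubeadm could be opened like this:

sudo ufw allow 6443/tcp        # Kubernetes API server
sudo ufw allow 2379:2380/tcp   # etcd client / peer
sudo ufw allow 10250/tcp       # kubelet API
sudo ufw allow 10257/tcp       # kube-controller-manager
sudo ufw allow 10259/tcp       # kube-scheduler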
If you suspect there are problematic entries in iptables, inspect the rules with the following command:
iptables -L -n
- Look for any rules that might be interfering with Kubernetes API communication. You can temporarily flush the rules with iptables -F
- Check the status of kube-proxy on the master node using kubectl -n kube-system get pods -l k8s-app=kube-proxy
EDIT
This error is likely caused by: - The kubelet is not running - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
From the error message above, the connection refused error can be caused by a misconfiguration of the node; required cgroups being disabled can certainly cause problems with the kubelet.
- Check the status of the kubelet service on the node by running the following command.
systemctl status kubelet
- This command shows whether the kubelet has failed or is unhealthy. If the kubelet is not running, try starting it with systemctl start kubelet
- Validate the node configuration to make sure it is set up correctly, and check whether required cgroups are disabled or misconfigured.
A. If you are using containerd as the CRI, change SystemdCgroup = false to SystemdCgroup = true in the containerd config, then:
sudo systemctl restart containerd
sudo systemctl restart kubelet
restart your machine (may be optional)
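A concrete way to make that edit (a sketch; it assumes the default config path /etc/containerd/config.toml, and the first line is only needed if no config file exists yet):

# Generate a full default config if the file is missing
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
# Flip the cgroup driver to systemd
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

Then restart containerd and the kubelet as shown above.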
B. If you are using Docker as the CRI, open /etc/docker/daemon.json and change "exec-opts": ["native.cgroupdriver=systemd"] to "exec-opts": ["native.cgroupdriver=cgroupfs"], then:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
restart your machine (may be optional)