I have a Kubernetes cluster with 5 nodes: 4 Google Compute Engine VMs (one controller and 3 worker nodes) plus a basic local machine at my home (a Kube worker node). The cluster is up and running and all nodes are in the Ready state.
- Self-managed cluster set up following: https://docs.projectcalico.org/getting-started/kubernetes/self-managed-public-cloud/gce
- Firewall rules are added for Ingress and Egress for all IPs (0.0.0.0/0) and any port.
- I advertise the Kube master node with the **--control-plane-endpoint IP:PORT** flag set to the master node's public IP and join the worker nodes based on that (a sketch of these steps follows this list).
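For reference, a minimal sketch of the firewall and init/join steps above; the rule names, network, token and hash are illustrative placeholders, not values from my actual setup:
# Hypothetical permissive firewall rules on the default VPC network
gcloud compute firewall-rules create k8s-allow-all-ingress --network=default --direction=INGRESS --action=ALLOW --rules=all --source-ranges=0.0.0.0/0
gcloud compute firewall-rules create k8s-allow-all-egress --network=default --direction=EGRESS --action=ALLOW --rules=all --destination-ranges=0.0.0.0/0
# Init the control plane advertising its public IP, then join each worker against it
sudo kubeadm init --control-plane-endpoint "MASTER_PUBLIC_IP:6443" --pod-network-cidr=192.168.0.0/16
sudo kubeadm join MASTER_PUBLIC_IP:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>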
Problem: when I deploy an application, all pods on the local worker node get stuck in the ContainerCreating state, while the containers on the GCE VM workers are deployed correctly. Does anyone know what the problem is with this setup and how I can fix it?
- This is the events output for one of my pods, from kubectl describe pod:
Events: social-network/home-timeline-redis-6f4c5d55fc-tql2l successfully assigned to volatile
Warning FailedCreatePodSandBox 3m14s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "32b64e6efcaff6401b7b0a6936f005a00a53c19a2061b0a14906b8bc3a81bf20" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Warning FailedCreatePodSandBox 102s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1e95fa10d49abf5edc8693345256b91e88c31d1b6414761de80e6038cd7696a4" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Normal SandboxChanged 11s (x3 over 3m14s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 11s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8f5959966e4c25f94bd49b82e1fa6da33a114b1680eae8898ba6685f22e7d37f" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
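The error means the CNI plugin configured on that node is Cilium's and it cannot reach a running Cilium agent. Some hedged checks on the affected node (the grep pattern is illustrative):
# Which CNI configs does the kubelet see on this node?
ls -l /etc/cni/net.d/
# Is there a Cilium agent socket at all?
ls -l /var/run/cilium/
# Is a cilium (or calico) DaemonSet pod actually running on this node?
kubectl -n kube-system get pods -o wide | grep -E 'cilium|calico'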
UPDATE
I reset kubeadm on all nodes, removed Cilium, and re-created the Calico CNI. I also changed the pod CIDR, which seems to have resolved the conflict with the host CIDR:
sudo kubeadm init --pod-network-cidr=20.96.0.0/12 --control-plane-endpoint "34.89.7.120:6443"
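Roughly, the reset/re-init flow looks like this (a sketch; the Calico manifest URL, node cleanup steps, and join placeholders are assumptions, not commands copied verbatim from my terminal):
# On every node: tear down the old cluster state and stale CNI configs
sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d/*
# On the controller: re-init with the new pod CIDR and the public endpoint
sudo kubeadm init --pod-network-cidr=20.96.0.0/12 --control-plane-endpoint "34.89.7.120:6443"
# Install Calico (assumed manifest URL)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# Re-join each worker with the token printed by kubeadm init
sudo kubeadm join 34.89.7.120:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>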
But the pods on volatile (the local machine) still get stuck in ContainerCreating:
> root@controller:~# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-744cfdf676-bh2nc 1/1 Running 0 12m 20.109.133.129 worker-2 <none> <none>
calico-node-frv5r 1/1 Running 0 12m 10.240.0.11 controller <none> <none>
calico-node-lplx6 1/1 Running 0 12m 10.240.0.20 worker-0 <none> <none>
calico-node-lwrdr 1/1 Running 0 12m 10.240.0.21 worker-1 <none> <none>
calico-node-ppczn 0/1 CrashLoopBackOff 7 12m 130.239.41.206 volatile <none> <none>
calico-node-zplwx 1/1 Running 0 12m 10.240.0.22 worker-2 <none> <none>
coredns-74ff55c5b-69mn2 1/1 Running 0 14m 20.105.55.194 controller <none> <none>
coredns-74ff55c5b-djczf 1/1 Running 0 14m 20.105.55.193 controller <none> <none>
etcd-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-apiserver-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-controller-manager-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-proxy-5vzdf 1/1 Running 0 13m 10.240.0.20 worker-0 <none> <none>
kube-proxy-d22q4 1/1 Running 0 13m 10.240.0.22 worker-2 <none> <none>
kube-proxy-hml5c 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-proxy-hw8kl 1/1 Running 0 13m 10.240.0.21 worker-1 <none> <none>
kube-proxy-zb6t7 1/1 Running 0 13m 130.239.41.206 volatile <none> <none>
kube-scheduler-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
> root@controller:~# kubectl describe pod calico-node-ppczn -n kube-system
Name: calico-node-ppczn
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: volatile/130.239.41.206
Start Time: Mon, 04 Jan 2021 13:01:36 +0000
Labels: controller-revision-hash=89c447898
k8s-app=calico-node
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 130.239.41.206
IPs:
IP: 130.239.41.206
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://27f988847a484c5f74e000c4b8f473895b71ed49f27e0bf4fab4b425940951dc
Image: docker.io/calico/cni:v3.17.1
Image ID: docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Jan 2021 13:01:37 +0000
Finished: Mon, 04 Jan 2021 13:01:38 +0000
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
install-cni:
Container ID: docker://5629f6984cfe545864d187112a0c1f65e7bdb7dbfae9b4971579f420ab55b77b
Image: docker.io/calico/cni:v3.17.1
Image ID: docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/install
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Jan 2021 13:01:39 +0000
Finished: Mon, 04 Jan 2021 13:01:41 +0000
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
flexvol-driver:
Container ID: docker://3a4bf307a347926893aeb956717d84049af601fd4cc4aa7add6e182c85dc4e7c
Image: docker.io/calico/pod2daemon-flexvol:v3.17.1
Image ID: docker-pullable://calico/pod2daemon-flexvol@sha256:48f277d41c35dae051d7dd6f0ec8f64ac7ee6650e27102a41b0203a0c2ce6c6b
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Jan 2021 13:01:43 +0000
Finished: Mon, 04 Jan 2021 13:01:43 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Containers:
calico-node:
Container ID: docker://2576b2426c2a3fc4b6a972839a94872160c7ac5efa5b1159817be8d4ad4ddf60
Image: docker.io/calico/node:v3.17.1
Image ID: docker-pullable://calico/node@sha256:25e0b0495c0df3a7a06b6f9e92203c53e5b56c143ac1c885885ee84bf86285ff
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 04 Jan 2021 13:18:48 +0000
Finished: Mon, 04 Jan 2021 13:19:57 +0000
Ready: False
Restart Count: 9
Requests:
cpu: 250m
Liveness: exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
CALICO_IPV4POOL_VXLAN: Never
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_VXLANMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_WIREGUARDMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/sys/fs/ from sysfs (rw)
/var/lib/calico from var-lib-calico (rw)
/var/log/calico/cni from cni-log-dir (ro)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
sysfs:
Type: HostPath (bare host directory volume)
Path: /sys/fs/
HostPathType: DirectoryOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
cni-log-dir:
Type: HostPath (bare host directory volume)
Path: /var/log/calico/cni
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
HostPathType: DirectoryOrCreate
calico-node-token-8r94c:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-8r94c
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoSchedule op=Exists
:NoExecute op=Exists
CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22m default-scheduler Successfully assigned kube-system/calico-node-ppczn to volatile
Normal Pulled 22m kubelet Container image "docker.io/calico/cni:v3.17.1" already present on machine
Normal Created 22m kubelet Created container upgrade-ipam
Normal Started 22m kubelet Started container upgrade-ipam
Normal Pulled 21m kubelet Container image "docker.io/calico/cni:v3.17.1" already present on machine
Normal Started 21m kubelet Started container install-cni
Normal Created 21m kubelet Created container install-cni
Normal Pulled 21m kubelet Container image "docker.io/calico/pod2daemon-flexvol:v3.17.1" already present on machine
Normal Created 21m kubelet Created container flexvol-driver
Normal Started 21m kubelet Started container flexvol-driver
Normal Pulled 21m kubelet Container image "docker.io/calico/node:v3.17.1" already present on machine
Normal Created 21m kubelet Created container calico-node
Normal Started 21m kubelet Started container calico-node
Warning Unhealthy 21m (x2 over 21m) kubelet Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused
Warning Unhealthy 11m (x51 over 21m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
Warning DNSConfigForming 115s (x78 over 22m) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 130.239.40.2 130.239.40.3 2001:6b0:e:4040::2
calico-node-ppczn logs:
> root@controller:~# kubectl logs calico-node-ppczn -n kube-system
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 379: Early log level set to info
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 395: Using NODENAME environment for node name
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 407: Determined node name: volatile
2021-01-04 13:17:38.011 [INFO][8] startup/startup.go 439: Checking datastore connection
2021-01-04 13:18:08.011 [INFO][8] startup/startup.go 454: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
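That last line suggests calico-node on volatile cannot reach the in-cluster API service at 10.96.0.1:443. Some hedged connectivity checks from the local machine, using the endpoints that appear in the outputs above:
# Can volatile reach the apiserver via its ClusterIP (routed by kube-proxy's iptables rules)?
curl -k https://10.96.0.1:443/version
# Can it reach the control-plane endpoint directly?
curl -k https://34.89.7.120:6443/version
# Are kube-proxy's NAT rules for the kubernetes service present on this node?
sudo iptables-save | grep 10.96.0.1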
On the local machine:
> root@volatile:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
39efaf54f558 k8s.gcr.io/pause:3.2 "/pause" 19 minutes ago Up 19 minutes k8s_POD_calico-node-ppczn_kube-system_7e98eb90-f581-4dbc-b877-da25bc2868f9_0
05bd9fa182e5 e3f6fcd87756 "/usr/local/bin/kube…" 20 minutes ago Up 20 minutes k8s_kube-proxy_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
ae11c77897b0 k8s.gcr.io/pause:3.2 "/pause" 20 minutes ago Up 20 minutes k8s_POD_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
> root@volatile:~# docker logs 39efaf54f558
> root@volatile:~# docker logs 05bd9fa182e5
I0104 13:00:51.131737 1 node.go:172] Successfully retrieved node IP: 130.239.41.206
I0104 13:00:51.132027 1 server_others.go:142] kube-proxy node IP is an IPv4 address (130.239.41.206), assume IPv4 operation
W0104 13:00:51.162536 1 server_others.go:578] Unknown proxy mode "", assuming iptables proxy
I0104 13:00:51.162615 1 server_others.go:185] Using iptables Proxier.
I0104 13:00:51.162797 1 server.go:650] Version: v1.20.1
I0104 13:00:51.163080 1 conntrack.go:52] Setting nf_conntrack_max to 262144
I0104 13:00:51.163289 1 config.go:315] Starting service config controller
I0104 13:00:51.163300 1 config.go:224] Starting endpoint slice config controller
I0104 13:00:51.163304 1 shared_informer.go:240] Waiting for caches to sync for service config
I0104 13:00:51.163311 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0104 13:00:51.263469 1 shared_informer.go:247] Caches are synced for endpoint slice config
I0104 13:00:51.263487 1 shared_informer.go:247] Caches are synced for service config
> root@volatile:~# docker logs ae11c77897b0
root@volatile:~# ls /etc/cni/net.d/
10-calico.conflist calico-kubeconfig
root@volatile:~# ls /var/lib/calico/
root@volatile:~#
Answer 1
On the volatile host it looks like you have Cilium configured in /etc/cni/net.d/*.conf. That is a network plugin, one of many available for Kubernetes. One of these files probably contains something like:
{
  "name": "cilium",
  "type": "cilium-cni"
}
If this is accidental, remove that file. You already appear to be running a competing network plugin, Project Calico, which is apparently sufficient. Then recreate the calico-kube-controllers pod in the kube-system namespace, let it come up, and afterwards recreate the other pods, as sketched below.
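A minimal sketch of that cleanup, assuming an illustrative Cilium config file name and the pod label applied by the stock Calico manifest:
# On volatile: remove the leftover Cilium CNI config (file name is illustrative) and restart the kubelet
sudo rm /etc/cni/net.d/05-cilium.conf
sudo systemctl restart kubelet
# From the controller: recreate calico-kube-controllers, then the stuck application pods
kubectl -n kube-system delete pod -l k8s-app=calico-kube-controllers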
If you do intend to use Cilium on that host, go back to the Cilium installation guide. If you redo it, you will probably see that /var/run/cilium/cilium.sock has been created for you.