Pods stuck in ContainerCreating state in a self-managed Kubernetes cluster on Google Compute Engine (GCE) with an external Kubernetes node

I have a Kubernetes cluster with 5 nodes: 4 Google Compute Engine VMs (one controller and 3 worker nodes) and a basic local machine at home (a Kubernetes worker node). The cluster is up and running and all nodes are in the Ready state.

  1. Self-managed cluster set up following https://docs.projectcalico.org/getting-started/kubernetes/self-managed-public-cloud/gce
  2. Firewall rules added for ingress and egress traffic from all IPs (0.0.0.0/0) on any port.
  3. I advertise the Kubernetes master node with `--control-plane-endpoint IP:PORT` set to the master node's public IP, and join the worker nodes against that endpoint (see the sketch after this list).
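For reference, the init/join flow in step 3 looks roughly like this (a sketch; `<MASTER_PUBLIC_IP>`, the token, and the hash are placeholders, and 6443 assumes the default API server port):

# On the controller:
sudo kubeadm init --control-plane-endpoint "<MASTER_PUBLIC_IP>:6443"

# On each worker (GCE VMs and the local machine), with the values
# printed by kubeadm init:
sudo kubeadm join <MASTER_PUBLIC_IP>:6443 --token <TOKEN> \
    --discovery-token-ca-cert-hash sha256:<HASH>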

Problem: When I deploy an application, all pods scheduled on the local worker node get stuck in the ContainerCreating state, while the containers on the GCE VM workers deploy correctly. Does anyone know what is wrong with this setup and how I can fix it?

  • This is the Events output from `kubectl describe pod` for one of my pods:

Events: Successfully assigned social-network/home-timeline-redis-6f4c5d55fc-tql2l to volatile

Warning  FailedCreatePodSandBox  3m14s  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "32b64e6efcaff6401b7b0a6936f005a00a53c19a2061b0a14906b8bc3a81bf20" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
  
Warning  FailedCreatePodSandBox  102s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1e95fa10d49abf5edc8693345256b91e88c31d1b6414761de80e6038cd7696a4" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
  
Normal   SandboxChanged          11s (x3 over 3m14s)  kubelet  Pod sandbox changed, it will be killed and re-created.
 
Warning  FailedCreatePodSandBox  11s                  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8f5959966e4c25f94bd49b82e1fa6da33a114b1680eae8898ba6685f22e7d37f" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
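Since the error complains about a missing /var/run/cilium/cilium.sock, a quick sanity check on the affected node is to look at which CNI configs the kubelet will pick up and whether the Cilium agent socket exists at all (a sketch; the paths are the CNI defaults):

# On the volatile node: the kubelet uses the lexicographically first
# .conf/.conflist file it finds in this directory.
ls /etc/cni/net.d/

# If a Cilium config is present, the agent socket must exist too.
ls -l /var/run/cilium/cilium.sock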

UPDATE

I reset kubeadm on all nodes, removed Cilium, and recreated the Calico CNI. I also changed the pod network CIDR, which seems to have resolved the conflict with the hosts' CIDR: `sudo kubeadm init --pod-network-cidr=20.96.0.0/12 --control-plane-endpoint "34.89.7.120:6443"`. But the pods on volatile (the local machine) are still stuck in ContainerCreating:
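For reference, the reset-and-reinstall sequence was roughly the following (a sketch assuming the standard Calico manifest; the reset has to run on every node):

# On every node: tear down kubeadm state and stale CNI configs.
sudo kubeadm reset
sudo rm -rf /etc/cni/net.d/*

# On the controller: re-initialize with a pod CIDR that does not overlap
# the hosts' own networks, then install Calico.
sudo kubeadm init --pod-network-cidr=20.96.0.0/12 \
    --control-plane-endpoint "34.89.7.120:6443"
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml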

> root@controller:~# kubectl get pods -n kube-system -o wide
NAME                                       READY   STATUS             RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
calico-kube-controllers-744cfdf676-bh2nc   1/1     Running            0          12m   20.109.133.129   worker-2     <none>           <none>
calico-node-frv5r                          1/1     Running            0          12m   10.240.0.11      controller   <none>           <none>
calico-node-lplx6                          1/1     Running            0          12m   10.240.0.20      worker-0     <none>           <none>
calico-node-lwrdr                          1/1     Running            0          12m   10.240.0.21      worker-1     <none>           <none>
calico-node-ppczn                          0/1     CrashLoopBackOff   7          12m   130.239.41.206   volatile     <none>           <none>
calico-node-zplwx                          1/1     Running            0          12m   10.240.0.22      worker-2     <none>           <none>
coredns-74ff55c5b-69mn2                    1/1     Running            0          14m   20.105.55.194    controller   <none>           <none>
coredns-74ff55c5b-djczf                    1/1     Running            0          14m   20.105.55.193    controller   <none>           <none>
etcd-controller                            1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-apiserver-controller                  1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-controller-manager-controller         1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-proxy-5vzdf                           1/1     Running            0          13m   10.240.0.20      worker-0     <none>           <none>
kube-proxy-d22q4                           1/1     Running            0          13m   10.240.0.22      worker-2     <none>           <none>
kube-proxy-hml5c                           1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-proxy-hw8kl                           1/1     Running            0          13m   10.240.0.21      worker-1     <none>           <none>
kube-proxy-zb6t7                           1/1     Running            0          13m   130.239.41.206   volatile     <none>           <none>
kube-scheduler-controller                  1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>

> root@controller:~# kubectl describe pod calico-node-ppczn -n kube-system
Name:                 calico-node-ppczn
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 volatile/130.239.41.206
Start Time:           Mon, 04 Jan 2021 13:01:36 +0000
Labels:               controller-revision-hash=89c447898
                      k8s-app=calico-node
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   130.239.41.206
IPs:
  IP:           130.239.41.206
Controlled By:  DaemonSet/calico-node
Init Containers:
  upgrade-ipam:
    Container ID:  docker://27f988847a484c5f74e000c4b8f473895b71ed49f27e0bf4fab4b425940951dc
    Image:         docker.io/calico/cni:v3.17.1
    Image ID:      docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 04 Jan 2021 13:01:37 +0000
      Finished:     Mon, 04 Jan 2021 13:01:38 +0000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:        (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
  install-cni:
    Container ID:  docker://5629f6984cfe545864d187112a0c1f65e7bdb7dbfae9b4971579f420ab55b77b
    Image:         docker.io/calico/cni:v3.17.1
    Image ID:      docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 04 Jan 2021 13:01:39 +0000
      Finished:     Mon, 04 Jan 2021 13:01:41 +0000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
      CNI_MTU:               <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      SLEEP:                 false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
  flexvol-driver:
    Container ID:   docker://3a4bf307a347926893aeb956717d84049af601fd4cc4aa7add6e182c85dc4e7c
    Image:          docker.io/calico/pod2daemon-flexvol:v3.17.1
    Image ID:       docker-pullable://calico/pod2daemon-flexvol@sha256:48f277d41c35dae051d7dd6f0ec8f64ac7ee6650e27102a41b0203a0c2ce6c6b
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 04 Jan 2021 13:01:43 +0000
      Finished:     Mon, 04 Jan 2021 13:01:43 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host/driver from flexvol-driver-host (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Containers:
  calico-node:
    Container ID:   docker://2576b2426c2a3fc4b6a972839a94872160c7ac5efa5b1159817be8d4ad4ddf60
    Image:          docker.io/calico/node:v3.17.1
    Image ID:       docker-pullable://calico/node@sha256:25e0b0495c0df3a7a06b6f9e92203c53e5b56c143ac1c885885ee84bf86285ff
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 04 Jan 2021 13:18:48 +0000
      Finished:     Mon, 04 Jan 2021 13:19:57 +0000
    Ready:          False
    Restart Count:  9
    Requests:
      cpu:      250m
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      DATASTORE_TYPE:                     kubernetes
      WAIT_FOR_DATASTORE:                 true
      NODENAME:                            (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      CALICO_IPV4POOL_IPIP:               Always
      CALICO_IPV4POOL_VXLAN:              Never
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_LOGSEVERITYSCREEN:            info
      FELIX_HEALTHENABLED:                true
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/ from sysfs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/log/calico/cni from cni-log-dir (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:  
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:  
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  sysfs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
  cni-log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/calico/cni
    HostPathType:  
  host-local-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks
    HostPathType:  
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  flexvol-driver-host:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    HostPathType:  DirectoryOrCreate
  calico-node-token-8r94c:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-node-token-8r94c
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 CriticalAddonsOnly op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Normal   Scheduled         22m                  default-scheduler  Successfully assigned kube-system/calico-node-ppczn to volatile
  Normal   Pulled            22m                  kubelet            Container image "docker.io/calico/cni:v3.17.1" already present on machine
  Normal   Created           22m                  kubelet            Created container upgrade-ipam
  Normal   Started           22m                  kubelet            Started container upgrade-ipam
  Normal   Pulled            21m                  kubelet            Container image "docker.io/calico/cni:v3.17.1" already present on machine
  Normal   Started           21m                  kubelet            Started container install-cni
  Normal   Created           21m                  kubelet            Created container install-cni
  Normal   Pulled            21m                  kubelet            Container image "docker.io/calico/pod2daemon-flexvol:v3.17.1" already present on machine
  Normal   Created           21m                  kubelet            Created container flexvol-driver
  Normal   Started           21m                  kubelet            Started container flexvol-driver
  Normal   Pulled            21m                  kubelet            Container image "docker.io/calico/node:v3.17.1" already present on machine
  Normal   Created           21m                  kubelet            Created container calico-node
  Normal   Started           21m                  kubelet            Started container calico-node
  Warning  Unhealthy         21m (x2 over 21m)    kubelet            Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused
  Warning  Unhealthy         11m (x51 over 21m)   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
  Warning  DNSConfigForming  115s (x78 over 22m)  kubelet            Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 130.239.40.2 130.239.40.3 2001:6b0:e:4040::2

calico-node-ppczn logs:

> root@controller:~# kubectl logs  calico-node-ppczn -n kube-system
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 379: Early log level set to info
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 395: Using NODENAME environment for node name
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 407: Determined node name: volatile
2021-01-04 13:17:38.011 [INFO][8] startup/startup.go 439: Checking datastore connection
2021-01-04 13:18:08.011 [INFO][8] startup/startup.go 454: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
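That timeout means calico-node on volatile cannot reach the API server through the service ClusterIP 10.96.0.1, which kube-proxy is supposed to DNAT to the API server's real address. A quick check from the volatile host (a sketch; any HTTP response, even 401/403, means the path works, while a hang points at broken DNAT or routing):

# Probe the kubernetes service ClusterIP directly.
curl -k --max-time 5 https://10.96.0.1:443/version

# Verify kube-proxy installed the DNAT rules for the kubernetes service.
sudo iptables-save | grep 10.96.0.1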

On the local machine:

 > root@volatile:~# docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS               NAMES
39efaf54f558        k8s.gcr.io/pause:3.2   "/pause"                 19 minutes ago      Up 19 minutes                           k8s_POD_calico-node-ppczn_kube-system_7e98eb90-f581-4dbc-b877-da25bc2868f9_0
05bd9fa182e5        e3f6fcd87756           "/usr/local/bin/kube…"   20 minutes ago      Up 20 minutes                           k8s_kube-proxy_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
ae11c77897b0        k8s.gcr.io/pause:3.2   "/pause"                 20 minutes ago      Up 20 minutes                           k8s_POD_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
> root@volatile:~# docker logs 39efaf54f558
> root@volatile:~# docker logs 05bd9fa182e5
I0104 13:00:51.131737       1 node.go:172] Successfully retrieved node IP: 130.239.41.206
I0104 13:00:51.132027       1 server_others.go:142] kube-proxy node IP is an IPv4 address (130.239.41.206), assume IPv4 operation
W0104 13:00:51.162536       1 server_others.go:578] Unknown proxy mode "", assuming iptables proxy
I0104 13:00:51.162615       1 server_others.go:185] Using iptables Proxier.
I0104 13:00:51.162797       1 server.go:650] Version: v1.20.1
I0104 13:00:51.163080       1 conntrack.go:52] Setting nf_conntrack_max to 262144
I0104 13:00:51.163289       1 config.go:315] Starting service config controller
I0104 13:00:51.163300       1 config.go:224] Starting endpoint slice config controller
I0104 13:00:51.163304       1 shared_informer.go:240] Waiting for caches to sync for service config
I0104 13:00:51.163311       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0104 13:00:51.263469       1 shared_informer.go:247] Caches are synced for endpoint slice config 
I0104 13:00:51.263487       1 shared_informer.go:247] Caches are synced for service config 
> root@volatile:~# docker logs ae11c77897b0
root@volatile:~# ls /etc/cni/net.d/
10-calico.conflist  calico-kubeconfig
root@volatile:~# ls /var/lib/calico/
root@volatile:~# 

Answer 1

On the volatile host it looks like you have Cilium configured in /etc/cni/net.d/*.conf. Cilium is a network plugin, one of many available for Kubernetes. One of these files probably contains something like:

{
    "name": "cilium",
    "type": "cilium-cni"
}

If this is accidental, delete that file. You already appear to be running a competing network plugin, Project Calico, which is apparently sufficient. Then re-create the calico-kube-controllers pod in the kube-system namespace, let it come up, and re-create the other pods afterwards.
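A minimal cleanup sketch, assuming the leftover file carries the usual Cilium config name (05-cilium.conf is a guess; list the directory first to see the real name):

# On the volatile host: find and remove the leftover Cilium CNI config.
ls /etc/cni/net.d/
sudo rm /etc/cni/net.d/05-cilium.conf   # hypothetical filename

# Then recreate the Calico pods so they reinstall their own config.
kubectl -n kube-system delete pod -l k8s-app=calico-kube-controllers
kubectl -n kube-system delete pod -l k8s-app=calico-node \
    --field-selector spec.nodeName=volatile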

If you intend to use Cilium on that host, go back to the Cilium installation guide. If you redo the installation, you will probably see that /var/run/cilium/cilium.sock has been created for you.
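A quick way to verify the agent afterwards (a sketch, assuming the Cilium DaemonSet is named cilium in kube-system):

# The socket should exist on the host once the agent is running.
ls -l /var/run/cilium/cilium.sock

# Check overall agent health from inside the agent pod.
kubectl -n kube-system exec ds/cilium -- cilium status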
