Pods stuck in ContainerCreating state in a self-managed Kubernetes cluster on Google Compute Engine (GCE) with an external kube node

I have a Kubernetes cluster with five nodes: four Google Compute Engine VMs (one controller and three worker nodes) and one bare-metal local machine at home (a kube worker node). The cluster is running and all nodes are in the Ready state.

  1. A self-managed cluster configured following https://docs.projectcalico.org/getting-started/kubernetes/self-managed-public-cloud/gce
  2. Firewall rules added allowing ingress and egress for all IPs (0.0.0.0/0) on all ports.
  3. The kube master node is advertised on the master node's public IP using the **--control-plane-endpoint IP:PORT** flag, and the worker nodes are joined based on it (a minimal sketch of this flow follows the list).
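
For reference, a minimal sketch of that init/join flow, assuming stock kubeadm defaults; the IP, token, and hash below are placeholders printed by kubeadm init, not real values from this cluster:

# On the controller: advertise the control plane on the master's public IP
sudo kubeadm init --control-plane-endpoint "<MASTER_PUBLIC_IP>:6443"

# On each worker (GCE VMs and the local bare-metal machine): join with the
# command that kubeadm init prints
sudo kubeadm join <MASTER_PUBLIC_IP>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>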

Problem: I run into trouble when deploying applications. All pods on the local worker node get stuck in the ContainerCreating state, while containers on the GCE VM workers deploy correctly. Does anyone know what is wrong with this setup and how to fix it?

  • This is the events output of `kubectl describe pod` for one of my pods:

Events: Successfully assigned social-network/home-timeline-redis-6f4c5d55fc-tql2l to volatile

Warning  FailedCreatePodSandBox  3m14s  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "32b64e6efcaff6401b7b0a6936f005a00a53c19a2061b0a14906b8bc3a81bf20" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
  
Warning  FailedCreatePodSandBox  102s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1e95fa10d49abf5edc8693345256b91e88c31d1b6414761de80e6038cd7696a4" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
  
Normal   SandboxChanged          11s (x3 over 3m14s)  kubelet  Pod sandbox changed, it will be killed and re-created.
 
Warning  FailedCreatePodSandBox  11s                  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8f5959966e4c25f94bd49b82e1fa6da33a114b1680eae8898ba6685f22e7d37f" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
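
The error indicates the kubelet on this node is still invoking the Cilium CNI plugin, which cannot find a running cilium agent. A quick way to confirm what the node is actually configured with (a hedged sketch using the standard CNI paths; adjust if your kubelet uses a custom --cni-conf-dir):

# On the affected node: list the CNI configs the kubelet picks up
# (the lexicographically first file in this directory generally wins)
ls -l /etc/cni/net.d/

# Check whether the cilium agent socket the plugin is trying to reach exists
ls -l /var/run/cilium/cilium.sock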

Update

I reset kubeadm on all nodes, removed cilium, and recreated the calico CNI. I also changed the container CIDR, running `sudo kubeadm init --pod-network-cidr=20.96.0.0/12 --control-plane-endpoint "34.89.7.120:6443"`, which seems to have resolved the conflict with the host CIDR. However, the pods on volatile (the local machine) are still stuck in ContainerCreating.
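
A sketch of that reset-and-reinstall sequence, assuming the generic Calico manifest (the manifest URL may differ for your Calico version):

# On every node: tear down the old cluster state and leftover CNI configs
sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d/*

# On the controller: re-initialize with a pod CIDR that does not overlap
# the host networks
sudo kubeadm init --pod-network-cidr=20.96.0.0/12 \
    --control-plane-endpoint "34.89.7.120:6443"

# Install the Calico CNI
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml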


root@controller:~# kubectl get pods -n kube-system -o wide
NAME                                       READY   STATUS             RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
calico-kube-controllers-744cfdf676-bh2nc   1/1     Running            0          12m   20.109.133.129   worker-2     <none>           <none>
calico-node-frv5r                          1/1     Running            0          12m   10.240.0.11      controller   <none>           <none>
calico-node-lplx6                          1/1     Running            0          12m   10.240.0.20      worker-0     <none>           <none>
calico-node-lwrdr                          1/1     Running            0          12m   10.240.0.21      worker-1     <none>           <none>
calico-node-ppczn                          0/1     CrashLoopBackOff   7          12m   130.239.41.206   volatile     <none>           <none>
calico-node-zplwx                          1/1     Running            0          12m   10.240.0.22      worker-2     <none>           <none>
coredns-74ff55c5b-69mn2                    1/1     Running            0          14m   20.105.55.194    controller   <none>           <none>
coredns-74ff55c5b-djczf                    1/1     Running            0          14m   20.105.55.193    controller   <none>           <none>
etcd-controller                            1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-apiserver-controller                  1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-controller-manager-controller         1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-proxy-5vzdf                           1/1     Running            0          13m   10.240.0.20      worker-0     <none>           <none>
kube-proxy-d22q4                           1/1     Running            0          13m   10.240.0.22      worker-2     <none>           <none>
kube-proxy-hml5c                           1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-proxy-hw8kl                           1/1     Running            0          13m   10.240.0.21      worker-1     <none>           <none>
kube-proxy-zb6t7                           1/1     Running            0          13m   130.239.41.206   volatile     <none>           <none>
kube-scheduler-controller                  1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>

root@controller:~# kubectl describe pod calico-node-ppczn -n kube-system
Name:                 calico-node-ppczn
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 volatile/130.239.41.206
Start Time:           Mon, 04 Jan 2021 13:01:36 +0000
Labels:               controller-revision-hash=89c447898
                      k8s-app=calico-node
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   130.239.41.206
IPs:
  IP:           130.239.41.206
Controlled By:  DaemonSet/calico-node
Init Containers:
  upgrade-ipam:
    Container ID:  docker://27f988847a484c5f74e000c4b8f473895b71ed49f27e0bf4fab4b425940951dc
    Image:         docker.io/calico/cni:v3.17.1
    Image ID:      docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 04 Jan 2021 13:01:37 +0000
      Finished:     Mon, 04 Jan 2021 13:01:38 +0000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:        (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
  install-cni:
    Container ID:  docker://5629f6984cfe545864d187112a0c1f65e7bdb7dbfae9b4971579f420ab55b77b
    Image:         docker.io/calico/cni:v3.17.1
    Image ID:      docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 04 Jan 2021 13:01:39 +0000
      Finished:     Mon, 04 Jan 2021 13:01:41 +0000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
      CNI_MTU:               <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      SLEEP:                 false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
  flexvol-driver:
    Container ID:   docker://3a4bf307a347926893aeb956717d84049af601fd4cc4aa7add6e182c85dc4e7c
    Image:          docker.io/calico/pod2daemon-flexvol:v3.17.1
    Image ID:       docker-pullable://calico/pod2daemon-flexvol@sha256:48f277d41c35dae051d7dd6f0ec8f64ac7ee6650e27102a41b0203a0c2ce6c6b
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 04 Jan 2021 13:01:43 +0000
      Finished:     Mon, 04 Jan 2021 13:01:43 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host/driver from flexvol-driver-host (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Containers:
  calico-node:
    Container ID:   docker://2576b2426c2a3fc4b6a972839a94872160c7ac5efa5b1159817be8d4ad4ddf60
    Image:          docker.io/calico/node:v3.17.1
    Image ID:       docker-pullable://calico/node@sha256:25e0b0495c0df3a7a06b6f9e92203c53e5b56c143ac1c885885ee84bf86285ff
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 04 Jan 2021 13:18:48 +0000
      Finished:     Mon, 04 Jan 2021 13:19:57 +0000
    Ready:          False
    Restart Count:  9
    Requests:
      cpu:      250m
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      DATASTORE_TYPE:                     kubernetes
      WAIT_FOR_DATASTORE:                 true
      NODENAME:                            (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      CALICO_IPV4POOL_IPIP:               Always
      CALICO_IPV4POOL_VXLAN:              Never
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_LOGSEVERITYSCREEN:            info
      FELIX_HEALTHENABLED:                true
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/ from sysfs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/log/calico/cni from cni-log-dir (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:  
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:  
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  sysfs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
  cni-log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/calico/cni
    HostPathType:  
  host-local-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks
    HostPathType:  
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  flexvol-driver-host:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    HostPathType:  DirectoryOrCreate
  calico-node-token-8r94c:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-node-token-8r94c
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 CriticalAddonsOnly op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Normal   Scheduled         22m                  default-scheduler  Successfully assigned kube-system/calico-node-ppczn to volatile
  Normal   Pulled            22m                  kubelet            Container image "docker.io/calico/cni:v3.17.1" already present on machine
  Normal   Created           22m                  kubelet            Created container upgrade-ipam
  Normal   Started           22m                  kubelet            Started container upgrade-ipam
  Normal   Pulled            21m                  kubelet            Container image "docker.io/calico/cni:v3.17.1" already present on machine
  Normal   Started           21m                  kubelet            Started container install-cni
  Normal   Created           21m                  kubelet            Created container install-cni
  Normal   Pulled            21m                  kubelet            Container image "docker.io/calico/pod2daemon-flexvol:v3.17.1" already present on machine
  Normal   Created           21m                  kubelet            Created container flexvol-driver
  Normal   Started           21m                  kubelet            Started container flexvol-driver
  Normal   Pulled            21m                  kubelet            Container image "docker.io/calico/node:v3.17.1" already present on machine
  Normal   Created           21m                  kubelet            Created container calico-node
  Normal   Started           21m                  kubelet            Started container calico-node
  Warning  Unhealthy         21m (x2 over 21m)    kubelet            Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused
  Warning  Unhealthy         11m (x51 over 21m)   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
  Warning  DNSConfigForming  115s (x78 over 22m)  kubelet            Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 130.239.40.2 130.239.40.3 2001:6b0:e:4040::2

calico-node-ppczn logs:

root@controller:~# kubectl logs calico-node-ppczn -n kube-system
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 379: Early log level set to info
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 395: Using NODENAME environment for node name
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 407: Determined node name: volatile
2021-01-04 13:17:38.011 [INFO][8] startup/startup.go 439: Checking datastore connection
2021-01-04 13:18:08.011 [INFO][8] startup/startup.go 454: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
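
The timeout on 10.96.0.1:443 means calico-node on volatile cannot reach the kube-apiserver through the cluster service IP, which kube-proxy should be translating to the apiserver's real endpoint. A hedged way to narrow this down from the node (10.96.0.1 is the default kubernetes service ClusterIP; 34.89.7.120:6443 is the endpoint that worked at join time):

# Can the node reach the apiserver via the service IP at all?
curl -k --connect-timeout 5 https://10.96.0.1:443/version

# Compare against the apiserver's real endpoint used during kubeadm join
curl -k --connect-timeout 5 https://34.89.7.120:6443/version

# Verify kube-proxy installed the NAT rules for the service IP
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1

If the second call succeeds but the first times out, the breakage is in the service NAT path or in return routing between the home network and the GCE VPC, rather than in Calico itself.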

On the local machine:

root@volatile:~# docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS               NAMES
39efaf54f558        k8s.gcr.io/pause:3.2   "/pause"                 19 minutes ago      Up 19 minutes                           k8s_POD_calico-node-ppczn_kube-system_7e98eb90-f581-4dbc-b877-da25bc2868f9_0
05bd9fa182e5        e3f6fcd87756           "/usr/local/bin/kube…"   20 minutes ago      Up 20 minutes                           k8s_kube-proxy_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
ae11c77897b0        k8s.gcr.io/pause:3.2   "/pause"                 20 minutes ago      Up 20 minutes                           k8s_POD_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
root@volatile:~# docker logs 39efaf54f558
root@volatile:~# docker logs 05bd9fa182e5
I0104 13:00:51.131737       1 node.go:172] Successfully retrieved node IP: 130.239.41.206
I0104 13:00:51.132027       1 server_others.go:142] kube-proxy node IP is an IPv4 address (130.239.41.206), assume IPv4 operation
W0104 13:00:51.162536       1 server_others.go:578] Unknown proxy mode "", assuming iptables proxy
I0104 13:00:51.162615       1 server_others.go:185] Using iptables Proxier.
I0104 13:00:51.162797       1 server.go:650] Version: v1.20.1
I0104 13:00:51.163080       1 conntrack.go:52] Setting nf_conntrack_max to 262144
I0104 13:00:51.163289       1 config.go:315] Starting service config controller
I0104 13:00:51.163300       1 config.go:224] Starting endpoint slice config controller
I0104 13:00:51.163304       1 shared_informer.go:240] Waiting for caches to sync for service config
I0104 13:00:51.163311       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0104 13:00:51.263469       1 shared_informer.go:247] Caches are synced for endpoint slice config 
I0104 13:00:51.263487       1 shared_informer.go:247] Caches are synced for service config 
root@volatile:~# docker logs ae11c77897b0
root@volatile:~# ls /etc/cni/net.d/
10-calico.conflist  calico-kubeconfig
root@volatile:~# ls /var/lib/calico/
root@volatile:~# 

Answer 1

On the host volatile, it appears you have cilium configured in /etc/cni/net.d/*.conf. Cilium is a networking plugin, one of many available for Kubernetes. One of those files probably contains something like:

{
    "name": "cilium",
    "type": "cilium-cni"
}

If that happened by accident, remove the file. You already appear to be running a competing networking plugin from Project Calico, and that alone should be sufficient. Then recreate the calico-kube-controllers pod in the kube-system namespace, let it come up successfully, and then recreate your other pods.
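
A minimal sketch of that cleanup, assuming the stale config lives in the standard CNI directory and that the pods carry the standard Calico manifest labels (the file name on your host may differ):

# On volatile: remove the leftover cilium CNI config so the kubelet falls
# back to the calico config written by the install-cni init container
sudo rm /etc/cni/net.d/*cilium*

# From the controller: recreate the Calico pods, then the application pods
kubectl -n kube-system delete pod -l k8s-app=calico-kube-controllers
kubectl -n kube-system delete pod -l k8s-app=calico-node --field-selector spec.nodeName=volatile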

If you do want to use Cilium on that host, follow the Cilium installation guide. After running it again, you should see that /var/run/cilium/cilium.sock has been created.
