I have a Kubernetes cluster at home with five nodes: four Google Compute Engine VMs (one controller and three worker nodes) plus one bare-metal local machine as an additional kube worker node. The cluster is up and all nodes report Ready.
- Self-managed cluster, set up following https://docs.projectcalico.org/getting-started/kubernetes/self-managed-public-cloud/gce
- Firewall rules were added allowing ingress and egress for all IPs (0.0.0.0/0) on all ports.
- The kube master node is advertised via the **--control-plane-endpoint IP:PORT** flag set to the master node's public IP, and the worker nodes join the cluster through that endpoint (a rough sketch follows below).
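For reference, a minimal sketch of that init/join flow under the stated assumptions (the placeholder IP, token, and hash stand in for real values):

```
# On the controller: advertise the control plane on the VM's public IP
sudo kubeadm init --control-plane-endpoint "<MASTER_PUBLIC_IP>:6443" \
  --pod-network-cidr=<POD_CIDR>

# On each worker (GCE VMs and the bare-metal machine alike)
sudo kubeadm join <MASTER_PUBLIC_IP>:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH>
```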
Problem: things break when I deploy an application. Containers on the GCE VM workers deploy correctly, but every pod scheduled on the local worker node gets stuck in the ContainerCreating state. Does anyone know what is wrong with this setup and how to fix it?
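A quick way to confirm the pattern from the controller (only pods assigned to the local node should match):

```
# List all pods stuck in ContainerCreating along with the node they were scheduled to
kubectl get pods --all-namespaces -o wide | grep ContainerCreating
```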
- This is the event output of `kubectl describe pod` for one of my pods:

```
Events:
Normal Scheduled default-scheduler Successfully assigned social-network/home-timeline-redis-6f4c5d55fc-tql2l to volatile
Warning FailedCreatePodSandBox 3m14s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "32b64e6efcaff6401b7b0a6936f005a00a53c19a2061b0a14906b8bc3a81bf20" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Warning FailedCreatePodSandBox 102s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1e95fa10d49abf5edc8693345256b91e88c31d1b6414761de80e6038cd7696a4" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Normal SandboxChanged 11s (x3 over 3m14s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 11s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8f5959966e4c25f94bd49b82e1fa6da33a114b1680eae8898ba6685f22e7d37f" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
```
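Before anything else, it is worth checking on the affected node exactly what the error points at: which CNI configs are installed and whether the Cilium agent socket exists at all. A minimal check, with paths taken from the error above (the label selector assumes a standard Cilium DaemonSet install):

```
# On the affected node (volatile): list the installed CNI configurations
ls -l /etc/cni/net.d/

# Check whether the Cilium agent socket the plugin is trying to reach exists
ls -l /var/run/cilium/cilium.sock

# From the controller: is a Cilium agent pod even scheduled on this node?
kubectl -n kube-system get pods -l k8s-app=cilium -o wide
```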
**Update**

I reset kubeadm on all nodes, removed cilium, and recreated the cluster with the Calico CNI. I also changed the pod CIDR, re-running init as:

```
sudo kubeadm init --pod-network-cidr=20.96.0.0/12 --control-plane-endpoint "34.89.7.120:6443"
```

That seems to have resolved the conflict with the host CIDR, but the pods on volatile (the local machine) are still stuck in ContainerCreating.
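After re-initializing like this, it is worth verifying that Calico's IPv4 pool actually matches the new --pod-network-cidr, since a manifest-installed pool does not automatically follow a kubeadm re-init. A minimal check from the controller (the pool name assumes a default Calico manifest install):

```
# CIDR of Calico's default IPv4 pool; should print 20.96.0.0/12
kubectl get ippools.crd.projectcalico.org default-ipv4-ippool \
  -o jsonpath='{.spec.cidr}{"\n"}'

# Cross-check the pod subnet kubeadm recorded at init time
kubectl -n kube-system get cm kubeadm-config -o yaml | grep -i podSubnet
```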
```
root@controller:~# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-744cfdf676-bh2nc 1/1 Running 0 12m 20.109.133.129 worker-2 <none> <none>
calico-node-frv5r 1/1 Running 0 12m 10.240.0.11 controller <none> <none>
calico-node-lplx6 1/1 Running 0 12m 10.240.0.20 worker-0 <none> <none>
calico-node-lwrdr 1/1 Running 0 12m 10.240.0.21 worker-1 <none> <none>
calico-node-ppczn 0/1 CrashLoopBackOff 7 12m 130.239.41.206 volatile <none> <none>
calico-node-zplwx 1/1 Running 0 12m 10.240.0.22 worker-2 <none> <none>
coredns-74ff55c5b-69mn2 1/1 Running 0 14m 20.105.55.194 controller <none> <none>
coredns-74ff55c5b-djczf 1/1 Running 0 14m 20.105.55.193 controller <none> <none>
etcd-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-apiserver-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-controller-manager-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-proxy-5vzdf 1/1 Running 0 13m 10.240.0.20 worker-0 <none> <none>
kube-proxy-d22q4 1/1 Running 0 13m 10.240.0.22 worker-2 <none> <none>
kube-proxy-hml5c 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-proxy-hw8kl 1/1 Running 0 13m 10.240.0.21 worker-1 <none> <none>
kube-proxy-zb6t7 1/1 Running 0 13m 130.239.41.206 volatile <none> <none>
kube-scheduler-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
```

```
root@controller:~# kubectl describe pod calico-node-ppczn -n kube-system
Name: calico-node-ppczn
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: volatile/130.239.41.206
Start Time: Mon, 04 Jan 2021 13:01:36 +0000
Labels: controller-revision-hash=89c447898
k8s-app=calico-node
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 130.239.41.206
IPs:
IP: 130.239.41.206
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://27f988847a484c5f74e000c4b8f473895b71ed49f27e0bf4fab4b425940951dc
Image: docker.io/calico/cni:v3.17.1
Image ID: docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Jan 2021 13:01:37 +0000
Finished: Mon, 04 Jan 2021 13:01:38 +0000
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
install-cni:
Container ID: docker://5629f6984cfe545864d187112a0c1f65e7bdb7dbfae9b4971579f420ab55b77b
Image: docker.io/calico/cni:v3.17.1
Image ID: docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/install
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Jan 2021 13:01:39 +0000
Finished: Mon, 04 Jan 2021 13:01:41 +0000
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
flexvol-driver:
Container ID: docker://3a4bf307a347926893aeb956717d84049af601fd4cc4aa7add6e182c85dc4e7c
Image: docker.io/calico/pod2daemon-flexvol:v3.17.1
Image ID: docker-pullable://calico/pod2daemon-flexvol@sha256:48f277d41c35dae051d7dd6f0ec8f64ac7ee6650e27102a41b0203a0c2ce6c6b
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Jan 2021 13:01:43 +0000
Finished: Mon, 04 Jan 2021 13:01:43 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Containers:
calico-node:
Container ID: docker://2576b2426c2a3fc4b6a972839a94872160c7ac5efa5b1159817be8d4ad4ddf60
Image: docker.io/calico/node:v3.17.1
Image ID: docker-pullable://calico/node@sha256:25e0b0495c0df3a7a06b6f9e92203c53e5b56c143ac1c885885ee84bf86285ff
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 04 Jan 2021 13:18:48 +0000
Finished: Mon, 04 Jan 2021 13:19:57 +0000
Ready: False
Restart Count: 9
Requests:
cpu: 250m
Liveness: exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
CALICO_IPV4POOL_VXLAN: Never
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_VXLANMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_WIREGUARDMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/sys/fs/ from sysfs (rw)
/var/lib/calico from var-lib-calico (rw)
/var/log/calico/cni from cni-log-dir (ro)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
sysfs:
Type: HostPath (bare host directory volume)
Path: /sys/fs/
HostPathType: DirectoryOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
cni-log-dir:
Type: HostPath (bare host directory volume)
Path: /var/log/calico/cni
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
HostPathType: DirectoryOrCreate
calico-node-token-8r94c:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-8r94c
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoSchedule op=Exists
:NoExecute op=Exists
CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22m default-scheduler Successfully assigned kube-system/calico-node-ppczn to volatile
Normal Pulled 22m kubelet Container image "docker.io/calico/cni:v3.17.1" already present on machine
Normal Created 22m kubelet Created container upgrade-ipam
Normal Started 22m kubelet Started container upgrade-ipam
Normal Pulled 21m kubelet Container image "docker.io/calico/cni:v3.17.1" already present on machine
Normal Started 21m kubelet Started container install-cni
Normal Created 21m kubelet Created container install-cni
Normal Pulled 21m kubelet Container image "docker.io/calico/pod2daemon-flexvol:v3.17.1" already present on machine
Normal Created 21m kubelet Created container flexvol-driver
Normal Started 21m kubelet Started container flexvol-driver
Normal Pulled 21m kubelet Container image "docker.io/calico/node:v3.17.1" already present on machine
Normal Created 21m kubelet Created container calico-node
Normal Started 21m kubelet Started container calico-node
Warning Unhealthy 21m (x2 over 21m) kubelet Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused
Warning Unhealthy 11m (x51 over 21m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
Warning DNSConfigForming 115s (x78 over 22m) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 130.239.40.2 130.239.40.3 2001:6b0:e:4040::2
```

Logs from calico-node-ppczn:

```
root@controller:~# kubectl logs calico-node-ppczn -n kube-system
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 379: Early log level set to info
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 395: Using NODENAME environment for node name
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 407: Determined node name: volatile
2021-01-04 13:17:38.011 [INFO][8] startup/startup.go 439: Checking datastore connection
2021-01-04 13:18:08.011 [INFO][8] startup/startup.go 454: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
```
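That last line is the key symptom: calico-node on volatile cannot reach the in-cluster API service IP (10.96.0.1:443), which is also why /var/lib/calico/nodename never gets written and the readiness probe keeps failing. A minimal connectivity check from the bare-metal node, with the IPs taken from the output above:

```
# The ClusterIP path calico-node uses (goes through kube-proxy's iptables rules)
curl -k --max-time 5 https://10.96.0.1:443/version

# For comparison: the public control-plane endpoint the node joined through
curl -k --max-time 5 https://34.89.7.120:6443/version
```

If the first call times out while the second succeeds, the ClusterIP path is broken on that node; on a hybrid setup like this, that is typically because the kubernetes service endpoint resolves to the controller's GCE-internal address, which is unreachable from outside the VPC.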
On the local machine:

```
root@volatile:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
39efaf54f558 k8s.gcr.io/pause:3.2 "/pause" 19 minutes ago Up 19 minutes k8s_POD_calico-node-ppczn_kube-system_7e98eb90-f581-4dbc-b877-da25bc2868f9_0
05bd9fa182e5 e3f6fcd87756 "/usr/local/bin/kube…" 20 minutes ago Up 20 minutes k8s_kube-proxy_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
ae11c77897b0 k8s.gcr.io/pause:3.2 "/pause" 20 minutes ago Up 20 minutes k8s_POD_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
root@volatile:~# docker logs 39efaf54f558
root@volatile:~# docker logs 05bd9fa182e5
I0104 13:00:51.131737 1 node.go:172] Successfully retrieved node IP: 130.239.41.206
I0104 13:00:51.132027 1 server_others.go:142] kube-proxy node IP is an IPv4 address (130.239.41.206), assume IPv4 operation
W0104 13:00:51.162536 1 server_others.go:578] Unknown proxy mode "", assuming iptables proxy
I0104 13:00:51.162615 1 server_others.go:185] Using iptables Proxier.
I0104 13:00:51.162797 1 server.go:650] Version: v1.20.1
I0104 13:00:51.163080 1 conntrack.go:52] Setting nf_conntrack_max to 262144
I0104 13:00:51.163289 1 config.go:315] Starting service config controller
I0104 13:00:51.163300 1 config.go:224] Starting endpoint slice config controller
I0104 13:00:51.163304 1 shared_informer.go:240] Waiting for caches to sync for service config
I0104 13:00:51.163311 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0104 13:00:51.263469 1 shared_informer.go:247] Caches are synced for endpoint slice config
I0104 13:00:51.263487 1 shared_informer.go:247] Caches are synced for service config
root@volatile:~# docker logs ae11c77897b0
root@volatile:~# ls /etc/cni/net.d/
10-calico.conflist calico-kubeconfig
root@volatile:~# ls /var/lib/calico/
root@volatile:~#
```
**Answer**
It looks like cilium is still configured on the host volatile, via a file in /etc/cni/net.d/*.conf. Cilium is a networking (CNI) plugin, one of many available for Kubernetes. One of those files probably contains something like:
```
{
  "name": "cilium",
  "type": "cilium-cni"
}
```
If that file ended up there by accident, remove it. You already appear to be running a competing networking plugin, Project Calico, and that alone should be sufficient. So remove the leftover config, recreate the calico-kube-controllers pod in the kube-system namespace, let it come up successfully, and then recreate your other pods.
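A minimal sketch of that cleanup (the Cilium config filename here is an assumption; check what `ls /etc/cni/net.d/` actually shows on volatile first):

```
# On volatile: move any leftover Cilium CNI config out of the way (name assumed)
sudo mv /etc/cni/net.d/05-cilium.conf /root/05-cilium.conf.bak

# From the controller: recreate calico-kube-controllers so it comes up cleanly
kubectl -n kube-system delete pod -l k8s-app=calico-kube-controllers

# Once it is Running again, recreate the stuck application pods, e.g.:
kubectl -n social-network delete pod --all
```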
If you do want to use Cilium on that host instead, follow the Cilium installation guide; once the agent is running again you should see /var/run/cilium/cilium.sock get created.