
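I have never installed Rancher before, but I am trying to set up a Rancher environment on an on-premises HA RKE2 cluster. I have an F5 as the load balancer, configured to handle ports 80, 443, 6443, and 9345. A DNS record named rancher-demo.localdomain.local points to the load balancer's IP address. I am supplying my own certificate files, which I created through our internal CA.
The cluster itself is up and running. When I ran the install on the nodes other than the first one, I used the DNS name that points to the LB IP, so I know that at least part of the LB configuration works.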
kubectl get nodes
NAME STATUS ROLES AGE VERSION
rancher0001.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
rancher0002.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
rancher0003.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
Before installing Rancher, I ran the following commands:
kubectl create namespace cattle-system
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=~/tls.crt --key=~/tls.key
kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem=~/cacerts.pem
Finally, I installed Rancher:
helm install rancher rancher-stable/rancher --namespace cattle-system --set hostname=rancher-demo.localdomain.local --set bootstrapPassword=passwordgoeshere --set ingress.tls.source=secret --set privateCA=true
I don't remember the exact message, but shortly after running the install I got a timeout error. The install definitely completed *some* of its work:
kubectl -n cattle-system rollout status deploy/rancher
deployment "rancher" successfully rolled out
kubectl get ns
NAME STATUS AGE
cattle-fleet-clusters-system Active 5h18m
cattle-fleet-system Active 5h24m
cattle-global-data Active 5h25m
cattle-global-nt Active 5h25m
cattle-impersonation-system Active 5h24m
cattle-provisioning-capi-system Active 5h6m
cattle-system Active 5h29m
cluster-fleet-local-local-1a3d67d0a899 Active 5h18m
default Active 25h
fleet-default Active 5h25m
fleet-local Active 5h26m
kube-node-lease Active 25h
kube-public Active 25h
kube-system Active 25h
local Active 5h25m
p-c94zp Active 5h24m
p-m64sb Active 5h24m
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-fleet-system fleet-controller-56968b86b6-6xdng 1/1 Running 0 5h19m
cattle-fleet-system gitjob-7d68454468-tvcrt 1/1 Running 0 5h19m
cattle-system rancher-64bdc898c7-56fpm 1/1 Running 0 5h27m
cattle-system rancher-64bdc898c7-dl4cz 1/1 Running 0 5h27m
cattle-system rancher-64bdc898c7-z55lh 1/1 Running 1 (5h25m ago) 5h27m
cattle-system rancher-webhook-58d68fb97d-zpg2p 1/1 Running 0 5h17m
kube-system cloud-controller-manager-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system cloud-controller-manager-rancher0002.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system cloud-controller-manager-rancher0003.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system etcd-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system etcd-rancher0002.localdomain.local 1/1 Running 3 (22h ago) 25h
kube-system etcd-rancher0003.localdomain.local 1/1 Running 3 (22h ago) 25h
kube-system kube-apiserver-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system kube-apiserver-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-apiserver-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-controller-manager-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-controller-manager-rancher0002.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-controller-manager-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-scheduler-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-scheduler-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-scheduler-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system rke2-canal-2jngw 2/2 Running 0 25h
kube-system rke2-canal-6qrc4 2/2 Running 0 25h
kube-system rke2-canal-bk2f8 2/2 Running 0 25h
kube-system rke2-coredns-rke2-coredns-565dfc7d75-87pjr 1/1 Running 0 25h
kube-system rke2-coredns-rke2-coredns-565dfc7d75-wh64f 1/1 Running 0 25h
kube-system rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-mlcln 1/1 Running 0 25h
kube-system rke2-ingress-nginx-controller-6p8ll 1/1 Running 0 22h
kube-system rke2-ingress-nginx-controller-7pm5c 1/1 Running 0 5h22m
kube-system rke2-ingress-nginx-controller-brfwh 1/1 Running 0 22h
kube-system rke2-metrics-server-c9c78bd66-f5vrb 1/1 Running 0 25h
kube-system rke2-snapshot-controller-6f7bbb497d-vqg9s 1/1 Running 0 22h
kube-system rke2-snapshot-validation-webhook-65b5675d5c-dt22h 1/1 Running 0 22h
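However, things are clearly not working properly when I go to https://rancher-demo.localdomain.local.
I have never done this kind of setup before, so I am not sure how to troubleshoot it. I have spent hours searching through various posts, but nothing I found seems to match this particular problem.
A few things I have found: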
kubectl -n cattle-system logs -f rancher-64bdc898c7-56fpm
2024/01/17 21:13:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
(repeats every 15 seconds)
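Since the same error repeats, it may be easier to read as a count per source and destination pod. The following is only a minimal sketch: the function name `summarize_peer_failures` is hypothetical, and it assumes the log lines keep exactly the format shown above.

```shell
# Condense "Failed to connect to peer" log lines into "local -> peer count".
# Reads Rancher pod log text on stdin. The function name is hypothetical.
summarize_peer_failures() {
  # Capture the peer IP (\1) and the local pod IP (\2) from each matching line,
  # then count how often each local -> peer pair occurs.
  sed -n 's|.*Failed to connect to peer wss://\([0-9.]*\)/v3/connect \[local ID=\([0-9.]*\)].*|\2 -> \1|p' \
    | sort | uniq -c | awk '{print $2, $3, $4, $1}'
}
```

It could be fed from the pods directly, for example with `kubectl -n cattle-system logs -l app=rancher --tail=200 | summarize_peer_failures`. The `--tail` flag matters here because `kubectl logs` returns only the last 10 lines per pod when a label selector is used.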
kubectl get ingress --all-namespaces
No resources found
(I *know* there was an ingress at some point, I believe in cattle-system; now it's gone. I didn't remove it.)
kubectl -n cattle-system describe service rancher
Name: rancher
Namespace: cattle-system
Labels: app=rancher
app.kubernetes.io/managed-by=Helm
chart=rancher-2.7.9
heritage=Helm
release=rancher
Annotations: meta.helm.sh/release-name: rancher
meta.helm.sh/release-namespace: cattle-system
Selector: app=rancher
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.199.3
IPs: 10.43.199.3
Port: http 80/TCP
TargetPort: 80/TCP
Endpoints: 10.42.0.26:80,10.42.1.22:80,10.42.1.23:80
Port: https-internal 443/TCP
TargetPort: 444/TCP
Endpoints: 10.42.0.26:444,10.42.1.22:444,10.42.1.23:444
Session Affinity: None
Events: <none>
kubectl -n cattle-system logs -l app=rancher
2024/01/17 21:17:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:17:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:40 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:45.551484 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.646038 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] [updateClusterHealth] Failed to update cluster [local]: Internal error occurred: failed calling webhook "rancher.cattle.io.clusters.management.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": context deadline exceeded
E0117 21:19:52.882877 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:53.061671 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:53 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.23/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.23:443: i/o timeout
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:37.826713 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:37.918579 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:37 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0117 21:19:45.604537 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.713901 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.22]: dial tcp 10.42.0.26:443: i/o timeout
E0117 21:19:52.899035 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:52.968048 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:52 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
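One thing that may be worth checking in the peer errors above is whether they only occur between pods on different nodes. The sketch below labels each failed connection by comparing the first three octets of the two pod IPs. It rests on an assumption that is not confirmed anywhere in this post: that each node owns a single /24 slice of the pod network, so a different third octet means a different node. The function name `classify_peer_failures` is hypothetical.

```shell
# Label each failed peer connection as same-subnet or cross-subnet.
# ASSUMPTION: every node owns one /24 of the pod network, so pods whose
# first three octets differ are running on different nodes.
classify_peer_failures() {
  # Emit "local peer" pairs from the matching log lines on stdin.
  sed -n 's|.*Failed to connect to peer wss://\([0-9.]*\)/v3/connect \[local ID=\([0-9.]*\)].*|\2 \1|p' \
    | awk '{
        split($1, a, "."); split($2, b, ".")
        local_net = a[1] "." a[2] "." a[3]
        peer_net  = b[1] "." b[2] "." b[3]
        kind = (local_net == peer_net) ? "same-subnet" : "cross-subnet"
        print kind, $1, "->", $2
      }' \
    | sort | uniq -c
}
```

If every line comes back as cross-subnet, that would be consistent with pod-to-pod traffic between nodes being blocked, although this output alone cannot prove it.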
I am sure I did something wrong, but I don't know what, and I don't know how to troubleshoot this any further.