
I have never installed Rancher before, but I am trying to set up a Rancher environment on a local HA RKE2 cluster. I have an F5 as the load balancer, and it is configured to handle ports 80, 443, 6443, and 9345. A DNS record named rancher-demo.localdomain.local points to the load balancer's IP address. I want to bring my own certificate files and have created such a certificate through our internal certificate authority.
The cluster itself has been brought up and is working. When I ran the installation on the nodes other than the first one, they used the DNS name that points to the LB IP, so I know that part of the LB works.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
rancher0001.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
rancher0002.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
rancher0003.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
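For reference, the join config I used on the second and third nodes looks roughly like this (token redacted; exact values may differ slightly from what is on disk):
cat /etc/rancher/rke2/config.yaml
server: https://rancher-demo.localdomain.local:9345
token: <redacted>
tls-san:                                   # I believe I also added the LB name as a SAN
  - rancher-demo.localdomain.local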
Before installing Rancher, I ran the following commands:
kubectl create namespace cattle-system
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=~/tls.crt --key=~/tls.key
kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem=~/cacerts.pem
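To sanity-check that the ingress secret really contains the certificate from our internal CA, I believe something like this should print its subject, issuer, and validity dates:
kubectl -n cattle-system get secret tls-rancher-ingress -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -issuer -dates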
Finally, I installed Rancher:
helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher-demo.localdomain.local \
  --set bootstrapPassword=passwordgoeshere \
  --set ingress.tls.source=secret \
  --set privateCA=true
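If it helps, I can dump the values the release actually received with something like this (output omitted here):
helm get values rancher --namespace cattle-system
helm status rancher --namespace cattle-system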
I don't remember the exact error, but shortly after the install it reported a timeout. It definitely did kick off *some* of the installation:
kubectl -n cattle-system rollout status deploy/rancher
deployment "rancher" successfully rolled out
kubectl get ns
NAME STATUS AGE
cattle-fleet-clusters-system Active 5h18m
cattle-fleet-system Active 5h24m
cattle-global-data Active 5h25m
cattle-global-nt Active 5h25m
cattle-impersonation-system Active 5h24m
cattle-provisioning-capi-system Active 5h6m
cattle-system Active 5h29m
cluster-fleet-local-local-1a3d67d0a899 Active 5h18m
default Active 25h
fleet-default Active 5h25m
fleet-local Active 5h26m
kube-node-lease Active 25h
kube-public Active 25h
kube-system Active 25h
local Active 5h25m
p-c94zp Active 5h24m
p-m64sb Active 5h24m
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-fleet-system fleet-controller-56968b86b6-6xdng 1/1 Running 0 5h19m
cattle-fleet-system gitjob-7d68454468-tvcrt 1/1 Running 0 5h19m
cattle-system rancher-64bdc898c7-56fpm 1/1 Running 0 5h27m
cattle-system rancher-64bdc898c7-dl4cz 1/1 Running 0 5h27m
cattle-system rancher-64bdc898c7-z55lh 1/1 Running 1 (5h25m ago) 5h27m
cattle-system rancher-webhook-58d68fb97d-zpg2p 1/1 Running 0 5h17m
kube-system cloud-controller-manager-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system cloud-controller-manager-rancher0002.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system cloud-controller-manager-rancher0003.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system etcd-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system etcd-rancher0002.localdomain.local 1/1 Running 3 (22h ago) 25h
kube-system etcd-rancher0003.localdomain.local 1/1 Running 3 (22h ago) 25h
kube-system kube-apiserver-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system kube-apiserver-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-apiserver-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-controller-manager-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-controller-manager-rancher0002.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-controller-manager-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-scheduler-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-scheduler-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-scheduler-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system rke2-canal-2jngw 2/2 Running 0 25h
kube-system rke2-canal-6qrc4 2/2 Running 0 25h
kube-system rke2-canal-bk2f8 2/2 Running 0 25h
kube-system rke2-coredns-rke2-coredns-565dfc7d75-87pjr 1/1 Running 0 25h
kube-system rke2-coredns-rke2-coredns-565dfc7d75-wh64f 1/1 Running 0 25h
kube-system rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-mlcln 1/1 Running 0 25h
kube-system rke2-ingress-nginx-controller-6p8ll 1/1 Running 0 22h
kube-system rke2-ingress-nginx-controller-7pm5c 1/1 Running 0 5h22m
kube-system rke2-ingress-nginx-controller-brfwh 1/1 Running 0 22h
kube-system rke2-metrics-server-c9c78bd66-f5vrb 1/1 Running 0 25h
kube-system rke2-snapshot-controller-6f7bbb497d-vqg9s 1/1 Running 0 22h
kube-system rke2-snapshot-validation-webhook-65b5675d5c-dt22h 1/1 Running 0 22h
However, it is obvious (given the 404 Not Found page when I go to https://rancher-demo.localdomain.local) that things are not working correctly.
I have never set this up before, so I am not sure how to troubleshoot it. I have spent hours digging through various posts, but nothing I have found seems to match this particular problem.
A few things I have found:
kubectl -n cattle-system logs -f rancher-64bdc898c7-56fpm
2024/01/17 21:13:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
(repeats every 15 seconds)
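Those are pod IPs (10.42.1.23 is the pod the log above came from), so I think the next step is to test cross-node pod traffic directly. Something along these lines should reproduce the timeout if it is an overlay-network problem (assuming curl is available inside the rancher image; I have not captured the output here):
kubectl -n cattle-system exec rancher-64bdc898c7-56fpm -- curl -skv -m 5 https://10.42.0.26/ping   # the /ping path is a guess; the point is whether the TCP connection times out at all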
kubectl get ingress --all-namespaces
No resources found
(I *know* there was an ingress at some point, I believe in cattle-system; now it's gone. I didn't remove it.)
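To see whether the chart even rendered an Ingress object, I believe the stored release manifest should show it:
helm get manifest rancher --namespace cattle-system | grep -B 2 -A 10 'kind: Ingress'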
kubectl -n cattle-system describe service rancher
Name: rancher
Namespace: cattle-system
Labels: app=rancher
app.kubernetes.io/managed-by=Helm
chart=rancher-2.7.9
heritage=Helm
release=rancher
Annotations: meta.helm.sh/release-name: rancher
meta.helm.sh/release-namespace: cattle-system
Selector: app=rancher
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.199.3
IPs: 10.43.199.3
Port: http 80/TCP
TargetPort: 80/TCP
Endpoints: 10.42.0.26:80,10.42.1.22:80,10.42.1.23:80
Port: https-internal 443/TCP
TargetPort: 444/TCP
Endpoints: 10.42.0.26:444,10.42.1.22:444,10.42.1.23:444
Session Affinity: None
Events: <none>
kubectl -n cattle-system logs -l app=rancher
2024/01/17 21:17:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:17:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:40 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:45.551484 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.646038 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] [updateClusterHealth] Failed to update cluster [local]: Internal error occurred: failed calling webhook "rancher.cattle.io.clusters.management.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": context deadline exceeded
E0117 21:19:52.882877 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:53.061671 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:53 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.23/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.23:443: i/o timeout
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:37.826713 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:37.918579 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:37 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0117 21:19:45.604537 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.713901 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.22]: dial tcp 10.42.0.26:443: i/o timeout
E0117 21:19:52.899035 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:52.968048 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:52 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
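Given the rancher-webhook timeout and the metrics.k8s.io errors above, I also want to confirm that the webhook service has endpoints and whether the metrics APIService reports as available; I believe these are the right checks:
kubectl -n cattle-system get endpoints rancher-webhook
kubectl get apiservice v1beta1.metrics.k8s.io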
I am sure I did something wrong, but I don't know what, or how to troubleshoot this further.