
I have never installed Rancher before, but I am trying to set up a Rancher environment on a local HA RKE2 cluster. I have an F5 as the load balancer, and it is configured to handle ports 80, 443, 6443, and 9345. A DNS record named rancher-demo.localdomain.local points to the load balancer's IP address. I want to bring my own certificate files and have created such a certificate through our internal certificate authority.
The cluster itself has been brought up and is working. When I ran the installation on the nodes other than the first one, they used the DNS name that points to the LB IP, so I know that part of the LB works.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
rancher0001.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
rancher0002.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
rancher0003.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
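For reference, the join config I used on the second and third nodes looks roughly like this (token redacted; exact values may differ slightly from what is on disk):
cat /etc/rancher/rke2/config.yaml
server: https://rancher-demo.localdomain.local:9345
token: <redacted>
tls-san:                                   # I believe I also added the LB name as a SAN
  - rancher-demo.localdomain.local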
Before installing Rancher, I ran the following commands:
kubectl create namespace cattle-system
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=~/tls.crt --key=~/tls.key
kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem=~/cacerts.pem
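To sanity-check that the ingress secret really contains the certificate from our internal CA, I believe something like this should print its subject, issuer, and validity dates:
kubectl -n cattle-system get secret tls-rancher-ingress -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -issuer -dates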
Finally, I installed Rancher:
helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher-demo.localdomain.local \
  --set bootstrapPassword=passwordgoeshere \
  --set ingress.tls.source=secret \
  --set privateCA=true
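If it helps, I can dump the values the release actually received with something like this (output omitted here):
helm get values rancher --namespace cattle-system
helm status rancher --namespace cattle-system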
I don't remember the exact error, but shortly after the install it reported a timeout. It definitely did kick off *some* of the installation:
kubectl -n cattle-system rollout status deploy/rancher
deployment "rancher" successfully rolled out
kubectl get ns
NAME STATUS AGE
cattle-fleet-clusters-system Active 5h18m
cattle-fleet-system Active 5h24m
cattle-global-data Active 5h25m
cattle-global-nt Active 5h25m
cattle-impersonation-system Active 5h24m
cattle-provisioning-capi-system Active 5h6m
cattle-system Active 5h29m
cluster-fleet-local-local-1a3d67d0a899 Active 5h18m
default Active 25h
fleet-default Active 5h25m
fleet-local Active 5h26m
kube-node-lease Active 25h
kube-public Active 25h
kube-system Active 25h
local Active 5h25m
p-c94zp Active 5h24m
p-m64sb Active 5h24m
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-fleet-system fleet-controller-56968b86b6-6xdng 1/1 Running 0 5h19m
cattle-fleet-system gitjob-7d68454468-tvcrt 1/1 Running 0 5h19m
cattle-system rancher-64bdc898c7-56fpm 1/1 Running 0 5h27m
cattle-system rancher-64bdc898c7-dl4cz 1/1 Running 0 5h27m
cattle-system rancher-64bdc898c7-z55lh 1/1 Running 1 (5h25m ago) 5h27m
cattle-system rancher-webhook-58d68fb97d-zpg2p 1/1 Running 0 5h17m
kube-system cloud-controller-manager-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system cloud-controller-manager-rancher0002.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system cloud-controller-manager-rancher0003.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system etcd-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system etcd-rancher0002.localdomain.local 1/1 Running 3 (22h ago) 25h
kube-system etcd-rancher0003.localdomain.local 1/1 Running 3 (22h ago) 25h
kube-system kube-apiserver-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system kube-apiserver-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-apiserver-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-controller-manager-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-controller-manager-rancher0002.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-controller-manager-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-scheduler-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-scheduler-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-scheduler-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system rke2-canal-2jngw 2/2 Running 0 25h
kube-system rke2-canal-6qrc4 2/2 Running 0 25h
kube-system rke2-canal-bk2f8 2/2 Running 0 25h
kube-system rke2-coredns-rke2-coredns-565dfc7d75-87pjr 1/1 Running 0 25h
kube-system rke2-coredns-rke2-coredns-565dfc7d75-wh64f 1/1 Running 0 25h
kube-system rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-mlcln 1/1 Running 0 25h
kube-system rke2-ingress-nginx-controller-6p8ll 1/1 Running 0 22h
kube-system rke2-ingress-nginx-controller-7pm5c 1/1 Running 0 5h22m
kube-system rke2-ingress-nginx-controller-brfwh 1/1 Running 0 22h
kube-system rke2-metrics-server-c9c78bd66-f5vrb 1/1 Running 0 25h
kube-system rke2-snapshot-controller-6f7bbb497d-vqg9s 1/1 Running 0 22h
kube-system rke2-snapshot-validation-webhook-65b5675d5c-dt22h 1/1 Running 0 22h
However, it is obvious (given the 404 Not Found page when I go to https://rancher-demo.localdomain.local) that things are not working correctly.
I have never set this up before, so I am not sure how to troubleshoot it. I have spent hours digging through various posts, but nothing I have found seems to match this particular problem.
A few things I have found:
kubectl -n cattle-system logs -f rancher-64bdc898c7-56fpm
2024/01/17 21:13:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
(repeats every 15 seconds)
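Those are pod IPs (10.42.1.23 is the pod the log above came from), so I think the next step is to test cross-node pod traffic directly. Something along these lines should reproduce the timeout if it is an overlay-network problem (assuming curl is available inside the rancher image; I have not captured the output here):
kubectl -n cattle-system exec rancher-64bdc898c7-56fpm -- curl -skv -m 5 https://10.42.0.26/ping   # the /ping path is a guess; the point is whether the TCP connection times out at all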
kubectl get ingress --all-namespaces
No resources found
(I *know* there was an ingress at some point, I believe in cattle-system; now it's gone. I didn't remove it.)
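To see whether the chart even rendered an Ingress object, I believe the stored release manifest should show it:
helm get manifest rancher --namespace cattle-system | grep -B 2 -A 10 'kind: Ingress'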
kubectl -n cattle-system describe service rancher
Name: rancher
Namespace: cattle-system
Labels: app=rancher
app.kubernetes.io/managed-by=Helm
chart=rancher-2.7.9
heritage=Helm
release=rancher
Annotations: meta.helm.sh/release-name: rancher
meta.helm.sh/release-namespace: cattle-system
Selector: app=rancher
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.199.3
IPs: 10.43.199.3
Port: http 80/TCP
TargetPort: 80/TCP
Endpoints: 10.42.0.26:80,10.42.1.22:80,10.42.1.23:80
Port: https-internal 443/TCP
TargetPort: 444/TCP
Endpoints: 10.42.0.26:444,10.42.1.22:444,10.42.1.23:444
Session Affinity: None
Events: <none>
kubectl -n cattle-system logs -l app=rancher
2024/01/17 21:17:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:17:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:40 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:45.551484 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.646038 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] [updateClusterHealth] Failed to update cluster [local]: Internal error occurred: failed calling webhook "rancher.cattle.io.clusters.management.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": context deadline exceeded
E0117 21:19:52.882877 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:53.061671 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:53 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.23/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.23:443: i/o timeout
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:37.826713 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:37.918579 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:37 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0117 21:19:45.604537 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.713901 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.22]: dial tcp 10.42.0.26:443: i/o timeout
E0117 21:19:52.899035 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:52.968048 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:52 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
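Given the rancher-webhook timeout and the metrics.k8s.io errors above, I also want to confirm that the webhook service has endpoints and whether the metrics APIService reports as available; I believe these are the right checks:
kubectl -n cattle-system get endpoints rancher-webhook
kubectl get apiservice v1beta1.metrics.k8s.io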
I am sure I did something wrong, but I don't know what, or how to troubleshoot this further.