I'm having trouble with a new Rancher HA RKE2 installation and get a "404 Not Found" page. How can I fix this?

2024-6-23

I've never installed Rancher before, but I'm trying to set up a Rancher environment on an on-premises HA RKE2 cluster. I have an F5 as the load balancer, configured to handle ports 80, 443, 6443, and 9345. A DNS record named rancher-demo.localdomain.local points to the load balancer's IP address. I want to supply my own certificate files, and I've created such a certificate through our internal CA.

The cluster itself came up and is working. When I ran the install on the nodes other than the first, they used the DNS name that points to the LB's IP, so I know that part of the LB works.

kubectl get nodes

NAME                             STATUS   ROLES                       AGE   VERSION
rancher0001.localdomain.local    Ready    control-plane,etcd,master   25h   v1.26.12+rke2r1
rancher0002.localdomain.local    Ready    control-plane,etcd,master   25h   v1.26.12+rke2r1
rancher0003.localdomain.local    Ready    control-plane,etcd,master   25h   v1.26.12+rke2r1

Before installing Rancher, I ran the following commands:

kubectl create namespace cattle-system
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=~/tls.crt --key=~/tls.key
kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem=~/cacerts.pem

Finally, I installed Rancher:

helm install rancher rancher-stable/rancher --namespace cattle-system --set hostname=rancher-demo.localdomain.local --set bootstrapPassword=passwordgoeshere --set ingress.tls.source=secret --set privateCA=true

I don't remember the exact error, but I saw a timeout error shortly after running the install. It definitely did *some* of the install:

kubectl -n cattle-system rollout status deploy/rancher
deployment "rancher" successfully rolled out

kubectl get ns
NAME                                     STATUS   AGE
cattle-fleet-clusters-system             Active   5h18m
cattle-fleet-system                      Active   5h24m
cattle-global-data                       Active   5h25m
cattle-global-nt                         Active   5h25m
cattle-impersonation-system              Active   5h24m
cattle-provisioning-capi-system          Active   5h6m
cattle-system                            Active   5h29m
cluster-fleet-local-local-1a3d67d0a899   Active   5h18m
default                                  Active   25h
fleet-default                            Active   5h25m
fleet-local                              Active   5h26m
kube-node-lease                          Active   25h
kube-public                              Active   25h
kube-system                              Active   25h
local                                    Active   5h25m
p-c94zp                                  Active   5h24m
p-m64sb                                  Active   5h24m

kubectl get pods --all-namespaces
NAMESPACE             NAME                                                      READY   STATUS    RESTARTS        AGE
cattle-fleet-system   fleet-controller-56968b86b6-6xdng                         1/1     Running   0               5h19m
cattle-fleet-system   gitjob-7d68454468-tvcrt                                   1/1     Running   0               5h19m
cattle-system         rancher-64bdc898c7-56fpm                                  1/1     Running   0               5h27m
cattle-system         rancher-64bdc898c7-dl4cz                                  1/1     Running   0               5h27m
cattle-system         rancher-64bdc898c7-z55lh                                  1/1     Running   1 (5h25m ago)   5h27m
cattle-system         rancher-webhook-58d68fb97d-zpg2p                          1/1     Running   0               5h17m
kube-system           cloud-controller-manager-rancher0001.localdomain.local    1/1     Running   1 (22h ago)     25h
kube-system           cloud-controller-manager-rancher0002.localdomain.local    1/1     Running   1 (22h ago)     25h
kube-system           cloud-controller-manager-rancher0003.localdomain.local    1/1     Running   1 (22h ago)     25h
kube-system           etcd-rancher0001.localdomain.local                        1/1     Running   0               25h
kube-system           etcd-rancher0002.localdomain.local                        1/1     Running   3 (22h ago)     25h
kube-system           etcd-rancher0003.localdomain.local                        1/1     Running   3 (22h ago)     25h
kube-system           kube-apiserver-rancher0001.localdomain.local              1/1     Running   0               25h
kube-system           kube-apiserver-rancher0002.localdomain.local              1/1     Running   0               25h
kube-system           kube-apiserver-rancher0003.localdomain.local              1/1     Running   0               25h
kube-system           kube-controller-manager-rancher0001.localdomain.local     1/1     Running   1 (22h ago)     25h
kube-system           kube-controller-manager-rancher0002.localdomain.local     1/1     Running   1 (22h ago)     25h
kube-system           kube-controller-manager-rancher0003.localdomain.local     1/1     Running   0               25h
kube-system           kube-proxy-rancher0001.localdomain.local                  1/1     Running   0               25h
kube-system           kube-proxy-rancher0002.localdomain.local                  1/1     Running   0               25h
kube-system           kube-proxy-rancher0003.localdomain.local                  1/1     Running   0               25h
kube-system           kube-scheduler-rancher0001.localdomain.local              1/1     Running   1 (22h ago)     25h
kube-system           kube-scheduler-rancher0002.localdomain.local              1/1     Running   0               25h
kube-system           kube-scheduler-rancher0003.localdomain.local              1/1     Running   0               25h
kube-system           rke2-canal-2jngw                                          2/2     Running   0               25h
kube-system           rke2-canal-6qrc4                                          2/2     Running   0               25h
kube-system           rke2-canal-bk2f8                                          2/2     Running   0               25h
kube-system           rke2-coredns-rke2-coredns-565dfc7d75-87pjr                1/1     Running   0               25h
kube-system           rke2-coredns-rke2-coredns-565dfc7d75-wh64f                1/1     Running   0               25h
kube-system           rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-mlcln     1/1     Running   0               25h
kube-system           rke2-ingress-nginx-controller-6p8ll                       1/1     Running   0               22h
kube-system           rke2-ingress-nginx-controller-7pm5c                       1/1     Running   0               5h22m
kube-system           rke2-ingress-nginx-controller-brfwh                       1/1     Running   0               22h
kube-system           rke2-metrics-server-c9c78bd66-f5vrb                       1/1     Running   0               25h
kube-system           rke2-snapshot-controller-6f7bbb497d-vqg9s                 1/1     Running   0               22h
kube-system           rke2-snapshot-validation-webhook-65b5675d5c-dt22h         1/1     Running   0               22h

However, obviously (given the 404 Not Found page when I go to https://rancher-demo.localdomain.local), things aren't working right.

I've never set this up before, so I'm not sure how to troubleshoot it. I've spent hours going through various posts, but nothing I've found seems to match this particular problem.

Some things I've found:

kubectl -n cattle-system logs -f rancher-64bdc898c7-56fpm
2024/01/17 21:13:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
(repeats every 15 seconds)

kubectl get ingress --all-namespaces
No resources found
(I *know* there was an ingress at some point, I believe in cattle-system; now it's gone. I didn't remove it.)
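
One way to tell whether the Helm release ever rendered an Ingress (as opposed to one being created and later deleted) is to count the Ingress objects in the release manifest. This is only a sketch: it assumes the release name `rancher` and the namespace `cattle-system` from the install command above, and `count_ingresses` is a hypothetical helper name, not part of Helm or kubectl.

```shell
#!/bin/sh
# Hypothetical helper: count Ingress objects in a rendered manifest
# read from stdin. Matches only top-level "kind: Ingress" lines, so
# "kind: IngressClass" is not counted.
count_ingresses() {
  # grep -c prints 0 and exits non-zero when nothing matches;
  # "|| true" keeps the function's exit status at 0 in that case.
  grep -c '^kind: Ingress$' || true
}

# Usage against the live release (run manually on the cluster):
#   helm -n cattle-system get manifest rancher | count_ingresses
#   helm -n cattle-system get values rancher
#   kubectl -n cattle-system get ingress
```

If the count is 1 but `kubectl get ingress` still shows nothing, the object was presumably removed after the install; if it is 0, the chart never rendered one for the values that were passed.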

kubectl -n cattle-system describe service rancher
Name:              rancher
Namespace:         cattle-system
Labels:            app=rancher
                   app.kubernetes.io/managed-by=Helm
                   chart=rancher-2.7.9
                   heritage=Helm
                   release=rancher
Annotations:       meta.helm.sh/release-name: rancher
                   meta.helm.sh/release-namespace: cattle-system
Selector:          app=rancher
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.43.199.3
IPs:               10.43.199.3
Port:              http  80/TCP
TargetPort:        80/TCP
Endpoints:         10.42.0.26:80,10.42.1.22:80,10.42.1.23:80
Port:              https-internal  443/TCP
TargetPort:        444/TCP
Endpoints:         10.42.0.26:444,10.42.1.22:444,10.42.1.23:444
Session Affinity:  None
Events:            <none>

kubectl -n cattle-system logs -l app=rancher
2024/01/17 21:17:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:17:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:40 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:45.551484      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.646038      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] [updateClusterHealth] Failed to update cluster [local]: Internal error occurred: failed calling webhook "rancher.cattle.io.clusters.management.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": context deadline exceeded
E0117 21:19:52.882877      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:53.061671      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:53 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.23/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.23:443: i/o timeout
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:37.826713      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:37.918579      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:37 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0117 21:19:45.604537      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.713901      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.22]: dial tcp 10.42.0.26:443: i/o timeout
E0117 21:19:52.899035      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:52.968048      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:52 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
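
To make the repeated peer errors easier to read, the log lines can be condensed into one line per source/peer pair. The filter below is a rough sketch; `summarize_peer_failures` is a made-up helper name, and it only understands the exact "Failed to connect to peer" line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read Rancher log lines on stdin and print
# "<count> <local pod IP> -> <peer pod IP>" for each distinct pair
# found in "Failed to connect to peer" messages.
summarize_peer_failures() {
  # Extract "local -> peer" from each matching line, drop everything else.
  sed -n 's#.*Failed to connect to peer wss://\([0-9.]*\)/v3/connect \[local ID=\([0-9.]*\)].*#\2 -> \1#p' |
    sort | uniq -c |
    # Normalize the padded output of uniq -c.
    awk '{ printf "%d %s %s %s\n", $1, $2, $3, $4 }'
}

# Usage against the live pods (run manually on the cluster):
#   kubectl -n cattle-system logs -l app=rancher --tail=200 | summarize_peer_failures
```

In the excerpt above, every failing pair crosses between 10.42.0.x and 10.42.1.x, and there are no failures between 10.42.1.22 and 10.42.1.23. That might point at pod traffic between nodes being blocked rather than at Rancher itself, but that is a guess based on the addresses, not something I've confirmed.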

I'm sure I did something wrong, but I don't know what or how to fix it.
