Lutando com uma nova instalação do Rancher HA RKE2, obtendo uma página “404 Not Found”. Como posso solucionar isso?

2024-6-23 • tag-icon

Lutando com uma nova instalação do Rancher HA RKE2, obtendo uma página “404 Not Found”. Como posso solucionar isso?

Nunca instalei o Rancher antes, mas estou tentando configurar um ambiente Rancher em um cluster HA RKE2 local. Eu tenho um F5 como balanceador de carga e ele está configurado para lidar com as portas 80, 443, 6443 e 9345. Um registro DNS chamado rancher-demo.localdomain.local aponta para o endereço IP do balanceador de carga. Desejo fornecer meus próprios arquivos de certificado e criei esse certificado por meio de nossa CA interna.

O cluster em si foi operacionalizado e funciona. Quando executei a instalação nos nós diferentes do primeiro, eles usaram o nome DNS que aponta para o IP do LB, então sei que parte do LB funciona.

kubectl get nodes

NAME                             STATUS   ROLES                       AGE   VERSION
rancher0001.localdomain.local    Ready    control-plane,etcd,master   25h   v1.26.12+rke2r1
rancher0002.localdomain.local    Ready    control-plane,etcd,master   25h   v1.26.12+rke2r1
rancher0003.localdomain.local    Ready    control-plane,etcd,master   25h   v1.26.12+rke2r1

Antes de instalar o Rancher, executei os seguintes comandos:

kubectl create namespace cattle-system
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=~/tls.crt --key=~/tls.key
kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem=~/cacerts.pem

Finalmente, instalei o Rancher:

helm install rancher rancher-stable/rancher --namespace cattle-system --set hostname=rancher-demo.localdomain.local --set bootstrapPassword=passwordgoeshere --set ingress.tls.source=secret --set privateCA=true

Não me lembro do erro, mas vi um erro de tempo limite logo após executar a instalação. Definitivamente fez *parte* da instalação:

kubectl -n cattle-system rollout status deploy/rancher
deployment "rancher" successfully rolled out

kubectl get ns
NAME                                     STATUS   AGE
cattle-fleet-clusters-system             Active   5h18m
cattle-fleet-system                      Active   5h24m
cattle-global-data                       Active   5h25m
cattle-global-nt                         Active   5h25m
cattle-impersonation-system              Active   5h24m
cattle-provisioning-capi-system          Active   5h6m
cattle-system                            Active   5h29m
cluster-fleet-local-local-1a3d67d0a899   Active   5h18m
default                                  Active   25h
fleet-default                            Active   5h25m
fleet-local                              Active   5h26m
kube-node-lease                          Active   25h
kube-public                              Active   25h
kube-system                              Active   25h
local                                    Active   5h25m
p-c94zp                                  Active   5h24m
p-m64sb                                  Active   5h24m

kubectl get pods --all-namespaces
NAMESPACE             NAME                                                      READY   STATUS    RESTARTS        AGE
cattle-fleet-system   fleet-controller-56968b86b6-6xdng                         1/1     Running   0               5h19m
cattle-fleet-system   gitjob-7d68454468-tvcrt                                   1/1     Running   0               5h19m
cattle-system         rancher-64bdc898c7-56fpm                                  1/1     Running   0               5h27m
cattle-system         rancher-64bdc898c7-dl4cz                                  1/1     Running   0               5h27m
cattle-system         rancher-64bdc898c7-z55lh                                  1/1     Running   1 (5h25m ago)   5h27m
cattle-system         rancher-webhook-58d68fb97d-zpg2p                          1/1     Running   0               5h17m
kube-system           cloud-controller-manager-rancher0001.localdomain.local    1/1     Running   1 (22h ago)     25h
kube-system           cloud-controller-manager-rancher0002.localdomain.local    1/1     Running   1 (22h ago)     25h
kube-system           cloud-controller-manager-rancher0003.localdomain.local    1/1     Running   1 (22h ago)     25h
kube-system           etcd-rancher0001.localdomain.local                        1/1     Running   0               25h
kube-system           etcd-rancher0002.localdomain.local                        1/1     Running   3 (22h ago)     25h
kube-system           etcd-rancher0003.localdomain.local                        1/1     Running   3 (22h ago)     25h
kube-system           kube-apiserver-rancher0001.localdomain.local              1/1     Running   0               25h
kube-system           kube-apiserver-rancher0002.localdomain.local              1/1     Running   0               25h
kube-system           kube-apiserver-rancher0003.localdomain.local              1/1     Running   0               25h
kube-system           kube-controller-manager-rancher0001.localdomain.local     1/1     Running   1 (22h ago)     25h
kube-system           kube-controller-manager-rancher0002.localdomain.local     1/1     Running   1 (22h ago)     25h
kube-system           kube-controller-manager-rancher0003.localdomain.local     1/1     Running   0               25h
kube-system           kube-proxy-rancher0001.localdomain.local                  1/1     Running   0               25h
kube-system           kube-proxy-rancher0002.localdomain.local                  1/1     Running   0               25h
kube-system           kube-proxy-rancher0003.localdomain.local                  1/1     Running   0               25h
kube-system           kube-scheduler-rancher0001.localdomain.local              1/1     Running   1 (22h ago)     25h
kube-system           kube-scheduler-rancher0002.localdomain.local              1/1     Running   0               25h
kube-system           kube-scheduler-rancher0003.localdomain.local              1/1     Running   0               25h
kube-system           rke2-canal-2jngw                                          2/2     Running   0               25h
kube-system           rke2-canal-6qrc4                                          2/2     Running   0               25h
kube-system           rke2-canal-bk2f8                                          2/2     Running   0               25h
kube-system           rke2-coredns-rke2-coredns-565dfc7d75-87pjr                1/1     Running   0               25h
kube-system           rke2-coredns-rke2-coredns-565dfc7d75-wh64f                1/1     Running   0               25h
kube-system           rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-mlcln     1/1     Running   0               25h
kube-system           rke2-ingress-nginx-controller-6p8ll                       1/1     Running   0               22h
kube-system           rke2-ingress-nginx-controller-7pm5c                       1/1     Running   0               5h22m
kube-system           rke2-ingress-nginx-controller-brfwh                       1/1     Running   0               22h
kube-system           rke2-metrics-server-c9c78bd66-f5vrb                       1/1     Running   0               25h
kube-system           rke2-snapshot-controller-6f7bbb497d-vqg9s                 1/1     Running   0               22h
kube-system           rke2-snapshot-validation-webhook-65b5675d5c-dt22h         1/1     Running   0               22h

No entanto, obviamente (dada a página 404 Not Found quando vou parahttps://rancher-demo.localdomain.local) as coisas não estão funcionando direito.

Eu nunca configurei isso antes, então não tenho certeza de como solucionar isso. Passei horas pesquisando várias postagens, mas nada que encontrei parece corresponder a esse problema específico.

Algumas coisas que descobri:

kubectl -n cattle-system logs -f rancher-64bdc898c7-56fpm
2024/01/17 21:13:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
(repeats every 15 seconds)

kubectl get ingress --all-namespaces
No resources found
(I *know* there was an ingress at some point, I believe in cattle-system; now it's gone. I didn't remove it.)

kubectl -n cattle-system describe service rancher
Name:              rancher
Namespace:         cattle-system
Labels:            app=rancher
                   app.kubernetes.io/managed-by=Helm
                   chart=rancher-2.7.9
                   heritage=Helm
                   release=rancher
Annotations:       meta.helm.sh/release-name: rancher
                   meta.helm.sh/release-namespace: cattle-system
Selector:          app=rancher
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.43.199.3
IPs:               10.43.199.3
Port:              http  80/TCP
TargetPort:        80/TCP
Endpoints:         10.42.0.26:80,10.42.1.22:80,10.42.1.23:80
Port:              https-internal  443/TCP
TargetPort:        444/TCP
Endpoints:         10.42.0.26:444,10.42.1.22:444,10.42.1.23:444
Session Affinity:  None
Events:            <none>

kubectl -n cattle-system logs -l app=rancher
2024/01/17 21:17:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:17:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:40 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:45.551484      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.646038      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] [updateClusterHealth] Failed to update cluster [local]: Internal error occurred: failed calling webhook "rancher.cattle.io.clusters.management.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": context deadline exceeded
E0117 21:19:52.882877      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:53.061671      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:53 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.23/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.23:443: i/o timeout
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:37.826713      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:37.918579      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:37 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0117 21:19:45.604537      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.713901      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.22]: dial tcp 10.42.0.26:443: i/o timeout
E0117 21:19:52.899035      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:52.968048      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:52 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]

Tenho certeza de que fiz algo errado, mas não sei o que e não sei como solucionar isso ainda mais.

informação relacionada