
Nunca instalei o Rancher antes, mas estou tentando configurar um ambiente Rancher em um cluster HA RKE2 local. Eu tenho um F5 como balanceador de carga e ele está configurado para lidar com as portas 80, 443, 6443 e 9345. Um registro DNS chamado rancher-demo.localdomain.local aponta para o endereço IP do balanceador de carga. Desejo fornecer meus próprios arquivos de certificado e criei esse certificado por meio de nossa CA interna.
O cluster em si foi operacionalizado e funciona. Quando executei a instalação nos nós diferentes do primeiro, eles usaram o nome DNS que aponta para o IP do LB, então sei que parte do LB funciona.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
rancher0001.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
rancher0002.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
rancher0003.localdomain.local Ready control-plane,etcd,master 25h v1.26.12+rke2r1
Antes de instalar o Rancher, executei os seguintes comandos:
kubectl create namespace cattle-system
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=~/tls.crt --key=~/tls.key
kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem=~/cacerts.pem
Finalmente, instalei o Rancher:
helm install rancher rancher-stable/rancher --namespace cattle-system --set hostname=rancher-demo.localdomain.local --set bootstrapPassword=passwordgoeshere --set ingress.tls.source=secret --set privateCA=true
Não me lembro do erro, mas vi um erro de tempo limite logo após executar a instalação. Definitivamente fez *parte* da instalação:
kubectl -n cattle-system rollout status deploy/rancher
deployment "rancher" successfully rolled out
kubectl get ns
NAME STATUS AGE
cattle-fleet-clusters-system Active 5h18m
cattle-fleet-system Active 5h24m
cattle-global-data Active 5h25m
cattle-global-nt Active 5h25m
cattle-impersonation-system Active 5h24m
cattle-provisioning-capi-system Active 5h6m
cattle-system Active 5h29m
cluster-fleet-local-local-1a3d67d0a899 Active 5h18m
default Active 25h
fleet-default Active 5h25m
fleet-local Active 5h26m
kube-node-lease Active 25h
kube-public Active 25h
kube-system Active 25h
local Active 5h25m
p-c94zp Active 5h24m
p-m64sb Active 5h24m
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-fleet-system fleet-controller-56968b86b6-6xdng 1/1 Running 0 5h19m
cattle-fleet-system gitjob-7d68454468-tvcrt 1/1 Running 0 5h19m
cattle-system rancher-64bdc898c7-56fpm 1/1 Running 0 5h27m
cattle-system rancher-64bdc898c7-dl4cz 1/1 Running 0 5h27m
cattle-system rancher-64bdc898c7-z55lh 1/1 Running 1 (5h25m ago) 5h27m
cattle-system rancher-webhook-58d68fb97d-zpg2p 1/1 Running 0 5h17m
kube-system cloud-controller-manager-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system cloud-controller-manager-rancher0002.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system cloud-controller-manager-rancher0003.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system etcd-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system etcd-rancher0002.localdomain.local 1/1 Running 3 (22h ago) 25h
kube-system etcd-rancher0003.localdomain.local 1/1 Running 3 (22h ago) 25h
kube-system kube-apiserver-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system kube-apiserver-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-apiserver-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-controller-manager-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-controller-manager-rancher0002.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-controller-manager-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0001.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-proxy-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system kube-scheduler-rancher0001.localdomain.local 1/1 Running 1 (22h ago) 25h
kube-system kube-scheduler-rancher0002.localdomain.local 1/1 Running 0 25h
kube-system kube-scheduler-rancher0003.localdomain.local 1/1 Running 0 25h
kube-system rke2-canal-2jngw 2/2 Running 0 25h
kube-system rke2-canal-6qrc4 2/2 Running 0 25h
kube-system rke2-canal-bk2f8 2/2 Running 0 25h
kube-system rke2-coredns-rke2-coredns-565dfc7d75-87pjr 1/1 Running 0 25h
kube-system rke2-coredns-rke2-coredns-565dfc7d75-wh64f 1/1 Running 0 25h
kube-system rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-mlcln 1/1 Running 0 25h
kube-system rke2-ingress-nginx-controller-6p8ll 1/1 Running 0 22h
kube-system rke2-ingress-nginx-controller-7pm5c 1/1 Running 0 5h22m
kube-system rke2-ingress-nginx-controller-brfwh 1/1 Running 0 22h
kube-system rke2-metrics-server-c9c78bd66-f5vrb 1/1 Running 0 25h
kube-system rke2-snapshot-controller-6f7bbb497d-vqg9s 1/1 Running 0 22h
kube-system rke2-snapshot-validation-webhook-65b5675d5c-dt22h 1/1 Running 0 22h
No entanto, obviamente (dada a página 404 Not Found quando vou parahttps://rancher-demo.localdomain.local) as coisas não estão funcionando direito.
Eu nunca configurei isso antes, então não tenho certeza de como solucionar isso. Passei horas pesquisando várias postagens, mas nada que encontrei parece corresponder a esse problema específico.
Algumas coisas que descobri:
kubectl -n cattle-system logs -f rancher-64bdc898c7-56fpm
2024/01/17 21:13:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
(repeats every 15 seconds)
kubectl get ingress --all-namespaces
No resources found
(I *know* there was an ingress at some point, I believe in cattle-system; now it's gone. I didn't remove it.)
kubectl -n cattle-system describe service rancher
Name: rancher
Namespace: cattle-system
Labels: app=rancher
app.kubernetes.io/managed-by=Helm
chart=rancher-2.7.9
heritage=Helm
release=rancher
Annotations: meta.helm.sh/release-name: rancher
meta.helm.sh/release-namespace: cattle-system
Selector: app=rancher
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.199.3
IPs: 10.43.199.3
Port: http 80/TCP
TargetPort: 80/TCP
Endpoints: 10.42.0.26:80,10.42.1.22:80,10.42.1.23:80
Port: https-internal 443/TCP
TargetPort: 444/TCP
Endpoints: 10.42.0.26:444,10.42.1.22:444,10.42.1.23:444
Session Affinity: None
Events: <none>
kubectl -n cattle-system logs -l app=rancher
2024/01/17 21:17:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:17:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:40 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:45.551484 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.646038 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] [updateClusterHealth] Failed to update cluster [local]: Internal error occurred: failed calling webhook "rancher.cattle.io.clusters.management.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": context deadline exceeded
E0117 21:19:52.882877 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:53.061671 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:53 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.23/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.23:443: i/o timeout
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:37.826713 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:37.918579 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:37 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0117 21:19:45.604537 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.713901 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.22]: dial tcp 10.42.0.26:443: i/o timeout
E0117 21:19:52.899035 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:52.968048 34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:52 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
Tenho certeza de que fiz algo errado, mas não sei o que e não sei como solucionar isso ainda mais.