metrics-server CrashLoopBackOff on k8s v1.11.1

Using kubeadm and flannel across 4 nodes running RHEL 7.

I did the following:

and then metrics-server went into CrashLoopBackOff:

NAME                                          READY     STATUS             RESTARTS   AGE
coredns-78fcdf6894-4q7ct                      1/1       Running            10         7d
coredns-78fcdf6894-7tj52                      1/1       Running            10         7d
etcd-thalia0.ahc.umn.edu                      1/1       Running            0          7d
kube-apiserver-thalia0.ahc.umn.edu            1/1       Running            0          7d
kube-controller-manager-thalia0.ahc.umn.edu   1/1       Running            0          7d
kube-flannel-ds-amd64-78hbk                   1/1       Running            0          7d
kube-flannel-ds-amd64-gdttr                   1/1       Running            0          7d
kube-flannel-ds-amd64-rzhm2                   1/1       Running            0          7d
kube-flannel-ds-amd64-xc2n7                   1/1       Running            0          7d
kube-proxy-b86kn                              1/1       Running            0          7d
kube-proxy-g27sk                              1/1       Running            0          7d
kube-proxy-rtgtp                              1/1       Running            0          7d
kube-proxy-x2pp7                              1/1       Running            0          7d
kube-scheduler-thalia0.ahc.umn.edu            1/1       Running            0          7d
kubernetes-dashboard-7b7cb74c5c-wgt8f         1/1       Running            0          6d
metrics-server-85ff8f7b84-2x5th               0/1       CrashLoopBackOff   8          23m

I then ran

kubectl -n kube-system logs $(kubectl get pods --namespace=kube-system -l k8s-app=metrics-server -o name)

and got this output:

I0828 19:26:41.686932       1 heapster.go:71] /metrics-server --source=kubernetes:https://kubernetes.default
I0828 19:26:41.687023       1 heapster.go:72] Metrics Server version v0.2.1
I0828 19:26:41.687360       1 configs.go:61] Using Kubernetes client with master "https://kubernetes.default" and version
I0828 19:26:41.687388       1 configs.go:62] Using kubelet port 10255
E0828 19:27:01.692571       1 kubelet.go:331] Failed to load nodes: Get https://kubernetes.default/api/v1/nodes: dial tcp: lookup kubernetes.default on 10.96.0.10:53: read udp 10.244.2.4:34644->10.96.0.10:53: read: no route to host
I0828 19:27:01.692700       1 heapster.go:128] Starting with Metric Sink
I0828 19:27:02.500852       1 serving.go:308] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
W0828 19:27:04.381151       1 authentication.go:222] Unable to get configmap/extension-apiserver-authentication in kube-system.  Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
F0828 19:27:04.381187       1 heapster.go:97] Could not create the API server: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: getsockopt: no route to host
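Both fatal errors in this log come from the metrics-server pod (10.244.2.4) failing to reach ClusterIPs: 10.96.0.10:53 (kube-dns) and 10.96.0.1:443 (the API server). A small Python sketch that pulls the destination `IP:port` out of each "no route to host" line and tags it by address range makes that pattern explicit. It assumes the kubeadm default Service CIDR 10.96.0.0/12 and the pod CIDR conventionally used with flannel, 10.244.0.0/16; neither range is stated in the post, so adjust both to your cluster.

```python
import ipaddress
import re

# Assumed ranges (not stated in the post): kubeadm's default Service CIDR
# and the pod CIDR commonly used with flannel.
SERVICE_CIDR = ipaddress.ip_network("10.96.0.0/12")
POD_CIDR = ipaddress.ip_network("10.244.0.0/16")

# Matches "IP:port" pairs such as 10.96.0.10:53 or 10.244.2.4:34644.
ADDR_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def classify(ip: str) -> str:
    """Label an address as a Service ClusterIP, a pod IP, or something else."""
    addr = ipaddress.ip_address(ip)
    if addr in SERVICE_CIDR:
        return "service (ClusterIP)"
    if addr in POD_CIDR:
        return "pod"
    return "other"


def unreachable_targets(log: str) -> list[tuple[str, int, str]]:
    """For each 'no route to host' line, return the last IP:port on it
    (the destination of the failed dial) plus its classification."""
    targets = []
    for line in log.splitlines():
        if "no route to host" not in line:
            continue
        matches = ADDR_RE.findall(line)
        if matches:
            ip, port = matches[-1]
            targets.append((ip, int(port), classify(ip)))
    return targets
```

Feeding it the log above yields only Service addresses as destinations, which points at pod-to-ClusterIP traffic (kube-proxy rules, the overlay, or a host firewall) rather than at metrics-server itself.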

I also went through the various logs and noticed that the flannel pods were logging a large number of these errors:

E0829 19:41:32.636680       1 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:295: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?resourceVersion=0: net/http: TLS handshake timeout

This error also appears in the scheduler pod:

E0829 19:41:32.637368       1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:129: Failed to list *core.Service: Get https://134.84.53.162:6443/api/v1/services?limit=500&resourceVersion=0: net/http: TLS handshake timeout

EDIT 1

I tore the cluster down, added a rule to the local firewall allowing port 443 (for kubectl proxy), and rebuilt the cluster.
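When changing host firewall rules, it helps to check reachability directly from a node or a debug pod. On RHEL 7, a "no route to host" error like the ones in the first metrics-server log is what a firewall rule that rejects with an ICMP host-prohibited reply typically produces, while a rule that silently drops packets shows up as a timeout instead. The sketch below is a hypothetical helper (`probe` is not part of any Kubernetes tool) that attempts a plain TCP connection and maps the outcome to a label.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and return a coarse label for the outcome.

    Hypothetical diagnostic helper: it checks TCP reachability only,
    not TLS, and it cannot test UDP paths.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout (packets likely dropped)"
    except ConnectionRefusedError:
        return "refused (host reachable, nothing listening)"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host (often an ICMP reject from a firewall)"
        return f"error: {exc}"
```

For example, `probe("10.96.0.1", 443)` run from inside a pod exercises the same path metrics-server failed on, and `probe("134.84.53.162", 6443)` run from another node checks the API server's host port. Because it is TCP-only, it says nothing about DNS over UDP port 53 or flannel's VXLAN traffic (UDP 8472 by default).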

The output of kubectl get services --namespace=kube-system is:

NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
kube-dns               ClusterIP   10.96.0.10     <none>        53/UDP,53/TCP   15h
kubernetes-dashboard   ClusterIP   10.98.72.170   <none>        443/TCP         20m
metrics-server         ClusterIP   10.111.155.9   <none>        443/TCP         1m

Notably, after tearing down and re-initializing the cluster, neither the flannel pods nor the scheduler pod throw those errors anymore. I only get the error in the metrics-server pod, along with this new error in the apiserver pod:

: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0830 20:43:38.101286       1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I0830 20:45:38.101548       1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E0830 20:45:38.101757       1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0830 20:45:38.101779       1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0830 20:45:44.532250       1 available_controller.go:311] v1beta1.metrics.k8s.io failed with: Get https://10.111.155.9:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0830 20:45:48.894505       1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E0830 20:45:48.894693       1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
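The aggregator error says the apiserver timed out calling https://10.111.155.9:443, which is the metrics-server ClusterIP in the service listing above. When logs only show IPs, a lookup built from the `kubectl get services` table makes that mapping explicit. This is a minimal sketch assuming the column layout shown above (NAME, TYPE, CLUSTER-IP, ...); `services_by_ip` and `targets_in_log` are hypothetical helpers.

```python
import re


def services_by_ip(table: str) -> dict[str, str]:
    """Map CLUSTER-IP -> NAME from `kubectl get services` table output."""
    rows = [ln.split() for ln in table.strip().splitlines() if ln.strip()]
    header, body = rows[0], rows[1:]
    name_col = header.index("NAME")
    ip_col = header.index("CLUSTER-IP")
    return {row[ip_col]: row[name_col] for row in body}


def targets_in_log(log: str) -> list[str]:
    """Return the IPs of https:// targets mentioned in a log, in order."""
    return re.findall(r"https://(\d{1,3}(?:\.\d{1,3}){3})", log)
```

Applied to the listing and the `available_controller.go` line above, the only target resolves to `metrics-server`, i.e. the apiserver cannot reach the very Service whose pod is crash-looping.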

I also dug into the warning W0828 19:27:04.381151 1 authentication.go:222] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'.

I ran kubectl get roles -n kube-system extension-apiserver-authentication-reader -o yaml and got the following output:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: 2018-08-30T00:58:35Z
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: extension-apiserver-authentication-reader
  namespace: kube-system
  resourceVersion: "132"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/kube-system/roles/extension-apiserver-authentication-reader
  uid: d2f1c80c-abef-11e8-95cc-005056891f42
rules:
- apiGroups:
  - ""
  resourceNames:
  - extension-apiserver-authentication
  resources:
  - configmaps
  verbs:
  - get
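The Role exists and grants `get` on the `extension-apiserver-authentication` ConfigMap, so what the warning's hint asks for is a RoleBinding granting that Role to metrics-server's service account. Note, though, that the fatal error right after the warning shows the same ConfigMap request failing with "no route to host", so in this case the warning is more likely a symptom of the network problem than of missing RBAC. For completeness, here is a minimal sketch of such a binding, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts as well as YAML). The service account `kube-system:metrics-server` and the binding name `metrics-server-auth-reader` are assumptions; use whatever your metrics-server manifest actually defines.

```python
import json

# Assumed names (hypothetical): adjust to match your metrics-server manifest.
SA_NAMESPACE = "kube-system"
SA_NAME = "metrics-server"
BINDING_NAME = "metrics-server-auth-reader"


def auth_reader_binding(sa_namespace: str, sa_name: str, name: str) -> dict:
    """Build a RoleBinding in kube-system that grants the existing
    extension-apiserver-authentication-reader Role to a service account."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": "kube-system"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": "extension-apiserver-authentication-reader",
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": sa_namespace}
        ],
    }


if __name__ == "__main__":
    binding = auth_reader_binding(SA_NAMESPACE, SA_NAME, BINDING_NAME)
    print(json.dumps(binding, indent=2))
```

Redirecting the output to a file and running `kubectl apply -f` on it should be equivalent to the `kubectl create rolebinding` hint with `ROLE_NAME` and `YOUR_NS:YOUR_SA` filled in with the assumed values.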

Finally, the output of kubectl get apiservice v1beta1.metrics.k8s.io -o yaml is:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  creationTimestamp: 2018-08-30T22:41:26Z
  name: v1beta1.metrics.k8s.io
  resourceVersion: "119754"
  selfLink: /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  uid: d403e18f-aca5-11e8-95cc-005056891f42
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: 2018-08-30T22:41:26Z
    message: endpoints for service/metrics-server in "kube-system" have no addresses
    reason: MissingEndpoints
    status: "False"
    type: Available
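The `Available: "False"` condition with reason `MissingEndpoints` ties the symptoms together: the APIService routes to the metrics-server Service, and that Service has no endpoint addresses because its only backing pod is in CrashLoopBackOff, which in turn is the 503 the apiserver log reports. A small sketch that reads the same object as JSON (for example from `kubectl get apiservice v1beta1.metrics.k8s.io -o json`) and reports why it is unavailable; `availability` is a hypothetical helper.

```python
import json


def availability(apiservice_json: str) -> tuple[bool, str]:
    """Return (available, 'reason: message') from an APIService's status."""
    obj = json.loads(apiservice_json)
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            ok = cond.get("status") == "True"
            return ok, f"{cond.get('reason', '')}: {cond.get('message', '')}"
    return False, "no Available condition reported yet"
```

For the object above this returns `False` with the `MissingEndpoints` explanation, which clears up only once a metrics-server pod becomes Ready.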

This looks like an obvious networking problem (firewall?), but I'm not sure how to proceed. Is this a flannel or a coredns configuration problem?

Answer 1

I switched the CNI from flannel to calico, and that seems to have resolved the problems I was having (I also hadn't been able to get the argo workflow controller to start on my cluster).
