Kubernetes failed with AttachVolume and FailedMount errors

I have a Kubernetes cluster on the OVH cloud. Today nginx suddenly started returning a 503 error when I opened the website. I then checked the cluster with kubectl get pods and saw that all pods associated with one particular volume were no longer ready. All of these pods show FailedAttachVolume and FailedMount errors in their events. As an example, here is the event log of one pod:

      Warning  FailedAttachVolume  15m                  attachdetach-controller  AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-b1820e9f-935b-442e-b68e-efe7de0feb35)"}}
      Warning  FailedAttachVolume  12m (x2 over 14m)    attachdetach-controller  AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-c6e51d31-8646-44a2-ba75-7069e3ed87fa)"}}
      Warning  FailedAttachVolume  10m                  attachdetach-controller  AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-a5c9c89e-5578-4b6c-8722-acf583dea1a8)"}}
      Warning  FailedMount         3m18s (x4 over 12m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[files-volume], unattached volumes=[kube-api-access-q9pjc files-volume]: timed out waiting for the condition
      Warning  FailedMount         63s (x3 over 14m)    kubelet                  Unable to attach or mount volumes: unmounted volumes=[files-volume], unattached volumes=[files-volume kube-api-access-q9pjc]: timed out waiting for the condition
      Warning  FailedAttachVolume  11s (x5 over 8m21s)  attachdetach-controller  (combined from similar events): AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-39554882-3f5c-40e9-aad2-482ff427c632)"}}
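
The useful detail in these events seems to be the volume status reported by the storage API (here: in-use, while available or downloading is expected). Below is a minimal sketch for pulling the volume ID and status out of such an event message, assuming the message wording shown above. The names STATUS_RE and parse_attach_error are hypothetical helpers, not part of Kubernetes or any CSI tooling.

```python
import re

# Assumption: the error text follows the wording seen in the events above.
# STATUS_RE and parse_attach_error are illustrative names only.
STATUS_RE = re.compile(
    r"Volume (?P<volume_id>[0-9a-f-]+) status must be (?P<expected>.+?) "
    r"to reserve, but the current status is (?P<current>[\w-]+)"
)


def parse_attach_error(message):
    """Return volume ID, expected and current status, or None if no match."""
    match = STATUS_RE.search(message)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    sample = (
        "Invalid input received: Invalid volume: Volume "
        "0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or "
        "downloading to reserve, but the current status is in-use. (HTTP 400)"
    )
    print(parse_attach_error(sample))
```

Messages that do not contain this wording, such as the FailedMount timeouts, simply yield None.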

The PVC is mounted in all deployments as follows:

spec:
  ...
  template:
    spec:
      ...
      containers:
      - name: ...
        ...
        volumeMounts:
        - name: files-volume
          mountPath: /files
        ...
      volumes:
        - name: files-volume
          persistentVolumeClaim:
            claimName: pv-files-claim

The PVC looks like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-files-claim
spec:
  storageClassName: csi-cinder-high-speed
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

How can this error occur, and how do I fix it? And how can I avoid it in the future?

In the meantime, all pods except one have reattached to the volume on their own. One pod, however, still cannot attach it.
