Tengo un clúster de Kubernetes en la nube de OVH. Hoy, nginx respondió con un error 503 de repente al llamar al sitio web. Luego verifiqué el clúster de Kubernetes kubectl get pods
y pude ver que todos los pods asociados con un volumen en particular ya no estaban listos. Todos los pods muestran errores FailedAttachVolume y FailedMount en los eventos. Como ejemplo, el registro de eventos de un pod:
Warning FailedAttachVolume 15m attachdetach-controller AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-b1820e9f-935b-442e-b68e-efe7de0feb35)"}}
Warning FailedAttachVolume 12m (x2 over 14m) attachdetach-controller AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-c6e51d31-8646-44a2-ba75-7069e3ed87fa)"}}
Warning FailedAttachVolume 10m attachdetach-controller AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-a5c9c89e-5578-4b6c-8722-acf583dea1a8)"}}
Warning FailedMount 3m18s (x4 over 12m) kubelet Unable to attach or mount volumes: unmounted volumes=[files-volume], unattached volumes=[kube-api-access-q9pjc files-volume]: timed out waiting for the condition
Warning FailedMount 63s (x3 over 14m) kubelet Unable to attach or mount volumes: unmounted volumes=[files-volume], unattached volumes=[files-volume kube-api-access-q9pjc]: timed out waiting for the condition
Warning FailedAttachVolume 11s (x5 over 8m21s) attachdetach-controller (combined from similar events): AttachVolume.Attach failed for volume "example-managed-kubernetes-mrx2n8-pvc-ca435065-1111-aaaa-0123-543465516bb2" : rpc error: code = Internal desc = [ControllerPublishVolume] Attach Volume failed with error failed to attach 0655d400-3333-2222-1111-6fbcc2b62f94 volume to 5d51a18b-abcd-w2re-wfe2-a94d2b4ca988 compute: Bad request with: [POST https://compute.de1.cloud.example.net/v2.1/e2b2680af21e4q9n3e8hfoc39rpgowpd/servers/5d51a18b-abcd-w2re-wfe2-a94d2b4ca988/os-volume_attachments], error message: {"badRequest": {"code": 400, "message": "Invalid input received: Invalid volume: Volume 0655d400-3333-2222-1111-6fbcc2b62f94 status must be available or downloading to reserve, but the current status is in-use. (HTTP 400) (Request-ID: req-39554882-3f5c-40e9-aad2-482ff427c632)"}}
El pvc se integra en todos los despliegues de la siguiente manera:
spec:
...
template:
spec:
...
containers:
- name: ...
...
volumeMounts:
- name: files-volume
mountPath: /files
...
volumes:
- name: files-volume
persistentVolumeClaim:
claimName: pv-files-claim
El pvc queda así:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pv-files-claim
spec:
storageClassName: csi-cinder-high-speed
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
¿Cómo puede ocurrir este error y cómo lo soluciono? ¿Y cómo puedo evitar el error en el futuro?
Mientras tanto, todos los pods excepto uno se han vuelto a conectar al volumen por sí solos. Sin embargo, no parece funcionar con un solo módulo.