Proxmox reboots periodically

I have a Proxmox cluster with two physical servers, pve and pve2. They are identical Dell R710s with 96 GB of memory and 1 TB of storage (RAID-10). For some reason I have yet to identify, pve2 reboots periodically. I checked the hardware logs through iDRAC and there are no alarms or errors.

I'm fairly new to Proxmox, so I don't know where to look for error logs beyond the usual Linux places such as syslog and dmesg.

Here is an excerpt from my syslog from when the reboot occurred (Dec 30, around 16:54:01):

Dec 30 16:50:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:50:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:50:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:51:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:51:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:51:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:52:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:52:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:52:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:53:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:53:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:53:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:54:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:54:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:54:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:57:42 pve2 dmeventd[492]: dmeventd ready for processing.
Dec 30 16:57:42 pve2 kernel: [    0.000000] Linux version 5.4.73-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.73-1 (Mon, 16 Nov 2020 10:52:16 +0100) ()
Dec 30 16:57:42 pve2 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.73-1-pve root=/dev/mapper/pve-root ro quiet
Dec 30 16:57:42 pve2 kernel: [    0.000000] KERNEL supported cpus:
Dec 30 16:57:42 pve2 systemd-modules-load[483]: Inserted module 'iscsi_tcp'
Dec 30 16:57:42 pve2 kernel: [    0.000000]   Intel GenuineIntel
Dec 30 16:57:42 pve2 kernel: [    0.000000]   AMD AuthenticAMD
Dec 30 16:57:42 pve2 kernel: [    0.000000]   Hygon HygonGenuine
Dec 30 16:57:42 pve2 kernel: [    0.000000]   Centaur CentaurHauls
Dec 30 16:57:42 pve2 kernel: [    0.000000]   zhaoxin   Shanghai
Dec 30 16:57:42 pve2 systemd[1]: Starting Flush Journal to Persistent Storage...
Dec 30 16:57:42 pve2 kernel: [    0.000000] x86/fpu: x87 FPU will use FXSAVE
Dec 30 16:57:42 pve2 kernel: [    0.000000] BIOS-provided physical RAM map:
Dec 30 16:57:42 pve2 kernel: [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
Dec 30 16:57:42 pve2 kernel: [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf378fff] usable
Dec 30 16:57:42 pve2 kernel: [    0.000000] BIOS-e820: [mem 0x00000000bf379000-0x00000000bf38efff] reserved
Dec 30 16:57:42 pve2 systemd[1]: Started udev Coldplug all Devices.
Dec 30 16:57:42 pve2 kernel: [    0.000000] BIOS-e820: [mem 0x00000000bf38f000-0x00000000bf3cdfff] ACPI data
Dec 30 16:57:42 pve2 kernel: [    0.000000] BIOS-e820: [mem 0x00000000bf3ce000-0x00000000bfffffff] reserved
Dec 30 16:57:42 pve2 kernel: [    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
Dec 30 16:57:42 pve2 systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Dec 30 16:57:42 pve2 kernel: [    0.000000] BIOS-e820: [mem 0x00000000fe000000-0x00000000ffffffff] reserved
Dec 30 16:57:42 pve2 kernel: [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000183fffffff] usable
Dec 30 16:57:42 pve2 kernel: [    0.000000] NX (Execute Disable) protection: active
Dec 30 16:57:42 pve2 kernel: [    0.000000] SMBIOS 2.6 present.
Dec 30 16:57:42 pve2 kernel: [    0.000000] DMI: Dell Inc. PowerEdge R710/0Y7JM4, BIOS 6.3.0 07/24/2012

Any suggestions on what the cause might be, or where I can check?

Answer 1

It could be a hardware failure; the most common cause is faulty RAM.

Run a memory test on your server to find out.
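
While the memory test runs, it may also help to confirm from the syslog whether each reboot was abrupt, since a hard reset is what a hardware fault would usually look like. The following is only a minimal sketch, not a Proxmox tool: the function name `find_reboots`, the `SHUTDOWN_HINTS` strings, the `lookback` window, and the default year of 2020 (syslog lines carry no year) are all assumptions made for illustration. It looks for the kernel's "Linux version" boot line, finds the last log entry with an earlier timestamp, and reports the gap and whether any typical shutdown message appeared just before.

```python
import re
import sys
from datetime import datetime

# Classic syslog line: "Dec 30 16:54:01 host rest-of-message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s\d{2}:\d{2}:\d{2})\s(.*)$")
# First kernel line of a fresh boot, as seen in the excerpt above
BOOT_RE = re.compile(r"kernel: \[\s*0\.000000\] Linux version")
# Heuristic strings that usually show up during an orderly shutdown (assumption)
SHUTDOWN_HINTS = ("Stopping ", "Reached target Shutdown", "systemd-shutdown")


def find_reboots(lines, year=2020, lookback=50):
    """Return a list of dicts, one per detected boot, describing the gap before it."""
    records = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            records.append((ts, m.group(2)))

    reboots = []
    for i, (boot_ts, msg) in enumerate(records):
        if not BOOT_RE.search(msg):
            continue
        # Boot-time lines are interleaved, so only keep strictly earlier entries.
        earlier = [r for r in records[max(0, i - lookback):i] if r[0] < boot_ts]
        if not earlier:
            continue
        last_ts = earlier[-1][0]
        clean = any(hint in text for _, text in earlier for hint in SHUTDOWN_HINTS)
        reboots.append({
            "last_seen": last_ts,
            "boot": boot_ts,
            "gap_seconds": int((boot_ts - last_ts).total_seconds()),
            "clean_shutdown": clean,
        })
    return reboots


if __name__ == "__main__":
    # Usage: python3 find_reboots.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as fh:
        for r in find_reboots(fh):
            kind = "clean" if r["clean_shutdown"] else "ABRUPT"
            print(f'{r["last_seen"]} -> {r["boot"]} ({r["gap_seconds"]}s gap, {kind})')
```

If the reboots come out as abrupt, with no shutdown messages beforehand, that is consistent with a hardware or power problem rather than a service or user requesting a reboot, although it does not prove it.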