Unfortunately, I am having a problem with my HP ProLiant DL380e G8 server running Fedora Server 34. I suspect memory errors or a failing DIMM, but I am not sure.
Feedback is welcome!
I ran journalctl -r, which returns the following output, posted on PasteBin (an excerpt of what looks out of the ordinary): https://pastebin.com/KPUZHceD
Any help and ideas are appreciated!
Kind regards
Edit: in response to @Michael Hampton's comment, the output is posted here:
<27>Sep 7 17:03:51 mcelog: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1304]: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1303]: <27>Sep 7 17:03:51 mcelog: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1303]: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1304]: <27>Sep 7 17:03:51 mcelog: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1304]: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1303]: <27>Sep 7 17:03:51 mcelog: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1303]: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 2 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 1 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 7
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 3 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 13 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 6
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 0 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 0 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 5
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: Running trigger `dimm-error-trigger' (reporter: memdb)
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 6 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 3 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 4
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID a SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801c00400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d2213fa689118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 5 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 3
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 5 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801bd8400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d2213f0649118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 14 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 2
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 1 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801bec400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d221196e09118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 12 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 1
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 0 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c0107b4000010093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c0107b4000010093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: CPU 0 BANK 5
Sep 07 17:03:51 turbo mcelog[1067]: MCE 0
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: mcelog: mcelog read: Input/output error
Sep 07 17:03:51 turbo kernel: ERST: [Firmware Warn]: Firmware does not respond in time.
Sep 07 17:03:51 turbo kernel: mce: [Hardware Error]: Machine check events logged
Sep 07 17:03:51 turbo kernel: mce: [Hardware Error]: Machine check events logged
Sep 07 17:03:51 turbo kernel: mce_notify_irq: 6 callbacks suppressed
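As a reading aid for the log above, the STATUS values can be decoded by hand. The sketch below assumes the architectural IA32_MCi_STATUS flag bits (63 down to 57) and the memory-controller error-code pattern 0000 0000 1MMM CCCC in the low 16 bits; the function name and dictionary keys are made up for illustration and are not part of mcelog.

```python
# Flag bits of IA32_MCi_STATUS (assumed architectural layout).
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}

# MMM request field of a memory-controller error code.
REQUESTS = {0: "generic", 1: "read", 2: "write",
            3: "address/command", 4: "scrub"}


def decode_status(hex_value):
    """Decode a STATUS value as printed by mcelog (hex, no 0x prefix)."""
    status = int(hex_value, 16)
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # low 16 bits: MCA error code
    result = {"flags": flags, "mca_code": code}
    # Memory-controller codes have the form 0000 0000 1MMM CCCC.
    if code & 0xFF80 == 0x0080:
        result["request"] = REQUESTS.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        # CCCC == 1111 means the channel is not specified.
        result["channel"] = None if channel == 0xF else channel
    return result
```

For example, decode_status("c80000c400800093") reports the flags VAL, OVER and MISCV with UC clear, a read request, and channel 3, which agrees with the "Corrected error", "Error overflow", "MCi_MISC register valid" and RD_CHANNEL3_ERR lines mcelog printed for that event.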
Answer 1
This was resolved by removing 2 faulty RAM modules from the server and reseating the CPU, which was also not making good contact.
Thanks for all the help!
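For anyone trying to work out which module to pull in a similar situation, the "Location:" lines that mcelog writes to the journal can be tallied per slot. This is only a sketch: the function name is hypothetical, and skipping lines that contain "<27>" is an assumption based on the log in the question, where each message appears twice (once with that prefix, once without).

```python
import re
from collections import Counter

# Matches the location line format seen in the question's log.
LOCATION = re.compile(r"Location: SOCKET:(\d+) CHANNEL:(\d+) DIMM:(\d+)")


def count_dimm_reports(lines):
    """Count mcelog location reports per (socket, channel, dimm)."""
    counts = Counter()
    for line in lines:
        # Assumption: the "<27>" copy duplicates the plain copy, so skip it.
        if "<27>" in line:
            continue
        match = LOCATION.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return counts
```

The function accepts any iterable of lines, for example the saved output of journalctl -t mcelog. How a SOCKET/CHANNEL/DIMM triple maps to a physical slot depends on the board, so the result still has to be checked against the server's own slot labeling.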