Unfortunately, I have a problem with my HP ProLiant DL380e G8 server running Fedora Server 34. I suspect memory errors or a faulty DIMM, but I'm not sure.
Any feedback is very welcome!
I ran journalctl -r, and it returned the output in the following PasteBin link (an excerpt of what looks unusual): https://pastebin.com/KPUZHceD
Any help and ideas are appreciated!
Regards
Edit: In response to @Michael Hampton's comment, here is the output:
<27>Sep 7 17:03:51 mcelog: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1304]: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1303]: <27>Sep 7 17:03:51 mcelog: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1303]: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1304]: <27>Sep 7 17:03:51 mcelog: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1304]: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1303]: <27>Sep 7 17:03:51 mcelog: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1303]: corrected DIMM memory error count exceeded threshold: 10 in 24h
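Because mcelog both writes to the journal and forwards copies through syslog (the lines prefixed with "<27>"), each message shows up more than once, which makes it easy to misjudge how many distinct modules are involved. A small sketch like the one below can tally the reported locations. The function name `count_dimm_locations` is hypothetical, and skipping lines that contain "<27>" is a heuristic based only on this excerpt, not documented mcelog behavior.

```python
import re
from collections import Counter

# Matches the location lines mcelog prints, e.g.
# "Location: SOCKET:0 CHANNEL:3 DIMM:1 []"
LOCATION_RE = re.compile(r"Location: SOCKET:(\d+) CHANNEL:(\d+) DIMM:(\d+)")


def count_dimm_locations(lines):
    """Tally how often each (socket, channel, dimm) triple is reported."""
    counts = Counter()
    for line in lines:
        # Heuristic (assumption): skip the "<27>"-prefixed syslog copies
        # so each journal entry is counted only once.
        if "<27>" in line:
            continue
        match = LOCATION_RE.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return counts


# A few lines copied from the log excerpt in this post.
SAMPLE_LOG = [
    "<27>Sep 7 17:03:51 mcelog: Location: SOCKET:0 CHANNEL:3 DIMM:1 []",
    "Sep 07 17:03:51 turbo mcelog[1304]: Location: SOCKET:0 CHANNEL:3 DIMM:1 []",
    "Sep 07 17:03:51 turbo mcelog[1303]: corrected DIMM memory error count exceeded threshold: 10 in 24h",
    "Sep 07 17:03:51 turbo mcelog[1304]: <27>Sep 7 17:03:51 mcelog: Location: SOCKET:0 CHANNEL:3 DIMM:1 []",
    "Sep 07 17:03:51 turbo mcelog[1304]: Location: SOCKET:0 CHANNEL:3 DIMM:1 []",
]

for (socket, channel, dimm), n in count_dimm_locations(SAMPLE_LOG).most_common():
    print(f"SOCKET {socket} CHANNEL {channel} DIMM {dimm}: {n} report(s)")
```

On the server itself, the same function could be fed the text of `journalctl -t mcelog --no-pager` (for example via `subprocess.run(..., capture_output=True, text=True).stdout.splitlines()`). If every entry points at the same socket/channel/DIMM triple, that module is the first suspect.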
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 2 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 1 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 7
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
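The lines "Corrected error", "Error overflow", "MCi_MISC register valid" and "RD_CHANNEL3_ERR" that mcelog prints are decoded from the raw STATUS value. The sketch below shows that decoding for values such as c80000c400800093, assuming the IA32_MCi_STATUS bit layout and the memory-controller error code format (000F 0000 1MMM CCCC) described in the Intel SDM, Vol. 3B. It is an illustration only: mcelog already performs this decoding, the function names are hypothetical, and whether bits 52:38 hold a meaningful corrected-error count depends on the CPU model.

```python
# IA32_MCi_STATUS flag bit positions (per the Intel SDM, Vol. 3B).
VAL, OVER, UC, EN, MISCV, ADDRV, PCC = 63, 62, 61, 60, 59, 58, 57

# Transaction type in bits 6:4 of a memory-controller MCA error code.
MEM_OPS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_mci_status(status):
    """Split a raw MCi_STATUS value into its architectural fields."""
    mca_code = status & 0xFFFF
    result = {
        "valid": bit(status, VAL),
        "overflow": bit(status, OVER),        # "Error overflow"
        "uncorrected": bit(status, UC),       # False -> "Corrected error"
        "enabled": bit(status, EN),
        "misc_valid": bit(status, MISCV),     # "MCi_MISC register valid"
        "addr_valid": bit(status, ADDRV),
        "context_corrupt": bit(status, PCC),
        "model_specific": (status >> 16) & 0xFFFF,
        # Bits 52:38: corrected-error count (model-dependent meaning).
        "corrected_count": (status >> 38) & 0x7FFF,
        "mca_code": mca_code,
    }
    # Memory-controller errors: fixed bits must match 000F 0000 1MMM CCCC.
    if (mca_code & 0xEF80) == 0x0080:
        result["operation"] = MEM_OPS.get((mca_code >> 4) & 0b111, "reserved")
        channel = mca_code & 0xF
        # Channel value 1111 means "channel not specified".
        result["channel"] = None if channel == 0xF else channel
    return result


# STATUS values copied from the log in this post.
for raw in (0xC80000C400800093, 0xC801C00400800093, 0xC0107B4000010093):
    print(hex(raw), decode_mci_status(raw))
```

For every STATUS value in this log the decoded operation is a memory read and the channel is 3, which agrees with mcelog's "RD_CHANNEL3_ERR" and with the single "SOCKET:0 CHANNEL:3 DIMM:1" location.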
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 3 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 13 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 6
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 0 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 0 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 5
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: Running trigger `dimm-error-trigger' (reporter: memdb)
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 6 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 3 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 4
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID a SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801c00400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d2213fa689118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 5 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 3
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 5 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801bd8400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d2213f0649118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 14 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 2
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 1 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801bec400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d221196e09118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 12 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 1
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 0 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c0107b4000010093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c0107b4000010093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: CPU 0 BANK 5
Sep 07 17:03:51 turbo mcelog[1067]: MCE 0
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: mcelog: mcelog read: Input/output error
Sep 07 17:03:51 turbo kernel: ERST: [Firmware Warn]: Firmware does not respond in time.
Sep 07 17:03:51 turbo kernel: mce: [Hardware Error]: Machine check events logged
Sep 07 17:03:51 turbo kernel: mce: [Hardware Error]: Machine check events logged
Sep 07 17:03:51 turbo kernel: mce_notify_irq: 6 callbacks suppressed
Solution 1
The problem was fixed by removing two faulty RAM modules from the server and reseating the CPU, which was also not making good contact.
Thanks for the help!